Handling Affiliate Traffic Data

A concern many applications face today, particularly affiliate tracking software, is processing large amounts of data that depend on user behavior. In some cases, fast data stores such as Redis and Couchbase are sufficient to store and aggregate that data.

However, a sudden spike in traffic (common in performance marketing) can cause a lag in data aggregation, and that lag can break business logic, for example in affiliate marketing software. Other solutions exist that can process huge amounts of data in real time.

In this series of articles I will present and explain solutions that provide the right tools and capabilities to handle these situations. Each article will focus on the resources a solution provides as well as its autonomy.

Today’s focus is AWS Kinesis, a serverless service offering a set of tools that let developers stream processed data, receive events efficiently, and aggregate data in real time, all while storing the data or integrating with other applications.

So where to start? The first and simplest step is to create a producer that connects to AWS Kinesis Streams and begins writing data into a stream. A stream is composed of shards, which in layman’s terms are partitions that scale horizontally. A stream can also retain events for later processing and can deliver data to the following outputs:

  • Dedicated consumer (custom software written with the AWS SDK)
  • AWS Kinesis Analytics
  • AWS Kinesis Firehose
  • AWS Lambda
  • Many others
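
The producer step can be sketched in Python as follows. This is a minimal sketch, assuming the boto3 SDK and a hypothetical stream named affiliate-events; the event fields (click_id, affiliate_id) are illustrative and not our production schema. The client is passed in as a parameter so the functions can be exercised without AWS credentials.

```python
import json
import uuid


def build_record(event):
    """Turn an event dict into the keyword arguments for Kinesis put_record."""
    # Hypothetical convention: fall back to a random ID when the event has none.
    partition_key = str(event.get("click_id") or uuid.uuid4())
    return {
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": partition_key,
    }


def send_event(client, stream_name, event):
    """Send one event to the stream and return the service response."""
    return client.put_record(StreamName=stream_name, **build_record(event))
```

With boto3 installed and credentials configured, the client would come from boto3.client("kinesis"), for example send_event(boto3.client("kinesis"), "affiliate-events", {"click_id": "c-1", "affiliate_id": 42}).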

It is important to note the following about AWS Kinesis Streams:

  • There is no upper limit on the number of streams
  • One stream can have up to 500 shards in the US East (N. Virginia), US West (Oregon), and Europe (Ireland) regions, and up to 200 shards in other regions
  • One shard can receive up to 1,000 records/s or 1 MB/s
  • The maximum payload per message is 1 MB
  • Each shard supports 5 read transactions per second, and one transaction can read up to 10,000 records at once
  • Each shard has a maximum read rate of 2 MB/s

Based on the above information we can build a case around 5,500 records per second. At 1,000 records/s per shard, 6 shards would be sufficient. However, we also need to keep in mind the case where all events are sent to one shard while the others sit dormant, causing the dreaded Hot Shard.
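
The shard estimate can be written down as a small calculation. This is a sketch under stated assumptions: it uses only the per-shard write limits listed earlier, treats 1 MB as 1024 * 1024 bytes, and the average record size of 200 bytes is an illustrative figure, not a measurement.

```python
import math

# Per-shard write limits; 1 MB is taken as 1024 * 1024 bytes (assumption).
MAX_RECORDS_PER_SHARD = 1000
MAX_BYTES_PER_SHARD = 1024 * 1024


def shards_needed(records_per_second, avg_record_bytes):
    """Smallest shard count that satisfies both the record and the byte limit."""
    by_records = math.ceil(records_per_second / MAX_RECORDS_PER_SHARD)
    by_bytes = math.ceil(records_per_second * avg_record_bytes / MAX_BYTES_PER_SHARD)
    return max(1, by_records, by_bytes)


print(shards_needed(5500, 200))  # prints 6
```

The record limit dominates for small events. With larger payloads the byte limit takes over: at an assumed 2 KB per record the same traffic would need 11 shards.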

This occurs when the application sends the same ID with every message, so traffic is not distributed evenly between the shards. Since we can’t directly control which shard is used, we at Omarsys prevent this by tagging each event with a unique ID, which lets us manage which shard is responsible for specific events.
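
The effect is easy to reproduce locally. Kinesis hashes each partition key with MD5 and maps the resulting 128-bit number onto the hash key range owned by a shard. The sketch below assumes the shards split the hash space into equal ranges, which will not hold after uneven resharding, and the key names are made up for illustration.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming equal hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    return hashed * shard_count // HASH_SPACE


def distribution(keys, shard_count):
    """Count how many records land on each shard."""
    return Counter(shard_for_key(key, shard_count) for key in keys)


# Same key for every record: all traffic lands on one shard (the hot shard).
hot = distribution(["affiliate-42"] * 6000, 6)
# Unique key per record: traffic spreads across the shards.
spread = distribution([f"click-{i}" for i in range(6000)], 6)

print(len(hot), "shard(s) used with a constant key")
print(len(spread), "shard(s) used with unique keys")
```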

Now that I’ve explained AWS Kinesis Streams in a nutshell, the fun begins with the next few steps. At Omarsys we decided to use an AWS-integrated solution, AWS Kinesis Firehose, as it gets us closest to real time. When Firehose is connected to the stream, it gathers the events into batches and automatically delivers them to one of the following targets:

  • AWS S3
  • AWS Redshift
  • AWS ES
  • Splunk
  • A dedicated HTTP endpoint

This is a simple and cost-effective way to back up all the events without using development resources.
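
To make the batching idea concrete, here is a rough illustration in plain Python. It is not Firehose's implementation or API, only a sketch of buffering records until a size or age threshold is reached. The class name, thresholds, and the explicit now argument are all assumptions made for the example, and the age check runs only when a new record arrives.

```python
class Batcher:
    """Collects records and flushes when a size or age threshold is reached."""

    def __init__(self, max_bytes, max_age_seconds):
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self.records = []
        self.size = 0
        self.started_at = None

    def add(self, record, now):
        """Add a record (bytes); return a finished batch or None."""
        if not self.records:
            # The first record of a batch starts the age timer.
            self.started_at = now
        self.records.append(record)
        self.size += len(record)
        too_big = self.size >= self.max_bytes
        too_old = now - self.started_at >= self.max_age_seconds
        if too_big or too_old:
            return self.flush()
        return None

    def flush(self):
        """Return the buffered records and reset the buffer."""
        batch, self.records, self.size = self.records, [], 0
        return batch
```

A caller would write each returned batch to its target, for example one object in S3 per batch.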

AWS Kinesis Analytics is another tool that we use. It can filter and aggregate events and forward the results to other tools such as Lambda or other streams. It offers two ways to define aggregations: queries written in an SQL-like language, or an Apache Flink application.
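
The kind of aggregation meant here can be illustrated without Kinesis at all. The sketch below counts events per affiliate in fixed, non-overlapping time windows (tumbling windows), the sort of result a windowed SQL-like query would produce. It is plain Python rather than the Kinesis Analytics API, and the field names timestamp and affiliate_id are hypothetical.

```python
from collections import defaultdict


def tumbling_counts(events, window_seconds):
    """Count events per (window start, affiliate) pair."""
    counts = defaultdict(int)
    for event in events:
        # Round the timestamp down to the start of its window.
        window_start = event["timestamp"] - event["timestamp"] % window_seconds
        counts[(window_start, event["affiliate_id"])] += 1
    return dict(counts)


events = [
    {"timestamp": 0, "affiliate_id": "a"},
    {"timestamp": 30, "affiliate_id": "a"},
    {"timestamp": 59, "affiliate_id": "b"},
    {"timestamp": 61, "affiliate_id": "a"},
]
print(tumbling_counts(events, 60))
```

With a 60-second window, the first three events fall into the window starting at 0 and the last one into the window starting at 60.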

This is how Omarsys currently handles big data. This short introduction is the start of a series of articles on how we at Omarsys use AWS Kinesis. There is a lot going on in the world of affiliate marketing, and in the upcoming articles I will go in depth on Firehose and Analytics and show simple examples of how to apply these tools to real-life issues that we face. I will also compare Kinesis to more independent solutions such as Apache Kafka.

Till next time, Happy Coding!

Lukasz Jaworski
