Hotstar is the leading streaming media & video-on-demand service in India, with a user base of approx. 200 million.

It lets users stream popular shows in several different languages & genres over the web. But the primary & most popular feature of the service is the live streaming of cricket matches. This is the real deal.

In the latest edition of the IPL T20 cricket tournament, the streaming platform received a whopping record traffic of 10.3 million concurrent users, smashing its previous record of 8.26 million concurrent users.

To my knowledge, this is the most intense concurrent traffic load on any service, surpassing Fortnite, which clocked a load of 8.3 million concurrent players in the recent past.

For a full list of all the real-world software architecture posts on the blog, here you go.


1. Technical Insights

The backend of the app is powered by Amazon Web Services (AWS) & the CDN partner is Akamai.

All the traffic is served by EC2 instances, & the S3 object store is used as the data store.

The services use a mix of on-demand & spot instances to keep the costs optimized.

Running machine learning & data analytics algorithms on spot instances helps the business cut down costs significantly.

Tens of terabytes of data are generated on any regular day & processed by AWS EMR clusters.

AWS EMR is a managed Hadoop framework for processing massive amounts of data across EC2 instances. Other popular frameworks like Apache Spark, Presto, HBase etc. can also be used with AWS EMR.
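To make this concrete, here's a minimal sketch of the kind of Spark batch job an EMR cluster could run over playback data in S3. The bucket, paths, schema & field names are hypothetical illustrations, not Hotstar's actual pipeline.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Hypothetical EMR batch job: aggregate playback events stored in S3.
object PlaybackAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("playback-aggregation")
      .getOrCreate()

    // Assumed layout: one JSON event per line, carrying a userId,
    // a contentId & a minute-level timestamp bucket.
    val events = spark.read.json("s3://example-bucket/playback-events/2019-05-12/")

    // Rough per-minute viewer count per stream.
    val perMinute = events
      .groupBy(col("contentId"), col("minuteBucket"))
      .agg(countDistinct("userId").as("viewers"))

    perMinute.write.parquet("s3://example-bucket/aggregates/2019-05-12/")
    spark.stop()
  }
}
```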


2. Infrastructure Setup for Load Testing the Platform for the Big Event

Speaking of the infrastructure setup for load testing the platform: it consisted of 500+ AWS CPU instances, either C4.4XLarge or C4.8XLarge, running at 75% utilization.

C4 instances are built for CPU-intensive workloads at a low price-per-compute ratio. They provide high networking performance & increased storage performance at no additional cost.

C4.4XLarge instances typically have 30 gigs of RAM & C4.8XLarge instances 60 gigs of RAM.

The entire setup had 16 TBs of RAM & 8,000 CPU cores, & at peak the data transfer was approx. 32 Gbps. For perspective, a C4.4XLarge packs 16 vCPUs & 30 gigs of RAM, so 500 of them alone account for 8,000 cores & roughly 15 TBs of RAM.

The engineering team makes intelligent use of spot & on-demand instances to keep the costs low.

If you found the content helpful, check out the Zero to Software Architect learning track, a series of three courses I have written intending to educate you, step by step, on the domain of software architecture and distributed system design. The learning track takes you right from having no knowledge in it to making you a pro in designing large-scale distributed systems like YouTube, Netflix, Hotstar, and more.


3. Simulating Traffic

The traffic simulation had three major components: the traffic model, the simulation script & the load generation infrastructure. I've already talked about the infrastructure part.

3.1 Preparing the Traffic Model

The engineering team started with a very basic traffic model, hitting the API endpoints with a certain ratio of requests, & performed some initial runs.

There are typically two types of user interactions with the system: users who have already signed up & are logging in, & users visiting the platform for the first time.

Keeping all the different user navigation scenarios in mind the team prepared a traffic model of the system.

Now it was time to introduce spikes in the model over the entire testing phase.
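As a rough illustration, spikes can be layered on top of a steady base load in an injection profile of Gatling (the load testing tool covered in the next section). The endpoint, rates & durations below are hypothetical, not the team's actual numbers.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Hypothetical profile: steady base load, a sudden spike (say, a wicket
// falls & everyone tunes in), a sustained peak, then a cool-down.
class SpikySimulation extends Simulation {
  val httpProtocol = http.baseUrl("https://api.example-streaming.com") // made-up host

  val browse = scenario("Browse").exec(http("home").get("/v1/home"))

  setUp(
    browse.inject(
      constantUsersPerSec(100).during(10.minutes),      // steady base load
      rampUsersPerSec(100).to(1000).during(30.seconds), // sudden spike
      constantUsersPerSec(1000).during(2.minutes),      // sustained peak
      rampUsersPerSec(1000).to(100).during(1.minute)    // cool-down
    )
  ).protocols(httpProtocol)
}
```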

3.2 Writing the Simulation Script

A couple of tools were used to write & run the simulation: Gatling & Flood.io.

Gatling is an open-source load & performance testing tool for web applications, built on Scala, Akka & Netty.

It was preferred over JMeter, which is a pretty popular testing tool, due to its capability of simulating more concurrent users at one point in time.

Gatling sessions were used to maintain the user state. Simulation scripts were designed for several workflows throughout the system, for instance, the user login flow, the streaming flow etc.
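Below is a minimal sketch of what such a script could look like, with Gatling's session carrying the auth token between requests. The endpoints, the credentials feeder & the injection numbers are hypothetical; this isn't Hotstar's actual script.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class LoginAndStreamSimulation extends Simulation {

  val httpProtocol = http.baseUrl("https://api.example-streaming.com") // made-up host

  val users = csv("users.csv").circular // hypothetical credentials file

  // Login flow: sign in, stash the token in the Gatling session,
  // then reuse it on subsequent requests.
  val loginFlow = scenario("Login Flow")
    .feed(users)
    .exec(
      http("login")
        .post("/v1/login")
        .body(StringBody("""{"user":"${userId}","pass":"${password}"}""")).asJson
        .check(jsonPath("$.token").saveAs("authToken"))
    )
    .exec(
      http("home")
        .get("/v1/home")
        .header("Authorization", "Bearer ${authToken}")
    )

  // Streaming flow: fetch a playback manifest, then send periodic heartbeats.
  val streamFlow = scenario("Streaming Flow")
    .exec(http("manifest").get("/v1/stream/match-123/manifest"))
    .repeat(10) {
      exec(http("heartbeat").get("/v1/stream/match-123/heartbeat"))
        .pause(10.seconds)
    }

  setUp(
    loginFlow.inject(rampUsers(10000).during(2.minutes)),
    streamFlow.inject(constantUsersPerSec(500).during(10.minutes))
  ).protocols(httpProtocol)
}
```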

The simulation ran on an AWS distributed cluster & simulated 20 million users out of which 4 million were concurrent.

Another cloud-based testing tool Flood.io was used with Gatling for load testing.

Flood.io is a load testing platform that runs distributed performance tests with open source tools like JMeter, Gatling, Selenium etc.

This test setup matured with the data obtained as the real cricket matches happened.

To educate yourself on software architecture from the right resources, master the art of designing large-scale distributed systems that scale to millions of users, & understand what tech companies really look for in a candidate during their system design interviews, read my blog post on mastering system design for your interviews or your web startup.


4. Strategies to Scale

4.1 Traffic & Ladder Based Scaling

There are primarily two triggers to scale: traffic-based & ladder-based.

Traffic-based scaling simply means adding new infrastructure to the pool as the number of requests being processed by the system increases.

This works with unpredictable workloads & for services that provide all their stats in detail, exposing the number of threads created, requests processed etc.
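For illustration, a traffic-based trigger could look something like the sketch below; the stats structure, thresholds & per-instance capacity are made-up values, not the team's actual ones.

```scala
// Hypothetical traffic-based scaling trigger.
object TrafficBasedScaling {
  final case class ServiceStats(requestsPerSec: Double, activeThreads: Int, maxThreads: Int)

  // Scale out when thread saturation crosses a threshold, sizing the
  // addition off the current request rate plus some headroom.
  def instancesToAdd(stats: ServiceStats, perInstanceCapacityRps: Double): Int = {
    val threadSaturation = stats.activeThreads.toDouble / stats.maxThreads
    if (threadSaturation > 0.75)
      math.ceil(stats.requestsPerSec / perInstanceCapacityRps * 0.25).toInt // 25% headroom
    else 0
  }

  def main(args: Array[String]): Unit = {
    val stats = ServiceStats(requestsPerSec = 90000, activeThreads = 850, maxThreads = 1000)
    println(instancesToAdd(stats, perInstanceCapacityRps = 500)) // prints 45
  }
}
```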

Ladder-based scaling is preferred for services that do not give much detail on their processes. To tackle this, the team has pre-defined ladders per million concurrent users.

As the number grows, new infrastructure is added to the pool. This technique works well for a predictable load.

The team keeps a concurrency buffer of 2 million concurrent users in place, as adding new infrastructure to the pool takes around 90 seconds & the container & application start take around 75 seconds.

To beat this delay, the team has a pre-provisioned buffer. This buffer approach is preferred over auto-scaling. It helps the team handle unexpected traffic spikes without adding unnecessary latency to the response.
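Here's a rough sketch of ladder-based scaling with the 2 million concurrency buffer folded in; the rungs & instance counts are made-up illustrations, not Hotstar's actual ladder.

```scala
// Hypothetical ladder: each rung maps a concurrency ceiling (in millions
// of concurrent users) to a target instance count for the service.
object LadderScaling {
  val ladder: Seq[(Double, Int)] = Seq(
    1.0 -> 50,
    2.0 -> 110,
    4.0 -> 240,
    8.0 -> 520,
    12.0 -> 800
  )

  val bufferMillions = 2.0 // pre-provisioned headroom, since scale-out takes ~165 seconds

  def targetInstances(concurrentMillions: Double): Int = {
    val demand = concurrentMillions + bufferMillions
    ladder
      .find { case (ceiling, _) => demand <= ceiling }
      .map { case (_, instances) => instances }
      .getOrElse(ladder.last._2)
  }

  def main(args: Array[String]): Unit = {
    // At 5.3M concurrent users we provision for 7.3M, landing on the 8M rung.
    println(targetInstances(5.3)) // prints 520
  }
}
```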

They have developed an internal app called Infradashboard that helps them make these scaling decisions smoothly & quite ahead of time.


4.2 Intelligent Client

Protocols are in place that come into effect when the backend is overwhelmed with requests & the application client experiences increased response latency.

In these kinds of scenarios, the client avoids burdening the backend servers any further by increasing the time between subsequent requests.
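A minimal sketch of such a client-side backoff is below, assuming a hypothetical latency threshold & polling interval; the real protocol details aren't public.

```scala
import scala.concurrent.duration._

// Hypothetical client-side backoff: widen the gap between requests when
// the backend looks overwhelmed, tighten it again once latencies recover.
class BackoffPoller(baseInterval: FiniteDuration = 2.seconds,
                    maxInterval: FiniteDuration = 60.seconds,
                    latencyThreshold: FiniteDuration = 800.millis) {

  private var interval: FiniteDuration = baseInterval

  // Called after every response with the observed latency; returns how
  // long the client should wait before its next request.
  def nextInterval(observedLatency: FiniteDuration): FiniteDuration = {
    interval =
      if (observedLatency > latencyThreshold) {
        val doubled = interval * 2
        if (doubled > maxInterval) maxInterval else doubled
      } else baseInterval
    interval
  }
}
```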

Caching & intelligent protocols are in place to enhance the user experience.


4.3 Dealing with Nefarious Traffic

Popularity attracts unwanted attention. With a combination of whitelisting & industry best practices, the application drives away nefarious traffic as soon as it is discovered, avoiding extra burden on the servers.


4.4 Do Not Do Real-time If It’s Not Business Critical

Recently, Hotstar introduced a play-along feature where users can interact with their folks & play trivia on the same platform while watching cricket. This feature, being real-time, spiked the concurrent traffic load on their backend servers.

Doing stuff in real-time has its associated costs. Real-time features spike the concurrent traffic by quite an extent, & managing that concurrency needs powerful hardware, which can be avoided if the feature isn't real-time.

All these experiences are pretty insightful & help us plan, design & execute in a better & more informed way.

Source for this write-up:

Recommended Read: Master System Design For Your Interviews Or Your Web Startup


Handpicked Resources to Learn Software Architecture and Large Scale Distributed Systems Design
I’ve put together a list of resources (online courses + books) that I believe are super helpful in building a solid foundation in software architecture and designing large-scale distributed systems like Facebook, YouTube, Gmail, Uber, and so on.  Check it out.


Subscribe to the newsletter to stay notified of the new posts.



More On the Blog 

Web Application Architecture & Software Architecture 101 Course

Data Analytics in E-Sports – Future Prospects – Jobs – Everything You Should Know

An Insight Into How Uber Scaled From A Monolith To A Microservice Architecture

Designing a video search service with AWS – Cloud architecture 

What database does Facebook use – a deep dive 

How does LinkedIn identify its users online