Hotstar is the leading streaming media & video-on-demand service in India, with a user base of approx. 200 million users.
It lets users stream popular shows in several different languages & genres over the web. But the primary & most popular feature of the service is the streaming of live cricket matches. This is the real deal.
In the latest edition of the IPL T20 cricket tournament, the streaming platform received a whopping record traffic of 10.3 million concurrent users, smashing its previous record of 8.26 million concurrent users.
To my knowledge, this is the most intense concurrent traffic load on any service, surpassing Fortnite, which clocked a load of 8.3 million concurrent players in the recent past.
1. Technical Insights
The backend of the app is powered by Amazon Web Services (AWS), & the CDN partner is Akamai.
All the traffic is served by EC2 instances, & the S3 object store is used as the data store.
The services use a mix of on-demand & spot instances to keep the costs optimized.
Running machine learning & data analytics algorithms on spot instances helps the business cut costs considerably.
Tens of terabytes of data are generated on any regular day & processed by AWS EMR clusters.
AWS EMR is a managed Hadoop framework for processing massive amounts of data across the EC2 instances. Other popular frameworks like Apache Spark, Presto, HBase etc. can also be used with AWS EMR.
2. Infrastructure Setup for Load Testing the Platform for the Big Event
The infrastructure setup for load testing the platform consisted of 500+ AWS EC2 instances of type c4.4xlarge or c4.8xlarge, running at 75% utilization.
C4 instances are compute-optimized: they are built for CPU-intensive operations at a low price per unit of compute. They provide high networking performance & increased storage performance at no additional cost.
c4.4xlarge instances have 30 GiB of RAM & c4.8xlarge instances have 60 GiB of RAM.
The entire setup had 16 TB of RAM & 8000 CPU cores, & at peak the data transfer was approx. 32 Gbps.
The engineering team makes intelligent use of spot & on-demand instances to keep the costs low.
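As a rough sanity check on the totals quoted above, the sketch below multiplies out the published C4 instance specs (16 vCPUs & 30 GiB for c4.4xlarge, 36 vCPUs & 60 GiB for c4.8xlarge). The fleet split is an assumption: the exact mix of the two instance types is not known, so a fleet of 500 c4.4xlarge instances is used purely for illustration.

```python
# Published C4 specs: instance type -> (vCPUs, memory in GiB).
C4_SPECS = {
    "c4.4xlarge": (16, 30),
    "c4.8xlarge": (36, 60),
}


def fleet_totals(counts):
    """Return (total vCPUs, total memory in GiB) for a fleet.

    `counts` maps an instance type to the number of instances of that type.
    """
    total_vcpus = sum(C4_SPECS[kind][0] * n for kind, n in counts.items())
    total_mem_gib = sum(C4_SPECS[kind][1] * n for kind, n in counts.items())
    return total_vcpus, total_mem_gib


# Assumed fleet: the real mix of instance types is not public.
assumed_fleet = {"c4.4xlarge": 500}
vcpus, mem_gib = fleet_totals(assumed_fleet)
print(f"{vcpus} vCPUs, {mem_gib} GiB RAM")
```

For this assumed fleet the result is 8000 vCPUs & 15000 GiB of RAM, which is in the same ballpark as the quoted 8000 cores & 16 TB.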
3. Simulating Traffic
The traffic simulation had three major components: the traffic model, the simulation script & the load generation infrastructure. I’ve already talked about the infrastructure part.
3.1 Preparing the Traffic Model
The engineering team started with a very basic traffic model, hitting the API endpoints with a certain ratio of requests, & performed some initial runs.
There are typically two types of user interactions with the system: users who have already signed up & are logging in, & users visiting the platform for the first time.
Keeping all the different user navigation scenarios in mind, the team prepared a traffic model of the system.
The next step was to introduce spikes into the model over the entire testing phase.
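A traffic model of this kind can be sketched as a weighted mix of endpoints per user type plus a schedule of spikes. This is only a minimal illustration: the endpoint paths, the weights & the spike windows below are all hypothetical, not Hotstar's actual numbers.

```python
import random

# Hypothetical request mixes (endpoint -> relative weight) for the two user types.
RETURNING_USER_MIX = {"/login": 0.2, "/home": 0.3, "/stream": 0.5}
NEW_USER_MIX = {"/signup": 0.3, "/home": 0.3, "/stream": 0.4}


def pick_endpoint(mix, rng=random):
    """Pick one endpoint at random, honouring the ratio of requests in `mix`."""
    endpoints = list(mix)
    weights = [mix[endpoint] for endpoint in endpoints]
    return rng.choices(endpoints, weights=weights, k=1)[0]


def target_rps(second, base_rps, spikes):
    """Return the request rate to generate at a given second of the test.

    `spikes` is a list of (start_second, end_second, multiplier) windows;
    inside a window the base rate is multiplied, outside it is unchanged.
    """
    for start, end, multiplier in spikes:
        if start <= second < end:
            return base_rps * multiplier
    return base_rps


# Hypothetical spike: triple the load between seconds 60 and 120 of the run.
spikes = [(60, 120, 3)]
print(pick_endpoint(RETURNING_USER_MIX), target_rps(90, 1000, spikes))
```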
3.2 Writing the Simulation Script
Two tools were used to write & run the simulation: Gatling & Flood.io.
Gatling is an open-source load & performance testing tool for web applications, built on Scala, Akka & Netty.
It was preferred over JMeter, another pretty popular testing tool, for its ability to simulate more concurrent users at one point in time.
Gatling sessions were used to maintain the user state. Simulation scripts were designed for several workflows throughout the system, for instance, user login flow, streaming flow etc.
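The idea of carrying user state through a workflow can be illustrated in plain Python. To be clear, this is not Gatling's API (Gatling scripts are written in its own Scala DSL); the step names, the token format & the content name below are hypothetical & only show how a per-user session flows through a login step & then a streaming step.

```python
def login_step(session):
    """Simulate a login and store the resulting token in the session."""
    session = dict(session)  # copy, so each step returns a new session
    session["auth_token"] = f"token-for-{session['user_id']}"
    return session


def stream_step(session):
    """Simulate starting a stream; this step needs the state left by login."""
    if "auth_token" not in session:
        raise RuntimeError("stream step requires a logged-in session")
    session = dict(session)
    session["now_playing"] = "live-match"  # hypothetical content name
    return session


def run_workflow(user_id, steps):
    """Run one virtual user through a list of steps, threading the session."""
    session = {"user_id": user_id}
    for step in steps:
        session = step(session)
    return session


print(run_workflow(42, [login_step, stream_step]))
```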
The simulation ran on an AWS distributed cluster & simulated 20 million users out of which 4 million were concurrent.
Another cloud-based testing tool Flood.io was used with Gatling for load testing.
Flood.io is a load testing platform that runs distributed performance tests with open source tools like JMeter, Gatling, Selenium etc.
This whole test setup matured over time with the data obtained as the real cricket matches happened.
4. Strategies to Scale
4.1 Traffic & Ladder Based Scaling
There are primarily two triggers to scale: traffic-based & ladder-based.
Traffic-based scaling simply means adding new infrastructure to the pool as the number of requests being processed by the system increases.
This works with unpredictable workloads & for services that expose their stats in detail, such as the number of threads created, requests processed etc.
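A traffic-based trigger can be sketched as a function from the observed request rate to a desired instance count. The per-instance capacity & the headroom factor here are assumed parameters for illustration, not values published by the team.

```python
import math


def instances_for_traffic(requests_per_second, rps_per_instance, headroom=0.25):
    """Instances needed to serve the observed request rate with spare headroom.

    `rps_per_instance` is the (assumed) sustainable request rate of one
    instance; `headroom` is the extra fraction of capacity kept in reserve.
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    needed = requests_per_second * (1 + headroom) / rps_per_instance
    return max(1, math.ceil(needed))  # never scale the pool down to zero


print(instances_for_traffic(10_000, 500))
```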
Ladder-based scaling is preferred for services that do not give much detail on their internal processes. To tackle this, the team has pre-defined ladders per million concurrent users.
As the number of concurrent users climbs, new infrastructure is added to the pool. This technique works well for a predictable load.
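A ladder can be sketched as a lookup table with one rung per million concurrent users. The instance counts on each rung are made up for illustration; only the "one rung per million users" shape comes from the description above.

```python
import bisect

# Hypothetical ladder: (concurrent users covered up to, instances required).
LADDER = [
    (1_000_000, 40),
    (2_000_000, 80),
    (3_000_000, 120),
    (4_000_000, 160),
]


def instances_for_concurrency(concurrent_users, ladder=LADDER):
    """Return the instance count of the first rung that covers the load.

    Loads beyond the top rung are capped at the top rung in this sketch.
    """
    thresholds = [threshold for threshold, _ in ladder]
    rung = bisect.bisect_left(thresholds, concurrent_users)
    rung = min(rung, len(ladder) - 1)
    return ladder[rung][1]


print(instances_for_concurrency(1_500_000))
```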
The team keeps a concurrency buffer of 2 million concurrent users in place, because adding new infrastructure to the pool takes around 90 seconds & the container and application start-up takes around 75 seconds.
To beat this delay, the team has a pre-provisioned buffer. This buffer approach is preferred over auto-scaling. It helps the team handle unexpected traffic spikes without adding unnecessary latency to the response.
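The reasoning behind the buffer can be worked through with a little arithmetic. The sketch assumes that the two delays happen one after the other (the instance joins the pool, then the container & application start); that sequencing is an assumption made here for illustration.

```python
PROVISION_SECONDS = 90       # time to add new infrastructure to the pool
STARTUP_SECONDS = 75         # container & application start time
BUFFER_USERS = 2_000_000     # pre-provisioned concurrency buffer


def lead_time_seconds():
    """Total delay before new capacity serves traffic (assumed sequential)."""
    return PROVISION_SECONDS + STARTUP_SECONDS


def capacity_target(current_users, buffer_users=BUFFER_USERS):
    """Concurrent users the pool should be provisioned for right now."""
    return current_users + buffer_users


def max_absorbable_growth(buffer_users=BUFFER_USERS):
    """Users-per-second growth the buffer can absorb while new capacity boots."""
    return buffer_users / lead_time_seconds()


print(lead_time_seconds(), capacity_target(8_000_000), max_absorbable_growth())
```

Under these assumptions the lead time is 165 seconds, so a 2 million user buffer covers a surge of roughly 12,000 new concurrent users per second while fresh capacity comes online.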
They have developed an internal app called the Infradashboard, which helps them make these scaling decisions smoothly & well ahead of time.
4.2 Intelligent Client
Protocols are in place that come into effect when the backend is overwhelmed with requests & the application client experiences increased response latency.
In these scenarios, the client avoids burdening the backend servers even more by increasing the time between subsequent requests.
Caching & intelligent protocols are in place to enhance the user experience.
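The client-side behaviour described above can be sketched as a poller that widens its request interval when responses slow down or fail. The doubling policy, the thresholds & the class name are assumptions for illustration; the actual client protocol is not public.

```python
class AdaptivePoller:
    """Tracks how long the client should wait before its next request."""

    def __init__(self, base_interval=2.0, max_interval=60.0,
                 latency_threshold=1.0, factor=2.0):
        self.base_interval = base_interval          # seconds, healthy backend
        self.max_interval = max_interval            # upper bound on the wait
        self.latency_threshold = latency_threshold  # seconds, "slow" cutoff
        self.factor = factor                        # growth per slow response
        self.interval = base_interval

    def record_response(self, latency_seconds, ok=True):
        """Update and return the wait before the next request."""
        if not ok or latency_seconds > self.latency_threshold:
            # Backend looks overwhelmed: back off, up to the maximum.
            self.interval = min(self.max_interval, self.interval * self.factor)
        else:
            # Backend looks healthy again: return to the normal cadence.
            self.interval = self.base_interval
        return self.interval


poller = AdaptivePoller()
for latency in (0.2, 3.0, 4.5, 0.3):
    print(poller.record_response(latency))
```

Resetting straight to the base interval on the first healthy response is the simplest choice; a gentler ramp-down would avoid every client returning to full speed at once.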
4.3 Dealing with Nefarious Traffic
Popularity attracts unwanted attention. Using a combination of whitelisting & industry best practices, the application drives away nefarious traffic as soon as it is discovered, avoiding the extra burden on the servers.
4.4 Do Not Do Real-time If It’s Not Business Critical
Recently Hotstar introduced a play-along feature, where users can interact with their folks & play trivia on the same platform while watching cricket. Being real-time, this feature spiked the concurrent traffic load on the backend servers.
Doing things in real time has its costs. Real-time features increase the concurrent traffic considerably, & managing concurrent traffic needs powerful hardware, which can be avoided if the feature isn’t real-time.
All these experiences are pretty insightful & help us plan, design & execute in a better & more informed way.
Shivang