How Does PayPal Process Billions of Messages Per Day with Reactive Streams?
PayPal’s product performance tracking platform processes approximately 19 billion messages per day. The team leverages Squbs, an Akka-based reactive framework, to handle these massive volumes of data and events concurrently with minimal latency and resource consumption.
LinkedIn, too, uses the Play framework, which is built on the Akka actor model and a reactive event-driven architecture, to detect when its users are online and to power its messaging platform.
The PayPal engineering team moved to Squbs from an existing custom-written Spring-based module.
Why did they do this? Why migrate to a reactive streaming framework? What were the issues and limitations with the existing Spring-based module?
Let’s find out.
For a complete list of similar articles on distributed systems and real-world architectures, here you go
Why Akka? Why a Reactive Stream Event-Driven Framework?
Reactive frameworks like Akka have an underlying architectural design quite different from the traditional servlet-based multi-threaded frameworks.
A typical use case for reactive frameworks is maintaining persistent connections between the client and the server, which are ideally established via long polling or WebSockets.
Akka is a non-blocking reactive streaming framework built for writing modern web apps. It is ideal for scenarios such as handling a large number of events, streaming data, processing concurrent real-time data, asynchronous data exchange, data ingestion, and so on.
The framework consumes minimal CPU and other resources in highly distributed, scalable applications. With Akka, we can focus on our product features instead of worrying about managing low-level threading code.
The framework uses a minimal number of threads to process tasks, keeping things simple, and provides reliable, high-performance, fault-tolerant behavior.
Some issues come inherently with distributed systems: messages get lost, nodes crash, concurrency bugs creep in, and so on. Akka helps us implement multi-threaded behavior without low-level concurrency constructs such as locks or atomics, so we do not have to worry much about shared-memory issues.
It helps us write better network communication code between different distributed modules.
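To make the lock-free claim concrete, here is a minimal sketch of the actor idea, not Akka's actual API (the `CounterActor` class and its message names are invented for illustration, and the sketch is in Python rather than Scala or Java): an actor is just private state plus a mailbox drained by a single thread, so concurrent senders never touch the state directly and no locks are needed.

```python
import queue
import threading

class CounterActor:
    """A toy actor: one mailbox, one worker thread.

    All messages are processed sequentially by a single thread,
    so the actor's internal state needs no locks or atomics.
    """

    def __init__(self):
        self._mailbox = queue.Queue()          # thread-safe FIFO mailbox
        self._count = 0                        # touched only by the worker thread
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def tell(self, message):
        """Fire-and-forget send, in the spirit of Akka's `tell`."""
        self._mailbox.put(message)

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message == "stop":
                break
            if message == "increment":
                self._count += 1               # safe: only this thread mutates state

    def stop_and_get(self):
        self._mailbox.put("stop")
        self._worker.join()
        return self._count

# Many threads send messages concurrently; the actor serializes them.
actor = CounterActor()
senders = [threading.Thread(target=lambda: [actor.tell("increment") for _ in range(1000)])
           for _ in range(4)]
for s in senders: s.start()
for s in senders: s.join()
print(actor.stop_and_get())  # 4000
```

Even with four threads hammering the actor at once, the count comes out exact, because the mailbox turns concurrent sends into a sequential stream of messages.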
Related reads:
- Understanding the Actor model to build non-blocking, high-throughput distributed systems
- How the Actor model/actors run in clusters, facilitating asynchronous communication in distributed systems
What is Reactive Programming & Reactive Streams?
If you have come across the term ‘Reactive’ for the first time, here is the gist of what it is. Reactive programming means reacting to events: when an event occurs, do something. Run a process, perform a task, send out a notification, etc. And this is a continual system behavior.
A sequence of events occurring over a period of time is called an event stream or a reactive stream. To react to a stream of events, we need to keep listening to them, in other words, monitoring them.
A real-world use case of this is data ingestion, where massive amounts of data are continually ingested into the backend systems.
The backend logic keeps checking the parameters in the data, and when the conditions hold, it reacts to the event: runs a task, sends a notification, triggers a webhook, and so on.
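As a toy sketch of that ingestion loop, the event fields, the threshold, and the alert format below are all made up for illustration; in a real reactive system the `for` loop would be a stream subscription rather than a scan over a list:

```python
# Hypothetical ingestion events; service names and latencies are illustrative.
events = [
    {"service": "checkout", "latency_ms": 45},
    {"service": "checkout", "latency_ms": 950},
    {"service": "payments", "latency_ms": 30},
    {"service": "payments", "latency_ms": 1200},
]

alerts = []

def on_event(event, threshold_ms=500):
    """React only when the condition holds: here, a latency breach."""
    if event["latency_ms"] > threshold_ms:
        alerts.append(f"ALERT: {event['service']} latency {event['latency_ms']}ms")

for event in events:  # in a real system: a subscription to the stream
    on_event(event)

print(alerts)
```

The key shape is the same at any scale: the system does nothing until an event arrives, and each event either matches a condition and triggers a reaction or flows through untouched.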
How Is This Different From a Regular Servlet Thread Based Model?
Traditional servlet-based, thread-per-request models are blocking in nature. When the backend receives a request, it holds a thread until processing completes and the response is returned. When the response is not ready immediately and takes time, the system’s throughput goes down.
The Akka Actor model is best suited to these scenarios. It uses a minimal number of threads to process events. The main thread doesn’t halt waiting for a response; rather, it hands the work off to other threads and keeps processing incoming events. There is a significant design difference between the traditional servlet model and the event-based reactive model.
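Akka delivers this non-blocking behavior on the JVM; as a rough, language-agnostic sketch of the same idea (using Python’s asyncio rather than Akka, with invented names), a single thread can keep many requests in flight by yielding control while it waits instead of parking a thread per request:

```python
import asyncio
import time

async def handle_request(request_id: int) -> str:
    # Simulates a slow downstream call. `await` returns control to the
    # event loop instead of blocking the thread, as a servlet would.
    await asyncio.sleep(0.1)
    return f"response-{request_id}"

async def main():
    start = time.perf_counter()
    # 100 requests in flight at once on a single thread; a blocking,
    # thread-per-request model would need 100 threads to match this.
    responses = await asyncio.gather(*(handle_request(i) for i in range(100)))
    print(f"{len(responses)} responses in {time.perf_counter() - start:.2f}s")
    return responses

responses = asyncio.run(main())
```

The 100 simulated calls complete in roughly the time of one, because the event loop interleaves the waits; run serially with blocking sleeps, the same work would take about ten seconds.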
Spring also has a reactive stack of its own, Spring WebFlux, built on Project Reactor, for writing event-based apps.
Result of Migrating to a Reactive Framework
After migrating to a Reactive framework, the engineering team at PayPal was able to smoothly handle billions of events per day.
CPU utilization was cut by 30%, and they ended up with fewer lines of code and a reduced monitoring and VM footprint. Deployment time went down from six hours to 1.5 hours. They also observed a significant reduction in event processing time and failure rate.
In the initial research for the event-based system, the team considered Erlang, which seemed promising, but since they already had several services running on the JVM, they picked Akka.
Squbs is a suite of components that standardizes Akka services in a large-scale distributed environment.
It was built with the following principles in mind:
- It had to be extremely lightweight
- The Squbs API would act as a thin abstraction over the Akka APIs; to use it, developers should not need any knowledge beyond the Akka concepts themselves.
- It had to be open source right from the first line of code with hooks available for plugging in PayPal’s modules.
Here is the GitHub repo for Squbs
Well, Folks! This is pretty much it. If you liked the write-up, share it with your network for more reach. I am Shivang. You can read more about me here.
Check out the Zero to Mastering Software Architecture learning path, a series of three courses I have written intending to educate you, step by step, on the domain of software architecture and distributed system design. The learning path takes you right from having no knowledge in it to making you a pro in designing large-scale distributed systems like YouTube, Netflix, Hotstar, and more.
Zero to Mastering Software Architecture Learning Path - Starting from Zero to Designing Web-Scale Distributed Applications Like a Pro. Check it out.
Master system design for your interviews. Check out this blog post written by me.
- System Design Case Study #5: In-Memory Storage & In-Memory Databases – Storing Application Data In-Memory To Achieve Sub-Second Response Latency
- System Design Case Study #4: How WalkMe Engineering Scaled their Stateful Service Leveraging Pub-Sub Mechanism
- Why Stack Overflow Picked Svelte for their Overflow AI Feature And the Website UI
- A Discussion on Stateless & Stateful Services (Managing User State on the Backend)
- System Design Case Study #3: How Discord Scaled Their Member Update Feature Benchmarking Different Data Structures
CodeCrafters lets you build tools like Redis, Docker, Git and more from the bare bones. With their hands-on courses, you not only gain an in-depth understanding of distributed systems and advanced system design concepts but can also compare your project with the community and then finally navigate the official source code to see how it’s done.
Get 40% off with this link. (Affiliate)
DataCamp offers courses, skill tracks, and career tracks in data science, AI, and machine learning. With interactive exercises, short videos, and coding challenges, learners can master the data and AI skills they need.
With the data engineering courses, you can learn how to design and create the data infrastructure businesses need to scale and master one of the most lucrative skills worldwide. Check out the website here. (Affiliate)