PayPal’s product performance tracking platform processes approx. 19 billion messages per day. The team leverages an Akka-based reactive framework called Squbs to handle such massive amounts of data and events concurrently with minimal latency and resource consumption.

LinkedIn, as well, uses the Play framework which uses Akka based actor model & a reactive event-driven architecture to identify its users online in addition to powering its messaging platform.

The PayPal engineering team moved to Squbs from an existing custom-written Spring-based module.

Why did they do this? Migrate to a reactive streaming framework? What were the issues/limitations with the existing Spring framework module?

Let’s find out.

Distributed Systems
For a complete list of similar articles on distributed systems and real-world architectures, here you go


Why Akka? Why a Reactive Stream Event-Driven Framework?

Reactive frameworks like Akka have an underlying architectural design quite different from the traditional servlet-based multi-threaded frameworks.

A typical use case of reactive frameworks is when we need persistent connections between the client and the server. We ideally use long polling or web sockets to establish persistent connections.

Akka is a non-blocking reactive streaming framework built to write modern web apps. Ideal for managing instances such as handling a large number of events, streaming data, concurrent real-time data, asynchronous data exchange, data ingestion etc.

The framework consumes minimum CPU and other resources in highly distributed scalable applications. With Akka, we can focus on our product features instead of worrying about managing low-level code with threads.

The framework uses minimal threads to process tasks keeping things simple. It provides a reliable, high performant & fault-tolerant behavior.

There are issues that come inherently when working with distributed systems such as messages getting lost, crashing of nodes, concurrency issues, etc. Akka helps us implement a multi-threaded behavior without the use of low-level concurrency constructs such as locks or atomics. We do not have to worry about memory issues much.

It helps us write better network communication code between different distributed modules.

Recommended Reads
Understanding the Actor model to build non-blocking, high-throughput distributed systems
How Actor model/Actors run in clusters facilitating asynchronous communication in distributed systems


What is Reactive Programming & Reactive Streams?

If have come across the term ‘Reactive’ for the first time, here is a gist of what it is. Reactive programming means reacting to events. When an event occurs, do something. Run a process, perform a task, send out a notification, etc. And this is a continual system behavior.

The sequence of events occurring over a period of time is called an event stream or a reactive stream. To react to a stream of events we need to keep listening to them or in other words monitor them.

A real-world use case of this is data ingestion, where massive amounts of data is continually ingested into the backend systems.

The backend logic keeps checking for parameters in the data and when they find the parameters true, they react to the event, run a task, send a notification, trigger a webhook and so on.


How Is This Different From a Regular Servlet Thread Based Model?

Traditional servlet thread request-response models are blocking in nature. When the backend receives a request, it holds the thread until the processing is done and the response is returned. There are times when the response is not ready immediately and requires time. In these scenarios, the system’s throughput goes down.

The Akka Actor model suits best for these instances. It uses minimum threads to process events. The main thread doesn’t halt for the response, rather passes the process to secondary threads and keeps processing the events received. There is a significant design difference between the traditional servlet model and the event-based reactive model.

Spring also has a separate module called the Spring Reactor to write event-based apps.


Result of Migrating to a Reactive Framework

After migrating to a Reactive framework, the engineering team at PayPal was able to smoothly handle billions of events per day.

CPU Utilization was cut down by 30%, they had fewer lines of code, reduced monitoring and VM footprint. Deployment time went down from 6 to 1.5 hours. They observed a significant reduction in event processing time and failure rate.

In the initial research for the development of the event-based system, the team considered Erlang, which seemed promising but since they had several services running on JVM, hence they picked Akka.


Squbs

Squbs is a suite of components standardizing the Akka services in a large-scale distributed environment.

It was built with the below principles in mind:

  • It had to be extremely lightweight
  • Squbs API would act as an abstraction over the Akka APIs. Though for writing the Squbs API, the developers should not require any separate knowledge besides the Akka concepts.
  • It had to be open source right from the first line of code with hooks available for plugging in PayPal’s modules.

Here is the GitHub repo for Squbs

Source for this write-up;


Well, Folks! This is pretty much it. If you liked the write-up, share it with your network for more reach. I am Shivang. You can read more about me here.

Check out the Zero to Software Architecture Proficiency learning path, a series of three courses I have written intending to educate you, step by step, on the domain of software architecture and distributed system design. The learning path takes you right from having no knowledge in it to making you a pro in designing large-scale distributed systems like YouTube, Netflix, Hotstar, and more.