How Does PayPal Process Billions of Messages Per Day with Reactive Streams?
PayPal’s product performance tracking platform processes approximately 19 billion messages per day. The team leverages Squbs, an Akka-based reactive framework, to handle these massive volumes of data and events concurrently with minimal latency and resource consumption.
LinkedIn, too, uses the Play framework, which is built on the Akka actor model and a reactive event-driven architecture, to detect when its users are online and to power its messaging platform.
The PayPal engineering team moved to Squbs from an existing custom-written Spring-based module.
Why did they do this? Why migrate to a reactive streaming framework? What were the issues and limitations with the existing Spring-based module?
Let’s find out.
For a complete list of similar articles on distributed systems and real-world architectures, here you go
Why Akka? Why a Reactive Stream Event-Driven Framework?
Reactive frameworks like Akka have an underlying architectural design quite different from the traditional servlet-based multi-threaded frameworks.
A typical use case for reactive frameworks is maintaining persistent connections between the client and the server, which are ideally established via long polling or WebSockets.
Akka is a non-blocking reactive streaming framework built for writing modern web apps. It is ideal for scenarios such as handling a large number of events, streaming data, processing concurrent real-time data, asynchronous data exchange, data ingestion, and so on.
The framework consumes minimal CPU and other resources in highly distributed, scalable applications. With Akka, we can focus on our product features instead of worrying about managing low-level threading code.
The framework uses a minimal number of threads to process tasks, keeping things simple, and provides reliable, high-performance, fault-tolerant behavior.
Some issues come inherently with distributed systems: messages get lost, nodes crash, concurrency bugs creep in, and so on. Akka helps us implement multi-threaded behavior without low-level concurrency constructs such as locks or atomics, so we do not have to worry much about shared-memory issues.
It helps us write better network communication code between different distributed modules.
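To make the lock-free claim concrete, here is a minimal sketch of the actor idea, not Akka's actual API (the `CounterActor` class and its message names are invented for illustration, and the sketch is in Python rather than Scala or Java): an actor is just private state plus a mailbox drained by a single thread, so concurrent senders never touch the state directly and no locks are needed.

```python
import queue
import threading

class CounterActor:
    """A toy actor: one mailbox, one worker thread.

    All messages are processed sequentially by a single thread,
    so the actor's internal state needs no locks or atomics.
    """

    def __init__(self):
        self._mailbox = queue.Queue()          # thread-safe FIFO mailbox
        self._count = 0                        # touched only by the worker thread
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def tell(self, message):
        """Fire-and-forget send, in the spirit of Akka's `tell`."""
        self._mailbox.put(message)

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message == "stop":
                break
            if message == "increment":
                self._count += 1               # safe: only this thread mutates state

    def stop_and_get(self):
        self._mailbox.put("stop")
        self._worker.join()
        return self._count

# Many threads send messages concurrently; the actor serializes them.
actor = CounterActor()
senders = [threading.Thread(target=lambda: [actor.tell("increment") for _ in range(1000)])
           for _ in range(4)]
for s in senders: s.start()
for s in senders: s.join()
print(actor.stop_and_get())  # 4000
```

Even with four threads hammering the actor at once, the count comes out exact, because the mailbox turns concurrent sends into a sequential stream of messages.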
Related reads:
- Understanding the Actor model to build non-blocking, high-throughput distributed systems
- How the Actor model/actors run in clusters, facilitating asynchronous communication in distributed systems
What is Reactive Programming & Reactive Streams?
If you have come across the term ‘Reactive’ for the first time, here is the gist of what it is. Reactive programming means reacting to events: when an event occurs, do something. Run a process, perform a task, send out a notification, etc. And this is a continual system behavior.
A sequence of events occurring over a period of time is called an event stream or a reactive stream. To react to a stream of events, we need to keep listening to them, in other words, monitoring them.
A real-world use case of this is data ingestion, where massive amounts of data are continually ingested into the backend systems.
The backend logic keeps checking the parameters in the data, and when the conditions hold, it reacts to the event: runs a task, sends a notification, triggers a webhook, and so on.
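As a toy sketch of that ingestion loop, the event fields, the threshold, and the alert format below are all made up for illustration; in a real reactive system the `for` loop would be a stream subscription rather than a scan over a list:

```python
# Hypothetical ingestion events; service names and latencies are illustrative.
events = [
    {"service": "checkout", "latency_ms": 45},
    {"service": "checkout", "latency_ms": 950},
    {"service": "payments", "latency_ms": 30},
    {"service": "payments", "latency_ms": 1200},
]

alerts = []

def on_event(event, threshold_ms=500):
    """React only when the condition holds: here, a latency breach."""
    if event["latency_ms"] > threshold_ms:
        alerts.append(f"ALERT: {event['service']} latency {event['latency_ms']}ms")

for event in events:  # in a real system: a subscription to the stream
    on_event(event)

print(alerts)
```

The key shape is the same at any scale: the system does nothing until an event arrives, and each event either matches a condition and triggers a reaction or flows through untouched.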
How Is This Different From a Regular Servlet Thread Based Model?
Traditional servlet-based, thread-per-request models are blocking in nature. When the backend receives a request, it holds a thread until processing completes and the response is returned. When the response is not ready immediately and takes time, the system’s throughput goes down.
The Akka Actor model is best suited to these scenarios. It uses a minimal number of threads to process events. The main thread doesn’t halt waiting for a response; rather, it hands the work off to other threads and keeps processing incoming events. There is a significant design difference between the traditional servlet model and the event-based reactive model.
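Akka delivers this non-blocking behavior on the JVM; as a rough, language-agnostic sketch of the same idea (using Python’s asyncio rather than Akka, with invented names), a single thread can keep many requests in flight by yielding control while it waits instead of parking a thread per request:

```python
import asyncio
import time

async def handle_request(request_id: int) -> str:
    # Simulates a slow downstream call. `await` returns control to the
    # event loop instead of blocking the thread, as a servlet would.
    await asyncio.sleep(0.1)
    return f"response-{request_id}"

async def main():
    start = time.perf_counter()
    # 100 requests in flight at once on a single thread; a blocking,
    # thread-per-request model would need 100 threads to match this.
    responses = await asyncio.gather(*(handle_request(i) for i in range(100)))
    print(f"{len(responses)} responses in {time.perf_counter() - start:.2f}s")
    return responses

responses = asyncio.run(main())
```

The 100 simulated calls complete in roughly the time of one, because the event loop interleaves the waits; run serially with blocking sleeps, the same work would take about ten seconds.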
Spring also has a reactive stack of its own, Spring WebFlux, built on Project Reactor, for writing event-based apps.
Result of Migrating to a Reactive Framework
After migrating to a Reactive framework, the engineering team at PayPal was able to smoothly handle billions of events per day.
CPU utilization was cut by 30%, and they ended up with fewer lines of code and a reduced monitoring and VM footprint. Deployment time went down from six hours to 1.5 hours. They also observed a significant reduction in event processing time and failure rate.
In the initial research for the event-based system, the team considered Erlang, which seemed promising, but since they already had several services running on the JVM, they picked Akka.
Squbs is a suite of components that standardizes Akka services in a large-scale distributed environment.
It was built with the following principles in mind:
- It had to be extremely lightweight
- The Squbs API would act as a thin abstraction over the Akka APIs; to use it, developers should not need any knowledge beyond the Akka concepts themselves.
- It had to be open source right from the first line of code with hooks available for plugging in PayPal’s modules.
Here is the GitHub repo for Squbs
Well, Folks! This is pretty much it. If you liked the write-up, share it with your network for more reach. I am Shivang. You can read more about me here.
Check out the Zero to Mastering Software Architecture learning path, a series of three courses I have written intending to educate you, step by step, on the domain of software architecture and distributed system design. The learning path takes you right from having no knowledge in it to making you a pro in designing large-scale distributed systems like YouTube, Netflix, Hotstar, and more.
Zero to Mastering Software Architecture Learning Path - Starting from Zero to Designing Web-Scale Distributed Applications Like a Pro. Check it out.
Master system design for your interviews. Check out this blog post written by me.
- System Design Case Study #5: In-Memory Storage & In-Memory Databases – Storing Application Data In-Memory To Achieve Sub-Second Response Latency
- System Design Case Study #4: How WalkMe Engineering Scaled their Stateful Service Leveraging Pub-Sub Mechanism
- Why Stack Overflow Picked Svelte for their Overflow AI Feature And the Website UI
- A Discussion on Stateless & Stateful Services (Managing User State on the Backend)
- System Design Case Study #3: How Discord Scaled Their Member Update Feature Benchmarking Different Data Structures
CodeCrafters lets you build tools like Redis, Docker, Git and more from the bare bones. With their hands-on courses, you not only gain an in-depth understanding of distributed systems and advanced system design concepts but can also compare your project with the community and then finally navigate the official source code to see how it’s done.
Get 40% off with this link. (Affiliate)
DataCamp offers courses, skill tracks, and career tracks in data science, AI, and machine learning. With interactive exercises, short videos, and coding challenges, learners can master the data and AI skills they need.
With the data engineering courses, you can learn how to design and create the data infrastructure businesses need to scale and master one of the most lucrative skills worldwide. Check out the website here. (Affiliate)