McDonald’s uses events across their distributed architecture for use cases such as asynchronous, transactional and analytical processing, including mobile-order progress tracking and sending marketing offers like deals and promotions to their customers.

As opposed to building event streaming solutions for every individual or a set of microservices in their system, they built a unified platform to consume events from multiple external and internal producers (services). This ensured consistency in terms of implementation and averted the operational complexity of maintaining different event-streaming modules.

A few design goals for this platform were scalability (the platform needed to auto-scale in case of increased load), availability, performance (the events needed to be delivered in real-time), security, reliability, simplicity and data integrity.

Kafka is leveraged to build their event-streaming platform and since their system was running on AWS, they leveraged AWS-managed streaming for Kafka (MSK) to implement the module. This averted the operational overhead of working with a different on-prem or public cloud service.

Since the unified platform consumed events from multiple producers and the events were consumed by multiple consumers, they achieved data consistency with the help of a schema registry that gave a well-defined contract to create events. The producers and the consumers adhered to the contract when working with events with the help of the unified event streaming platform SDKs.

In case the unified platform was down, an event store (AWS Dynamo DB) was kept as a temporary store for events until the platform bounced back online.

The events coming in through external applications pass through an event gateway for authentication and authorization.

All the failed events are moved to a dead letter queue Kafka topic. A dead letter queue is an implementation within a messaging system or a streaming platform that deals with events that are not processed successfully. The events in this topic are analyzed by the admins and are either re-tried or discarded.

A serverless cluster autoscaler function gets triggered when the message broker’s CPU consumption passes a specified threshold. The function adds more nodes to the cluster. And another workflow function is then triggered to partition events across Kafka topics. The topics are partitioned across shards based on the domain since the events flow in through several different producers.

If you wish to understand concepts like event streaming, designing real-time stream validation, serverless workflows, how Kafka topics are partitioned across brokers, Kafka performance, how distributed applications are deployed across different cloud regions and availability zones, designing distributed systems like Netflix, YouTube, ESPN and the like from the bare bones and more. Check out my Zero to Software Architect learning track.

The diagram below illustrates the flow of events through their unified event streaming platform.

McDonald's event-driven architecture

Information source: Behind the scenes Mcdonald’s event-driven architecture

If you find the content interesting, you can subscribe to my newsletter to get all the content posted by me on this blog and my social media handles right in your inbox.