Stripe Data Tech Stack
See what Stripe uses on the backend to handle big data, having processed more than a trillion dollars’ worth of transactions.
Stripe has processed over a trillion dollars’ worth of transactions while maintaining 99.9999% uptime, as per this source. If you are interested in learning how their offline processes run on such efficient and reliable systems, check out today’s article, where I cover the Data Tech Stack from a data engineering perspective.
Content is based on multiple sources, including the tech blog, open source websites, news articles and job descriptions.
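To put the uptime figure in perspective, here is a quick back-of-the-envelope calculation. It is only arithmetic on the quoted percentage; the function name and the choice of a non-leap year are my own assumptions, not anything Stripe publishes.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a non-leap year


def allowed_downtime_seconds(uptime_percent, period_seconds=SECONDS_PER_YEAR):
    """Downtime budget implied by an uptime percentage over a period."""
    return period_seconds * (1 - uptime_percent / 100)


budget = allowed_downtime_seconds(99.9999)
print(f"{budget:.1f} seconds of downtime per year")  # 31.5 seconds of downtime per year
```

For comparison, the same function gives roughly 8.76 hours per year at 99.9% uptime, which shows how much stricter six nines is.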
Platform
AWS
Stripe leverages AWS as its cloud platform and uses several AWS services from front end to back end, and from online to offline.
Storage
S3
S3 is the main data storage solution for offline processes. S3 and Iceberg work together to provide a seamless Lakehouse architecture.
Iceberg
Stripe uses the Iceberg open table format to bring ACID capabilities to the data lake, along with other benefits like time travel and data compaction.
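If snapshots are new to you, the toy model below may help. It is a minimal sketch of the general idea behind snapshot-based table formats (immutable snapshots, all-or-nothing commits, readable history). It is not Iceberg's API, and every class, method and file name in it is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the table: an id plus the data files it contains."""
    snapshot_id: int
    files: Tuple[str, ...]


class ToyTable:
    """Toy stand-in for a table format's metadata layer (not Iceberg's API)."""

    def __init__(self) -> None:
        self._snapshots: List[Snapshot] = [Snapshot(0, ())]

    def current(self) -> Snapshot:
        return self._snapshots[-1]

    def commit(self, base: Snapshot, add=(), remove=()) -> Snapshot:
        # Optimistic concurrency: the commit succeeds only if nobody else
        # committed since `base` was read; otherwise the writer must retry.
        if base.snapshot_id != self.current().snapshot_id:
            raise RuntimeError("conflict: table changed since base snapshot")
        removed = set(remove)
        kept = tuple(f for f in base.files if f not in removed)
        new = Snapshot(base.snapshot_id + 1, kept + tuple(add))
        self._snapshots.append(new)  # readers see the new snapshot in one step
        return new

    def files_as_of(self, snapshot_id: int) -> Tuple[str, ...]:
        # Time travel: old snapshots are kept, so past versions stay readable.
        return self._snapshots[snapshot_id].files


table = ToyTable()
table.commit(table.current(), add=["a.parquet"])
table.commit(table.current(), add=["b.parquet"])
# Compaction: swap two small files for one larger file in a single commit.
table.commit(table.current(), add=["ab.parquet"], remove=["a.parquet", "b.parquet"])

print(table.current().files)  # ('ab.parquet',)
print(table.files_as_of(2))   # ('a.parquet', 'b.parquet')
```

The point of the sketch is that compaction and time travel fall out of the same design: a commit never edits files in place, it only publishes a new list of them.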
Pinot
For low-latency, real-time analytics, Stripe offers Pinot as a service as part of its platform.
📖 Recommended Reading: Stripe’s Journey to $18.6B of Transactions During Black Friday-Cyber Monday with Apache Pinot
Processing
Kafka
Stripe manages 50 Kafka clusters, which together process 700 terabytes of Kafka publish throughput daily. They also leverage Temporal to build a state-of-the-art Kafka Control Plane; read more here.
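To get a feel for those numbers, the daily figure can be converted into an average rate. This is only a rough sketch: I am assuming decimal terabytes and an even spread of load across clusters and across the day, neither of which the source states, so real peaks will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12  # assumption: decimal terabytes (10^12 bytes)


def avg_bytes_per_second(bytes_per_day):
    """Average rate implied by a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


total = avg_bytes_per_second(700 * TB)
per_cluster = total / 50  # assumption: load is spread evenly over 50 clusters

print(f"{total / 1e9:.1f} GB/s overall")            # 8.1 GB/s overall
print(f"{per_cluster / 1e6:.0f} MB/s per cluster")  # 162 MB/s per cluster
```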
Airflow
Airflow is used at Stripe for batch pipeline orchestration. It allows users to easily run Spark jobs that access data stored in the lake.
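At its core, batch orchestration means running tasks in dependency order. The sketch below shows that idea with Python's standard-library graphlib rather than Airflow itself; the task names are hypothetical and are not taken from Stripe's pipelines.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dag = {
    "wait_for_landed_data": set(),
    "spark_clean_events": {"wait_for_landed_data"},
    "spark_daily_aggregates": {"spark_clean_events"},
    "spark_build_report": {"spark_clean_events"},
    "publish_tables": {"spark_daily_aggregates", "spark_build_report"},
}


def run_order(graph):
    """Return one valid execution order in which dependencies come first."""
    return list(TopologicalSorter(graph).static_order())


order = run_order(dag)
print(order)
```

The two Spark jobs in the middle do not depend on each other, so a real orchestrator is free to run them in parallel once the cleaning step has finished.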
Spark
Stripe uses Spark not only for batch processing workloads but also for ingesting real-time data from Kafka into S3.
Trino
Trino is used to query data sitting in S3 using SQL. It serves two purposes: quick ad hoc analysis and queries that run as part of a batch pipeline.
Dashboard
Tableau
Stripe uses Tableau to empower its business units. Considering the size of the data, they might be using an open source tool as well, but I could not find a reference to one.
💬 Stripe was a tough one because little information is available. If you think I missed something important, feel free to comment below.