<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Junaid Effendi | Sharing knowledge for Engineers]]></title><description><![CDATA[Covering tech, career, data, growth experiences from my journey.]]></description><link>https://www.junaideffendi.com</link><image><url>https://substackcdn.com/image/fetch/$s_!Ddb9!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F430b8ab1-0a9b-4396-9e0f-e161364cb75a_350x350.png</url><title>Junaid Effendi | Sharing knowledge for Engineers</title><link>https://www.junaideffendi.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 20 Apr 2026 16:54:44 GMT</lastBuildDate><atom:link href="https://www.junaideffendi.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Junaid Effendi]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[junaideffendi@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[junaideffendi@substack.com]]></itunes:email><itunes:name><![CDATA[Junaid Effendi]]></itunes:name></itunes:owner><itunes:author><![CDATA[Junaid Effendi]]></itunes:author><googleplay:owner><![CDATA[junaideffendi@substack.com]]></googleplay:owner><googleplay:email><![CDATA[junaideffendi@substack.com]]></googleplay:email><googleplay:author><![CDATA[Junaid Effendi]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Lyft Data Tech Stack]]></title><description><![CDATA[Explore the high-scale data stack Lyft uses to support 25M+ active riders, ingesting millions of real-time events every second.]]></description><link>https://www.junaideffendi.com/p/lyft-data-tech-stack</link><guid 
isPermaLink="false">https://www.junaideffendi.com/p/lyft-data-tech-stack</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Wed, 15 Apr 2026 16:31:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0BQy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c006ace-4bb5-43e3-8096-783100678683_2367x1368.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Lyft operates one of the most data-intensive real-time platforms in the world, processing millions of rider&#8211;driver interactions every minute. Behind the scenes is a modern, high-scale data stack built on AWS, Kafka, Flink, Trino, and a 100+ PB warehouse on S3. In this deep dive, we&#8217;ll explore how these tools power Lyft&#8217;s data, analytics, machine learning, and real-time decision systems.</p><h3>Metrics</h3><ul><li><p>28.7M active riders in Q3 2025, completing ~2.7M rides per day.</p></li><li><p>Apache Kafka processes millions of real-time events per second for streaming analytics.</p></li><li><p>Thousands of Airflow + Flyte pipelines orchestrate ETL and ML workflows.</p></li><li><p>Data warehouse exceeds 100 PB, stored in S3 with Hive Metastore.</p></li><li><p>Trino ETL executes ~250K queries/day, reading ~10 PB/day and writing ~100 TB/day.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0BQy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c006ace-4bb5-43e3-8096-783100678683_2367x1368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0BQy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c006ace-4bb5-43e3-8096-783100678683_2367x1368.png 424w, 
https://substackcdn.com/image/fetch/$s_!0BQy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c006ace-4bb5-43e3-8096-783100678683_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!0BQy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c006ace-4bb5-43e3-8096-783100678683_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!0BQy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c006ace-4bb5-43e3-8096-783100678683_2367x1368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0BQy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c006ace-4bb5-43e3-8096-783100678683_2367x1368.png" width="1456" height="841" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c006ace-4bb5-43e3-8096-783100678683_2367x1368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2766357,&quot;alt&quot;:&quot;Lyft Data Tech Stack&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/178362516?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c006ace-4bb5-43e3-8096-783100678683_2367x1368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Lyft Data Tech Stack" title="Lyft Data Tech Stack" 
srcset="https://substackcdn.com/image/fetch/$s_!0BQy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c006ace-4bb5-43e3-8096-783100678683_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!0BQy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c006ace-4bb5-43e3-8096-783100678683_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!0BQy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c006ace-4bb5-43e3-8096-783100678683_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!0BQy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c006ace-4bb5-43e3-8096-783100678683_2367x1368.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Lyft Data Tech Stack</figcaption></figure></div><blockquote><p>Content is based on multiple sources, including the Lyft blog, the AWS blog, and other public articles. You will find references for deeper reading as you go.</p></blockquote><h3><strong>Platform</strong></h3><h4>AWS</h4><p>Lyft&#8217;s data and infrastructure are fully hosted on AWS, leveraging managed services and elastic scaling to support real-time transportation and logistics workloads. AWS powers everything from API services to large-scale data analytics, enabling Lyft to handle millions of rider-driver interactions efficiently.</p><blockquote><p>&#128214; Further Reading: <a href="https://aws.amazon.com/solutions/case-studies/lyft/">Lyft Case Study</a></p></blockquote><h3>Messaging System</h3><h4>Kafka</h4><p>Lyft adopted Kafka relatively recently to address scaling challenges, as explained in the video below. 
Kafka handles real-time data streaming, processing millions of events per second, and supports use cases such as trip updates, driver location tracking, and telemetry ingestion.</p><div id="youtube2-Xwprhf9c6kI" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Xwprhf9c6kI&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Xwprhf9c6kI?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h4>Kinesis</h4><p>Alongside Kafka, Lyft also uses Amazon Kinesis, which was introduced at Lyft long before Kafka. There is not enough public information to tell whether Kinesis will be replaced by Kafka.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PRd0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9159503-295b-4911-9110-cdc527262a3a_1600x714.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PRd0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9159503-295b-4911-9110-cdc527262a3a_1600x714.png 424w, https://substackcdn.com/image/fetch/$s_!PRd0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9159503-295b-4911-9110-cdc527262a3a_1600x714.png 848w, 
https://substackcdn.com/image/fetch/$s_!PRd0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9159503-295b-4911-9110-cdc527262a3a_1600x714.png 1272w, https://substackcdn.com/image/fetch/$s_!PRd0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9159503-295b-4911-9110-cdc527262a3a_1600x714.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PRd0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9159503-295b-4911-9110-cdc527262a3a_1600x714.png" width="1456" height="650" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9159503-295b-4911-9110-cdc527262a3a_1600x714.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:650,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;unnamed.png&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="unnamed.png" title="unnamed.png" srcset="https://substackcdn.com/image/fetch/$s_!PRd0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9159503-295b-4911-9110-cdc527262a3a_1600x714.png 424w, https://substackcdn.com/image/fetch/$s_!PRd0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9159503-295b-4911-9110-cdc527262a3a_1600x714.png 848w, 
https://substackcdn.com/image/fetch/$s_!PRd0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9159503-295b-4911-9110-cdc527262a3a_1600x714.png 1272w, https://substackcdn.com/image/fetch/$s_!PRd0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9159503-295b-4911-9110-cdc527262a3a_1600x714.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a 
href="https://clickhouse.com/blog/lyft-analytics-clickhouse-cloud">source</a></figcaption></figure></div><h3>Processing</h3><h4>Flink &amp; Beam</h4><p>Lyft adopted Apache Flink as its core real-time streaming engine to support a wide variety of use cases. It runs Apache Beam on top of the Flink runner for portability and multi-language support.</p><p>Below is a video from the summit where you can learn more about how these tools are used at Lyft.</p><div id="youtube2-8k1iezoc5Sc" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;8k1iezoc5Sc&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/8k1iezoc5Sc?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h4>Spark</h4><p>Lyft uses Spark mostly on the machine learning side, e.g. in LyftLearn, an ML platform that streamlines Spark-on-Kubernetes for improved resource allocation, SQL adaptability, and integration with ML libraries. This setup integrates with the orchestration tool Flyte.</p><blockquote><p>&#128214; Recommended Reading: <a href="https://eng.lyft.com/how-lyftlearn-democratizes-distributed-compute-through-kubernetes-spark-and-fugue-c0875b97c3d9">Spark At Lyft</a></p></blockquote><h4>Trino</h4><p>Lyft uses Trino to power large-scale ETL workloads that read about 10 PB from and write about 100 TB to its Hive warehouse every day. 
Trino also serves as a fast, user-friendly query layer for teams, with over 90% of queries completing in under three minutes.</p><blockquote><p>&#128214; Recommended Reading: <a href="https://trino.io/blog/2022/12/12/trino-summit-2022-lyft-recap.html">Trino for large scale ETL at Lyft</a></p></blockquote><h3>Orchestrator</h3><h4>Airflow &amp; Flyte</h4><p>Lyft provides two orchestration platforms, Airflow and Flyte, each with its own pros and cons. Historically, Airflow has been widely used across Lyft for building data pipelines, while Lyft developed Flyte for compute-intensive workloads such as machine learning.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Tot6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e4eebd-e7bf-44c7-adcb-c337df82c430_827x390.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Tot6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e4eebd-e7bf-44c7-adcb-c337df82c430_827x390.png 424w, https://substackcdn.com/image/fetch/$s_!Tot6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e4eebd-e7bf-44c7-adcb-c337df82c430_827x390.png 848w, https://substackcdn.com/image/fetch/$s_!Tot6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e4eebd-e7bf-44c7-adcb-c337df82c430_827x390.png 1272w, https://substackcdn.com/image/fetch/$s_!Tot6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e4eebd-e7bf-44c7-adcb-c337df82c430_827x390.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Tot6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e4eebd-e7bf-44c7-adcb-c337df82c430_827x390.png" width="606" height="285.77992744860944" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/11e4eebd-e7bf-44c7-adcb-c337df82c430_827x390.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:390,&quot;width&quot;:827,&quot;resizeWidth&quot;:606,&quot;bytes&quot;:79816,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/178362516?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e4eebd-e7bf-44c7-adcb-c337df82c430_827x390.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Tot6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e4eebd-e7bf-44c7-adcb-c337df82c430_827x390.png 424w, https://substackcdn.com/image/fetch/$s_!Tot6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e4eebd-e7bf-44c7-adcb-c337df82c430_827x390.png 848w, https://substackcdn.com/image/fetch/$s_!Tot6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e4eebd-e7bf-44c7-adcb-c337df82c430_827x390.png 1272w, https://substackcdn.com/image/fetch/$s_!Tot6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11e4eebd-e7bf-44c7-adcb-c337df82c430_827x390.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><p>&#128214; Read more: <strong><a href="https://eng.lyft.com/orchestrating-data-pipelines-at-lyft-comparing-flyte-and-airflow-72c40d143aad">Comparing Flyte and Airflow at Lyft</a></strong></p></blockquote><h3><strong>Warehouse</strong></h3><h4>S3 &amp; Hive</h4><p>Lyft&#8217;s core data warehouse containing 100s of petabytes runs on top of Amazon S3, with Hive used as the metastore. 
While Lyft hasn&#8217;t publicly stated that it uses modern open table formats like Iceberg or Delta, its S3 + Hive setup still powers large-scale historical storage.</p><h3><strong>Catalog</strong></h3><h4>Amundsen</h4><p>Developed in-house and open-sourced, Amundsen is Lyft&#8217;s metadata and data discovery platform. It provides a centralized catalog for datasets, dashboards, and pipelines, integrating with Hive Metastore, Trino, and Airflow to surface ownership, lineage, and documentation. Amundsen has since become a leading open metadata framework adopted across the industry.</p><blockquote><p>&#128214; Learn more: <strong><a href="https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9">Amundsen: Lyft&#8217;s Data Discovery Engine</a></strong></p></blockquote><h3><strong>Data Store</strong></h3><h4>ClickHouse</h4><p>Lyft faced key challenges with Apache Druid, such as data deduplication, which prompted its migration to ClickHouse. This switch not only resolved those issues but also delivered improved performance and operational efficiency for real-time analytics.</p><blockquote><p>&#128214; Recommended Reading: <strong><a href="https://eng.lyft.com/druid-deprecation-and-clickhouse-adoption-at-lyft-120af37651fd">Druid Deprecation and ClickHouse Adoption at Lyft</a></strong></p></blockquote><h3><strong>Dashboard</strong></h3><h4>Superset</h4><p>For visualization, Lyft primarily uses Apache Superset, which connects to Trino as its query engine, to deliver both analytics and operational dashboards.</p><p>As per <a href="https://eng.lyft.com/presto-infrastructure-at-lyft-b10adb9db01#:~:text=Presto%20Clients,apps%20querying%20through%20these%20clients">this article</a> from 2019, Lyft supported multiple dashboarding tools; there is no recent public information on whether those tools have been consolidated or are still supported.</p><div><hr></div><p><strong>Related Content:</strong></p><div class="digest-post-embed" 
data-attrs="{&quot;nodeId&quot;:&quot;f757df53-8704-4ce3-831c-49ec1e020f44&quot;,&quot;caption&quot;:&quot;Explore how Spotify processes over 1.4 trillion data points daily to power personalized experiences for hundreds of millions of users worldwide. This overview distills the essential tools, architectures, and innovations Spotify employs for data ingestion, processing, storage, and analytics.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Spotify Data Tech Stack&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:21393641,&quot;name&quot;:&quot;Junaid Effendi&quot;,&quot;bio&quot;:&quot;I love to share my learnings and experiences about Software and Data Engineering.&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06559f3-ee33-46f8-bfa0-50964179f235_1200x1200.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-08-16T16:30:35.825Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!S_F0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586df7f6-5b17-425c-a53c-fb4bbcf47ec0_2367x1368.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.junaideffendi.com/p/spotify-data-tech-stack&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:165484212,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Junaid Effendi | Sharing knowledge for 
Engineers&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ddb9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F430b8ab1-0a9b-4396-9e0f-e161364cb75a_350x350.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;4c14c57d-b172-42b2-9693-63bbcd6b8974&quot;,&quot;caption&quot;:&quot;Learn how Shopify handles hundreds of millions of peak requests per minute, powering billions in sales through a robust, scalable infrastructure. This overview shares the key tools, architectures, and innovations Shopify leverages for data ingestion, processing, storage, and analytics to support global commerce.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Shopify Data Tech Stack&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:21393641,&quot;name&quot;:&quot;Junaid Effendi&quot;,&quot;bio&quot;:&quot;I love to share my learnings and experiences about Software and Data 
Engineering.&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06559f3-ee33-46f8-bfa0-50964179f235_1200x1200.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-11-08T17:30:15.011Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!_pGP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86c65bc8-4ea1-456b-bf98-9d8a514f8bc2_2367x1567.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.junaideffendi.com/p/shopify-data-tech-stack&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:170024744,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:7,&quot;comment_count&quot;:1,&quot;publication_id&quot;:2256445,&quot;publication_name&quot;:&quot;Junaid Effendi | Sharing knowledge for Engineers&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ddb9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F430b8ab1-0a9b-4396-9e0f-e161364cb75a_350x350.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.junaideffendi.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading, Share &#128257; and Subscribe &#128276;for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" 
placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p>&#128172; Overall, Lyft blends AWS with cloud-native and open-source systems to support massive real-time data processing, analytics, and machine learning across its platform.</p>]]></content:encoded></item><item><title><![CDATA[How Delta UniForm works]]></title><description><![CDATA[Learn how Delta UniForm enables read and write interoperability from Delta tables to Iceberg and Hudi formats.]]></description><link>https://www.junaideffendi.com/p/how-delta-uniform-works</link><guid isPermaLink="false">https://www.junaideffendi.com/p/how-delta-uniform-works</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 07 Mar 2026 17:30:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5daU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc820bfea-42ff-464f-98a8-0a58486ef489_2367x1604.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Delta UniForm (Delta Universal Format) is a unified table format layer built on Delta Lake that enables seamless read and write interoperability from Delta tables to Apache Iceberg and Apache Hudi, without manual conversions or format-specific logic.</p><div class="pullquote"><p>&#128161; UniForm was created at Databricks and is part of the open-source Delta Lake project.</p></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5daU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc820bfea-42ff-464f-98a8-0a58486ef489_2367x1604.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5daU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc820bfea-42ff-464f-98a8-0a58486ef489_2367x1604.png 424w, https://substackcdn.com/image/fetch/$s_!5daU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc820bfea-42ff-464f-98a8-0a58486ef489_2367x1604.png 848w, https://substackcdn.com/image/fetch/$s_!5daU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc820bfea-42ff-464f-98a8-0a58486ef489_2367x1604.png 1272w, https://substackcdn.com/image/fetch/$s_!5daU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc820bfea-42ff-464f-98a8-0a58486ef489_2367x1604.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5daU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc820bfea-42ff-464f-98a8-0a58486ef489_2367x1604.png" width="1456" height="987" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c820bfea-42ff-464f-98a8-0a58486ef489_2367x1604.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:987,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:386469,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/175956781?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc820bfea-42ff-464f-98a8-0a58486ef489_2367x1604.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5daU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc820bfea-42ff-464f-98a8-0a58486ef489_2367x1604.png 424w, https://substackcdn.com/image/fetch/$s_!5daU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc820bfea-42ff-464f-98a8-0a58486ef489_2367x1604.png 848w, https://substackcdn.com/image/fetch/$s_!5daU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc820bfea-42ff-464f-98a8-0a58486ef489_2367x1604.png 1272w, https://substackcdn.com/image/fetch/$s_!5daU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc820bfea-42ff-464f-98a8-0a58486ef489_2367x1604.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">How Delta UniForm works</figcaption></figure></div><h3>How does it work</h3><p>Traditionally, moving data from one table format to another means reading it in the source format and writing it in the target format, which creates a second copy of the data and consumes compute resources. With UniForm, interoperability is possible because all the open table formats store their data as Parquet files; they differ only in how they store metadata.</p><blockquote><p>&#128161;One data copy with multiple format support through metadata files.</p></blockquote><p>UniForm is part of Delta Lake, so it only supports Delta to Iceberg/Hudi conversion. To enable UniForm for Iceberg on a Delta table:</p><pre><code>CREATE TABLE main.default.UniForm_demo_table (msg STRING)
TBLPROPERTIES('delta.universalFormat.enabledFormats' = 'iceberg');</code></pre><p>Once the property is enabled, Delta generates and incrementally maintains the metadata files for the target format. Depending on the Delta Lake or Databricks runtime version, additional table properties such as <code>delta.enableIcebergCompatV2</code> may also be required, so check the documentation for your version.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ztCv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfc6267-f1d7-4e41-9697-0873440e6554_1200x416.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ztCv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfc6267-f1d7-4e41-9697-0873440e6554_1200x416.png 424w, https://substackcdn.com/image/fetch/$s_!ztCv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfc6267-f1d7-4e41-9697-0873440e6554_1200x416.png 848w, https://substackcdn.com/image/fetch/$s_!ztCv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfc6267-f1d7-4e41-9697-0873440e6554_1200x416.png 1272w, https://substackcdn.com/image/fetch/$s_!ztCv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfc6267-f1d7-4e41-9697-0873440e6554_1200x416.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ztCv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfc6267-f1d7-4e41-9697-0873440e6554_1200x416.png" width="1200" height="416" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5cfc6267-f1d7-4e41-9697-0873440e6554_1200x416.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:416,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Apache Parquet&quot;,&quot;title&quot;:&quot;Apache Parquet&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Apache Parquet" title="Apache Parquet" srcset="https://substackcdn.com/image/fetch/$s_!ztCv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfc6267-f1d7-4e41-9697-0873440e6554_1200x416.png 424w, https://substackcdn.com/image/fetch/$s_!ztCv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfc6267-f1d7-4e41-9697-0873440e6554_1200x416.png 848w, https://substackcdn.com/image/fetch/$s_!ztCv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfc6267-f1d7-4e41-9697-0873440e6554_1200x416.png 1272w, https://substackcdn.com/image/fetch/$s_!ztCv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cfc6267-f1d7-4e41-9697-0873440e6554_1200x416.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">How UniForm exposes one table in multiple table formats; <a href="https://www.databricks.com/blog/delta-uniform-universal-format-lakehouse-interoperability">source</a></figcaption></figure></div><blockquote><p>&#128214; Recommended Reading: <a href="https://www.junaideffendi.com/p/how-delta-lake-works">How Delta Lake Works</a></p></blockquote><p>Since the source table is Delta, it supports Delta optimizations, which are also reflected in the target metadata files to give similar read performance. 
However, target specific optimizations are not possible today.</p><h4>Write Flow</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kQDa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F852a050d-d721-414b-bc4a-94a893b86e44_1907x1400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kQDa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F852a050d-d721-414b-bc4a-94a893b86e44_1907x1400.png 424w, https://substackcdn.com/image/fetch/$s_!kQDa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F852a050d-d721-414b-bc4a-94a893b86e44_1907x1400.png 848w, https://substackcdn.com/image/fetch/$s_!kQDa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F852a050d-d721-414b-bc4a-94a893b86e44_1907x1400.png 1272w, https://substackcdn.com/image/fetch/$s_!kQDa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F852a050d-d721-414b-bc4a-94a893b86e44_1907x1400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kQDa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F852a050d-d721-414b-bc4a-94a893b86e44_1907x1400.png" width="1456" height="1069" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/852a050d-d721-414b-bc4a-94a893b86e44_1907x1400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1069,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:339081,&quot;alt&quot;:&quot;Write flow with UniForm for Iceberg enabled&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/175956781?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F852a050d-d721-414b-bc4a-94a893b86e44_1907x1400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Write flow with UniForm for Iceberg enabled" title="Write flow with UniForm for Iceberg enabled" srcset="https://substackcdn.com/image/fetch/$s_!kQDa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F852a050d-d721-414b-bc4a-94a893b86e44_1907x1400.png 424w, https://substackcdn.com/image/fetch/$s_!kQDa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F852a050d-d721-414b-bc4a-94a893b86e44_1907x1400.png 848w, https://substackcdn.com/image/fetch/$s_!kQDa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F852a050d-d721-414b-bc4a-94a893b86e44_1907x1400.png 1272w, https://substackcdn.com/image/fetch/$s_!kQDa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F852a050d-d721-414b-bc4a-94a893b86e44_1907x1400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex 
pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Write flow with UniForm for Iceberg enabled</figcaption></figure></div><ul><li><p>The user submits a write query against a Delta table with Iceberg enabled through UniForm.</p></li><li><p>The native Delta table's log and data files are updated. The detailed Delta write flow is described <a href="https://www.junaideffendi.com/i/165483998/write-flow">here</a>.</p></li><li><p>UniForm translates logical actions (insert/update/delete/merge) into target-format commits.</p></li><li><p>The target metadata, i.e. the Iceberg metadata files, is updated incrementally and asynchronously. 
This keeps the Iceberg metadata in sync with the Delta log (with a short lag, since the update is asynchronous), giving a unified view of the same data.</p></li></ul><p><strong>Impact:</strong> Metadata translation adds a slight overhead, which could degrade write performance by up to <code>5%</code>.</p><h4>Read Flow</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nIhL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f58691a-8b60-4d3b-9813-9edda711a026_1819x1397.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nIhL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f58691a-8b60-4d3b-9813-9edda711a026_1819x1397.png 424w, https://substackcdn.com/image/fetch/$s_!nIhL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f58691a-8b60-4d3b-9813-9edda711a026_1819x1397.png 848w, https://substackcdn.com/image/fetch/$s_!nIhL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f58691a-8b60-4d3b-9813-9edda711a026_1819x1397.png 1272w, https://substackcdn.com/image/fetch/$s_!nIhL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f58691a-8b60-4d3b-9813-9edda711a026_1819x1397.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nIhL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f58691a-8b60-4d3b-9813-9edda711a026_1819x1397.png" width="1456" height="1118" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f58691a-8b60-4d3b-9813-9edda711a026_1819x1397.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1118,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:333062,&quot;alt&quot;:&quot;Read Flow with UniForm enabled&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/175956781?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f58691a-8b60-4d3b-9813-9edda711a026_1819x1397.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Read Flow with UniForm enabled" title="Read Flow with UniForm enabled" srcset="https://substackcdn.com/image/fetch/$s_!nIhL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f58691a-8b60-4d3b-9813-9edda711a026_1819x1397.png 424w, https://substackcdn.com/image/fetch/$s_!nIhL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f58691a-8b60-4d3b-9813-9edda711a026_1819x1397.png 848w, https://substackcdn.com/image/fetch/$s_!nIhL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f58691a-8b60-4d3b-9813-9edda711a026_1819x1397.png 1272w, https://substackcdn.com/image/fetch/$s_!nIhL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f58691a-8b60-4d3b-9813-9edda711a026_1819x1397.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Read Flow with UniForm enabled</figcaption></figure></div><p>Reading works the same way as reading a native table, as described for Delta tables <a href="https://www.junaideffendi.com/i/165483998/read-flow">here</a>.</p><ul><li><p>The user submits a query to read from the Iceberg table.</p></li><li><p>The query fetches the required metadata, in our case Iceberg, from one of the locations below.</p><ul><li><p><code>Delta &#8594; _delta_log/</code> </p></li><li><p><code>Iceberg &#8594; metadata/</code></p></li></ul></li></ul><ul><li><p>The engine constructs the table snapshot and reads the Parquet data files using its native reader. 
Note: These are the same files that were generated by Delta during the write operation.</p></li><li><p>Results are returned to the user via the compute engine.</p></li></ul><p><strong>Impact:</strong> The table is read directly using Iceberg metadata, so there is no impact on read performance.</p><h3>Use Case</h3><p>A simple scenario for using UniForm is when other systems require your Delta table to be accessible in Iceberg or Hudi format. In such cases, UniForm can seamlessly generate the necessary metadata files with almost negligible impact on performance.</p><p>The architecture below shows where UniForm fits in the picture.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RKbF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81338daf-4dcd-4c13-9513-8c8c40a3b987_960x540.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RKbF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81338daf-4dcd-4c13-9513-8c8c40a3b987_960x540.webp 424w, https://substackcdn.com/image/fetch/$s_!RKbF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81338daf-4dcd-4c13-9513-8c8c40a3b987_960x540.webp 848w, https://substackcdn.com/image/fetch/$s_!RKbF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81338daf-4dcd-4c13-9513-8c8c40a3b987_960x540.webp 1272w, https://substackcdn.com/image/fetch/$s_!RKbF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81338daf-4dcd-4c13-9513-8c8c40a3b987_960x540.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!RKbF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81338daf-4dcd-4c13-9513-8c8c40a3b987_960x540.webp" width="960" height="540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/81338daf-4dcd-4c13-9513-8c8c40a3b987_960x540.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Flow diagram&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Flow diagram" title="Flow diagram" srcset="https://substackcdn.com/image/fetch/$s_!RKbF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81338daf-4dcd-4c13-9513-8c8c40a3b987_960x540.webp 424w, https://substackcdn.com/image/fetch/$s_!RKbF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81338daf-4dcd-4c13-9513-8c8c40a3b987_960x540.webp 848w, https://substackcdn.com/image/fetch/$s_!RKbF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81338daf-4dcd-4c13-9513-8c8c40a3b987_960x540.webp 1272w, https://substackcdn.com/image/fetch/$s_!RKbF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81338daf-4dcd-4c13-9513-8c8c40a3b987_960x540.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://delta.io/_astro/fg3.BGAaxhWM_28TwpB.webp">source</a></figcaption></figure></div><p>In the image above, Databricks writes a Delta table with Iceberg enabled through UniForm. Databricks reads the table as Delta, while Snowflake reads the same data as Iceberg.</p><div class="pullquote"><p>&#128161;UniForm is likely to become a critical tool: with AWS investing heavily in Iceberg, the need to read Delta tables as Iceberg will increase rapidly.</p></div><h3>Alternative</h3><p>Apache XTable is another option that supports interoperability across all major table formats, enabling conversions from any source to any target. 
In contrast, Delta UniForm only supports Delta as the source format.</p><blockquote><p>&#128214; UniForm initially supported interoperability from Delta to Iceberg. To extend this capability to Hudi, <a href="https://delta.io/blog/unifying-open-table/">Delta UniForm partnered with XTable</a>.</p></blockquote><h3>Conclusion</h3><p>By bridging the gap between modern lakehouse formats, Delta UniForm simplifies data sharing and format compatibility across platforms, positioning itself as a key enabler of open, multi-engine lakehouse interoperability alongside emerging alternatives like Apache XTable.</p><div><hr></div><h3>References</h3><ul><li><p>Open Source Delta UniForm: https://delta.io/blog/unifying-open-table/</p></li><li><p>Databricks Delta UniForm: https://www.databricks.com/blog/delta-uniform-universal-format-lakehouse-interoperability</p></li><li><p>AWS Delta to Iceberg through UniForm: https://aws.amazon.com/blogs/big-data/expand-data-access-through-apache-iceberg-using-delta-lake-uniform-on-aws/</p></li><li><p>Apache XTable overview: https://xtable.apache.org/</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Coinbase Data Tech 
Stack]]></title><description><![CDATA[See what Coinbase uses in the backend to process billions of events every day for its 120+ million users.]]></description><link>https://www.junaideffendi.com/p/coinbase-data-tech-stack</link><guid isPermaLink="false">https://www.junaideffendi.com/p/coinbase-data-tech-stack</guid><pubDate>Sat, 07 Feb 2026 17:30:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!UUza!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd206e6f0-0d77-4411-9a9a-d6441b55f5c6_2367x1368.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Explore how Coinbase ingests billions of events daily to power trading, custody, and compliance for one of the world&#8217;s largest cryptocurrency platforms. This article dives into the essential tools, architectures, and innovations Coinbase employs for data ingestion, processing, storage, and analytics.</p><h3>Metrics</h3><ul><li><p>120+ million verified users worldwide, <a href="https://www.demandsage.com/coinbase-users-statistics/">source</a>.</p></li><li><p>8.7+ million monthly transacting users (MTU), <a href="https://www.demandsage.com/coinbase-users-statistics/">source</a>.</p></li><li><p>$400+ billion in assets under custody, <a href="https://coinlaw.io/coinbase-users-statistics/">source</a>.</p></li><li><p>Billions of events processed daily across user activity, blockchain data, and market feeds.</p></li><li><p>30 Kafka brokers with ~17 TB of storage per broker.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UUza!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd206e6f0-0d77-4411-9a9a-d6441b55f5c6_2367x1368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!UUza!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd206e6f0-0d77-4411-9a9a-d6441b55f5c6_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!UUza!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd206e6f0-0d77-4411-9a9a-d6441b55f5c6_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!UUza!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd206e6f0-0d77-4411-9a9a-d6441b55f5c6_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!UUza!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd206e6f0-0d77-4411-9a9a-d6441b55f5c6_2367x1368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UUza!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd206e6f0-0d77-4411-9a9a-d6441b55f5c6_2367x1368.png" width="1456" height="841" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d206e6f0-0d77-4411-9a9a-d6441b55f5c6_2367x1368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2132395,&quot;alt&quot;:&quot;Coinbase Data Tech 
Stack&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/173521174?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd206e6f0-0d77-4411-9a9a-d6441b55f5c6_2367x1368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Coinbase Data Tech Stack" title="Coinbase Data Tech Stack" srcset="https://substackcdn.com/image/fetch/$s_!UUza!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd206e6f0-0d77-4411-9a9a-d6441b55f5c6_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!UUza!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd206e6f0-0d77-4411-9a9a-d6441b55f5c6_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!UUza!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd206e6f0-0d77-4411-9a9a-d6441b55f5c6_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!UUza!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd206e6f0-0d77-4411-9a9a-d6441b55f5c6_2367x1368.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 
4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Coinbase Data Tech Stack</figcaption></figure></div><blockquote><p>Content is based on multiple sources, including the Coinbase Blog, the AWS Blog, and other public articles. You will find references for deeper reading as you go.</p></blockquote><h3>Platform</h3><h4>AWS</h4><p>Coinbase leverages several AWS cloud services to solve its complex, large-scale challenges. The company also partnered with AWS to modernize and optimize its cloud infrastructure, migrating legacy workloads to Amazon EC2 instances powered by AWS Graviton processors and adopting Amazon EKS for automated scaling and resource management. As a result of this partnership, Coinbase cut costs by roughly 62% and infrastructure scaling time by 50%.</p><blockquote><p>&#128214; Read More: <a href="https://aws.amazon.com/solutions/case-studies/coinbase-migration-case-study/">Coinbase Boosts Efficiency and Accelerates Development by Collaborating with AWS</a></p></blockquote><h3>Messaging System</h3><h4>Kafka</h4><p>For its centralized messaging service, Coinbase uses Kafka through the AWS managed offering, Amazon Managed Streaming for Apache Kafka (MSK). 
Kafka ingests billions of events every day from user actions, applications, crypto feeds, and database change data capture (CDC).</p><p>With MSK, Coinbase reduced its operational burden, achieved very low end-to-end latency (&lt;10ms) for many pipelines (versus ~200ms with previous systems), improved reliability across AZs, and made scaling more seamless.</p><p>It takes only a few steps to provision a new MSK cluster.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VAXA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90364b0e-6049-4876-b727-74f460003cfe_1308x176.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VAXA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90364b0e-6049-4876-b727-74f460003cfe_1308x176.png 424w, https://substackcdn.com/image/fetch/$s_!VAXA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90364b0e-6049-4876-b727-74f460003cfe_1308x176.png 848w, https://substackcdn.com/image/fetch/$s_!VAXA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90364b0e-6049-4876-b727-74f460003cfe_1308x176.png 1272w, https://substackcdn.com/image/fetch/$s_!VAXA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90364b0e-6049-4876-b727-74f460003cfe_1308x176.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VAXA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90364b0e-6049-4876-b727-74f460003cfe_1308x176.png" width="1308" 
height="176" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/90364b0e-6049-4876-b727-74f460003cfe_1308x176.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:176,&quot;width&quot;:1308,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;kafka image1&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="kafka image1" title="kafka image1" srcset="https://substackcdn.com/image/fetch/$s_!VAXA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90364b0e-6049-4876-b727-74f460003cfe_1308x176.png 424w, https://substackcdn.com/image/fetch/$s_!VAXA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90364b0e-6049-4876-b727-74f460003cfe_1308x176.png 848w, https://substackcdn.com/image/fetch/$s_!VAXA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90364b0e-6049-4876-b727-74f460003cfe_1308x176.png 1272w, https://substackcdn.com/image/fetch/$s_!VAXA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90364b0e-6049-4876-b727-74f460003cfe_1308x176.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">source: <a href="https://www.coinbase.com/blog/kafka-infrastructure-renovation">link</a></figcaption></figure></div><blockquote><p>&#128214;Recommended Reading: <a href="https://www.coinbase.com/blog/how-we-scaled-data-streaming-at-coinbase-using-aws-msk">How we scaled data streaming at 
Coinbase using AWS MSK</a></p></blockquote><h3>Processing</h3><h4>Spark (SOON)</h4><p>Coinbase built <strong>SOON (Spark cOntinuOus iNgestion)</strong> on Databricks to replace slow, siloed Airflow &lt;&gt; Kafka &lt;&gt; Snowflake ETLs with a unified, low-latency streaming framework. Using Spark Structured Streaming and Delta Lake, SOON supports both append-only and merge (upsert/delete) ingestion, enabling scalable real-time data processing.</p><p>They also use Spark outside of the SOON framework for batch processing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bd88!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F516e3eca-d88b-4e16-b1e0-9e1549bec2a3_2048x942.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bd88!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F516e3eca-d88b-4e16-b1e0-9e1549bec2a3_2048x942.png 424w, https://substackcdn.com/image/fetch/$s_!bd88!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F516e3eca-d88b-4e16-b1e0-9e1549bec2a3_2048x942.png 848w, https://substackcdn.com/image/fetch/$s_!bd88!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F516e3eca-d88b-4e16-b1e0-9e1549bec2a3_2048x942.png 1272w, https://substackcdn.com/image/fetch/$s_!bd88!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F516e3eca-d88b-4e16-b1e0-9e1549bec2a3_2048x942.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!bd88!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F516e3eca-d88b-4e16-b1e0-9e1549bec2a3_2048x942.png" width="1456" height="670" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/516e3eca-d88b-4e16-b1e0-9e1549bec2a3_2048x942.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;SOON1&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="SOON1" title="SOON1" srcset="https://substackcdn.com/image/fetch/$s_!bd88!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F516e3eca-d88b-4e16-b1e0-9e1549bec2a3_2048x942.png 424w, https://substackcdn.com/image/fetch/$s_!bd88!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F516e3eca-d88b-4e16-b1e0-9e1549bec2a3_2048x942.png 848w, https://substackcdn.com/image/fetch/$s_!bd88!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F516e3eca-d88b-4e16-b1e0-9e1549bec2a3_2048x942.png 1272w, https://substackcdn.com/image/fetch/$s_!bd88!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F516e3eca-d88b-4e16-b1e0-9e1549bec2a3_2048x942.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image from the source below.</figcaption></figure></div><blockquote><p>&#128214; More on SOON: <a href="https://www.coinbase.com/blog/soon-for-near-real-time-data-at-coinbase-part-1">Spark cOntinuOus iNgestion for near real-time data</a></p></blockquote><h3>Orchestrator</h3><h4>Airflow</h4><p>Coinbase adopted Airflow in 2017, when it was still gaining popularity. They made Airflow their centralized orchestrator for data pipelines, used by hundreds of data engineers and scientists. 
</p><p>With their adoption of Databricks, they are most likely also leveraging Databricks Workflows; however, there is no public information to confirm this.</p><blockquote><p>&#128214; Recommended Reading: <a href="https://www.coinbase.com/blog/revamping-the-apache-airflow-based-workflow-orchestration-platform-at">Revamping the Apache Airflow</a></p></blockquote><h3>Warehouse</h3><h4>Snowflake</h4><p>Coinbase is also a Snowflake customer. They have migrated their real-time pipelines to Databricks, but other workflows still rely heavily on Snowflake, as per this <a href="https://www.snowflake.com/webinars/customer-webinars/rapid-customer-insights-with-scalable-ml-workflows-in-snowflake-2025-08-28/">source</a>. Furthermore, their BI team uses Looker, which is most likely fetching data from Snowflake.</p><div class="pullquote"><p>I could not find much public information beyond one <a href="https://www.linkedin.com/pulse/brief-history-data-coinbase-small-step-towards-web3-era-michael-li/">article</a> from an ex-Coinbase leader.</p></div><h3>Lakehouse</h3><h4>Delta Lake</h4><p>Delta Lake is used through Databricks as their open table format. One use case is the streaming pipeline built with SOON on Databricks; see the image in the Spark section.</p><h4>S3</h4><p>S3 is the object storage underneath Delta Lake, although it is managed through Databricks. It is also used to dump full data snapshots from databases such as PostgreSQL and DynamoDB.</p><h3>Data Store</h3><h4><strong>StarRocks</strong></h4><p>Coinbase uses StarRocks via CelerData to enable real-time analytics directly on their data lakehouse, avoiding complex ETL. 
This setup delivers sub-second query latency, supports high concurrency, and scales with growing data volumes, improving performance for analytics workloads.</p><div id="youtube2-3Z9jSCaHnYg" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;3Z9jSCaHnYg&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/3Z9jSCaHnYg?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h3>Dashboard</h3><h4>Looker</h4><p>As per this <a href="https://www.linkedin.com/pulse/brief-history-data-coinbase-small-step-towards-web3-era-michael-li/">source</a> from 2022, they onboarded Looker as their Business Intelligence (BI) Platform mainly due to its technical capabilities. </p><div><hr></div><p><strong>Related Content:</strong></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;0f96f149-5fd1-49ad-ad8c-51b5672ffaee&quot;,&quot;caption&quot;:&quot;Explore how Spotify processes over 1.4 trillion data points daily to power personalized experiences for hundreds of millions of users worldwide. 
This overview distills the essential tools, architectures, and innovations Spotify employs for data ingestion, processing, storage, and analytics.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Spotify Data Tech Stack&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:21393641,&quot;name&quot;:&quot;Junaid Effendi&quot;,&quot;bio&quot;:&quot;I love to share my learnings and experiences about Software and Data Engineering.&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06559f3-ee33-46f8-bfa0-50964179f235_1200x1200.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-08-16T16:30:35.825Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!S_F0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586df7f6-5b17-425c-a53c-fb4bbcf47ec0_2367x1368.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.junaideffendi.com/p/spotify-data-tech-stack&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:165484212,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Junaid Effendi | Sharing knowledge for Engineers&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ddb9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F430b8ab1-0a9b-4396-9e0f-e161364cb75a_350x350.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" 
data-attrs="{&quot;nodeId&quot;:&quot;5e20e2c6-50fd-48b1-8dd1-6fbb5377e2c1&quot;,&quot;caption&quot;:&quot;Netflix handle massive scale, from event data in streams to data at rest in the warehouse. Netflix data stack is pretty solid, mostly built on top of open source solutions. The data stack processes trillions of data points everyday while the scale of data at rest is in hundreds of Petabytes based on&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Netflix Data Tech Stack&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:21393641,&quot;name&quot;:&quot;Junaid Effendi&quot;,&quot;bio&quot;:&quot;I love to share my learnings and experiences about Software and Data Engineering.&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06559f3-ee33-46f8-bfa0-50964179f235_1200x1200.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-05-08T16:31:10.943Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!wUu5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2ea4efe-5128-4330-804b-a54c2f561e08_2547x1477.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.junaideffendi.com/p/netflix-data-tech-stack&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:144081570,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:35,&quot;comment_count&quot;:4,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Junaid Effendi | Sharing knowledge for 
Engineers&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ddb9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F430b8ab1-0a9b-4396-9e0f-e161364cb75a_350x350.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.junaideffendi.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading, Share &#128257; and Subscribe &#128276;for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p>&#128172; Coinbase is a modern tech company that heavily relies on commercial and cloud offerings such as MSK, Databricks, Snowflake, and CelerData, reflecting a tech culture that prefers buying solutions rather than building them in-house.</p>]]></content:encoded></item><item><title><![CDATA[Inside Data Engineering with Hasan Geren]]></title><description><![CDATA[Follow Hasan Geren as he explores the landscape of data engineering, offering insights, tackling challenges, and highlighting emerging industry trends.]]></description><link>https://www.junaideffendi.com/p/inside-data-engineering-with-hasan</link><guid isPermaLink="false">https://www.junaideffendi.com/p/inside-data-engineering-with-hasan</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 10 Jan 2026 
16:31:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FGhM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cb98cdb-cbc2-47ed-8fa8-04210af7b1a8_2547x1532.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today, we're joined by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Hasan Geren&quot;,&quot;id&quot;:359992715,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f2cdc049-a9dc-46c9-a643-a71619d45bbe_200x200.jpeg&quot;,&quot;uuid&quot;:&quot;cb73ecb2-3453-430c-9156-e9d9d6c56b59&quot;}" data-component-name="MentionToDOM"></span>, who started out in industrial engineering and data science before moving into data engineering. For the past three years, he's been working across both academia through PhD research and industry, and he's now a data engineer at a high-growth startup.</p><p>To recap: the series follows a Q&amp;A format, featuring professionals who share their journeys, insights, and challenges.</p><h3><strong>What to Expect:</strong></h3><ul><li><p><strong>Practical insights</strong> &#8211; Get a clear view of what data engineers do in their day-to-day work.</p></li><li><p><strong>Emerging trends</strong> &#8211; Stay informed about new technologies and evolving best practices.</p></li><li><p><strong>Real-world challenges</strong> &#8211; Understand the obstacles data engineers face and how they overcome them.</p></li><li><p><strong>Myth-busting</strong> &#8211; Uncover common misconceptions about data engineering and its true impact.</p></li></ul><div class="pullquote"><p>&#11088; If you're curious about data engineering or considering it as a career, this series is for you!</p></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!FGhM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cb98cdb-cbc2-47ed-8fa8-04210af7b1a8_2547x1532.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FGhM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cb98cdb-cbc2-47ed-8fa8-04210af7b1a8_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!FGhM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cb98cdb-cbc2-47ed-8fa8-04210af7b1a8_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!FGhM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cb98cdb-cbc2-47ed-8fa8-04210af7b1a8_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!FGhM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cb98cdb-cbc2-47ed-8fa8-04210af7b1a8_2547x1532.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FGhM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cb98cdb-cbc2-47ed-8fa8-04210af7b1a8_2547x1532.png" width="1456" height="876" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5cb98cdb-cbc2-47ed-8fa8-04210af7b1a8_2547x1532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:876,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4767481,&quot;alt&quot;:&quot;Inside Data Engineering with Hasan 
Geren&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/168220332?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cb98cdb-cbc2-47ed-8fa8-04210af7b1a8_2547x1532.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Inside Data Engineering with Hasan Geren" title="Inside Data Engineering with Hasan Geren" srcset="https://substackcdn.com/image/fetch/$s_!FGhM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cb98cdb-cbc2-47ed-8fa8-04210af7b1a8_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!FGhM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cb98cdb-cbc2-47ed-8fa8-04210af7b1a8_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!FGhM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cb98cdb-cbc2-47ed-8fa8-04210af7b1a8_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!FGhM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cb98cdb-cbc2-47ed-8fa8-04210af7b1a8_2547x1532.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 
9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Inside Data Engineering with Hasan Geren</figcaption></figure></div><p>Let&#8217;s dive into Inside Data Engineering:</p><h4>How would you describe Data Engineering?</h4><p>Data engineering is about understanding what different teams need from data, aligning on definitions, and providing the systems and infrastructure that address those needs. It&#8217;s a mix of technical and social skills, because good data engineering often comes down to:</p><ul><li><p>Clear communication</p></li><li><p>Effective collaboration between teams</p></li><li><p>Near-optimal architectural choices</p></li><li><p>Systems people can trust</p></li></ul><h4>How did you end up being a Data Engineer?</h4><p>I started out in Industrial Engineering, exploring different paths through internships. While Industrial Engineering didn&#8217;t excite me much, I got into data mining and machine learning during grad school, which led to my first role as a Data Scientist. 
I was the third person to join an AI startup, and without a dedicated Data Engineer, Architect, or Cloud Engineer, I had to build the entire data foundation myself. That experience made me realise I enjoyed the Data Engineering parts the most. Therefore, I began a PhD on distributed stream processing, and that marked my full transition into Data Engineering almost 3 years ago.</p><h4>What's your day-to-day look like?</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KVfz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d1fc2d-c601-43c0-87a8-11173da1a72d_2367x1368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KVfz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d1fc2d-c601-43c0-87a8-11173da1a72d_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!KVfz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d1fc2d-c601-43c0-87a8-11173da1a72d_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!KVfz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d1fc2d-c601-43c0-87a8-11173da1a72d_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!KVfz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d1fc2d-c601-43c0-87a8-11173da1a72d_2367x1368.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!KVfz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d1fc2d-c601-43c0-87a8-11173da1a72d_2367x1368.png" width="1456" height="841" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06d1fc2d-c601-43c0-87a8-11173da1a72d_2367x1368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:168242,&quot;alt&quot;:&quot;Hasan Geren&#8217;s day to day&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/168220332?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d1fc2d-c601-43c0-87a8-11173da1a72d_2367x1368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Hasan Geren&#8217;s day to day" title="Hasan Geren&#8217;s day to day" srcset="https://substackcdn.com/image/fetch/$s_!KVfz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d1fc2d-c601-43c0-87a8-11173da1a72d_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!KVfz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d1fc2d-c601-43c0-87a8-11173da1a72d_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!KVfz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d1fc2d-c601-43c0-87a8-11173da1a72d_2367x1368.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KVfz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06d1fc2d-c601-43c0-87a8-11173da1a72d_2367x1368.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Hasan Geren&#8217;s day to day</figcaption></figure></div><p>My average day-to-day probably looks like this:</p><ul><li><p>20% Meetings</p></li><li><p>60% Development/Coding</p></li><li><p>10% Documentation (Aiming to make this 20% and Coding 50%)</p></li><li><p>10% Learning/Reading</p></li></ul><p>It&#8217;s hard to put it into a fixed 
structure, since something unexpected almost always pops up, but I think these percentages reflect the general distribution quite accurately.</p><h4>What are some stakeholders that you work with?</h4><p>I work with a full range of stakeholders; it&#8217;s really the full package. I work with:</p><ul><li><p>Analytics</p></li><li><p>Product teams</p></li><li><p>AI/ML teams</p></li><li><p>HR</p></li><li><p>C-suite</p></li></ul><h4>What kind of projects do you work on?</h4><p>The projects I work on vary quite a bit. Most frequently, I build data pipelines that ingest data from APIs or databases, transform and model it in the semantic layer, and orchestrate the entire process with workflow tools.</p><p>I often dive deep into semantic modelling to create metrics that meet the criteria of different domains. From time to time, I also build dashboards on top of the pipelines I&#8217;ve created to support stakeholders directly.</p><p>On top of that, I handle DevOps-related tasks like implementing CI checks to maintain standards and managing our cloud infrastructure using Infrastructure as Code.</p><h4>What kind of data do you work with?</h4><p>I mostly work with tabular data and event data. Tabular data typically comes from APIs or transactional databases, and event data comes from streaming tools that capture frontend or backend events.</p><h4>What data size do you work with?</h4><p>It is relatively small. 
I&#8217;d say a couple of TBs at most.</p><h4>What tech stack do you use?</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XYpp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f56059c-a183-4878-905d-1854872d727f_2367x1368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XYpp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f56059c-a183-4878-905d-1854872d727f_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!XYpp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f56059c-a183-4878-905d-1854872d727f_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!XYpp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f56059c-a183-4878-905d-1854872d727f_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!XYpp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f56059c-a183-4878-905d-1854872d727f_2367x1368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XYpp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f56059c-a183-4878-905d-1854872d727f_2367x1368.png" width="1456" height="841" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2f56059c-a183-4878-905d-1854872d727f_2367x1368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:955141,&quot;alt&quot;:&quot;Hasan Geren Tech Stack&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/168220332?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f56059c-a183-4878-905d-1854872d727f_2367x1368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Hasan Geren Tech Stack" title="Hasan Geren Tech Stack" srcset="https://substackcdn.com/image/fetch/$s_!XYpp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f56059c-a183-4878-905d-1854872d727f_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!XYpp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f56059c-a183-4878-905d-1854872d727f_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!XYpp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f56059c-a183-4878-905d-1854872d727f_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!XYpp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f56059c-a183-4878-905d-1854872d727f_2367x1368.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Hasan Geren Tech Stack</figcaption></figure></div><h4>What tools do you leverage for GenAI?</h4><p>I mostly use ChatGPT and Claude for brainstorming and sometimes quick prototyping. I also use Warp, an AI-powered terminal, which I think is a great productivity boost for data engineers who spend a lot of time in the shell.</p><h4>What is your favorite area of Data Engineering?</h4><p>I&#8217;m not sure I have a single favourite area, but what I really enjoy about Data Engineering is the mixture it offers of learning, research, and engineering. 
For someone like me, who loves deep reading and building real systems to test and implement new ideas, it&#8217;s the perfect mix.</p><h4>What is the next big thing in Data Engineering, according to you?</h4><p>I&#8217;d say a &#8220;real&#8221; self-serve analytics layer. Most tools today that claim to offer self-serve analytics still depend heavily on data teams. But with the emergence of semantic models and the integration of GenAI, I believe we&#8217;re getting closer.</p><h4>What advice would you give your past self as a beginner Data Engineer?</h4><ul><li><p>Don&#8217;t hesitate to ask more questions!</p></li><li><p>Listen to advice, but always think critically and filter for what aligns with your goals and understanding.</p></li><li><p>No one knows everything.</p></li><li><p>Optimal is the enemy of good.</p></li></ul><h4>What are some challenging aspects of Data Engineering?</h4><p>The first challenge I&#8217;d point out is how overwhelming the field can be for beginners. There are so many concepts, tools, and stakeholder dynamics involved that people can get stuck just trying to figure out where to start.</p><p>The second one is that many companies still lack data-literate managers or executives. 
This can lead to unrealistic expectations, poor prioritisation, and unnecessary pressure on data engineers.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.junaideffendi.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Share &#128257; and subscribe &#128276; for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p>I hope this article was helpful to readers. Thanks to Hasan for sharing his experience with my audience. 
Stay tuned for more!</p><p>Please reach out if you would like:</p><ul><li><p>To be a guest and share your experiences &amp; journey.</p></li><li><p>To provide feedback and suggestions on how we can improve the quality of questions.</p></li><li><p>To suggest guests for future articles.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Solving Spark’s Small File Problem for 100x Faster Reads]]></title><description><![CDATA[Understand Spark’s common small file problem and learn how to solve it in modern open table formats through offline and online optimizations.]]></description><link>https://www.junaideffendi.com/p/solving-sparks-small-file-problem</link><guid isPermaLink="false">https://www.junaideffendi.com/p/solving-sparks-small-file-problem</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 06 Dec 2025 17:30:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!TuHR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243015bd-9441-4955-a66e-b7d1811e6746_2547x1532.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The <em>small file problem</em> is one of the most common performance bottlenecks in Spark workloads. It&#8217;s deceptively simple: too many tiny files slow down your reads. But the implications run deep, affecting batch pipelines, streaming workloads, and ultimately user-facing query performance.</p><p>This article dives into what the small file problem is, why it happens, and how to solve it, with strategies that can deliver up to 100x faster reads, especially for streaming jobs. 
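</p><p>As a rough illustration of why file count matters, here is a back-of-the-envelope sketch in Python. It assumes read time is a fixed per-file cost plus raw transfer time; the function names and the numbers (50 ms per file, 200 MB/s) are hypothetical placeholders, not measurements from Spark.</p><pre><code><code># Back-of-the-envelope model; every number here is an illustrative assumption.
def estimated_read_seconds(num_files, total_mb, per_file_overhead_s=0.05, throughput_mb_s=200.0):
    """Estimated scan time: per-file overhead (list, open, schedule) plus raw transfer."""
    return num_files * per_file_overhead_s + total_mb / throughput_mb_s


def speedup(files_before, files_after, total_mb, **kwargs):
    """Ratio of estimated read time before and after compaction of the same data."""
    before = estimated_read_seconds(files_before, total_mb, **kwargs)
    after = estimated_read_seconds(files_after, total_mb, **kwargs)
    return before / after


# 10 MB spread over 1000 files, then 10 files: per-file overhead dominates,
# so the speedup approaches the 100x reduction in file count.
print(round(speedup(1000, 10, total_mb=10.0), 1))

# 100 GB with the same file counts: transfer time dominates,
# so the speedup from compaction alone is much smaller.
print(round(speedup(1000, 10, total_mb=100000.0), 2))</code></code></pre><p>Under these assumptions, the small dataset gets close to the headline speedup because per-file overhead dominates, while the large one barely changes because transfer time dominates.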
</p><blockquote><p>&#128161; Compacting 1,000 files into 10 cuts the file count by 100x, which can give up to 100x faster reads when per-file overhead dominates.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TuHR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243015bd-9441-4955-a66e-b7d1811e6746_2547x1532.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TuHR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243015bd-9441-4955-a66e-b7d1811e6746_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!TuHR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243015bd-9441-4955-a66e-b7d1811e6746_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!TuHR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243015bd-9441-4955-a66e-b7d1811e6746_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!TuHR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243015bd-9441-4955-a66e-b7d1811e6746_2547x1532.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TuHR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243015bd-9441-4955-a66e-b7d1811e6746_2547x1532.png" width="1456" height="876" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/243015bd-9441-4955-a66e-b7d1811e6746_2547x1532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:876,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:232596,&quot;alt&quot;:&quot;Solving Spark&#8217;s Small File Problem for 100x Faster Reads&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/172501238?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243015bd-9441-4955-a66e-b7d1811e6746_2547x1532.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Solving Spark&#8217;s Small File Problem for 100x Faster Reads" title="Solving Spark&#8217;s Small File Problem for 100x Faster Reads" srcset="https://substackcdn.com/image/fetch/$s_!TuHR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243015bd-9441-4955-a66e-b7d1811e6746_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!TuHR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243015bd-9441-4955-a66e-b7d1811e6746_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!TuHR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243015bd-9441-4955-a66e-b7d1811e6746_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!TuHR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F243015bd-9441-4955-a66e-b7d1811e6746_2547x1532.png 1456w" sizes="100vw" fetchpriority="high"></picture><div 
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Solving Spark&#8217;s Small File Problem for 100x Faster Reads</figcaption></figure></div><h3>What Is the Small File Problem?</h3><p>When Spark writes data, it does so in parallel. Each task typically outputs one file. 
With high parallelism (e.g., 100 tasks), you often end up with hundreds of small files rather than a few large ones.</p><p>These small files make writes fast (parallelized, with little coordination needed) but reads slow, because:</p><ol><li><p>Spark has to open and scan thousands of files.</p></li><li><p>Each file triggers task scheduling overhead.</p></li><li><p>Metadata operations dominate actual data scanning.</p></li></ol><h3>Why It Hurts Read and Write Performance</h3><ul><li><p><strong>Writes</strong>: Small files are easy to generate, but repartitioning to reduce them adds shuffle cost.</p></li><li><p><strong>Reads</strong>: File explosion causes Spark to spend more time on metadata and scheduling than on actual compute.</p></li></ul><blockquote><p>&#128161; <em>Reads happen more often than writes, just like code is read far more often than it is written.</em></p></blockquote><h3>Impact on Batch and Streaming Jobs</h3><ul><li><p><strong>Batch workloads</strong>: Small files accumulate, degrading the performance of downstream queries and compaction jobs.</p></li><li><p><strong>Streaming workloads</strong>: Streaming prioritizes low-latency writes, which naturally produce small files. If compaction is forced inline, it adds latency, which is often unacceptable for SLA-driven pipelines.</p></li></ul><h3>Traditional Solutions in Spark</h3><p>Before open table formats, the common fix was to repartition or coalesce before the write, or to reduce the number of cores/tasks when starting the job.</p><p>Before Write:</p><pre><code><code>df.repartition(10).write.format("parquet").save("...")
# OR: coalesce merges existing partitions without a full shuffle
df.coalesce(10).write.format("parquet").save("...")</code></code></pre><p>Setting cores for the Spark job (2 cores &#215; 5 executors = 10 concurrent tasks in total):</p><pre><code><code>spark.executor.cores 2 
spark.executor.instances 5</code></code></pre><ul><li><p>Repartition triggers a full shuffle (expensive).</p></li><li><p>Coalesce reduces files but can cause data skew.</p></li><li><p>With <code>N</code> cores, you get <code>N</code> output files only when the write stage also runs <code>N</code> tasks (10 cores, 10 tasks = 10 files); the file count follows the number of write tasks, not the cores themselves.</p></li></ul><p>In all the scenarios above, we set the file count to 10, which requires you to determine the right number manually, unlike in open table format solutions.</p><h3>Open Table Format Solutions</h3><p>Modern table formats provide smarter, managed ways to handle small files. Since open table formats support versioning, they can perform these optimizations efficiently without disrupting the user experience.</p><h4>Offline Optimization (Table &amp; Partition Level)</h4><p>This feature is commonly known as compaction, which periodically merges many small files into fewer large ones per table or partition.</p><ul><li><p>This is an asynchronous process and conceptually similar to <code>coalesce</code>: fewer output files, produced efficiently. It makes data available as soon as possible and makes it read-efficient later.</p></li><li><p>Example: Merge 100 files into 10 larger ones. 
The number of files depends on the target file size, which can be configured through Spark config if needed.</p></li><li><p>Still requires scanning files, so the more small files you start with, the more expensive compaction becomes.</p></li><li><p>Great for batch data, where compaction can run right after the job is done.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!snbB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87fd2c2e-53f1-4e79-bf6f-d6c3ae707b9f_3351x1151.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!snbB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87fd2c2e-53f1-4e79-bf6f-d6c3ae707b9f_3351x1151.png 424w, https://substackcdn.com/image/fetch/$s_!snbB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87fd2c2e-53f1-4e79-bf6f-d6c3ae707b9f_3351x1151.png 848w, https://substackcdn.com/image/fetch/$s_!snbB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87fd2c2e-53f1-4e79-bf6f-d6c3ae707b9f_3351x1151.png 1272w, https://substackcdn.com/image/fetch/$s_!snbB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87fd2c2e-53f1-4e79-bf6f-d6c3ae707b9f_3351x1151.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!snbB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87fd2c2e-53f1-4e79-bf6f-d6c3ae707b9f_3351x1151.png" width="1456" height="500" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/87fd2c2e-53f1-4e79-bf6f-d6c3ae707b9f_3351x1151.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:228682,&quot;alt&quot;:&quot;Allows to make data available quickly and perform offline optimization for 10x performance&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/172501238?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87fd2c2e-53f1-4e79-bf6f-d6c3ae707b9f_3351x1151.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Allows to make data available quickly and perform offline optimization for 10x performance" title="Allows to make data available quickly and perform offline optimization for 10x performance" srcset="https://substackcdn.com/image/fetch/$s_!snbB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87fd2c2e-53f1-4e79-bf6f-d6c3ae707b9f_3351x1151.png 424w, https://substackcdn.com/image/fetch/$s_!snbB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87fd2c2e-53f1-4e79-bf6f-d6c3ae707b9f_3351x1151.png 848w, https://substackcdn.com/image/fetch/$s_!snbB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87fd2c2e-53f1-4e79-bf6f-d6c3ae707b9f_3351x1151.png 1272w, 
https://substackcdn.com/image/fetch/$s_!snbB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87fd2c2e-53f1-4e79-bf6f-d6c3ae707b9f_3351x1151.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Allows to make data available quickly and perform offline optimization for 10x performance</figcaption></figure></div><h4>Online Optimization (Task Level)</h4><p>Instead of blindly writing one file per task, Spark can group writes more intelligently before write. 
Delta calls it <code>optimized write</code>.</p><ul><li><p>This is a synchronous process and conceptually similar to <code>coalesce</code>: fewer output files, produced efficiently. It adds latency, but once data is available, it&#8217;s read-efficient.</p></li><li><p>Example: Instead of 100 small files, Spark writes 50 medium-sized files.</p></li><li><p>Makes later compaction faster, since it only needs to process 50 files instead of 100.</p></li><li><p>Streaming jobs that write small files can take advantage of this, along with compaction, when the SLA permits.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gETM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f33e02-f1bd-4c37-ab96-89cd7f8bbe83_3351x1151.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gETM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f33e02-f1bd-4c37-ab96-89cd7f8bbe83_3351x1151.png 424w, https://substackcdn.com/image/fetch/$s_!gETM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f33e02-f1bd-4c37-ab96-89cd7f8bbe83_3351x1151.png 848w, https://substackcdn.com/image/fetch/$s_!gETM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f33e02-f1bd-4c37-ab96-89cd7f8bbe83_3351x1151.png 1272w, https://substackcdn.com/image/fetch/$s_!gETM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f33e02-f1bd-4c37-ab96-89cd7f8bbe83_3351x1151.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!gETM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f33e02-f1bd-4c37-ab96-89cd7f8bbe83_3351x1151.png" width="1456" height="500" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70f33e02-f1bd-4c37-ab96-89cd7f8bbe83_3351x1151.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:252645,&quot;alt&quot;:&quot;Allows to make data available quickly with 2x read performance and perform offline optimization for 5x performance&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/172501238?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f33e02-f1bd-4c37-ab96-89cd7f8bbe83_3351x1151.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Allows to make data available quickly with 2x read performance and perform offline optimization for 5x performance" title="Allows to make data available quickly with 2x read performance and perform offline optimization for 5x performance" srcset="https://substackcdn.com/image/fetch/$s_!gETM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f33e02-f1bd-4c37-ab96-89cd7f8bbe83_3351x1151.png 424w, https://substackcdn.com/image/fetch/$s_!gETM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f33e02-f1bd-4c37-ab96-89cd7f8bbe83_3351x1151.png 848w, 
https://substackcdn.com/image/fetch/$s_!gETM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f33e02-f1bd-4c37-ab96-89cd7f8bbe83_3351x1151.png 1272w, https://substackcdn.com/image/fetch/$s_!gETM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70f33e02-f1bd-4c37-ab96-89cd7f8bbe83_3351x1151.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Allows to make data available quickly with 2x read performance and perform offline optimization for 5x 
performance</figcaption></figure></div><h3><strong>Decision Flow</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7QKe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41efe7be-8d30-48b5-a1ba-341dc2f409fc_2784x1532.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7QKe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41efe7be-8d30-48b5-a1ba-341dc2f409fc_2784x1532.png 424w, https://substackcdn.com/image/fetch/$s_!7QKe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41efe7be-8d30-48b5-a1ba-341dc2f409fc_2784x1532.png 848w, https://substackcdn.com/image/fetch/$s_!7QKe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41efe7be-8d30-48b5-a1ba-341dc2f409fc_2784x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!7QKe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41efe7be-8d30-48b5-a1ba-341dc2f409fc_2784x1532.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7QKe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41efe7be-8d30-48b5-a1ba-341dc2f409fc_2784x1532.png" width="1456" height="801" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/41efe7be-8d30-48b5-a1ba-341dc2f409fc_2784x1532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:801,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:324107,&quot;alt&quot;:&quot;Decision flow: helping you pick the right optimization.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/172501238?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41efe7be-8d30-48b5-a1ba-341dc2f409fc_2784x1532.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Decision flow: helping you pick the right optimization." title="Decision flow: helping you pick the right optimization." srcset="https://substackcdn.com/image/fetch/$s_!7QKe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41efe7be-8d30-48b5-a1ba-341dc2f409fc_2784x1532.png 424w, https://substackcdn.com/image/fetch/$s_!7QKe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41efe7be-8d30-48b5-a1ba-341dc2f409fc_2784x1532.png 848w, https://substackcdn.com/image/fetch/$s_!7QKe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41efe7be-8d30-48b5-a1ba-341dc2f409fc_2784x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!7QKe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41efe7be-8d30-48b5-a1ba-341dc2f409fc_2784x1532.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div 
class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Decision flow: helping you pick the right optimization.</figcaption></figure></div><h2>Conclusion</h2><p>The small file problem is a silent killer of Spark performance. While small files make writes faster, they heavily penalize reads. Open table formats like Delta, Iceberg, and Hudi provide optimization features that strike the right balance. 
Solving this problem can unlock up to a 100x boost in read performance, which is crucial for truly scalable Lakehouse analytics.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.junaideffendi.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Share &#128257; and subscribe &#128276; for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div>]]></content:encoded></item><item><title><![CDATA[Shopify Data Tech Stack]]></title><description><![CDATA[Explore the tech stack Shopify uses to process 284 million peak requests per minute, generating $11+ billion in sales.]]></description><link>https://www.junaideffendi.com/p/shopify-data-tech-stack</link><guid isPermaLink="false">https://www.junaideffendi.com/p/shopify-data-tech-stack</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 08 Nov 2025 17:30:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_pGP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86c65bc8-4ea1-456b-bf98-9d8a514f8bc2_2367x1567.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Learn how Shopify handles hundreds of millions of peak requests per minute, powering billions in sales through a robust, scalable infrastructure. 
This overview shares the key tools, architectures, and innovations Shopify leverages for data ingestion, processing, storage, and analytics to support global commerce.</p><h3>Metrics</h3><ul><li><p>284 million peak requests per minute to Shopify&#8217;s app servers during peak sales events, powering $11.5 billion in sales over four days in 2024, supported by 99.9% uptime.</p></li><li><p>$4.6 million in peak sales per minute and up to 967,000 requests per second during Black Friday and Cyber Monday.</p></li><li><p>Shopify&#8217;s infrastructure is 100% powered by Google Cloud, leveraging the same dependable network as Gmail, Search, and YouTube.</p></li><li><p>Kafka has handled 66 million messages per second at peak.</p></li><li><p>76k Spark jobs with 300 TB processed per day.</p></li><li><p>Airflow has 10,000 DAGs with 400 tasks running at a given moment and over 150,000 runs executed per day.</p></li><li><p>DBT has 100+ models with 400+ unit tests running in under ~3 min on average.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_pGP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86c65bc8-4ea1-456b-bf98-9d8a514f8bc2_2367x1567.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_pGP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86c65bc8-4ea1-456b-bf98-9d8a514f8bc2_2367x1567.png 424w, https://substackcdn.com/image/fetch/$s_!_pGP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86c65bc8-4ea1-456b-bf98-9d8a514f8bc2_2367x1567.png 848w, 
https://substackcdn.com/image/fetch/$s_!_pGP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86c65bc8-4ea1-456b-bf98-9d8a514f8bc2_2367x1567.png 1272w, https://substackcdn.com/image/fetch/$s_!_pGP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86c65bc8-4ea1-456b-bf98-9d8a514f8bc2_2367x1567.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_pGP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86c65bc8-4ea1-456b-bf98-9d8a514f8bc2_2367x1567.png" width="1456" height="964" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86c65bc8-4ea1-456b-bf98-9d8a514f8bc2_2367x1567.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:964,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2007189,&quot;alt&quot;:&quot;Shopify Data Tech Stack&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/170024744?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86c65bc8-4ea1-456b-bf98-9d8a514f8bc2_2367x1567.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Shopify Data Tech Stack" title="Shopify Data Tech Stack" srcset="https://substackcdn.com/image/fetch/$s_!_pGP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86c65bc8-4ea1-456b-bf98-9d8a514f8bc2_2367x1567.png 424w, 
https://substackcdn.com/image/fetch/$s_!_pGP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86c65bc8-4ea1-456b-bf98-9d8a514f8bc2_2367x1567.png 848w, https://substackcdn.com/image/fetch/$s_!_pGP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86c65bc8-4ea1-456b-bf98-9d8a514f8bc2_2367x1567.png 1272w, https://substackcdn.com/image/fetch/$s_!_pGP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86c65bc8-4ea1-456b-bf98-9d8a514f8bc2_2367x1567.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Shopify Data Tech Stack</figcaption></figure></div><blockquote><p>Content is based on multiple sources, including the Shopify blog, the GCP blog, and other public articles. You will find references for deeper dives as you read.</p></blockquote><h3>Platform</h3><h4>Google Cloud Platform</h4><p>Shopify operates fully on Google Cloud Platform, benefiting from GCP&#8217;s reliability, scalability, and security. The partnership enables Shopify to deliver <a href="https://shopify.engineering/shopify-infrastructure-collaboration-with-google">seamless merchant experiences</a> during surges like Black Friday/Cyber Monday (BFCM) and holiday sales, with near-instant scaling to accommodate billions of app server requests. GCP&#8217;s advanced infrastructure is integral to Shopify&#8217;s uptime, performance, and integration of AI-powered solutions.</p><blockquote><p>&#128214; Integration details: <a href="https://www.shopify.com/partners/featured-partner/google-cloud">Shopify Partner Spotlight: Google Cloud</a></p></blockquote><h3>Messaging System</h3><h4>Kafka</h4><p>Kafka is the backbone of Shopify&#8217;s messaging, providing the real-time streaming and event-driven architecture vital for ecommerce scale. It supports order processing, inventory updates, and notifications, enabling high availability and reliable communication across distributed systems.</p><blockquote><p>&#128214; More on tech choices: <a href="https://blog.bytebytego.com/p/shopify-tech-stack">Shopify Tech Stack - ByteByteGo</a></p></blockquote><h3>Processing</h3><h4><strong>Beam/Dataflow</strong></h4><p>Shopify leverages Apache Beam running on Dataflow to run large data processing jobs in both batch and streaming modes. 
This enables scalable ETL workflows and near real-time analytics for customer events and operational metrics.</p><blockquote><p>&#128214; Recommended Reading: <a href="https://shopify.engineering/how-shopify-improved-consumer-search-intent-with-real-time-ml">How Shopify Improved Consumer Search Intent with ML</a></p></blockquote><h4>Spark (Starscream)</h4><p>Shopify built Starscream, an internal PySpark-based data pipeline platform, to efficiently run tens of thousands of jobs processing hundreds of terabytes of data daily. It abstracts common, complex patterns into reusable components.</p><p>It was designed to accelerate the translation of SQL prototypes into scalable PySpark jobs, empowering teams to build robust data models quickly and at scale.</p><blockquote><p>&#128214; More on Starscream: <a href="https://shopify.engineering/complex-data-models-behind-shopify-tax-insights">The Complex Data Models Behind Shopify's Tax Insights Feature</a></p></blockquote><h4><strong>Trino</strong></h4><p>Trino serves as a query engine for interactive analytics, federating queries across storage systems such as BigQuery. It excels at large-scale SQL workloads, offering analysts low-latency access to the Lakehouse.</p><blockquote><p>&#128214; Read More: <a href="https://shopify.engineering/faster-trino-query-execution-infrastructure">Shopify's Path to a Faster Trino Query Execution: Infrastructure</a></p></blockquote><h3>Orchestrator</h3><h4>Airflow</h4><p>Shopify&#8217;s adoption of Apache Airflow delivers robust DAG-driven automation for ETL, model training, and data pipeline management. 
The team has shared scaling lessons and customizations that support thousands of workflows and seamless production deployments, essential for complex scheduling requirements as data volumes grow.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LW_b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a49df65-a45c-4614-b093-fde3456c4068_768x388.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LW_b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a49df65-a45c-4614-b093-fde3456c4068_768x388.png 424w, https://substackcdn.com/image/fetch/$s_!LW_b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a49df65-a45c-4614-b093-fde3456c4068_768x388.png 848w, https://substackcdn.com/image/fetch/$s_!LW_b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a49df65-a45c-4614-b093-fde3456c4068_768x388.png 1272w, https://substackcdn.com/image/fetch/$s_!LW_b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a49df65-a45c-4614-b093-fde3456c4068_768x388.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LW_b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a49df65-a45c-4614-b093-fde3456c4068_768x388.png" width="768" height="388" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a49df65-a45c-4614-b093-fde3456c4068_768x388.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:388,&quot;width&quot;:768,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;System diagram showing Shopify's Airflow Architecture&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="System diagram showing Shopify's Airflow Architecture" title="System diagram showing Shopify's Airflow Architecture" srcset="https://substackcdn.com/image/fetch/$s_!LW_b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a49df65-a45c-4614-b093-fde3456c4068_768x388.png 424w, https://substackcdn.com/image/fetch/$s_!LW_b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a49df65-a45c-4614-b093-fde3456c4068_768x388.png 848w, https://substackcdn.com/image/fetch/$s_!LW_b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a49df65-a45c-4614-b093-fde3456c4068_768x388.png 1272w, https://substackcdn.com/image/fetch/$s_!LW_b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a49df65-a45c-4614-b093-fde3456c4068_768x388.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" 
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Shopify&#8217;s Airflow Architecture</figcaption></figure></div><blockquote><p>&#128214; Recommended Reading: <a href="https://shopify.engineering/lessons-learned-apache-airflow-scale">Lessons Learned Scaling Apache Airflow</a></p></blockquote><h3>Warehouse</h3><h4>BigQuery</h4><p>BigQuery is Shopify&#8217;s central data warehouse, powering analytics, reporting, and dashboarding for business teams and merchants. 
Its seamless scalability and rapid query response underpin key features such as real-time sales insights, optimization reports, and recommendation engines, with production-grade data managed via DBT.</p><blockquote><p>&#127897;&#65039; Recommended Podcast: <a href="https://www.dataengineeringpodcast.com/episodepage/how-shopify-is-building-their-production-data-warehouse-using-dbt">Data Warehouse at Shopify</a></p></blockquote><h3>Transformation</h3><h4>DBT</h4><p>Shopify relies heavily on DBT to scale its transformations and unit testing, with 200+ data scientists across the company using it.</p><p>Along with Starscream, Shopify also has a tool called Seamster that lets users skip the conversion to PySpark and build scalable pipelines using DBT and BigQuery.</p><blockquote><p>&#128214; More on Seamster: <a href="https://shopify.engineering/build-production-grade-workflow-sql-modelling">Production Grade Workflow with SQL Modelling</a></p></blockquote><p>Below is a detailed video on how they perform unit testing at scale.</p><div id="youtube2-dlFYP7EJiUU" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;dlFYP7EJiUU&quot;,&quot;startTime&quot;:&quot;29s&quot;,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/dlFYP7EJiUU?start=29s&amp;rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h3>Lakehouse</h3><h4>Google Cloud Storage &amp; Iceberg</h4><p>Raw and staged data lands in GCS, forming the foundation for Shopify&#8217;s lake architecture. 
GCS works seamlessly with Iceberg to provide a Lakehouse architecture that is easily accessible to hundreds of downstream users and consumers.</p><blockquote><p>&#127909; Recommended Video: <a href="https://www.youtube.com/watch?v=z_cA9A-77Kk">Iceberg at Shopify</a></p></blockquote><h3>Data Store</h3><h4>Druid</h4><p>Druid is adopted for high-performance, low-latency OLAP analytics. It serves operational dashboards and provides fast filtering, drill-down capabilities, and aggregated metrics for merchants and internal teams, supporting millions of queries daily. Read more at <a href="https://druid.apache.org/druid-powered/#shopify">Apache Druid</a>.</p><h3>Dashboard</h3><h4>Polaris-Viz</h4><p>Polaris-viz is Shopify&#8217;s custom React visualization library, designed initially for both internal and external dashboards. The open-source version is <a href="https://github.com/Shopify/polaris-viz/blob/main/README.md">deprecated</a>, but the library is still widely adopted internally.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j7zu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09f10d74-ac46-4d2a-bc9a-1230e21e1eb3_1242x667.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j7zu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09f10d74-ac46-4d2a-bc9a-1230e21e1eb3_1242x667.png 424w, https://substackcdn.com/image/fetch/$s_!j7zu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09f10d74-ac46-4d2a-bc9a-1230e21e1eb3_1242x667.png 848w, 
https://substackcdn.com/image/fetch/$s_!j7zu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09f10d74-ac46-4d2a-bc9a-1230e21e1eb3_1242x667.png 1272w, https://substackcdn.com/image/fetch/$s_!j7zu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09f10d74-ac46-4d2a-bc9a-1230e21e1eb3_1242x667.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j7zu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09f10d74-ac46-4d2a-bc9a-1230e21e1eb3_1242x667.png" width="1242" height="667" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/09f10d74-ac46-4d2a-bc9a-1230e21e1eb3_1242x667.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:667,&quot;width&quot;:1242,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!j7zu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09f10d74-ac46-4d2a-bc9a-1230e21e1eb3_1242x667.png 424w, https://substackcdn.com/image/fetch/$s_!j7zu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09f10d74-ac46-4d2a-bc9a-1230e21e1eb3_1242x667.png 848w, 
https://substackcdn.com/image/fetch/$s_!j7zu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09f10d74-ac46-4d2a-bc9a-1230e21e1eb3_1242x667.png 1272w, https://substackcdn.com/image/fetch/$s_!j7zu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09f10d74-ac46-4d2a-bc9a-1230e21e1eb3_1242x667.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A glimpse of the viz from the source below.</figcaption></figure></div><blockquote><p>&#128214; Background: <a 
href="https://shopify.engineering/react-library-consistent-data-visualization">React Library for Consistent Visualization</a></p></blockquote><div><hr></div><p><strong>Related Content:</strong></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;0f96f149-5fd1-49ad-ad8c-51b5672ffaee&quot;,&quot;caption&quot;:&quot;Explore how Spotify processes over 1.4 trillion data points daily to power personalized experiences for hundreds of millions of users worldwide. This overview distills the essential tools, architectures, and innovations Spotify employs for data ingestion, processing, storage, and analytics.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Spotify Data Tech Stack&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:21393641,&quot;name&quot;:&quot;Junaid Effendi&quot;,&quot;bio&quot;:&quot;I love to share my learnings and experiences about Software and Data 
Engineering.&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06559f3-ee33-46f8-bfa0-50964179f235_1200x1200.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-08-16T16:30:35.825Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!S_F0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586df7f6-5b17-425c-a53c-fb4bbcf47ec0_2367x1368.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.junaideffendi.com/p/spotify-data-tech-stack&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:165484212,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Junaid Effendi | Sharing knowledge for Engineers&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!Ddb9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F430b8ab1-0a9b-4396-9e0f-e161364cb75a_350x350.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;95dd0a7e-faeb-4acb-9b97-738b98b2d239&quot;,&quot;caption&quot;:&quot;DoorDash has been a leader in the food delivery service industry, with over 5 billion consumer orders, more than $100 billion in merchant sales, and over $35 billion earned by Dashers. 
A key factor in their success is their data-driven approach, ingesting massive amounts of event-driven data daily to make informed decisions.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;DoorDash Data Tech Stack&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:21393641,&quot;name&quot;:&quot;Junaid Effendi&quot;,&quot;bio&quot;:&quot;I love to share my learnings and experiences about Software and Data Engineering.&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06559f3-ee33-46f8-bfa0-50964179f235_1200x1200.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-04-26T16:30:41.413Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0674425f-ec26-41a5-ae32-ef97f9bf563d_2547x1477.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.junaideffendi.com/p/doordash-data-tech-stack&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:159625272,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:20,&quot;comment_count&quot;:2,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Junaid Effendi | Sharing knowledge for Engineers&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F430b8ab1-0a9b-4396-9e0f-e161364cb75a_350x350.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><div class="subscription-widget-wrap-editor" 
data-attrs="{&quot;url&quot;:&quot;https://www.junaideffendi.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading, Share &#128257; and Subscribe &#128276; for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p>&#128172; Shopify&#8217;s tech stack relies on GCP and other core technologies to build a modern architecture that scales well under extreme demand for both online and offline processes.</p>]]></content:encoded></item><item><title><![CDATA[Inside Data Engineering with Erfan Hesami]]></title><description><![CDATA[Join Erfan Hesami as he shares his experience in the world of data engineering, offering insights, exploring challenges, and highlighting emerging industry trends.]]></description><link>https://www.junaideffendi.com/p/inside-data-engineering-with-erfan</link><guid isPermaLink="false">https://www.junaideffendi.com/p/inside-data-engineering-with-erfan</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 11 Oct 2025 16:30:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!sOJK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33daa3c-4b1e-4357-9a93-7e5cfe43ffa6_2547x1532.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today, we're joined by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Erfan 
Hesami&quot;,&quot;id&quot;:277538242,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9e2692f-48e0-43a5-9f33-7eebb007bd6e_1641x1641.jpeg&quot;,&quot;uuid&quot;:&quot;676cc000-5c06-4777-83fc-b1bfc0da947f&quot;}" data-component-name="MentionToDOM"></span> from <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Pipeline to Insights&quot;,&quot;id&quot;:42238863,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd98ddb69-fdec-4599-b3f2-906f7673c8de_408x408.png&quot;,&quot;uuid&quot;:&quot;7d9cfb14-b545-44f6-897b-05fb350db7ed&quot;}" data-component-name="MentionToDOM"></span>, who&#8217;s been working in Data and Analytics Engineering for the last 5 years.</p><p>To recap: the series follows a Q&amp;A format, featuring professionals who share their journeys, insights, and challenges.</p><h3><strong>What to Expect:</strong></h3><ul><li><p><strong>Behind the Scenes</strong> &#8211; Get a close-up view of the real work, rhythms, and responsibilities of data engineers in action.</p></li><li><p><strong>Getting Started</strong> &#8211; Dive into the essential skills, tools, and entry points that open doors to a data engineering career.</p></li><li><p><strong>Industry Watch</strong> &#8211; Stay informed on emerging trends, evolving tech stacks, and shifts driving the future of data engineering.</p></li><li><p><strong>The Real Work</strong> &#8211; Go beyond the theory to explore the gritty, unexpected challenges engineers solve in the wild.</p></li><li><p><strong>Debunking the Hype</strong> &#8211; Clear up common myths and misconceptions about what data engineers actually do.</p></li><li><p><strong>From the Trenches</strong> &#8211; Learn from the experiences, lessons, and advice of seasoned 
professionals working in the field.</p></li></ul><div class="pullquote"><p><em><strong>&#11088; If you're curious about data engineering or considering it as a career, this series is for you!</strong></em></p></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sOJK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33daa3c-4b1e-4357-9a93-7e5cfe43ffa6_2547x1532.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sOJK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33daa3c-4b1e-4357-9a93-7e5cfe43ffa6_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!sOJK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33daa3c-4b1e-4357-9a93-7e5cfe43ffa6_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!sOJK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33daa3c-4b1e-4357-9a93-7e5cfe43ffa6_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!sOJK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33daa3c-4b1e-4357-9a93-7e5cfe43ffa6_2547x1532.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sOJK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33daa3c-4b1e-4357-9a93-7e5cfe43ffa6_2547x1532.png" width="1456" height="876" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f33daa3c-4b1e-4357-9a93-7e5cfe43ffa6_2547x1532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:876,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3212193,&quot;alt&quot;:&quot;Inside Data Engineering with Erfan Hesami&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/168216306?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33daa3c-4b1e-4357-9a93-7e5cfe43ffa6_2547x1532.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Inside Data Engineering with Erfan Hesami" title="Inside Data Engineering with Erfan Hesami" srcset="https://substackcdn.com/image/fetch/$s_!sOJK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33daa3c-4b1e-4357-9a93-7e5cfe43ffa6_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!sOJK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33daa3c-4b1e-4357-9a93-7e5cfe43ffa6_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!sOJK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33daa3c-4b1e-4357-9a93-7e5cfe43ffa6_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!sOJK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33daa3c-4b1e-4357-9a93-7e5cfe43ffa6_2547x1532.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex 
pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Inside Data Engineering with Erfan Hesami</figcaption></figure></div><p>Let&#8217;s dive into Inside Data Engineering:</p><h4>How would you describe Data Engineering?</h4><p>In one sentence, data engineering means making data accessible for stakeholders to create business value. Although it sounds simple, there are many factors involved: <strong>why </strong>the data matters, <strong>what </strong>data is needed, and <strong>how </strong>it should be handled. 
The <em>how</em> is especially important, which is why we need to understand how to design systems, architect solutions, and model data in a way that maximises business value while keeping costs optimised.</p><h4><strong>How did you end up being a Data Engineer?</strong></h4><p>I have a Bachelor's degree in IT Engineering and worked for about two years as a programmer. Later, I decided to pursue a Master's in Business Analytics with a specialisation in Data Science. After graduating, I started working as a Data Analyst. One of the companies I worked for needed me to maintain and improve their existing data pipelines and implement a data quality framework to address data quality challenges. That&#8217;s what sparked my interest. I realised that the work I was doing closely resembled that of a Data Engineer, and it motivated me to dive deeper. I started learning from the data engineers in the company, observing how they sourced data and which technologies they used, and even asking to explore their code repositories. Over time, I kept learning more and gradually began contributing like a Data Engineer.</p><p>You can read the full story in <a href="https://pipeline2insights.substack.com/p/from-analytics-to-data-engineering">My Journey from Data Analyst to Data Engineer</a>, along with my advice and recommendations for others looking to make the same transition. 
Also, I highly recommend checking out the below article:</p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:157449239,&quot;url&quot;:&quot;https://pipeline2insights.substack.com/p/how-to-transition-from-data-analytics&quot;,&quot;publication_id&quot;:3044966,&quot;publication_name&quot;:&quot;Pipeline To Insights&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!bQ6b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39f0f325-1faa-4944-817f-c8ac03724007_382x382.png&quot;,&quot;title&quot;:&quot;How to Transition from Data Analytics to Data Engineering&quot;,&quot;truncated_body_text&quot;:&quot;In our previous post, one of the co-founders of Pipeline To Insights shared his reason for transitioning into data engineering. We also conducted a poll, and based on the results, we've decided to write this post specifically for data analysts and professionals in similar roles looking to switch to data engineering.&quot;,&quot;date&quot;:&quot;2025-02-21T10:58:22.297Z&quot;,&quot;like_count&quot;:19,&quot;comment_count&quot;:2,&quot;bylines&quot;:[{&quot;id&quot;:277538242,&quot;name&quot;:&quot;Erfan Hesami&quot;,&quot;handle&quot;:&quot;erfanhesami&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9e2692f-48e0-43a5-9f33-7eebb007bd6e_1641x1641.jpeg&quot;,&quot;bio&quot;:&quot;Former Data Analyst turned Data Engineer, sharing what I learn as I grow in the field.&quot;,&quot;profile_set_up_at&quot;:&quot;2024-10-15T05:14:33.231Z&quot;,&quot;reader_installed_at&quot;:&quot;2025-04-24T12:23:30.328Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:4960374,&quot;user_id&quot;:277538242,&quot;publication_id&quot;:3044966,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:3044966,&quot;name&quot;:&quot;Pipeline To 
Insights&quot;,&quot;subdomain&quot;:&quot;pipeline2insights&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Pipeline To Insights is a community-driven blog by passionate Data Engineers, sharing real-world experiences, technical tutorials, and personal reflections to inspire growth and continuous learning in the evolving world of data and AI.&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/39f0f325-1faa-4944-817f-c8ac03724007_382x382.png&quot;,&quot;author_id&quot;:42238863,&quot;primary_user_id&quot;:42238863,&quot;theme_var_background_pop&quot;:&quot;#FF6719&quot;,&quot;created_at&quot;:&quot;2024-09-20T01:51:37.420Z&quot;,&quot;email_from_name&quot;:&quot;Pipeline to Insights&quot;,&quot;copyright&quot;:&quot;Erfan Hesami&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;magaziney&quot;,&quot;is_personal_mode&quot;:false}}],&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://pipeline2insights.substack.com/p/how-to-transition-from-data-analytics?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!bQ6b!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39f0f325-1faa-4944-817f-c8ac03724007_382x382.png" loading="lazy"><span class="embedded-post-publication-name">Pipeline To Insights</span></div><div 
class="embedded-post-title-wrapper"><div class="embedded-post-title">How to Transition from Data Analytics to Data Engineering</div></div><div class="embedded-post-body">In our previous post, one of the co-founders of Pipeline To Insights shared his reason for transitioning into data engineering. We also conducted a poll, and based on the results, we've decided to write this post specifically for data analysts and professionals in similar roles looking to switch to data engineering&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">a year ago &#183; 19 likes &#183; 2 comments &#183; Erfan Hesami</div></a></div><h4><strong>What does your day-to-day look like?</strong></h4><p>My day-to-day typically involves gathering requirements to understand business problems, identifying whether the necessary data sources exist, and determining if any new automation is needed. I also maintain and improve existing pipelines and help address issues related to data quality, governance, or infrastructure. 
I regularly attend meetings with different stakeholders to understand their needs and explore how our team can support them.</p><h4><strong>Who do you typically work with across teams?</strong></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XVzB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F669717d1-bc3c-43fe-a870-5ef9887f2fa8_2367x1368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XVzB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F669717d1-bc3c-43fe-a870-5ef9887f2fa8_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!XVzB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F669717d1-bc3c-43fe-a870-5ef9887f2fa8_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!XVzB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F669717d1-bc3c-43fe-a870-5ef9887f2fa8_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!XVzB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F669717d1-bc3c-43fe-a870-5ef9887f2fa8_2367x1368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XVzB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F669717d1-bc3c-43fe-a870-5ef9887f2fa8_2367x1368.png" width="1456" height="841" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/669717d1-bc3c-43fe-a870-5ef9887f2fa8_2367x1368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:175612,&quot;alt&quot;:&quot;Upstream and Downstream Teams&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/168216306?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F669717d1-bc3c-43fe-a870-5ef9887f2fa8_2367x1368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Upstream and Downstream Teams" title="Upstream and Downstream Teams" srcset="https://substackcdn.com/image/fetch/$s_!XVzB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F669717d1-bc3c-43fe-a870-5ef9887f2fa8_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!XVzB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F669717d1-bc3c-43fe-a870-5ef9887f2fa8_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!XVzB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F669717d1-bc3c-43fe-a870-5ef9887f2fa8_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!XVzB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F669717d1-bc3c-43fe-a870-5ef9887f2fa8_2367x1368.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Upstream and Downstream Teams</figcaption></figure></div><ul><li><p><strong>Software Engineers</strong>: We work together to understand how internal/external applications function and explore ways to improve data capture, especially for storing historical data and enhancing data quality at the source. I also support them in identifying what information needs to be collected to solve specific business problems and assist with data modelling for transactional databases.</p></li><li><p><strong>Vendors</strong>: External data engineers or teams who share data with us from outside the company. 
I collaborate with them to ensure the data is reliable, well-documented, and integrated smoothly into our systems.</p></li><li><p><strong>Engineering Executives</strong>: I consult with them on architectural decisions, technology choices, and long-term solutions to align with strategic goals.</p></li><li><p><strong>BI Developers</strong>: I collaborate with them to provide the necessary data in the right format and at the right frequency for reporting and analytics.</p></li><li><p><strong>ML/AI Engineers</strong>: I work with them to understand what data they need, in what format, and how often. Their focus is on building machine learning models and AI applications to solve business problems.</p></li></ul><h4><strong>What real-world business problems do you solve through data?</strong></h4><p>One of the projects I'm most proud of was creating a 360-degree view of customer data as a single source of truth. Previously, data was scattered across different systems, and teams were generating reports based on inconsistent sources. There was no clear data dictionary, no data lineage, and very limited governance; many people didn&#8217;t even know where certain data lived. I took the initiative to centralise and model this data, creating a unified customer view that could be used company-wide. This helped teams build their own data marts more confidently and consistently.</p><h4><strong>What kind of projects do you work on?</strong></h4><p>I work on building and optimising data infrastructure, ensuring reliable data pipelines and high data quality, and designing systems that support analytics and AI. This includes integrating data from various sources, managing complex transformations, and improving data accessibility for business teams. 
Some examples include sourcing and modelling data for an AI Copilot project to support account managers, and helping stakeholders detect unprofitable contracts so they can take timely action.</p><h4><strong>What kind of data do you work with?</strong></h4><p>I work with both structured and semi-structured data, including traditional fields and columns, as well as open text.</p><h4><strong>What data size do you work with?</strong></h4><p>I have experience working with data at terabyte and even petabyte scale, though in my current role, the data volume is typically a few terabytes.</p><h4><strong>What tech stack do you use?</strong></h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!d-T8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d9b1abc-79ea-49bf-8227-d28d4ab015c7_2367x1500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!d-T8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d9b1abc-79ea-49bf-8227-d28d4ab015c7_2367x1500.png 424w, https://substackcdn.com/image/fetch/$s_!d-T8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d9b1abc-79ea-49bf-8227-d28d4ab015c7_2367x1500.png 848w, https://substackcdn.com/image/fetch/$s_!d-T8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d9b1abc-79ea-49bf-8227-d28d4ab015c7_2367x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!d-T8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d9b1abc-79ea-49bf-8227-d28d4ab015c7_2367x1500.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!d-T8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d9b1abc-79ea-49bf-8227-d28d4ab015c7_2367x1500.png" width="1456" height="923" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7d9b1abc-79ea-49bf-8227-d28d4ab015c7_2367x1500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:923,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:999915,&quot;alt&quot;:&quot;Erfan&#8217;s Tech Stack&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/168216306?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d9b1abc-79ea-49bf-8227-d28d4ab015c7_2367x1500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Erfan&#8217;s Tech Stack" title="Erfan&#8217;s Tech Stack" srcset="https://substackcdn.com/image/fetch/$s_!d-T8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d9b1abc-79ea-49bf-8227-d28d4ab015c7_2367x1500.png 424w, https://substackcdn.com/image/fetch/$s_!d-T8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d9b1abc-79ea-49bf-8227-d28d4ab015c7_2367x1500.png 848w, https://substackcdn.com/image/fetch/$s_!d-T8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d9b1abc-79ea-49bf-8227-d28d4ab015c7_2367x1500.png 1272w, 
https://substackcdn.com/image/fetch/$s_!d-T8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d9b1abc-79ea-49bf-8227-d28d4ab015c7_2367x1500.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Erfan&#8217;s Tech Stack</figcaption></figure></div><ul><li><p>Azure: All open-source components are hosted and managed within the Azure cloud environment.</p></li><li><p>Azure storage: Used to store raw and processed data for analytics and AI use cases.</p></li><li><p>Postgres: Used as a storage layer for analytical 
purposes.</p></li><li><p>Data load tool (dlt): Used to ingest data from various sources efficiently.</p></li><li><p>Dagster: Used to automate and schedule data workflows.</p></li><li><p>Data build tool (dbt): Used for transforming and modelling data.</p></li><li><p>Great Expectations: Used to validate data quality.</p></li></ul><h4><strong>What programming languages do you use?</strong></h4><p>Python and SQL are my primary languages for transformation and tooling.</p><h4><strong>What tools do you leverage for GenAI?</strong></h4><p>I use Claude for programming assistance, as well as Ollama and Llama models for running and experimenting with generative AI locally.</p><h4><strong>What is your favorite area of Data Engineering?</strong></h4><p>My favourite areas of data engineering are data quality and data modelling because they are fundamental to building reliable, trustworthy, and scalable data systems.</p><h4><strong>How can Data Engineering benefit from GenAI?</strong></h4><p>I use generative AI models to create test cases and generate documentation. Some open-source tools have model-powered documentation that allows you to quickly search and find relevant information, making development and troubleshooting much more efficient.</p><h4><strong>What advice would you give your past self as a beginner Data Engineer?</strong></h4><p>For beginners, I&#8217;d suggest focusing on the fundamentals first and not getting lost chasing certifications alone. Early on, I thought that collecting certifications for specific platforms would make me stand out. But I soon realised that not all companies use or require those platforms. More importantly, it&#8217;s crucial to understand how these technologies work and to explore the open-source ecosystem. Working on side projects can really help build practical skills. 
I&#8217;m not saying certifications aren&#8217;t valuable, but always ask yourself why you&#8217;d choose one technology over another; understanding your options is key.</p><h4><strong>What are some challenging aspects of Data Engineering?</strong></h4><p>One of the biggest challenges is making decisions across many areas, especially when it comes to architecture. It&#8217;s critical to think carefully and always consider the overall system design. Choosing the right tools can also be difficult because new tools constantly emerge, and deciding which one fits best requires experience and judgment. Additionally, there are many ways to implement solutions, and every engineer or team tends to find their own approach, which adds complexity and variability.</p><h4><strong>What are some common misconceptions about data engineering?</strong></h4><ul><li><p>Data engineering is just about creating data pipelines.</p></li><li><p>Data engineering is only about writing SQL.</p></li><li><p>Data engineers should only focus on engineering tasks.</p></li></ul><p><strong>Reality:</strong></p><ul><li><p>Data engineers should also understand analytics and be good analysts, especially in smaller teams.</p></li><li><p>When stakeholders bring data problems, data engineers often analyze feasibility and provide insights.</p></li><li><p>Creating dashboards and charts and telling data-driven stories can also be part of the role.</p></li><li><p>Data governance and data quality are critical responsibilities that data engineers must manage to ensure trustworthy and compliant data.</p></li></ul><h4><strong>What advice do you have for beginners?</strong></h4><ul><li><p>Don&#8217;t jump into trending technologies just for the sake of it. Some tools are powerful, but not every company uses them, and not every data problem requires them.</p></li><li><p>Focus on mastering the fundamentals first; don&#8217;t sacrifice core understanding just to follow hype.</p></li><li><p>Learn from others by reading 
articles, attending meetups and conferences, and reading books. A book like <em>Fundamentals of Data Engineering</em> by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Joe Reis&quot;,&quot;id&quot;:3531217,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e4716b1-c223-41e3-b943-def0291bf217_1175x783.jpeg&quot;,&quot;uuid&quot;:&quot;9b354c4c-a39a-4541-bdfb-165cbbad99fd&quot;}" data-component-name="MentionToDOM">Joe Reis</span> and Matt Housley really helped me connect the dots and build a strong foundation.</p></li><li><p>Most importantly, apply what you learn through hands-on projects to build real understanding and confidence.</p></li></ul><div><hr></div><p>I hope this article was helpful to readers. Thanks to Erfan for sharing his experience with my audience. 
Stay tuned for more!</p><p>Please reach out if you would like:</p><ul><li><p>To be a guest and share your experiences &amp; journey.</p></li><li><p>To provide feedback and suggestions on how we can improve the quality of questions.</p></li><li><p>To suggest guests for future articles.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[How Delta Lake Works]]></title><description><![CDATA[Understand how Delta Lake handles reads and writes using the transaction log, ensuring ACID guarantees through snapshot isolation and optimistic concurrency control.]]></description><link>https://www.junaideffendi.com/p/how-delta-lake-works</link><guid isPermaLink="false">https://www.junaideffendi.com/p/how-delta-lake-works</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 06 Sep 2025 16:30:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_1xV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49c2f7fa-67fc-401b-bd8d-4bc7b6712d9a_2367x1604.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Delta Lake is an open-source storage layer that brings ACID transactions, schema enforcement, and unified streaming + batch processing to cloud data lakes. 
Originally developed by Databricks, it was created to address data consistency and reliability issues in traditional data lakes.</p><p>Delta allows compute engines like Apache Spark, Trino, and Flink to access the same data reliably via a transaction log that records every change made to a table.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_1xV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49c2f7fa-67fc-401b-bd8d-4bc7b6712d9a_2367x1604.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_1xV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49c2f7fa-67fc-401b-bd8d-4bc7b6712d9a_2367x1604.png 424w, https://substackcdn.com/image/fetch/$s_!_1xV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49c2f7fa-67fc-401b-bd8d-4bc7b6712d9a_2367x1604.png 848w, https://substackcdn.com/image/fetch/$s_!_1xV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49c2f7fa-67fc-401b-bd8d-4bc7b6712d9a_2367x1604.png 1272w, https://substackcdn.com/image/fetch/$s_!_1xV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49c2f7fa-67fc-401b-bd8d-4bc7b6712d9a_2367x1604.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_1xV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49c2f7fa-67fc-401b-bd8d-4bc7b6712d9a_2367x1604.png" width="1456" height="987" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/49c2f7fa-67fc-401b-bd8d-4bc7b6712d9a_2367x1604.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:987,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:560968,&quot;alt&quot;:&quot;Delta Lake Components&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/165483998?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49c2f7fa-67fc-401b-bd8d-4bc7b6712d9a_2367x1604.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Delta Lake Components" title="Delta Lake Components" srcset="https://substackcdn.com/image/fetch/$s_!_1xV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49c2f7fa-67fc-401b-bd8d-4bc7b6712d9a_2367x1604.png 424w, https://substackcdn.com/image/fetch/$s_!_1xV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49c2f7fa-67fc-401b-bd8d-4bc7b6712d9a_2367x1604.png 848w, https://substackcdn.com/image/fetch/$s_!_1xV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49c2f7fa-67fc-401b-bd8d-4bc7b6712d9a_2367x1604.png 1272w, https://substackcdn.com/image/fetch/$s_!_1xV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49c2f7fa-67fc-401b-bd8d-4bc7b6712d9a_2367x1604.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Delta Lake Components, icon: <a href="https://www.flaticon.com/">source</a></figcaption></figure></div><h3>Architecture &amp; Components</h3><p>Delta Lake&#8217;s architecture centers around a small set of well-defined components:</p><ul><li><p><strong>Parquet Data Files</strong>: The actual table data is stored in columnar Parquet format within a directory.</p></li><li><p><strong>_delta_log/</strong>: A transaction log directory containing a series of JSON and Parquet files that track all changes to the table. 
This log ensures ACID guarantees.</p><ul><li><p><strong>Table Metadata</strong>: Stored in the JSON commit files; includes the schema, partition columns, and table properties (an example log entry is shared later).</p></li><li><p><strong>Checkpoint</strong>: Stored as Parquet files; each one holds a compacted snapshot of the table state at a given version, so readers do not have to replay every JSON file.</p></li><li><p><strong>Commit Protocol</strong>: Ensures atomic commits using optimistic concurrency control. Writers coordinate by creating the next sequentially numbered commit file (e.g. 00001.json, shortened here; actual file names are zero-padded to 20 digits), and only one writer can create a given file.</p></li></ul></li><li><p><strong>Readers/Writers</strong>: Clients read consistent snapshots by replaying the log, and writers create new log entries to record changes. Each committed transaction creates a new JSON file.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nTX5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a562d6-a557-40c4-8f25-ab935a27544d_592x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nTX5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a562d6-a557-40c4-8f25-ab935a27544d_592x400.png 424w, https://substackcdn.com/image/fetch/$s_!nTX5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a562d6-a557-40c4-8f25-ab935a27544d_592x400.png 848w, https://substackcdn.com/image/fetch/$s_!nTX5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a562d6-a557-40c4-8f25-ab935a27544d_592x400.png 1272w, 
https://substackcdn.com/image/fetch/$s_!nTX5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a562d6-a557-40c4-8f25-ab935a27544d_592x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nTX5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a562d6-a557-40c4-8f25-ab935a27544d_592x400.png" width="624" height="421.6216216216216" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c0a562d6-a557-40c4-8f25-ab935a27544d_592x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:592,&quot;resizeWidth&quot;:624,&quot;bytes&quot;:84020,&quot;alt&quot;:&quot;Delta Log Directory&quot;,&quot;title&quot;:&quot;Delta Log Directory&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/165483998?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a562d6-a557-40c4-8f25-ab935a27544d_592x400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Delta Log Directory" title="Delta Log Directory" srcset="https://substackcdn.com/image/fetch/$s_!nTX5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a562d6-a557-40c4-8f25-ab935a27544d_592x400.png 424w, https://substackcdn.com/image/fetch/$s_!nTX5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a562d6-a557-40c4-8f25-ab935a27544d_592x400.png 848w, 
https://substackcdn.com/image/fetch/$s_!nTX5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a562d6-a557-40c4-8f25-ab935a27544d_592x400.png 1272w, https://substackcdn.com/image/fetch/$s_!nTX5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a562d6-a557-40c4-8f25-ab935a27544d_592x400.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Delta Log Directory: <a href="https://www.vldb.org/pvldb/vol13/p3411-armbrust.pdf">Source</a></figcaption></figure></div><div 
class="pullquote"><p>&#128161;You can create a Delta table directly without the need of a catalog.</p></div><h3>Internal Mechanics</h3><h4>Write Flow</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lhhU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1b6865-5b40-4174-9d6c-fb54cd5cf051_1907x1757.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lhhU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1b6865-5b40-4174-9d6c-fb54cd5cf051_1907x1757.png 424w, https://substackcdn.com/image/fetch/$s_!lhhU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1b6865-5b40-4174-9d6c-fb54cd5cf051_1907x1757.png 848w, https://substackcdn.com/image/fetch/$s_!lhhU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1b6865-5b40-4174-9d6c-fb54cd5cf051_1907x1757.png 1272w, https://substackcdn.com/image/fetch/$s_!lhhU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1b6865-5b40-4174-9d6c-fb54cd5cf051_1907x1757.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lhhU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1b6865-5b40-4174-9d6c-fb54cd5cf051_1907x1757.png" width="624" height="574.7142857142857" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dc1b6865-5b40-4174-9d6c-fb54cd5cf051_1907x1757.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1341,&quot;width&quot;:1456,&quot;resizeWidth&quot;:624,&quot;bytes&quot;:381813,&quot;alt&quot;:&quot;Delta Lake Writer Flow&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/165483998?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1b6865-5b40-4174-9d6c-fb54cd5cf051_1907x1757.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Delta Lake Writer Flow" title="Delta Lake Writer Flow" srcset="https://substackcdn.com/image/fetch/$s_!lhhU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1b6865-5b40-4174-9d6c-fb54cd5cf051_1907x1757.png 424w, https://substackcdn.com/image/fetch/$s_!lhhU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1b6865-5b40-4174-9d6c-fb54cd5cf051_1907x1757.png 848w, https://substackcdn.com/image/fetch/$s_!lhhU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1b6865-5b40-4174-9d6c-fb54cd5cf051_1907x1757.png 1272w, https://substackcdn.com/image/fetch/$s_!lhhU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc1b6865-5b40-4174-9d6c-fb54cd5cf051_1907x1757.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Delta Lake Writer Flow, icon: <a href="https://www.flaticon.com/">source</a></figcaption></figure></div><ol><li><p><strong>Plan:</strong> Read latest snapshot (if needed) to identify affected files.</p></li><li><p><strong>Stage:</strong> Write new Parquet files for all changes.</p></li><li><p><strong>Log:</strong> Create JSON "add" (and optional "remove"/"metadata") actions for each change.</p><pre><code><code>{
  "add": {
    "path": "part-00001-1234abcd.snappy.parquet",
    "partitionValues": {
      "year": "2024",
      "month": "06"
    },
    "size": 345678,
    "modificationTime": 1686780000000,
    "dataChange": true,
    "stats": "{\"numRecords\": 50000, \"minValues\": {\"id\": 1}, \"maxValues\": {\"id\": 50000}}"
  }
}</code></code></pre></li><li><p><strong>Validate:</strong> Check whether a concurrent writer has committed conflicting changes since the snapshot was read (optimistic concurrency).</p></li><li><p><strong>Commit:</strong> If there is no conflict, write a new log file (e.g., <code>00000123.json</code>) to <code>_delta_log/</code>.</p></li><li><p><strong>Checkpoint:</strong> Periodically write Parquet checkpoints to speed up reads.</p></li></ol><div class="pullquote"><p>&#128161;Writes use optimistic concurrency: if two writers try to commit overlapping changes, only one succeeds, and the other must retry or fail.</p></div><h4>Read Flow</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!S-48!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee68b652-bb6c-4f52-957c-d0e02761a432_1819x1561.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!S-48!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee68b652-bb6c-4f52-957c-d0e02761a432_1819x1561.png 424w, https://substackcdn.com/image/fetch/$s_!S-48!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee68b652-bb6c-4f52-957c-d0e02761a432_1819x1561.png 848w, https://substackcdn.com/image/fetch/$s_!S-48!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee68b652-bb6c-4f52-957c-d0e02761a432_1819x1561.png 1272w, https://substackcdn.com/image/fetch/$s_!S-48!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee68b652-bb6c-4f52-957c-d0e02761a432_1819x1561.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!S-48!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee68b652-bb6c-4f52-957c-d0e02761a432_1819x1561.png" width="642" height="550.7266483516484" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ee68b652-bb6c-4f52-957c-d0e02761a432_1819x1561.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1249,&quot;width&quot;:1456,&quot;resizeWidth&quot;:642,&quot;bytes&quot;:298297,&quot;alt&quot;:&quot;Delta Lake Read Flow&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/165483998?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee68b652-bb6c-4f52-957c-d0e02761a432_1819x1561.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Delta Lake Read Flow" title="Delta Lake Read Flow" srcset="https://substackcdn.com/image/fetch/$s_!S-48!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee68b652-bb6c-4f52-957c-d0e02761a432_1819x1561.png 424w, https://substackcdn.com/image/fetch/$s_!S-48!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee68b652-bb6c-4f52-957c-d0e02761a432_1819x1561.png 848w, https://substackcdn.com/image/fetch/$s_!S-48!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee68b652-bb6c-4f52-957c-d0e02761a432_1819x1561.png 1272w, 
https://substackcdn.com/image/fetch/$s_!S-48!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee68b652-bb6c-4f52-957c-d0e02761a432_1819x1561.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Delta Lake Read Flow, icon: <a href="https://www.flaticon.com/">source</a></figcaption></figure></div><ol><li><p><strong>Load:</strong> Load the latest checkpoint (if available) and all subsequent JSON logs.</p></li><li><p><strong>Reconstruct:</strong> Reconstruct the current table state (active files, schema, 
metadata).</p></li><li><p><strong>Plan:</strong> Apply query filters and partition pruning based on metadata.</p></li><li><p><strong>Read:</strong> Read the relevant Parquet files from storage.</p></li><li><p><strong>Return:</strong> Return the final filtered and projected result to the user.</p></li></ol><div class="pullquote"><p>&#128161;Snapshot isolation: Readers see a consistent snapshot, even as new writes occur.</p></div><div><hr></div><h3>References</h3><ul><li><p>Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores: <a href="https://www.vldb.org/pvldb/vol13/p3411-armbrust.pdf">https://www.vldb.org/pvldb/vol13/p3411-armbrust.pdf</a></p></li><li><p>Delta Lake Docs: <a href="https://docs.delta.io/latest/">https://docs.delta.io/latest/</a></p></li><li><p>GitHub: <a href="https://github.com/delta-io/delta">https://github.com/delta-io/delta</a></p></li><li><p>Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics: <a href="https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf">https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf</a></p></li></ul><div><hr></div><p>I hope this article was helpful in understanding how 
the Delta transaction log works. Please share your feedback and suggestions.</p><p>Stay tuned for deep dives into Z-Ordering, Liquid Clustering, and advanced concurrency control in future posts.</p><div class="poll-embed" data-attrs="{&quot;id&quot;:332208}" data-component-name="PollToDOM"></div>]]></content:encoded></item><item><title><![CDATA[Spotify Data Tech Stack]]></title><description><![CDATA[Learn how Spotify ingests 1.4T+ events daily on GCP via 38K+ data pipelines, leveraging BigQuery, Dataflow, and Flyte to power ~5K dashboards and scale data-driven insights.]]></description><link>https://www.junaideffendi.com/p/spotify-data-tech-stack</link><guid isPermaLink="false">https://www.junaideffendi.com/p/spotify-data-tech-stack</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 16 Aug 2025 16:30:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!S_F0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586df7f6-5b17-425c-a53c-fb4bbcf47ec0_2367x1368.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Explore how Spotify processes over 1.4 trillion data points daily to power personalized experiences for hundreds of millions of users worldwide. 
This overview distills the essential tools, architectures, and innovations Spotify employs for data ingestion, processing, storage, and analytics.</p><h3>Metrics</h3><ul><li><p>1.4+ trillion events processed daily.</p></li><li><p>670+ million monthly active users.</p></li><li><p>38,000+ data pipelines active in production.</p></li><li><p>Spotify runs the largest Hadoop cluster in Europe.</p></li><li><p>1,800+ different event types representing interactions from Spotify users.</p></li><li><p>~5k dashboards serving ~6k users.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!S_F0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586df7f6-5b17-425c-a53c-fb4bbcf47ec0_2367x1368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!S_F0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586df7f6-5b17-425c-a53c-fb4bbcf47ec0_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!S_F0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586df7f6-5b17-425c-a53c-fb4bbcf47ec0_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!S_F0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586df7f6-5b17-425c-a53c-fb4bbcf47ec0_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!S_F0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586df7f6-5b17-425c-a53c-fb4bbcf47ec0_2367x1368.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!S_F0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586df7f6-5b17-425c-a53c-fb4bbcf47ec0_2367x1368.png" width="1456" height="841" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/586df7f6-5b17-425c-a53c-fb4bbcf47ec0_2367x1368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1406372,&quot;alt&quot;:&quot;Spotify Data Tech Stack&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/165484212?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586df7f6-5b17-425c-a53c-fb4bbcf47ec0_2367x1368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Spotify Data Tech Stack" title="Spotify Data Tech Stack" srcset="https://substackcdn.com/image/fetch/$s_!S_F0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586df7f6-5b17-425c-a53c-fb4bbcf47ec0_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!S_F0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586df7f6-5b17-425c-a53c-fb4bbcf47ec0_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!S_F0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586df7f6-5b17-425c-a53c-fb4bbcf47ec0_2367x1368.png 1272w, 
https://substackcdn.com/image/fetch/$s_!S_F0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F586df7f6-5b17-425c-a53c-fb4bbcf47ec0_2367x1368.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Spotify Data Tech Stack (Icon source: flaticon.com)</figcaption></figure></div><blockquote><p>This content is based on multiple public sources, including the Spotify engineering blog, open-source project websites, job descriptions, and other public articles. 
You will find references for deeper reading as you go.</p></blockquote><h3><strong>Platform</strong></h3><h4><strong>Google Cloud Platform (GCP)</strong></h4><p>GCP is Spotify&#8217;s core cloud provider, supporting both compute and advanced analytics. Spotify migrated from AWS in the mid-2010s to leverage GCP&#8217;s scalable infrastructure, big data, and machine learning services.</p><blockquote><p>&#128214; Recommended Reading: <a href="https://cloud.google.com/customers/spotify">Spotify Case Study</a></p></blockquote><h3><strong>Messaging System</strong></h3><h4><strong>Pub/Sub</strong></h4><p>In 2016, Spotify moved its event delivery system from Kafka to Google Cloud Pub/Sub to ingest its massive volume of event-driven data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!foTz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60712828-a7ae-480b-bbdf-ae1ef2a80c9f_1075x599.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!foTz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60712828-a7ae-480b-bbdf-ae1ef2a80c9f_1075x599.png 424w, https://substackcdn.com/image/fetch/$s_!foTz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60712828-a7ae-480b-bbdf-ae1ef2a80c9f_1075x599.png 848w, https://substackcdn.com/image/fetch/$s_!foTz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60712828-a7ae-480b-bbdf-ae1ef2a80c9f_1075x599.png 1272w, 
https://substackcdn.com/image/fetch/$s_!foTz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60712828-a7ae-480b-bbdf-ae1ef2a80c9f_1075x599.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!foTz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60712828-a7ae-480b-bbdf-ae1ef2a80c9f_1075x599.png" width="1075" height="599" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60712828-a7ae-480b-bbdf-ae1ef2a80c9f_1075x599.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:599,&quot;width&quot;:1075,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:256031,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/165484212?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60712828-a7ae-480b-bbdf-ae1ef2a80c9f_1075x599.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!foTz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60712828-a7ae-480b-bbdf-ae1ef2a80c9f_1075x599.png 424w, https://substackcdn.com/image/fetch/$s_!foTz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60712828-a7ae-480b-bbdf-ae1ef2a80c9f_1075x599.png 848w, 
https://substackcdn.com/image/fetch/$s_!foTz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60712828-a7ae-480b-bbdf-ae1ef2a80c9f_1075x599.png 1272w, https://substackcdn.com/image/fetch/$s_!foTz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60712828-a7ae-480b-bbdf-ae1ef2a80c9f_1075x599.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Image from 2016 article, <a 
href="https://cloud.google.com/blog/products/gcp/spotifys-journey-to-cloud-why-spotify-migrated-its-event-delivery-system-from-kafka-to-google-cloud-pubsub?utm_source=chatgpt.com">source</a></figcaption></figure></div><p>According to a recent <a href="https://engineering.atspotify.com/2024/5/data-platform-explained-part-ii">article</a>, Spotify&#8217;s data platform supports automatic deployment of Pub/Sub, anonymization pipelines, and streaming jobs.</p><h3><strong>Processing</strong></h3><h4><strong>Apache Beam</strong></h4><p>Apache Beam, run mainly on Google Cloud Dataflow, is the primary processing tool at Spotify for both real-time and batch workloads. Spotify maintains its own open-source Scala API for Beam, called <a href="https://engineering.atspotify.com/2017/10/big-data-processing-at-spotify-the-road-to-scio-part-1">Scio</a>.</p><div class="pullquote"><p>Scio is a high level Scala API for the Beam Java SDK created by Spotify to run both batch and streaming pipelines at scale. We run Scio mainly on the Google Cloud Dataflow runner, a fully managed service, and process data stored in various systems including most Google Cloud products, HDFS, Cassandra, Elasticsearch, PostgreSQL and more.</p><p>&#8212; <a href="https://engineering.atspotify.com/2017/10/big-data-processing-at-spotify-the-road-to-scio-part-1">source</a></p></div><h4>Apache Flink</h4><p>While most pipelines use Scio (Beam), the data platform also supports Apache Flink. There is little public information on exactly how Spotify uses Flink.</p><h3><strong>Orchestrator</strong></h3><h4>Flyte</h4><p>Starting in 2019, Spotify migrated from Luigi and Flo to Flyte to address challenges such as fragmented orchestration logic, limited visibility, and lack of extensibility. Flyte offered a centralized service with a thin SDK, better workflow visibility, caching, and multi-language support. 
</p><p>Today, Spotify uses Flyte to manage and introspect data workflows at scale (38k+ jobs), while execution remains on Kubernetes via their existing Styx scheduler.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IWld!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36881dde-910d-4e60-99c6-a65c8c9d3777_2048x479.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IWld!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36881dde-910d-4e60-99c6-a65c8c9d3777_2048x479.png 424w, https://substackcdn.com/image/fetch/$s_!IWld!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36881dde-910d-4e60-99c6-a65c8c9d3777_2048x479.png 848w, https://substackcdn.com/image/fetch/$s_!IWld!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36881dde-910d-4e60-99c6-a65c8c9d3777_2048x479.png 1272w, https://substackcdn.com/image/fetch/$s_!IWld!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36881dde-910d-4e60-99c6-a65c8c9d3777_2048x479.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IWld!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36881dde-910d-4e60-99c6-a65c8c9d3777_2048x479.png" width="1456" height="341" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/36881dde-910d-4e60-99c6-a65c8c9d3777_2048x479.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:341,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;attachment_e268f0c7959d85909621e5d13a499e22&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="attachment_e268f0c7959d85909621e5d13a499e22" title="attachment_e268f0c7959d85909621e5d13a499e22" srcset="https://substackcdn.com/image/fetch/$s_!IWld!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36881dde-910d-4e60-99c6-a65c8c9d3777_2048x479.png 424w, https://substackcdn.com/image/fetch/$s_!IWld!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36881dde-910d-4e60-99c6-a65c8c9d3777_2048x479.png 848w, https://substackcdn.com/image/fetch/$s_!IWld!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36881dde-910d-4e60-99c6-a65c8c9d3777_2048x479.png 1272w, https://substackcdn.com/image/fetch/$s_!IWld!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36881dde-910d-4e60-99c6-a65c8c9d3777_2048x479.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><blockquote><p>&#128214; Recommended Reading: <a href="https://engineering.atspotify.com/2022/3/why-we-switched-our-data-orchestration-service">Why We Switched Our Data Orchestration 
Service</a></p></blockquote><h3><strong>Warehouse</strong></h3><h4>BigQuery</h4><p>With the migration to GCP, Spotify also adopted BigQuery as its centralized warehouse. SQL-based workflows run through <a href="https://www.getdbt.com/resources/coalesce-on-demand/how-the-content-analytics-team-at-spotify-avoids-data-indigestion-in-bigquery-with-dbt">dbt</a>, and BigQuery stores the analytical data served by dashboard tools such as Looker.</p><h3>Storage</h3><h4>HDFS / GCS</h4><p>Spotify maintains the largest Hadoop cluster in Europe. With the migration to the cloud still ongoing, data is stored and served from both on-premise HDFS and Google Cloud Storage (GCS).</p><blockquote><p>There&#8217;s no public information confirming whether Spotify uses a lakehouse architecture.</p></blockquote><h3><strong>Management</strong></h3><p>Spotify has in-house tooling for data management as part of its data platform, covering the following common areas:</p><ul><li><p>Metadata</p></li><li><p>Lineage</p></li><li><p>Retention</p></li><li><p>Access Control</p></li></ul><blockquote><p>&#128214; Read more: <a href="https://engineering.atspotify.com/2024/5/data-platform-explained-part-ii">Data Management &amp; Data Processing</a></p></blockquote><h3><strong>Dashboard</strong></h3><h4><strong>Looker / Tableau</strong></h4><p><a href="https://stage.engineering.atspotify.com/2024/8/unlocking-insights-with-high-quality-dashboards-at-scale">Spotify provides both Looker and Tableau</a> as its dashboarding platforms. As of 2023, Spotify had 4,900+ dashboards serving 6,000+ users across the company. 
</p><ul><li><p>Tableau is used for complex, highly customized dashboards, offering full design flexibility for deep-dive internal products with specific user needs.</p></li><li><p>Looker Studio is preferred for fast, lightweight dashboards, especially among engineering and product teams, thanks to its tight integration with BigQuery and the ease of SQL&#8209;to&#8209;visualization workflows.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!InXQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1669a3b6-d370-43dc-873f-9b2a0a857a0c_1200x1051.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!InXQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1669a3b6-d370-43dc-873f-9b2a0a857a0c_1200x1051.png 424w, https://substackcdn.com/image/fetch/$s_!InXQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1669a3b6-d370-43dc-873f-9b2a0a857a0c_1200x1051.png 848w, https://substackcdn.com/image/fetch/$s_!InXQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1669a3b6-d370-43dc-873f-9b2a0a857a0c_1200x1051.png 1272w, https://substackcdn.com/image/fetch/$s_!InXQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1669a3b6-d370-43dc-873f-9b2a0a857a0c_1200x1051.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!InXQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1669a3b6-d370-43dc-873f-9b2a0a857a0c_1200x1051.png" 
width="632" height="553.5266666666666" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1669a3b6-d370-43dc-873f-9b2a0a857a0c_1200x1051.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1051,&quot;width&quot;:1200,&quot;resizeWidth&quot;:632,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;image2&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="image2" title="image2" srcset="https://substackcdn.com/image/fetch/$s_!InXQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1669a3b6-d370-43dc-873f-9b2a0a857a0c_1200x1051.png 424w, https://substackcdn.com/image/fetch/$s_!InXQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1669a3b6-d370-43dc-873f-9b2a0a857a0c_1200x1051.png 848w, https://substackcdn.com/image/fetch/$s_!InXQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1669a3b6-d370-43dc-873f-9b2a0a857a0c_1200x1051.png 1272w, https://substackcdn.com/image/fetch/$s_!InXQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1669a3b6-d370-43dc-873f-9b2a0a857a0c_1200x1051.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Image from Spotify blog: <a href="https://engineering.atspotify.com/2024/08/unlocking-insights-with-high-quality-dashboards-at-scale">source</a></figcaption></figure></div><div><hr></div><p>&#128172; Spotify&#8217;s cloud journey has been unique: it started on-premise, then used AWS, and then migrated to GCP in 2016, while still operating some on-prem systems such as Hadoop. Today, Spotify relies heavily on GCP-native tools alongside in-house platforms that empower internal teams. 
I may have missed some details; feel free to share in the comments!</p>]]></content:encoded></item><item><title><![CDATA[Inside Data Engineering with Julien Hurault]]></title><description><![CDATA[Consultant Julien Hurault takes you inside the world of data engineering, sharing practical insights, real-world challenges, and his perspective on where the field is headed.]]></description><link>https://www.junaideffendi.com/p/inside-data-engineering-with-julien</link><guid isPermaLink="false">https://www.junaideffendi.com/p/inside-data-engineering-with-julien</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 26 Jul 2025 16:30:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1AFj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7085e80-e325-43ee-bbb3-27da72fbba70_2547x1532.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today, we're joined by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Julien Hurault&quot;,&quot;id&quot;:35734446,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bcd13909-dd93-49c5-97e0-9890b91d2d81_1380x1380.png&quot;,&quot;uuid&quot;:&quot;0a50080b-1201-4e50-9515-f0d4e382abdc&quot;}" data-component-name="MentionToDOM"></span> from <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Ju Data Engineering Newsletter&quot;,&quot;id&quot;:1211981,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/juhache&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/62cc8e88-2a37-4b92-8ff9-dae996ed985d_1280x1280.png&quot;,&quot;uuid&quot;:&quot;db29e7d5-bfb8-4c0f-90ec-e125996c938d&quot;}" data-component-name="MentionToDOM"></span>, who&#8217;s been working in data engineering for the last 10 years, mainly as a consultant. He will dive deep into his experience 
from the consultancy perspective.</p><p>To recap: the series follows a Q&amp;A format, featuring professionals who share their journeys, insights, and challenges.</p><h3><strong>What to Expect:</strong></h3><ul><li><p><strong>Inside the Role</strong> &#8211; Get a real-world look at what data engineers do day in and day out.</p></li><li><p><strong>Getting Started</strong> &#8211; Discover the essential skills, tools, and career routes to break into the field.</p></li><li><p><strong>Tech Trends</strong> &#8211; Stay in the loop with evolving technologies and shifts shaping data engineering.</p></li><li><p><strong>Debunking Myths</strong> &#8211; Clear up common misconceptions about the data engineering profession.</p></li><li><p><strong>Voices from the Field</strong> &#8211; Hear firsthand insights and experiences from seasoned data engineers.</p></li></ul><div class="pullquote"><p>&#11088; If you're curious about data engineering or considering it as a career, this series is for you!</p></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1AFj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7085e80-e325-43ee-bbb3-27da72fbba70_2547x1532.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1AFj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7085e80-e325-43ee-bbb3-27da72fbba70_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!1AFj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7085e80-e325-43ee-bbb3-27da72fbba70_2547x1532.png 848w, 
https://substackcdn.com/image/fetch/$s_!1AFj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7085e80-e325-43ee-bbb3-27da72fbba70_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!1AFj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7085e80-e325-43ee-bbb3-27da72fbba70_2547x1532.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1AFj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7085e80-e325-43ee-bbb3-27da72fbba70_2547x1532.png" width="1456" height="876" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7085e80-e325-43ee-bbb3-27da72fbba70_2547x1532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:876,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3199281,&quot;alt&quot;:&quot;Inside Data Engineering with Julien Hurault&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/164371224?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7085e80-e325-43ee-bbb3-27da72fbba70_2547x1532.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Inside Data Engineering with Julien Hurault" title="Inside Data Engineering with Julien Hurault" srcset="https://substackcdn.com/image/fetch/$s_!1AFj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7085e80-e325-43ee-bbb3-27da72fbba70_2547x1532.png 424w, 
https://substackcdn.com/image/fetch/$s_!1AFj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7085e80-e325-43ee-bbb3-27da72fbba70_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!1AFj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7085e80-e325-43ee-bbb3-27da72fbba70_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!1AFj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7085e80-e325-43ee-bbb3-27da72fbba70_2547x1532.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Inside Data Engineering with Julien Hurault</figcaption></figure></div><p>Let&#8217;s dive into Inside Data Engineering:</p><h4>How would you describe Data Engineering?</h4><p>Data engineering is the art of moving data from point A to point B as efficiently as possible. &#8220;Efficient&#8221; means choosing the tools, designs, and processes that best fit the context: your company&#8217;s scale, the nature of the data, and the latency required.</p><h4>How did you end up being a Data Engineer?</h4><p>I started out in 2015 as a data scientist. Like many in that role, I spent about 90% of my time cleaning, reshaping, and piping data for my models. At some point I realised I was already doing data engineering work, just without the title. From there, I leaned fully into that specialty and transitioned step-by-step into a dedicated data engineering position, which is where I am today.</p><h4>What does your day-to-day look like?</h4><p>I work as a freelance data engineer. 
I spend most mornings on client projects, and in the afternoons, I focus on sales, marketing, or content creation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Azsm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb53530f-f153-4846-8d50-9d8ddf8c2ddc_2367x1368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Azsm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb53530f-f153-4846-8d50-9d8ddf8c2ddc_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!Azsm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb53530f-f153-4846-8d50-9d8ddf8c2ddc_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!Azsm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb53530f-f153-4846-8d50-9d8ddf8c2ddc_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!Azsm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb53530f-f153-4846-8d50-9d8ddf8c2ddc_2367x1368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Azsm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb53530f-f153-4846-8d50-9d8ddf8c2ddc_2367x1368.png" width="608" height="351.1868131868132" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb53530f-f153-4846-8d50-9d8ddf8c2ddc_2367x1368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:608,&quot;bytes&quot;:144538,&quot;alt&quot;:&quot;Julien day to day calendar&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/164371224?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb53530f-f153-4846-8d50-9d8ddf8c2ddc_2367x1368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Julien day to day calendar" title="Julien day to day calendar" srcset="https://substackcdn.com/image/fetch/$s_!Azsm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb53530f-f153-4846-8d50-9d8ddf8c2ddc_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!Azsm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb53530f-f153-4846-8d50-9d8ddf8c2ddc_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!Azsm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb53530f-f153-4846-8d50-9d8ddf8c2ddc_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!Azsm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb53530f-f153-4846-8d50-9d8ddf8c2ddc_2367x1368.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Julien day to day calendar</figcaption></figure></div><h4>What are some stakeholders that you work with?</h4><p>When I engage with a client, my first contacts are usually the Head of Engineering, Head of Data, or Data Team Lead. After the initial discussions, I work mainly with the data engineering team.</p><h4>What kind of projects do you work on?</h4><p>I support startups by defining their data strategy, selecting the right tools, and designing data architecture. 
I also work with larger companies in a more hands-on role, helping them build data platforms.</p><h4>What kind of data do you work with?</h4><p>Most of my projects involve tabular data:</p><ul><li><p>Time-series</p></li><li><p>Events</p></li><li><p>User data/profiles</p></li></ul><h4>What data size do you work with?</h4><p>It largely depends on the company and its data volume, which can range anywhere from tens of <code>GB to 100+ TB</code>.</p><h4>What tech stack do you use?</h4><p>The tech stack depends on the client and project; typically:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!D65F!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8807ef67-2668-4e4f-ba85-0bfbdd5a4e93_2367x1473.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!D65F!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8807ef67-2668-4e4f-ba85-0bfbdd5a4e93_2367x1473.png 424w, https://substackcdn.com/image/fetch/$s_!D65F!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8807ef67-2668-4e4f-ba85-0bfbdd5a4e93_2367x1473.png 848w, https://substackcdn.com/image/fetch/$s_!D65F!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8807ef67-2668-4e4f-ba85-0bfbdd5a4e93_2367x1473.png 1272w, https://substackcdn.com/image/fetch/$s_!D65F!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8807ef67-2668-4e4f-ba85-0bfbdd5a4e93_2367x1473.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!D65F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8807ef67-2668-4e4f-ba85-0bfbdd5a4e93_2367x1473.png" width="678" height="421.8873626373626" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8807ef67-2668-4e4f-ba85-0bfbdd5a4e93_2367x1473.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:906,&quot;width&quot;:1456,&quot;resizeWidth&quot;:678,&quot;bytes&quot;:2484196,&quot;alt&quot;:&quot;Julien tech stack&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/164371224?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8807ef67-2668-4e4f-ba85-0bfbdd5a4e93_2367x1473.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Julien tech stack" title="Julien tech stack" srcset="https://substackcdn.com/image/fetch/$s_!D65F!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8807ef67-2668-4e4f-ba85-0bfbdd5a4e93_2367x1473.png 424w, https://substackcdn.com/image/fetch/$s_!D65F!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8807ef67-2668-4e4f-ba85-0bfbdd5a4e93_2367x1473.png 848w, https://substackcdn.com/image/fetch/$s_!D65F!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8807ef67-2668-4e4f-ba85-0bfbdd5a4e93_2367x1473.png 1272w, 
https://substackcdn.com/image/fetch/$s_!D65F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8807ef67-2668-4e4f-ba85-0bfbdd5a4e93_2367x1473.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Julien tech stack</figcaption></figure></div><h4>What is your favorite area of Data Engineering?</h4><p>Data Platform Design</p><h4><strong>How can Data Engineering benefit from GenAI?</strong></h4><p>V0 of a data pipeline: With a solid foundation in place, GenAI can generate the v0 of a pipeline. 
However, GenAI struggles with anything beyond <code>~20</code> lines of code unless it&#8217;s tightly constrained.</p><p>That&#8217;s exactly why I launched <a href="https://www.boringdata.io/">boringdata.io</a>. The templates provide <code>80%</code> of the pipeline code out of the box; GenAI fine-tunes another <code>10%</code>, and the engineer refines the final <code>10%</code>.</p><div id="youtube2-9YpMo1fS5SU" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;9YpMo1fS5SU&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/9YpMo1fS5SU?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h4>What advice would you give your past self as a beginner Data Engineer?</h4><p>If you hate AWS IAM, that's normal &#8212; you're not alone. &#128516;</p><h4>What are some challenging aspects of Data Engineering?</h4><p>Design your pipeline with maintenance in mind. 
Building a pipeline's v0 is easy; making it robust is not.</p><h4>What is the next big thing according to you in Data Engineering?</h4><p>Open table formats are transforming data platform design: We can keep all data in a single storage layer while choosing the best compute engine for each task.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xy5s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1c6898-8628-4a0d-9ff6-0ce4fd462809_699x472.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xy5s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1c6898-8628-4a0d-9ff6-0ce4fd462809_699x472.png 424w, https://substackcdn.com/image/fetch/$s_!xy5s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1c6898-8628-4a0d-9ff6-0ce4fd462809_699x472.png 848w, https://substackcdn.com/image/fetch/$s_!xy5s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1c6898-8628-4a0d-9ff6-0ce4fd462809_699x472.png 1272w, https://substackcdn.com/image/fetch/$s_!xy5s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1c6898-8628-4a0d-9ff6-0ce4fd462809_699x472.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xy5s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1c6898-8628-4a0d-9ff6-0ce4fd462809_699x472.png" width="515" height="347.75393419170246" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b1c6898-8628-4a0d-9ff6-0ce4fd462809_699x472.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:472,&quot;width&quot;:699,&quot;resizeWidth&quot;:515,&quot;bytes&quot;:4450632,&quot;alt&quot;:&quot;Data Lakehouse: Presenting the popular open table formats.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/164371224?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1c6898-8628-4a0d-9ff6-0ce4fd462809_699x472.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Data Lakehouse: Presenting the popular open table formats." title="Data Lakehouse: Presenting the popular open table formats." 
srcset="https://substackcdn.com/image/fetch/$s_!xy5s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1c6898-8628-4a0d-9ff6-0ce4fd462809_699x472.png 424w, https://substackcdn.com/image/fetch/$s_!xy5s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1c6898-8628-4a0d-9ff6-0ce4fd462809_699x472.png 848w, https://substackcdn.com/image/fetch/$s_!xy5s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1c6898-8628-4a0d-9ff6-0ce4fd462809_699x472.png 1272w, https://substackcdn.com/image/fetch/$s_!xy5s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b1c6898-8628-4a0d-9ff6-0ce4fd462809_699x472.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Data Lakehouse: Presenting the popular open table formats.</figcaption></figure></div><h4>What are some common misconceptions about data engineering?</h4><p>Moving data from A to B may sound boring, but there are countless ways to design the same pipeline, which makes it fascinating. Technology evolves so quickly that you must keep learning new tools and approaches.</p><div><hr></div><h3>Reach out if you would like:</h3><ul><li><p>To be a guest and share your experiences &amp; journey.</p></li><li><p>To provide feedback and suggestions on how we can improve the quality of the questions.</p></li><li><p>To suggest guests for future articles.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Benchmarking Spark - Open Source vs EMRs]]></title><description><![CDATA[Diving into four approaches from Spark Operator to EMR (EKS, EC2, and 
Serverless), sharing benchmarking results and key insights to help you choose the best option.]]></description><link>https://www.junaideffendi.com/p/benchmarking-spark-open-source-vs</link><guid isPermaLink="false">https://www.junaideffendi.com/p/benchmarking-spark-open-source-vs</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 05 Jul 2025 16:30:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!p1M1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F834d6834-1814-4592-ada4-a71097a513d9_2547x1532.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Recently, I've been exploring different Spark options and benchmarking batch jobs to evaluate their setup complexity, cost-effectiveness, and performance.</p><p>I wanted to share my findings to help you decide which option to choose if you're in a similar situation.</p><p>The article covers:</p><ul><li><p>Benchmarking a single batch job across Spark Operator, EMR on EC2, EMR on EKS, and EMR Serverless.</p></li><li><p>Key considerations for selecting the right option and when to use each.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p1M1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F834d6834-1814-4592-ada4-a71097a513d9_2547x1532.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p1M1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F834d6834-1814-4592-ada4-a71097a513d9_2547x1532.png 424w, 
https://substackcdn.com/image/fetch/$s_!p1M1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F834d6834-1814-4592-ada4-a71097a513d9_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!p1M1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F834d6834-1814-4592-ada4-a71097a513d9_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!p1M1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F834d6834-1814-4592-ada4-a71097a513d9_2547x1532.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p1M1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F834d6834-1814-4592-ada4-a71097a513d9_2547x1532.png" width="1456" height="876" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/834d6834-1814-4592-ada4-a71097a513d9_2547x1532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:876,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:554623,&quot;alt&quot;:&quot;Benchmarking Spark - Open Source vs EMRs&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/156748661?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F834d6834-1814-4592-ada4-a71097a513d9_2547x1532.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Benchmarking Spark - Open Source vs EMRs" title="Benchmarking Spark - Open Source vs EMRs" 
srcset="https://substackcdn.com/image/fetch/$s_!p1M1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F834d6834-1814-4592-ada4-a71097a513d9_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!p1M1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F834d6834-1814-4592-ada4-a71097a513d9_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!p1M1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F834d6834-1814-4592-ada4-a71097a513d9_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!p1M1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F834d6834-1814-4592-ada4-a71097a513d9_2547x1532.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Benchmarking Spark - Open Source vs EMRs</figcaption></figure></div><h3>Job</h3><p>To benchmark these options, I used a heavy-shuffling Spark batch job written in Scala. Since it was run on a proprietary dataset, I can't share specific details.</p><h3>Inputs</h3><h4>Dataset A</h4><pre><code>Size: ~270 GB 
File Count: 500 
Row Count: ~350 Million</code></pre><h4>Dataset B</h4><pre><code><code>Size: ~1 TB 
File Count: 500
Row Count: ~3 Billion</code></code></pre><h3>Resource</h3><p>I used the same resources for each approach:</p><ul><li><p>The job runs in the us-east-1 region.</p></li><li><p>On-Demand instances are used, because with Spot the cost and performance vary depending on how often the job loses its Spot capacity.</p></li><li><p>The driver cost is ignored to simplify the calculation.</p></li><li><p>Autoscaling is neither enabled nor considered in the cost calculation, which makes the costs easier to calculate and compare.</p></li><li><p>150 executors, each with:</p><ul><li><p>50 GB memory</p></li><li><p>8 cores</p></li></ul></li></ul><h3>Result</h3><p>Some important points to keep in mind while looking at the results:</p><ul><li><p>Spark Operator uses version 3.2 with no custom optimization. It is our existing environment, so the comparison is not apples to apples, but it gives a good enough idea.</p></li><li><p>AWS says EMR is <a href="https://aws.amazon.com/blogs/big-data/run-apache-spark-3-5-1-workloads-4-5-times-faster-with-amazon-emr-runtime-for-apache-spark/?utm_source=chatgpt.com">4.5x faster than open-source Spark</a>.</p></li><li><p>The cost shown is not the actual bill; the calculation is shared in the next section.</p></li><li><p>On paper, EMR Serverless might look like the most expensive option, but in reality, due to better resource consumption, it is actually the fastest.</p></li><li><p>Benchmarking gives an idea of cost and performance, but there are more factors to consider, which are shared later.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ddna!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e6fcc8-c068-4e44-bfaa-d5c34044ae3f_1596x442.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!Ddna!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e6fcc8-c068-4e44-bfaa-d5c34044ae3f_1596x442.png 424w, https://substackcdn.com/image/fetch/$s_!Ddna!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e6fcc8-c068-4e44-bfaa-d5c34044ae3f_1596x442.png 848w, https://substackcdn.com/image/fetch/$s_!Ddna!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e6fcc8-c068-4e44-bfaa-d5c34044ae3f_1596x442.png 1272w, https://substackcdn.com/image/fetch/$s_!Ddna!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e6fcc8-c068-4e44-bfaa-d5c34044ae3f_1596x442.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ddna!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e6fcc8-c068-4e44-bfaa-d5c34044ae3f_1596x442.png" width="1456" height="403" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c9e6fcc8-c068-4e44-bfaa-d5c34044ae3f_1596x442.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:403,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:83479,&quot;alt&quot;:&quot;Benchmarking results: run time vs 
cost&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/156748661?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e6fcc8-c068-4e44-bfaa-d5c34044ae3f_1596x442.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Benchmarking results: run time vs cost" title="Benchmarking results: run time vs cost" srcset="https://substackcdn.com/image/fetch/$s_!Ddna!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e6fcc8-c068-4e44-bfaa-d5c34044ae3f_1596x442.png 424w, https://substackcdn.com/image/fetch/$s_!Ddna!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e6fcc8-c068-4e44-bfaa-d5c34044ae3f_1596x442.png 848w, https://substackcdn.com/image/fetch/$s_!Ddna!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e6fcc8-c068-4e44-bfaa-d5c34044ae3f_1596x442.png 1272w, https://substackcdn.com/image/fetch/$s_!Ddna!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9e6fcc8-c068-4e44-bfaa-d5c34044ae3f_1596x442.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Benchmarking results: run time vs cost</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6GKT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df43878-2bbe-4720-bd55-9aad7cf1a088_1696x1103.bin" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6GKT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df43878-2bbe-4720-bd55-9aad7cf1a088_1696x1103.bin 424w, https://substackcdn.com/image/fetch/$s_!6GKT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df43878-2bbe-4720-bd55-9aad7cf1a088_1696x1103.bin 848w, 
https://substackcdn.com/image/fetch/$s_!6GKT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df43878-2bbe-4720-bd55-9aad7cf1a088_1696x1103.bin 1272w, https://substackcdn.com/image/fetch/$s_!6GKT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df43878-2bbe-4720-bd55-9aad7cf1a088_1696x1103.bin 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6GKT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df43878-2bbe-4720-bd55-9aad7cf1a088_1696x1103.bin" width="1456" height="947" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3df43878-2bbe-4720-bd55-9aad7cf1a088_1696x1103.bin&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:947,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Run time and cost per minute for different spark approaches.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Run time and cost per minute for different spark approaches." title="Run time and cost per minute for different spark approaches." 
srcset="https://substackcdn.com/image/fetch/$s_!6GKT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df43878-2bbe-4720-bd55-9aad7cf1a088_1696x1103.bin 424w, https://substackcdn.com/image/fetch/$s_!6GKT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df43878-2bbe-4720-bd55-9aad7cf1a088_1696x1103.bin 848w, https://substackcdn.com/image/fetch/$s_!6GKT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df43878-2bbe-4720-bd55-9aad7cf1a088_1696x1103.bin 1272w, https://substackcdn.com/image/fetch/$s_!6GKT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3df43878-2bbe-4720-bd55-9aad7cf1a088_1696x1103.bin 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Run time and cost per minute for different Spark approaches.</figcaption></figure></div><blockquote><p>Based solely on the benchmarking results for this specific job, EMR Serverless is the best choice for both cost and performance, but that may not hold in all cases.</p></blockquote><h2>Cost Calculation</h2><p>Costs are calculated with the <a href="https://calculator.aws/#/createCalculator/EMR">AWS calculator</a>, based on r5.2xlarge instances (8 cores, 64GB memory). The actual bill may differ.</p><h4>Spark Operator</h4><pre><code><code>r5.2xlarge Instance fee per hour = $0.504
EKS cost per hour = $0.10

Cost Per Hour:
(Instance fee) * (Number of pods * hours) + (EKS cost * hours)
$0.504 * (150 * 1) + ($0.10 * 1) = $75.70

Cost Per Minute:
cost_per_hour * (run time in mins / 60)
$75.70 * (22 / 60) = $27.76</code></code></pre><h4>EMR on EKS</h4><pre><code><code>r5.2xlarge Instance fee per hour = $0.504
EMR fee per hour = $0.15208 (based on 8 cores and 64GB memory)
EKS cost per hour = $0.10

Cost Per Hour:
(Instance fee + EMR fee) * (Number of instances * hours) + (EKS cost * hours)
($0.504 + $0.15208) * (150 * 1) + ($0.10 * 1) = $98.51

Cost Per Minute:
cost_per_hour * (run time in mins / 60)
$98.51 * (18 / 60) = $29.55</code></code></pre><h4>EMR on EC2</h4><pre><code><code>r5.2xlarge Instance fee per hour = $0.504
EMR fee per hour = $0.126

Cost Per Hour:
(Instance fee + EMR fee) * (Number of instances * hours)
($0.504 + $0.126) * (150 * 1) = $94.50

Cost Per Minute:
cost_per_hour * (run time in mins / 60)
$94.50 * (17 / 60) = $26.78</code></code></pre><h4>EMR Serverless</h4><pre><code>EMR fee per hour = $0.7908 (based on 8 cores and 64GB memory)

Cost Per Hour:
(EMR fee) * (Number of resources * hours)
$0.7908 * (150 * 1) = $118.62

Cost Per Minute:
cost_per_hour * (run time in mins / 60)
$118.62 * (11 / 60) = $21.75</code></pre><h3>Which One to Pick?</h3><p>The biggest challenge as a team is picking the right compute infra for your use case. Benchmarking can help, but there are a lot of other factors that need to be considered, e.g. complexity, operational overhead, workflow requirements, team dynamics, etc.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1GuC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcf76d02-69c3-4399-b8ee-49ed261a7515_2547x1532.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1GuC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcf76d02-69c3-4399-b8ee-49ed261a7515_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!1GuC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcf76d02-69c3-4399-b8ee-49ed261a7515_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!1GuC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcf76d02-69c3-4399-b8ee-49ed261a7515_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!1GuC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcf76d02-69c3-4399-b8ee-49ed261a7515_2547x1532.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1GuC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcf76d02-69c3-4399-b8ee-49ed261a7515_2547x1532.png" width="1456" height="876" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bcf76d02-69c3-4399-b8ee-49ed261a7515_2547x1532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:876,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:196418,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/156748661?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcf76d02-69c3-4399-b8ee-49ed261a7515_2547x1532.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1GuC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcf76d02-69c3-4399-b8ee-49ed261a7515_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!1GuC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcf76d02-69c3-4399-b8ee-49ed261a7515_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!1GuC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcf76d02-69c3-4399-b8ee-49ed261a7515_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!1GuC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcf76d02-69c3-4399-b8ee-49ed261a7515_2547x1532.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Complexity &amp; Operational Overhead vs Approaches</figcaption></figure></div><p>The following key points will help you decide which one to pick and save time by narrowing down the benchmarking options.</p><h4>Spark Operator</h4><ul><li><p>Complex setup requiring heavy upfront investment, with high operational overhead.</p><ul><li><p>Great for dedicated teams that have expertise in running open source tools and Kubernetes.</p></li></ul></li><li><p>Fully customizable, but requires additional setup, from the Spark UI to performance optimization, e.g. 
setting up the S3A magic committer for write performance.</p></li><li><p>At large scale, this will be the cheapest option in the long run if implemented correctly.</p></li><li><p>Supports both Spot and On-Demand EC2 instances.</p></li></ul><h4>EMR on EKS</h4><ul><li><p>A bit easier than Spark Operator, but still a complex setup requiring a similar upfront investment, though only on the Kubernetes side.</p></li><li><p>Spark is fully managed and optimized through EMR, which provides additional features:</p><ul><li><p>EMR virtual clusters console: provides a restricted, view-only mode for Spark jobs with monitoring, logging, and the Spark UI.</p></li></ul></li><li><p>Depending on the job type, EMR can be expensive due to the additional EMR service fee. The EMR fee is charged per resource (memory &amp; cores).</p></li><li><p>Supports both Spot and On-Demand EC2 instances.</p></li></ul><h4>EMR on EC2</h4><ul><li><p>Easier than EMR on EKS, as no Kubernetes is required; it runs directly on EC2. However, it requires expertise in EC2 instances, types, pricing, etc.</p></li><li><p>Spark is fully managed and optimized through EMR, which provides additional features:</p><ul><li><p>EMR console: provides broader control of resources, applications, monitoring and logging, and the Spark UI.</p></li></ul></li><li><p>Depending on the job type, EMR can be expensive due to the additional EMR service fee. The EMR fee is charged per instance type.</p></li><li><p>Supports both Spot and On-Demand EC2 instances.</p></li></ul><h4>EMR Serverless</h4><ul><li><p>The easiest on the list, with no complex infra setup required: no EC2 or EKS.</p><ul><li><p>Go with serverless if you are a small team processing a small amount of data.</p></li></ul></li><li><p>Spark is fully managed and optimized through EMR, which provides additional features:</p><ul><li><p>EMR Studio console: provides control of applications, monitoring and logging, and the Spark UI.</p></li></ul></li><li><p>It is a pay-per-use model. Depending on the job type, this could be the most expensive option. 
The EMR fee is charged per resource (memory &amp; cores). </p><ul><li><p>Serverless seems to be better suited for small jobs with strict SLAs, as we saw earlier in the benchmarking results.</p></li></ul></li><li><p>Only supports the On-Demand option at the moment.</p></li></ul><p>To simplify, the following image can be used at a high level to narrow down the options and get the most cost-effective results.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BYfU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7dd81-eb26-4931-9758-ccbbbf460021_2547x1532.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BYfU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7dd81-eb26-4931-9758-ccbbbf460021_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!BYfU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7dd81-eb26-4931-9758-ccbbbf460021_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!BYfU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7dd81-eb26-4931-9758-ccbbbf460021_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!BYfU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7dd81-eb26-4931-9758-ccbbbf460021_2547x1532.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!BYfU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7dd81-eb26-4931-9758-ccbbbf460021_2547x1532.png" width="1456" height="876" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3e7dd81-eb26-4931-9758-ccbbbf460021_2547x1532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:876,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:254153,&quot;alt&quot;:&quot;High level decision flow&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/156748661?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7dd81-eb26-4931-9758-ccbbbf460021_2547x1532.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="High level decision flow" title="High level decision flow" srcset="https://substackcdn.com/image/fetch/$s_!BYfU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7dd81-eb26-4931-9758-ccbbbf460021_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!BYfU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7dd81-eb26-4931-9758-ccbbbf460021_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!BYfU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7dd81-eb26-4931-9758-ccbbbf460021_2547x1532.png 1272w, 
https://substackcdn.com/image/fetch/$s_!BYfU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3e7dd81-eb26-4931-9758-ccbbbf460021_2547x1532.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">High level decision flow</figcaption></figure></div><blockquote><p>After narrowing down the options, you can run similar benchmarks on multiple jobs to get a clearer picture.</p></blockquote><div><hr></div><p>&#128172; Let me know what else you would add.</p>]]></content:encoded></item><item><title><![CDATA[Snapchat Data Tech Stack]]></title><description><![CDATA[Learn how Snapchat ingests ~2 trillion events per day using Google Cloud Platform.]]></description><link>https://www.junaideffendi.com/p/snapchat-data-tech-stack</link><guid isPermaLink="false">https://www.junaideffendi.com/p/snapchat-data-tech-stack</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 07 Jun 2025 16:30:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uzI5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab10293d-ac50-4ce9-acd9-bfabbe0225cf_2367x1368.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Snapchat is a tech company that handles complex, large-scale challenges in the data space. 
Today, we will explore the tools and technologies Snapchat uses for data ingestion, transformation, governance, and more.</p><h3>Metrics</h3><ul><li><p>Ingesting 4+ TB of data into BigQuery every day, <a href="https://www.youtube.com/watch?app=desktop&amp;v=xVSwcwq3N4Q">source</a>.</p></li><li><p>Ingesting 1.8 trillion events per day at peak, <a href="https://www.youtube.com/watch?app=desktop&amp;v=xVSwcwq3N4Q">source</a>.</p></li><li><p>The data warehouse contains more than 200 PB of data in 30k GCS buckets, <a href="https://www.youtube.com/watch?app=desktop&amp;v=xVSwcwq3N4Q">source</a>.</p></li><li><p>Snapchat receives 5 billion Snaps per day, <a href="https://aws.amazon.com/solutions/case-studies/innovators/snap/">source</a>.</p></li><li><p>Snapchat has 3,000 Airflow DAGs with 330,000 tasks, <a href="http://che-airflow/airflow-evolution-at-snap-c988cdd95abd">source</a>.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uzI5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab10293d-ac50-4ce9-acd9-bfabbe0225cf_2367x1368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uzI5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab10293d-ac50-4ce9-acd9-bfabbe0225cf_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!uzI5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab10293d-ac50-4ce9-acd9-bfabbe0225cf_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!uzI5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab10293d-ac50-4ce9-acd9-bfabbe0225cf_2367x1368.png 
1272w, https://substackcdn.com/image/fetch/$s_!uzI5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab10293d-ac50-4ce9-acd9-bfabbe0225cf_2367x1368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uzI5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab10293d-ac50-4ce9-acd9-bfabbe0225cf_2367x1368.png" width="1456" height="841" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ab10293d-ac50-4ce9-acd9-bfabbe0225cf_2367x1368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2038244,&quot;alt&quot;:&quot;Snapchat Data Tech Stack&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/163349652?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab10293d-ac50-4ce9-acd9-bfabbe0225cf_2367x1368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Snapchat Data Tech Stack" title="Snapchat Data Tech Stack" srcset="https://substackcdn.com/image/fetch/$s_!uzI5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab10293d-ac50-4ce9-acd9-bfabbe0225cf_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!uzI5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab10293d-ac50-4ce9-acd9-bfabbe0225cf_2367x1368.png 848w, 
https://substackcdn.com/image/fetch/$s_!uzI5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab10293d-ac50-4ce9-acd9-bfabbe0225cf_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!uzI5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab10293d-ac50-4ce9-acd9-bfabbe0225cf_2367x1368.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Snapchat Data Tech Stack</figcaption></figure></div><blockquote><p>Content is based on multiple sources 
including the Snap blog, open source websites, job descriptions, and other public articles. You will find references throughout if you want to dive deeper.</p></blockquote><h3>Platform</h3><h4>GCP</h4><p>Snapchat uses GCP as its platform for all data infra, processing, and analytical needs, and has been a heavy user of a variety of GCP services.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ucpb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa79e491-a0dd-4252-8cae-3ccf56de5ffb_1773x643.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ucpb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa79e491-a0dd-4252-8cae-3ccf56de5ffb_1773x643.png 424w, https://substackcdn.com/image/fetch/$s_!Ucpb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa79e491-a0dd-4252-8cae-3ccf56de5ffb_1773x643.png 848w, https://substackcdn.com/image/fetch/$s_!Ucpb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa79e491-a0dd-4252-8cae-3ccf56de5ffb_1773x643.png 1272w, https://substackcdn.com/image/fetch/$s_!Ucpb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa79e491-a0dd-4252-8cae-3ccf56de5ffb_1773x643.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ucpb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa79e491-a0dd-4252-8cae-3ccf56de5ffb_1773x643.png" width="1456" height="528" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aa79e491-a0dd-4252-8cae-3ccf56de5ffb_1773x643.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:528,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:580333,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/163349652?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa79e491-a0dd-4252-8cae-3ccf56de5ffb_1773x643.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ucpb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa79e491-a0dd-4252-8cae-3ccf56de5ffb_1773x643.png 424w, https://substackcdn.com/image/fetch/$s_!Ucpb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa79e491-a0dd-4252-8cae-3ccf56de5ffb_1773x643.png 848w, https://substackcdn.com/image/fetch/$s_!Ucpb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa79e491-a0dd-4252-8cae-3ccf56de5ffb_1773x643.png 1272w, https://substackcdn.com/image/fetch/$s_!Ucpb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa79e491-a0dd-4252-8cae-3ccf56de5ffb_1773x643.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Screenshot taken from this <a href="https://www.youtube.com/watch?v=gzyMB4zOz_8">video</a>.</figcaption></figure></div><blockquote><p>&#128161; Snapchat&#8217;s multi-cloud architecture includes AWS; however, there is no publicly available information about its use for offline data processing.</p></blockquote><h3>Messaging System</h3><h4>Pub/Sub</h4><p>Real-time events are ingested through GCP&#8217;s native Pub/Sub service. The fully managed service scales seamlessly during peak hours and works well with GCP Dataflow.</p><h3>Processing</h3><h4>Beam</h4><p>Snapchat relies on GCP Dataflow, a fully managed Apache Beam service, to process data in both streaming and batch modes. 
</p><h4>Spark</h4><p>Snapchat extensively uses Spark (GCP Dataproc) for both batch and stream processing, making it a core component of the feature generation architecture that powers its recommendation systems.</p><blockquote><p>&#128214; Recommended Reading: <a href="https://eng.snap.com/speed-up-feature-engineering">Speed Up Feature Engineering</a></p></blockquote><h3>Orchestrator</h3><h4>Airflow</h4><p>Snapchat has been using Airflow since 2016 and has faced numerous challenges over the years, which were addressed by upgrading to Airflow 2+.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UN-N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454b8a9e-8476-4eea-8e54-f788f40108cd_800x404.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UN-N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454b8a9e-8476-4eea-8e54-f788f40108cd_800x404.png 424w, https://substackcdn.com/image/fetch/$s_!UN-N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454b8a9e-8476-4eea-8e54-f788f40108cd_800x404.png 848w, https://substackcdn.com/image/fetch/$s_!UN-N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454b8a9e-8476-4eea-8e54-f788f40108cd_800x404.png 1272w, https://substackcdn.com/image/fetch/$s_!UN-N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454b8a9e-8476-4eea-8e54-f788f40108cd_800x404.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!UN-N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454b8a9e-8476-4eea-8e54-f788f40108cd_800x404.png" width="800" height="404" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/454b8a9e-8476-4eea-8e54-f788f40108cd_800x404.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:404,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UN-N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454b8a9e-8476-4eea-8e54-f788f40108cd_800x404.png 424w, https://substackcdn.com/image/fetch/$s_!UN-N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454b8a9e-8476-4eea-8e54-f788f40108cd_800x404.png 848w, https://substackcdn.com/image/fetch/$s_!UN-N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454b8a9e-8476-4eea-8e54-f788f40108cd_800x404.png 1272w, https://substackcdn.com/image/fetch/$s_!UN-N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F454b8a9e-8476-4eea-8e54-f788f40108cd_800x404.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image source: <a href="https://medium.com/apache-airflow/airflow-evolution-at-snap-c988cdd95abd">link</a></figcaption></figure></div><p>As of 2024, Snap runs over 3,000 DAGs that execute more than 330,000 task instances daily, covering ETL, reporting/analytics, machine learning workloads, and more. Snap relies on 200+ custom operators to support a wide range of services and use cases, serving over 1,000 active Airflow users.</p><h3>Warehouse</h3><h4>BigQuery</h4><p>Snapchat relies heavily on BigQuery for its data warehousing needs. All event-driven data from various sources ends up in the warehouse.</p><p>BigQuery is used by many teams across different organizations for both ad hoc and scheduled workflows, orchestrated through Airflow. 
</p><h3>Lakehouse</h3><h4>GCS &amp; Iceberg</h4><p>Alongside centralized data warehousing, Snapchat uses a lakehouse architecture by combining Google Cloud Storage (GCS) with Apache Iceberg. This allows them to efficiently access large datasets without duplicating them in BigQuery. </p><p>With BigLake integration, GCS data becomes directly accessible in BigQuery, giving users a native-like experience.</p><blockquote><p>&#128214; Recommended Reading: <a href="https://cloud.google.com/blog/products/data-analytics/announcing-apache-iceberg-support-for-biglake">Apache Iceberg support for BigLake</a></p></blockquote><p>ML teams are among the primary users of Iceberg and BigQuery tables in their <a href="https://eng.snap.com/introducing-bento">Bento Platform</a>.</p><h3>Governance</h3><h4>Dataplex</h4><p>Unsurprisingly for a heavy GCP user, Snapchat also uses GCP&#8217;s Dataplex service for data management. It enables the creation of a logical data organization layer spanning more than 1,500 GCP projects, allowing teams to manage data and make decisions without physically moving it.</p><h3>Dashboard</h3><h4>Looker</h4><p>Looker plays a central role in enabling data-driven decisions across the company. Integrated with BigQuery, it empowers teams to self-serve their data needs. </p><p>Snapchat has made significant investments in Looker, partnering with GCP to upskill teams worldwide, as per this <a href="https://cloud.google.com/blog/topics/training-certifications/snap-partners-with-google-cloud-to-upskill-teams-around-the-globe">source</a>.</p><div><hr></div><p><strong>Related Content:</strong></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;df86dcca-1589-4b12-b335-d091da3e0a1c&quot;,&quot;caption&quot;:&quot;Netflix handle massive scale, from event data in streams to data at rest in the warehouse. Netflix data stack is pretty solid, mostly built on top of open source solutions. 
The data stack processes trillions of data points everyday while the scale of data at rest is in hundreds of Petabytes based on&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Netflix Data Tech Stack&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:21393641,&quot;name&quot;:&quot;Junaid Effendi&quot;,&quot;bio&quot;:&quot;I love to share my learnings and experiences about Software and Data Engineering.&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06559f3-ee33-46f8-bfa0-50964179f235_1200x1200.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-05-08T16:31:10.943Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2ea4efe-5128-4330-804b-a54c2f561e08_2547x1477.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.junaideffendi.com/p/netflix-data-tech-stack&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:144081570,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:19,&quot;comment_count&quot;:4,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Junaid Effendi | Sharing knowledge for Engineers&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6aa39096-d454-439f-98b5-baea84b501aa_800x800.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" 
data-attrs="{&quot;nodeId&quot;:&quot;f5d9413e-7018-4852-9e26-6f052c7ea367&quot;,&quot;caption&quot;:&quot;DoorDash has been a leader in the food delivery service industry, with over 5 billion consumer orders, more than $100 billion in merchant sales, and over $35 billion earned by Dashers. A key factor in their success is their data-driven approach, ingesting massive amounts of event-driven data daily to make informed decisions.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;DoorDash Data Tech Stack&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:21393641,&quot;name&quot;:&quot;Junaid Effendi&quot;,&quot;bio&quot;:&quot;I love to share my learnings and experiences about Software and Data Engineering.&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06559f3-ee33-46f8-bfa0-50964179f235_1200x1200.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-04-26T16:30:41.413Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0674425f-ec26-41a5-ae32-ef97f9bf563d_2547x1477.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.junaideffendi.com/p/doordash-data-tech-stack&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:159625272,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:20,&quot;comment_count&quot;:2,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Junaid Effendi | Sharing knowledge for 
Engineers&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F430b8ab1-0a9b-4396-9e0f-e161364cb75a_350x350.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p><strong>Interesting Read:</strong></p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:165057537,&quot;url&quot;:&quot;https://architecturenotes.co/p/the-cost-of-ai-in-code-goodbye-fingerspitzengefu&quot;,&quot;publication_id&quot;:2464248,&quot;publication_name&quot;:&quot;Architecture Notes &#8212; System Design &amp;  Software Development&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F649a8bc0-3d23-4a77-a9ee-da7c79a71e77_347x347.png&quot;,&quot;title&quot;:&quot;The Cost of AI in Code: Goodbye, Fingerspitzengef&#252;hl?&quot;,&quot;truncated_body_text&quot;:&quot;AI coding tools such as Cursor, Claude Code, and OpenAI&#8217;s Codex are transforming software development. They can generate, refactor, and even test code autonomously, significantly boosting productivity. For instance, Claude Code operates as a command-line interface, allowing developers to interact with their codebase through natural language prompts. Cur&#8230;&quot;,&quot;date&quot;:&quot;2025-06-03T00:48:20.122Z&quot;,&quot;like_count&quot;:18,&quot;comment_count&quot;:0,&quot;bylines&quot;:[{&quot;id&quot;:1061252,&quot;name&quot;:&quot;Mahdi Yusuf&quot;,&quot;handle&quot;:&quot;myusuf3&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/95c19f54-fabb-40e5-b1fb-cc436c432a1a_1536x1536.png&quot;,&quot;bio&quot;:&quot;speaker, writer, and home labber. 
engineering manager at @apple ex-senior staff engineer at @1password, and previously cto @gyroscope_app&quot;,&quot;profile_set_up_at&quot;:&quot;2024-03-07T00:20:34.918Z&quot;,&quot;reader_installed_at&quot;:&quot;2024-02-06T13:38:07.119Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:2492410,&quot;user_id&quot;:1061252,&quot;publication_id&quot;:2464248,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:true,&quot;publication&quot;:{&quot;id&quot;:2464248,&quot;name&quot;:&quot;Architecture Notes &#8212; System Design &amp;  Software Development&quot;,&quot;subdomain&quot;:&quot;arcnotes&quot;,&quot;custom_domain&quot;:&quot;architecturenotes.co&quot;,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;In your inbox every Sunday! &#9889;&#65039;&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/649a8bc0-3d23-4a77-a9ee-da7c79a71e77_347x347.png&quot;,&quot;author_id&quot;:1061252,&quot;primary_user_id&quot;:1061252,&quot;theme_var_background_pop&quot;:&quot;#A33ACB&quot;,&quot;created_at&quot;:&quot;2024-03-27T23:34:49.163Z&quot;,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Mahdi Yusuf&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;magaziney&quot;,&quot;is_personal_mode&quot;:false}}],&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://architecturenotes.co/p/the-cost-of-ai-in-code-goodbye-fingerspitzengefu?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div 
class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!Yyd0!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F649a8bc0-3d23-4a77-a9ee-da7c79a71e77_347x347.png" loading="lazy"><span class="embedded-post-publication-name">Architecture Notes &#8212; System Design &amp;  Software Development</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">The Cost of AI in Code: Goodbye, Fingerspitzengef&#252;hl?</div></div><div class="embedded-post-body">AI coding tools such as Cursor, Claude Code, and OpenAI&#8217;s Codex are transforming software development. They can generate, refactor, and even test code autonomously, significantly boosting productivity. For instance, Claude Code operates as a command-line interface, allowing developers to interact with their codebase through natural language prompts. Cur&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">a year ago &#183; 18 likes &#183; Mahdi Yusuf</div></a></div><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.junaideffendi.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading, Share &#128257; and Subscribe &#128276;for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div 
class="fake-button"></div></div></form></div></div><div><hr></div><p>&#128172; Snapchat primarily uses GCP for its data analytics and platform, but given its recent adoption of AWS, there is likely some information that I am unaware of. Please let me know in the comments.</p>]]></content:encoded></item><item><title><![CDATA[Inside Data Engineering with Daniel Beach]]></title><description><![CDATA[Veteran data engineer Daniel Beach takes you inside the world of data engineering, sharing hard-earned insights, day-to-day challenges, and what&#8217;s on the horizon for the field.]]></description><link>https://www.junaideffendi.com/p/inside-data-engineering-with-daniel</link><guid isPermaLink="false">https://www.junaideffendi.com/p/inside-data-engineering-with-daniel</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 24 May 2025 16:30:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!i2z7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d99d71e-cf3a-471c-9475-693535c5f1cc_2547x1532.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today, we're joined by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Daniel Beach&quot;,&quot;id&quot;:21715962,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F81caaeec-9053-487c-a59c-ba5f8e4644ad_256x256.jpeg&quot;,&quot;uuid&quot;:&quot;9daf841c-3c77-4d5f-8b96-f8cb8875128b&quot;}" data-component-name="MentionToDOM"></span> from <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Data Engineering 
Central&quot;,&quot;id&quot;:1224799,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/dataengineeringcentral&quot;,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/880c179a-d4f4-4f41-a70c-48e557c48f38_256x256.png&quot;,&quot;uuid&quot;:&quot;228c0503-4ac5-4b9b-90f0-b6936673e3d0&quot;}" data-component-name="MentionToDOM"></span>, who&#8217;s been working in Data Engineering since before it was cool. He will share his journey and insights.</p><p>To recap: the series follows a Q&amp;A format, featuring professionals who share their journeys, insights, and challenges.</p><h3><strong>What to Expect:</strong></h3><ul><li><p><strong>Inside the Day-to-Day</strong> &#8211; See what life as a data engineer really looks like on the ground.</p></li><li><p><strong>Breaking In</strong> &#8211; Explore the skills, tools, and career paths that can get you started.</p></li><li><p><strong>Tech Pulse</strong> &#8211; Keep up with the latest trends, tools, and industry shifts shaping the field.</p></li><li><p><strong>Real Challenges</strong> &#8211; Uncover the obstacles engineers tackle beyond the textbook.</p></li><li><p><strong>Myth-Busting</strong> &#8211; Set the record straight on common data engineering misunderstandings.</p></li><li><p><strong>Voices from the Field</strong> &#8211; Get inspired by stories and insights from experienced pros.</p></li></ul><div class="pullquote"><p>&#11088; If you're curious about data engineering or considering it as a career, this series is for you!</p></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!i2z7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d99d71e-cf3a-471c-9475-693535c5f1cc_2547x1532.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!i2z7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d99d71e-cf3a-471c-9475-693535c5f1cc_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!i2z7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d99d71e-cf3a-471c-9475-693535c5f1cc_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!i2z7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d99d71e-cf3a-471c-9475-693535c5f1cc_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!i2z7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d99d71e-cf3a-471c-9475-693535c5f1cc_2547x1532.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!i2z7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d99d71e-cf3a-471c-9475-693535c5f1cc_2547x1532.png" width="1456" height="876" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d99d71e-cf3a-471c-9475-693535c5f1cc_2547x1532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:876,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3015095,&quot;alt&quot;:&quot;Inside Data Engineering with Daniel 
Beach&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/160646398?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d99d71e-cf3a-471c-9475-693535c5f1cc_2547x1532.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Inside Data Engineering with Daniel Beach" title="Inside Data Engineering with Daniel Beach" srcset="https://substackcdn.com/image/fetch/$s_!i2z7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d99d71e-cf3a-471c-9475-693535c5f1cc_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!i2z7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d99d71e-cf3a-471c-9475-693535c5f1cc_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!i2z7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d99d71e-cf3a-471c-9475-693535c5f1cc_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!i2z7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d99d71e-cf3a-471c-9475-693535c5f1cc_2547x1532.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 
9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Inside Data Engineering with Daniel Beach</figcaption></figure></div><p>Let&#8217;s dive into Inside Data Engineering:</p><h4>How would you describe Data Engineering?</h4><p>Prior to the rise of AI, Data Engineering had become about being the best at Python, SQL, or whatever hot new tool had just been released. AI coding assistants, like it or not, have lowered the bar and made Data Engineering less about coding and more about providing business value from data. More than ever, Data Engineering is about &#8230;</p><ul><li><p>High-level architectural and data platform design and maintenance</p></li><li><p>Reducing data processing costs and complexity</p></li><li><p>Communication with non-engineering groups</p></li><li><p>Leading and upskilling others</p></li><li><p>Project planning and implementation</p></li></ul><h4>How did you end up being a Data Engineer?</h4><p>I came from a non-traditional background; I never took a computer science class in my life. I taught myself during college to write PHP, Perl, MySQL, etc. 
After a few years of working as an engineer, I decided I wanted to move into tech. I taught myself SQL, got SQL Server certified, and got a job as a Data Analyst on a Business Intelligence team. </p><p>This was when Data Engineering was just becoming a thing, so I continued to hone my programming skills and taught myself things like Spark before it was popular. The rest is history.</p><h4>What does your day-to-day look like?</h4><p>I work at a small startup, so it can vary greatly, but it typically consists of the following tasks.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0f2u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95e4a2c2-106c-496c-b514-dfcbf9ff7e97_2367x1368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0f2u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95e4a2c2-106c-496c-b514-dfcbf9ff7e97_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!0f2u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95e4a2c2-106c-496c-b514-dfcbf9ff7e97_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!0f2u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95e4a2c2-106c-496c-b514-dfcbf9ff7e97_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!0f2u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95e4a2c2-106c-496c-b514-dfcbf9ff7e97_2367x1368.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!0f2u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95e4a2c2-106c-496c-b514-dfcbf9ff7e97_2367x1368.png" width="1456" height="841" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/95e4a2c2-106c-496c-b514-dfcbf9ff7e97_2367x1368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:152444,&quot;alt&quot;:&quot;Daniel&#8217;s day to day&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/160646398?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95e4a2c2-106c-496c-b514-dfcbf9ff7e97_2367x1368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Daniel&#8217;s day to day" title="Daniel&#8217;s day to day" srcset="https://substackcdn.com/image/fetch/$s_!0f2u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95e4a2c2-106c-496c-b514-dfcbf9ff7e97_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!0f2u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95e4a2c2-106c-496c-b514-dfcbf9ff7e97_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!0f2u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95e4a2c2-106c-496c-b514-dfcbf9ff7e97_2367x1368.png 1272w, 
https://substackcdn.com/image/fetch/$s_!0f2u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95e4a2c2-106c-496c-b514-dfcbf9ff7e97_2367x1368.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Daniel&#8217;s day to day</figcaption></figure></div><ul><li><p>Project planning (planning new features, writing docs and implementation plans, creating JIRA tasks, etc.).</p></li><li><p>Answering questions from others and helping unblock them as needed.</p></li><li><p>Focused coding time.</p><ul><li><p>Could be AWS infra, Spark, Databricks, 
Postgres, Docker, etc.</p></li></ul></li></ul><h4>What are some stakeholders that you work with?</h4><p>I work with:</p><ul><li><p>Data Science</p></li><li><p>Product (the business)</p></li><li><p>Data Analysts</p></li><li><p>C-Suite (CTO, etc.)</p></li></ul><h4>What real-world business problems do you solve through data?</h4><p>We build large-scale Machine Learning pipelines that predict debit and credit card fraud. This requires building reliable systems that scale and ingest large quantities of data with a small team.</p><h4>What kind of projects do you work on?</h4><p>A very wide variety of projects; working at startups requires a wide range of skills. AWS infrastructure, lots of Databricks/Spark pipelines, even LLM fine-tuning and RAG. I enjoy working across many technologies; it keeps me engaged and always learning something new.</p><h4>What kind of data do you work with?</h4><p>Mostly tabular data.</p><h4>What data size do you work with?</h4><p>Roughly 300 TB.</p><h4>What technologies do you use?</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HK9H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c622acb-cb5c-49d7-a476-9f1f0ee4fb5c_2367x1473.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HK9H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c622acb-cb5c-49d7-a476-9f1f0ee4fb5c_2367x1473.png 424w, https://substackcdn.com/image/fetch/$s_!HK9H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c622acb-cb5c-49d7-a476-9f1f0ee4fb5c_2367x1473.png 848w, 
https://substackcdn.com/image/fetch/$s_!HK9H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c622acb-cb5c-49d7-a476-9f1f0ee4fb5c_2367x1473.png 1272w, https://substackcdn.com/image/fetch/$s_!HK9H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c622acb-cb5c-49d7-a476-9f1f0ee4fb5c_2367x1473.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HK9H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c622acb-cb5c-49d7-a476-9f1f0ee4fb5c_2367x1473.png" width="1456" height="906" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1c622acb-cb5c-49d7-a476-9f1f0ee4fb5c_2367x1473.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:906,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1351181,&quot;alt&quot;:&quot;Daniel&#8217;s tech stack&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/160646398?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c622acb-cb5c-49d7-a476-9f1f0ee4fb5c_2367x1473.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Daniel&#8217;s tech stack" title="Daniel&#8217;s tech stack" srcset="https://substackcdn.com/image/fetch/$s_!HK9H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c622acb-cb5c-49d7-a476-9f1f0ee4fb5c_2367x1473.png 424w, 
https://substackcdn.com/image/fetch/$s_!HK9H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c622acb-cb5c-49d7-a476-9f1f0ee4fb5c_2367x1473.png 848w, https://substackcdn.com/image/fetch/$s_!HK9H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c622acb-cb5c-49d7-a476-9f1f0ee4fb5c_2367x1473.png 1272w, https://substackcdn.com/image/fetch/$s_!HK9H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c622acb-cb5c-49d7-a476-9f1f0ee4fb5c_2367x1473.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Daniel&#8217;s tech stack</figcaption></figure></div><h4>What is your favorite area of Data Engineering?</h4><p>Rust-based Data Engineering tools like Polars and Daft. I think they are the future of Data Engineering.</p><h4>How can Data Engineering benefit from GenAI?</h4><p>Increased efficiency in producing results. GenAI can assist in writing tests and finding bugs, and it is useful for bouncing ideas around. It can shorten timelines in many areas of the Software Development Lifecycle.</p><p>It&#8217;s fair that some people are skeptical and worried about losing fundamental skills, but if you were a continuous learner before AI, the chances that AI will change that are low.</p><h4>What advice would you give your past self as a beginner Data Engineer?</h4><p>Work on soft skills as hard as you work on programming and technical skills: writing, speaking, communicating, project planning, etc.</p><blockquote><p>Never stop learning; always push yourself to do things you don&#8217;t understand. Find people smarter than yourself and work closely with them.</p></blockquote><h4>What are some challenging aspects of Data Engineering?</h4><p>Working with the business in a way that builds relationships while still being strict about best practices and approaching problems in a reasonable manner. 
Also, in the fast-paced, changing landscape of tech and Data Engineering, it is important to stay focused on the basics and provide reliable and scalable solutions that simply work well.</p><h4>What is the next big thing in Data Engineering?</h4><p>Tools like DuckDB and Polars. They will become distributed in nature and, over a long period of time, start to eat into Spark&#8217;s market share.</p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:141997113,&quot;url&quot;:&quot;https://dataengineeringcentral.substack.com/p/duckdb-vs-polars-thunderdome&quot;,&quot;publication_id&quot;:1224799,&quot;publication_name&quot;:&quot;Data Engineering Central&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F880c179a-d4f4-4f41-a70c-48e557c48f38_256x256.png&quot;,&quot;title&quot;:&quot;DuckDB vs Polars - Thunderdome.&quot;,&quot;truncated_body_text&quot;:&quot;If you know me, you know I like to stir the pot, the big boiling and smoldering cauldron of Data Tools pot. 
Yes, that&#8217;s the one, blackened and burned pot from years of conjurers pouring myriads of Modern Data Stack tools into it, which have since bubbled and encrusted us all with the refuse of a thousand promises to be the cure-all for our ailments.&quot;,&quot;date&quot;:&quot;2024-03-04T14:18:51.150Z&quot;,&quot;like_count&quot;:19,&quot;comment_count&quot;:2,&quot;bylines&quot;:[{&quot;id&quot;:21715962,&quot;name&quot;:&quot;Daniel Beach&quot;,&quot;handle&quot;:&quot;dataengineeringcentral&quot;,&quot;previous_name&quot;:&quot;dataengineeringdude&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F81caaeec-9053-487c-a59c-ba5f8e4644ad_256x256.jpeg&quot;,&quot;bio&quot;:&quot;Long time data engineer, with a passion. &quot;,&quot;profile_set_up_at&quot;:&quot;2022-12-03T22:11:41.368Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:1180716,&quot;user_id&quot;:21715962,&quot;publication_id&quot;:1224799,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:1224799,&quot;name&quot;:&quot;Data Engineering Central&quot;,&quot;subdomain&quot;:&quot;dataengineeringcentral&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Long Live the Data Engineer. 
No holds barred.&quot;,&quot;logo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/880c179a-d4f4-4f41-a70c-48e557c48f38_256x256.png&quot;,&quot;author_id&quot;:21715962,&quot;theme_var_background_pop&quot;:&quot;#FF9900&quot;,&quot;created_at&quot;:&quot;2022-12-03T22:13:49.443Z&quot;,&quot;email_from_name&quot;:&quot;Data Engineering Central&quot;,&quot;copyright&quot;:&quot;dataengineeringdude&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;magaziney&quot;,&quot;is_personal_mode&quot;:false}},{&quot;id&quot;:2346115,&quot;user_id&quot;:21715962,&quot;publication_id&quot;:2325879,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:2325879,&quot;name&quot;:&quot;F.I.R.E. Finance&quot;,&quot;subdomain&quot;:&quot;firefinance&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;A Substack dedicated to the idea of Financial Independence, Retire Early. Tips, tricks, and longing for that day!&quot;,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F81caaeec-9053-487c-a59c-ba5f8e4644ad_256x256.jpeg&quot;,&quot;author_id&quot;:21715962,&quot;theme_var_background_pop&quot;:&quot;#9A6600&quot;,&quot;created_at&quot;:&quot;2024-02-05T17:38:43.471Z&quot;,&quot;email_from_name&quot;:&quot;F.I.R.E. 
Finance&quot;,&quot;copyright&quot;:&quot;Daniel Beach&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;newspaper&quot;,&quot;is_personal_mode&quot;:false}},{&quot;id&quot;:4385666,&quot;user_id&quot;:21715962,&quot;publication_id&quot;:4299365,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:4299365,&quot;name&quot;:&quot;Prairie Meditations&quot;,&quot;subdomain&quot;:&quot;prairiemeditations&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Thoughts on life and more from the heart of the Midwest.&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0b2f5d6c-91bb-4080-8f99-1087569e9a3b_256x256.png&quot;,&quot;author_id&quot;:21715962,&quot;theme_var_background_pop&quot;:&quot;#FF6719&quot;,&quot;created_at&quot;:&quot;2025-03-06T00:38:52.071Z&quot;,&quot;email_from_name&quot;:null,&quot;copyright&quot;:&quot;Daniel Beach&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;newspaper&quot;,&quot;is_personal_mode&quot;:false}}],&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://dataengineeringcentral.substack.com/p/duckdb-vs-polars-thunderdome?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img 
class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!pIVQ!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F880c179a-d4f4-4f41-a70c-48e557c48f38_256x256.png" loading="lazy"><span class="embedded-post-publication-name">Data Engineering Central</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">DuckDB vs Polars - Thunderdome.</div></div><div class="embedded-post-body">If you know me, you know I like to stir the pot, the big boiling and smoldering cauldron of Data Tools pot. Yes, that&#8217;s the one, blackened and burned pot from years of conjurers pouring myriads of Modern Data Stack tools into it, which have since bubbled and encrusted us all with the refuse of a thousand promises to be the cure-all for our ailments&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">2 years ago &#183; 19 likes &#183; 2 comments &#183; Daniel Beach</div></a></div><h4>What are some common misconceptions about data engineering?</h4><p>That you can simply learn SQL and Python and be a good Data Engineer. With GenAI, that is even less true than it was before. 
We need Engineers who can <a href="https://www.junaideffendi.com/p/building-a-collaborative-engineering">work well on teams</a>, stay focused, make good tradeoffs, communicate well, and do more than just write code.</p><div><hr></div><h3>Reach out if you would like:</h3><ul><li><p>To be a guest and share your experiences &amp; journey.</p></li><li><p>To provide feedback and suggestions on how we can improve the quality of the questions.</p></li><li><p>To suggest guests for future articles.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Data Governance in Lakehouse Using Open Source Tools]]></title><description><![CDATA[Discover how to build a complete data governance ecosystem in a Lakehouse architecture using leading open-source tools. 
Explore access control, metadata management, lineage, quality and more.]]></description><link>https://www.junaideffendi.com/p/data-governance-in-lakehouse-using</link><guid isPermaLink="false">https://www.junaideffendi.com/p/data-governance-in-lakehouse-using</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 10 May 2025 16:30:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-e8m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d417f0-86b4-4028-a607-57dbcf75bb11_2367x1604.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As organizations adopt the Lakehouse architecture, which blends the flexibility of data lakes with the reliability of data warehouses, robust data governance becomes critical. But good governance doesn&#8217;t have to mean expensive vendor tools. With a careful selection of open-source tools, you can enforce policies, ensure data quality, and maintain compliance across your entire data platform.</p><div class="pullquote"><p>&#11088; While these tools are not limited to Lakehouses and can apply to lakes, warehouses, and hybrid architectures, they are especially critical for Lakehouses, where governance capabilities must often be assembled independently.</p></div><p>The goal of this article is to provide a high-level overview of Lakehouse governance using open-source tools. 
In future articles, I will dive deeper into these technologies.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-e8m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d417f0-86b4-4028-a607-57dbcf75bb11_2367x1604.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-e8m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d417f0-86b4-4028-a607-57dbcf75bb11_2367x1604.png 424w, https://substackcdn.com/image/fetch/$s_!-e8m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d417f0-86b4-4028-a607-57dbcf75bb11_2367x1604.png 848w, https://substackcdn.com/image/fetch/$s_!-e8m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d417f0-86b4-4028-a607-57dbcf75bb11_2367x1604.png 1272w, https://substackcdn.com/image/fetch/$s_!-e8m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d417f0-86b4-4028-a607-57dbcf75bb11_2367x1604.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-e8m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d417f0-86b4-4028-a607-57dbcf75bb11_2367x1604.png" width="1456" height="987" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e2d417f0-86b4-4028-a607-57dbcf75bb11_2367x1604.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:987,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1674022,&quot;alt&quot;:&quot;Data Governance in Lakehouse Using Open Source Tools&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/160721695?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d417f0-86b4-4028-a607-57dbcf75bb11_2367x1604.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Data Governance in Lakehouse Using Open Source Tools" title="Data Governance in Lakehouse Using Open Source Tools" srcset="https://substackcdn.com/image/fetch/$s_!-e8m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d417f0-86b4-4028-a607-57dbcf75bb11_2367x1604.png 424w, https://substackcdn.com/image/fetch/$s_!-e8m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d417f0-86b4-4028-a607-57dbcf75bb11_2367x1604.png 848w, https://substackcdn.com/image/fetch/$s_!-e8m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d417f0-86b4-4028-a607-57dbcf75bb11_2367x1604.png 1272w, https://substackcdn.com/image/fetch/$s_!-e8m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2d417f0-86b4-4028-a607-57dbcf75bb11_2367x1604.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Data Governance in Lakehouse Using Open Source Tools</figcaption></figure></div><div><hr></div><h3>Access Control &amp; Security</h3><blockquote><p><strong>Goal</strong>: Control who can access what data and under what conditions.</p></blockquote><ul><li><p><strong>Apache Ranger</strong>: Centralized policy management for fine-grained access control at various levels (database, table, column, row) for tables registered in the Hive Metastore.</p><ul><li><p>Ranger can work with a Lakehouse through the Hive Metastore.</p></li><li><p>Works well with Apache Atlas, for example for tag-based policies driven by Atlas classifications.</p></li></ul></li><li><p><strong>Keycloak</strong>: Identity and access management server; not specific to data, but used to secure data services and 
applications. It integrates with tools like Trino, Airflow, and Superset to manage access to UIs and APIs through SSO and role-based access control.</p></li><li><p><strong>Open Policy Agent (OPA)</strong>: General-purpose policy engine (policies are written in Rego) for access control enforcement, especially when integrated with metadata platforms like DataHub.</p></li></ul><h3>Data Lineage &amp; Tracking</h3><blockquote><p><strong>Goal</strong>: Understand how data moves, transforms, and is consumed across systems.</p></blockquote><ul><li><p><strong>Amundsen</strong>: Basic lineage tracking through metadata relationships and a user-friendly UI, emphasizing search rather than detailed lineage.</p></li><li><p><strong>Apache Atlas</strong>: Rich lineage capabilities, tracking data flow across systems like Hive, HDFS, Kafka, and Lakehouse tables (through the Hive Metastore), and providing insight into data transformations and processing workflows.</p></li><li><p><strong>OpenLineage + Marquez</strong>: OpenLineage defines an open standard for lineage metadata emitted by pipelines; Marquez, its reference implementation, provides a metadata service with a UI for search and graph-based exploration.</p><ul><li><p>Other metadata platforms, such as DataHub, can also consume OpenLineage events.</p></li></ul></li><li><p><strong>Spline</strong>: Captures runtime data lineage from Apache Spark applications with minimal code changes.</p></li></ul><h3>Metadata Management &amp; Discovery</h3><blockquote><p><strong>Goal</strong>: Make data assets easily discoverable, understandable, and governed.</p></blockquote><ul><li><p><strong>Amundsen</strong>: Metadata search engine with a user-friendly UI, showing dataset schemas and ownership. 
Built on Neo4j and Elasticsearch.</p></li><li><p><strong>Apache Atlas</strong>: Metadata catalog with support for taxonomies, classifications, and integration with security tools.</p></li><li><p><strong>DataHub</strong>: Comprehensive metadata platform supporting metadata discovery, faceted search, schema versioning, data ownership, and impact analysis.</p></li><li><p><strong>Metacat</strong>: Metadata and dataset catalog from Netflix, supporting multiple storage systems and integration with Hive and Presto.</p></li></ul><h3>Data Quality &amp; Observability</h3><blockquote><p><strong>Goal</strong>: Ensure data meets business expectations through validation and monitoring.</p></blockquote><ul><li><p><strong>Great Expectations</strong>: Framework for defining "expectations" on datasets (e.g., no nulls, unique keys). Integrates into pipelines for runtime validation.</p></li><li><p><strong>Soda Core</strong>: CLI-based tool for rule-based data profiling and monitoring, detecting freshness, duplicates, and other issues.</p></li><li><p><strong>Deequ</strong>: Library for declaring constraints on datasets (completeness, uniqueness) that runs at scale on Spark DataFrames.</p></li><li><p><strong>DQX</strong>: Framework for orchestrating data quality checks across engines, integrated with Spark, Delta Lake, and Lakehouse ecosystems.</p></li></ul><h3>Data Versioning &amp; Auditing</h3><blockquote><p><strong>Goal</strong>: Enable reproducibility, rollback, and audit trails for datasets.</p></blockquote><ul><li><p><strong>Delta Lake</strong>: Provides ACID-compliant versioning, time travel by version or timestamp, rollback to previous states, and full audit logs via a transaction log.</p></li><li><p><strong>Apache Hudi</strong>: Enables versioned data with commit timelines, supports time travel by querying as of a commit instant, allows rollback of failed writes, and maintains an auditable history.</p></li><li><p><strong>Apache Iceberg</strong>: Tracks data through 
immutable snapshots, supports time travel and rollback via snapshot IDs, and logs detailed metadata for auditing.</p></li><li><p><strong>lakeFS</strong>: Git-like version control for data lakes, enabling branching, committing, and merging of data.</p></li></ul><h3>Data Classification &amp; Tagging</h3><blockquote><p><strong>Goal</strong>: Categorize and protect sensitive or regulated datasets appropriately.</p></blockquote><ul><li><p><strong>Amundsen</strong>: Offers lightweight tagging and ownership attribution, helping teams document and search datasets, though with limited support for hierarchical glossaries or complex policies.</p></li><li><p><strong>Apache Atlas</strong>: Provides advanced data classification with custom tags, business glossaries, and automated lineage-based tag propagation for enterprise-grade metadata governance.</p><ul><li><p>Commonly used with Apache Ranger.</p></li></ul></li><li><p><strong>DataHub</strong>: Supports dynamic classification through tags and glossary terms, with policy enforcement and access control via integration with Open Policy Agent (OPA).</p></li></ul><h3>Users</h3><p>Several of these tools are used in production, at scale, by large companies.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E_I-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d3ff3d-364f-4bce-a0ae-ce5d2e7a242b_2255x2116.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E_I-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d3ff3d-364f-4bce-a0ae-ce5d2e7a242b_2255x2116.png 424w, 
https://substackcdn.com/image/fetch/$s_!E_I-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d3ff3d-364f-4bce-a0ae-ce5d2e7a242b_2255x2116.png 848w, https://substackcdn.com/image/fetch/$s_!E_I-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d3ff3d-364f-4bce-a0ae-ce5d2e7a242b_2255x2116.png 1272w, https://substackcdn.com/image/fetch/$s_!E_I-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d3ff3d-364f-4bce-a0ae-ce5d2e7a242b_2255x2116.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E_I-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d3ff3d-364f-4bce-a0ae-ce5d2e7a242b_2255x2116.png" width="1456" height="1366" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b2d3ff3d-364f-4bce-a0ae-ce5d2e7a242b_2255x2116.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1366,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3206234,&quot;alt&quot;:&quot;List of Companies using these Data Governance Tools&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/160721695?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d3ff3d-364f-4bce-a0ae-ce5d2e7a242b_2255x2116.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="List of Companies using these Data Governance Tools" title="List of Companies using these Data Governance Tools" 
srcset="https://substackcdn.com/image/fetch/$s_!E_I-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d3ff3d-364f-4bce-a0ae-ce5d2e7a242b_2255x2116.png 424w, https://substackcdn.com/image/fetch/$s_!E_I-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d3ff3d-364f-4bce-a0ae-ce5d2e7a242b_2255x2116.png 848w, https://substackcdn.com/image/fetch/$s_!E_I-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d3ff3d-364f-4bce-a0ae-ce5d2e7a242b_2255x2116.png 1272w, https://substackcdn.com/image/fetch/$s_!E_I-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d3ff3d-364f-4bce-a0ae-ce5d2e7a242b_2255x2116.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">List of Companies using these Data Governance Tools</figcaption></figure></div><div class="pullquote"><p>&#11088; <strong>Unity Catalog</strong>: The current open-source release offers basic features, but once it incorporates full governance capabilities, it will become a powerful, modern tool with all the mentioned functionalities.</p></div><h3>Final Thoughts</h3><p>Governance doesn&#8217;t have to be expensive or complex. By stitching together the right open-source tools, you can build a highly governed, secure, and observable Lakehouse architecture without locking into proprietary platforms.</p><h3>Next Steps for Readers</h3><p>If you're interested in learning more, I encourage you to follow future deep dives. 
</p><p>If you are looking to adopt, consider the following:</p><ul><li><p>Identify governance gaps in your current architecture.</p></li><li><p>Pilot lightweight tools (e.g., Soda Core, Great Expectations) before scaling.</p></li><li><p>Gradually build a centralized metadata and lineage system.</p></li><li><p>Automate access controls and classification tagging as your data grows.</p></li></ul><div><hr></div><p>&#128172; Let me know in the comments if I missed something.</p>]]></content:encoded></item><item><title><![CDATA[DoorDash Data Tech Stack]]></title><description><![CDATA[Learn about the Data Tech Stack used by DoorDash to process hundreds of Terabytes of data every day.]]></description><link>https://www.junaideffendi.com/p/doordash-data-tech-stack</link><guid isPermaLink="false">https://www.junaideffendi.com/p/doordash-data-tech-stack</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 26 Apr 2025 16:30:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nzrq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0674425f-ec26-41a5-ae32-ef97f9bf563d_2547x1477.png" 
length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>DoorDash has been a leader in the food delivery service industry, with over 5 billion consumer orders, more than $100 billion in merchant sales, and over $35 billion earned by Dashers. A key factor in their success is their data-driven approach, ingesting massive amounts of event-driven data daily to make informed decisions. </p><p>Their Data Platform is powered mainly by open-source tools like Kafka, Flink, and Delta, alongside commercial products like Sigma, which we will explore today.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nzrq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0674425f-ec26-41a5-ae32-ef97f9bf563d_2547x1477.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nzrq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0674425f-ec26-41a5-ae32-ef97f9bf563d_2547x1477.png 424w, https://substackcdn.com/image/fetch/$s_!nzrq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0674425f-ec26-41a5-ae32-ef97f9bf563d_2547x1477.png 848w, https://substackcdn.com/image/fetch/$s_!nzrq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0674425f-ec26-41a5-ae32-ef97f9bf563d_2547x1477.png 1272w, https://substackcdn.com/image/fetch/$s_!nzrq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0674425f-ec26-41a5-ae32-ef97f9bf563d_2547x1477.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!nzrq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0674425f-ec26-41a5-ae32-ef97f9bf563d_2547x1477.png" width="1456" height="844" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0674425f-ec26-41a5-ae32-ef97f9bf563d_2547x1477.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:844,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2524518,&quot;alt&quot;:&quot;DoorDash Data Tech Stack&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/159625272?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0674425f-ec26-41a5-ae32-ef97f9bf563d_2547x1477.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="DoorDash Data Tech Stack" title="DoorDash Data Tech Stack" srcset="https://substackcdn.com/image/fetch/$s_!nzrq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0674425f-ec26-41a5-ae32-ef97f9bf563d_2547x1477.png 424w, https://substackcdn.com/image/fetch/$s_!nzrq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0674425f-ec26-41a5-ae32-ef97f9bf563d_2547x1477.png 848w, https://substackcdn.com/image/fetch/$s_!nzrq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0674425f-ec26-41a5-ae32-ef97f9bf563d_2547x1477.png 1272w, 
https://substackcdn.com/image/fetch/$s_!nzrq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0674425f-ec26-41a5-ae32-ef97f9bf563d_2547x1477.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">DoorDash Data Tech Stack</figcaption></figure></div><blockquote><p>Content is based on multiple sources including DoorDash Engineering Blog, Careers&#8217; Page and third party articles. 
You will find references as you read.</p></blockquote><h3>Platform</h3><h4>AWS</h4><p>DoorDash uses AWS as its cloud platform and relies on several AWS services. AWS has shared one of these use cases <a href="https://aws.amazon.com/solutions/case-studies/doordash-serverless-case-study/">here</a>.</p><h3>Storage</h3><h4>S3</h4><p>S3 is the core component of their Lakehouse, storing petabytes of data for offline processing, and is heavily used by Flink and Spark jobs.</p><h4>Delta</h4><p>DoorDash primarily uses the Delta table format on S3 to build a strong Lakehouse architecture. However, according to this <a href="https://careersatdoordash.com/jobs/software-engineer-data-mobility/6041407/">job description</a>, they also use Iceberg.</p><p>Last year, they showcased a <a href="https://www.youtube.com/watch?v=DdX2AuS4ZjQ">Flink + Delta real-time architecture</a> at the Databricks Summit.</p><h4>Pinot</h4><p>DoorDash uses Apache Pinot for real-time analytics and low-latency queries. It powers two use cases:</p><ul><li><p>Mx Portal ads campaign reporting, which tracks impressions, clicks, and orders.</p></li><li><p>Risk Platform dashboards built with Superset on top of Pinot tables.</p></li></ul><blockquote><p>&#128249; Recommended Video: <a href="https://startree.ai/resources/doordash-supporting-multiple-pinot-use-cases-at-scale">Supporting Multiple Pinot Use Cases at Scale</a></p></blockquote><h4>Snowflake</h4><p>Snowflake is the data warehouse, used primarily to support reporting and metrics in Sigma for data-driven decision-making. 
Data is loaded from S3 using Snowpipe.</p><blockquote><p>&#128214; Recommended Reading: <a href="https://www.sigmacomputing.com/customers/doordash-logs-a-30-increase-in-queries-while-keeping-the-snowflake-cost-constant-with-sigma">DoorDash Logs a 30% Increase in Queries while Keeping the Snowflake Cost Constant with Sigma</a></p></blockquote><h3>Processing</h3><h4>Kafka</h4><p>DoorDash uses Apache Kafka as a distributed event streaming platform to handle billions of real-time events. They use Kafka for message queuing and real-time event processing, and have implemented multi-tenancy awareness for both producers and consumers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jYZ3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6046cc6-3f57-4697-b0f4-e29d2bba9255_1024x371.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jYZ3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6046cc6-3f57-4697-b0f4-e29d2bba9255_1024x371.webp 424w, https://substackcdn.com/image/fetch/$s_!jYZ3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6046cc6-3f57-4697-b0f4-e29d2bba9255_1024x371.webp 848w, https://substackcdn.com/image/fetch/$s_!jYZ3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6046cc6-3f57-4697-b0f4-e29d2bba9255_1024x371.webp 1272w, https://substackcdn.com/image/fetch/$s_!jYZ3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6046cc6-3f57-4697-b0f4-e29d2bba9255_1024x371.webp 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jYZ3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6046cc6-3f57-4697-b0f4-e29d2bba9255_1024x371.webp" width="1024" height="371" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e6046cc6-3f57-4697-b0f4-e29d2bba9255_1024x371.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:371,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:28556,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/159625272?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6046cc6-3f57-4697-b0f4-e29d2bba9255_1024x371.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jYZ3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6046cc6-3f57-4697-b0f4-e29d2bba9255_1024x371.webp 424w, https://substackcdn.com/image/fetch/$s_!jYZ3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6046cc6-3f57-4697-b0f4-e29d2bba9255_1024x371.webp 848w, https://substackcdn.com/image/fetch/$s_!jYZ3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6046cc6-3f57-4697-b0f4-e29d2bba9255_1024x371.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!jYZ3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6046cc6-3f57-4697-b0f4-e29d2bba9255_1024x371.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><a href="https://careersatdoordash.com/blog/building-scalable-real-time-event-processing-with-kafka-and-flink/#:~:text=To%20meet%20those%20objectives%2C%20we,an%20Infrastructure%20As%20Code%20environment">source</a></figcaption></figure></div><blockquote><p>&#128214; Recommended Reading: <a 
href="https://careersatdoordash.com/blog/building-scalable-real-time-event-processing-with-kafka-and-flink/#:~:text=To%20meet%20those%20objectives%2C%20we,an%20Infrastructure%20As%20Code%20environment">Real-time event processing with Kafka and Flink</a></p></blockquote><h4>Flink</h4><p>Flink is their primary stream processing engine, processing <code>220 TB</code> per day into the data lake using the Flink Delta Sink.</p><div id="youtube2-DdX2AuS4ZjQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;DdX2AuS4ZjQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/DdX2AuS4ZjQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h4>Airflow</h4><p>DoorDash uses Airflow as its data platform orchestration tool, allowing users to seamlessly orchestrate complex pipelines with Spark while accessing data from their Lakehouse.</p><h4>Spark</h4><p>Spark is the core batch processing engine, enabling users to perform large-scale data transformations and aggregations.</p><h4>Trino</h4><p>Trino allows DoorDash&#8217;s data team to seamlessly query data from the Lakehouse using SQL. 
It supports both batch jobs and ad-hoc analysis, enabling efficient exploration and processing of large datasets.</p><blockquote><p>&#128249; Recommended Video: <a href="https://www.youtube.com/watch?v=LzXkDIRBiTE">How DoorDash is Realizing a Unified Query Engine</a></p></blockquote><h3>Dashboard</h3><h4>Sigma</h4><p>Sigma is DoorDash's primary business intelligence tool, supporting ~12,000 internal users and 5,000 dashboards with real-time data, self-service capabilities, and advanced analytical insights.</p><h4>Superset</h4><p>DoorDash uses Apache Superset for its Risk Platform, enabling real-time analytics on top of Apache Pinot, according to this <a href="https://startree.ai/resources/doordash-supporting-multiple-pinot-use-cases-at-scale">source</a>.</p><div><hr></div><p><strong>Related Content: <a href="https://www.junaideffendi.com/t/tech-stack">Tech Stack Series</a></strong></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;a4ee1ff9-d04d-4f15-a501-51328b632f0e&quot;,&quot;caption&quot;:&quot;Netflix handle massive scale, from event data in streams to data at rest in the warehouse. Netflix data stack is pretty solid, mostly built on top of open source solutions. 
The data stack processes trillions of data points everyday while the scale of data at rest is in hundreds of Petabytes based on&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Netflix Data Tech Stack&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:21393641,&quot;name&quot;:&quot;Junaid Effendi&quot;,&quot;bio&quot;:&quot;I love to share my learnings and experiences about Software and Data Engineering.&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06559f3-ee33-46f8-bfa0-50964179f235_1200x1200.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-05-08T16:31:10.943Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2ea4efe-5128-4330-804b-a54c2f561e08_2547x1477.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.junaideffendi.com/p/netflix-data-tech-stack&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:144081570,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:19,&quot;comment_count&quot;:4,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Junaid Effendi | Sharing knowledge for Engineers&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6aa39096-d454-439f-98b5-baea84b501aa_800x800.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" 
data-attrs="{&quot;nodeId&quot;:&quot;16b74504-9560-4e9a-948d-b4f8de29f8c9&quot;,&quot;caption&quot;:&quot;Meta is one of the largest tech companies, relying heavily on data to make informed decisions since its early days. It hosts exabyte-scale data in its warehouse while processing terabytes per second from millions of producers.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Meta Data Tech Stack&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:21393641,&quot;name&quot;:&quot;Junaid Effendi&quot;,&quot;bio&quot;:&quot;I love to share my learnings and experiences about Software and Data Engineering.&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06559f3-ee33-46f8-bfa0-50964179f235_1200x1200.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-03-08T17:30:59.097Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4dc4db7-ec65-47e7-a64d-dd7e758532f3_2547x1631.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.junaideffendi.com/p/meta-data-tech-stack&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:156748730,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:7,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Junaid Effendi | Sharing knowledge for 
Engineers&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F430b8ab1-0a9b-4396-9e0f-e161364cb75a_350x350.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><p>&#128172; Let me know in the comments if I missed something.</p>]]></content:encoded></item><item><title><![CDATA[Inside Data Engineering with Vu Trinh]]></title><description><![CDATA[Join Vu Trinh as he navigates the world of data engineering, sharing insights, challenges, and emerging industry trends.]]></description><link>https://www.junaideffendi.com/p/inside-data-engineering-with-vu-trinh</link><guid isPermaLink="false">https://www.junaideffendi.com/p/inside-data-engineering-with-vu-trinh</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 05 Apr 2025 16:30:33 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!bsfg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc65e0c5-8b45-489c-987e-639d844fc6be_2547x1532.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Continuing the &#8216;Inside Data Engineering&#8217; series with the second article, featuring <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Vu Trinh&quot;,&quot;id&quot;:167177248,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4805f673-db97-4f7c-85c4-44b345a8de80_256x256.png&quot;,&quot;uuid&quot;:&quot;345a2278-66b0-43cb-9dcd-5348175debc4&quot;}" data-component-name="MentionToDOM"></span>, who is a Data Engineer working in the mobile gaming industry. He shares his knowledge at <a href="https://vutr.substack.com/">VuTrinh</a>. If you missed the first article in the series, check it out <a href="https://www.junaideffendi.com/p/inside-data-engineering-with-yordan?r=cqjft&amp;utm_campaign=post&amp;utm_medium=web&amp;showWelcomeOnShare=false">here</a>.</p><p>To recap: the series follows a Q&amp;A format, featuring professionals who share their journeys, insights, and challenges.</p><h3><strong>What to Expect:</strong></h3><ul><li><p><strong>Real-world insights</strong> &#8211; Learn what data engineers actually do on a daily basis.</p></li><li><p><strong>Industry trends</strong> &#8211; Stay updated on evolving technologies and best practices.</p></li><li><p><strong>Challenges</strong> &#8211; Discover what real-world challenges engineers face.</p></li><li><p><strong>Common misconceptions</strong> &#8211; Debunk myths about data engineering and clarify its role.</p></li></ul><div class="pullquote"><p>&#11088; If you're curious about data engineering or considering it as a career, this series is for 
you!</p></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bsfg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc65e0c5-8b45-489c-987e-639d844fc6be_2547x1532.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bsfg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc65e0c5-8b45-489c-987e-639d844fc6be_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!bsfg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc65e0c5-8b45-489c-987e-639d844fc6be_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!bsfg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc65e0c5-8b45-489c-987e-639d844fc6be_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!bsfg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc65e0c5-8b45-489c-987e-639d844fc6be_2547x1532.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bsfg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc65e0c5-8b45-489c-987e-639d844fc6be_2547x1532.png" width="1456" height="876" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dc65e0c5-8b45-489c-987e-639d844fc6be_2547x1532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:876,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2943842,&quot;alt&quot;:&quot;Inside 
Data Engineering with Vu Trinh&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/157743057?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc65e0c5-8b45-489c-987e-639d844fc6be_2547x1532.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Inside Data Engineering with Vu Trinh" title="Inside Data Engineering with Vu Trinh" srcset="https://substackcdn.com/image/fetch/$s_!bsfg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc65e0c5-8b45-489c-987e-639d844fc6be_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!bsfg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc65e0c5-8b45-489c-987e-639d844fc6be_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!bsfg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc65e0c5-8b45-489c-987e-639d844fc6be_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!bsfg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc65e0c5-8b45-489c-987e-639d844fc6be_2547x1532.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Inside Data Engineering with Vu Trinh</figcaption></figure></div><p>Let&#8217;s dive into Inside Data Engineering:</p><h4>How would you describe Data Engineering?</h4><p>To me, data engineering is all about designing, building, and maintaining the foundation that enables efficient data storage, retrieval, and processing, ensuring that data is organized, and ready for analysis.</p><h4>How did you end up being a Data Engineer?</h4><p>Honestly, it&#8217;s a bit of an unexpected journey.</p><p>Back in 2019, I graduated in Electrical and Telecommunications but quickly realized I had no interest in working in that field. So, like many fresh grads, I went online searching for high-paying jobs, and &#8220;Data X&#8221; (whether scientist, engineer, or analyst) sounded cool.</p><p>Then, a company offered me a &#8220;data&#8221; job, even though they weren&#8217;t sure what the exact role was. (To this day, I&#8217;m not sure how I got that offer!)</p><p>But that job introduced me to Docker, Spark, HDFS, and Airflow, though mostly for building POCs. 
Eventually, the company shut down, forcing me to look for a new job.</p><p>That&#8217;s when I really took the time to research the differences between the &#8220;Data X&#8221; roles. And it finally clicked: <strong>I wanted to be a data engineer.</strong></p><h4>What's your day to day look like?</h4><p>I start my day by checking Slack&#8212;just in case someone is complaining about a data bug. (Kidding&#8230; mostly!)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!42IW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4965a94-b211-4e15-aeea-02b0a62a5a3f_2367x1368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!42IW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4965a94-b211-4e15-aeea-02b0a62a5a3f_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!42IW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4965a94-b211-4e15-aeea-02b0a62a5a3f_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!42IW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4965a94-b211-4e15-aeea-02b0a62a5a3f_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!42IW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4965a94-b211-4e15-aeea-02b0a62a5a3f_2367x1368.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!42IW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4965a94-b211-4e15-aeea-02b0a62a5a3f_2367x1368.png" width="538" height="310.7541208791209" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d4965a94-b211-4e15-aeea-02b0a62a5a3f_2367x1368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:538,&quot;bytes&quot;:157602,&quot;alt&quot;:&quot;Vu&#8217;s day to day&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/157743057?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4965a94-b211-4e15-aeea-02b0a62a5a3f_2367x1368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Vu&#8217;s day to day" title="Vu&#8217;s day to day" srcset="https://substackcdn.com/image/fetch/$s_!42IW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4965a94-b211-4e15-aeea-02b0a62a5a3f_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!42IW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4965a94-b211-4e15-aeea-02b0a62a5a3f_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!42IW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4965a94-b211-4e15-aeea-02b0a62a5a3f_2367x1368.png 1272w, 
https://substackcdn.com/image/fetch/$s_!42IW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4965a94-b211-4e15-aeea-02b0a62a5a3f_2367x1368.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Vu&#8217;s day to day</figcaption></figure></div><p>A typical day involves: </p><ul><li><p>Coding data pipelines</p></li><li><p>Fixing bugs</p></li><li><p>Helping data analysts optimize queries</p></li></ul><p>Lately, I&#8217;ve also been working on an internal data app focused on the semantic layer, which has been an interesting 
challenge.</p><h4>What are some stakeholders that you work with?</h4><p>The primary stakeholders I work closely with are Data Analysts and Project Owners.</p><h4>What kind of data do you work with?</h4><p>I work for a music game company with mobile games played globally. </p><p>Most of the data comes from: </p><ul><li><p>In-app event tracking</p></li><li><p>Third-party services that track user acquisition</p></li><li><p>Revenue collection processes</p></li></ul><p>This means handling large volumes of player interactions, engagement metrics, and monetization data to drive insights and optimize the game's performance.</p><h4>What data size do you work with?</h4><p>My day-to-day work involves a couple of TBs.</p><h4>What tech stack do you use?</h4><p>We primarily use BigQuery, dbt, and Airflow. (I&#8217;m a fan of BigQuery)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WJ8T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0a7b8d-9250-4f8b-8d02-e433bd590dc0_2367x1368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WJ8T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0a7b8d-9250-4f8b-8d02-e433bd590dc0_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!WJ8T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0a7b8d-9250-4f8b-8d02-e433bd590dc0_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!WJ8T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0a7b8d-9250-4f8b-8d02-e433bd590dc0_2367x1368.png 1272w, 
https://substackcdn.com/image/fetch/$s_!WJ8T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0a7b8d-9250-4f8b-8d02-e433bd590dc0_2367x1368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WJ8T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0a7b8d-9250-4f8b-8d02-e433bd590dc0_2367x1368.png" width="601" height="347.14354395604397" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bd0a7b8d-9250-4f8b-8d02-e433bd590dc0_2367x1368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:601,&quot;bytes&quot;:1277020,&quot;alt&quot;:&quot;Vu&#8217;s tech stack&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/157743057?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0a7b8d-9250-4f8b-8d02-e433bd590dc0_2367x1368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Vu&#8217;s tech stack" title="Vu&#8217;s tech stack" srcset="https://substackcdn.com/image/fetch/$s_!WJ8T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0a7b8d-9250-4f8b-8d02-e433bd590dc0_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!WJ8T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0a7b8d-9250-4f8b-8d02-e433bd590dc0_2367x1368.png 848w, 
https://substackcdn.com/image/fetch/$s_!WJ8T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0a7b8d-9250-4f8b-8d02-e433bd590dc0_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!WJ8T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0a7b8d-9250-4f8b-8d02-e433bd590dc0_2367x1368.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Vu&#8217;s tech stack</figcaption></figure></div><h4>What programming languages do you use?</h4><p>We are a Python 
and SQL shop.</p><h4>What tools do you leverage for GenAI?</h4><p>I primarily use ChatGPT, but I also leverage Perplexity and NotebookLM when I need deeper research.</p><h4>What is your favorite area of Data Engineering?</h4><p>My favorite thing about Data Engineering is that you never run out of things to learn. The field is constantly evolving, with new tools, architectures, and best practices emerging all the time.</p><h4>What advice would you give your past self as a beginner Data Engineer?</h4><p>Spend time understanding the real value a data engineer provides. This awareness will give you clarity on your contributions and help you avoid roles that expect you to train ML models instead. It will also guide you in prioritizing which skills to learn and which ones to skip.</p><p>Next, I&#8217;d tell myself to build strong data engineering fundamentals at all costs, especially data modeling, even if it feels tedious at first. It pays off in the long run.</p><h4>What are some challenging aspects of Data Engineering?</h4><p>There are two challenges that I can talk about:</p><ul><li><p>First, there is the sheer amount of things to learn in Data Engineering. That&#8217;s why understanding the true value a data engineer brings is crucial: it helps you focus on what&#8217;s worth mastering instead of getting lost in an endless tech stack.</p></li><li><p>Second, I observe that many companies know they need a data team but have no clear direction on how to build or utilize it effectively.</p></li></ul><h4>What is the next big thing in Data Engineering?</h4><p>I believe data lakehouses will continue to grow, with table formats and query engines becoming even more efficient. 
The adoption of object storage as the primary storage layer will keep expanding, driven by its scalability and cost-effectiveness.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-IN4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8035d758-626c-44c7-8e3d-31b89748f207_1792x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-IN4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8035d758-626c-44c7-8e3d-31b89748f207_1792x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!-IN4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8035d758-626c-44c7-8e3d-31b89748f207_1792x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!-IN4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8035d758-626c-44c7-8e3d-31b89748f207_1792x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!-IN4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8035d758-626c-44c7-8e3d-31b89748f207_1792x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-IN4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8035d758-626c-44c7-8e3d-31b89748f207_1792x1024.webp" width="619" height="353.7142857142857" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8035d758-626c-44c7-8e3d-31b89748f207_1792x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:1456,&quot;resizeWidth&quot;:619,&quot;bytes&quot;:231448,&quot;alt&quot;:&quot;Lakehouse Image using ChatGPT&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/157743057?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8035d758-626c-44c7-8e3d-31b89748f207_1792x1024.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Lakehouse Image using ChatGPT" title="Lakehouse Image using ChatGPT" srcset="https://substackcdn.com/image/fetch/$s_!-IN4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8035d758-626c-44c7-8e3d-31b89748f207_1792x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!-IN4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8035d758-626c-44c7-8e3d-31b89748f207_1792x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!-IN4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8035d758-626c-44c7-8e3d-31b89748f207_1792x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!-IN4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8035d758-626c-44c7-8e3d-31b89748f207_1792x1024.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Lakehouse Image using ChatGPT</figcaption></figure></div><h4>What are some common misconceptions about data engineering?</h4><p>A common misconception is that data engineering is always about big data. In reality, the core goal is to build a solid data foundation, ensuring data is stored, retrieved, and delivered efficiently for insights. 
</p><p>Sometimes, this means working with massive datasets, but often, you must work with <strong>"not-so-big"</strong> data.</p><div><hr></div><h3>Reach out if you would like:</h3><ul><li><p>To be a guest and share your experiences &amp; journey.</p></li><li><p>To provide feedback and suggestions on how we can improve the quality of questions.</p></li><li><p>To suggest guests for future articles.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Securely Share and Automate File Transfers with AWS Transfer Family & Terraform]]></title><description><![CDATA[Learn to deploy a secure, automated SFTP server with AWS Transfer Family & Terraform. 
Set up restricted users, enforce SSH & MFA, and leverage workflows for automation&#8212;scalable, secure, and efficient.]]></description><link>https://www.junaideffendi.com/p/securely-share-and-automate-file</link><guid isPermaLink="false">https://www.junaideffendi.com/p/securely-share-and-automate-file</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 22 Mar 2025 16:30:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gPx7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb9fc59-9d8a-48ba-8974-6000a49d62b6_2547x1532.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>At some point in their career, every Data Engineer will work with SFTP servers&#8212;either fetching files from a vendor-provided SFTP or hosting one for vendors. Managing and securing these transfers can be challenging, especially at scale. AWS Transfer Family simplifies this process by providing a fully managed, scalable SFTP solution.</p><p>Today, we&#8217;ll explore how to set up and secure an SFTP server using AWS Transfer Family. 
We&#8217;ll cover:</p><ul><li><p>Deploying an SFTP server</p></li><li><p>Setting up users with read &amp; write restricted permissions</p></li><li><p>Securing onboarding with SSH (MFA approach shared)</p></li><li><p>Leveraging Transfer Family Workflows for automation</p></li></ul><p>By the end, you&#8217;ll have a secure, scalable, and automated solution for managing file transfers efficiently.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gPx7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb9fc59-9d8a-48ba-8974-6000a49d62b6_2547x1532.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gPx7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb9fc59-9d8a-48ba-8974-6000a49d62b6_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!gPx7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb9fc59-9d8a-48ba-8974-6000a49d62b6_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!gPx7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb9fc59-9d8a-48ba-8974-6000a49d62b6_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!gPx7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb9fc59-9d8a-48ba-8974-6000a49d62b6_2547x1532.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gPx7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb9fc59-9d8a-48ba-8974-6000a49d62b6_2547x1532.png" 
width="1456" height="876" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2bb9fc59-9d8a-48ba-8974-6000a49d62b6_2547x1532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:876,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:527305,&quot;alt&quot;:&quot;Securely Share and Automate File Transfers with AWS Transfer Family&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/158720772?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb9fc59-9d8a-48ba-8974-6000a49d62b6_2547x1532.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Securely Share and Automate File Transfers with AWS Transfer Family" title="Securely Share and Automate File Transfers with AWS Transfer Family" srcset="https://substackcdn.com/image/fetch/$s_!gPx7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb9fc59-9d8a-48ba-8974-6000a49d62b6_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!gPx7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb9fc59-9d8a-48ba-8974-6000a49d62b6_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!gPx7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb9fc59-9d8a-48ba-8974-6000a49d62b6_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!gPx7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2bb9fc59-9d8a-48ba-8974-6000a49d62b6_2547x1532.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Securely Share and Automate File Transfers with AWS Transfer Family</figcaption></figure></div><div class="pullquote"><p>&#11088; Transfer Family SFTP servers can use either S3 or EFS as the storage backend.</p></div><h4>Prerequisites</h4><p>Before we dive into AWS Transfer Family, the following resources must be in place:</p><ul><li><p>An S3 bucket for storing the files</p></li><li><p>A Lambda function for downstream processing</p></li><li><p>An IAM role with access to S3 and Lambda for the Workflow</p></li><li><p>An IAM role with restricted access to S3 for the SFTP user</p><ul><li><p>Restricting at the 
prefix level, e.g. &lt;bucket&gt;/&lt;username&gt;/</p></li></ul></li></ul><h3>Creating SFTP Server </h3><p>Create a managed server with a public endpoint and workflow details.</p><pre><code>resource "aws_transfer_server" "sftp_server" {
  domain                 = "S3"
  identity_provider_type = "SERVICE_MANAGED"
  endpoint_type          = "PUBLIC"
  protocols              = ["SFTP"]

  workflow_details {
    on_upload {
      execution_role = &lt;workflow_iam_role_arn&gt;
      workflow_id    = aws_transfer_workflow.sftp_workflow.id
    }
  }
}</code></pre><ul><li><p><code>&lt;workflow_iam_role_arn&gt;</code>: Provide the ARN of the role that has S3 and Lambda access.</p></li></ul><h3>Creating SFTP User with SSH Access</h3><p>Create a user with an S3 directory mapping, so that the user is confined to their own prefix and has only the permissions required there.</p><pre><code>resource "aws_transfer_user" "sftp_user" {
  server_id = aws_transfer_server.sftp_server.id
  user_name = "&lt;username&gt;"
  role      = &lt;sftp_user_iam_role_arn&gt;
  home_directory_type = "LOGICAL"

  home_directory_mappings {
    entry  = "/test"
    target = "/&lt;bucket_name&gt;/&lt;username&gt;/"
  }
}
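
# --- Illustrative sketch (an assumption, not part of the original setup) ---
# One way to define the prefix-restricted role referenced by `role` above.
# Resource names, the role name, and the action list are hypothetical; adjust as needed.
data "aws_iam_policy_document" "sftp_user_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["transfer.amazonaws.com"] # Transfer Family assumes this role
    }
  }
}

resource "aws_iam_role" "sftp_user" {
  name               = "sftp-user-role"
  assume_role_policy = data.aws_iam_policy_document.sftp_user_assume.json
}

data "aws_iam_policy_document" "sftp_user_s3" {
  # Listing is allowed only inside the user's prefix
  statement {
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::&lt;bucket_name&gt;"]
    condition {
      test     = "StringLike"
      variable = "s3:prefix"
      values   = ["&lt;username&gt;", "&lt;username&gt;/*"]
    }
  }

  # Reads and writes are allowed only on objects under that prefix
  statement {
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::&lt;bucket_name&gt;/&lt;username&gt;/*"]
  }
}

resource "aws_iam_role_policy" "sftp_user_s3" {
  name   = "sftp-user-s3-access"
  role   = aws_iam_role.sftp_user.id
  policy = data.aws_iam_policy_document.sftp_user_s3.json
}
# With this in place, `role` above could be set to aws_iam_role.sftp_user.arn.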

resource "aws_transfer_ssh_key" "sftp_ssh" {
  server_id = aws_transfer_server.sftp_server.id
  user_name = aws_transfer_user.sftp_user.user_name
  body      = &lt;ssh_public_key&gt;
}</code></pre><ul><li><p><code>&lt;username&gt;</code>: Provide the user name.</p></li><li><p><code>&lt;sftp_user_iam_role_arn&gt;</code>: Provide the ARN of the role that has restricted S3 access.</p></li><li><p><code>&lt;bucket_name&gt;</code>: Provide the bucket name.</p></li><li><p><code>&lt;ssh_public_key&gt;</code>: Provide the public key (generate one using this <a href="https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent?platform=mac">guide</a>).</p><ul><li><p>In practice, the vendor can generate the key pair on their end and send over only the public key.</p></li></ul></li></ul><p>For advanced authentication mechanisms such as multi-factor authentication (MFA), you can use a Lambda function, as recommended by AWS. </p><blockquote><p>&#128214; Recommended Reading: <a href="https://aws.amazon.com/blogs/storage/implement-multi-factor-authentication-based-managed-file-transfer-using-aws-transfer-family-and-aws-secrets-manager/">Implement multi-factor authentication based managed file transfer using AWS Transfer Family and AWS Secrets Manager</a></p></blockquote><h3>Test connection</h3><p>With SSH key-based authentication, connecting to the SFTP server is straightforward. Transfer Family accepts SFTP sessions only (no interactive shell), so connect with the <code>sftp</code> client:</p><pre><code>sftp -i &lt;ssh_private_key&gt; &lt;username&gt;@&lt;host_address&gt;</code></pre><ul><li><p><code>&lt;ssh_private_key&gt;</code>: Provide the path to the private key that matches the public key from the previous section.</p></li><li><p><code>&lt;username&gt;</code>: Provide the user name from the previous section.</p></li><li><p><code>&lt;host_address&gt;</code>: Provide the server endpoint, which can be found in the AWS Transfer Family console.</p></li></ul><p>For non-technical users, Transfer Family offers a managed web-based interface for file transfers. 
Explore more <a href="https://aws.amazon.com/aws-transfer-family/web-apps/">here</a>.</p><h3>Setting up Workflows to trigger downstream jobs</h3><p>Workflows are a preconfigured series of steps that are part of the Transfer Family ecosystem. They let you run several fully managed operations on files as soon as they arrive over SFTP.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cpQL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023ecd76-85c5-4322-b13e-4970623df188_2367x1368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cpQL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023ecd76-85c5-4322-b13e-4970623df188_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!cpQL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023ecd76-85c5-4322-b13e-4970623df188_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!cpQL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023ecd76-85c5-4322-b13e-4970623df188_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!cpQL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023ecd76-85c5-4322-b13e-4970623df188_2367x1368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cpQL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023ecd76-85c5-4322-b13e-4970623df188_2367x1368.png" width="617" height="356.3853021978022" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/023ecd76-85c5-4322-b13e-4970623df188_2367x1368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:617,&quot;bytes&quot;:130813,&quot;alt&quot;:&quot;Predefined steps for workflow&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/158720772?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023ecd76-85c5-4322-b13e-4970623df188_2367x1368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Predefined steps for workflow" title="Predefined steps for workflow" srcset="https://substackcdn.com/image/fetch/$s_!cpQL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023ecd76-85c5-4322-b13e-4970623df188_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!cpQL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023ecd76-85c5-4322-b13e-4970623df188_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!cpQL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023ecd76-85c5-4322-b13e-4970623df188_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!cpQL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F023ecd76-85c5-4322-b13e-4970623df188_2367x1368.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Predefined steps for workflow</figcaption></figure></div><p>In our case, the workflow has three steps:</p><ul><li><p>Copy the file to an internal S3 bucket with limited access. This is recommended because the SFTP-backed bucket is externally reachable through SFTP.</p></li><li><p>Tag the S3 objects according to your use case.</p></li><li><p>Trigger a Lambda function for downstream processing and custom logic, e.g. sending an email notification.</p></li></ul><blockquote><p>&#128214; Recommended Reading: <a href="https://docs.aws.amazon.com/transfer/latest/userguide/custom-step-details.html#example-workflow-lambda">How to return the response to workflow from Lambda Function</a></p></blockquote><pre><code>resource "aws_transfer_workflow" "sftp_workflow" {

  steps {
    type = "COPY"

    copy_step_details {
      destination_file_location {
        s3_file_location {
          bucket = "&lt;bucket_name&gt;"
          key    = "&lt;prefix_path&gt;"
        }
      }
      overwrite_existing = "TRUE"
    }
  }

  steps {
    type = "TAG"

    tag_step_details {
      tags {
        key   = "&lt;key&gt;"
        value = "&lt;value&gt;"
      }
    }
  }

  steps {
    type = "CUSTOM"
    
    # CUSTOM steps invoke a Lambda function and are configured via custom_step_details
    custom_step_details {
      target = "&lt;lambda_function_arn&gt;"
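      # Hedged sketch (an assumption, not part of the original setup): each step reads
      # the previous step's output by default. To hand the Lambda the originally
      # uploaded file instead, the input could be overridden as shown below. The
      # doubled "$$" keeps Terraform from treating the value as its own interpolation.
      # source_file_location = "$${original.file}"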
    }
  }
}</code></pre><ul><li><p><code>&lt;bucket_name&gt;</code>: Provide the destination bucket name</p></li><li><p><code>&lt;prefix_path&gt;</code>: Provide the destination prefix path for the object</p></li><li><p><code>&lt;key&gt; &amp; &lt;value&gt;</code>: Provide the key and value for tagging</p></li><li><p><code>&lt;lambda_function_arn&gt;</code>: Provide the ARN of the Lambda function</p></li></ul><p>The first step references the originally uploaded file, while each subsequent step automatically references the output of the previous step; this can be overridden if needed.</p><p>Alternatively, you can skip the workflow setup by handling the copy and tagging directly in your Lambda function, triggering it via an SNS notification when a file lands in S3. However, this approach falls outside Transfer Family.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.junaideffendi.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading, Share &#128257; and Subscribe &#128276;for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p>I hope you found this article useful. 
With it, you&#8217;ve now learned how to develop a secure, scalable, and automated solution for efficiently managing file transfers in the AWS cloud environment.</p><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[Meta Data Tech Stack]]></title><description><![CDATA[Learn what data tech stack Meta leverages to process and store massive amount of data every day in their data centers.]]></description><link>https://www.junaideffendi.com/p/meta-data-tech-stack</link><guid isPermaLink="false">https://www.junaideffendi.com/p/meta-data-tech-stack</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 08 Mar 2025 17:30:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!XPdX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4dc4db7-ec65-47e7-a64d-dd7e758532f3_2547x1631.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Meta is one of the largest tech companies, relying heavily on data to make informed decisions since its early days. 
It hosts exabyte-scale data in its warehouse while processing terabytes per second from millions of producers.</p><p>Meta has open-sourced several tools like Hive and Presto, while others remain internal&#8212;some of which we will discuss in today&#8217;s article.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XPdX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4dc4db7-ec65-47e7-a64d-dd7e758532f3_2547x1631.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XPdX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4dc4db7-ec65-47e7-a64d-dd7e758532f3_2547x1631.png 424w, https://substackcdn.com/image/fetch/$s_!XPdX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4dc4db7-ec65-47e7-a64d-dd7e758532f3_2547x1631.png 848w, https://substackcdn.com/image/fetch/$s_!XPdX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4dc4db7-ec65-47e7-a64d-dd7e758532f3_2547x1631.png 1272w, https://substackcdn.com/image/fetch/$s_!XPdX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4dc4db7-ec65-47e7-a64d-dd7e758532f3_2547x1631.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XPdX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4dc4db7-ec65-47e7-a64d-dd7e758532f3_2547x1631.png" width="1456" height="932" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4dc4db7-ec65-47e7-a64d-dd7e758532f3_2547x1631.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:932,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1568651,&quot;alt&quot;:&quot;Meta Data Tech Stack&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Meta Data Tech Stack" title="Meta Data Tech Stack" srcset="https://substackcdn.com/image/fetch/$s_!XPdX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4dc4db7-ec65-47e7-a64d-dd7e758532f3_2547x1631.png 424w, https://substackcdn.com/image/fetch/$s_!XPdX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4dc4db7-ec65-47e7-a64d-dd7e758532f3_2547x1631.png 848w, https://substackcdn.com/image/fetch/$s_!XPdX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4dc4db7-ec65-47e7-a64d-dd7e758532f3_2547x1631.png 1272w, https://substackcdn.com/image/fetch/$s_!XPdX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4dc4db7-ec65-47e7-a64d-dd7e758532f3_2547x1631.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Meta Data Tech Stack (Icon source: flaticon.com)</figcaption></figure></div><blockquote><p>Content is based on multiple sources including Meta Engineering Blog, Meta Research Papers, third party articles. 
You will find references as you read.</p></blockquote><h3>Platform</h3><h4>On Premise</h4><p>Since its inception in 2004, Meta has operated out of its on-premise data centers spread across the globe, including the United States, Europe, and Asia-Pacific.</p><blockquote><p>&#128161;Meta has 35 data centers as per this <a href="https://www.datacentermap.com/c/meta/">source</a>.</p></blockquote><h3>Storage</h3><h4>Hive</h4><p>Hive was created at Meta and open-sourced in 2008. It is an exabyte-scale data warehouse storing millions of tables across multiple data centers. Hive leverages the <a href="https://orc.apache.org/">ORC</a> columnar format to efficiently store massive datasets.</p><h4>Scuba</h4><p>Scuba is an in-house tool designed for real-time data analysis. It provides multiple ways to access data at high speed through a UI or programmatic interfaces. Scuba excels at handling ad-hoc queries, with most responses returning in under a second.</p><blockquote><p>&#128161;Scuba is like Apache Druid at a very high level, read more <a href="https://imply.io/blog/reflecting-on-druid/">here</a>.</p></blockquote><h4>Laser</h4><p>Laser is a high-throughput, low-latency key-value storage service built on RocksDB. It reads data in real-time from Scribe streams and daily from Hive tables. It powers the Facebook product and integrates with apps like Puma and Stylus.</p><blockquote><p>&#128214; Recommended reading: <a href="https://research.facebook.com/publications/realtime-data-processing-at-facebook/">Realtime Data Processing at Facebook</a></p></blockquote><h3>Processing</h3><h4>Scribe</h4><p>Scribe processes over 2.5 TB per second of input from millions of producers and outputs 7+ TB per second to hundreds of thousands of consumers, primarily sending data to Scuba and Hive. 
Apache Kafka is an open-source alternative to Scribe.</p><blockquote><p>&#128214; Recommended Reading: <a href="https://engineering.fb.com/2019/10/07/core-infra/scribe/">Scribe: Transporting petabytes per hour via a distributed, buffered queueing system</a></p></blockquote><h4>Puma / Swift / Stylus</h4><p>Meta Engineering has built three in-house tools to read and write data in real time to and from Scribe.</p><p>The following diagram shows where these components sit in the system.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4T3p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F011746f3-9057-417d-b00c-14e5acbd19c1_600x316.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4T3p!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F011746f3-9057-417d-b00c-14e5acbd19c1_600x316.png 424w, https://substackcdn.com/image/fetch/$s_!4T3p!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F011746f3-9057-417d-b00c-14e5acbd19c1_600x316.png 848w, https://substackcdn.com/image/fetch/$s_!4T3p!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F011746f3-9057-417d-b00c-14e5acbd19c1_600x316.png 1272w, https://substackcdn.com/image/fetch/$s_!4T3p!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F011746f3-9057-417d-b00c-14e5acbd19c1_600x316.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!4T3p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F011746f3-9057-417d-b00c-14e5acbd19c1_600x316.png" width="600" height="316" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/011746f3-9057-417d-b00c-14e5acbd19c1_600x316.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:316,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:34036,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4T3p!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F011746f3-9057-417d-b00c-14e5acbd19c1_600x316.png 424w, https://substackcdn.com/image/fetch/$s_!4T3p!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F011746f3-9057-417d-b00c-14e5acbd19c1_600x316.png 848w, https://substackcdn.com/image/fetch/$s_!4T3p!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F011746f3-9057-417d-b00c-14e5acbd19c1_600x316.png 1272w, https://substackcdn.com/image/fetch/$s_!4T3p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F011746f3-9057-417d-b00c-14e5acbd19c1_600x316.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image Source: https://research.facebook.com/publications/realtime-data-processing-at-facebook/</figcaption></figure></div><p><strong>Puma:</strong></p><ul><li><p>Puma is a stream processing system where apps are written in a SQL-like language with Java UDFs. </p></li><li><p>Puma processes Scribe streams with a few seconds' delay, outputting to another stream, real-time processor, or data store. </p></li><li><p>It's optimized for compiled queries, not ad-hoc analysis.</p></li></ul><p><strong>Swift:</strong></p><ul><li><p>Swift is a basic stream processing engine providing checkpointing for Scribe. </p></li><li><p>It offers a simple API to read streams with checkpoints, allowing apps to restart from the latest checkpoint. 
</p></li><li><p>Swift is ideal for low-throughput, stateless processing, with client apps often written in scripting languages like Python.</p></li></ul><p><strong>Stylus:</strong></p><ul><li><p>Stylus is a low-level stream processing framework written in C++.</p></li><li><p>It supports both stateless and stateful processors. </p></li><li><p>Its processing API is similar to other procedural stream processing systems.</p></li></ul><h4>Presto</h4><p>Presto, open sourced by Meta in 2013, is a successful big data product widely used for SQL-based data processing. At Meta, it enables fast data access, ranging from seconds to minutes, for billions of records.</p><div class="pullquote"><p>&#11088; If you are interested in learning how big data tech evolved over the years then read <a href="https://www.junaideffendi.com/p/data-processing-in-21st-century?r=cqjft&amp;utm_campaign=post&amp;utm_medium=web&amp;showWelcomeOnShare=false">Data Processing in 21st Century</a>.</p></div><h4>Spark</h4><p>Spark is another query engine offering an alternative to Presto. Unlike Presto, it provides greater flexibility, allowing users to leverage Java, Scala, and Python APIs for complex transformations.</p><h4>Dataswarm</h4><p>Dataswarm is an in-house orchestration tool similar to Apache Airflow, built in Python. It enables job orchestration and scheduling in a DAG-based pipeline and supports Presto and Spark jobs.</p><blockquote><p>&#128249; Recommended video: <a href="https://www.youtube.com/watch?v=M0VCbhfQ3HQ">Dataswarm</a></p></blockquote><h3>Dashboard</h3><h4>UniDash</h4><p>UniDash is an in-house visualization tool for creating dashboards, accessible via a web interface or Python API. 
It is powered by Presto and includes an additional caching layer through RaptorX.</p><blockquote><p>&#128214; Read More: <strong><a href="https://medium.com/@AnalyticsAtMeta/data-engineering-at-meta-high-level-overview-of-the-internal-tech-stack-a200460a44fe">High-Level Overview of the internal tech stack</a></strong><a href="https://medium.com/@AnalyticsAtMeta/data-engineering-at-meta-high-level-overview-of-the-internal-tech-stack-a200460a44fe"> </a></p></blockquote><div><hr></div><p><strong>Related Content: <a href="https://www.junaideffendi.com/t/tech-stack">Tech Stack Series</a></strong></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;f9395de1-8383-4bce-8d73-5502522de193&quot;,&quot;caption&quot;:&quot;Pinterest, a tech company, processes enormous amounts of data daily. According to a 2014 article, this was around 20TB. Over a decade later, that number has significantly increased, with its S3 data lake now reportedly reaching exabyte scale, as mentioned in this&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Pinterest Data Tech Stack&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:21393641,&quot;name&quot;:&quot;Junaid Effendi&quot;,&quot;bio&quot;:&quot;I love to share my learnings and experiences about Software and Data 
Engineering.&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06559f3-ee33-46f8-bfa0-50964179f235_1200x1200.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-01-18T17:30:54.398Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F677f8d7f-e946-4295-8617-62ee18e9091d_2547x1477.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.junaideffendi.com/p/pinterest-data-tech-stack&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:153465107,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:11,&quot;comment_count&quot;:2,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Junaid Effendi | Sharing knowledge for Engineers&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6aa39096-d454-439f-98b5-baea84b501aa_800x800.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;8489af6a-1e3a-4284-9e19-c9799ff0e049&quot;,&quot;caption&quot;:&quot;Netflix handle massive scale, from event data in streams to data at rest in the warehouse. Netflix data stack is pretty solid, mostly built on top of open source solutions. 
The data stack processes trillions of data points everyday while the scale of data at rest is in hundreds of Petabytes based on&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Netflix Data Tech Stack&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:21393641,&quot;name&quot;:&quot;Junaid Effendi&quot;,&quot;bio&quot;:&quot;I love to share my learnings and experiences about Software and Data Engineering.&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06559f3-ee33-46f8-bfa0-50964179f235_1200x1200.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-05-08T16:31:10.943Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2ea4efe-5128-4330-804b-a54c2f561e08_2547x1477.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.junaideffendi.com/p/netflix-data-tech-stack&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:144081570,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:19,&quot;comment_count&quot;:4,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Junaid Effendi | Sharing knowledge for Engineers&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6aa39096-d454-439f-98b5-baea84b501aa_800x800.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div>
<p>&#128172; Let me know in the comments if I missed something.</p>]]></content:encoded></item><item><title><![CDATA[Inside Data Engineering with Yordan Ivanov]]></title><description><![CDATA[Join Yordan Ivanov as he demystifies data engineering, clarifying misconceptions about the role.]]></description><link>https://www.junaideffendi.com/p/inside-data-engineering-with-yordan</link><guid isPermaLink="false">https://www.junaideffendi.com/p/inside-data-engineering-with-yordan</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Wed, 26 Feb 2025 17:30:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!W-1U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba009be-8504-4e29-b911-a450c97cd57e_2547x1532.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today, I&#8217;m kicking off a new series where we dive into the world of Data Engineering, straight from the perspective of experienced data engineers.</p><p>This series follows a Q&amp;A format, featuring professionals from around the world.</p><h3><strong>What to Expect:</strong></h3><ul><li><p><strong>Real-world insights</strong> &#8211; Learn what data engineers 
actually do on a daily basis.</p></li><li><p><strong>Career advice</strong> &#8211; Get guidance on skills, tools, and pathways to break into the field.</p></li><li><p><strong>Industry trends</strong> &#8211; Stay updated on evolving technologies and best practices.</p></li><li><p><strong>Challenges </strong>&#8211; Discover what real-world challenges engineers face.</p></li><li><p><strong>Common misconceptions</strong> &#8211; Debunk myths about data engineering and clarify its role.</p></li><li><p><strong>Inspiration from experts</strong> &#8211; Hear personal stories from seasoned professionals.</p></li></ul><div class="pullquote"><p>&#11088; If you're curious about data engineering or considering it as a career, this series is for you!</p></div><p>Today, we have <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Yordan Ivanov&quot;,&quot;id&quot;:40945395,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/76f52904-5428-4d97-82a5-3faa722b8d46_2234x1253.jpeg&quot;,&quot;uuid&quot;:&quot;e37117ae-cbf9-4f9b-b7fa-0d07bf890851&quot;}" data-component-name="MentionToDOM"></span>, who has been in the Data Engineering space for many years. 
He also writes about Data Engineering at <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Data Gibberish&quot;,&quot;id&quot;:828483,&quot;type&quot;:&quot;pub&quot;,&quot;url&quot;:&quot;https://open.substack.com/pub/datagibberish&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c7990f52-6d5e-41f3-8089-580ce2167837_500x500.png&quot;,&quot;uuid&quot;:&quot;cc17f502-496b-40c1-9fa6-6bace7e67bd7&quot;}" data-component-name="MentionToDOM"></span>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!W-1U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba009be-8504-4e29-b911-a450c97cd57e_2547x1532.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!W-1U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba009be-8504-4e29-b911-a450c97cd57e_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!W-1U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba009be-8504-4e29-b911-a450c97cd57e_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!W-1U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba009be-8504-4e29-b911-a450c97cd57e_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!W-1U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba009be-8504-4e29-b911-a450c97cd57e_2547x1532.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!W-1U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba009be-8504-4e29-b911-a450c97cd57e_2547x1532.png" width="1456" height="876" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ba009be-8504-4e29-b911-a450c97cd57e_2547x1532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:876,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3101605,&quot;alt&quot;:&quot;Inside Data Engineering with Yordan Ivanov&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/157683503?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba009be-8504-4e29-b911-a450c97cd57e_2547x1532.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Inside Data Engineering with Yordan Ivanov" title="Inside Data Engineering with Yordan Ivanov" srcset="https://substackcdn.com/image/fetch/$s_!W-1U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba009be-8504-4e29-b911-a450c97cd57e_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!W-1U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba009be-8504-4e29-b911-a450c97cd57e_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!W-1U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba009be-8504-4e29-b911-a450c97cd57e_2547x1532.png 1272w, 
https://substackcdn.com/image/fetch/$s_!W-1U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba009be-8504-4e29-b911-a450c97cd57e_2547x1532.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Inside Data Engineering with Yordan Ivanov</figcaption></figure></div><p>Let&#8217;s dive into Inside Data Engineering:</p><h4>How would you describe Data Engineering?</h4><p>Data engineering is like any engineering discipline. It&#8217;s about designing, building, and maintaining systems. 
Instead of roads or bridges, we build the infrastructure that moves, processes, and organises data so businesses can make informed decisions. It&#8217;s a mix of software engineering, systems architecture, and business understanding.</p><h4>How did you end up being a Data Engineer?</h4><p>I started as a software engineer, but I was always drawn to the bigger picture&#8212;how systems fit together and how businesses use technology to drive decisions. Over time, I moved into data engineering because it combined the technical challenge of software with the strategic impact of data. The truth is that I spent more than 1.5 years in data engineering before realising I am not a software engineer anymore.</p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:148079226,&quot;url&quot;:&quot;https://www.junaideffendi.com/p/transition-software-engineer-to-data&quot;,&quot;publication_id&quot;:2256445,&quot;publication_name&quot;:&quot;Junaid Effendi | Sharing knowledge for Engineers&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6aa39096-d454-439f-98b5-baea84b501aa_800x800.png&quot;,&quot;title&quot;:&quot;Transition: Software Engineer to Data Engineer&quot;,&quot;truncated_body_text&quot;:&quot;Last month, I shared article on Types of Data Engineers with the goal to start a series that will cover the transition from various roles to Data Engineering. 
Today I am kicking off the series with Software to Data Engineering Transition.&quot;,&quot;date&quot;:&quot;2024-08-31T16:30:43.327Z&quot;,&quot;like_count&quot;:31,&quot;comment_count&quot;:4,&quot;bylines&quot;:[{&quot;id&quot;:21393641,&quot;name&quot;:&quot;Junaid Effendi&quot;,&quot;handle&quot;:&quot;junaideffendi&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06559f3-ee33-46f8-bfa0-50964179f235_1200x1200.png&quot;,&quot;bio&quot;:&quot;I love to share my learnings and experiences about Software and Data Engineering.&quot;,&quot;profile_set_up_at&quot;:&quot;2022-05-25T22:34:01.768Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:2273688,&quot;user_id&quot;:21393641,&quot;publication_id&quot;:2256445,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:2256445,&quot;name&quot;:&quot;Junaid Effendi | Sharing knowledge for Engineers&quot;,&quot;subdomain&quot;:&quot;junaideffendi&quot;,&quot;custom_domain&quot;:&quot;www.junaideffendi.com&quot;,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Covering tech, career, data, growth experiences from my journey.&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6aa39096-d454-439f-98b5-baea84b501aa_800x800.png&quot;,&quot;author_id&quot;:21393641,&quot;theme_var_background_pop&quot;:&quot;#8AE1A2&quot;,&quot;created_at&quot;:&quot;2024-01-13T20:16:55.701Z&quot;,&quot;email_from_name&quot;:&quot;Junaid Effendi&quot;,&quot;copyright&quot;:&quot;Junaid 
Effendi&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;is_personal_mode&quot;:false}}],&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://www.junaideffendi.com/p/transition-software-engineer-to-data?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!iYad!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6aa39096-d454-439f-98b5-baea84b501aa_800x800.png" loading="lazy"><span class="embedded-post-publication-name">Junaid Effendi | Sharing knowledge for Engineers</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">Transition: Software Engineer to Data Engineer</div></div><div class="embedded-post-body">Last month, I shared article on Types of Data Engineers with the goal to start a series that will cover the transition from various roles to Data Engineering. Today I am kicking off the series with Software to Data Engineering Transition&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">2 years ago &#183; 31 likes &#183; 4 comments &#183; Junaid Effendi</div></a></div><h4>What excites you about Data Engineering?</h4><p>The mix of problem-solving, business impact, and continuous learning. Data engineering isn&#8217;t just about writing code&#8212;it&#8217;s about designing systems that power decision-making. 
I enjoy bridging the gap between tech and business, figuring out the best architecture, and helping companies make better use of their data.</p><h4>What does your day-to-day look like?</h4><p>A mix of technical work, problem-solving, and collaboration. I spend time designing and reviewing architecture, writing or reviewing code, and troubleshooting issues. But I also work closely with stakeholders&#8212;understanding their needs, aligning on priorities, and making sure our solutions fit the business. As a leader, I also focus on strategy, mentoring, and making sure the team is working effectively.</p><p>Unfortunately, these days I rarely spend more than 20% of my time writing code myself.</p><h4>What are some stakeholders that you work with?</h4><p>Analysts, data scientists, finance teams, and executives. A big part of data engineering is not just building pipelines but ensuring that the right people get the right data in a way that makes sense for their needs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yrzS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421ec347-219f-47a4-92fd-20a688187084_2367x1368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yrzS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421ec347-219f-47a4-92fd-20a688187084_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!yrzS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421ec347-219f-47a4-92fd-20a688187084_2367x1368.png 848w, 
https://substackcdn.com/image/fetch/$s_!yrzS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421ec347-219f-47a4-92fd-20a688187084_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!yrzS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421ec347-219f-47a4-92fd-20a688187084_2367x1368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yrzS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421ec347-219f-47a4-92fd-20a688187084_2367x1368.png" width="513" height="296.3138736263736" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/421ec347-219f-47a4-92fd-20a688187084_2367x1368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:513,&quot;bytes&quot;:150468,&quot;alt&quot;:&quot;Yordan&#8217;s stakeholders&quot;,&quot;title&quot;:&quot;Yordan&#8217;s stakeholders&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/157683503?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421ec347-219f-47a4-92fd-20a688187084_2367x1368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Yordan&#8217;s stakeholders" title="Yordan&#8217;s stakeholders" srcset="https://substackcdn.com/image/fetch/$s_!yrzS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421ec347-219f-47a4-92fd-20a688187084_2367x1368.png 424w, 
https://substackcdn.com/image/fetch/$s_!yrzS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421ec347-219f-47a4-92fd-20a688187084_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!yrzS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421ec347-219f-47a4-92fd-20a688187084_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!yrzS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F421ec347-219f-47a4-92fd-20a688187084_2367x1368.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Yordan&#8217;s stakeholders</figcaption></figure></div>
<h4>What kind of projects do you work on?</h4><p>I work on building and optimising data infrastructure, ensuring reliable data pipelines, and designing systems that support analytics and reporting. This includes integrating data from different sources, managing transformations, and improving data accessibility for business teams.</p><p>Some of the big projects I am involved in at the moment are:</p><ul><li><p>Helping with the migration and data remodeling of our billing tooling. It means more flexibility for the business to build packages that work better for our customers.</p></li><li><p>Restructuring the internal tooling and processes for the Data and Analytics team. With this one, I aim for simplicity and better alignment within the team.</p></li></ul><blockquote><p>&#128214; Related Reading: <a href="https://www.junaideffendi.com/p/large-scale-migration-best-practices?r=cqjft&amp;utm_campaign=post&amp;utm_medium=web&amp;showWelcomeOnShare=false">Large Scale Migration Best Practices</a></p></blockquote><h4>What kind of data do you work with?</h4><p>Mostly structured and semi-structured data&#8212;things like financial transactions, user behaviour, and operational data. Since I work at a fintech company, the data is often tied to business performance, financial metrics, and customer interactions.</p><p>Aside from that, my side gig allows me to work on different niches like crypto and marketing.</p><h4>What size of data do you work with?</h4><p>It varies, but typically in the terabytes range. Overall, my team and I are responsible for over <code>100 terabytes</code> of data in Snowflake. But it&#8217;s not about the size. 
The focus is more on efficiency, quality, and usability rather than just handling extreme scale.</p><h4>What tech stack do you use?</h4><p>Meltano for data ingestion, dbt for transformations, Airflow for orchestration, Synq for data quality, Looker for data visualisation, and Snowflake as the data warehouse. I also use Python, SQL, and Git for development. My stack is focused on modern, scalable, and maintainable data engineering practices.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1VAv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c0f5d67-346a-44c8-98a2-d7c793169780_2367x1368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1VAv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c0f5d67-346a-44c8-98a2-d7c793169780_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!1VAv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c0f5d67-346a-44c8-98a2-d7c793169780_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!1VAv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c0f5d67-346a-44c8-98a2-d7c793169780_2367x1368.png 1272w, https://substackcdn.com/image/fetch/$s_!1VAv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c0f5d67-346a-44c8-98a2-d7c793169780_2367x1368.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!1VAv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c0f5d67-346a-44c8-98a2-d7c793169780_2367x1368.png" width="557" height="321.7287087912088" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c0f5d67-346a-44c8-98a2-d7c793169780_2367x1368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:557,&quot;bytes&quot;:1447335,&quot;alt&quot;:&quot;Yordan&#8217;s Tech Stack&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/157683503?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c0f5d67-346a-44c8-98a2-d7c793169780_2367x1368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Yordan&#8217;s Tech Stack" title="Yordan&#8217;s Tech Stack" srcset="https://substackcdn.com/image/fetch/$s_!1VAv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c0f5d67-346a-44c8-98a2-d7c793169780_2367x1368.png 424w, https://substackcdn.com/image/fetch/$s_!1VAv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c0f5d67-346a-44c8-98a2-d7c793169780_2367x1368.png 848w, https://substackcdn.com/image/fetch/$s_!1VAv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c0f5d67-346a-44c8-98a2-d7c793169780_2367x1368.png 1272w, 
https://substackcdn.com/image/fetch/$s_!1VAv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c0f5d67-346a-44c8-98a2-d7c793169780_2367x1368.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Yordan&#8217;s Tech Stack</figcaption></figure></div><h4>What tools do you leverage for GenAI?</h4><p>I use most of them. Perplexity was my default search engine for a long time. 
I also have some custom GPTs on ChatGPT and use Claude for coding.</p><p>Two tools I highly recommend are Ollama and LM Studio for anybody who wants to run models locally and not expose any data.</p><h4>What is your favourite area of Data Engineering?</h4><p>Architecture and systems design. I enjoy thinking about how different components fit together and how to build scalable, maintainable systems. I also like the intersection of data engineering and business&#8212;understanding how data can provide value rather than just focusing on the technical side.</p><h4>How can Data Engineering benefit from GenAI?</h4><p>The sky's the limit. As a Head of Data Engineering, I use AI every day to:</p><ul><li><p>Write code</p></li><li><p>Draw architecture diagrams</p></li><li><p>Brainstorm architecture ideas</p></li><li><p>Summarise meetings and whitepapers</p></li><li><p>Write technical and non-technical documentation</p></li></ul><div class="pullquote"><p>Today&#8217;s post is brought to you by <a href="https://dub.sh/A6vAbUl">Multiplayer</a>:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://dub.sh/A6vAbUl" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dngB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png 424w, https://substackcdn.com/image/fetch/$s_!dngB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png 848w, https://substackcdn.com/image/fetch/$s_!dngB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png 1272w, 
https://substackcdn.com/image/fetch/$s_!dngB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dngB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png" width="481" height="180.375" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:300,&quot;width&quot;:800,&quot;resizeWidth&quot;:481,&quot;bytes&quot;:39352,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://dub.sh/A6vAbUl&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!dngB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png 424w, https://substackcdn.com/image/fetch/$s_!dngB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png 848w, https://substackcdn.com/image/fetch/$s_!dngB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png 1272w, 
https://substackcdn.com/image/fetch/$s_!dngB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><em>With Multiplayer you can build better distributed systems: automatically document your system architecture, effortlessly debug with deep session replays, and collaboratively design your system. Why are you still wasting time drawing manual diagrams?</em></p></div><h4>What is the next big thing in Data Engineering?</h4><p>AI will automate repetitive tasks, allowing engineers to focus on solving complex problems and designing efficient systems. However, this shift will raise the skill bar for entry-level roles, demanding stronger capabilities from aspiring engineers.</p><p>At the same time, companies are adopting flexible hiring models, favouring consultants and freelancers for their expertise. This creates new opportunities for engineers who can adapt quickly and deliver value.</p><h4>What advice would you give your past self as a beginner Data Engineer?</h4><p>Focus on the fundamentals first&#8212;SQL, Python, and understanding how data moves. But don&#8217;t just stay technical. Learn how businesses use data and how to communicate with stakeholders.</p><p>Also, don&#8217;t wait too long to build a personal brand. 
Data engineers often get stuck in the background, but being visible can open a lot of doors.</p><blockquote><p>&#128161;What was enough 5 years ago won&#8217;t cut it today.</p></blockquote><h4>What are some challenging aspects of Data Engineering?</h4><ul><li><p><strong>Aligning with business needs</strong> &#8211; Data engineers often get stuck building things without clear business impact.</p></li><li><p><strong>Managing complexity</strong> &#8211; As systems grow, keeping pipelines reliable and maintainable is a constant challenge.</p></li><li><p><strong>Tech hype vs. reality</strong> &#8211; Many tools promise to solve all problems but introduce new complexities. Knowing what to adopt and when is crucial.</p></li><li><p><strong>Trust and data quality</strong> &#8211; Data is only useful if people trust it. Ensuring accuracy and explaining how data is derived is a key challenge.</p></li></ul><h4>What are some common misconceptions about data engineering?</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!z16y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0dfca39-8eb8-4d92-b73d-738a1f10021f_1792x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!z16y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0dfca39-8eb8-4d92-b73d-738a1f10021f_1792x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!z16y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0dfca39-8eb8-4d92-b73d-738a1f10021f_1792x1024.webp 848w, 
https://substackcdn.com/image/fetch/$s_!z16y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0dfca39-8eb8-4d92-b73d-738a1f10021f_1792x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!z16y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0dfca39-8eb8-4d92-b73d-738a1f10021f_1792x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!z16y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0dfca39-8eb8-4d92-b73d-738a1f10021f_1792x1024.webp" width="622" height="355.42857142857144" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c0dfca39-8eb8-4d92-b73d-738a1f10021f_1792x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:1456,&quot;resizeWidth&quot;:622,&quot;bytes&quot;:236414,&quot;alt&quot;:&quot;Common misconceptions about data engineering&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.junaideffendi.com/i/157683503?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0dfca39-8eb8-4d92-b73d-738a1f10021f_1792x1024.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Common misconceptions about data engineering" title="Common misconceptions about data engineering" srcset="https://substackcdn.com/image/fetch/$s_!z16y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0dfca39-8eb8-4d92-b73d-738a1f10021f_1792x1024.webp 424w, 
https://substackcdn.com/image/fetch/$s_!z16y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0dfca39-8eb8-4d92-b73d-738a1f10021f_1792x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!z16y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0dfca39-8eb8-4d92-b73d-738a1f10021f_1792x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!z16y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0dfca39-8eb8-4d92-b73d-738a1f10021f_1792x1024.webp 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">Common misconceptions about data engineering</figcaption></figure></div><ul><li><p><strong>It&#8217;s just ETL</strong> &#8211; Data engineering is much more than moving data. It&#8217;s about designing systems, managing infrastructure, and ensuring data is usable.</p></li><li><p><strong>Self-service BI will replace data engineers</strong> &#8211; Most stakeholders need guidance, and self-service tools rarely eliminate the need for proper data engineering.</p></li><li><p><strong>Big data is everywhere</strong> &#8211; Not every company needs a petabyte-scale solution. Many real-world problems can be solved with simpler, well-designed systems.</p></li><li><p><strong>More tools = better data</strong> &#8211; Tools don&#8217;t fix bad processes. Good data engineering is about understanding business needs, not just stacking technologies.</p></li></ul><div><hr></div><h3>Reach out if you like:</h3><ul><li><p>To be the guest and share your experiences &amp; journey.</p></li><li><p>To provide feedback and suggestions on how we can improve the quality of 
questions.</p></li><li><p>To suggest guests for future articles.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[7 Ways to Share Knowledge for Continuous Learning]]></title><description><![CDATA[This article outlines seven effective approaches to sharing knowledge, helping you become a better engineer and a team player while fostering continuous learning and collaboration within teams.]]></description><link>https://www.junaideffendi.com/p/7-ways-to-share-knowledge-for-continuous</link><guid isPermaLink="false">https://www.junaideffendi.com/p/7-ways-to-share-knowledge-for-continuous</guid><dc:creator><![CDATA[Junaid Effendi]]></dc:creator><pubDate>Sat, 15 Feb 2025 17:30:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!X5ky!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367f676a-ef96-4281-8d02-8f90266ea7dd_2547x1532.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the last article, we discussed the <a href="https://www.junaideffendi.com/p/building-a-collaborative-engineering">importance of teams and effective collaboration</a>. 
Today, we dive deeper into a related topic, knowledge sharing: the challenge of spreading information across a team so that the bus factor stays high and no single person becomes a point of failure.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X5ky!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367f676a-ef96-4281-8d02-8f90266ea7dd_2547x1532.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X5ky!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367f676a-ef96-4281-8d02-8f90266ea7dd_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!X5ky!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367f676a-ef96-4281-8d02-8f90266ea7dd_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!X5ky!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367f676a-ef96-4281-8d02-8f90266ea7dd_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!X5ky!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367f676a-ef96-4281-8d02-8f90266ea7dd_2547x1532.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X5ky!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367f676a-ef96-4281-8d02-8f90266ea7dd_2547x1532.png" width="1456" height="876" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/367f676a-ef96-4281-8d02-8f90266ea7dd_2547x1532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:876,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:443218,&quot;alt&quot;:&quot;Each box representing 1 strategy. Once you apply all 7 you reach the top. The learning does not stop there.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Each box representing 1 strategy. Once you apply all 7 you reach the top. The learning does not stop there." title="Each box representing 1 strategy. Once you apply all 7 you reach the top. The learning does not stop there." srcset="https://substackcdn.com/image/fetch/$s_!X5ky!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367f676a-ef96-4281-8d02-8f90266ea7dd_2547x1532.png 424w, https://substackcdn.com/image/fetch/$s_!X5ky!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367f676a-ef96-4281-8d02-8f90266ea7dd_2547x1532.png 848w, https://substackcdn.com/image/fetch/$s_!X5ky!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367f676a-ef96-4281-8d02-8f90266ea7dd_2547x1532.png 1272w, https://substackcdn.com/image/fetch/$s_!X5ky!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F367f676a-ef96-4281-8d02-8f90266ea7dd_2547x1532.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a><figcaption class="image-caption">Each box representing 1 strategy. Once you apply all 7 you reach the top. 
The learning does not stop there.</figcaption></figure></div><div class="pullquote"><p>&#128161;Knowledge sharing is the practice of keeping everyone informed about the happenings within a team, organization, or company.</p></div><p>Without a culture of knowledge sharing, you can end up with a variety of problems, as <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Addy Osmani&quot;,&quot;id&quot;:11623675,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cee7ba66-e656-4450-a0ed-c951c27ee228_1080x1080.jpeg&quot;,&quot;uuid&quot;:&quot;f2c5e4c9-1113-4f23-bd40-2d37d75bce09&quot;}" data-component-name="MentionToDOM"></span> describes in his book Leading Effective Engineering Teams (page 156):</p><div class="pullquote"><p><strong>Single points of failure</strong></p><p>If these individuals were to leave the team or become unavailable, their absence could disrupt the project and hinder its progress.</p><p><strong>Dependency</strong></p><p>Team members may become overly reliant on the expertise of these individuals, stunting their own growth and problem-solving abilities.</p><p><strong>Knowledge silos</strong></p><p>Information becomes trapped within a limited group, preventing the broader team from understanding and contributing to critical areas.</p><p><strong>Communication gaps</strong></p><p>Lack of knowledge distribution can lead to miscommunication and misunderstandings between different project parts.</p></div><p>Let&#8217;s dive into the 7 ways to prevent the above issues from happening.</p><h3>1-on-1 Sessions</h3><p>1-on-1 sessions are dedicated opportunities for personalized interaction with colleagues, such as managers or cross-team engineers. 
These sessions can occur during onboarding, in routine check-ins, or on demand for specific work such as pair programming.</p><p><strong>How to Benefit:</strong></p><ul><li><p>Facilitate focused and tailored knowledge exchange, such as exploring a system, function, or codebase in depth.</p></li><li><p>Leverage these sessions for mentorship, constructive feedback, or transferring specialized expertise.</p></li><li><p>Foster open dialogue to address challenges, brainstorm solutions, and enhance collaboration.</p></li></ul><h3>Group Chats</h3><p>Group chats are another common way to share knowledge at most companies, usually through an asynchronous communication tool like Slack.</p><p><strong>How to Benefit:</strong></p><ul><li><p>Enable asynchronous, non-urgent sharing of ideas, solutions, and best practices through dedicated groups.</p></li><li><p>Use topic-specific channels, e.g. project-based channels, to centralize knowledge and keep it organized.</p></li><li><p>Encourage peer-to-peer learning by creating an open forum for questions and answers.</p></li></ul><h3>Tech Talks</h3><p>Tech talks share knowledge across teams: you present what your team has been working on at a high level, from design to architecture, so that others can benefit.</p><p><strong>How to Benefit:</strong></p><ul><li><p>Provide a platform for team members to share their expertise on relevant topics.</p></li><li><p>Create opportunities to learn from diverse perspectives and experiences through Q&amp;A sessions.</p></li><li><p>Build a repository of recorded sessions for on-demand knowledge access.</p></li></ul><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:140720591,&quot;url&quot;:&quot;https://www.junaideffendi.com/p/why-tech-talks-are-important&quot;,&quot;publication_id&quot;:2256445,&quot;publication_name&quot;:&quot;Junaid Effendi | Sharing knowledge for 
Engineers&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6aa39096-d454-439f-98b5-baea84b501aa_800x800.png&quot;,&quot;title&quot;:&quot;Why Tech Talks are Important&quot;,&quot;truncated_body_text&quot;:&quot;Today&#8217;s article I will be sharing why presenting at tech talks is important for you and how it plays an important role in your career growth. Lets first see what tech talks are: Tech Talks are deep technical discussion forums within the organization or company, the purpose is to share knowledge across teams so they can better work together in the long r&#8230;&quot;,&quot;date&quot;:&quot;2024-02-27T17:30:20.052Z&quot;,&quot;like_count&quot;:13,&quot;comment_count&quot;:2,&quot;bylines&quot;:[{&quot;id&quot;:21393641,&quot;name&quot;:&quot;Junaid Effendi&quot;,&quot;handle&quot;:&quot;junaideffendi&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06559f3-ee33-46f8-bfa0-50964179f235_1200x1200.png&quot;,&quot;bio&quot;:&quot;I love to share my learnings and experiences about Software and Data Engineering.&quot;,&quot;profile_set_up_at&quot;:&quot;2022-05-25T22:34:01.768Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:2273688,&quot;user_id&quot;:21393641,&quot;publication_id&quot;:2256445,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:2256445,&quot;name&quot;:&quot;Junaid Effendi | Sharing knowledge for Engineers&quot;,&quot;subdomain&quot;:&quot;junaideffendi&quot;,&quot;custom_domain&quot;:&quot;www.junaideffendi.com&quot;,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Covering tech, career, data, growth experiences from my 
journey.&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6aa39096-d454-439f-98b5-baea84b501aa_800x800.png&quot;,&quot;author_id&quot;:21393641,&quot;theme_var_background_pop&quot;:&quot;#8AE1A2&quot;,&quot;created_at&quot;:&quot;2024-01-13T20:16:55.701Z&quot;,&quot;rss_website_url&quot;:null,&quot;email_from_name&quot;:&quot;Junaid Effendi&quot;,&quot;copyright&quot;:&quot;Junaid Effendi&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;is_personal_mode&quot;:false}}],&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://www.junaideffendi.com/p/why-tech-talks-are-important?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!iYad!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6aa39096-d454-439f-98b5-baea84b501aa_800x800.png" loading="lazy"><span class="embedded-post-publication-name">Junaid Effendi | Sharing knowledge for Engineers</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">Why Tech Talks are Important</div></div><div class="embedded-post-body">Today&#8217;s article I will be sharing why presenting at tech talks is important for you and how it plays an important role in your career growth. 
Lets first see what tech talks are: Tech Talks are deep technical discussion forums within the organization or company, the purpose is to share knowledge across teams so they can better work together in the long r&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">2 years ago &#183; 13 likes &#183; 2 comments &#183; Junaid Effendi</div></a></div><h3>Newsletter</h3><p>Newsletters are an effective way to share a range of information with a wide audience on a regular schedule. They often include subscribe/unsubscribe options, making it easy to manage the volume you receive.</p><p><strong>How to Benefit:</strong></p><ul><li><p>Consolidate and share key knowledge updates, tips, and resources regularly.</p></li><li><p>Highlight achievements or learnings that inspire further knowledge sharing.</p></li><li><p>Keep the team aligned and informed about relevant developments and tools.</p></li></ul><div><hr></div><h3>Documentation</h3><p>Documentation can take several forms:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!Cgdf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054fd48b-4a60-4a2a-9b0d-6ff8ff90bafc_2547x1477.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Cgdf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054fd48b-4a60-4a2a-9b0d-6ff8ff90bafc_2547x1477.png 424w, https://substackcdn.com/image/fetch/$s_!Cgdf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054fd48b-4a60-4a2a-9b0d-6ff8ff90bafc_2547x1477.png 848w, https://substackcdn.com/image/fetch/$s_!Cgdf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054fd48b-4a60-4a2a-9b0d-6ff8ff90bafc_2547x1477.png 1272w, https://substackcdn.com/image/fetch/$s_!Cgdf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054fd48b-4a60-4a2a-9b0d-6ff8ff90bafc_2547x1477.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Cgdf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054fd48b-4a60-4a2a-9b0d-6ff8ff90bafc_2547x1477.png" width="580" height="336.2087912087912" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/054fd48b-4a60-4a2a-9b0d-6ff8ff90bafc_2547x1477.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:844,&quot;width&quot;:1456,&quot;resizeWidth&quot;:580,&quot;bytes&quot;:197959,&quot;alt&quot;:&quot;Documentation for knowledge 
sharing&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Documentation for knowledge sharing" title="Documentation for knowledge sharing" srcset="https://substackcdn.com/image/fetch/$s_!Cgdf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054fd48b-4a60-4a2a-9b0d-6ff8ff90bafc_2547x1477.png 424w, https://substackcdn.com/image/fetch/$s_!Cgdf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054fd48b-4a60-4a2a-9b0d-6ff8ff90bafc_2547x1477.png 848w, https://substackcdn.com/image/fetch/$s_!Cgdf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054fd48b-4a60-4a2a-9b0d-6ff8ff90bafc_2547x1477.png 1272w, https://substackcdn.com/image/fetch/$s_!Cgdf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054fd48b-4a60-4a2a-9b0d-6ff8ff90bafc_2547x1477.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a><figcaption class="image-caption">Three types of Documentation for knowledge sharing</figcaption></figure></div><h4><strong>Design Document</strong></h4><p>This focuses on design-related documents created during the planning phase, allowing peers to review and resolve design issues, fostering mutual learning.</p><p><strong>How to Benefit:</strong></p><ul><li><p>Gain insights into design decisions and identify potential issues early.</p></li><li><p>Receive feedback to improve design quality.</p></li><li><p>Establish a shared understanding across the team.</p></li></ul><h4><strong>Process Document</strong></h4><p>A centralized repository (e.g. 
<a href="https://dub.sh/WoS7gWd">Multiplayer</a>, Notion or Confluence) for existing processes, such as final design documents, system architectures, workflows, and data flows, written for a broader audience.</p><div class="pullquote"><p>Today&#8217;s post is brought to you by:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://dub.sh/WoS7gWd" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dngB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png 424w, https://substackcdn.com/image/fetch/$s_!dngB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png 848w, https://substackcdn.com/image/fetch/$s_!dngB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png 1272w, https://substackcdn.com/image/fetch/$s_!dngB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dngB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png" width="481" height="180.375" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:300,&quot;width&quot;:800,&quot;resizeWidth&quot;:481,&quot;bytes&quot;:39352,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://dub.sh/WoS7gWd&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!dngB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png 424w, https://substackcdn.com/image/fetch/$s_!dngB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png 848w, https://substackcdn.com/image/fetch/$s_!dngB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png 1272w, https://substackcdn.com/image/fetch/$s_!dngB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc83fbd2d-cb25-4136-bb87-c5ea964a036a_800x300.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><em>Multiplayer auto-discovers, tracks, and documents your entire system architecture&#8212;from its components to APIs, dependencies, and environments. 
Gain real-time, comprehensive visibility into your system, all at a glance.</em></p></div><p><strong>How to Benefit:</strong></p><ul><li><p>Understand the system and how components interact and integrate.</p></li><li><p>Reuse components and processes efficiently.</p></li><li><p>Ensure knowledge is accessible to a wider audience beyond the immediate team.</p></li></ul><h4><strong>Codebase</strong></h4><p>The codebase can be a great place to learn and share knowledge. It offers several avenues: code comments, unit tests, and tutorials.</p><p><strong>How to Benefit:</strong></p><ul><li><p><strong>Code Comments</strong>: Quickly understand code functionality through comments and examples.</p></li><li><p><strong>Unit Testing</strong>: Learn function usage directly from unit tests.</p></li><li><p><strong>Tutorials</strong>: Onboard new contributors efficiently with tutorials and setup guides.</p></li></ul><h3>Code Review</h3><p>Code reviews are among the most effective ways to share knowledge, benefiting both authors and reviewers by enabling mutual learning.
They can be conducted asynchronously via a Version Control System or synchronously through Pair or Mob Programming sessions.</p><p><strong>How to Benefit:</strong></p><ul><li><p>Enable knowledge transfer through in-depth feedback and collaborative discussions.</p></li><li><p>Identify areas for improvement while promoting consistent coding standards across the team.</p></li><li><p>Offer a structured method for learning from each other&#8217;s coding practices.</p></li></ul><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:140656109,&quot;url&quot;:&quot;https://www.junaideffendi.com/p/code-reviews&quot;,&quot;publication_id&quot;:2256445,&quot;publication_name&quot;:&quot;Junaid Effendi | Sharing knowledge for Engineers&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6aa39096-d454-439f-98b5-baea84b501aa_800x800.png&quot;,&quot;title&quot;:&quot;Why Code Review? &quot;,&quot;truncated_body_text&quot;:&quot;Code reviews are great in the Software Engineering world, it is a great way to maintain consistency and quality of codebase based on some standards defined by the team, company or industry with some great free benefits like Knowledge Sharing. Code reviews have evolved in the past few decades. 
It started mainly from human review and now machines help hum&#8230;&quot;,&quot;date&quot;:&quot;2022-07-06T19:34:00.000Z&quot;,&quot;like_count&quot;:3,&quot;comment_count&quot;:0,&quot;bylines&quot;:[{&quot;id&quot;:21393641,&quot;name&quot;:&quot;Junaid Effendi&quot;,&quot;handle&quot;:&quot;junaideffendi&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06559f3-ee33-46f8-bfa0-50964179f235_1200x1200.png&quot;,&quot;bio&quot;:&quot;I love to share my learnings and experiences about Software and Data Engineering.&quot;,&quot;profile_set_up_at&quot;:&quot;2022-05-25T22:34:01.768Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:2273688,&quot;user_id&quot;:21393641,&quot;publication_id&quot;:2256445,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:2256445,&quot;name&quot;:&quot;Junaid Effendi | Sharing knowledge for Engineers&quot;,&quot;subdomain&quot;:&quot;junaideffendi&quot;,&quot;custom_domain&quot;:&quot;www.junaideffendi.com&quot;,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Covering tech, career, data, growth experiences from my journey.&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6aa39096-d454-439f-98b5-baea84b501aa_800x800.png&quot;,&quot;author_id&quot;:21393641,&quot;theme_var_background_pop&quot;:&quot;#8AE1A2&quot;,&quot;created_at&quot;:&quot;2024-01-13T20:16:55.701Z&quot;,&quot;rss_website_url&quot;:null,&quot;email_from_name&quot;:&quot;Junaid Effendi&quot;,&quot;copyright&quot;:&quot;Junaid 
Effendi&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;is_personal_mode&quot;:false}}],&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://www.junaideffendi.com/p/code-reviews?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!iYad!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6aa39096-d454-439f-98b5-baea84b501aa_800x800.png" loading="lazy"><span class="embedded-post-publication-name">Junaid Effendi | Sharing knowledge for Engineers</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">Why Code Review? </div></div><div class="embedded-post-body">Code reviews are great in the Software Engineering world, it is a great way to maintain consistency and quality of codebase based on some standards defined by the team, company or industry with some great free benefits like Knowledge Sharing. Code reviews have evolved in the past few decades. It started mainly from human review and now machines help hum&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">4 years ago &#183; 3 likes &#183; Junaid Effendi</div></a></div><h3>Code Search</h3><p>Code search is invaluable for navigating large codebases. 
It serves as a one-stop solution for code discovery.</p><p><strong>How to Benefit:</strong></p><ul><li><p>Quickly locate and understand existing implementations, patterns, and best practices.</p></li><li><p>Identify duplicate efforts to improve collaboration and optimize workflows.</p></li><li><p>Enable independent learning and knowledge exploration within the codebase.</p></li></ul><div><hr></div><p>Putting these seven approaches into practice will give you valuable learning experiences that help you become both a better team player and a better engineer.</p><p>&#128172; Let me know if you have more ways in mind.</p>]]></content:encoded></item></channel></rss>