End to End Data Engineering
Core Data Engineering concepts and a modern end-to-end data tech stack. Use it to build your own path.
Sharing an end-to-end Data Engineering image covering general tools, DevOps-focused tech, important Data Engineering concepts, and the tech stack needed to be a successful Data Engineer.
Some of this is debatable, so let's clarify a few things through the following key points:
This assumes foundational knowledge of computer science and software engineering. E.g.
Data Structures and Algorithms
SWE Best Practices
Covering both the analytics and infra sides, as broken down by Zach in his bootcamp.
Check out his bootcamp
For concepts, I focus on just the ones that are important in my opinion. Let me know which ones I missed.
Not covering traditional technologies; however, I encourage you to go through them for a better understanding. E.g.
MapReduce
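To give a feel for why MapReduce is worth going through, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) using the classic word count. The function names and the sample documents are made up for illustration; a real framework distributes these phases across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    # Each document is mapped independently, which is what makes
    # the map phase easy to parallelize in a real cluster.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical sample input, purely illustrative.
counts = word_count(["spark kafka spark", "kafka airflow"])
```

Modern engines such as Spark still build on the same idea of independent per-record work followed by a grouped aggregation, which is why the older model helps with understanding.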
Not everything here is required to be successful. It depends heavily on the role, and there is no straight path to becoming a Data Engineer. You can leverage this to create a path for yourself. E.g.
Not everyone works on DevOps stuff.
You may be using an on-premises cloud.
You may own the Data Platform.
Read different types of Data Engineering roles.
Showcasing just the popular tools per category, covering both open source and proprietary ones. E.g.
Airflow, read more about data orchestrators
Snowflake, read best practices
This definitely does not cover a lot of aspects. For deeper dives, look into each tech and its architecture. E.g.
Spark
Kafka
If you dive deeper into Data Quality, you will find a lot of important concepts that are often overlooked. If you are looking for a testing library, check out Building Framework on top of Great Expectations.
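As a rough illustration of the kind of checks a data quality library automates, here is a minimal sketch in plain Python. It does not use the Great Expectations API; the function names, the `orders` rows, and the thresholds are all hypothetical.

```python
def check_not_null(rows, column):
    """Return the indexes of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def check_unique(rows, column):
    """Return the values of `column` that appear more than once."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row.get(column)
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


def check_range(rows, column, low, high):
    """Return indexes of rows whose non-null value is outside [low, high]."""
    return [
        i for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]


# Hypothetical sample data with one null, one duplicate key,
# and one negative amount.
orders = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": -5.0},
]

report = {
    "null_amount_rows": check_not_null(orders, "amount"),
    "duplicate_order_ids": check_unique(orders, "order_id"),
    "out_of_range_amount_rows": check_range(orders, "amount", 0, 10_000),
}
```

Completeness, uniqueness, and validity checks like these are a small subset of data quality; a dedicated library adds reporting, scheduling, and many more built-in checks on top.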
💡 Data Engineers with a Software Engineering background are usually successful on the infra side, while Data Engineers with an Analysis/Science background are usually successful on the analytics side.
➡️ If you work in a smaller company, you may end up doing everything, from infra like Kubernetes to processing the data to visualizing it.
➡️ If you work in a bigger company, you may focus on a few pieces, but you will become an expert in those.
I may have missed some important pieces and some of your favorites. Please remind me in the comments: what are some common things you work with and would like to see in there?
Love the illustration a lot!
Hello Junaid,
I wanted to congratulate you on the quality of your work. The article is clear, concise, and the diagram adds a real visual plus!
However, I would like to make a comment regarding the "Quality" section. Sifflet, a French startup founded in 2021, has quickly established itself as one of the main European players in its field. Just one year after its launch, Sifflet already boasts clients such as the BBC, SERVIER, Nextbite, and the Carrefour group.
I simply wanted to share this information with you, so you can consider it for your future articles.
Feel free to take a look at our website if the subject interests you: https://www.siffletdata.com/.
Looking forward to reading your upcoming works,