My Data Engineering Roadmap as a Data Science Student
After sharing my motivation for pursuing Data Engineering, I wanted to turn that passion into a structured plan, one that connects what I learn in university, online courses, and real-world practice.
Overview
I designed this roadmap to align with the eight semesters of my Data Science degree, combining theory from coursework with technical depth from self-study. Each phase focuses on a specific layer of data engineering, from local experimentation to production-grade cloud systems.
This roadmap isn’t just a checklist; it’s a direction map, flexible enough to adapt as I grow but clear enough to keep me focused.
As a Data Science student, my goal is to build solid foundations first, automate second, and scale last.
Roadmap Table
Foundation (Phase 1-2) ✅
| Phase | Phase 1 - Foundation Basics | Phase 2 - Local ETL & Modeling |
|---|---|---|
| Status | ✅ Completed | ✅ Completed |
| Main Focus | Python & SQL Fundamentals | Databases & Local Data Handling |
| Key Skills & Topics | Basic statistics, Python basics, SQL fundamentals, Intro to Git & Linux | CSV ingestion, Pandas ETL, Bash automation, Data modeling (star schema), Normalization, Basic ETL patterns, Makefile-based orchestration |
| Tools / Platform | Python, Git/GitHub, Linux Shell, SQLite | PostgreSQL, Makefile, Pandas, Bash CLI |
| Project / Portfolio Output | Project 1: Movie Data ETL Pipeline | Project 2: E-commerce Data Pipeline |
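To make the Foundation topics concrete, here is a minimal sketch of a CSV-to-star-schema ETL using only the Python standard library and an in-memory SQLite database. It is not the code from Project 1 or Project 2; the sample data, the table names (`dim_customer`, `dim_product`, `fact_order`), and the helper functions are all hypothetical and chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real CSV file.
RAW_CSV = """order_id,customer,product,quantity,unit_price
1,alice,keyboard,1,50.0
2,bob,mouse,2,20.0
3,alice,mouse,1,20.0
"""

# Illustrative star schema: two dimension tables and one fact table.
SCHEMA = """
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE fact_order (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    quantity INTEGER,
    amount REAL
);
"""


def get_or_create(conn, table, key_col, name):
    """Return the surrogate key for `name`, inserting the row if it is new."""
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    row = conn.execute(
        f"SELECT {key_col} FROM {table} WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def run_etl(conn, raw_csv):
    """Extract rows from CSV text, derive the amount, and load the star schema."""
    conn.executescript(SCHEMA)
    for row in csv.DictReader(io.StringIO(raw_csv)):
        customer_id = get_or_create(conn, "dim_customer", "customer_id", row["customer"])
        product_id = get_or_create(conn, "dim_product", "product_id", row["product"])
        quantity = int(row["quantity"])
        amount = quantity * float(row["unit_price"])  # transform step
        conn.execute(
            "INSERT INTO fact_order VALUES (?, ?, ?, ?, ?)",
            (int(row["order_id"]), customer_id, product_id, quantity, amount),
        )
    conn.commit()


def revenue_by_customer(conn):
    """Join the fact table to a dimension and aggregate, as a star schema intends."""
    return conn.execute(
        """
        SELECT c.name, SUM(f.amount)
        FROM fact_order AS f
        JOIN dim_customer AS c USING (customer_id)
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn, RAW_CSV)
print(revenue_by_customer(conn))
```

The same shape carries over to the Phase 2 stack: Pandas would replace the `csv` loop and PostgreSQL would replace SQLite, while the split into dimensions and a fact table stays the same.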
Cloud (Phase 3-4) ⏳
| Phase | Phase 3 - Analytics Engineering | Phase 4 - Cloud Infra Basics |
|---|---|---|
| Status | ✅ Completed | ⏳ In Progress |
| Main Focus | Cloud object storage, Warehouse setup, dbt layering (staging → intermediate → marts), Basic orchestration | Docker, Terraform & CI/CD Fundamentals |
| Key Skills & Topics | dbt (layering), Star Schema, MRR, Cohort Retention, Customer LTV, Data Quality, Chaos Engineering | Containerized pipeline, infrastructure-as-code, basic networking, CI/CD, IAM basics |
| Tools / Platform | Cloudflare R2, Snowflake, dbt, Docker, Airflow | Docker, Terraform, GitHub Actions, AWS/GCP |
| Project / Portfolio Output | Project 3: SaaS Analytics Pipeline (object storage, warehouse modeling, dbt transformations, orchestration) | Project 4: Dockerized Pipeline + Terraform Deploy + GitHub Actions CI |
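Two of the Phase 3 metrics, MRR and cohort retention, are easier to reason about with a small worked example. The sketch below is plain Python rather than the dbt/SQL models a real warehouse project would use, and it rests on assumptions of my own: months are simple integers, `end_month` is exclusive, and a customer counts as retained if their subscription is active in a given month. The `SUBSCRIPTIONS` data and function names are hypothetical, and real definitions of both metrics vary between companies.

```python
from collections import defaultdict

# Hypothetical subscription records:
# (customer, monthly_price, start_month, end_month)
# Months are integers; end_month is exclusive, and None means still active.
SUBSCRIPTIONS = [
    ("a", 10.0, 0, None),
    ("b", 20.0, 0, 2),
    ("c", 10.0, 1, None),
    ("d", 30.0, 1, 2),
]


def is_active(sub, month):
    """True if the subscription is billed in the given month."""
    _, _, start, end = sub
    return start <= month and (end is None or month < end)


def mrr(subs, month):
    """Monthly recurring revenue: sum of prices of subscriptions active that month."""
    return sum(sub[1] for sub in subs if is_active(sub, month))


def cohort_retention(subs, horizon):
    """Map each start-month cohort to its retained fraction at offsets 0..horizon-1."""
    cohorts = defaultdict(list)
    for sub in subs:
        cohorts[sub[2]].append(sub)
    return {
        start: [
            sum(is_active(s, start + offset) for s in members) / len(members)
            for offset in range(horizon)
        ]
        for start, members in sorted(cohorts.items())
    }


for month in range(3):
    print(month, mrr(SUBSCRIPTIONS, month))
print(cohort_retention(SUBSCRIPTIONS, 3))
```

In a layered dbt project, the `is_active` logic would typically live in an intermediate model and the two aggregates in separate marts, so that both metrics share one definition of "active".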
Advanced (Phase 5-6) 🔒
| Phase | Phase 5 - Data Engineering Platform | Phase 6 - Production & Capstone |
|---|---|---|
| Status | 🔒 Post-Employment | 🔒 Post-Employment |
| Main Focus | End-to-End Pipeline, Streaming & Advanced Orchestration | Lakehouse, ML Integration & Full Platform |
| Key Skills & Topics | Advanced Airflow, Streaming (Kafka/Kinesis), Data lineage, Data governance, Observability, Testing (Great Expectations) | Delta Lake / Hudi, ACID transactions, Spark optimization, Feature store, ML pipeline, Data catalog |
| Tools / Platform | Airflow/Prefect, Kafka, Spark, Great Expectations, CloudWatch/Grafana | Databricks, Delta Lake, MLflow, Flink, Metabase/Power BI |
| Project / Portfolio Output | Project 5: Production-grade End-to-End Pipeline, ingestion → transform → quality → monitoring | Capstone: Full data platform, ingestion → processing → catalog → quality → dashboard + ML pipeline |
How I’ll Use This Roadmap
- As a compass, to keep me aligned with my long-term goal: becoming a professional Data Engineer.
- As a progress tracker, to document what I’ve learned and what still needs improvement.
- As content inspiration, for future blog posts and portfolio updates.
This roadmap isn’t final; it’s something I’ll refine every semester as I gain experience through university projects, certifications, and my ongoing work in the data industry.
Roadmap Update Log
v2.3 - Phase 3 Completion & Cloud Analytics Update
March 2026
Highlights
- Phase 3 marked as Completed after building a full SaaS analytics engineering pipeline.
- Introduced Project 3: SaaS Analytics Pipeline as the main portfolio artifact for Phase 3.
- Updated the roadmap to reflect a local-first → cloud analytics transition strategy.
Changes
- Cloud section restructured: Phase 3 now emphasizes analytics engineering workflows in the cloud.
- Clarified learning focus around object storage, warehouse modeling, dbt transformations, and orchestration.
- Minor structural adjustments to prepare the roadmap for Phase 4.
Notes
- This update represents the transition point from local data pipeline experimentation to cloud-oriented analytics systems.
v2.2 - Phase Progress & Vertical Layout Update
February 2026
Highlights
- Roadmap layout changed from horizontal to vertical phase structure for better readability.
- Phase progress indicators added to track learning milestones.
Changes
- Improved visual clarity and roadmap navigation.
- Minor wording improvements across phase descriptions.
v2.1 - Phase 2 Learning Resources Update
December 2025
Changes
- Updated learning materials and references for Phase 2.
- Minor refinements to study priorities within the data engineering fundamentals phase.
v2.0 - Major Revision
December 2025
Highlights
- Significant roadmap restructuring based on learning progress and industry research.
- Clearer separation between foundational data engineering, analytics engineering, and infrastructure topics.
v1.1 - Tooling Update
November 2025
Changes
- Added several tools and technologies to the roadmap stack.
- Minor adjustments to the learning order.
v1.0 - Initial Structure
October 2025
Initial version of the roadmap outlining the long-term learning path toward a Data / Analytics Engineering career.
