My Data Engineering Roadmap as a Data Science Student

Oct 19, 2025 · Karhomatul Faqih Al Amin
Image credit: Photo by Wes Hicks on Unsplash

After sharing my motivation for pursuing Data Engineering, I wanted to turn that passion into a structured plan, one that connects what I learn at university, in online courses, and through real-world practice.


Overview

I designed this roadmap to align with the eight semesters of my Data Science degree, combining theory from coursework with technical depth from self-learning. Each phase focuses on a specific layer of data engineering, from local experimentation to production-grade cloud systems.

This roadmap isn’t just a checklist; it’s a direction map, flexible enough to adapt as I grow but clear enough to keep me focused.

As a Data Science student, my goal is to build solid foundations first, automate second, and scale last.


Roadmap Table

Foundation (Phase 1-2) ✅

| Phase | Phase 1 - Foundation Basics | Phase 2 - Local ETL & Modeling |
| --- | --- | --- |
| Status | ✅ Completed | ✅ Completed |
| Main Focus | Python & SQL Fundamentals | Databases & Local Data Handling |
| Key Skills & Topics | Basic statistics, Python basics, SQL fundamentals, Intro to Git & Linux | CSV ingestion, Pandas ETL, Bash automation, Data modeling (star schema), Normalization, Basic ETL patterns, Makefile-based orchestration |
| Tools / Platform | Python, Git/GitHub, Linux Shell, SQLite | PostgreSQL, Makefile, Pandas, Bash CLI |
| Project / Portfolio Output | Project 1: Movie Data ETL Pipeline | Project 2: E-commerce Data Pipeline |
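
To make the Phase 2 topics more concrete, here is a minimal sketch of CSV ingestion into a small star schema. It is not the code from Project 1 or Project 2: the sample data, the table names (`dim_customer`, `dim_product`, `fact_order`), and the helper functions are hypothetical, and it uses only Python's standard library (`csv`, `sqlite3`) instead of Pandas and PostgreSQL so that it runs anywhere.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; a real pipeline would read this from a CSV file.
RAW_CSV = """order_id,customer,product,quantity,unit_price
1,alice,keyboard,1,50.0
2,bob,mouse,2,20.0
3,alice,mouse,1,20.0
"""


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def lookup_id(conn, table, key_column, name):
    """Insert a dimension value if it is new, then return its surrogate key."""
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    row = conn.execute(
        f"SELECT {key_column} FROM {table} WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def load_star_schema(conn, rows):
    """Load rows into two dimension tables and one fact table."""
    conn.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_order (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            quantity INTEGER,
            revenue REAL
        );
    """)
    for row in rows:
        customer_id = lookup_id(conn, "dim_customer", "customer_id", row["customer"])
        product_id = lookup_id(conn, "dim_product", "product_id", row["product"])
        quantity = int(row["quantity"])
        revenue = quantity * float(row["unit_price"])  # derived measure
        conn.execute(
            "INSERT INTO fact_order VALUES (?, ?, ?, ?, ?)",
            (int(row["order_id"]), customer_id, product_id, quantity, revenue),
        )
    conn.commit()


def revenue_by_customer(conn):
    """Join the fact table to a dimension and aggregate revenue per customer."""
    query = """
        SELECT c.name, SUM(f.revenue)
        FROM fact_order AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return dict(conn.execute(query).fetchall())


conn = sqlite3.connect(":memory:")
load_star_schema(conn, extract(RAW_CSV))
totals = revenue_by_customer(conn)
print(totals)
```

The point of the split is that descriptive attributes (customers, products) are stored once in dimension tables, while the fact table holds only keys and measures, so analytical queries become simple joins and aggregations.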

Cloud (Phase 3-4) ⏳

| Phase | Phase 3 - Analytics Engineering | Phase 4 - Cloud Infra Basics |
| --- | --- | --- |
| Status | ✅ Completed | ⏳ In Progress |
| Main Focus | Cloud object storage, Warehouse setup, dbt layering (staging → intermediate → marts), Basic orchestration | Docker, Terraform & CI/CD Fundamentals |
| Key Skills & Topics | dbt (layering), Star Schema, MRR, Cohort Retention, Customer LTV, Data Quality, Chaos Engineering | Containerized pipeline, infrastructure-as-code, basic networking, CI/CD, IAM basics |
| Tools / Platform | Cloudflare R2, Snowflake, dbt, Docker, Airflow | Docker, Terraform, GitHub Actions, AWS/GCP |
| Project / Portfolio Output | Project 3: End-to-end cloud data pipeline (Docker + Lambda) deployed fully via Terraform | Project 4: Dockerized Pipeline + Terraform Deploy + GitHub Actions CI |
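
Two of the Phase 3 metrics, MRR (monthly recurring revenue) and cohort retention, are easier to understand with a small worked example. The sketch below only illustrates the arithmetic in plain Python. In a warehouse setup these would more likely be SQL models in a dbt mart layer, and the sample records and function names here are hypothetical rather than taken from Project 3.

```python
from collections import defaultdict

# Hypothetical subscription records: (customer, month, monthly_fee).
# A row means the customer had an active paid subscription in that month.
SUBSCRIPTIONS = [
    ("a", "2025-01", 10.0),
    ("b", "2025-01", 20.0),
    ("a", "2025-02", 10.0),
    ("c", "2025-02", 30.0),
    ("a", "2025-03", 10.0),
    ("c", "2025-03", 30.0),
]


def mrr_by_month(rows):
    """Sum the recurring fees of all active subscriptions per month."""
    totals = defaultdict(float)
    for _customer, month, fee in rows:
        totals[month] += fee
    return dict(sorted(totals.items()))


def cohort_retention(rows):
    """For each cohort (customers grouped by first active month), return the
    share of that cohort still active in each later month."""
    active = defaultdict(set)  # month -> customers active in that month
    first_month = {}           # customer -> first month seen
    for customer, month, _fee in sorted(rows, key=lambda r: r[1]):
        active[month].add(customer)
        first_month.setdefault(customer, month)

    cohorts = defaultdict(set)  # cohort month -> members
    for customer, month in first_month.items():
        cohorts[month].add(customer)

    retention = {}
    for cohort_month, members in sorted(cohorts.items()):
        retention[cohort_month] = {
            month: len(members & customers) / len(members)
            for month, customers in sorted(active.items())
            if month >= cohort_month
        }
    return retention


mrr = mrr_by_month(SUBSCRIPTIONS)
retention = cohort_retention(SUBSCRIPTIONS)
print(mrr)
print(retention)
```

In this sample, the January cohort has two customers and only one of them is still active in February, so its retention drops to 0.5 while MRR still grows because a new customer joined.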

Advanced (Phase 5-6) 🔒

| Phase | Phase 5 - Data Engineering Platform | Phase 6 - Production & Capstone |
| --- | --- | --- |
| Status | 🔒 Post-Employment | 🔒 Post-Employment |
| Main Focus | End-to-End Pipeline, Streaming & Advanced Orchestration | Lakehouse, ML Integration & Full Platform |
| Key Skills & Topics | Advanced Airflow, Streaming (Kafka/Kinesis), Data lineage, Data governance, Observability, Testing (Great Expectations) | Delta Lake / Hudi, ACID transactions, Spark optimization, Feature store, ML pipeline, Data catalog |
| Tools / Platform | Airflow/Prefect, Kafka, Spark, Great Expectations, CloudWatch/Grafana | Databricks, Delta Lake, MLflow, Flink, Metabase/Power BI |
| Project / Portfolio Output | Project 5: Production-grade End-to-End Pipeline, ingestion → transform → quality → monitoring | Capstone: Full data platform, ingestion → processing → catalog → quality → dashboard + ML pipeline |
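
The Project 5 flow (ingestion → transform → quality → monitoring) can be pictured as a chain of small functions. This is only a rough sketch under simplifying assumptions: the records, the validation rules, and the function names are hypothetical, the quality gate is hand-written rather than using Great Expectations, and monitoring is reduced to a log line instead of CloudWatch or Grafana.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def ingest():
    """Stand-in for reading from a source system (hypothetical records)."""
    return [
        {"id": 1, "amount": "10.5"},
        {"id": 2, "amount": "-3"},
        {"id": 3, "amount": None},
    ]


def transform(records):
    """Cast amounts to float, keeping missing values as None."""
    return [
        {"id": r["id"], "amount": None if r["amount"] is None else float(r["amount"])}
        for r in records
    ]


def quality_gate(records):
    """Split records into valid and rejected using two simple rules:
    the amount must be present and must not be negative."""
    valid, rejected = [], []
    for r in records:
        ok = r["amount"] is not None and r["amount"] >= 0
        (valid if ok else rejected).append(r)
    return valid, rejected


def monitor(valid, rejected):
    """Log basic run metrics and return them for inspection."""
    metrics = {
        "total": len(valid) + len(rejected),
        "valid": len(valid),
        "rejected": len(rejected),
    }
    log.info("pipeline run metrics: %s", metrics)
    return metrics


valid, rejected = quality_gate(transform(ingest()))
metrics = monitor(valid, rejected)
```

Keeping each stage as its own function means an orchestrator could later run them as separate tasks, and rejected records stay visible in the metrics instead of being dropped silently.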

How I’ll Use This Roadmap

  • As a compass, to keep me aligned with my long-term goal: becoming a professional Data Engineer.

  • As a progress tracker, to document what I’ve learned and what still needs improvement.

  • As content inspiration, for future blog posts and portfolio updates.

This roadmap isn’t final; it’s something I’ll refine every semester as I gain experience through university projects, certifications, and my ongoing work in the data industry.


Roadmap Update Log

v2.3 - Phase 3 Completion & Cloud Analytics Update

March 2026

Highlights

  • Phase 3 marked as Completed after building a full SaaS analytics engineering pipeline.
  • Introduced Project 3: SaaS Analytics Pipeline as the main portfolio artifact for Phase 3.
  • Updated the roadmap to reflect a local-first → cloud analytics transition strategy.

Changes

  • Cloud section restructured: Phase 3 now emphasizes analytics engineering workflows in the cloud.
  • Clarified learning focus around object storage, warehouse modeling, dbt transformations, and orchestration.
  • Minor structural adjustments to prepare the roadmap for Phase 4.

Notes

  • This update represents the transition point from local data pipeline experimentation to cloud-oriented analytics systems.

v2.2 - Phase Progress & Vertical Layout Update

February 2026

Highlights

  • Roadmap layout changed from horizontal to vertical phase structure for better readability.
  • Phase progress indicators added to track learning milestones.

Changes

  • Improved visual clarity and roadmap navigation.
  • Minor wording improvements across phase descriptions.

v2.1 - Phase 2 Learning Resources Update

December 2025

Changes

  • Updated learning materials and references for Phase 2.
  • Minor refinements to study priorities within the data engineering fundamentals phase.

v2.0 - Major Revision

December 2025

Highlights

  • Significant roadmap restructuring based on learning progress and industry research.
  • Clearer separation between foundational data engineering, analytics engineering, and infrastructure topics.

v1.1 - Tooling Update

November 2025

Changes

  • Added several tools and technologies to the roadmap stack.
  • Minor adjustments to the learning order.

v1.0 - Initial Structure

October 2025

Initial version of the roadmap outlining the long-term learning path toward a Data / Analytics Engineering career.

Karhomatul Faqih Al Amin
Analytics Engineer Practitioner
Analytics Engineer practitioner with a strong interest in data pipelines, ETL processes, and scalable data systems. I am currently pursuing an undergraduate degree in Data Science and focus on building practical projects using Python, SQL, and modern data engineering tools. My learning journey emphasizes hands-on implementation, reproducibility, and aligning academic foundations with real-world data engineering needs.