Interview Readiness Platform
Become interview-ready for data engineering roles.
Practice answering real interview questions, get AI-scored rubric feedback, and track your improvement over time. Built specifically for Data Engineers.
What you practice
Three core review skills that come up in every data engineering interview.
How it works
20 Data Engineering interview questions
A sample of what you'll practice. Sign up free to see AI-scored rubric feedback on every answer.
You have a Spark job that runs fine on 10M rows but OOMs on 500M rows. Walk me through how you'd diagnose and fix it.
Explain the difference between SortMergeJoin and BroadcastHashJoin in Spark. When would you force one over the other?
Your dbt model takes 45 minutes to run. How do you find the bottleneck and fix it?
Design an incremental ingestion pipeline for a source table that has no updated_at column.
What's the difference between a fact table and a dimension table? Give a concrete example for an e-commerce company.
Explain SCD Type 2. Write the SQL MERGE statement to implement it.
You're seeing duplicate rows in your gold-layer table. Walk through the steps to identify the root cause.
What is a data contract? Who owns it, and what happens when a producer breaks it?
Compare Kafka delivery semantics: at-most-once vs at-least-once vs exactly-once. When does each matter?
Your Airflow DAG has been running for 6 hours and is stuck. What do you check first?
Explain watermarks in Flink and Spark Structured Streaming. What happens to late-arriving events?
Write a SQL query to find the second-highest salary in each department without using LIMIT or TOP.
What is partition pruning and how does Iceberg's hidden partitioning improve on Hive-style partitioning?
You need to backfill 18 months of data. How do you do it safely without impacting production?
Explain the Medallion Architecture. What goes in Bronze, Silver, and Gold layers?
What's the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()? Give an example where they produce different results.
Your Kafka consumer lag is growing. What are the possible causes and how do you fix each?
Explain Change Data Capture. What are the trade-offs between log-based CDC and query-based CDC?
How does Spark's Adaptive Query Execution (AQE) work? What problems does it solve?
Design a data quality test suite for a payments pipeline. What tests are blocking vs. warning?
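As an illustration of the kind of concrete example the RANK() / DENSE_RANK() / ROW_NUMBER() question above asks for, here is a minimal sketch. It is not a model answer from the platform. The table name `emp`, the employee names, and the salaries are made up for the demo, and it assumes the SQLite bundled with your Python is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3


def rank_demo():
    """Return (name, salary, row_number, rank, dense_rank) for a toy table.

    The table and its rows are hypothetical and exist only to show a tie.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?)",
        [("Ana", 300), ("Bo", 200), ("Cy", 200), ("Di", 100)],
    )
    rows = conn.execute(
        """
        SELECT name,
               salary,
               -- Always unique; ties are broken here by name.
               ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
               -- Ties share a rank, and the next rank skips ahead.
               RANK()       OVER (ORDER BY salary DESC)       AS rnk,
               -- Ties share a rank, and the next rank does not skip.
               DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk
        FROM emp
        ORDER BY salary DESC, name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in rank_demo():
        print(row)
```

The two employees tied at 200 are where the functions diverge: ROW_NUMBER() gives them 2 and 3, while RANK() and DENSE_RANK() give both of them 2. The next row then gets 4 from RANK() and 3 from DENSE_RANK().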
Simulate the real interview
Pick a level (mid / senior / staff), go through timed rounds across topics, and get a full report with a hiring verdict at the end.
Example DE rounds: Data Modeling → SQL → Python → System Design
Built for data roles
Deep ontology covering 100+ topics across Data Engineering, Data Science, and Data Analytics. Questions are tagged by difficulty, topic, and subtopic, so you always know what you're working on.