Interview Readiness Platform

Become interview-ready for data engineering roles.

Practice answering real interview questions, get AI-scored rubric feedback, and track your improvement over time. Built specifically for Data Engineers.

Rubric-driven scoring · Mock interview mode · AI explanations after every answer

What you practice

Three core skill areas that come up in almost every data engineering interview.

System Design
Design data pipelines, warehouses, and architectures. Explain trade-offs with depth and clarity.
SQL & Python
Write and review queries, explain window functions, optimize performance, and handle edge cases.
Concepts & Theory
Answer questions on data modeling, orchestration, streaming, and cloud data platforms.

How it works

Step 1
Pick your topics
Choose a role (DE, DS, DA) and the specific topics you want to drill — SQL, Spark, dbt, system design, and more.
Step 2
Answer and get feedback
Write your answer in plain text. Our AI scores it against a rubric and shows exactly where to improve.
Step 3
Track your progress
See your average scores by topic, identify gaps, and use the mock interview mode to simulate the real thing.

20 Data Engineering interview questions

A sample of what you'll practice. Sign up free to see AI-scored rubric feedback on every answer.

Get started free →
Spark

You have a Spark job that runs fine on 10M rows but OOMs on 500M rows. Walk me through how you'd diagnose and fix it.

Spark

Explain the difference between SortMergeJoin and BroadcastHashJoin in Spark. When would you force one over the other?

dbt

Your dbt model takes 45 minutes to run. How do you find the bottleneck and fix it?

Ingestion

Design an incremental ingestion pipeline for a source table that has no updated_at column.
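One common starting point when a source has no `updated_at` column is to take a full snapshot each run and diff it against the previous snapshot using a per-row hash. The sketch below assumes a stable primary key; the function names and the `id`/`status` columns are hypothetical. The trade-off is a full scan of the source on every run.

```python
import hashlib
from typing import Dict, List, Tuple

Row = Dict[str, object]

def row_hash(row: Row, columns: List[str]) -> str:
    # Hash the non-key columns in a fixed order so identical rows hash identically.
    payload = "|".join(repr(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def diff_snapshots(
    previous: List[Row], current: List[Row], key: str, columns: List[str]
) -> Tuple[List[Row], List[Row], List[object]]:
    """Return (inserted rows, updated rows, deleted keys) between two full snapshots."""
    prev_hashes = {r[key]: row_hash(r, columns) for r in previous}
    curr_by_key = {r[key]: r for r in current}

    inserted = [r for k, r in curr_by_key.items() if k not in prev_hashes]
    updated = [
        r for k, r in curr_by_key.items()
        if k in prev_hashes and row_hash(r, columns) != prev_hashes[k]
    ]
    deleted = [k for k in prev_hashes if k not in curr_by_key]
    return inserted, updated, deleted

# Hypothetical example: yesterday's and today's snapshots of an orders table.
yesterday = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
today = [{"id": 1, "status": "shipped"}, {"id": 3, "status": "new"}]
ins, upd, dele = diff_snapshots(yesterday, today, key="id", columns=["status"])
```

Here order 3 is new, order 1 changed status, and order 2 disappeared from the source.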

Modeling

What's the difference between a fact table and a dimension table? Give a concrete example for an e-commerce company.

SQL

Explain SCD Type 2. Write the SQL MERGE statement to implement it.
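An SCD Type 2 load keeps history by expiring the current row when a tracked attribute changes and inserting a new current version. MERGE syntax differs between engines, so this minimal sketch shows the same logic in plain Python over an in-memory list; the `DimRow` fields and the `apply_scd2` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

@dataclass
class DimRow:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the row is still current
    is_current: bool = True

def apply_scd2(dim: List[DimRow], incoming: Dict[int, str], as_of: date) -> None:
    """Apply one batch of {customer_id: city} values to the dimension in place."""
    current = {r.customer_id: r for r in dim if r.is_current}
    for customer_id, city in incoming.items():
        existing = current.get(customer_id)
        if existing is not None and existing.city == city:
            continue  # unchanged: keep the current version as-is
        if existing is not None:
            # Tracked attribute changed: expire the old version...
            existing.valid_to = as_of
            existing.is_current = False
        # ...and insert a new current version (this also covers brand-new keys).
        dim.append(DimRow(customer_id, city, valid_from=as_of))

dim: List[DimRow] = []
apply_scd2(dim, {1: "Berlin"}, date(2024, 1, 1))
apply_scd2(dim, {1: "Munich"}, date(2024, 6, 1))
```

After both batches, customer 1 has an expired Berlin row and a current Munich row.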

Data Quality

You're seeing duplicate rows in your gold-layer table. Walk through the steps to identify the root cause.
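A typical first step is to confirm the table's grain: group by the business key and look for keys that appear more than once. In this sketch the `gold_orders` table and its columns are hypothetical; comparing load timestamps on the duplicated keys can hint at whether a rerun appended the same batch twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gold_orders (order_id INTEGER, amount REAL, loaded_at TEXT)")
conn.executemany(
    "INSERT INTO gold_orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01"), (2, 20.0, "2024-01-01"), (2, 20.0, "2024-01-02")],
)

# Find business keys that violate the one-row-per-order grain,
# along with the first and last load time of each duplicated key.
dupes = conn.execute(
    """
    SELECT order_id, COUNT(*) AS n, MIN(loaded_at), MAX(loaded_at)
    FROM gold_orders
    GROUP BY order_id
    HAVING COUNT(*) > 1
    """
).fetchall()
```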

Contracts

What is a data contract? Who owns it, and what happens when a producer breaks it?

Streaming

Compare Kafka delivery semantics: at-most-once vs at-least-once vs exactly-once. When does each matter?

Orchestration

Your Airflow DAG has been running for 6 hours and is stuck. What do you check first?

Streaming

Explain watermarks in Flink/Spark Streaming. What happens to late-arriving events?
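A simplified way to picture a watermark is "the largest event time seen so far, minus an allowed out-of-orderness bound"; events with timestamps below it are treated as late. This sketch is deliberately simplified: real engines advance watermarks periodically and judge lateness per window, and what happens to late events (for example, dropping them or routing them to a side output) depends on the engine and its configuration. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

def process_with_watermark(
    events: List[Tuple[str, int]], max_out_of_orderness: int
) -> Tuple[List[str], List[str]]:
    """Split events into on-time and late using a bounded-out-of-orderness watermark.

    watermark = (max event time seen so far) - max_out_of_orderness
    """
    max_seen: Optional[int] = None
    on_time: List[str] = []
    late: List[str] = []
    for event_id, ts in events:
        watermark = None if max_seen is None else max_seen - max_out_of_orderness
        if watermark is not None and ts < watermark:
            late.append(event_id)  # arrived after the watermark passed its timestamp
        else:
            on_time.append(event_id)
        max_seen = ts if max_seen is None else max(max_seen, ts)
    return on_time, late

# (event_id, event_time) pairs in arrival order; "e" shows up far too late.
events = [("a", 100), ("b", 105), ("c", 102), ("d", 110), ("e", 101)]
on_time, late = process_with_watermark(events, max_out_of_orderness=5)
```

Event "c" is out of order but within the 5-unit bound, so it still counts as on time; "e" arrives after the watermark has moved to 105.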

SQL

Write a SQL query to find the second-highest salary in each department without using LIMIT or TOP.
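One answer uses a window function instead of LIMIT or TOP. The sketch runs the query through Python's `sqlite3` against a hypothetical `employees` table; it assumes the bundled SQLite is 3.25 or newer, which added window functions. DENSE_RANK gives tied salaries the same rank, so rank 2 is the second-highest distinct salary even when the top salary is shared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "eng", 150), ("Ben", "eng", 150), ("Cy", "eng", 120), ("Di", "eng", 100),
        ("Ed", "sales", 90), ("Flo", "sales", 80),
        ("Gus", "ops", 70),  # one-person department: no second-highest salary
    ],
)

rows = conn.execute(
    """
    SELECT DISTINCT department, salary
    FROM (
        SELECT department, salary,
               DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
        FROM employees
    ) AS ranked
    WHERE rnk = 2
    ORDER BY department
    """
).fetchall()
```

Departments with a single distinct salary, like `ops` here, simply drop out of the result.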

Storage

What is partition pruning and how does Iceberg's hidden partitioning improve on Hive-style partitioning?

Orchestration

You need to backfill 18 months of data. How do you do it safely without impacting production?

Modeling

Explain the Medallion Architecture. What goes in Bronze, Silver, and Gold layers?

SQL

What's the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()? Give an example where they produce different results.
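A tie is enough to make the three functions diverge: ROW_NUMBER keeps counting, RANK leaves a gap after the tie, and DENSE_RANK does not. The sketch uses Python's `sqlite3` (SQLite 3.25 or newer) and a hypothetical `scores` table; `player` is added as a tie-breaker so ROW_NUMBER is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 90), ("b", 80), ("c", 80), ("d", 70)],  # b and c tie on 80
)

rows = conn.execute(
    """
    SELECT player, points,
           ROW_NUMBER() OVER (ORDER BY points DESC, player) AS row_num,
           RANK()       OVER (ORDER BY points DESC)         AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC)         AS dense_rnk
    FROM scores
    ORDER BY points DESC, player
    """
).fetchall()
```

For player `d`, ROW_NUMBER and RANK both give 4 while DENSE_RANK gives 3; for the tied players, ROW_NUMBER is the only one that assigns distinct values.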

Streaming

Your Kafka consumer lag is growing. What are the possible causes and how do you fix each?

Ingestion

Explain Change Data Capture. What are the trade-offs between log-based CDC and query-based CDC?

Spark

How does Spark's Adaptive Query Execution (AQE) work? What problems does it solve?

Data Quality

Design a data quality test suite for a payments pipeline. What tests are blocking vs. warning?
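One way to frame blocking vs. warning is to attach a severity to each check: blocking failures stop the load, warnings are only reported. This is a minimal sketch; the `Check` class, the check names, and the payment fields are hypothetical, and which checks should block is a judgment call for each pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]

@dataclass
class Check:
    name: str
    severity: str  # "block" stops the pipeline; "warn" only reports
    passes: Callable[[List[Row]], bool]

def run_checks(rows: List[Row], checks: List[Check]) -> List[str]:
    """Run every check; raise if any blocking check fails, else return warning names."""
    failed = [c for c in checks if not c.passes(rows)]
    blocking = [c.name for c in failed if c.severity == "block"]
    if blocking:
        raise ValueError(f"blocking checks failed: {blocking}")
    return [c.name for c in failed if c.severity == "warn"]

checks = [
    # Duplicate or missing amounts would corrupt downstream totals: block.
    Check("payment_id_unique", "block",
          lambda rs: len({r["payment_id"] for r in rs}) == len(rs)),
    Check("amount_not_null", "block",
          lambda rs: all(r["amount"] is not None for r in rs)),
    # A low batch volume may be legitimate, so it only warns.
    Check("row_count_at_least_3", "warn", lambda rs: len(rs) >= 3),
]

batch = [{"payment_id": 1, "amount": 10.0}, {"payment_id": 2, "amount": 5.5}]
warnings = run_checks(batch, checks)
```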

Mock Interview Mode

Simulate the real interview

Pick a level (mid / senior / staff), go through timed rounds across topics, and get a full report with a hiring verdict at the end.

Example DE rounds: Data Modeling → SQL → Python → System Design

Try a mock interview →

Built for data roles

Deep ontology covering 100+ topics across Data Engineering, Data Science, and Data Analytics. Questions are tagged by difficulty, topic, and subtopic — so you always know what you're working on.

Data Engineer · Data Scientist · Data Analyst · Analytics Engineer