▶ dbt vs traditional ETL (like Informatica) – what's the difference?
dbt is an ELT framework (Extract, Load, Transform) – data lands in your warehouse first, then dbt transforms it using SQL. Traditional ETL transforms before loading. dbt's advantages: leverage the warehouse's compute (Snowflake, BigQuery), version-control transformations via Git, test data quality, document lineage, and iterate quickly. Drawback: it requires a centralized data warehouse (not cheap for small datasets). Use dbt if you have Snowflake/BigQuery/Redshift + >1GB data. Use traditional ETL if you're moving raw files between on-prem systems.
▶ What's a 'model' in dbt and how does it differ from a view or table?
A dbt model = a SQL file that defines a transformation. When you run dbt, each model becomes either a view or a table in your warehouse (you choose via the materialization config). View = the query re-runs on every read (no storage cost, slower reads); table = pre-computed and stored (build compute paid up front, fast reads). Incremental model = hybrid (update only new rows since the last run). Use tables for frequently queried fact tables, views for intermediate staging layers. `SELECT * FROM {{ ref('my_model') }}` pulls from upstream models and lets dbt infer the dependency graph.
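As a concrete sketch, a fact-table model might look like this (model, table, and column names are illustrative, not from a real project):

```sql
-- models/marts/fct_orders.sql
-- Hypothetical model: materialized as a table because dashboards query it often.
{{ config(materialized='table') }}

select
    o.order_id,
    o.customer_id,
    o.amount,
    c.signup_date
from {{ ref('stg_orders') }} as o          -- upstream staging model
left join {{ ref('stg_customers') }} as c  -- ref() lets dbt build the DAG
    on o.customer_id = c.customer_id
```

Switching `materialized='table'` to `'view'` or `'incremental'` changes only how dbt builds the object, not the SQL itself.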
▶ How do I test data quality in dbt and prevent bad data from reaching downstream dashboards?
dbt includes built-in generic tests (`unique`, `not_null`, `relationships`, `accepted_values`) declared in schema.yml, plus custom SQL tests. Example: give `user_id` the tests `unique` and `not_null` under its column entry in schema.yml. Run `dbt test` to validate. Fail fast: with `dbt build`, a failing test on a model skips its downstream dependents. Advanced: write custom tests (e.g., 'conversion rate between 0 and 1'). Test at the source level (raw data) and the model level (transformed). Monitor test coverage: aim for >80% of columns tested.
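A schema.yml entry declaring those tests might look like this sketch (model and column names are assumptions):

```yaml
# models/schema.yml
version: 2

models:
  - name: stg_users
    columns:
      - name: user_id
        tests:
          - unique
          - not_null
      - name: plan
        tests:
          - accepted_values:
              values: ['free', 'paid']
```

`dbt test` then generates and runs one SQL query per test, failing if any query returns rows.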
▶ Snapshots and incremental models – when do I use each?
Incremental = transforms only NEW rows since last run (fast, cheap). Use for fact tables growing daily (events, transactions). Snapshot = captures a point-in-time view of a slowly-changing dimension (SCD). Example: snapshot user profile each day, track when 'plan' column changed from free to paid. Incremental fails if you need historical versions of a row; snapshot is designed for that. Rule of thumb: incremental for immutable events, snapshots for mutable dimensions.
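The incremental pattern can be sketched as follows (source, table, and column names are illustrative):

```sql
-- models/fct_events.sql
-- Hypothetical incremental model: only process rows newer than what's loaded.
{{ config(
    materialized='incremental',
    unique_key='event_id'
) }}

select
    event_id,
    user_id,
    event_type,
    occurred_at
from {{ source('analytics', 'raw_events') }}

{% if is_incremental() %}
  -- On incremental runs, {{ this }} refers to the already-built table,
  -- so we only pull events newer than the latest one stored.
  where occurred_at > (select max(occurred_at) from {{ this }})
{% endif %}
```

Snapshots use a separate `{% snapshot %}` block with a `strategy` (e.g. `check` on the `plan` column) so dbt records each historical version of a row instead of overwriting it.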
▶ How do I set up CI/CD for dbt in production (dbt Cloud vs self-hosted)?
dbt Cloud = managed solution: jobs run on dbt's infrastructure, with a built-in scheduler and PR previews (CI builds changed models into a temporary schema per pull request). Cost ~$100-300/mo. Self-hosted (Airflow/Dagster) = you manage a Python orchestrator and invoke the dbt CLI in tasks; more control but more ops. Modern best practice: dbt Cloud for small teams (<$10M ARR), self-hosted for large companies managing complex DAGs. dbt Cloud integration: open a PR, and dbt Cloud runs `dbt build` (which executes models and their tests) against the preview schema before merging to main.
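For the self-hosted route, a minimal CI pipeline might look like this sketch (GitHub Actions shown as one option; the adapter, target name, and secret are assumptions you'd swap for your own setup):

```yaml
# .github/workflows/dbt-ci.yml
name: dbt CI
on:
  pull_request:

jobs:
  dbt-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install dbt-snowflake   # swap for your warehouse adapter
      - run: dbt deps                    # install packages from packages.yml
      - run: dbt build --target ci       # runs models, tests, seeds, snapshots
        env:
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
```

The `ci` target would point at a throwaway schema in profiles.yml so PR builds never touch production tables.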
▶ What's the dbt semantic layer and how does it change analytics?
Semantic layer = centralized metric definitions (one source of truth for KPIs). Instead of each BI tool defining 'revenue' differently, dbt semantic layer defines it once (SELECT SUM(amount) WHERE status='paid'), and Looker/Tableau/Metabase pull from it. Benefit: consistency, fast iterations (change metric definition once, all dashboards update). Drawback: new workflow, requires dbt Cloud (not Core), learning curve. Use if: multiple BI tools or teams defining metrics differently. Skip if: small team, single BI tool.
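With dbt's MetricFlow-based semantic layer, that single revenue definition might be sketched like this (model, entity, and column names are assumptions, not a real project's spec):

```yaml
# models/marts/orders.yml
semantic_models:
  - name: orders
    model: ref('fct_orders')
    defaults:
      agg_time_dimension: ordered_at
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: paid_amount
        agg: sum
        expr: "case when status = 'paid' then amount else 0 end"

metrics:
  - name: revenue
    label: Revenue
    type: simple
    type_params:
      measure: paid_amount
```

BI tools then query `revenue` through the semantic layer API instead of re-implementing the SUM themselves.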
▶ How do I debug dbt when transformations produce wrong results?
Use `dbt debug` to check the warehouse connection. Use `dbt run --select my_model` to run one model. Check the `target/compiled/` folder to see the actual SQL generated (Jinja templates expanded). Check `target/run_results.json` for execution times and test failures. Common bugs: missing WHERE clause (forgot to filter source data), wrong join logic (a many-to-many join creating duplicates), stale dependencies (an old ref() pointing at the wrong model). Always run `dbt test` after model changes. Use `dbt source freshness` to check source data staleness.
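A typical debugging pass chains these commands together (the project name `my_project` and model path are illustrative placeholders):

```shell
dbt debug                           # verify profiles.yml and warehouse connectivity
dbt run --select my_model           # rebuild just the suspect model
less target/compiled/my_project/models/staging/my_model.sql   # inspect Jinja-expanded SQL
dbt test --select my_model          # re-run that model's tests
dbt source freshness                # check whether upstream source data is stale
```

Running the compiled SQL from `target/compiled/` directly in the warehouse console is often the fastest way to spot a bad join or missing filter.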