▶What's the difference between Snowflake and BigQuery?
Snowflake is cloud-agnostic (AWS/Azure/GCP), uses virtual warehouses with independently scalable compute, offers time travel (query historical snapshots), and has broad enterprise adoption ($50B valuation, 8000+ customers). Pricing: per-second warehouse compute (credits) + storage. BigQuery is Google-native, serverless (no warehouse management), uses slots for reserved compute, and excels in Google Cloud environments. Pricing: per byte scanned on-demand, with slot commitments for high, predictable volume. Choose Snowflake for multi-cloud flexibility and team control over compute. Choose BigQuery for Google Cloud-first orgs wanting zero infrastructure management.
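A minimal sketch of the operational difference — in Snowflake you create and size compute explicitly, in BigQuery you just run the query (warehouse, project, and table names here are illustrative):

```sql
-- Snowflake: compute is an explicit object you create, size, and suspend
CREATE WAREHOUSE IF NOT EXISTS analytics_wh
  WAREHOUSE_SIZE = 'XSMALL'   -- resize independently of storage
  AUTO_SUSPEND   = 60         -- suspend after 60 idle seconds to stop billing
  AUTO_RESUME    = TRUE;

-- BigQuery: no compute object to manage — run the query, pay per byte scanned
SELECT region, SUM(revenue) AS total_revenue
FROM `my_project.analytics.orders`
GROUP BY region;
```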
▶How do I optimize costs in Snowflake vs BigQuery?
Snowflake: (1) Right-size virtual warehouses — don't run an X-Large for small queries; (2) Define cluster keys on columns you filter by so Snowflake can prune micro-partitions instead of scanning the whole table; (3) Understand the size/speed trade-off — a smaller warehouse is slower but often cheaper for light workloads; (4) Monitor per-warehouse spend with WAREHOUSE_METERING_HISTORY (see the query below); (5) Suspend warehouses when idle. BigQuery: (1) Use partitioned tables to scan fewer bytes (on-demand costs are per byte scanned, not compute time); (2) Cluster on columns you filter or group by; (3) Use BI Engine to accelerate repeated dashboard queries with in-memory caching; (4) Estimate query cost before running with the `--dry_run` flag; (5) Use slot commitments if volume exceeds ~100TB/month. Snowflake: per-second billing on fixed warehouse sizes, easier to budget. BigQuery: usage-based, harder to predict but cheaper for ad-hoc queries.
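A sketch of point (4) for Snowflake — credits consumed per warehouse over the last 30 days, using the ACCOUNT_USAGE view (the 30-day window is just an example):

```sql
-- Snowflake: credits consumed per warehouse over the last 30 days
-- (ACCOUNT_USAGE views can lag real time by a few hours)
SELECT
  warehouse_name,
  SUM(credits_used) AS credits_30d
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_30d DESC;
```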
▶Should I use ELT or ETL with these warehouses?
ELT (Extract-Load-Transform) is the modern default: load raw data into the warehouse first, then transform with SQL (dbt). Why: (1) Faster initial load (no pre-processing); (2) Flexible schema evolution (schema-on-read); (3) All transformations in SQL using warehouse compute; (4) Version-controlled dbt models; (5) Easy to backfill and test. ETL (Extract-Transform-Load) = transform before loading (old pattern, still used for legacy systems). Both Snowflake and BigQuery are optimized for ELT. Use dbt as your transformation framework — it generates SQL, runs on schedule, and integrates with both.
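A minimal ELT sketch: the transform is plain SQL run by the warehouse over already-loaded raw tables (schema and column names are illustrative; in practice this SELECT would live in a dbt model):

```sql
-- ELT: data is already loaded raw; the transform is just SQL run by the warehouse
CREATE OR REPLACE TABLE analytics.daily_orders AS
SELECT
  DATE(created_at) AS order_date,
  customer_id,
  COUNT(*)         AS order_count,
  SUM(order_total) AS revenue
FROM raw.orders               -- landed as-is by the EL tool (Fivetran/Airbyte)
GROUP BY order_date, customer_id;
```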
▶What's partitioning and clustering? Which should I use?
Partitioning: divide a table by a column (usually a date) so queries only scan the relevant partitions. Example (BigQuery DDL): `PARTITION BY DATE(created_at)` — a query filtered to '2026-01-01' scans only that day's partition, not all years. Clustering: sort data within partitions by one or more columns (user_id, product_id) so filters on those columns can prune data. Both reduce bytes scanned = lower cost + faster queries. Snowflake: there is no user-defined partitioning — data is split into micro-partitions automatically; you can define a cluster key and the Automatic Clustering service keeps the data sorted in the background. BigQuery: partitioning is optional but strongly recommended for large tables; clustering is also optional. Use partitioning for temporal data (dates, timestamps). Use clustering for columns you frequently filter or join on (user_id, account_id, product_id). Combine both: partition by date, cluster by user_id, as in the DDL below.
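A sketch of the combined pattern (table and column names are illustrative):

```sql
-- BigQuery: declare partitioning and clustering in the DDL
CREATE TABLE analytics.events
(
  event_id   STRING,
  user_id    STRING,
  created_at TIMESTAMP
)
PARTITION BY DATE(created_at)
CLUSTER BY user_id;

-- Snowflake: micro-partitions are automatic; you only pick a cluster key
ALTER TABLE analytics.events CLUSTER BY (TO_DATE(created_at), user_id);
```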
▶When should I use Snowflake/BigQuery vs Postgres?
Postgres: transactional OLTP (inserts/updates/deletes), relational consistency (ACID), small-to-medium datasets (1-10TB), low latency (ms), complex JOINs. Snowflake/BigQuery: analytical OLAP (massive SELECTs), denormalized schemas (fact tables), 100TB-EB scale, batch queries (seconds to minutes OK), compressed storage, cheap retention. Example: use Postgres as your operational database (customer orders, inventory), stream/batch to Snowflake for analytics (revenue by region, cohort analysis). Postgres = source of truth for transactional data. Snowflake/BigQuery = analytics engine for historical analysis + ML.
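To make the split concrete, here's the kind of query each system is built for (Snowflake syntax; table names are illustrative):

```sql
-- OLTP (Postgres): small, indexed, transactional writes and point lookups
UPDATE orders SET status = 'shipped' WHERE order_id = 123;

-- OLAP (Snowflake/BigQuery): wide scans and aggregates over a denormalized fact table
-- (Snowflake syntax shown; BigQuery reverses the DATE_TRUNC argument order)
SELECT
  region,
  DATE_TRUNC('month', order_date) AS order_month,
  SUM(revenue)                    AS revenue
FROM fact_orders
GROUP BY region, order_month
ORDER BY order_month, region;
```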
▶How do I implement time travel in Snowflake?
Time travel = query a table as it was at a specific time in the past. Snowflake: `SELECT * FROM table_name AT (TIMESTAMP => '2026-01-01 10:00:00'::timestamp_tz)` or `BEFORE (STATEMENT => query_id)`. Retention: 1 day by default, configurable up to 90 days on Enterprise edition. Use cases: (1) accidental DELETE/DROP recovery; (2) auditing data changes; (3) comparing a table at different points in time; (4) debugging data pipeline bugs. Cost: you pay storage for the changed or deleted data retained during the time travel window — usually modest, but not free. BigQuery: also has native time travel via `FOR SYSTEM_TIME AS OF`, with a default window of 7 days (legacy SQL used table decorators like `table@1234567890000`, milliseconds since epoch). Snowflake's longer, configurable retention window is an advantage for audit/recovery workflows.
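Examples of both syntaxes (table names and the query ID are illustrative):

```sql
-- Snowflake: read a table as of a timestamp, or just before a specific statement
SELECT * FROM orders
  AT (TIMESTAMP => '2026-01-01 10:00:00'::timestamp_tz);

SELECT * FROM orders
  BEFORE (STATEMENT => '01a2b3c4-0000-1111-2222-333344445555');  -- query ID is illustrative

-- Snowflake: recover a dropped table within the retention window
UNDROP TABLE orders;

-- BigQuery: time travel with FOR SYSTEM_TIME AS OF (window defaults to 7 days)
SELECT * FROM `my_project.analytics.orders`
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
```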
▶How do dbt models fit into Snowflake/BigQuery?
dbt (data build tool) = your transformation framework. You write SQL SELECT statements; dbt compiles them into CREATE TABLE AS / CREATE VIEW / MERGE statements depending on each model's materialization. Workflow: (1) raw data lands in Snowflake/BigQuery (Fivetran/Airbyte does EL); (2) dbt models transform raw → mart tables; (3) dbt manages dependencies via `ref()` (model A depends on model B); (4) dbt runs on a schedule (daily) or on demand; (5) dbt tracks lineage and tests data quality (e.g., `not_null`, `unique` checks). Both Snowflake and BigQuery have first-class dbt support — dbt adapters handle SQL dialect differences. Use dbt for: schema evolution, version control (git), documentation (auto-generated), testing, and CI/CD deployment. A minimal model is sketched below.
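A minimal dbt model sketch (file path, model name, and the upstream `stg_orders` model are illustrative):

```sql
-- models/marts/daily_revenue.sql — dbt wraps this SELECT in the right DDL
-- (CREATE TABLE AS here) for whichever warehouse the adapter targets.
{{ config(materialized='table') }}

SELECT
  order_date,
  SUM(revenue) AS revenue
FROM {{ ref('stg_orders') }}   -- ref() wires up the dependency graph and lineage
GROUP BY order_date
```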