ETL data pipeline
Batch + streaming ETL into a lakehouse: sources → ingestion → transformation → warehouse → BI/ML consumers.
The prompt
Hybrid batch and streaming ETL pipeline. Batch sources: PostgreSQL replicas (CDC via Debezium → Kafka), CSV uploads to S3, and third-party API pulls scheduled by Airflow. Streaming sources: clickstream events via Kafka. Ingestion lands raw data in S3 (bronze layer in a lakehouse). Spark jobs running on Kubernetes transform bronze to silver (cleaned) and silver to gold (aggregated). Gold tables are queryable via Athena and replicated to a Snowflake warehouse for BI. ML team consumes silver tables via a Feast feature store. Show the Airflow DAG orchestration, the data quality checks (Great Expectations), and the failed-record dead-letter queue.
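To make the prompt concrete, here is a minimal sketch of the orchestration it asks for: an Airflow DAG that lands a batch, runs Great Expectations checks on the bronze layer, and branches failing batches to a dead-letter path. It assumes Airflow 2.4+ and the classic (pre-1.0) pandas-backed Great Expectations API; every DAG id, task, table, and S3 path is an illustrative placeholder, not part of the template.

```python
# Hypothetical Airflow DAG sketching the orchestration described in the prompt.
# All names and paths are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import BranchPythonOperator, PythonOperator


def validate_bronze(**context):
    """Run Great Expectations checks on a freshly landed bronze batch."""
    import great_expectations as ge
    import pandas as pd

    # Placeholder: in practice this would read the current bronze partition.
    df = pd.read_parquet("s3://lake/bronze/orders/dt=2024-01-01/")  # hypothetical path
    gdf = ge.from_pandas(df)  # classic pandas-backed GE API
    gdf.expect_column_values_to_not_be_null("order_id")
    gdf.expect_column_values_to_be_between("amount", min_value=0)
    result = gdf.validate()
    context["ti"].xcom_push(key="ge_success", value=result.success)


def route_on_quality(**context):
    """Branch: promote the batch, or quarantine it to the dead-letter path."""
    ok = context["ti"].xcom_pull(task_ids="validate_bronze", key="ge_success")
    return "bronze_to_silver" if ok else "write_dead_letter"


with DAG(
    dag_id="lakehouse_etl",  # hypothetical
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    # Stubs: in a real deployment these would be Debezium/Kafka sink checks,
    # API pulls, and SparkKubernetesOperator submissions.
    ingest_cdc = PythonOperator(task_id="ingest_cdc", python_callable=lambda: None)
    ingest_api = PythonOperator(task_id="ingest_api", python_callable=lambda: None)
    validate = PythonOperator(task_id="validate_bronze", python_callable=validate_bronze)
    branch = BranchPythonOperator(task_id="route_on_quality", python_callable=route_on_quality)
    to_silver = PythonOperator(task_id="bronze_to_silver", python_callable=lambda: None)
    dead_letter = PythonOperator(task_id="write_dead_letter", python_callable=lambda: None)
    to_gold = PythonOperator(task_id="silver_to_gold", python_callable=lambda: None)

    [ingest_cdc, ingest_api] >> validate >> branch >> [to_silver, dead_letter]
    to_silver >> to_gold
```

In the generated diagram, this maps to the orchestration box gating the bronze-to-silver edge, with the dead-letter path drawn off the quality check rather than records being silently dropped.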
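The bronze-to-silver transform itself would run as a Spark job submitted to Kubernetes. A minimal PySpark sketch of the clean-and-quarantine step follows, with rows failing basic integrity rules diverted to the dead-letter path; the paths, columns, and rules are hypothetical stand-ins, not a prescribed schema.

```python
# Hypothetical bronze-to-silver Spark job; paths and schema are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_to_silver_orders").getOrCreate()

bronze = spark.read.parquet("s3a://lake/bronze/orders/")  # hypothetical path

# Basic cleaning: type casts plus deduplication on the CDC primary key,
# since Debezium replays can deliver the same change more than once.
cleaned = (
    bronze
    .withColumn("amount", F.col("amount").cast("decimal(12,2)"))
    .dropDuplicates(["order_id"])
)

# Split out records that fail integrity rules into the dead-letter path
# instead of silently dropping them, so they can be inspected and replayed.
is_valid = F.col("order_id").isNotNull() & (F.col("amount") >= 0)
cleaned.filter(is_valid).write.mode("overwrite").parquet("s3a://lake/silver/orders/")
cleaned.filter(~is_valid).write.mode("append").parquet("s3a://lake/dead-letter/orders/")
```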
What it generates
A data pipeline diagram with sources, ingestion, bronze/silver/gold lakehouse layers, orchestration, data quality checks with a dead-letter path, and downstream consumers.
When to use it
For modern data platforms that combine batch and streaming ingestion, adopt a lakehouse architecture, and serve both BI and ML workloads.
Generate this diagram in seconds
Copy the prompt above, sign in for free, and paste it into the generator.
Related data & AI templates
RAG (retrieval-augmented generation) pipeline
Document ingestion, embedding, vector search, LLM generation, and response streaming for a production RAG application.
Multi-agent LLM system
Hierarchical multi-agent architecture: orchestrator agent dispatches to specialist agents with shared memory and tool access.
MLOps training & inference pipeline
End-to-end ML lifecycle: feature store, training pipeline, model registry, online inference, and monitoring.