ETL data pipeline
Batch + streaming ETL into a lakehouse: sources → ingestion → transformation → warehouse → BI/ML consumers.
The prompt
Hybrid batch and streaming ETL pipeline. Batch sources: PostgreSQL replicas (CDC via Debezium → Kafka), CSV uploads to S3, and third-party API pulls scheduled by Airflow. Streaming sources: clickstream events via Kafka. Ingestion lands raw data in S3 (bronze layer in a lakehouse). Spark jobs running on Kubernetes transform bronze to silver (cleaned) and silver to gold (aggregated). Gold tables are queryable via Athena and replicated to a Snowflake warehouse for BI. ML team consumes silver tables via a Feast feature store. Show the Airflow DAG orchestration, the data quality checks (Great Expectations), and the failed-record dead-letter queue.
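To make the prompt concrete, here is a minimal sketch of the orchestration it asks for: an Airflow DAG that lands a batch, runs Great Expectations checks on the bronze layer, and branches failing batches to a dead-letter path. It assumes Airflow 2.4+ and the classic (pre-1.0) pandas-backed Great Expectations API; every DAG id, task, table, and S3 path is an illustrative placeholder, not part of the template.

```python
# Hypothetical Airflow DAG sketching the orchestration described in the prompt.
# All names and paths are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import BranchPythonOperator, PythonOperator


def validate_bronze(**context):
    """Run Great Expectations checks on a freshly landed bronze batch."""
    import great_expectations as ge
    import pandas as pd

    # Placeholder: in practice this would read the current bronze partition.
    df = pd.read_parquet("s3://lake/bronze/orders/dt=2024-01-01/")  # hypothetical path
    gdf = ge.from_pandas(df)  # classic pandas-backed GE API
    gdf.expect_column_values_to_not_be_null("order_id")
    gdf.expect_column_values_to_be_between("amount", min_value=0)
    result = gdf.validate()
    context["ti"].xcom_push(key="ge_success", value=result.success)


def route_on_quality(**context):
    """Branch: promote the batch, or quarantine it to the dead-letter path."""
    ok = context["ti"].xcom_pull(task_ids="validate_bronze", key="ge_success")
    return "bronze_to_silver" if ok else "write_dead_letter"


with DAG(
    dag_id="lakehouse_etl",  # hypothetical
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    # Stubs: in a real deployment these would be Debezium/Kafka sink checks,
    # API pulls, and SparkKubernetesOperator submissions.
    ingest_cdc = PythonOperator(task_id="ingest_cdc", python_callable=lambda: None)
    ingest_api = PythonOperator(task_id="ingest_api", python_callable=lambda: None)
    validate = PythonOperator(task_id="validate_bronze", python_callable=validate_bronze)
    branch = BranchPythonOperator(task_id="route_on_quality", python_callable=route_on_quality)
    to_silver = PythonOperator(task_id="bronze_to_silver", python_callable=lambda: None)
    dead_letter = PythonOperator(task_id="write_dead_letter", python_callable=lambda: None)
    to_gold = PythonOperator(task_id="silver_to_gold", python_callable=lambda: None)

    [ingest_cdc, ingest_api] >> validate >> branch >> [to_silver, dead_letter]
    to_silver >> to_gold
```

In the generated diagram, this maps to the orchestration box gating the bronze-to-silver edge, with the dead-letter path drawn off the quality check rather than records being silently dropped.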
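The bronze-to-silver transform itself would run as a Spark job submitted to Kubernetes. A minimal PySpark sketch of the clean-and-quarantine step follows, with rows failing basic integrity rules diverted to the dead-letter path; the paths, columns, and rules are hypothetical stand-ins, not a prescribed schema.

```python
# Hypothetical bronze-to-silver Spark job; paths and schema are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_to_silver_orders").getOrCreate()

bronze = spark.read.parquet("s3a://lake/bronze/orders/")  # hypothetical path

# Basic cleaning: type casts plus deduplication on the CDC primary key,
# since Debezium replays can deliver the same change more than once.
cleaned = (
    bronze
    .withColumn("amount", F.col("amount").cast("decimal(12,2)"))
    .dropDuplicates(["order_id"])
)

# Split out records that fail integrity rules into the dead-letter path
# instead of silently dropping them, so they can be inspected and replayed.
is_valid = F.col("order_id").isNotNull() & (F.col("amount") >= 0)
cleaned.filter(is_valid).write.mode("overwrite").parquet("s3a://lake/silver/orders/")
cleaned.filter(~is_valid).write.mode("append").parquet("s3a://lake/dead-letter/orders/")
```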
What it generates
A data pipeline diagram with sources, ingestion, bronze/silver/gold lakehouse layers, orchestration, data quality checks with a dead-letter path, and downstream consumers.
When to use it
For modern data platforms that combine batch and streaming ingestion, adopt a lakehouse architecture, and serve both BI and ML workloads.
Generate this diagram in seconds
Copy the prompt above, sign in for free, and paste it into the generator.
Related data & AI templates
RAG (retrieval-augmented generation) pipeline
Document ingestion, embedding, vector search, LLM generation, and response streaming for a production RAG application.
Multi-agent LLM system
Hierarchical multi-agent architecture: orchestrator agent dispatches to specialist agents with shared memory and tool access.
MLOps training & inference pipeline
End-to-end ML lifecycle: feature store, training pipeline, model registry, online inference, and monitoring.