Python Data Pipeline Builder
📝 Prompt
You are a senior data engineer with expertise in building scalable data pipelines in Python. Your task is to design and implement a complete data pipeline. Given [CONTEXT] (data sources, format, volume), [GOAL] (what the pipeline must produce), and [SKILL LEVEL], build a complete pipeline solution:
1. PIPELINE ARCHITECTURE: Describe the Extract, Transform, Load (ETL) stages and the data flow between components.
2. EXTRACTION CODE: Write Python code to extract data from the [CONTEXT] sources using appropriate libraries (pandas, requests, sqlalchemy, boto3).
3. TRANSFORMATION LOGIC: Implement the core data cleaning, validation, and transformation steps, with inline comments.
4. LOADING MECHANISM: Write code to load the transformed data into the target destination (database, file, API, or data warehouse).
5. ERROR HANDLING & LOGGING: Add structured logging and error recovery at each stage.
6. SCHEDULING & ORCHESTRATION: Show how to schedule and orchestrate the pipeline using Airflow, Prefect, or cron.
7. DATA QUALITY CHECKS: Implement 3 automated data quality assertions that run after each stage.
8. MONITORING DASHBOARD: Define 4 pipeline health metrics to track (rows processed, error rate, latency, freshness).
Output all code in formatted Python blocks. Include a pipeline diagram as a text flow description.
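For illustration, here is a minimal sketch of the extract, transform, and load stages the prompt asks for, assuming a small CSV source and a SQLite target. It deliberately swaps in the standard-library `csv` and `sqlite3` modules for pandas and sqlalchemy so that it runs without dependencies; the sample data, the `orders` table, and the function names are hypothetical.

```python
import csv
import io
import logging
import sqlite3

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s %(message)s")
log = logging.getLogger("pipeline")

# Hypothetical raw input standing in for a real source (file, API, or database).
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,
3,carol,5.00
3,carol,5.00
"""


def extract(raw_text: str) -> list[dict]:
    """Extract: parse CSV text into a list of row dicts (all values are strings)."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))
    log.info("extract: %d rows read", len(rows))
    return rows


def transform(rows: list[dict]) -> list[tuple[int, str, float]]:
    """Transform: drop rows missing an amount, deduplicate on order_id, cast types."""
    seen: set[int] = set()
    clean = []
    for row in rows:
        if not row["amount"]:
            log.warning("transform: dropping order %s (missing amount)", row["order_id"])
            continue
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # duplicate primary key: keep the first occurrence only
        seen.add(order_id)
        # Normalize the customer name and round the amount to cents.
        clean.append((order_id, row["customer"].strip().title(), round(float(row["amount"]), 2)))
    log.info("transform: %d rows kept", len(clean))
    return clean


def load(records: list[tuple[int, str, float]], conn: sqlite3.Connection) -> int:
    """Load: upsert records into a SQLite table inside a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "CREATE TABLE IF NOT EXISTS orders "
            "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
        )
        conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", records)
    log.info("load: %d rows written", len(records))
    return len(records)


def run_pipeline(raw_text: str, conn: sqlite3.Connection) -> int:
    """Chain the three stages; returns the number of rows loaded."""
    return load(transform(extract(raw_text)), conn)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_pipeline(RAW_CSV, connection)
    print(connection.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
    # → [(1, 'Alice', 19.99), (3, 'Carol', 5.0)]
```

Using `INSERT OR REPLACE` keyed on the primary key makes the load idempotent, so re-running the same batch after a failure does not create duplicates.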
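The three automated data quality assertions could, for example, be a non-empty check, a required-columns check, and a unique-key check, run against each stage's output. The sketch below assumes rows are plain dicts; the specific checks, the `DataQualityError` exception, and the `run_quality_checks` helper are illustrative choices, not a prescribed set.

```python
import logging
from collections import Counter

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline.quality")


class DataQualityError(Exception):
    """Raised when a data quality assertion fails; the caller decides whether to halt."""


def check_not_empty(rows: list[dict]) -> None:
    # Assertion 1: a stage should never silently produce zero rows.
    if not rows:
        raise DataQualityError("no rows produced")


def check_required_columns(rows: list[dict], required: list[str]) -> None:
    # Assertion 2: required fields must be present and non-empty in every row.
    for i, row in enumerate(rows):
        missing = [col for col in required if row.get(col) in (None, "")]
        if missing:
            raise DataQualityError(f"row {i} has null or empty {missing}")


def check_unique_key(rows: list[dict], key: str) -> None:
    # Assertion 3: the primary key must not repeat within the batch.
    counts = Counter(row[key] for row in rows)
    dupes = sorted(k for k, n in counts.items() if n > 1)
    if dupes:
        raise DataQualityError(f"duplicate {key} values: {dupes}")


def run_quality_checks(stage: str, rows: list[dict], required: list[str], key: str) -> None:
    """Run all three assertions on one stage's output and log the outcome."""
    try:
        check_not_empty(rows)
        check_required_columns(rows, required)
        check_unique_key(rows, key)
    except DataQualityError as exc:
        log.error("quality check failed after %s: %s", stage, exc)
        raise
    log.info("quality checks passed after %s (%d rows)", stage, len(rows))
```

Raising instead of only logging stops bad data from reaching the next stage; a pipeline that prefers to quarantine bad rows could catch `DataQualityError` at the stage boundary instead.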
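The four health metrics named in the prompt (rows processed, error rate, latency, freshness) can be collected per run in a small record and exported to whatever dashboard is in use. This is a sketch under stated assumptions: the `PipelineMetrics` class and its field names are hypothetical, "freshness" is taken to mean the age of the newest loaded record, and timestamps are assumed to be timezone-aware UTC.

```python
import time
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class PipelineMetrics:
    """Per-run counters for the four health metrics (names are illustrative)."""
    rows_in: int = 0    # rows received from extraction
    rows_out: int = 0   # rows successfully loaded ("rows processed")
    errors: int = 0     # rows rejected or failed
    latest_event_time: Optional[datetime] = None  # newest loaded source timestamp, UTC
    _start: float = field(default_factory=time.monotonic, repr=False)
    _end: Optional[float] = field(default=None, repr=False)

    def finish(self) -> None:
        # Mark the end of the run so latency stops growing.
        self._end = time.monotonic()

    @property
    def error_rate(self) -> float:
        # Share of input rows that failed; 0.0 when nothing was read.
        return self.errors / self.rows_in if self.rows_in else 0.0

    @property
    def latency_seconds(self) -> float:
        # Duration of the run, measured up to now if finish() was not called yet.
        end = self._end if self._end is not None else time.monotonic()
        return end - self._start

    def freshness(self, now: Optional[datetime] = None) -> Optional[timedelta]:
        # Age of the newest loaded record; None if nothing has been loaded.
        if self.latest_event_time is None:
            return None
        now = now or datetime.now(timezone.utc)
        return now - self.latest_event_time

    def as_dict(self) -> dict:
        # Flat snapshot suitable for pushing to a metrics backend or dashboard.
        fresh = self.freshness()
        return {
            "rows_processed": self.rows_out,
            "error_rate": self.error_rate,
            "latency_seconds": round(self.latency_seconds, 3),
            "freshness_seconds": fresh.total_seconds() if fresh is not None else None,
        }
```

`time.monotonic()` is used for latency because, unlike wall-clock time, it cannot jump backwards; freshness uses wall-clock UTC because it is compared against timestamps in the data.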
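Structured logging and error recovery at each stage could look roughly like the following: a JSON log formatter plus a retry decorator with exponential backoff. The `JsonFormatter` and `with_retry` names, the default of three attempts, and the doubling delay are assumptions for this sketch, not requirements from the prompt.

```python
import json
import logging
import time
from functools import wraps


class JsonFormatter(logging.Formatter):
    """Emit each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        stage = getattr(record, "stage", None)  # set via extra={"stage": ...}
        if stage:
            payload["stage"] = stage
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("pipeline.recovery")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False  # avoid duplicate plain-text lines from the root logger


def with_retry(stage: str, attempts: int = 3, base_delay: float = 1.0,
               retry_on: tuple = (Exception,)):
    """Retry a stage function, doubling the wait after each failed attempt."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on as exc:
                    if attempt == attempts:
                        # Out of retries: log with traceback and re-raise to the caller.
                        log.error("failed after %d attempts", attempts,
                                  extra={"stage": stage}, exc_info=True)
                        raise
                    delay = base_delay * 2 ** (attempt - 1)
                    log.warning("attempt %d failed (%s); retrying in %.1fs",
                                attempt, exc, delay, extra={"stage": stage})
                    time.sleep(delay)
        return wrapper
    return decorator
```

For example, decorating an extraction function with `@with_retry("extract", attempts=3)` retries a transient failure after waits of 1 s and 2 s before re-raising. Retrying a load stage is only safe when the load is idempotent, such as an upsert keyed on a primary key.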