Apache Airflow
Apache-2.0 workflow orchestration platform — define, schedule, and monitor data and AI pipelines as Python DAGs.
Apache Airflow is one of the most widely deployed open-source workflow orchestration platforms, used by thousands of organizations to schedule and monitor data engineering, ML, and AI pipelines. Workflows are defined in Python as DAGs (Directed Acyclic Graphs): each task is a node and each dependency is an edge, which gives full programmatic control over task dependencies, scheduling, and retry logic.
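To make the dependency idea concrete, here is a minimal sketch in plain Python. It is not Airflow's scheduler; it only uses the standard library's graphlib module (Python 3.9+) to compute a valid run order for a hypothetical three-step pipeline. The task names are illustrative assumptions.

```python
import graphlib

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "load_docs": set(),
    "embed_docs": {"load_docs"},
    "upsert_vectors": {"embed_docs"},
}


def run_order(deps):
    """Return one valid execution order for the given dependency mapping.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is why a workflow graph has to be acyclic.
    """
    return list(graphlib.TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)  # ['load_docs', 'embed_docs', 'upsert_vectors']
```

Because every task here depends on exactly one predecessor, there is only one valid order. With branching dependencies, several orders can be valid, and independent tasks can run in parallel.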
Key features
- Python DAGs — define complex workflows as code, version-controlled in git
- 1,000+ pre-built operators: HTTP, SQL, Spark, Kubernetes, cloud services (AWS/GCP/Azure)
- Rich web UI for DAG visualization, task logs, and backfill management
- Dynamic DAG generation — build pipelines programmatically from configs or DB queries
- Pluggable executors: LocalExecutor, CeleryExecutor, KubernetesExecutor
- Apache-2.0 license — fully commercial-friendly
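As an illustration of the dynamic DAG generation feature listed above, the following is a sketch that assumes an Airflow 2.4+ installation. The config list, the dag_spec, embed_source, build_dag, and register_dags helpers, and all DAG and task names are hypothetical; in practice the configs might come from a YAML file or a database query. The Airflow imports are placed inside build_dag only so that the config logic can be exercised without Airflow installed.

```python
from datetime import datetime

# Hypothetical configuration: one entry per pipeline to generate.
PIPELINE_CONFIGS = [
    {"name": "docs", "schedule": "@daily", "sources": ["wiki", "tickets"]},
    {"name": "code", "schedule": "@hourly", "sources": ["repos"]},
]


def dag_spec(config):
    """Translate one config entry into DAG arguments and task ids (pure Python)."""
    return {
        "dag_id": f"embed_{config['name']}",
        "schedule": config["schedule"],
        "start_date": datetime(2024, 1, 1),
        "task_ids": [f"embed_{source}" for source in config["sources"]],
    }


def embed_source(source):
    # Placeholder for the real work of one task.
    print(f"embedding documents from {source}")


def build_dag(config):
    """Create one Airflow DAG with one task per configured source."""
    # Imported lazily (an assumption of this sketch, not an Airflow requirement).
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    spec = dag_spec(config)
    with DAG(spec["dag_id"], start_date=spec["start_date"], schedule=spec["schedule"]) as dag:
        for task_id, source in zip(spec["task_ids"], config["sources"]):
            PythonOperator(
                task_id=task_id,
                python_callable=embed_source,
                op_kwargs={"source": source},
            )
    return dag


def register_dags(namespace, configs=PIPELINE_CONFIGS):
    """Store each generated DAG in the given namespace under its dag_id."""
    for config in configs:
        dag = build_dag(config)
        namespace[dag.dag_id] = dag
```

In a file under the dags/ folder, calling register_dags(globals()) at module level would expose the generated DAGs as module-level variables, which is how Airflow discovers them when it parses the file.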
Quick start
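pip install apache-airflow
# The commands below follow the Airflow 2.x CLI
# Initialize the metadata database
# (on Airflow 2.7+, `airflow db migrate` replaces the deprecated `airflow db init`)
airflow db init
# Create an admin user; you will be prompted for a password
airflow users create --username admin --firstname Admin \
    --lastname User --role Admin --email admin@example.com
# Start the scheduler in the background, then the webserver in the foreground
airflow scheduler &
airflow webserver --port 8080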
# dags/my_ai_pipeline.py
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def run_embedding_job():
    # load docs, embed, upsert to vector db
    pass

with DAG("ai_pipeline", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
    embed = PythonOperator(task_id="embed_docs", python_callable=run_embedding_job)
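The single-task DAG above can be extended to show task dependencies and retry logic. The following is a sketch under assumptions: it targets Airflow 2.4+, the DAG id, the load_docs and embed_docs callables, and the retry values are hypothetical, and the Airflow imports sit inside build_pipeline only so that the plain-Python parts can be checked without Airflow installed.

```python
from datetime import datetime, timedelta

# Retry settings passed to every task in the DAG through default_args.
# The values are illustrative, not recommendations.
DEFAULT_ARGS = {"retries": 2, "retry_delay": timedelta(minutes=5)}


def load_docs():
    # Placeholder: fetch documents from a hypothetical source.
    return ["doc-1", "doc-2"]


def embed_docs():
    # Placeholder: embed the documents and upsert them to a vector db.
    pass


def build_pipeline():
    """Build a two-task DAG in which embed_docs runs after load_docs."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        "ai_pipeline_with_retries",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,  # do not backfill runs between start_date and today
        default_args=DEFAULT_ARGS,
    ) as dag:
        load = PythonOperator(task_id="load_docs", python_callable=load_docs)
        embed = PythonOperator(task_id="embed_docs", python_callable=embed_docs)
        # The >> operator declares that load must succeed before embed starts.
        load >> embed
    return dag
```

In a DAG file, assigning dag = build_pipeline() at module level would let Airflow pick it up. With these settings, a failed task would be retried up to two times, five minutes apart, before being marked as failed.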
Install via ai-supply
npx ai-supply add apache-airflow-workflows
Curated mirror of the open-source Apache Airflow (Apache-2.0). Get it from the source.