Integrating Apache Airflow with Celery and Dragonfly
Learn how to set up Apache Airflow and Celery with the high-performance Dragonfly data store for superior workflow orchestration and scalability.

What is Apache Airflow?
Apache Airflow has emerged as a powerful tool in the world of workflow automation and orchestration. Its ability to programmatically author, schedule, and monitor workflows makes it a favorite among engineers and data scientists. Celery, a distributed task queue for Python often using Redis as a backing storage, can be used in conjunction with Airflow for distributed task management. However, Dragonfly offers a compelling alternative. As a disruptive, multi-threaded, in-memory data store, Dragonfly delivers superior performance, flexibility, scalability, and cost savings compared to Redis. Previously, we have demonstrated how to run Celery on Dragonfly with a practical application example. In this guide, we’ll take a step further and walk through integrating Apache Airflow, Celery, and Dragonfly.
Why Choose Dragonfly Over Redis?
Before diving into the technical steps, let’s explore why Dragonfly is a better choice:
- Performance: Dragonfly outperforms Redis in throughput, delivering up to 25x the operations per second. This is crucial for high-traffic applications, ensuring efficient task management and reduced tail latency.
- Scalability: Dragonfly’s multi-threaded architecture allows it to scale with CPU cores, unlike Redis’s single-threaded nature for data operations. This architectural difference is essential for handling large-scale task distributions without bottlenecks. Going beyond running on a single server, Dragonfly Swarm also offers distributed multi-shard clustering.
- Cost Efficiency: By offering higher performance with fewer resources, Dragonfly reduces infrastructure costs, providing a more economical solution for large-scale applications.
- Flexibility: Built to integrate seamlessly into modern architectures, Dragonfly offers better adaptability for evolving business needs compared to legacy systems.
Now, let’s dive into the integration process.
Step-by-Step Integration Guide
Prerequisites
- Docker (v27.4.0), for running the Dragonfly server in a containerized environment.
- uv (v0.8.12), an extremely fast Python package and project manager.
- Make sure port 6379 is free (we’ll bind Dragonfly there) and port 8080 is free for Airflow’s API server.
Start the Dragonfly Server in Docker
Firstly, let’s run a Dragonfly server using Docker locally:
$> docker run -d --name dragonfly -p 6379:6379 docker.dragonflydb.io/dragonflydb/dragonfly
For the command above:
- docker run -d runs the container in detached mode (in the background).
- --name dragonfly gives the container a predictable name.
- -p 6379:6379 maps the container’s port 6379 to your machine’s port 6379.
- The URL docker.dragonflydb.io/dragonflydb/dragonfly has the latest version of the Dragonfly Docker image ready to download.
Celery by default uses Redis as a message broker (to queue tasks) and a result backend (to store results). Dragonfly offers a much more performant, Redis-compatible alternative, which can replace Redis in this case with ease. To check that the Dragonfly container is running, run docker ps, which should list the dragonfly container. If port 6379 is already in use, consider stopping the service occupying the port or changing the mapping (e.g., -p 6380:6379) and updating the URLs you’ll see later in the guide.
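To make the broker and result backend roles concrete, here is a minimal standalone Celery sketch. It is purely illustrative and separate from the Airflow setup that follows (the app and task names are made up), and it points both the broker and the result backend at Dragonfly through its Redis-compatible endpoint:
# tasks.py - an illustrative Celery app using Dragonfly as broker and result backend.
from celery import Celery

app = Celery(
    "hello_dragonfly_celery",              # hypothetical app name
    broker="redis://localhost:6379/0",     # Dragonfly speaks the Redis protocol
    backend="redis://localhost:6379/1",
)

@app.task
def add(x, y):
    return x + y
Starting a worker for this app (celery -A tasks worker) would exercise Dragonfly in essentially the same way Airflow’s CeleryExecutor does later in this guide.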
Create and Activate a Python 3.12 Virtual Environment
At Dragonfly, we prioritize efficiency above all. This principle extends to our development tools and workflows, even for a guide like this. I’ve found that traditional Python package and environment managers can be slow (this is 100% Joe’s personal opinion) and only “kind of work”, with broken dependency installations from time to time, which is extremely annoying (this is also 100% Joe’s personal opinion). To solve this, we are adopting uv, an emerging tool in the Python ecosystem that perfectly aligns with our values: being robust and efficient. uv is an extremely fast Python package and project manager written in Rust. It is designed as a single, unified tool that replaces the functionality of pip, pip-tools, pipx, poetry, pyenv, twine, virtualenv, and more, providing a splendid developer experience.
Because uv is a relatively new tool, our first step is to ensure it is installed correctly on your system:
# Install using 'curl' or 'wget'.
$> curl -LsSf https://astral.sh/uv/install.sh | sh
$> wget -qO- https://astral.sh/uv/install.sh | sh
# You can also request a specific version.
$> curl -LsSf https://astral.sh/uv/0.8.12/install.sh | sh
Once installed, we can verify its version:
$> uv --version
#=> uv 0.8.12 (36151df0e 2025-08-18)
Now, we are ready to use uv to manage our integration project and dependencies. Create the project by using uv init:
$> uv init airflow-celery-dragonfly --python 3.12
#=> Initialized project `airflow-celery-dragonfly` at `/Users/joe/workspace/...`
$> cd airflow-celery-dragonfly
By using the commands above, we can see that:
- A project directory named airflow-celery-dragonfly is created by uv.
- We also specify to uv that we want Python v3.12; as per the Airflow documentation, Airflow supports Python 3.9, 3.10, 3.11, and 3.12.
- Note that we also change directory into airflow-celery-dragonfly for all the following steps.
Install Airflow with Celery Support
Next, install Airflow 3.0.3 with the Celery extra and the official constraints file for Python 3.12:
$> uv add "apache-airflow[celery]==3.0.3" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.0.3/constraints-3.12.txt"
#=> Using CPython 3.12.5 interpreter at: /opt/homebrew/opt/python@3.12/bin/python3.12
#=> Creating virtual environment at: .venv
#=> Resolved 145 packages in 3.00s
#=> Prepared 106 packages in 1.25s
#=> Installed 142 packages in 245ms
By running the command above:
- uv automatically creates a virtual environment using the Python version we specified.
- Airflow relies on a specific set of dependency versions. The constraints file prevents version conflicts by pinning compatible packages.
- If you see dependency conflicts (which would be reported at installation time instead of at code runtime, thank you so much uv), double-check that the Airflow version and constraints URL match your Python version.
Choose Airflow Storage Location
Set AIRFLOW_HOME to a folder inside your project so everything stays tidy:
$> mkdir airflow-home
$> export AIRFLOW_HOME="$PWD/airflow-home"
AIRFLOW_HOME is where Airflow keeps airflow.cfg, logs, the metadata SQLite database (in this demo), and your dags/ folder.
Initialize & Migrate the Airflow Database
Airflow v3 uses db migrate instead of the deprecated db init for database initialization. And because we are using uv, commands can be prefixed with uv run to ensure environment consistency:
$> uv run airflow db migrate
This creates or upgrades the metadata database and writes airflow.cfg.
Configure Airflow to use Celery with Dragonfly
You can configure Airflow in two ways. Environment variables offer simplicity and are often preferred on servers where you have full control. Alternatively, the airflow.cfg file can be used.
Option 1: Set Environment Variables
Export these every time you open a new terminal that runs an Airflow component:
$> export AIRFLOW_HOME="$PWD/airflow-home"
$> export AIRFLOW__CORE__EXECUTOR=CeleryExecutor
$> export AIRFLOW__CELERY__BROKER_URL=redis://localhost:6379/0
$> export AIRFLOW__CELERY__RESULT_BACKEND=redis://localhost:6379/1
$> export AIRFLOW__CORE__MP_START_METHOD=fork
$> export AIRFLOW__DAG_PROCESSOR__STALE_BUNDLE_CLEANUP_INTERVAL=0
What each setting means:
- CORE__EXECUTOR=CeleryExecutor: Airflow will queue tasks to Celery instead of running them locally.
- CELERY__BROKER_URL: where Celery sends/reads messages. We use Dragonfly via the Redis protocol at db=0 (see the connectivity check after the configuration options below).
- CELERY__RESULT_BACKEND: where task results are stored. We use Dragonfly at db=1.
- CORE__MP_START_METHOD=fork: prefer fork to avoid overhead on macOS/Linux during task process creation.
- DAG_PROCESSOR__STALE_BUNDLE_CLEANUP_INTERVAL=0: speeds up development by disabling periodic cleanup of serialized DAG bundles.
Option 2: Edit airflow.cfg
Open $AIRFLOW_HOME/airflow.cfg
and set the following keys:
[core]
executor = CeleryExecutor
mp_start_method = fork
[celery]
broker_url = redis://localhost:6379/0
result_backend = redis://localhost:6379/1
[dag_processor]
stale_bundle_cleanup_interval = 0
Note that the environment variables approach overrides values in airflow.cfg.
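Whichever option you choose, it’s worth a quick check that Dragonfly is actually reachable at the broker and result backend URLs. Below is a minimal, optional sketch; it assumes the redis Python client is available in the virtual environment (if it isn’t, uv add redis installs it) and that Dragonfly is listening on localhost:6379:
# check_dragonfly.py - a quick, optional sanity check (not required for the setup).
# Assumes the 'redis' Python client is installed and Dragonfly is on localhost:6379.
import redis

for db in (0, 1):
    client = redis.Redis(host="localhost", port=6379, db=db)
    # PING succeeds because Dragonfly speaks the Redis protocol.
    print(f"db={db} ping ->", client.ping())
Run it with uv run python check_dragonfly.py; both databases should answer True.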
Create your First DAG (Directed Acyclic Graph)
Create the folder if it doesn’t exist and add a Python file representing a DAG, which is a model that encapsulates everything needed to execute a workflow:
$> mkdir -p "$AIRFLOW_HOME/dags"
$> touch "$AIRFLOW_HOME/dags/hello_dragonfly.py"
Paste the following Python code into the file:
from airflow import DAG
from airflow.operators.python import PythonOperator
import pendulum

def hello():
    print("🎉 Hello from a Celery worker via Dragonfly!")

with DAG(
    dag_id="hello_dragonfly",
    start_date=pendulum.yesterday("UTC"),
    schedule=None,
    catchup=False,
):
    PythonOperator(
        task_id="first_task",
        python_callable=hello,
    )
What’s happening here:
- dag_id="hello_dragonfly": the unique DAG name you’ll see in the UI.
- start_date=pendulum.yesterday("UTC"): a safe past date so Airflow considers the DAG schedulable without backfilling.
- schedule=None: you’ll trigger DAG runs manually.
- catchup=False: don’t backfill historical runs.
- A single PythonOperator task that prints a message. This task will actually run on a Celery worker, and we’ll verify that too. (A slightly larger sketch with two dependent tasks follows this list.)
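Since the whole point of a DAG is expressing dependencies between tasks, here is a slightly larger, purely illustrative sketch (the DAG id and task names are made up, and it is not needed for the rest of this guide) that chains two PythonOperator tasks with the >> operator, following the same pattern as hello_dragonfly:
from airflow import DAG
from airflow.operators.python import PythonOperator
import pendulum

def extract():
    print("extracting...")

def load():
    print("loading...")

with DAG(
    dag_id="hello_dragonfly_chained",  # hypothetical DAG id, for illustration only
    start_date=pendulum.yesterday("UTC"),
    schedule=None,
    catchup=False,
):
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # The bit-shift operator declares the dependency: extract runs before load.
    extract_task >> load_task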
Start Airflow Components
You’ll run three processes. Open three terminals (or tabs). In each one, make sure the AIRFLOW_HOME environment variable is set. If you’re not using the airflow.cfg file, make sure the other AIRFLOW_* environment variables mentioned above are also set for each terminal.
Terminal A — Scheduler
The scheduler parses DAGs, decides what needs to run, and sends tasks to Celery via Dragonfly:
# Also set other 'AIRFLOW_*' environment variables above if not using 'airflow.cfg'.
$> export AIRFLOW_HOME="$PWD/airflow-home"
$> uv run airflow scheduler
Terminal B — Celery worker
This starts a Celery worker subscribed to the default queue, which is the queue our tasks use by default:
# Also set other 'AIRFLOW_*' environment variables above if not using 'airflow.cfg'.
$> export AIRFLOW_HOME="$PWD/airflow-home"
$> uv run airflow celery worker --queues default
Terminal C — API server (UI)
This serves the Airflow UI and API at http://localhost:8080:
# Also set other 'AIRFLOW_*' environment variables above if not using 'airflow.cfg'.
$> export AIRFLOW_HOME="$PWD/airflow-home"
$> uv run airflow api-server -p 8080
Accessing the Airflow UI
When you have all three processes running, go to http://localhost:8080 and log in with the following credentials:
- Username: admin
- Password: Look it up in the file generated by the simple auth manager, as shown below.
- Note that we use this simple auth mechanism for local development. For anything serious in production, switch to a proper auth backend.
$> cat "$AIRFLOW_HOME/simple_auth_manager_passwords.json.generated"
In the UI, you should now be able to search for the DAG we’ve created.

Airflow | API Server UI
Sometimes the UI lags a bit during development. If that happens, run the following in a new terminal:
# Also set other 'AIRFLOW_*' environment variables above if not using 'airflow.cfg'.
$> export AIRFLOW_HOME="$PWD/airflow-home"
$> uv run airflow dags reserialize
$> uv run airflow dags list | grep hello_dragonfly
- dags reserialize refreshes the serialized representations that the web UI reads.
- dags list helps confirm Airflow sees your DAG file.
- If you still don’t see it, check the scheduler terminal for parsing errors.
Trigger the DAG and Verify Results
In the UI, choose the hello_dragonfly workflow and click Trigger. You should see the run move to the success state. In the meantime, you can monitor the scheduler output, where the state=success and executor_state=success logs confirm the scheduler saw the task complete successfully.
Dag run start:2025-08-21 05:22:36.842947+00:00 end:2025-08-21 05:22:38.110071+00:00
INFO - DagRun Finished: dag_id=hello_dragonfly, logical_date=2025-08-21 05:22:33.863000+00:00, run_id=manual__2025-08-21T05:22:36.476816+00:00, run_start_date=2025-08-21 05:22:36.842947+00:00, run_end_date=2025-08-21 05:22:38.110071+00:00, run_duration=1.267124, state=success, run_type=manual, data_interval_start=2025-08-21 05:22:33.863000+00:00, data_interval_end=2025-08-21 05:22:33.863000+00:00,
INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='hello_dragonfly', task_id='first_task', run_id='manual__2025-08-21T05:22:36.476816+00:00', try_number=1, map_index=-1)
INFO - TaskInstance Finished: dag_id=hello_dragonfly, task_id=first_task, run_id=manual__2025-08-21T05:22:36.476816+00:00, map_index=-1, run_start_date=2025-08-21 05:22:37.151890+00:00, run_end_date=2025-08-21 05:22:37.552758+00:00, run_duration=0.401, state=success, executor=CeleryExecutor(parallelism=32), executor_state=success, try_number=1, max_tries=0, pool=default_pool, queue=default, priority_weight=1, operator=PythonOperator, queued_dttm=2025-08-21 05:22:36.879924+00:00, scheduled_dttm=2025-08-21 05:22:36.868834+00:00,queued_by_job_id=2, pid=79194
Similarly, Celery worker output can be examined:
[info ] [aed10f27-bfb8-42ed-94f8-3bc1119c7a06] Executing workload in Celery: token='eyJ***' ti=TaskInstance(id=UUID('0198cb14-3756-7d25-a51b-cc09e190f069'), task_id='first_task', dag_id='hello_dragonfly', run_id='manual__2025-08-21T05:22:36.476816+00:00', try_number=1, map_index=-1, pool_slots=1, queue='default', priority_weight=1, executor_config=None, parent_context_carrier={}, context_carrier={}, queued_dttm=None) dag_rel_path=PurePosixPath('hello_dragonfly.py') bundle_info=BundleInfo(name='dags-folder', version=None) log_path='dag_id=hello_dragonfly/run_id=manual__2025-08-21T05:22:36.476816+00:00/task_id=first_task/attempt=1.log' type='ExecuteTask' [airflow.providers.celery.executors.celery_executor_utils]
[info ] Secrets backends loaded for worker [supervisor] backend_classes=['EnvironmentVariablesBackend'] count=1
[info ] Task finished [supervisor] duration=0.41565824998542666 exit_code=0 final_state=success
The log line Task finished ... final_state=success is the clearest sign that our Celery worker ran the task and reported back to Dragonfly without errors. It’s important to note that the DAG we created for this guide is intentionally simple. Its purpose is to serve as a minimal, functional example to validate that the entire integrated system (from the Airflow scheduler and API server to the Celery worker and Dragonfly backend) is communicating correctly. Writing sophisticated DAGs with complex dependencies, custom operators, and sensors is a topic for another time. In this guide, our primary focus was on the foundational setup: configuring the components to work together seamlessly.
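If you want to see Dragonfly’s involvement firsthand, you can also peek at the keys Celery leaves behind in the broker and result backend databases. This optional sketch makes the same assumptions as the earlier connectivity check (redis client installed, Dragonfly on localhost:6379); the exact key names are Celery implementation details and may vary between versions:
# peek_dragonfly.py - optional: inspect what Celery has written into Dragonfly.
# Assumes the 'redis' client is installed and Dragonfly runs on localhost:6379.
import redis

broker = redis.Redis(host="localhost", port=6379, db=0)   # Celery broker (db=0)
backend = redis.Redis(host="localhost", port=6379, db=1)  # result backend (db=1)

# Key names here are Celery internals and may differ between versions.
print("broker keys:", broker.keys("*")[:10])
print("result backend keys:", backend.keys("*")[:10])
Seeing Celery-related keys in both databases confirms that task messages and results are flowing through Dragonfly rather than a leftover Redis instance.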
Conclusion
While this post focused on setup rather than benchmarking, the performance that makes Dragonfly such a powerful fit for this integration is well documented. In previous benchmarks, we’ve demonstrated Dragonfly achieving 25x the throughput of Redis, and its advantages extend directly to queue-based workloads. For a deeper dive into the performance metrics and architectural advancements, you can explore our previous analyses:
- Benchmarking Dragonfly vs. Redis
- Dragonfly Reaches 6.43 Million QPS on AWS Graviton3 Instance
- BullMQ Performance Optimization on Dragonfly
- Sidekiq Performance Optimization on Dragonfly
This proven performance means that the stack you’ve just built isn’t just functional. It’s a high-performance foundation ready to handle the most demanding data orchestration challenges you throw at it.
By integrating Apache Airflow with Celery and Dragonfly, you leverage a modern, high-performance architecture that outpaces traditional Redis setups. Dragonfly’s multi-threaded capabilities ensure superior scalability and performance, while its resource efficiency leads to significant cost savings. This guide provides the technical roadmap to unlock these advantages, positioning Dragonfly as a disruptive force in legacy technology landscapes. As you deploy and test your workflows, you’ll notice the tangible benefits Dragonfly brings to your infrastructure.