Integrating Apache Airflow with Celery and Dragonfly
Learn how to set up Apache Airflow and Celery with the high-performance Dragonfly data store for superior workflow orchestration and scalability.

What is Apache Airflow?
Apache Airflow has emerged as a powerful tool in the world of workflow automation and orchestration. Its ability to programmatically author, schedule, and monitor workflows makes it a favorite among engineers and data scientists. Celery, a distributed task queue for Python often using Redis as a backing storage, can be used in conjunction with Airflow for distributed task management. However, Dragonfly offers a compelling alternative. As a disruptive, multi-threaded, in-memory data store, Dragonfly delivers superior performance, flexibility, scalability, and cost savings compared to Redis. Previously, we have demonstrated how to run Celery on Dragonfly with a practical application example. In this guide, we’ll take a step further and walk through integrating Apache Airflow, Celery, and Dragonfly.
Why Choose Dragonfly Over Redis?
Before diving into the technical steps, let’s explore why Dragonfly is a better choice:
- Performance: Dragonfly outperforms Redis in throughput, delivering up to 25x the operations per second. This is crucial for high-traffic applications, ensuring efficient task management and reduced tail latency.
- Scalability: Dragonfly’s multi-threaded architecture allows it to scale with CPU cores, unlike Redis’s single-threaded nature for data operations. This architectural difference is essential for handling large-scale task distributions without bottlenecks. Going beyond running on a single server, Dragonfly Swarm also offers distributed multi-shard clustering.
- Cost Efficiency: By offering higher performance with fewer resources, Dragonfly reduces infrastructure costs, providing a more economical solution for large-scale applications.
- Flexibility: Built to integrate seamlessly into modern architectures, Dragonfly offers better adaptability for evolving business needs compared to legacy systems.
Now, let’s dive into the integration process.
Step-by-Step Integration Guide
Prerequisites
- Docker (v27.4.0), for running the Dragonfly server in a containerized environment.
- uv (v0.8.12), an extremely fast Python package and project manager.
- Make sure port 6379 is free (we’ll bind Dragonfly there) and port 8080 is free for Airflow’s API server.
Start the Dragonfly Server in Docker
Firstly, let’s run a Dragonfly server using Docker locally:
$> docker run -d --name dragonfly -p 6379:6379 docker.dragonflydb.io/dragonflydb/dragonfly
For the command above:
- docker run -d runs the container in detached mode (in the background).
- --name dragonfly gives the container a predictable name.
- -p 6379:6379 maps the container’s port 6379 to your machine’s port 6379.
- The URL docker.dragonflydb.io/dragonflydb/dragonfly has the latest version of the Dragonfly Docker image ready to download.
Celery by default uses Redis as a message broker (to queue tasks) and a result backend (to store results). Dragonfly offers a much more performant, Redis-compatible alternative, which can replace Redis in this case with ease. To check that the Dragonfly container is running, run docker ps, which should list the dragonfly container. If port 6379 is already in use, consider stopping the service occupying the port or changing the mapping (e.g., -p 6380:6379) and updating the URLs you’ll see later in the guide.
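To make the broker and result backend roles concrete, here is a minimal standalone Celery sketch. It is purely illustrative and separate from the Airflow setup that follows (the app and task names are made up), and it points both the broker and the result backend at Dragonfly through its Redis-compatible endpoint:
# tasks.py - an illustrative Celery app using Dragonfly as broker and result backend.
from celery import Celery

app = Celery(
    "hello_dragonfly_celery",              # hypothetical app name
    broker="redis://localhost:6379/0",     # Dragonfly speaks the Redis protocol
    backend="redis://localhost:6379/1",
)

@app.task
def add(x, y):
    return x + y
Starting a worker for this app (celery -A tasks worker) would exercise Dragonfly in essentially the same way Airflow’s CeleryExecutor does later in this guide.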
Create and Activate a Python 3.12 Virtual Environment
At Dragonfly, we prioritize efficiency above all. This principle extends to our development tools and workflows, even for a guide like this. I’ve found that traditional Python package and environment managers can be slow (this is 100% Joe’s personal opinion) and only “kind of work”, with broken dependency installations from time to time, which is extremely annoying (this is also 100% Joe’s personal opinion). To solve this, we are adopting uv, an emerging tool in the Python ecosystem that perfectly aligns with our values: being robust and efficient. uv is an extremely fast Python package and project manager written in Rust. It is designed as a single, unified tool that replaces the functionality of pip, pip-tools, pipx, poetry, pyenv, twine, virtualenv, and more, providing a splendid developer experience.
Because uv is a relatively new tool, our first step is to ensure it is installed correctly on your system:
# Install using 'curl' or 'wget'.
$> curl -LsSf https://astral.sh/uv/install.sh | sh
$> wget -qO- https://astral.sh/uv/install.sh | sh
# You can also request a specific version.
$> curl -LsSf https://astral.sh/uv/0.8.12/install.sh | sh
Once installed, we can verify its version:
$> uv --version
#=> uv 0.8.12 (36151df0e 2025-08-18)
Now, we are ready to use uv to manage our integration project and dependencies. Create the project by using uv init:
$> uv init airflow-celery-dragonfly --python 3.12
#=> Initialized project `airflow-celery-dragonfly` at `/Users/joe/workspace/...`
$> cd airflow-celery-dragonfly
By using the commands above, we can see that:
- A project directory named airflow-celery-dragonfly is created by uv.
- We also specify to uv that we want Python v3.12; as per the Airflow documentation, Airflow supports Python 3.9, 3.10, 3.11, and 3.12.
- Note that we also change directory into airflow-celery-dragonfly for all the following steps.
Install Airflow with Celery Support
Next, install Airflow 3.0.3 with the Celery extra and the official constraints file for Python 3.12:
$> uv add "apache-airflow[celery]==3.0.3" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.0.3/constraints-3.12.txt"
#=> Using CPython 3.12.5 interpreter at: /opt/homebrew/opt/python@3.12/bin/python3.12
#=> Creating virtual environment at: .venv
#=> Resolved 145 packages in 3.00s
#=> Prepared 106 packages in 1.25s
#=> Installed 142 packages in 245ms
By running the command above:
- uv automatically creates a virtual environment using the Python version we specified.
- Airflow relies on a specific set of dependency versions. The constraints file prevents version conflicts by pinning compatible packages.
- If you see dependency conflicts (which would be reported at installation time instead of at code runtime, thank you so much uv), double-check that the Airflow version and constraints URL match your Python version.
Choose Airflow Storage Location
Set AIRFLOW_HOME to a folder inside your project so everything stays tidy:
$> mkdir airflow-home
$> export AIRFLOW_HOME="$PWD/airflow-home"
AIRFLOW_HOME is where Airflow keeps airflow.cfg, logs, the metadata SQLite database (in this demo), and your dags/ folder.
Initialize & Migrate the Airflow Database
Airflow v3 uses db migrate instead of the deprecated db init for database initialization. And because we are using uv, commands can be prefixed with uv run to ensure environment consistency:
$> uv run airflow db migrate
This creates or upgrades the metadata database and writes airflow.cfg.
Configure Airflow to use Celery with Dragonfly
You can configure Airflow in two ways. Environment variables offer simplicity and are often preferred on servers where you have full control. Alternatively, the airflow.cfg file can be used.
Option 1: Set Environment Variables
Export these every time you open a new terminal that runs an Airflow component:
$> export AIRFLOW_HOME="$PWD/airflow-home"
$> export AIRFLOW__CORE__EXECUTOR=CeleryExecutor
$> export AIRFLOW__CELERY__BROKER_URL=redis://localhost:6379/0
$> export AIRFLOW__CELERY__RESULT_BACKEND=redis://localhost:6379/1
$> export AIRFLOW__CORE__MP_START_METHOD=fork
$> export AIRFLOW__DAG_PROCESSOR__STALE_BUNDLE_CLEANUP_INTERVAL=0
What each setting means:
- CORE__EXECUTOR=CeleryExecutor: Airflow will queue tasks to Celery instead of running them locally.
- CELERY__BROKER_URL: where Celery sends/reads messages. We use Dragonfly via the Redis protocol at db=0 (see the connectivity check after the configuration options below).
- CELERY__RESULT_BACKEND: where task results are stored. We use Dragonfly at db=1.
- CORE__MP_START_METHOD=fork: prefer fork to avoid overhead on macOS/Linux during task process creation.
- DAG_PROCESSOR__STALE_BUNDLE_CLEANUP_INTERVAL=0: speeds up development by disabling periodic cleanup of serialized DAG bundles.
Option 2: Edit airflow.cfg
Open $AIRFLOW_HOME/airflow.cfg
and set the following keys:
[core]
executor = CeleryExecutor
mp_start_method = fork
[celery]
broker_url = redis://localhost:6379/0
result_backend = redis://localhost:6379/1
[dag_processor]
stale_bundle_cleanup_interval = 0
Note that the environment variables approach overrides values in airflow.cfg.
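Whichever option you choose, it’s worth a quick check that Dragonfly is actually reachable at the broker and result backend URLs. Below is a minimal, optional sketch; it assumes the redis Python client is available in the virtual environment (if it isn’t, uv add redis installs it) and that Dragonfly is listening on localhost:6379:
# check_dragonfly.py - a quick, optional sanity check (not required for the setup).
# Assumes the 'redis' Python client is installed and Dragonfly is on localhost:6379.
import redis

for db in (0, 1):
    client = redis.Redis(host="localhost", port=6379, db=db)
    # PING succeeds because Dragonfly speaks the Redis protocol.
    print(f"db={db} ping ->", client.ping())
Run it with uv run python check_dragonfly.py; both databases should answer True.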
Create your First DAG (Directed Acyclic Graph)
Create the folder if it doesn’t exist and add a Python file representing a DAG, which is a model that encapsulates everything needed to execute a workflow:
$> mkdir -p "$AIRFLOW_HOME/dags"
$> touch "$AIRFLOW_HOME/dags/hello_dragonfly.py"
Paste the following Python code into the file:
from airflow import DAG
from airflow.operators.python import PythonOperator
import pendulum

def hello():
    print("🎉 Hello from a Celery worker via Dragonfly!")

with DAG(
    dag_id="hello_dragonfly",
    start_date=pendulum.yesterday("UTC"),
    schedule=None,
    catchup=False,
):
    PythonOperator(
        task_id="first_task",
        python_callable=hello,
    )
What’s happening here:
- dag_id="hello_dragonfly": the unique DAG name you’ll see in the UI.
- start_date=pendulum.yesterday("UTC"): a safe past date so Airflow considers the DAG schedulable without backfilling.
- schedule=None: you’ll trigger DAG runs manually.
- catchup=False: don’t backfill historical runs.
- A single PythonOperator task that prints a message. This task will actually run on a Celery worker, and we’ll verify that too. (A slightly larger sketch with two dependent tasks follows this list.)
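Since the whole point of a DAG is expressing dependencies between tasks, here is a slightly larger, purely illustrative sketch (the DAG id and task names are made up, and it is not needed for the rest of this guide) that chains two PythonOperator tasks with the >> operator, following the same pattern as hello_dragonfly:
from airflow import DAG
from airflow.operators.python import PythonOperator
import pendulum

def extract():
    print("extracting...")

def load():
    print("loading...")

with DAG(
    dag_id="hello_dragonfly_chained",  # hypothetical DAG id, for illustration only
    start_date=pendulum.yesterday("UTC"),
    schedule=None,
    catchup=False,
):
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # The bit-shift operator declares the dependency: extract runs before load.
    extract_task >> load_task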
Start Airflow Components
You’ll run three processes. Open three terminals (or tabs). In each one, make sure the AIRFLOW_HOME environment variable is set. If you’re not using the airflow.cfg file, make sure the other AIRFLOW_* environment variables mentioned above are also set for each terminal.
Terminal A — Scheduler
The scheduler parses DAGs, decides what needs to run, and sends tasks to Celery via Dragonfly:
# Also set other 'AIRFLOW_*' environment variables above if not using 'airflow.cfg'.
$> export AIRFLOW_HOME="$PWD/airflow-home"
$> uv run airflow scheduler
Terminal B — Celery worker
This starts a Celery worker subscribed to the default queue, which is the queue our tasks use by default:
# Also set other 'AIRFLOW_*' environment variables above if not using 'airflow.cfg'.
$> export AIRFLOW_HOME="$PWD/airflow-home"
$> uv run airflow celery worker --queues default
Terminal C — API server (UI)
This serves the Airflow UI and API at http://localhost:8080:
# Also set other 'AIRFLOW_*' environment variables above if not using 'airflow.cfg'.
$> export AIRFLOW_HOME="$PWD/airflow-home"
$> uv run airflow api-server -p 8080
Accessing the Airflow UI
When you have all three processes running, go to http://localhost:8080 and log in with the following credentials:
- Username: admin
- Password: Look it up in the file generated by the simple auth manager, as shown below.
- Note that we use this simple auth mechanism for local development. For anything serious in production, switch to a proper auth backend.
$> cat "$AIRFLOW_HOME/simple_auth_manager_passwords.json.generated"
In the UI, you should now be able to search for the DAG we’ve created.

Airflow | API Server UI
Sometimes the UI lags a bit during development. If that happens, run the following in a new terminal:
# Also set other 'AIRFLOW_*' environment variables above if not using 'airflow.cfg'.
$> export AIRFLOW_HOME="$PWD/airflow-home"
$> uv run airflow dags reserialize
$> uv run airflow dags list | grep hello_dragonfly
- dags reserialize refreshes the serialized representations that the web UI reads.
- dags list helps confirm Airflow sees your DAG file.
- If you still don’t see it, check the scheduler terminal for parsing errors.
Trigger the DAG and Verify Results
In the UI, choose the hello_dragonfly workflow and click Trigger. You should see the run move to the success state. In the meantime, you can monitor the scheduler output, where the state=success and executor_state=success logs confirm the scheduler saw the task complete successfully.
Dag run start:2025-08-21 05:22:36.842947+00:00 end:2025-08-21 05:22:38.110071+00:00
INFO - DagRun Finished: dag_id=hello_dragonfly, logical_date=2025-08-21 05:22:33.863000+00:00, run_id=manual__2025-08-21T05:22:36.476816+00:00, run_start_date=2025-08-21 05:22:36.842947+00:00, run_end_date=2025-08-21 05:22:38.110071+00:00, run_duration=1.267124, state=success, run_type=manual, data_interval_start=2025-08-21 05:22:33.863000+00:00, data_interval_end=2025-08-21 05:22:33.863000+00:00,
INFO - Received executor event with state success for task instance TaskInstanceKey(dag_id='hello_dragonfly', task_id='first_task', run_id='manual__2025-08-21T05:22:36.476816+00:00', try_number=1, map_index=-1)
INFO - TaskInstance Finished: dag_id=hello_dragonfly, task_id=first_task, run_id=manual__2025-08-21T05:22:36.476816+00:00, map_index=-1, run_start_date=2025-08-21 05:22:37.151890+00:00, run_end_date=2025-08-21 05:22:37.552758+00:00, run_duration=0.401, state=success, executor=CeleryExecutor(parallelism=32), executor_state=success, try_number=1, max_tries=0, pool=default_pool, queue=default, priority_weight=1, operator=PythonOperator, queued_dttm=2025-08-21 05:22:36.879924+00:00, scheduled_dttm=2025-08-21 05:22:36.868834+00:00,queued_by_job_id=2, pid=79194
Similarly, Celery worker output can be examined:
[info ] [aed10f27-bfb8-42ed-94f8-3bc1119c7a06] Executing workload in Celery: token='eyJ***' ti=TaskInstance(id=UUID('0198cb14-3756-7d25-a51b-cc09e190f069'), task_id='first_task', dag_id='hello_dragonfly', run_id='manual__2025-08-21T05:22:36.476816+00:00', try_number=1, map_index=-1, pool_slots=1, queue='default', priority_weight=1, executor_config=None, parent_context_carrier={}, context_carrier={}, queued_dttm=None) dag_rel_path=PurePosixPath('hello_dragonfly.py') bundle_info=BundleInfo(name='dags-folder', version=None) log_path='dag_id=hello_dragonfly/run_id=manual__2025-08-21T05:22:36.476816+00:00/task_id=first_task/attempt=1.log' type='ExecuteTask' [airflow.providers.celery.executors.celery_executor_utils]
[info ] Secrets backends loaded for worker [supervisor] backend_classes=['EnvironmentVariablesBackend'] count=1
[info ] Task finished [supervisor] duration=0.41565824998542666 exit_code=0 final_state=success
The log line Task finished ... final_state=success is the clearest sign that our Celery worker ran the task and reported back to Dragonfly without errors. It’s important to note that the DAG we created for this guide is intentionally simple. Its purpose is to serve as a minimal, functional example to validate that the entire integrated system (from the Airflow scheduler and API server to the Celery worker and Dragonfly backend) is communicating correctly. Writing sophisticated DAGs with complex dependencies, custom operators, and sensors is a topic for another time. In this guide, our primary focus was on the foundational setup: configuring the components to work together seamlessly.
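If you want to see Dragonfly’s involvement firsthand, you can also peek at the keys Celery leaves behind in the broker and result backend databases. This optional sketch makes the same assumptions as the earlier connectivity check (redis client installed, Dragonfly on localhost:6379); the exact key names are Celery implementation details and may vary between versions:
# peek_dragonfly.py - optional: inspect what Celery has written into Dragonfly.
# Assumes the 'redis' client is installed and Dragonfly runs on localhost:6379.
import redis

broker = redis.Redis(host="localhost", port=6379, db=0)   # Celery broker (db=0)
backend = redis.Redis(host="localhost", port=6379, db=1)  # result backend (db=1)

# Key names here are Celery internals and may differ between versions.
print("broker keys:", broker.keys("*")[:10])
print("result backend keys:", backend.keys("*")[:10])
Seeing Celery-related keys in both databases confirms that task messages and results are flowing through Dragonfly rather than a leftover Redis instance.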
Conclusion
While this post focused on setup rather than benchmarking, the performance that makes Dragonfly such a powerful fit for this integration is well documented. In previous benchmarks, we’ve demonstrated Dragonfly achieving 25x the throughput of Redis, and its advantages extend directly to queue-based workloads. For a deeper dive into the performance metrics and architectural advancements, you can explore our previous analyses:
- Benchmarking Dragonfly vs. Redis
- Dragonfly Reaches 6.43 Million QPS on AWS Graviton3 Instance
- BullMQ Performance Optimization on Dragonfly
- Sidekiq Performance Optimization on Dragonfly
This proven performance means that the stack you’ve just built isn’t just functional. It’s a high-performance foundation ready to handle the most demanding data orchestration challenges you throw at it.
By integrating Apache Airflow with Celery and Dragonfly, you leverage a modern, high-performance architecture that outpaces traditional Redis setups. Dragonfly’s multi-threaded capabilities ensure superior scalability and performance, while its resource efficiency leads to significant cost savings. This guide provides the technical roadmap to unlock these advantages, positioning Dragonfly as a disruptive force in legacy technology landscapes. As you deploy and test your workflows, you’ll notice the tangible benefits Dragonfly brings to your infrastructure.