Nexus Data Pipeline

An asynchronous data aggregation pipeline built with Celery and Redis to process background events.

Python, Django, Celery, Redis, PostgreSQL

Overview

Nexus is a background data pipeline designed to aggregate analytics from simulated third-party APIs into a unified dashboard. The core requirement was to process incoming webhook events asynchronously so that the main web application threads remain unblocked and responsive.

Architecture

I implemented an event-driven architecture using Django, Celery, and Redis. When the Django API receives a webhook, it immediately enqueues a task on Redis, which acts as the message broker, and returns a 202 Accepted response without waiting for the payload to be processed. A separate pool of Celery workers consumes these tasks in the background, processes each payload, and writes the aggregated data to PostgreSQL.

graph LR
    Webhook[Incoming Webhook] -->|POST /webhook| API[Django API]
    API -->|Delay Task| Broker[(Redis)]
    API -->|202 Accepted| Webhook
    Broker -->|Consume| Worker[Celery Worker Pool]
    Worker -->|Process & Aggregate| DB[(PostgreSQL)]
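
The enqueue-then-acknowledge flow can be sketched with the standard library alone. This is a simplified illustration, not the project's code: a `queue.Queue` stands in for the Redis broker, a thread stands in for a Celery worker, a dict stands in for the PostgreSQL table, and the names `handle_webhook`, `worker`, and `aggregates` are hypothetical. In the real pipeline, the `put` call would correspond to calling `.delay(...)` on a Celery task.

```python
import queue
import threading

# Stand-ins (assumptions for illustration only): a Queue plays the Redis
# broker and a dict plays the PostgreSQL table of aggregated totals.
task_queue = queue.Queue()
aggregates = {}


def handle_webhook(payload):
    """Mimic the API view: enqueue the work and answer right away."""
    task_queue.put(payload)
    return 202  # Accepted: the payload is processed later by a worker


def worker():
    """Mimic one worker: consume tasks until a None sentinel arrives."""
    while True:
        payload = task_queue.get()
        try:
            if payload is None:
                return
            source = payload["source"]
            aggregates[source] = aggregates.get(source, 0) + payload["value"]
        finally:
            # Mark the item as handled so task_queue.join() can return.
            task_queue.task_done()
```

To try it, start `worker` in a `threading.Thread`, call `handle_webhook` a few times, and wait on `task_queue.join()` before reading `aggregates`. The point of the design is that `handle_webhook` never waits for aggregation to finish.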

Challenges

API Rate Limiting

A significant challenge during development was handling simulated rate limits from external APIs. If an external service rejected a polling request with a 429 Too Many Requests error, the task would raise an exception and fail, and with no retry policy in place the event could be lost.

Solution: I used the autoretry_for option of Celery's task decorator, combined with an exponential backoff strategy. When a rate limit error occurs, the task is rescheduled on the queue and retried with increasing delays (e.g., 1 minute, then 2 minutes), so the event is not lost and the external service is not hammered.
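
The backoff schedule can be illustrated without Celery. The sketch below is a plain-Python approximation under stated assumptions: `RateLimitError`, `backoff_delay`, and `run_with_retries` are hypothetical names, the 60-second base and 600-second cap are example values, and random jitter is left out. In Celery itself, similar behaviour is normally configured through task options such as `autoretry_for`, `retry_backoff`, `retry_backoff_max`, and `max_retries`, and a retry is rescheduled on the queue rather than sleeping inside the worker as this loop does.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised when the external API answers 429."""


def backoff_delay(retries, factor=60, maximum=600):
    """Delay in seconds before retry number `retries` (0-based), capped."""
    return min(maximum, factor * (2 ** retries))


def run_with_retries(task, max_retries=5, factor=60, maximum=600,
                     sleep=time.sleep):
    """Call `task`; on RateLimitError wait with exponential backoff and retry."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            sleep(backoff_delay(attempt, factor, maximum))
```

With the example values, the first retry waits 60 seconds, the second 120 seconds, and later retries keep doubling until they reach the 600-second cap. Passing a custom `sleep` callable makes the schedule easy to inspect without actually waiting.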

Lessons Learned

I gained a practical understanding of message brokers and asynchronous task execution. Designing this pipeline taught me the importance of writing idempotent background tasks: if a worker crashes and a task is retried, the retry must not create duplicate records in the database.
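
One common way to make such a task idempotent is to record each event's ID under a uniqueness constraint and skip any event that has already been seen. The sketch below illustrates that idea and is not the project's actual schema: it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the table names, column names, and the `process_event` function are hypothetical. The `ON CONFLICT` clauses are written in a form that PostgreSQL also accepts.

```python
import sqlite3


def make_db():
    """Create an in-memory database with hypothetical example tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE source_totals ("
        "source TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    return conn


def process_event(conn, event_id, source, value):
    """Apply an event at most once; return True only if it was applied."""
    with conn:  # one transaction: commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO processed_events (event_id) VALUES (?) "
            "ON CONFLICT (event_id) DO NOTHING",
            (event_id,),
        )
        if cur.rowcount == 0:
            return False  # already processed: a retry becomes a no-op
        conn.execute(
            "INSERT INTO source_totals (source, amount) VALUES (?, ?) "
            "ON CONFLICT (source) DO UPDATE "
            "SET amount = source_totals.amount + excluded.amount",
            (source, value),
        )
        return True
```

Because the ID insert and the aggregate update share one transaction, a crash between them rolls both back, so a retried task either applies the event fully or finds it already recorded.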