Nexus Data Pipeline
An asynchronous data aggregation pipeline built with Celery and Redis to process background events.
Overview
Nexus is a background data pipeline designed to aggregate analytics from simulated third-party APIs into a unified dashboard. The core requirement was to process incoming webhook events asynchronously so that the main web application threads remain unblocked and responsive.
Architecture
I implemented an event-driven architecture using Django, Celery, and Redis. When the Django API receives a webhook, it immediately enqueues a Celery task, with Redis acting as the message broker, and returns a 202 Accepted response without processing the payload itself. A separate pool of Celery workers consumes these tasks in the background, processes each payload, and writes the aggregated data to PostgreSQL.
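The accept-then-process flow described above can be sketched with the standard library alone. This is an illustration of the pattern, not the project's code: queue.Queue stands in for the Redis-backed queue, threads stand in for Celery workers, a list stands in for the PostgreSQL table, and the names handle_webhook, worker, and run_demo are hypothetical.

```python
import queue
import threading

task_queue = queue.Queue()      # stand-in for the Redis-backed task queue
results = []                    # stand-in for the PostgreSQL table
results_lock = threading.Lock()


def handle_webhook(payload):
    """Enqueue the payload and acknowledge at once, without processing it."""
    task_queue.put(payload)
    return 202  # HTTP 202 Accepted


def worker():
    """Consume payloads until a None sentinel arrives."""
    while True:
        payload = task_queue.get()
        if payload is None:
            task_queue.task_done()
            break
        row = {"source": payload["source"], "total": sum(payload["values"])}
        with results_lock:
            results.append(row)
        task_queue.task_done()


def run_demo():
    """Start two workers, send three webhooks, and wait for them to finish."""
    workers = [threading.Thread(target=worker) for _ in range(2)]
    for w in workers:
        w.start()
    statuses = [
        handle_webhook({"source": f"api-{i}", "values": [i, i + 1]})
        for i in range(3)
    ]
    task_queue.join()           # block until every payload is processed
    for _ in workers:
        task_queue.put(None)    # one shutdown sentinel per worker
    for w in workers:
        w.join()
    return statuses, sorted(results, key=lambda r: r["source"])
```

The point of the design is that handle_webhook does no work beyond the enqueue, so the caller gets its 202 regardless of how long processing takes.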
Challenges
API Rate Limiting
A significant challenge during development was handling simulated rate limits from external APIs. If an external service rejected a polling request with a 429 Too Many Requests error, the Celery task would raise an exception and be marked as failed, and its work would be lost unless the task was retried.
Solution: I used the autoretry_for option of Celery's task decorator together with exponential backoff (the retry_backoff option). If a rate limit error occurs, the task is re-queued and retried with increasing delays (e.g., 1 minute, then 2 minutes), so the task is not lost and the external service is not hammered with immediate retries.
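To make the retry schedule concrete, here is a minimal standard-library sketch of exponential backoff on a rate limit error. It is not Celery itself: RateLimitError, backoff_delay, and run_with_backoff are hypothetical names, the 60-second base and 600-second cap are assumed values, and the sleep call stands in for Celery re-queuing the task with a countdown. In Celery, the comparable behaviour comes from passing autoretry_for, retry_backoff, and max_retries to the task decorator.

```python
import time


class RateLimitError(Exception):
    """Raised when the external API answers 429 Too Many Requests."""


def backoff_delay(retries, base_seconds=60, max_seconds=600):
    """Delay before the next attempt: base * 2**retries, capped at max_seconds."""
    return min(max_seconds, base_seconds * 2 ** retries)


def run_with_backoff(task, max_retries=5, sleep=time.sleep):
    """Call task(); on RateLimitError wait and retry, up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            sleep(backoff_delay(attempt))
```

With these assumed defaults the waits are 60 s, 120 s, 240 s, 480 s, and then 600 s once the cap applies. Production setups usually add random jitter to the delay so that many failed tasks do not all retry at the same moment.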
Lessons Learned
I gained a practical understanding of message brokers and asynchronous task execution. Designing this pipeline taught me the importance of writing idempotent background tasks: if a worker crashes and a task is retried, the retry must not create duplicate records in the database.