ADR-009: Message Queue — Dual-Queue Strategy
Status: Proposed
Date: 2026-05-17
Context
BayanCore requires message queuing for two distinct purposes:
- ERPNext background jobs: Task execution for workflows, email sending, report generation, scheduled jobs
- Cross-service events: Event-driven communication between BayanCore services (Guardian, AI Engine, Audit Logger) and ERPNext
ERPNext already uses Redis + Celery for its internal task queue. We need to determine whether to extend this pattern for cross-service communication or introduce a separate event streaming solution.
Decision
A dual-queue strategy is adopted:
Queue 1: Redis + Celery (ERPNext Internal Jobs)
- Used for: ERPNext background tasks, workflow execution, email queues, report generation, scheduled jobs
- Technology: Redis (OCI Cache) + Celery worker processes
- Pattern: Task queue with retries, dead-letter handling, and priority queues
- Scope: Within ERPNext application boundary
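The retry and dead-letter pattern named above can be sketched without any broker. This is a minimal in-memory illustration under stated assumptions, not ERPNext's or Celery's actual implementation: `run_with_retry`, `backoff_delays`, `MAX_RETRIES`, and the `dead_letters` list are hypothetical names chosen for this example.

```python
from typing import Any, Callable, Optional

MAX_RETRIES = 3  # assumption: illustrative limit, not an ERPNext default


def backoff_delays(max_retries: int, base_seconds: float = 1.0) -> list[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base_seconds * (2 ** attempt) for attempt in range(max_retries)]


def run_with_retry(
    task: Callable[[Any], Any],
    payload: Any,
    dead_letters: list,
    max_retries: int = MAX_RETRIES,
) -> Optional[Any]:
    """Run task(payload); once all attempts fail, park the payload in dead_letters."""
    last_error: Optional[Exception] = None
    # One initial attempt plus max_retries retries.
    for _ in range(max_retries + 1):
        try:
            return task(payload)
        except Exception as exc:  # a real worker would catch narrower errors
            last_error = exc
            # A real worker would wait here according to backoff_delays().
    dead_letters.append({"payload": payload, "error": str(last_error)})
    return None
```

A job that keeps failing ends up in `dead_letters` for later inspection instead of being retried forever or silently dropped.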
Queue 2: OCI Streaming (Kafka-compatible) (Cross-Service Events)
- Used for: Cross-service event publishing and consumption, audit trail events, compliance events, AI indexing triggers
- Technology: OCI Streaming (fully managed, Kafka-compatible streaming service)
- Pattern: Event sourcing, pub/sub, event replay for compliance
- Scope: Between BayanCore services (Guardian, AI Engine, Audit Logger, ERPNext)
Event Topics:
- invoice.created: Triggers Guardian ZATCA validation
- invoice.cleared: Triggers PDF generation and delivery
- workflow.completed: Triggers audit logging and AI indexing
- user.action: Triggers audit trail entry
- compliance.violation: Triggers alert and remediation workflow
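One possible shape for the events published on these topics is sketched below. The topic names come from the list above; everything else (the `Event` class, the `key`, `event_id`, and `occurred_at` fields, and JSON as the wire format) is an assumption made for illustration, not a schema defined by this ADR.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

# Topic names from this ADR, mapped to what each one triggers.
TOPICS = {
    "invoice.created": "Guardian ZATCA validation",
    "invoice.cleared": "PDF generation and delivery",
    "workflow.completed": "audit logging and AI indexing",
    "user.action": "audit trail entry",
    "compliance.violation": "alert and remediation workflow",
}


@dataclass(frozen=True)
class Event:
    """Hypothetical event envelope; field names are illustrative only."""

    topic: str
    key: str  # assumption: entity ID (e.g. an invoice number) used as the message key
    payload: dict
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        # Reject topics that are not part of the agreed list.
        if self.topic not in TOPICS:
            raise ValueError(f"unknown topic: {self.topic}")

    def to_bytes(self) -> bytes:
        """Serialize the envelope to UTF-8 JSON for publishing."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")
```

Validating the topic at construction time keeps producers from publishing to names that no consumer listens on.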
Rationale:
- Redis+Celery is native to ERPNext — no changes needed for internal jobs
- OCI Streaming provides durable, replayable event log for compliance/audit requirements
- Decoupled cross-service communication — new consumers can be added without changing producers
- OCI Streaming is fully managed — no Kafka cluster management overhead
- Kafka's partition model supports high-throughput event processing
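The replay and decoupling points above can be illustrated with a toy append-only log in which every consumer group tracks its own offset. This is an in-memory sketch of the general log-with-offsets idea, not the OCI Streaming or Kafka client API; `EventLog`, `publish`, `poll`, and `seek` are hypothetical names.

```python
from collections import defaultdict
from typing import Any


class EventLog:
    """Append-only log; each consumer group keeps an independent read offset."""

    def __init__(self) -> None:
        self._records: list[Any] = []
        self._offsets: defaultdict[str, int] = defaultdict(int)

    def publish(self, record: Any) -> int:
        """Append a record and return its offset. Producers know nothing about consumers."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group: str) -> list[Any]:
        """Return all records this group has not yet read and advance its offset."""
        start = self._offsets[group]
        batch = self._records[start:]
        self._offsets[group] = len(self._records)
        return batch

    def seek(self, group: str, offset: int) -> None:
        """Rewind (or advance) a group's offset, e.g. to replay events for an audit."""
        self._offsets[group] = offset
```

A group that starts polling after events were published still receives the full history, and `seek(group, 0)` replays it, which is the property the compliance requirement relies on.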
Consequences
- Positive: Reuses ERPNext's existing job pattern, provides a durable event log for compliance, decouples services, and avoids self-managed Kafka infrastructure
- Trade-offs: Two queue technologies to operate and monitor; developers need to understand both patterns
- Risks: Kafka guarantees ordering only within a partition, so events that must be processed in order need a shared partition key; consumer lag must be monitored
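Both risks can be made concrete with a short sketch. Keying every event for one entity (for example, one invoice) to the same partition preserves its relative order, and lag per partition is the log-end offset minus the committed offset. The function names and the SHA-256 hash are assumptions for illustration; the Kafka Java client's default partitioner uses murmur2, and a real producer would normally let the client library choose the partition from the key.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a message key to a partition (illustrative hash)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def consumer_lag(
    log_end_offsets: dict[int, int],
    committed_offsets: dict[int, int],
) -> dict[int, int]:
    """Per-partition lag: records written but not yet committed by the consumer group."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }
```

Because `partition_for` depends only on the key, `invoice.created` and `invoice.cleared` events for the same invoice land on one partition when they share a topic and key. Ordering across different topics or keys is not guaranteed and has to be handled by the consumers.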
Alternatives Considered
- RabbitMQ only: Mature message broker but would replace ERPNext's native Celery setup and add operational complexity
- OCI Streaming only: Could handle everything but would require migrating ERPNext jobs from Celery to Kafka consumers
- Redis+Celery only: Simpler but lacks durable event replay capability needed for compliance audit trails
- AWS SQS/SNS: Not available in KSA regions, so using it would violate PDPL data residency requirements