
ADR-009: Message Queue — Dual-Queue Strategy

Status: Proposed

Date: 2026-05-17

Context

BayanCore requires message queuing for two distinct purposes:

  1. ERPNext background jobs: Task execution for workflows, email sending, report generation, scheduled jobs
  2. Cross-service events: Event-driven communication between BayanCore services (Guardian, AI Engine, Audit Logger) and ERPNext

ERPNext already uses Redis + Celery for its internal task queue. We need to determine whether to extend this pattern for cross-service communication or introduce a separate event streaming solution.

Decision

A dual-queue strategy is adopted:

Queue 1: Redis + Celery (ERPNext Internal Jobs)

  • Used for: ERPNext background tasks, workflow execution, email queues, report generation, scheduled jobs
  • Technology: Redis (OCI Cache) + Celery worker processes
  • Pattern: Task queue with retry, dead-letter, and priority queues
  • Scope: Within ERPNext application boundary
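The retry, dead-letter, and priority behaviour listed above can be sketched with a small in-memory model. This is an illustration of the pattern only, not Celery's API or ERPNext's actual job code: TaskQueue, Job, max_retries, and the priority convention (lower number runs first) are all hypothetical names and choices.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable


@dataclass(order=True)
class Job:
    priority: int                               # lower number runs first (assumed convention)
    seq: int                                    # tie-breaker: FIFO within one priority
    name: str = field(compare=False)
    func: Callable[[], None] = field(compare=False)
    attempts: int = field(default=0, compare=False)


class TaskQueue:
    """In-memory stand-in for a broker-backed task queue (illustration only)."""

    def __init__(self, max_retries: int = 3) -> None:
        self.max_retries = max_retries
        self._heap = []                         # priority heap of pending jobs
        self._counter = itertools.count()
        self.completed = []                     # names of jobs that succeeded
        self.dead_letter = []                   # names of jobs that exhausted retries

    def enqueue(self, name: str, func: Callable[[], None], priority: int = 5) -> None:
        heapq.heappush(self._heap, Job(priority, next(self._counter), name, func))

    def run(self) -> None:
        while self._heap:
            job = heapq.heappop(self._heap)
            try:
                job.func()
                self.completed.append(job.name)
            except Exception:
                job.attempts += 1
                if job.attempts > self.max_retries:
                    # Retries exhausted: park the job for manual inspection.
                    self.dead_letter.append(job.name)
                else:
                    # Requeue behind other jobs of the same priority.
                    job.seq = next(self._counter)
                    heapq.heappush(self._heap, job)
```

A real worker would also delay retries (for example with exponential backoff) and persist the dead-letter list; both are left out here to keep the sketch short.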

Queue 2: OCI Streaming (Kafka-compatible) (Cross-Service Events)

  • Used for: Cross-service event publishing and consumption, audit trail events, compliance events, AI indexing triggers
  • Technology: OCI Streaming (fully managed event streaming service with a Kafka-compatible API)
  • Pattern: Event sourcing, pub/sub, event replay for compliance
  • Scope: Between BayanCore services (Guardian, AI Engine, Audit Logger, ERPNext)

Event Topics:

  • invoice.created — Triggers Guardian ZATCA validation
  • invoice.cleared — Triggers PDF generation and delivery
  • workflow.completed — Triggers audit logging and AI indexing
  • user.action — Triggers audit trail entry
  • compliance.violation — Triggers alert and remediation workflow
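One possible way to represent these topics in code is a shared event envelope plus a topic-to-action table. The sketch below is an assumption, not a schema defined by this ADR: the Event fields, the helper names (make_event, encode, decode, actions_for), and the action labels are hypothetical. Only the topic names and the triggered actions are taken from the list above.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Topic names come from the ADR; the action labels are illustrative.
TOPIC_ACTIONS = {
    "invoice.created": ["zatca_validation"],
    "invoice.cleared": ["pdf_generation_and_delivery"],
    "workflow.completed": ["audit_logging", "ai_indexing"],
    "user.action": ["audit_trail_entry"],
    "compliance.violation": ["alert_and_remediation"],
}


@dataclass(frozen=True)
class Event:
    topic: str
    key: str            # entity identifier, e.g. an invoice ID (assumed field)
    payload: dict
    event_id: str       # unique ID so consumers can de-duplicate
    occurred_at: str    # ISO 8601 timestamp in UTC


def make_event(topic: str, key: str, payload: dict) -> Event:
    if topic not in TOPIC_ACTIONS:
        raise ValueError(f"unknown topic: {topic}")
    now = datetime.now(timezone.utc).isoformat()
    return Event(topic, key, payload, str(uuid.uuid4()), now)


def encode(event: Event) -> bytes:
    """Serialize an event to JSON bytes for publishing."""
    return json.dumps(asdict(event), sort_keys=True).encode("utf-8")


def decode(raw: bytes) -> Event:
    """Rebuild an event from the bytes a consumer receives."""
    return Event(**json.loads(raw.decode("utf-8")))


def actions_for(event: Event) -> list:
    """Look up which downstream actions a topic triggers."""
    return TOPIC_ACTIONS[event.topic]
```

Rejecting unknown topics at creation time keeps producers from silently publishing to a topic that no consumer is subscribed to.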

Rationale:

  • Redis+Celery is native to ERPNext — no changes needed for internal jobs
  • OCI Streaming provides durable, replayable event log for compliance/audit requirements
  • Decoupled cross-service communication — new consumers can be added without changing producers
  • OCI Streaming is fully managed — no Kafka cluster management overhead
  • Kafka's partition model supports high-throughput event processing
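The replay and decoupling points above can be illustrated with a small in-memory model of an append-only log with per-consumer offsets. This is a sketch only: EventLog, publish, poll, and seek are hypothetical names, not the OCI Streaming or Kafka client API, and a real stream only supports replay within its configured retention period.

```python
from collections import defaultdict


class EventLog:
    """In-memory stand-in for one append-only stream partition."""

    def __init__(self) -> None:
        self._records = []                      # records are never modified or removed
        self._offsets = defaultdict(int)        # consumer group -> next offset to read

    def publish(self, record: dict) -> int:
        """Append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group: str, max_records: int = 100) -> list:
        """Return the next batch for a group and advance that group's offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group: str, offset: int) -> None:
        """Move a group's position, e.g. rewind to replay events for an audit."""
        if not 0 <= offset <= len(self._records):
            raise ValueError("offset out of range")
        self._offsets[group] = offset
```

Because each group tracks its own offset, a consumer added later starts from offset 0 and reads the full history without any change on the producer side, which is the property a task queue that deletes messages on acknowledgement cannot offer.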

Consequences

  • Positive: Leverages existing ERPNext patterns, durable event log for compliance, decoupled services, fully managed
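  • Trade-offs: Two queue technologies to operate; developers need to understand both patterns
  • Risks: Kafka-style streams guarantee ordering only within a partition, so events that must be processed in order (for example, all events for one invoice) need a shared partition key; consumer lag must be monitored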
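The ordering risk can be made concrete with a short sketch of key-based partitioning. It assumes events are keyed by an entity ID such as an invoice number, which this ADR does not specify. The functions partition_for and assign are hypothetical, and zlib.crc32 is a stand-in hash chosen for simplicity (Kafka's default Java partitioner uses murmur2).

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition. crc32 is a stand-in hash, not Kafka's murmur2."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(events: list, num_partitions: int) -> dict:
    """Distribute (key, topic) pairs across partitions, preserving arrival order."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, topic in events:
        partitions[partition_for(key, num_partitions)].append((key, topic))
    return partitions


if __name__ == "__main__":
    events = [
        ("INV-001", "invoice.created"),
        ("INV-002", "invoice.created"),
        ("INV-001", "invoice.cleared"),
    ]
    for partition, records in assign(events, 4).items():
        print(partition, records)
```

Every event with the same key lands in the same partition, so invoice.created is always read before invoice.cleared for a given invoice. No such guarantee holds between different invoices, and changing the partition count remaps keys, so the count should be chosen with that in mind.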

Alternatives Considered

  • RabbitMQ only: Mature message broker but would replace ERPNext's native Celery setup and add operational complexity
  • OCI Streaming only: Could handle everything but would require migrating ERPNext jobs from Celery to Kafka consumers
  • Redis+Celery only: Simpler but lacks durable event replay capability needed for compliance audit trails
  • AWS SQS/SNS: Not available in a KSA region, so using it would violate PDPL data residency requirements