Observability

Overview

BayanCore observability ensures we can detect, diagnose, and resolve issues quickly while maintaining compliance with data residency requirements.

Three Pillars

Metrics

  • Infrastructure: CPU, memory, disk, network (OCI Monitoring)
  • Application: Request rate, error rate, latency
  • Business: FWCR, ZATCA clearance time, AI accuracy
  • SLOs: 99.9% uptime, <2s invoice clearance
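
To make the 99.9% uptime target concrete, the sketch below converts an availability SLO into a downtime budget. The 30-day window and the function name `allowed_downtime_minutes` are assumptions for illustration; the source does not state the measurement window.

```python
def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Return the minutes of downtime an availability SLO permits over a window.

    slo is a fraction (0.999 for 99.9%). The 30-day default is an assumption.
    """
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


budget = allowed_downtime_minutes(0.999)
print(f"99.9% uptime over 30 days allows about {budget:.1f} minutes of downtime")
```

Under these assumptions the budget is roughly 43 minutes per 30 days, which is a useful reference point when deciding how aggressively P1 alerts need to page.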

Logs

  • Application Logs: Structured JSON logs
  • Audit Logs: Immutable action logs
  • System Logs: OCI compute and database logs
  • Retention: 2 years (OCI Object Storage)
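
As an illustration of structured JSON application logs, here is a minimal sketch using only the Python standard library. The field names (`timestamp`, `level`, `request_id`, `tenant_id`) and the logger name `bayancore.app` are hypothetical; the actual BayanCore log schema is not specified here.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical context fields, supplied by callers via `extra=`.
        for key in ("request_id", "tenant_id"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, ensure_ascii=False)


def build_logger(name: str = "bayancore.app") -> logging.Logger:
    """Create a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = build_logger()
logger.info("invoice submitted", extra={"request_id": "req-123"})
```

One JSON object per line keeps the logs easy to ship to object storage and to query later without custom parsing.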

Traces

  • Distributed Tracing: Request flow across services
  • Span Collection: OpenTelemetry standard
  • Sampling: Adaptive sampling for high-traffic endpoints
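
One possible reading of adaptive sampling is sketched below: the sampling probability is recomputed at the end of each time window so that the number of sampled traces stays near a target. This is a standalone illustration, not the OpenTelemetry SDK sampler interface; the class `AdaptiveSampler` and its parameters are hypothetical, and in a real deployment this logic would sit behind whatever sampler hook the tracing SDK provides.

```python
import random


class AdaptiveSampler:
    """Keep sampled traces per window near a target by adjusting the rate."""

    def __init__(self, target_per_window: int, min_rate: float = 0.01) -> None:
        self.target = target_per_window
        self.min_rate = min_rate  # floor so busy endpoints are never fully dark
        self.rate = 1.0           # start by sampling everything
        self.seen = 0             # requests observed in the current window

    def should_sample(self) -> bool:
        """Decide whether to trace the current request."""
        self.seen += 1
        return random.random() < self.rate

    def end_window(self) -> None:
        """Recompute the rate from the traffic seen in the window just ended."""
        if self.seen == 0:
            self.rate = 1.0
        else:
            self.rate = max(self.min_rate, min(1.0, self.target / self.seen))
        self.seen = 0


sampler = AdaptiveSampler(target_per_window=100)
for _ in range(5000):          # simulate a busy window
    sampler.should_sample()
sampler.end_window()
print(f"next-window sampling rate: {sampler.rate:.3f}")
```

Low-traffic endpoints stay at a rate of 1.0, so rare requests are always traced, while high-traffic endpoints are throttled toward the target.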

Alerting

Alert Channels

  • P1 (Critical): WhatsApp + Slack + Phone call
  • P2 (High): Slack + Email
  • P3 (Medium): Email
  • P4 (Low): Dashboard notification
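
The severity-to-channel mapping above can be expressed as a small lookup table. This is a sketch only: the channel identifiers and the `channels_for` helper are hypothetical names, and the actual delivery integrations are out of scope.

```python
from enum import Enum


class Severity(Enum):
    P1 = "Critical"
    P2 = "High"
    P3 = "Medium"
    P4 = "Low"


# Channel identifiers are illustrative labels, not real integration names.
ALERT_CHANNELS = {
    Severity.P1: ("whatsapp", "slack", "phone"),
    Severity.P2: ("slack", "email"),
    Severity.P3: ("email",),
    Severity.P4: ("dashboard",),
}


def channels_for(severity: Severity) -> tuple:
    """Return the notification channels for a given severity."""
    return ALERT_CHANNELS[severity]


for sev in Severity:
    print(sev.name, sev.value, "->", ", ".join(channels_for(sev)))
```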

Key Alerts

  • ZATCA clearance failure rate > 5%
  • API error rate > 1%
  • Response time P95 > 3s
  • Database connection pool exhaustion
  • AI hallucination rate > 1%
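
The thresholds above could be evaluated against a metrics snapshot roughly as follows. The metric keys are hypothetical, and since the source gives no numeric threshold for connection pool exhaustion, the sketch assumes it means pool utilization reaching 100%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    name: str
    metric: str               # hypothetical metric key
    threshold: float
    inclusive: bool = False   # fire on >= instead of >


RULES = [
    AlertRule("ZATCA clearance failure rate", "zatca_clearance_failure_rate", 0.05),
    AlertRule("API error rate", "api_error_rate", 0.01),
    AlertRule("Response time P95", "response_time_p95_seconds", 3.0),
    # Assumption: "exhaustion" is modeled as utilization reaching 1.0.
    AlertRule("Database connection pool exhaustion", "db_pool_utilization", 1.0,
              inclusive=True),
    AlertRule("AI hallucination rate", "ai_hallucination_rate", 0.01),
]


def firing(metrics: dict) -> list:
    """Return the names of rules breached by the snapshot.

    Metrics missing from the snapshot are skipped rather than treated as zero.
    """
    breached = []
    for rule in RULES:
        value = metrics.get(rule.metric)
        if value is None:
            continue
        if value > rule.threshold or (rule.inclusive and value == rule.threshold):
            breached.append(rule.name)
    return breached


snapshot = {"zatca_clearance_failure_rate": 0.07, "api_error_rate": 0.004}
print(firing(snapshot))
```

Keeping the rules as data rather than scattered conditionals makes it straightforward to review thresholds alongside this document.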

Dashboard

  • Infrastructure health overview
  • Application performance metrics
  • Business KPI dashboard
  • Compliance status dashboard
  • AI performance metrics