Observability
Overview
BayanCore observability ensures we can detect, diagnose, and resolve issues quickly while maintaining compliance with data residency requirements.
Three Pillars
Metrics
- Infrastructure: CPU, memory, disk, network (collected via OCI Monitoring)
- Application: Request rate, error rate, latency
- Business: FWCR, ZATCA clearance time, AI accuracy
- SLOs: 99.9% uptime, invoice clearance in under 2 s
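To make the SLO figures concrete, the sketch below converts an availability target into a downtime budget and checks a single clearance against the 2-second limit. The 30-day window and the function names are assumptions for illustration; this document does not specify how the SLO window is defined.

```python
# Illustrative only: function names and the 30-day window are assumptions.

def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability SLO permits over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo)


def clearance_within_slo(duration_s: float, limit_s: float = 2.0) -> bool:
    """True when an invoice clearance finished strictly under the limit."""
    return duration_s < limit_s


# A 99.9% target over 30 days leaves roughly 43.2 minutes of downtime.
budget = allowed_downtime_minutes(0.999)
print(f"Downtime budget: {budget:.1f} minutes per 30 days")
```

Tracking the remaining budget rather than raw uptime makes it easier to judge whether a given incident puts the SLO at risk.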
Logs
- Application Logs: Structured JSON logs
- Audit Logs: Immutable action logs
- System Logs: OCI compute and database logs
- Retention: 2 years (OCI Object Storage)
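As a minimal sketch of structured JSON application logs, the formatter below uses only the Python standard library. The field names (`request_id`, `invoice_id`) and the logger name are hypothetical; the actual BayanCore log schema is not defined in this document.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    # Hypothetical context fields; the real schema may differ.
    CONTEXT_FIELDS = ("request_id", "invoice_id")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed through `extra=` become attributes on the record.
        for key in self.CONTEXT_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("bayancore.example")  # assumed logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("invoice cleared", extra={"request_id": "req-123", "invoice_id": "INV-001"})
```

One JSON object per line keeps the logs easy to parse and query downstream, whichever log pipeline is used.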
Traces
- Distributed Tracing: Request flow across services
- Span Collection: OpenTelemetry standard
- Sampling: Adaptive sampling for high-traffic endpoints
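One common way to make sampling adaptive is to target a fixed number of sampled traces per second per endpoint and lower the sampling probability as traffic rises. The sketch below illustrates that idea in plain Python. It is not the OpenTelemetry sampler API, and the target rate and function names are assumptions, since this document does not describe the actual sampling algorithm.

```python
import random
from typing import Callable


def sampling_probability(observed_rps: float, target_traces_per_s: float) -> float:
    """Probability that keeps sampled traces near the target rate."""
    if observed_rps <= 0:
        return 1.0  # no measurable traffic: keep everything
    return min(1.0, target_traces_per_s / observed_rps)


def should_sample(
    observed_rps: float,
    target_traces_per_s: float,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether to record a trace for the current request."""
    return rng() < sampling_probability(observed_rps, target_traces_per_s)


# A quiet endpoint is fully sampled; a busy one is sampled at 1%.
print(sampling_probability(5, 10))     # 1.0
print(sampling_probability(1000, 10))  # 0.01
```

In a real deployment the observed rate would come from a rolling counter per endpoint, and the decision would typically be made once at the root span so that a trace is kept or dropped as a whole.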
Alerting
Alert Channels
- P1 (Critical): WhatsApp + Slack + Phone call
- P2 (High): Slack + Email
- P3 (Medium): Email
- P4 (Low): Dashboard notification
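The severity-to-channel mapping above can be expressed as a small lookup table. This is a sketch only: the channel identifiers are hypothetical labels, and no real notification integration is shown.

```python
from enum import Enum


class Severity(Enum):
    P1 = "Critical"
    P2 = "High"
    P3 = "Medium"
    P4 = "Low"


# Channel names are illustrative labels, not real integration identifiers.
ALERT_CHANNELS: dict[Severity, tuple[str, ...]] = {
    Severity.P1: ("whatsapp", "slack", "phone_call"),
    Severity.P2: ("slack", "email"),
    Severity.P3: ("email",),
    Severity.P4: ("dashboard",),
}


def channels_for(severity: Severity) -> tuple[str, ...]:
    """Return the notification channels for an alert of this severity."""
    return ALERT_CHANNELS[severity]


print(channels_for(Severity.P1))  # ('whatsapp', 'slack', 'phone_call')
```

Keeping the routing in one table means a change to the escalation policy is a single edit rather than a change scattered across alert definitions.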
Key Alerts
- ZATCA clearance failure rate > 5%
- API error rate > 1%
- Response time P95 > 3s
- Database connection pool exhaustion
- AI hallucination rate > 1%
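The numeric thresholds above can be checked with a simple evaluation pass. The sketch below computes P95 with the nearest-rank method and reports which alerts would fire. The metric names and the choice of nearest-rank are assumptions. Database connection pool exhaustion is left out because this document gives no numeric threshold for it.

```python
# Thresholds taken from the Key Alerts list; metric names are hypothetical.
THRESHOLDS: dict[str, float] = {
    "zatca_clearance_failure_rate": 0.05,
    "api_error_rate": 0.01,
    "response_time_p95_s": 3.0,
    "ai_hallucination_rate": 0.01,
}


def p95(samples: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def firing_alerts(observed: dict[str, float]) -> list[str]:
    """Names of alerts whose observed value strictly exceeds its threshold."""
    return [
        name
        for name, limit in THRESHOLDS.items()
        if observed.get(name, 0.0) > limit
    ]


latencies = [0.4, 0.6, 0.9, 1.2, 3.5]
observed = {"api_error_rate": 0.02, "response_time_p95_s": p95(latencies)}
print(firing_alerts(observed))  # ['api_error_rate', 'response_time_p95_s']
```

A production rule would normally also require the condition to hold for a minimum duration before firing, to avoid paging on a single noisy sample.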