ADR-008: Vector Store — OCI Search with OpenSearch
Status: Proposed
Date: 2026-05-17
Context
BayanCore's AI features (Ask, Act, Automate tiers) require a vector database for the RAG (Retrieval-Augmented Generation) pipeline. The system must store embeddings of company documents, policies, and business data to enable semantic search and context-aware AI responses. All vector data must remain in KSA to comply with PDPL data residency requirements.
Because MariaDB was selected as the primary ERP database (ADR-003), pgvector is not available to us: it is a PostgreSQL extension and would require running a second database engine. We need a dedicated vector search solution that is fully managed, KSA-compliant, and integrates well with our OCI-hosted AI inference models.
Decision
OCI Search with OpenSearch is selected as the vector search engine for BayanCore's RAG pipeline.
Rationale:
- Fully managed service by OCI — minimal operational overhead
- Available in the OCI Riyadh region, so vector data stays in KSA as PDPL data residency requires
- Native vector search with k-NN plugin for billion-scale similarity search
- Hybrid search: combines vector similarity with full-text search for better results
- Built-in RAG pipeline support with OCI Generative AI Agents integration
- LangChain integration for simplified AI application development
- Supports Cohere and Llama embedding models (both OCI-hosted)
- Enterprise-grade security with encryption, access controls, and auditing
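As a rough illustration of the hybrid search point above, the sketch below builds a request body that places a full-text `match` clause and a `knn` clause in a single `bool` query. This is only one simple way to combine the two signals. The field names `text` and `embedding` and the helper `build_hybrid_query` are hypothetical, and the exact query shape should be checked against the OpenSearch version that OCI provisions.

```python
from typing import Any


def build_hybrid_query(
    query_text: str, query_vector: list[float], k: int = 5
) -> dict[str, Any]:
    """Build a search body that scores on keyword match and vector similarity.

    Assumptions (not from the ADR): the index has a `text` field for
    full-text search and an `embedding` field of type `knn_vector`.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return {
        "size": k,
        "query": {
            "bool": {
                # A document matching either clause is returned; matching
                # both contributes to a higher combined score.
                "should": [
                    {"match": {"text": {"query": query_text}}},
                    {"knn": {"embedding": {"vector": query_vector, "k": k}}},
                ]
            }
        },
    }


if __name__ == "__main__":
    # Tiny placeholder vector; a real query vector would come from the
    # embedding model and match the index dimension.
    body = build_hybrid_query("annual leave policy", [0.1, 0.2, 0.3], k=3)
    print(body)
```

The resulting dictionary would be sent as the body of a search request. Score normalization between the keyword and vector clauses is not handled here and may need a dedicated search pipeline.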
Deployment Configuration:
- Single OpenSearch cluster in OCI Riyadh
- Vector indexes for company document embeddings
- Full-text indexes for keyword search fallback
- Integration with OCI Generative AI for embedding generation (Cohere embed-multilingual-v3)
- Data ingestion pipeline: documents → chunking → embedding → OpenSearch index
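The ingestion flow above (documents → chunking → embedding → OpenSearch index) can be sketched as follows. This is a minimal illustration under stated assumptions, not the planned implementation: the index name `company-docs`, the field names, the chunk sizes, and all function names are hypothetical. `fake_embed` is a deterministic placeholder standing in for the call to the OCI-hosted embedding model, and the dimension of 1024 is assumed for Cohere embed-multilingual-v3 and should be verified.

```python
import hashlib
import json
from typing import Any, Callable

EMBEDDING_DIM = 1024  # assumed for Cohere embed-multilingual-v3; verify


def index_body(dimension: int = EMBEDDING_DIM) -> dict[str, Any]:
    """Index definition with one k-NN vector field and one full-text field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "doc_id": {"type": "keyword"},
                "text": {"type": "text"},
                "embedding": {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word windows of `size` words sharing `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def fake_embed(text: str, dim: int = EMBEDDING_DIM) -> list[float]:
    """Placeholder embedding: NOT semantic, only deterministic for testing."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i % len(digest)] / 255.0 for i in range(dim)]


def build_bulk_payload(
    doc_id: str,
    text: str,
    embed: Callable[[str], list[float]],
    index: str = "company-docs",
    size: int = 200,
    overlap: int = 40,
) -> str:
    """Return newline-delimited JSON in the bulk API's action/source format."""
    lines = []
    for n, chunk in enumerate(chunk_text(text, size, overlap)):
        lines.append(json.dumps({"index": {"_index": index, "_id": f"{doc_id}-{n}"}}))
        lines.append(json.dumps({"doc_id": doc_id, "text": chunk, "embedding": embed(chunk)}))
    return ("\n".join(lines) + "\n") if lines else ""


if __name__ == "__main__":
    sample = "Employees accrue annual leave monthly. " * 20
    payload = build_bulk_payload("policy-001", sample, fake_embed, size=30, overlap=5)
    print(payload.count("\n") // 2, "chunks prepared for bulk indexing")
```

Giving each chunk an ID derived from the document ID and chunk position means re-ingesting a document overwrites its earlier chunks instead of duplicating them. Word-based chunking is used here only for simplicity; a token-aware splitter may suit the embedding model's input limits better.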
Consequences
- Positive: Fully managed, KSA-compliant, RAG-ready, hybrid search capabilities, less ops overhead than self-hosted alternatives
- Trade-offs: Tied to OCI ecosystem, OpenSearch-specific query syntax
- Risks: Service availability depends on OCI Riyadh region health because the deployment is a single cluster; embedding model usage adds cost
Alternatives Considered
- Qdrant on OKE: More flexible vector features but requires managing Kubernetes deployment and increases operational complexity
- Oracle AI Vector Search (Oracle DB 26ai): Powerful but requires Oracle Database license (expensive) and conflicts with MariaDB decision
- pgvector on PostgreSQL: Would require running PostgreSQL alongside MariaDB, adding unnecessary complexity
- Milvus self-hosted: Apache-licensed and scalable but requires full self-management on OCI compute