RAG (Retrieval-Augmented Generation)
Overview
RAG combines a retrieval system with a generative AI model to provide accurate, context-aware responses. Instead of relying solely on the LLM's training data, RAG fetches relevant information from your document repository before generating a response.
Basic RAG Flow
Our Enhanced RAG Architecture
Our RAG implementation uses a pipeline that includes query rewriting, semantic search, and result reranking to improve retrieval accuracy.
Components Explained
1. Query Rewriter
- Expands and reformulates the original query
- Improves retrieval by considering synonyms and related concepts
- Uses an LLM to understand query intent
2. Semantic Search
- Converts documents and queries into embeddings
- Uses similarity metrics to find relevant content
- Efficiently searches through large document collections
- Implemented through our Snowflake semantic API
3. Result Reranker
- Cross-encodes query-document pairs
- Provides more nuanced relevance scoring
- Ensures the most relevant content appears first
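The three components above can be sketched end to end as follows. This is a toy illustration only, not the actual implementation: the synonym table stands in for the LLM-based rewriter, a bag-of-words counter stands in for real embeddings, and a term-overlap score stands in for a cross-encoder. All function names (`rewrite_query`, `semantic_search`, `rerank`, `retrieve`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for the LLM-based query rewriter.
SYNONYMS = {"vacation": ["leave", "holiday"]}


def rewrite_query(query: str) -> str:
    """Expand the query with related terms (toy stand-in for an LLM rewrite)."""
    words = query.lower().split()
    extra = [syn for word in words for syn in SYNONYMS.get(word, [])]
    return " ".join(words + extra)


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a dense vector model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_search(query: str, docs: list[str], limit: int) -> list[str]:
    """Return the `limit` documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:limit]


def rerank(query: str, candidates: list[str]) -> list[str]:
    """Toy reranker: scores each (query, document) pair jointly by the
    fraction of original query terms found in the document."""
    terms = set(query.lower().split())

    def score(doc: str) -> float:
        return len(terms & set(doc.lower().split())) / len(terms) if terms else 0.0

    return sorted(candidates, key=score, reverse=True)


def retrieve(query: str, docs: list[str], limit: int = 25) -> list[str]:
    """Rewrite the query, search with the expanded form, rerank with the original."""
    expanded = rewrite_query(query)
    candidates = semantic_search(expanded, docs, limit)
    return rerank(query, candidates)
```

Searching with the expanded query widens recall, while reranking against the original query keeps the final order tied to what the user actually asked.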
Configuration
Here's how to configure a RAG agent with our enhanced architecture:
{
  "name": "DocumentHelper",
  "description": "Document assistant",
  "agentType": "rag",
  "config": {
    "query_engine": {
      "llm_model_id": "amazon.nova-micro-v1:0"
    },
    "retriever": {
      "filter_value": {},
      "hx_env_id": "<your-environment-id>",
      "limit": 25,
      "retriever_type": "snowflake_semantic_api",
      "semantic_api_url": "https://your-semantic-api-endpoint/semantic/similar-chunks"
    }
  }
}
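If you generate agent definitions programmatically, a small helper can keep the structure above consistent. The sketch below is an assumption-laden convenience, not part of the product: `build_rag_agent_config` is a hypothetical function, and the positive-`limit` check is our own sanity check rather than a documented constraint. The keys and default values are copied from the configuration shown above.

```python
import json
from typing import Optional


def build_rag_agent_config(
    name: str,
    description: str,
    env_id: str,
    semantic_api_url: str,
    llm_model_id: str = "amazon.nova-micro-v1:0",
    limit: int = 25,
    filter_value: Optional[dict] = None,
) -> dict:
    """Hypothetical helper that assembles a RAG agent definition."""
    # Assumption: a non-positive limit is never meaningful for retrieval.
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return {
        "name": name,
        "description": description,
        "agentType": "rag",
        "config": {
            "query_engine": {"llm_model_id": llm_model_id},
            "retriever": {
                "filter_value": filter_value or {},
                "hx_env_id": env_id,
                "limit": limit,
                "retriever_type": "snowflake_semantic_api",
                "semantic_api_url": semantic_api_url,
            },
        },
    }


if __name__ == "__main__":
    config = build_rag_agent_config(
        name="DocumentHelper",
        description="Document assistant",
        env_id="<your-environment-id>",
        semantic_api_url="https://your-semantic-api-endpoint/semantic/similar-chunks",
    )
    # Serialize to the JSON form shown in this section.
    print(json.dumps(config, indent=2))
```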
Best Practices
- Document Preparation
  - Ensure documents are properly chunked
  - Maintain consistent formatting
  - Include metadata for filtering
- Query Formation
  - Be specific in queries
  - Use natural language
  - Include relevant context
- Filter Usage
  - The filterValue parameter follows the Semantic API filtering structure
  - For detailed filter options and syntax, refer to the Semantic API Documentation
  - Common filters include document type, department, and date ranges
  - Combine with semantic search for better results
The filtering capabilities are provided by the Semantic API. Check their documentation for the complete list of supported filters and proper filter syntax.
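As an illustration of combining a natural-language query with filters, the sketch below assembles a request body in the shape shown under Streaming Support. `build_invoke_body` is a hypothetical helper. Dropping unset (`None`) filters and sending an empty `filterValue` object when no filters apply are assumptions made here for convenience, so confirm the expected behavior against the Semantic API documentation.

```python
from typing import Any, Optional


def build_invoke_body(question: str, filters: Optional[dict[str, Any]] = None) -> dict:
    """Hypothetical helper: build a request body with a user message and filters."""
    # Assumption: filters left as None should not be sent at all.
    cleaned = {key: value for key, value in (filters or {}).items() if value is not None}
    return {
        "messages": [{"role": "user", "content": question}],
        # Assumption: an empty object means "no filtering".
        "filterValue": cleaned,
    }


if __name__ == "__main__":
    body = build_invoke_body(
        "What's in our HR policy about vacation days?",
        {"department": "HR", "documentType": "policy", "year": None},
    )
    print(body)
```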
Streaming Support
RAG agents support streaming responses through the /v1/agents/{agent_id}/invoke-rag-stream endpoint:
POST /v1/agents/{agent_id}/invoke-rag-stream
{
  "messages": [
    {
      "role": "user",
      "content": "What's in our HR policy about vacation days?"
    }
  ],
  "filterValue": {
    "department": "HR",
    "documentType": "policy"
  }
}
When using streaming, the response arrives in chunks. Each chunk is a valid JSON object containing a portion of the complete response. Concatenating the portions from all chunks, in order, yields the complete response, which is plain text rather than JSON.
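A client therefore needs to parse each chunk separately and join the text portions. The sketch below shows one way to do that under two loudly stated assumptions that this page does not confirm: chunks arrive one JSON object per line, and the text portion lives under a key named `text`. Both `assemble_stream` and the `text_key` parameter are hypothetical, so adjust them to the actual chunk schema.

```python
import json
from typing import Iterable


def assemble_stream(lines: Iterable[str], text_key: str = "text") -> str:
    """Join the text portions of streamed JSON chunks into one plain-text response.

    Assumptions (not confirmed by the API documentation):
    - each non-empty line is one complete JSON object
    - the text portion is stored under `text_key`
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive or separator lines
        chunk = json.loads(line)  # each chunk is valid JSON on its own
        parts.append(chunk.get(text_key, ""))
    return "".join(parts)


if __name__ == "__main__":
    # Simulated stream; a real client would iterate over the HTTP response lines.
    simulated = ['{"text": "Hello, "}', "", '{"text": "world."}']
    print(assemble_stream(simulated))
```

Parsing chunk by chunk matters because the chunks placed back to back do not form a single valid JSON document.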