RAG (Retrieval-Augmented Generation)
Overview
RAG combines the power of retrieval-based systems with generative AI models to provide accurate, context-aware responses. Instead of relying solely on the LLM's training data, RAG fetches relevant information from your document repository before generating responses.
Basic RAG Flow
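At a high level, every RAG request follows the same retrieve, augment, generate pattern: fetch the most relevant document chunks, build a prompt that includes them, and let the model answer from that context. The sketch below is illustrative pseudocode of that pipeline only; the retriever and llm objects are placeholders and are not part of this API.

# Illustrative pseudocode of the retrieve -> augment -> generate pipeline.
# `retriever` and `llm` are placeholders, not part of this API.
def answer_with_rag(query: str, retriever, llm, limit: int = 5) -> str:
    # 1. Retrieve: fetch the most relevant document chunks for the query.
    chunks = retriever.search(query, limit=limit)

    # 2. Augment: build a prompt that contains the retrieved context.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Context information is below.\n"
        f"{context}\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {query}\nAnswer: "
    )

    # 3. Generate: let the LLM answer using the retrieved context.
    return llm.generate(prompt)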
Configuration
Here's how to configure a RAG agent with our enhanced architecture:
{
  "name": "DocumentHelper",
  "description": "Document assistant",
  "agentType": "rag",
  "notes": "Agent for document queries",
  "config": {
    "filter_value": {},
    "limit": 5,
    "system_prompt": "Context information is below.\n---------------------\n{context_str}\n---------------------\nGiven the context information and not prior knowledge, answer the query.\nQuery: {query_str}\nAnswer: ",
    "llm_model_id": "amazon.nova-micro-v1:0",
    "inference_config": {
      "max_tokens": 4000,
      "temperature": 0.7
    }
  }
}
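The system_prompt acts as a template: at query time the retrieved passages are substituted for {context_str} and the user's question for {query_str}, and at most limit chunks are used as context. The snippet below is only an illustration of that substitution using plain Python string formatting; the example context and query strings are made up, and in practice the agent service performs this step internally.

# Illustration only: the agent service fills the template internally.
system_prompt = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)

filled = system_prompt.format(
    context_str="Employees accrue 1.5 vacation days per month ...",  # example retrieved passage
    query_str="How many vacation days do new employees get?",        # example user query
)
print(filled)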
Best Practices
- Document Preparation
  - Ensure documents are properly chunked
  - Maintain consistent formatting
  - Include metadata for filtering
- Query Formation
  - Be specific in queries
  - Use natural language
  - Include relevant context
- Filter Usage
  - The filterValue parameter follows the Semantic API filtering structure
  - For detailed filter options and syntax, refer to the Semantic API Documentation
  - Common filters include document type, department, and date ranges
  - Combine with semantic search for better results

The filtering capabilities are provided by the Semantic API. Check their documentation for the complete list of supported filters and proper filter syntax.
Streaming Support
RAG agents support streaming responses through the /v1/agents/{agent_id}/invoke-rag-stream endpoint:
POST /v1/agents/{agent_id}/invoke-rag-stream
{
  "messages": [
    {
      "role": "user",
      "content": "What's in our HR policy about vacation days?"
    }
  ],
  "filterValue": {
    "department": "HR",
    "documentType": "policy"
  }
}
When using streaming, the response arrives in chunks. Each chunk is a valid JSON object containing a portion of the response; concatenating the text from all chunks yields the complete response, which is plain text rather than JSON.
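A minimal sketch of consuming the streaming endpoint with the Python requests library is shown below. The host, agent ID, chunk field name ("text"), and the assumption that each streamed line is one JSON chunk are all placeholders or guesses; check the API reference for the actual chunk schema, framing, and authentication.

# Sketch only: host, agent ID, and the "text" chunk field are assumptions.
import json
import requests

AGENT_ID = "your-agent-id"  # placeholder
url = f"https://your-api-host/v1/agents/{AGENT_ID}/invoke-rag-stream"  # placeholder host

payload = {
    "messages": [
        {"role": "user", "content": "What's in our HR policy about vacation days?"}
    ],
    "filterValue": {"department": "HR", "documentType": "policy"},
}

answer_parts = []
with requests.post(url, json=payload, stream=True) as resp:
    resp.raise_for_status()
    # Assumes each streamed line is one JSON chunk; adjust if the service
    # frames chunks differently (e.g. server-sent events).
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        answer_parts.append(chunk.get("text", ""))  # "text" is an assumed field name

print("".join(answer_parts))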