RAG (Retrieval-Augmented Generation)

Overview

RAG combines retrieval-based systems with generative AI models to provide accurate, context-aware responses. Instead of relying solely on the LLM's training data, RAG fetches relevant information from your document repository before generating a response.

Basic RAG Flow
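
As a minimal sketch of the basic flow (retrieve first, then generate), the Python below picks the most relevant documents by simple word overlap and places them in the prompt ahead of the question. The function names, the overlap scoring, and the prompt wording are illustrative assumptions, not part of the product; a real deployment would use semantic retrieval and send the resulting prompt to an LLM.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], limit: int = 2) -> list[str]:
    """Return up to `limit` documents sharing the most words with the query."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in documents]
    # Keep only documents with at least one shared word, best match first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:limit]]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved passages ahead of the question (hypothetical wording)."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"


documents = [
    "Employees accrue vacation days monthly.",
    "The cafeteria opens at 8 am.",
    "Unused vacation days expire in March.",
]
question = "How do vacation days accrue?"
prompt = build_prompt(question, retrieve(question, documents))
print(prompt)  # this string is what would be sent to the LLM
```

The unrelated cafeteria document never reaches the prompt, which is the point of retrieving before generating: the model only sees passages that relate to the question.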

Our Enhanced RAG Architecture

Our RAG implementation uses a three-stage pipeline of query rewriting, semantic search, and result reranking to improve retrieval accuracy.

Components Explained

1. Query Rewriter

  • Expands and reformulates the original query
  • Improves retrieval by considering synonyms and related concepts
  • Uses an LLM to understand query intent

2. Semantic Search

  • Converts documents and queries into embeddings
  • Uses similarity metrics to find relevant content
  • Efficiently searches through large document collections
  • Implemented through our Snowflake semantic API

3. Result Reranker

  • Cross-encodes query-document pairs
  • Provides more nuanced relevance scoring
  • Ensures most relevant content appears first
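
The three stages can be sketched end to end in a few lines of Python. This is a toy illustration only: a hypothetical synonym table stands in for the LLM-based rewriter, bag-of-words cosine similarity stands in for embedding search through the Snowflake semantic API, and a word-coverage score stands in for a cross-encoder. All function names here are assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table standing in for the LLM-based query rewriter.
SYNONYMS = {"vacation": ["leave", "holiday"], "pay": ["salary"]}


def tokenize(text: str) -> list[str]:
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rewrite_query(query: str) -> str:
    """Stage 1: expand the query with related terms."""
    extra = [syn for tok in tokenize(query) for syn in SYNONYMS.get(tok, [])]
    return " ".join([query, *extra])


def embed(text: str) -> Counter:
    """Toy embedding: word counts (real systems use dense vectors)."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_search(query: str, chunks: list[str], limit: int) -> list[str]:
    """Stage 2: keep the `limit` chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:limit]


def rerank(query: str, chunks: list[str]) -> list[str]:
    """Stage 3: rescore each (query, chunk) pair; here, the share of the
    original query words that the chunk contains."""
    q_tokens = set(tokenize(query))

    def score(chunk: str) -> float:
        return len(q_tokens & set(tokenize(chunk))) / len(q_tokens) if q_tokens else 0.0

    return sorted(chunks, key=score, reverse=True)


def run_pipeline(query: str, chunks: list[str], limit: int = 2) -> list[str]:
    """Rewrite, search with the rewritten query, then rerank against the original."""
    candidates = semantic_search(rewrite_query(query), chunks, limit)
    return rerank(query, candidates)


chunks = [
    "Annual leave policy: staff receive 20 days of holiday.",
    "Expense reports are due monthly.",
    "Vacation policy requests need manager approval.",
]
for chunk in run_pipeline("vacation policy", chunks):
    print(chunk)
```

In this sample the search stage ranks the "Annual leave" chunk first because the rewritten query includes the synonyms "leave" and "holiday", while the reranker, scoring against the original wording, moves the "Vacation policy" chunk to the top. The expense chunk is dropped by the search limit.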

Configuration

Here's how to configure a RAG agent with our enhanced architecture:

{
  "name": "DocumentHelper",
  "description": "Document assistant",
  "agentType": "rag",
  "config": {
    "query_engine": {
      "llm_model_id": "amazon.nova-micro-v1:0"
    },
    "retriever": {
      "filter_value": {},
      "hx_env_id": "<your-environment-id>",
      "limit": 25,
      "retriever_type": "snowflake_semantic_api",
      "semantic_api_url": "https://your-semantic-api-endpoint/semantic/similar-chunks"
    }
  }
}
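
If you generate this configuration programmatically, a small helper can keep the shape consistent and catch obvious mistakes before the config is submitted. The sketch below is an assumption-laden example: the helper names are hypothetical, and the set of keys treated as required is inferred from the sample above rather than taken from a published schema.

```python
import json

# Assumption: these retriever keys are treated as required only because they
# appear in the example configuration; the real schema may differ.
REQUIRED_RETRIEVER_KEYS = ("hx_env_id", "limit", "retriever_type", "semantic_api_url")


def build_rag_agent_config(name, description, env_id, semantic_api_url,
                           llm_model_id="amazon.nova-micro-v1:0",
                           limit=25, filter_value=None):
    """Assemble a config dict with the same shape as the example above."""
    return {
        "name": name,
        "description": description,
        "agentType": "rag",
        "config": {
            "query_engine": {"llm_model_id": llm_model_id},
            "retriever": {
                "filter_value": filter_value or {},
                "hx_env_id": env_id,
                "limit": limit,
                "retriever_type": "snowflake_semantic_api",
                "semantic_api_url": semantic_api_url,
            },
        },
    }


def find_problems(agent: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if agent.get("agentType") != "rag":
        problems.append("agentType must be 'rag'")
    retriever = agent.get("config", {}).get("retriever", {})
    for key in REQUIRED_RETRIEVER_KEYS:
        if key not in retriever:
            problems.append(f"retriever.{key} is missing")
    limit = retriever.get("limit")
    if limit is not None and (not isinstance(limit, int) or limit < 1):
        problems.append("retriever.limit must be a positive integer")
    return problems


agent = build_rag_agent_config(
    name="DocumentHelper",
    description="Document assistant",
    env_id="<your-environment-id>",
    semantic_api_url="https://your-semantic-api-endpoint/semantic/similar-chunks",
)
print(json.dumps(agent, indent=2))
print(find_problems(agent))
```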

Best Practices

  1. Document Preparation

    • Ensure documents are properly chunked
    • Maintain consistent formatting
    • Include metadata for filtering
  2. Query Formation

    • Be specific in queries
    • Use natural language
    • Include relevant context
  3. Filter Usage

    • The filterValue request parameter (filter_value in the agent configuration) follows the Semantic API filtering structure
    • For detailed filter options and syntax, refer to the Semantic API Documentation
    • Common filters include document type, department, and date ranges
    • Combine with semantic search for better results

Tip: The filtering capabilities are provided by the Semantic API. Check its documentation for the complete list of supported filters and the proper filter syntax.

Streaming Support

RAG agents support streaming responses through the /v1/agents/{agent_id}/invoke-rag-stream endpoint:

POST /v1/agents/{agent_id}/invoke-rag-stream
{
  "messages": [
    {
      "role": "user",
      "content": "What's in our HR policy about vacation days?"
    }
  ],
  "filterValue": {
    "department": "HR",
    "documentType": "policy"
  }
}

Tip: When streaming, the response arrives in chunks. Each chunk is a valid JSON object that carries a portion of the response. Concatenating those portions in order yields the complete response, which is plain text rather than JSON.
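
A client therefore has to parse each chunk as JSON and join the text portions itself. The sketch below shows one way to do that in Python. It rests on two assumptions that the text above does not confirm: that chunks arrive one JSON object per line, and that the text portion lives in a field named "text" (a hypothetical name). Check an actual response and adjust both before relying on it. The stream is simulated with a list of strings so the example runs without a server.

```python
import json
from typing import Iterable, Iterator


def iter_text(lines: Iterable[str], text_field: str = "text") -> Iterator[str]:
    """Yield the text portion of each streamed chunk.

    Assumptions: one JSON object per line, with the text stored under
    `text_field` ("text" is a hypothetical field name).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)  # each chunk is a valid JSON object
        yield chunk.get(text_field, "")


def collect_response(lines: Iterable[str], text_field: str = "text") -> str:
    """Join all text portions into the complete plain-text response."""
    return "".join(iter_text(lines, text_field))


# Simulated stream standing in for the HTTP response body.
simulated_stream = [
    '{"text": "Employees receive "}',
    "",
    '{"text": "20 vacation days."}',
]
print(collect_response(simulated_stream))
```

Because `iter_text` is a generator, a UI can display each portion as soon as it arrives instead of waiting for `collect_response` to finish.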