RAG (Retrieval-Augmented Generation)

Overview

RAG combines the power of retrieval-based systems with generative AI models to provide accurate, context-aware responses. Instead of relying solely on the LLM's training data, RAG fetches relevant information from your document repository before generating responses.

Configuration

Here's how to configure a RAG agent with our enhanced architecture:

{
  "name": "DocumentHelper",
  "description": "Document assistant",
  "agentType": "rag",
  "notes": "Agent for document queries",
  "config": {
    "hxqlQuery": "SELECT * FROM SysContent",
    "limit": 5,
    "systemPrompt": "Context information is below.\n---------------------\n{context_str}\n---------------------\nGiven the context information and not prior knowledge, answer the query.\nQuery: {query_str}\nAnswer: ",
    "llmModelId": "amazon.nova-micro-v1:0",
    "inferenceConfig": {
      "maxTokens": 4000,
      "temperature": 0.7
    },
    "guardrails": ["HAIP-Profanity", "HAIP-Insults-High"]
  }
}

Max Retries

When an agent invocation fails for any reason (e.g., the input is too large, or a network issue occurs), the client you're using to make the request could time out while the service retries. To help with this, you can set maxRetries in the inferenceConfig to a low number (e.g., 2).
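For example, the inferenceConfig from the configuration above could be extended with maxRetries (shown as a fragment of the agent config):

```json
"inferenceConfig": {
  "maxTokens": 4000,
  "temperature": 0.7,
  "maxRetries": 2
}
```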

Best Practices

  1. Document Preparation

    • Ensure documents are properly chunked
    • Maintain consistent formatting
    • Include metadata for filtering
  2. Query Formation

    • Be specific in queries
    • Use natural language
    • Include relevant context
  3. Filter Usage

    • The hxqlQuery parameter follows the HxQL filtering structure
    • Common filters include document type, department, and date ranges
    • Combine with semantic search for better results
tip

The filtering capabilities are provided by the Semantic API. Check its documentation for the complete list of supported filters and the correct filter syntax.
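As an illustration only, an hxqlQuery that combines common filters might look like the fragment below. The field names department and createdDate are hypothetical; consult the Semantic API documentation for the filters actually supported:

```json
"hxqlQuery": "SELECT * FROM SysContent WHERE department = 'HR' AND createdDate > '2024-01-01'"
```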

  4. Guardrails

    • Use the /v1/guardrails endpoint to retrieve the full list of supported guardrails
    • Pass a list of the guardrail names with the payload for the LLM to use
    • Guardrails can be passed at creation time to apply to all agent invocations
    • Additional guardrails can be passed at invocation time to apply to that specific invocation only
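A sketch of passing additional guardrails at invocation time, assuming the invoke payload accepts a top-level guardrails field as described above (the exact payload shape may differ):

```json
{
  "messages": [
    { "role": "user", "content": "Summarize our HR policy." }
  ],
  "guardrails": ["HAIP-Profanity"]
}
```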

Streaming Support

RAG agents support streaming responses through the unified /v1/agents/{agent_id}/versions/{version_id-or-latest}/invoke-stream endpoint:

POST /v1/agents/{agent_id}/versions/{version_id-or-latest}/invoke-stream
{
  "messages": [
    {
      "role": "user",
      "content": "What's in our HR policy about vacation days?"
    }
  ],
  "hxqlQuery": "SELECT * FROM SysContent"
}
tip

When using streaming, responses come in chunks. Each chunk is a valid JSON object containing a portion of the complete response. The concatenation of all chunks forms the complete text response, which is not in JSON format.

tip

Because streaming returns data chunk by chunk, failures that occur mid-stream are difficult to report to the client, so the returned status code might still be a 200 even when an error occurred. Check the service logs for runtime errors.
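Following the tips above, here is a minimal client-side sketch of reassembling a streamed response. Each chunk is assumed to be a standalone JSON object; the field name "content" is an assumption, so inspect a real response to confirm the actual key used by the API:

```python
import json
from typing import Iterable


def assemble_stream(chunks: Iterable[str]) -> str:
    """Concatenate the text carried by each streamed JSON chunk.

    Each element of ``chunks`` is one raw chunk from the stream,
    assumed to be a valid JSON object. The "content" key is a
    placeholder for whatever field the API actually uses.
    """
    parts = []
    for raw in chunks:
        obj = json.loads(raw)
        parts.append(obj.get("content", ""))
    # The concatenation of all chunks is the complete text response,
    # which itself is plain text, not JSON.
    return "".join(parts)


# Example with hand-made chunks mimicking a streamed response:
chunks = ['{"content": "Vacation days "}', '{"content": "accrue monthly."}']
print(assemble_stream(chunks))  # Vacation days accrue monthly.
```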