RAG (Retrieval-Augmented Generation)

Overview

RAG combines the power of retrieval-based systems with generative AI models to provide accurate, context-aware responses. Instead of relying solely on the LLM's training data, RAG fetches relevant information from your document repository before generating responses.

Configuration

Here's how to configure a RAG agent with our enhanced architecture:

{
  "name": "document-helper",
  "displayName": "Document Helper",
  "description": "Document assistant",
  "agentType": "rag",
  "notes": "Agent for document queries",
  "config": {
    "hxqlQuery": "SELECT * FROM SysContent",
    "limit": 5,
    "hybridSearch": true,
    "adjacentChunkRange": 1,
    "adjacentChunkMerge": true,
    "rerankerTopN": 5,
    "enableHallucinationCheck": true,
    "enableMultihopQueryRefinement": false,
    "systemPrompt": "Context information is below.\n---------------------\n{context_str}\n---------------------\nGiven the context information and not prior knowledge, answer the query.\nQuery: {query_str}\nAnswer: ",
    "llmModelId": "amazon.nova-micro-v1:0",
    "inferenceConfig": {
      "maxTokens": 4000,
      "temperature": 0.7
    },
    "guardrails": ["HAIP-Profanity", "HAIP-Insults-High"]
  }
}

Configuration Parameters

| Parameter | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| hxqlQuery | string | HXQL query for Content Lake retrieval filtering | No | null |
| limit | integer (≥ 1) | Maximum number of chunks to retrieve from Content Lake | No | 50 |
| hybridSearch | boolean | Enable/disable hybrid search (embeddings + full-text) | No | true |
| adjacentChunkRange | integer (≥ 0) | Number of adjacent chunks to fetch around each retrieved chunk. For range=N, fetches N chunks before and N chunks after each result (0 = disabled) | No | 0 |
| adjacentChunkMerge | boolean | When true, adjacent chunk text is merged into the parent chunk in document order. When false, adjacent chunks are returned as separate nodes | No | false |
| rerankerTopN | integer (≥ 1) | Number of top results to keep after reranking | No | 13 |
| enableHallucinationCheck | boolean | Enable hallucination detection. When enabled, the agent validates that the generated response is supported by the retrieved chunks and retries if hallucination is detected | No | false |
| enableMultihopQueryRefinement | boolean | Enable multi-hop query refinement. When enabled, the agent refines its query and re-retrieves if the initial chunks are deemed irrelevant | No | false |
| systemPrompt | string | System prompt template. Must include {context_str} and {query_str} placeholders | No | null |
| llmModelId | string | ID of the language model to use | Yes | null |
| inferenceConfig | object | LLM parameters (see Inference Config) | No | See defaults |
| guardrails | array | List of guardrail names to apply | No | null |
tip

Parameters like enableHallucinationCheck, enableMultihopQueryRefinement, adjacentChunkRange, adjacentChunkMerge, rerankerTopN, and hybridSearch can also be overridden per invocation request.
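
For example, a single invocation could tighten retrieval settings without changing the agent's stored configuration. The parameter names below come from the table above; the specific values are illustrative only:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "What's in our HR policy about vacation days?"
    }
  ],
  "hybridSearch": true,
  "rerankerTopN": 3,
  "adjacentChunkRange": 1,
  "adjacentChunkMerge": true,
  "enableHallucinationCheck": true
}
```

Any parameter omitted from the request falls back to the value stored in the agent configuration.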

Invocation Parameters

These optional fields can be passed in the request body when invoking a RAG agent (/invoke or /invoke-stream):

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| hxqlQuery | string | Override the agent-level HXQL query for this invocation | Agent config value |
| hybridSearch | boolean | Enable/disable hybrid search (embeddings + full-text). Uses config default if not specified. | Agent config value |
| guardrails | string[] | Additional guardrails for this invocation | [] |

Max Retries

When an agent invocation errors for any reason (e.g., the input is too large, or a network issue occurs), the client you're using to make the request may time out while retries are attempted. To mitigate this, set maxRetries in the inferenceConfig to a low number (e.g., 2).
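
For example, maxRetries sits alongside the other inferenceConfig fields shown earlier (values here are illustrative):

```json
{
  "inferenceConfig": {
    "maxTokens": 4000,
    "temperature": 0.7,
    "maxRetries": 2
  }
}
```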

Best Practices

  1. Document Preparation

    • Ensure documents are properly chunked
    • Maintain consistent formatting
    • Include metadata for filtering
  2. Query Formation

    • Be specific in queries
    • Use natural language
    • Include relevant context
  3. Filter Usage

    • The hxqlQuery parameter follows the HxQL filtering structure
    • Common filters include document type, department, and date ranges
    • Combine with semantic search for better results
tip

The filtering capabilities are provided by the Semantic API. See its documentation for the complete list of supported filters and the proper filter syntax.

  4. Guardrails

    • Use the /v1/guardrails endpoint to retrieve the full list of supported guardrails
    • Pass a list of guardrail names in the request payload for the LLM to use
    • Guardrails passed at creation time apply to all agent invocations
    • Additional guardrails passed at invocation time apply to that specific invocation only
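
For example, an invocation request could layer one more guardrail on top of those set at creation time. The guardrail name below is taken from the configuration example above; the names available in your environment are whatever /v1/guardrails returns:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "What's our vacation policy?"
    }
  ],
  "guardrails": ["HAIP-Insults-High"]
}
```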

Invoking a RAG Agent

Standard Invocation

POST /v1/agents/{agent_id}/versions/{version_id}/invoke

Example Request:

{
  "messages": [
    {
      "role": "user",
      "content": "What's in our HR policy about vacation days?"
    }
  ],
  "hxqlQuery": "SELECT * FROM SysContent"
}

Example Response:

{
  "object": "response",
  "createdAt": 1741705447,
  "model": "amazon.nova-micro-v1:0",
  "output": [
    {
      "type": "message",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "According to the HR policy, full-time employees receive 15 vacation days per year..."
        }
      ]
    }
  ],
  "customOutputs": {
    "sourceNodes": [
      {
        "docId": "6e4a7f58-13f1-4d3f-83b9-ec86c5b7df60",
        "chunkId": "b9f622c9-7a62-46fc-8cf1-0d1f4d3c35d7",
        "score": 0.95,
        "text": "VACATION POLICY: Full-time employees are entitled to 15 vacation days per calendar year..."
      }
    ],
    "ragMode": "normal"
  }
}

Response Fields

| Field | Description |
| --- | --- |
| output | List of response messages from the agent |
| customOutputs.sourceNodes | Documents retrieved from Content Lake that were used as context |
| customOutputs.sourceNodes[].score | Relevance score (0–1) of the retrieved chunk |
| customOutputs.sourceNodes[].text | Text content of the retrieved chunk |
| customOutputs.ragMode | RAG mode used (normal or deepResearch) |
tip

Use latest as the version_id to invoke the most recent version of the agent.
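
For example:

```http
POST /v1/agents/{agent_id}/versions/latest/invoke
```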

Multi-Turn Conversation

RAG Agents support multi-turn conversations by passing previous messages in the messages array. The last message must always have role: "user".

Session-Based Memory

RAG agents support optional session-based short-term memory. Add an X-Session-ID header with a consistent UUID to maintain conversation context across multiple invocations. The platform does not auto-generate session IDs — you must generate and manage them yourself (any valid UUID).
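
For example, send the same session ID on every request in the conversation (the UUID below is illustrative; generate your own):

```http
POST /v1/agents/{agent_id}/versions/latest/invoke
X-Session-ID: 3f1c9a2e-8b4d-4f6a-9c2e-5d7b1a0e8c3f
```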

{
  "messages": [
    {
      "role": "user",
      "content": "What's our vacation policy?"
    },
    {
      "role": "assistant",
      "content": "Full-time employees receive 15 vacation days per year..."
    },
    {
      "role": "user",
      "content": "How do I request time off?"
    }
  ],
  "hxqlQuery": "SELECT * FROM SysContent"
}

Streaming Support

RAG agents support streaming responses through the unified /v1/agents/{agent_id}/versions/{version_id-or-latest}/invoke-stream endpoint:

POST /v1/agents/{agent_id}/versions/{version_id-or-latest}/invoke-stream

Example Request:

{
  "messages": [
    {
      "role": "user",
      "content": "What's in our HR policy about vacation days?"
    }
  ],
  "hxqlQuery": "SELECT * FROM SysContent",
  "enableHallucinationCheck": true,
  "enableMultihopQueryRefinement": false
}
tip

When streaming, the response arrives in chunks. Each chunk is a valid JSON object containing a portion of the complete response; concatenating all chunks yields the complete text response, which itself is not JSON.

tip

Because streaming returns data chunk by chunk, failures that occur mid-stream are difficult to report to the client, so the returned status code may still be 200 even when an error occurred. Check the logs for runtime errors.

Error Responses

When a request fails, the API returns a JSON error response in the following format:

{
  "status": 400,
  "error": "HTTPException",
  "message": "Description of what went wrong"
}

Common Errors

| Status | Error | Description |
| --- | --- | --- |
| 400 | Bad Request | Invalid agent configuration (e.g., missing hxqlQuery at both config and invocation level) or invalid invocation parameters. |
| 401 | Unauthorized | Missing or expired access token. Re-authenticate to obtain a new token. |
| 403 | Forbidden | Insufficient permissions for the requested operation. Verify your IAM user group permissions. |
| 404 | Not Found | Agent ID or version does not exist. Use GET /v1/agents to verify available agents. |