RAG (Retrieval-Augmented Generation)
Overview
RAG combines the power of retrieval-based systems with generative AI models to provide accurate, context-aware responses. Instead of relying solely on the LLM's training data, RAG fetches relevant information from your document repository before generating responses.
Configuration
Here's how to configure a RAG agent with our enhanced architecture:
{
  "name": "document-helper",
  "displayName": "Document Helper",
  "description": "Document assistant",
  "agentType": "rag",
  "notes": "Agent for document queries",
  "config": {
    "hxqlQuery": "SELECT * FROM SysContent",
    "limit": 5,
    "hybridSearch": true,
    "adjacentChunkRange": 1,
    "adjacentChunkMerge": true,
    "rerankerTopN": 5,
    "enableHallucinationCheck": true,
    "enableMultihopQueryRefinement": false,
    "systemPrompt": "Context information is below.\n---------------------\n{context_str}\n---------------------\nGiven the context information and not prior knowledge, answer the query.\nQuery: {query_str}\nAnswer: ",
    "llmModelId": "amazon.nova-micro-v1:0",
    "inferenceConfig": {
      "maxTokens": 4000,
      "temperature": 0.7
    },
    "guardrails": ["HAIP-Profanity", "HAIP-Insults-High"]
  }
}
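The constraints documented for these parameters (llmModelId is required, the integer parameters have lower bounds, and systemPrompt must contain both placeholders) can be checked client-side before the configuration is submitted. The following is a minimal sketch, not the platform's own validation; the function name validate_rag_config is hypothetical, and it covers only the rules stated in this document.

```python
REQUIRED_PLACEHOLDERS = ("{context_str}", "{query_str}")

# Lower bounds as documented in the Configuration Parameters table.
INTEGER_MINIMUMS = (("limit", 1), ("adjacentChunkRange", 0), ("rerankerTopN", 1))


def validate_rag_config(config: dict) -> list[str]:
    """Return a list of problems found in the "config" object (empty if none).

    Hypothetical client-side helper; the platform may enforce more rules.
    """
    problems = []
    if not config.get("llmModelId"):
        problems.append("llmModelId is required")
    for name, minimum in INTEGER_MINIMUMS:
        value = config.get(name)
        if value is None:
            continue  # optional parameter, the platform default applies
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < minimum:
            problems.append(f"{name} must be an integer >= {minimum}")
    prompt = config.get("systemPrompt")
    if prompt is not None:
        for placeholder in REQUIRED_PLACEHOLDERS:
            if placeholder not in prompt:
                problems.append(f"systemPrompt is missing {placeholder}")
    return problems
```

An empty list means no documented constraint was violated; it does not guarantee that the platform accepts the configuration.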
Configuration Parameters
| Parameter | Type | Description | Required | Default |
|---|---|---|---|---|
| hxqlQuery | string | HXQL query for Content Lake retrieval filtering | No | null |
| limit | integer (≥ 1) | Maximum number of chunks to retrieve from Content Lake | No | 50 |
| hybridSearch | boolean | Enable/disable hybrid search (embeddings + full-text) | No | true |
| adjacentChunkRange | integer (≥ 0) | Number of adjacent chunks to fetch around each retrieved chunk. For range=N, fetches N chunks before and N chunks after each result (0 = disabled) | No | 0 |
| adjacentChunkMerge | boolean | When true, adjacent chunk text is merged into the parent chunk in document order. When false, adjacent chunks are returned as separate nodes | No | false |
| rerankerTopN | integer (≥ 1) | Number of top results to keep after reranking | No | 13 |
| enableHallucinationCheck | boolean | Enable hallucination detection. When enabled, the agent validates that the generated response is supported by the retrieved chunks and retries if hallucination is detected | No | false |
| enableMultihopQueryRefinement | boolean | Enable multi-hop query refinement. When enabled, the agent refines its query and re-retrieves if the initial chunks are deemed irrelevant | No | false |
| systemPrompt | string | System prompt template. Must include {context_str} and {query_str} placeholders | No | null |
| llmModelId | string | ID of the language model to use | Yes | null |
| inferenceConfig | object | LLM parameters (see Inference Config) | No | See defaults |
| guardrails | array | List of guardrail names to apply | No | null |
Parameters like enableHallucinationCheck, enableMultihopQueryRefinement, adjacentChunkRange, adjacentChunkMerge, rerankerTopN, and hybridSearch can also be overridden per invocation request.
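To illustrate what adjacentChunkRange and adjacentChunkMerge describe, here is a small sketch of the windowing logic over an in-memory list of chunk texts. It is an illustration only: the function expand_with_adjacent is hypothetical, the platform's actual implementation is not documented here, and details such as how merged text is joined (a single space below) or how overlapping windows are handled (no de-duplication below) are assumptions.

```python
def expand_with_adjacent(chunks, hit_indexes, chunk_range, merge):
    """Expand each retrieved chunk with its neighbours.

    chunks: chunk texts of one document, in document order.
    hit_indexes: positions in `chunks` of the retrieved chunks.
    chunk_range: N chunks before and N after each hit (0 disables expansion).
    merge: True -> one merged text per hit; False -> neighbours as separate items.
    """
    results = []
    for hit in hit_indexes:
        # Clamp the window to the document boundaries.
        start = max(0, hit - chunk_range)
        end = min(len(chunks), hit + chunk_range + 1)
        window = chunks[start:end]
        if merge:
            # Assumption: merged text is joined with a space, in document order.
            results.append(" ".join(window))
        else:
            results.extend(window)
    return results
```

With chunk_range=1 and merge=True, a hit on the second of four chunks yields a single text built from the first three chunks; with merge=False the same hit yields three separate items.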
Invocation Parameters
These optional fields can be passed in the request body when invoking a RAG agent (/invoke or /invoke-stream):
| Parameter | Type | Description | Default |
|---|---|---|---|
| hxqlQuery | string | Override the agent-level HXQL query for this invocation | Agent config value |
| hybridSearch | boolean | Enable/disable hybrid search (embeddings + full-text). Uses config default if not specified. | Agent config value |
| guardrails | string[] | Additional guardrails for this invocation | [] |
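The table above implies two different behaviours: hxqlQuery and hybridSearch replace the agent-level value when present, while guardrails are added to those already configured. The sketch below models that resolution on plain dictionaries. The function resolve_invocation is hypothetical, and the exact merge semantics for guardrails (here: agent guardrails first, then new ones, without duplicates) are an assumption.

```python
def resolve_invocation(agent_config: dict, request: dict) -> dict:
    """Compute the effective settings for one invocation (illustrative only)."""
    effective = {}
    # Override semantics: a value in the request wins, otherwise fall back
    # to the agent configuration. `in` is used so an explicit False is kept.
    for name in ("hxqlQuery", "hybridSearch"):
        effective[name] = request[name] if name in request else agent_config.get(name)
    # Additive semantics (assumed): invocation guardrails extend the agent's list.
    combined = list(agent_config.get("guardrails") or [])
    for guardrail in request.get("guardrails") or []:
        if guardrail not in combined:
            combined.append(guardrail)
    effective["guardrails"] = combined
    return effective
```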
Max Retries
When an invocation fails for any reason (e.g., the input is too large or a network issue occurs), repeated retries can take long enough that the client making the request times out.
To reduce this risk, set maxRetries in the inferenceConfig to a low number (e.g., 2).
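As a small illustration, the helper below returns a copy of an agent definition with maxRetries set inside config.inferenceConfig, leaving the original untouched. The function name with_max_retries is hypothetical; the placement of inferenceConfig under config follows the configuration example in this document.

```python
import copy


def with_max_retries(agent: dict, max_retries: int = 2) -> dict:
    """Return a deep copy of `agent` with inferenceConfig.maxRetries set."""
    updated = copy.deepcopy(agent)
    # Create the nested objects if the agent definition does not have them yet.
    inference = updated.setdefault("config", {}).setdefault("inferenceConfig", {})
    inference["maxRetries"] = max_retries
    return updated
```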
Best Practices
- Document Preparation
  - Ensure documents are properly chunked
  - Maintain consistent formatting
  - Include metadata for filtering
- Query Formation
  - Be specific in queries
  - Use natural language
  - Include relevant context
- Filter Usage
  - The hxqlQuery parameter follows the HxQL filtering structure
  - Common filters include document type, department, and date ranges
  - Combine with semantic search for better results

  The filtering capabilities are provided by the Semantic API. Check its documentation for the complete list of supported filters and proper filter syntax.
- Guardrails
  - Use the /v1/guardrails endpoint to retrieve the full list of supported guardrails
  - Pass a list of the guardrail names with the payload for the LLM to use
  - Guardrails can be passed at creation time to apply to all agent invocations
  - Additional guardrails can be passed at invocation time to apply to that specific invocation only
Invoking a RAG Agent
Standard Invocation
POST /v1/agents/{agent_id}/versions/{version_id}/invoke
Example Request:
{
  "messages": [
    {
      "role": "user",
      "content": "What's in our HR policy about vacation days?"
    }
  ],
  "hxqlQuery": "SELECT * FROM SysContent"
}
Example Response:
{
  "object": "response",
  "createdAt": 1741705447,
  "model": "amazon.nova-micro-v1:0",
  "output": [
    {
      "type": "message",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "According to the HR policy, full-time employees receive 15 vacation days per year..."
        }
      ]
    }
  ],
  "customOutputs": {
    "sourceNodes": [
      {
        "docId": "6e4a7f58-13f1-4d3f-83b9-ec86c5b7df60",
        "chunkId": "b9f622c9-7a62-46fc-8cf1-0d1f4d3c35d7",
        "score": 0.95,
        "text": "VACATION POLICY: Full-time employees are entitled to 15 vacation days per calendar year..."
      }
    ],
    "ragMode": "normal"
  }
}
Response Fields
| Field | Description |
|---|---|
| output | List of response messages from the agent |
| customOutputs.sourceNodes | Documents retrieved from Content Lake that were used as context |
| customOutputs.sourceNodes[].score | Relevance score (0–1) of the retrieved chunk |
| customOutputs.sourceNodes[].text | Text content of the retrieved chunk |
| customOutputs.ragMode | RAG mode used (normal or deepResearch) |
Use latest as the version_id to invoke the most recent version of the agent.
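A client will typically want the answer text and the supporting sources out of a response shaped like the example above. The sketch below does that on an already-parsed response dictionary. The function summarize_response is hypothetical; it relies only on the field names shown in the example response, and sorting sources by score is a presentation choice, not documented platform behaviour.

```python
def summarize_response(response: dict) -> dict:
    """Extract the answer text, source references, and RAG mode."""
    texts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                texts.append(part.get("text", ""))
    custom = response.get("customOutputs") or {}
    # Highest-scoring chunks first, so the strongest evidence is listed on top.
    sources = sorted(
        custom.get("sourceNodes") or [],
        key=lambda node: node.get("score", 0.0),
        reverse=True,
    )
    return {
        "answer": "".join(texts),
        "sources": [(node.get("docId"), node.get("score")) for node in sources],
        "ragMode": custom.get("ragMode"),
    }
```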
Multi-Turn Conversation
RAG agents support multi-turn conversations: pass the previous messages in the messages array. The last message must always have role: "user".
RAG agents also support optional session-based short-term memory. Add an X-Session-ID header with a consistent UUID to maintain conversation context across multiple invocations. The platform does not auto-generate session IDs; you must generate and manage them yourself (any valid UUID).
{
  "messages": [
    {
      "role": "user",
      "content": "What's our vacation policy?"
    },
    {
      "role": "assistant",
      "content": "Full-time employees receive 15 vacation days per year..."
    },
    {
      "role": "user",
      "content": "How do I request time off?"
    }
  ],
  "hxqlQuery": "SELECT * FROM SysContent"
}
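The two multi-turn rules (the last message has role "user", and the session ID is a client-generated UUID sent as X-Session-ID) can be captured in a small request builder. This is a sketch under assumptions: build_turn is a hypothetical helper, the Content-Type header is a conventional choice not stated in this document, and sending the request is left to whichever HTTP client you use.

```python
import uuid


def build_turn(history, user_message, session_id=None):
    """Return (headers, body) for the next turn of a conversation.

    history: earlier messages as {"role": ..., "content": ...} dictionaries.
    session_id: optional UUID string, generated and managed by the caller.
    """
    # Appending the new user message last keeps the final role as "user".
    messages = list(history) + [{"role": "user", "content": user_message}]
    headers = {"Content-Type": "application/json"}
    if session_id is not None:
        # uuid.UUID raises ValueError if the value is not a valid UUID.
        headers["X-Session-ID"] = str(uuid.UUID(str(session_id)))
    return headers, {"messages": messages}
```

Generate the session ID once, for example with str(uuid.uuid4()), and reuse it for every invocation that belongs to the same conversation.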
Streaming Support
RAG agents support streaming responses through the unified /v1/agents/{agent_id}/versions/{version_id-or-latest}/invoke-stream endpoint:
POST /v1/agents/{agent_id}/versions/{version_id-or-latest}/invoke-stream
Example Request:
{
  "messages": [
    {
      "role": "user",
      "content": "What's in our HR policy about vacation days?"
    }
  ],
  "hxqlQuery": "SELECT * FROM SysContent",
  "enableHallucinationCheck": true,
  "enableMultihopQueryRefinement": false
}
When using streaming, the response arrives in chunks. Each chunk is a valid JSON object containing a portion of the complete response. Concatenating all chunks yields the complete text response, which is not itself in JSON format.
Because streaming returns data chunk by chunk, the HTTP status code is sent before generation finishes, so failures that occur at runtime are difficult to report to the client. The returned status code may therefore be 200 even when an error occurred. Check the logs for such errors.
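The following sketch shows one way a client might reassemble the text from streamed chunks. The chunk schema is not specified in this document, so two things here are assumptions and must be adapted to the real stream: that chunks arrive as one JSON object per line, and that the text portion lives under a key named "text" (configurable through the hypothetical text_key parameter). The function assemble_stream is likewise hypothetical.

```python
import json


def assemble_stream(lines, text_key="text"):
    """Join the text portions of streamed JSON chunks into one string.

    lines: iterable of strings, each assumed to hold one JSON object.
    text_key: ASSUMED name of the field carrying the text portion.
    """
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip keep-alive or blank lines
        chunk = json.loads(line)
        piece = chunk.get(text_key)
        if isinstance(piece, str):
            parts.append(piece)
    return "".join(parts)
```

Because a mid-stream failure may still arrive with status 200, a client should not treat a successfully assembled string as proof that generation completed.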
Error Responses
When a request fails, the API returns a JSON error response in the following format:
{
  "status": 400,
  "error": "HTTPException",
  "message": "Description of what went wrong"
}
Common Errors
| Status | Error | Description |
|---|---|---|
| 400 | Bad Request | Invalid agent configuration (e.g., missing hxqlQuery at both config and invocation level) or invalid invocation parameters. |
| 401 | Unauthorized | Missing or expired access token. Re-authenticate to obtain a new token. |
| 403 | Forbidden | Insufficient permissions for the requested operation. Verify your IAM user group permissions. |
| 404 | Not Found | Agent ID or version does not exist. Use GET /v1/agents to verify available agents. |
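A client can turn the error body into an actionable message by pairing the status with the remediation listed in the table above. The sketch below assumes the error body has already been parsed into a dictionary with the status, error, and message fields shown earlier; the function explain_error and the fallback wording are hypothetical.

```python
# Remediation hints taken from the Common Errors table.
HINTS = {
    400: "Check the agent configuration and invocation parameters.",
    401: "Re-authenticate to obtain a new access token.",
    403: "Verify your IAM user group permissions.",
    404: "Use GET /v1/agents to verify available agents.",
}


def explain_error(body: dict) -> str:
    """Format an error response body together with a remediation hint."""
    status = body.get("status")
    hint = HINTS.get(status, "No documented guidance for this status.")
    return f"{status} {body.get('error', 'Error')}: {body.get('message', '')} ({hint})"
```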