Tool Agent
What is an Agent?
An agent is an AI system that can perceive its environment, make decisions, and take actions to achieve specific goals. In our context, agents are specialized LLM-powered systems that can:
- Understand natural language inputs
- Plan sequences of actions
- Use tools to accomplish tasks
- Maintain context through conversations
- Generate coherent responses
Tool Agent Architecture
Tool Agents use a ReAct (Reasoning + Acting) pattern combined with function calling capabilities from LLM providers. This enables them to:
- Reason about what tools to use
- Execute tools with appropriate parameters
- Observe results
- Plan next steps
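The reason, act, observe, plan cycle listed above can be sketched in a few lines of Python. This is only an illustration of the pattern, not the platform's implementation: `fake_model`, the `TOOLS` registry, and the shape of the step dictionaries are hypothetical stand-ins for an LLM with function calling.

```python
from typing import Callable, Dict, List

# Hypothetical tool registry: tool name -> Python callable.
TOOLS: Dict[str, Callable[..., float]] = {
    "multiply": lambda a, b: a * b,
}


def fake_model(history: List[dict]) -> dict:
    """Stand-in for the LLM: request a tool once, then give a final answer."""
    observations = [m for m in history if m["role"] == "tool"]
    if not observations:
        # Reason: no tool result yet, so ask for the multiply tool.
        return {"type": "tool_call", "name": "multiply", "args": {"a": 5, "b": 20}}
    # Plan next step: a result is available, so answer.
    return {"type": "final", "text": f"The result is {observations[-1]['content']}."}


def run_agent(question: str, max_steps: int = 5) -> str:
    history: List[dict] = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        step = fake_model(history)                            # reason
        if step["type"] == "final":
            return step["text"]
        result = TOOLS[step["name"]](**step["args"])          # act
        history.append({"role": "tool", "content": result})   # observe
    raise RuntimeError("agent did not finish within max_steps")


print(run_agent("What is 5 times 20?"))
```

The `max_steps` cap is a design choice in this sketch: it keeps a misbehaving model from looping forever.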
Agent Workflow
Configuration Parameters
Essential Parameters
| Parameter | Description | Required | Example |
|---|---|---|---|
| llmModelId | Bedrock model identifier | Yes | anthropic.claude-3-haiku-20240307-v1:0 |
| systemPrompt | Instructions for the agent | Yes | "You are a helpful assistant..." |
| tools | Array of available tools | Yes | See Tool Configuration below |
| inferenceConfig | LLM parameters | No | See Inference Config below |
| guardrails | List of guardrail names | No | ["HAIP-Profanity", "HAIP-Insults-High"] |
Inference Config
| Parameter | Description | Default |
|---|---|---|
| maxTokens | Maximum number of tokens to generate | 4000 |
| temperature | Temperature for inference (0.0 to 1.0) | 0.0 |
| maxRetries | Maximum number of retries on failure | 10 |
| timeout | Timeout for inference in seconds | 3600 |
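As a rough sketch of how the defaults in this table combine with a partial inferenceConfig, the Python below merges caller overrides onto the documented defaults. The function name and the choice to reject unknown keys and negative values are assumptions for illustration; the service may validate differently.

```python
from typing import Optional

# Defaults taken from the Inference Config table.
DEFAULT_INFERENCE_CONFIG = {
    "maxTokens": 4000,
    "temperature": 0.0,
    "maxRetries": 10,
    "timeout": 3600,
}


def resolve_inference_config(overrides: Optional[dict] = None) -> dict:
    """Merge caller overrides onto the documented defaults and sanity-check them."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_INFERENCE_CONFIG)
    if unknown:
        # Assumption: unknown keys are treated as mistakes in this sketch.
        raise ValueError(f"unknown inferenceConfig keys: {sorted(unknown)}")
    config = {**DEFAULT_INFERENCE_CONFIG, **overrides}
    if not 0.0 <= config["temperature"] <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    for key in ("maxTokens", "maxRetries", "timeout"):
        if config[key] < 0:
            raise ValueError(f"{key} must not be negative")
    return config


print(resolve_inference_config({"temperature": 0.7, "maxRetries": 2}))
```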
Tool Configuration
Each tool requires a toolType; the additional fields depend on the type:
| Tool Type | Description | Additional Fields |
|---|---|---|
| function | Call a predefined function | funcName (required) |
| structured_output | Force JSON output matching a schema | outputSchema (required), modelId (optional, the model used for LLM-assisted schema parsing) |
| task_agent | Invoke a Task Agent as a tool | agentId (required), agentVersion (optional, default "latest") |
All tool types accept optional name and description fields.
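The per-type requirements in the table can be checked before a configuration is submitted. The sketch below is a hypothetical client-side helper (`validate_tool` is not part of the API) that enforces the required fields and fills in the documented default for agentVersion.

```python
# Required additional fields per toolType, as listed in the table above.
REQUIRED_FIELDS = {
    "function": ["funcName"],
    "structured_output": ["outputSchema"],
    "task_agent": ["agentId"],
}


def validate_tool(tool: dict) -> dict:
    """Return a copy of the tool entry, or raise ValueError if it is incomplete."""
    tool_type = tool.get("toolType")
    if tool_type not in REQUIRED_FIELDS:
        raise ValueError(f"invalid toolType: {tool_type!r}")
    missing = [field for field in REQUIRED_FIELDS[tool_type] if field not in tool]
    if missing:
        raise ValueError(f"{tool_type} tool is missing: {', '.join(missing)}")
    normalized = dict(tool)
    if tool_type == "task_agent":
        # Documented default when agentVersion is omitted.
        normalized.setdefault("agentVersion", "latest")
    return normalized


print(validate_tool({"toolType": "task_agent", "agentId": "abc123"}))
```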
Max Retries
When an agent invocation fails for any reason (e.g., the input is too large or there is a network issue), the request is retried up to maxRetries times (default 10).
The client making the request may time out before those retries finish.
To reduce this risk, set maxRetries in inferenceConfig to a low number (e.g., 2).
Example Configurations
Basic Example
{
"name": "Multiplier",
"description": "An agent that multiplies two numbers",
"agentType": "tool",
"config": {
"llmModelId": "anthropic.claude-3-haiku-20240307-v1:0",
"systemPrompt": "You are a helpful assistant that uses available tools to answer questions.",
"tools": [
{
"toolType": "function",
"name": "multiply",
"description": "Multiplies two numbers",
"funcName": "multiply"
}
],
"inferenceConfig": {
"maxTokens": 4000,
"temperature": 0.7
},
"guardrails": ["HAIP-Insults-Low"]
}
}
This basic configuration enables the agent to perform multiplication operations. More complex configurations can be created by adding additional tools and customizing parameters.
Structured Output Tools
Tool Agents can be configured to output structured data in JSON format using the structured_output tool type. This is useful when you need to extract specific information in a consistent format or transform unstructured text into structured data.
Configuration Examples
- Person Information
{
"name": "Info Extractor",
"description": "Extracts information about a person",
"agentType": "tool",
"config": {
"llmModelId": "anthropic.claude-3-haiku-20240307-v1:0",
"systemPrompt": "You are a helpful assistant that extracts structured information.",
"tools": [
{
"toolType": "structured_output",
"name": "structured_output",
"description": "Extracts structured information about a person from text",
"outputSchema": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Full name of the person"
},
"age": {
"type": "integer",
"description": "Age of the person"
},
"occupation": {
"type": "string",
"description": "Person's job or profession"
}
},
"required": ["name"]
}
}
]
}
}
Sample Output
Input: "John Doe is a 35-year-old software engineer"
{
"name": "John Doe",
"age": 35,
"occupation": "software engineer"
}
- Product Information
{
"name": "Product Info Extractor",
"description": "Extracts information about a product",
"agentType": "tool",
"config": {
"llmModelId": "anthropic.claude-3-haiku-20240307-v1:0",
"systemPrompt": "You are a helpful assistant that extracts product information.",
"tools": [
{
"toolType": "structured_output",
"name": "structured_output",
"description": "Extracts product information from text",
"outputSchema": {
"type": "object",
"properties": {
"productName": { "type": "string" },
"price": { "type": "number" },
"currency": { "type": "string" },
"inStock": { "type": "boolean" }
},
"required": ["productName", "price"]
}
}
]
}
}
- Event Detail with nested schemas
{
"name": "Event Detailer",
"description": "Extracts information about an event",
"agentType": "tool",
"config": {
"llmModelId": "anthropic.claude-3-haiku-20240307-v1:0",
"systemPrompt": "You are a helpful assistant that extracts event information.",
"tools": [
{
"toolType": "structured_output",
"name": "structured_output",
"description": "Extracts event information from text",
"outputSchema": {
"type": "object",
"properties": {
"eventName": { "type": "string" },
"date": { "type": "string", "format": "date" },
"location": { "type": "string" },
"attendees": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["eventName", "date"]
}
}
]
}
}
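On the client side, it can be useful to confirm that a structured output actually matches the outputSchema before using it. The sketch below is a deliberately small checker that uses only the standard library; it covers required keys and top-level property types and does not recurse into nested objects or arrays. The `check_output` helper is hypothetical; a full JSON Schema validator would be the more complete option.

```python
import json

# JSON Schema type names mapped to Python types.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_output(raw: str, schema: dict) -> dict:
    """Parse agent output and check required keys and top-level property types."""
    data = json.loads(raw)
    for key in schema.get("required", []):
        if key not in data:
            raise ValueError(f"missing required field: {key}")
    for key, value in data.items():
        expected = schema.get("properties", {}).get(key, {}).get("type")
        if expected is None:
            continue
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected in ("integer", "number") and isinstance(value, bool):
            raise ValueError(f"{key} should be {expected}")
        if not isinstance(value, JSON_TYPES[expected]):
            raise ValueError(f"{key} should be {expected}")
    return data


person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "occupation": {"type": "string"},
    },
    "required": ["name"],
}
print(check_output('{"name": "John Doe", "age": 35, "occupation": "software engineer"}', person_schema))
```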
Working with Images
Tool Agents support processing images alongside text inputs. Images are provided within the content array. Each image is represented by an item with type: "image", containing a source object that specifies the origin (URL, S3 path, or base64), mediaType, and the actual image data or reference.
Images can be provided in three primary ways:
- URL-based images: Provide a direct HTTPS URL to the image. The mediaType field indicates the image format.
{
"content": [
{
"type": "text",
"text": "What's in this image?"
},
{
"type": "image",
"source": {
"type": "url",
"url": "https://example.com/image.jpg",
"mediaType": "image/jpeg" // Supported: image/jpeg, image/png, image/tiff, etc.
}
}
],
"role": "user"
}
Note: For TIFF images (mediaType: "image/tiff"), the URL must end with a .tiff or .tif extension.
- S3 path images: Reference an image stored in an S3 bucket. Specify the S3 path and the mediaType.
{
"content": [
{
"type": "text",
"text": "Analyze this image from S3"
},
{
"type": "image",
"source": {
"type": "s3_path",
"path": "s3://bucket-name/path/to/image.png",
"mediaType": "image/png" // Supported: image/jpeg, image/png, image/tiff, etc.
}
}
],
"role": "user"
}
Note: For TIFF images (mediaType: "image/tiff"), the S3 path must point to a file with a .tiff or .tif extension.
- Base64-encoded images: Embed the image data directly as a base64-encoded string. The mediaType is crucial here.
{
"content": [
{
"type": "text",
"text": "Describe this embedded image"
},
{
"type": "image",
"source": {
"type": "base64",
"data": "<base64-encoded-image-data>",
"mediaType": "image/png" // Supported: image/jpeg, image/png, image/tiff, etc.
}
}
],
"role": "user"
}
Note: The mediaType field is mandatory and crucial for correctly interpreting the base64 data. For image/tiff, the image data itself is validated (e.g., for correct TIFF format based on its magic number).
- Specify mediaType: Always include the correct mediaType (e.g., image/jpeg, image/png, image/tiff) in the source object. This is essential for proper processing.
- Supported Formats: While many standard image formats can be indicated via mediaType, ensure your specific use case is tested. JPG, PNG, and TIFF are explicitly supported and validated.
- TIFF Specifics:
  - For URL or S3 sources of TIFF images, ensure the file path ends with .tiff or .tif.
  - For base64-encoded TIFF images, the data integrity (magic number) is checked.
- Image Source: Clearly define the image source with its type (url, s3_path, base64) and the corresponding data field (url, path, data).
- Size and Prompts: Keep image sizes reasonable (refer to any documented limits) to avoid timeouts and include clear text prompts to guide the agent's analysis.
- Costs: Be mindful of potential costs and rate limits when processing multiple or large images.
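The three source types and the TIFF rules above can be captured in a pair of small helpers. This is a client-side sketch under stated assumptions: the function names are hypothetical, the service performs its own validation, and the magic-number check here only looks at the first four bytes, which are the standard little-endian and big-endian TIFF headers.

```python
import base64

# Standard TIFF headers: little-endian ("II") and big-endian ("MM").
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")


def image_item_from_bytes(data: bytes, media_type: str) -> dict:
    """Build a base64 image item, checking the TIFF magic number when relevant."""
    if media_type == "image/tiff" and not data.startswith(TIFF_MAGIC):
        raise ValueError("data does not start with a TIFF magic number")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "data": base64.b64encode(data).decode("ascii"),
            "mediaType": media_type,
        },
    }


def image_item_from_location(location: str, media_type: str) -> dict:
    """Build a url or s3_path image item from an https:// URL or s3:// path."""
    if media_type == "image/tiff" and not location.endswith((".tiff", ".tif")):
        raise ValueError("TIFF sources must end with .tiff or .tif")
    if location.startswith("s3://"):
        source = {"type": "s3_path", "path": location, "mediaType": media_type}
    elif location.startswith("https://"):
        source = {"type": "url", "url": location, "mediaType": media_type}
    else:
        raise ValueError("expected an https:// URL or s3:// path")
    return {"type": "image", "source": source}


message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        image_item_from_location("https://example.com/image.jpg", "image/jpeg"),
    ],
}
print(message)
```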
Working with Documents
Tool Agents can process various types of documents as part of their input. This allows you to provide rich context to the agent for tasks like summarization, question answering based on specific texts, or data extraction. Documents are included in the content array of a message, similar to images. Each document is represented by an item with type: "document", containing a source object that specifies the origin (URL, S3 path, or base64), mediaType, and the actual document data or reference. The name field is required (1-200 chars) and helps the agent identify specific documents.
Supported File Types: PDF, CSV, XLS, XLSX, HTML, TXT, MD. The mediaType field in the source object should correspond to the actual format of the document (e.g., application/pdf for PDF, text/csv for CSV).
Maximum File Size: 4.5 MB
Document Limit: Up to 5 documents can be included per message.
Documents can be provided in three primary ways:
- URL-based documents: Provide a direct HTTPS URL to the document. The mediaType field indicates the document format.
{
"content": [
{
"type": "text",
"text": "Summarize the attached report."
},
{
"type": "document",
"source": {
"type": "url",
"url": "https://example.com/annual_report.pdf",
"mediaType": "application/pdf"
},
"name": "Annual Report 2023"
}
],
"role": "user"
}
- Base64-encoded documents: Embed the document data directly as a base64-encoded string. The mediaType is crucial here.
{
"content": [
{
"type": "text",
"text": "What are the main points in this text file?"
},
{
"type": "document",
"source": {
"type": "base64",
"data": "<base64-encoded-document-content>",
"mediaType": "application/pdf"
},
"name": "Meeting Notes"
}
],
"role": "user"
}
Note: The mediaType field is mandatory and crucial for correctly interpreting the base64 data.
- S3 path documents: Reference a document stored in an S3 bucket. Specify the S3 path and the mediaType.
{
"content": [
{
"type": "text",
"text": "Analyze this document from S3"
},
{
"type": "document",
"source": {
"type": "s3_path",
"path": "s3://bucket-name/path/to/document.xlsx",
"mediaType": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
},
"name": "Sales Data Q1"
}
],
"role": "user"
}
Note: The mediaType field is mandatory and crucial for correctly interpreting the document.
The name field is required and helps the agent identify and refer to specific documents, especially when multiple documents are provided. Allowed characters: alphanumeric, single spaces, hyphens, parentheses, and square brackets.
- Use name: Provide a descriptive name for each document (required, 1-200 chars). Use a neutral name to avoid prompt injection risks.
- Specify mediaType: Always include the correct mediaType (e.g., application/pdf, text/csv, text/plain) in the source object. This is essential for proper processing and should correspond to one of the Supported File Types.
- Respect Limits: Do not exceed the 5-document limit per message or the 4.5 MB file size limit.
- Choose the Right Source Type:
  - Use source.type: "url" with source.url for publicly accessible documents.
  - Use source.type: "s3_path" with source.path for documents in S3 (once feature is available).
  - Use source.type: "base64" with source.data for smaller files or when direct content embedding is necessary. Ensure proper base64 encoding.
- Clear Prompts: Accompany documents with clear text prompts instructing the agent on what to do with them.
- Supported Formats: Ensure your documents are in one of the supported file formats and the mediaType accurately reflects this.
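The document rules above (name format, 5-document limit, 4.5 MB size limit) lend themselves to a pre-flight check. The sketch below is a hypothetical client-side helper, and several details are assumptions: "MB" is taken to mean 1024 * 1024 bytes, the size is measured on the decoded base64 bytes, and names may not start or end with a space. The service's own validation is authoritative.

```python
import base64
import re

MAX_DOCUMENTS = 5
MAX_BYTES = int(4.5 * 1024 * 1024)  # assumption: "MB" means 1024 * 1024 bytes
# Alphanumerics, hyphens, parentheses, square brackets, separated by single spaces.
NAME_RE = re.compile(r"[A-Za-z0-9()\[\]-]+(?: [A-Za-z0-9()\[\]-]+)*")


def check_document_name(name: str) -> None:
    if not 1 <= len(name) <= 200:
        raise ValueError("name must be 1-200 characters")
    if not NAME_RE.fullmatch(name):
        raise ValueError("name contains disallowed characters or repeated spaces")


def check_message(message: dict) -> None:
    """Raise ValueError if the message breaks a documented document rule."""
    docs = [item for item in message["content"] if item.get("type") == "document"]
    if len(docs) > MAX_DOCUMENTS:
        raise ValueError(f"at most {MAX_DOCUMENTS} documents per message")
    for doc in docs:
        check_document_name(doc.get("name", ""))
        source = doc["source"]
        if "mediaType" not in source:
            raise ValueError("source.mediaType is required")
        if source["type"] == "base64":
            size = len(base64.b64decode(source["data"]))
            if size > MAX_BYTES:
                raise ValueError("document exceeds the 4.5 MB limit")


check_message({
    "role": "user",
    "content": [
        {"type": "text", "text": "Summarize the attached report."},
        {
            "type": "document",
            "source": {
                "type": "url",
                "url": "https://example.com/annual_report.pdf",
                "mediaType": "application/pdf",
            },
            "name": "Annual Report 2023",
        },
    ],
})
print("message passed the checks")
```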
Best Practices
- System Prompts
  - Be specific about the agent's role
  - Define clear boundaries
  - Include error handling instructions
- Tool Selection
  - Choose tools that complement each other
  - Provide clear tool descriptions
  - Limit tools to necessary ones only
- Error Handling
  - Include retry logic
  - Provide fallback options
  - Handle tool failures gracefully
Best Practices for Structured Output
- Schema Design
  - Keep schemas focused and specific
  - Use clear property names
  - Include descriptions for complex fields
  - Mark required fields appropriately
- Data Validation
  - Use appropriate data types
  - Include format specifications when needed
  - Consider adding enum values for restricted choices
- Error Handling
  - Provide fallback values for optional fields
  - Handle missing or invalid data gracefully
  - Include validation messages in the schema
Invoking a Tool Agent
Standard Invocation
POST /v1/agents/{agent_id}/versions/{version_id}/invoke
Example Request:
{
"messages": [
{
"role": "user",
"content": "What is 5 times 20?"
}
]
}
Example Response:
{
"object": "response",
"createdAt": 1741705500,
"model": "anthropic.claude-3-haiku-20240307-v1:0",
"output": [
{
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{
"type": "output_text",
"text": "The result of 5 times 20 is 100."
}
]
}
]
}
Use latest as the version_id to invoke the most recent version of the agent.
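A request like the one above could be assembled in Python roughly as follows. This sketch only builds the request and extracts text from a response shaped like the example; it does not send anything. The host in `BASE_URL` and the Bearer token header are assumptions, since this page does not specify the base URL or authentication scheme.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host


def build_invoke_request(agent_id: str, messages: list, token: str,
                         version_id: str = "latest") -> urllib.request.Request:
    """Build (but do not send) a POST request to the invoke endpoint."""
    url = f"{BASE_URL}/v1/agents/{agent_id}/versions/{version_id}/invoke"
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


def output_text(response: dict) -> str:
    """Join every output_text part from a response shaped like the example above."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)


request = build_invoke_request(
    "my-agent-id",
    [{"role": "user", "content": "What is 5 times 20?"}],
    token="<access-token>",
)
print(request.get_method(), request.full_url)
```

Sending is left to the caller, for example with `urllib.request.urlopen(request, timeout=30)`, where the client timeout should be chosen with the agent's maxRetries and timeout settings in mind.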
Multi-Turn Conversation
Tool Agents support multi-turn conversations by passing previous messages in the messages array. The last message must always have role: "user".
{
"messages": [
{
"role": "user",
"content": "What is 5 times 20?"
},
{
"role": "assistant",
"content": "The result of 5 times 20 is 100."
},
{
"role": "user",
"content": "Now divide that by 4."
}
]
}
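Because the full history is resent on every turn, a small client-side helper can keep the messages array in order and enforce the rule that the last message has role "user". The `Conversation` class below is a hypothetical convenience wrapper, not part of the API.

```python
class Conversation:
    """Keeps the running messages list for a multi-turn exchange."""

    def __init__(self) -> None:
        self.messages: list = []

    def add_user(self, text: str) -> None:
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        self.messages.append({"role": "assistant", "content": text})

    def payload(self) -> dict:
        """Return the request body, refusing to build one that ends on a non-user turn."""
        if not self.messages or self.messages[-1]["role"] != "user":
            raise ValueError('the last message must have role "user"')
        return {"messages": list(self.messages)}


chat = Conversation()
chat.add_user("What is 5 times 20?")
chat.add_assistant("The result of 5 times 20 is 100.")
chat.add_user("Now divide that by 4.")
print(chat.payload())
```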
Invocation with Guardrails
Additional guardrails can be passed at invocation time to apply to that specific request only:
{
"messages": [
{
"role": "user",
"content": "What is 5 times 20?"
}
],
"guardrails": ["HAIP-Hate-High", "HAIP-Insults-Low"]
}
Limitations
- Tool execution is bounded by a single timeout, configurable via inferenceConfig.timeout (default 3600s)
Streaming Support
Tool agents support streaming responses through the unified /v1/agents/{agent_id}/versions/{version_id-or-latest}/invoke-stream endpoint:
Example Request:
{
"messages": [
{
"role": "user",
"content": "Explain how multiplication works."
}
]
}
When streaming, the response arrives in chunks. Each chunk is a valid JSON object containing a portion of the complete response. Concatenating the contents of all chunks yields the complete text response, which is itself not JSON.
Because streaming returns data chunk by chunk, failures that occur mid-stream are difficult to report to the client. The returned status code may therefore be 200 even when a failure occurred. The service makes a best effort to include the error in the response, but silent failures are still possible, so check the logs for errors.
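A client consuming the stream might assemble the chunks and watch for in-band errors along these lines. The chunk layout is not specified on this page, so this sketch assumes one JSON object per line with a "text" field, plus an "error" field when something fails; both field names and the line-delimited framing are hypothetical.

```python
import json
from typing import Iterable


class StreamError(RuntimeError):
    """Raised when a chunk reports a failure, even if the HTTP status was 200."""


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate the text of JSON chunks, raising if a chunk carries an error."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        chunk = json.loads(line)
        if "error" in chunk:  # hypothetical error field
            raise StreamError(str(chunk["error"]))
        parts.append(chunk.get("text", ""))  # hypothetical text field
    return "".join(parts)


sample_chunks = ['{"text": "Multiplication is "}', "", '{"text": "repeated addition."}']
print(collect_stream(sample_chunks))
```

Since a stream can also end silently, a caller may want to treat an unexpectedly empty or truncated result as a reason to check the logs.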
Error Responses
When a request fails, the API returns a JSON error response in the following format:
{
"status": 400,
"error": "HTTPException",
"message": "Description of what went wrong"
}
Common Errors
| Status | Error | Description |
|---|---|---|
| 400 | Bad Request | Invalid agent configuration (e.g., missing tools, invalid toolType, malformed outputSchema), missing required fields, or invalid invocation parameters. |
| 401 | Unauthorized | Missing or expired access token. Re-authenticate to obtain a new token. |
| 403 | Forbidden | Insufficient permissions for the requested operation. Verify your IAM user group permissions. |
| 404 | Not Found | Agent ID or version does not exist. Use GET /v1/agents to verify available agents. |
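On the client side, the error format and the table above can be turned into a typed exception with a suggested next step. The class and function names in this sketch are hypothetical; the hints simply restate the table.

```python
import json


class AgentApiError(Exception):
    """Error response from the API, carrying the documented fields."""

    def __init__(self, status: int, error: str, message: str) -> None:
        super().__init__(f"{status} {error}: {message}")
        self.status = status
        self.error = error
        self.message = message


# Suggested next steps, restating the Common Errors table.
HINTS = {
    400: "check the agent configuration and request body",
    401: "re-authenticate to obtain a new token",
    403: "verify your IAM user group permissions",
    404: "use GET /v1/agents to verify available agents",
}


def parse_error(body: str) -> AgentApiError:
    """Parse a JSON error body in the documented format."""
    payload = json.loads(body)
    return AgentApiError(payload["status"], payload["error"], payload["message"])


def hint_for(err: AgentApiError) -> str:
    return HINTS.get(err.status, "check the service logs")


err = parse_error('{"status": 400, "error": "HTTPException", "message": "Description of what went wrong"}')
print(err, "->", hint_for(err))
```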
Future Enhancements
- Additional tool types planned
- Enhanced memory capabilities
- Dynamic tool loading