Tool Agent
What is an Agent?
An agent is an AI system that can perceive its environment, make decisions, and take actions to achieve specific goals. In our context, agents are specialized LLM-powered systems that can:
- Understand natural language inputs
- Plan sequences of actions
- Use tools to accomplish tasks
- Maintain context through conversations
- Generate coherent responses
Tool Agent Architecture
Tool Agents use a ReAct (Reasoning + Acting) pattern combined with function calling capabilities from LLM providers. This enables them to:
- Reason about which tools to use
- Execute tools with appropriate parameters
- Observe results
- Plan next steps
Agent Workflow
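The reason, act, observe cycle can be illustrated with a minimal sketch. It assumes a hypothetical fake_llm function standing in for a real function-calling model and an illustrative TOOLS registry; neither name is part of the Tool Agent API, and the actual implementation may differ.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> Python callable.
TOOLS: dict[str, Callable[..., object]] = {
    "multiply": lambda a, b: a * b,
}

def fake_llm(messages: list[dict]) -> dict:
    """Stand-in for a function-calling LLM (an assumption, not a provider API).

    It requests the multiply tool once, then answers with the observed result.
    """
    last = messages[-1]
    if last["role"] == "tool":
        return {"type": "final", "text": f"The answer is {last['content']}."}
    return {"type": "tool_call", "name": "multiply", "args": {"a": 6, "b": 7}}

def run_agent(question: str, llm=fake_llm, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        decision = llm(messages)                  # reason: pick a tool or answer
        if decision["type"] == "final":
            return decision["text"]
        tool = TOOLS[decision["name"]]
        result = tool(**decision["args"])         # act: execute the tool
        messages.append({"role": "tool", "content": str(result)})  # observe
    return "Stopped: step limit reached."

answer = run_agent("What is 6 times 7?")
```

The step limit is a design choice in this sketch: it keeps a model that never produces a final answer from looping forever.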
Configuration Parameters
Essential Parameters
| Parameter | Description | Required | Example |
|---|---|---|---|
| llm_model_id | Bedrock model identifier | Yes | anthropic.claude-3-haiku-20240307-v1:0 |
| system_prompt | Instructions for the agent | Yes | "You are a helpful assistant..." |
| tools | Array of available tools | Yes | See Tool Configuration |
| inference_config | LLM parameters | No | {"max_tokens": 4000} |
Tool Configuration
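Each tool requires the following fields. The tool_type value shown lists the alternatives separated by |; use exactly one of function, rag, or mcp: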
{
"tool_type": "function|rag|mcp",
"name": "tool_name",
"description": "What the tool does",
"func_name": "function_identifier"
}
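A tool entry can be sanity-checked before it is sent. The sketch below is a hypothetical client-side helper, not part of the Tool Agent API. It assumes func_name is needed only for function tools (the structured_output examples on this page omit it) and that structured_output is an accepted tool_type alongside the three listed above.

```python
ALLOWED_TOOL_TYPES = {"function", "rag", "mcp", "structured_output"}
REQUIRED_FIELDS = ("tool_type", "name", "description")

def validate_tool(tool: dict) -> list[str]:
    """Return a list of problems found in one tool entry (empty if none)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if not tool.get(f)]
    tool_type = tool.get("tool_type")
    if tool_type and tool_type not in ALLOWED_TOOL_TYPES:
        problems.append(f"unknown tool_type: {tool_type}")
    # Assumption: func_name is only checked for function tools.
    if tool_type == "function" and not tool.get("func_name"):
        problems.append("missing field: func_name")
    return problems

multiply_tool = {
    "tool_type": "function",
    "name": "multiply",
    "description": "Multiplies two numbers",
    "func_name": "multiply_numbers",
}
problems = validate_tool(multiply_tool)
```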
Example Configurations
Basic Example
{
"llm_model_id": "anthropic.claude-3-haiku-20240307-v1:0",
"system_prompt": "You are a helpful assistant that uses available tools to answer questions.",
"tools": [
{
"tool_type": "function",
"name": "multiply",
"description": "Multiplies two numbers",
"func_name": "multiply_numbers"
}
],
"inference_config": {
"max_tokens": 4000,
"temperature": 0.7
}
}
This basic configuration gives the agent a single multiplication tool. More complex configurations can be created by adding tools and customizing parameters.
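How a configuration like this might be consumed can be sketched in a few lines. The example assumes that func_name resolves to a Python function through a lookup table; the FUNCTIONS table, load_config, and call_tool are hypothetical names used for illustration only.

```python
import json

CONFIG_JSON = """
{
  "llm_model_id": "anthropic.claude-3-haiku-20240307-v1:0",
  "system_prompt": "You are a helpful assistant that uses available tools to answer questions.",
  "tools": [
    {"tool_type": "function", "name": "multiply",
     "description": "Multiplies two numbers", "func_name": "multiply_numbers"}
  ],
  "inference_config": {"max_tokens": 4000, "temperature": 0.7}
}
"""

def multiply_numbers(a: float, b: float) -> float:
    return a * b

# Hypothetical lookup from func_name to an implementation.
FUNCTIONS = {"multiply_numbers": multiply_numbers}

def load_config(text: str) -> dict:
    """Parse the JSON and check the parameters marked as required."""
    config = json.loads(text)
    missing = [k for k in ("llm_model_id", "system_prompt", "tools") if k not in config]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return config

def call_tool(config: dict, name: str, **kwargs):
    """Find a tool by name and invoke the function its func_name points to."""
    for tool in config["tools"]:
        if tool["name"] == name:
            return FUNCTIONS[tool["func_name"]](**kwargs)
    raise KeyError(f"tool not configured: {name}")

config = load_config(CONFIG_JSON)
product = call_tool(config, "multiply", a=6, b=7)
```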
Structured Output Tools
Structured Output on Tool Agent is currently an experimental feature. The API and functionality may change in future releases.
Tool Agents can be configured to output structured data in JSON format using the structured_output tool type. This is useful when you need to extract specific information in a consistent format or transform unstructured text into structured data.
Configuration Examples
- Person Information
{
"tool_type": "structured_output",
"name": "structured_output",
"description": "Extracts structured information about a person from text",
"output_schema": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Full name of the person"
},
"age": {
"type": "integer",
"description": "Age of the person"
},
"occupation": {
"type": "string",
"description": "Person's job or profession"
}
},
"required": ["name"]
}
}
Sample Output
// Input: "John Doe is a 35-year-old software engineer"
// Output:
{
"name": "John Doe",
"age": 35,
"occupation": "software engineer"
}
- Product Information
{
"tool_type": "structured_output",
"name": "structured_output",
"description": "Extracts product information from text",
"output_schema": {
"type": "object",
"properties": {
"product_name": { "type": "string" },
"price": { "type": "number" },
"currency": { "type": "string" },
"in_stock": { "type": "boolean" }
},
"required": ["product_name", "price"]
}
}
- Event Detail with nested schemas
{
"tool_type": "structured_output",
"name": "structured_output",
"description": "Extracts event information from text",
"output_schema": {
"type": "object",
"properties": {
"event_name": { "type": "string" },
"date": { "type": "string", "format": "date" },
"location": { "type": "string" },
"attendees": {
"type": "array",
"items": { "type": "string" }
}
},
"required": ["event_name", "date"]
}
}
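Output returned by a structured_output tool can be checked against its output_schema on the client side. The following is a minimal sketch covering only the schema features used in the examples above (type, properties, required, items). It is a hand-rolled check for illustration, not a full JSON Schema validator, and it ignores format. The sample event values are invented.

```python
# Maps JSON Schema type names to Python types (subset used on this page).
JSON_TYPES = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "array": list, "object": dict,
}

def check(value, schema: dict, path: str = "$") -> list[str]:
    """Return mismatches between value and a small JSON Schema subset."""
    expected = schema.get("type")
    if expected:
        ok = isinstance(value, JSON_TYPES[expected])
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected in ("integer", "number") and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    errors = []
    if expected == "object":
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: required field missing")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors += check(value[key], sub, f"{path}.{key}")
    elif expected == "array" and "items" in schema:
        for i, item in enumerate(value):
            errors += check(item, schema["items"], f"{path}[{i}]")
    return errors

event_schema = {
    "type": "object",
    "properties": {
        "event_name": {"type": "string"},
        "date": {"type": "string", "format": "date"},  # format is not checked here
        "location": {"type": "string"},
        "attendees": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["event_name", "date"],
}

good = check({"event_name": "Launch", "date": "2024-05-01", "attendees": ["Ana"]}, event_schema)
bad = check({"event_name": "Launch", "attendees": ["Ana", 3]}, event_schema)
```

Here good is an empty list, while bad reports the missing date and the non-string attendee.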
Working with Images
Tool Agents support processing images alongside text inputs. Images are provided within the content array. Each image is represented by an item with type: "image", containing a source object that specifies the origin (URL, S3 path, or base64), media_type, and the actual image data or reference.
Images can be provided in three primary ways:
- URL-based images: Provide a direct HTTPS URL to the image. The media_type field indicates the image format.
{
"content": [
{
"type": "text",
"text": "What's in this image?"
},
{
"type": "image",
"source": {
"type": "url",
"url": "https://example.com/image.jpg",
"media_type": "image/jpeg" // Supported: image/jpeg, image/png, image/tiff, etc.
}
}
],
"role": "user"
}
Note: For TIFF images (media_type: "image/tiff"), the URL must end with a .tiff or .tif extension.
- S3 path images: Reference an image stored in an S3 bucket. Specify the S3 path and the media_type.
{
"content": [
{
"type": "text",
"text": "Analyze this image from S3"
},
{
"type": "image",
"source": {
"type": "s3_path",
"path": "s3://bucket-name/path/to/image.png",
"media_type": "image/png" // Supported: image/jpeg, image/png, image/tiff, etc.
}
}
],
"role": "user"
}
Note: For TIFF images (media_type: "image/tiff"), the S3 path must point to a file with a .tiff or .tif extension.
- Base64-encoded images: Embed the image data directly as a base64 encoded string. The media_type is crucial here.
{
"content": [
{
"type": "text",
"text": "Describe this embedded image"
},
{
"type": "image",
"source": {
"type": "base64",
"data": "<base64-encoded-image-data>",
"media_type": "image/png" // Supported: image/jpeg, image/png, image/tiff, etc.
}
}
],
"role": "user"
}
Note: The media_type field is mandatory and crucial for correctly interpreting the base64 data. For image/tiff, the image data itself is validated (e.g., for correct TIFF format based on its magic number).
- Specify media_type: Always include the correct media_type (e.g., image/jpeg, image/png, image/tiff) in the source object. This is essential for proper processing.
- Supported Formats: While many standard image formats can be indicated via media_type, ensure your specific use case is tested. JPG, PNG, and TIFF are explicitly supported and validated.
- TIFF Specifics:
  - For URL or S3 sources of TIFF images, ensure the file path ends with .tiff or .tif.
  - For base64 encoded TIFF images, the data integrity (magic number) is checked.
- Image Source: Clearly define the image source with its type (url, s3_path, base64) and the corresponding data field (url, path, data).
- Size and Prompts: Keep image sizes reasonable (refer to any documented limits) to avoid timeouts and include clear text prompts to guide the agent's analysis.
- Costs: Be mindful of potential costs and rate limits when processing multiple or large images.
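A base64 image message in the shape described in this section can be assembled with a small helper. This is a sketch with a hypothetical function name (image_message). The TIFF check mirrors the magic-number validation mentioned above by testing for the standard TIFF signatures, but the service's own validation may be stricter.

```python
import base64

# TIFF files start with one of these 4-byte signatures (little/big endian).
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")

def image_message(prompt: str, image_bytes: bytes, media_type: str) -> dict:
    """Build a user message with a text prompt and a base64-embedded image."""
    if media_type == "image/tiff" and not image_bytes.startswith(TIFF_MAGIC):
        raise ValueError("data does not look like a TIFF file")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                    "media_type": media_type,
                },
            },
        ],
    }

# Placeholder bytes that only carry a TIFF header, for illustration.
message = image_message("Describe this embedded image", b"II*\x00placeholder", "image/tiff")
```

Checking the signature locally avoids a round trip when the payload is obviously not a TIFF.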
Working with Documents
Tool Agents can process various types of documents as part of their input. This allows you to provide rich context to the agent for tasks like summarization, question answering based on specific texts, or data extraction. Documents are included in the content array of a message, similar to images. Each document is represented by an item with type: "document", containing a source object that specifies the origin (URL, S3 path, or base64), media_type, and the actual document data or reference. The document_title is an optional top-level field that can help the agent identify specific documents.
Supported File Types: PDF, CSV, DOC, DOCX, XLS, XLSX, HTML, TXT, MD. The media_type field in the source object should correspond to the actual format of the document (e.g., application/pdf for PDF, text/csv for CSV).
Maximum File Size: 4.5 MB
Document Limit: Up to 5 documents can be included per message.
Documents can be provided in three primary ways:
- URL-based documents: Provide a direct HTTPS URL to the document. The media_type field indicates the document format.
{
"content": [
{
"type": "text",
"text": "Summarize the attached report."
},
{
"type": "document",
"source": {
"type": "url",
"url": "https://example.com/annual_report.pdf",
"media_type": "application/pdf" // E.g., application/pdf, text/csv, etc.
},
"document_title": "Annual Report 2023" // Optional
}
],
"role": "user"
}
- Base64-encoded documents: Embed the document data directly as a base64 encoded string. The media_type is crucial here.
{
"content": [
{
"type": "text",
"text": "What are the main points in this text file?"
},
{
"type": "document",
"source": {
"type": "base64",
"data": "<base64-encoded-document-content>",
"media_type": "text/plain" // E.g., application/pdf, application/msword, etc.
},
"document_title": "Meeting Notes" // Optional
}
],
"role": "user"
}
Note: The media_type field is mandatory and crucial for correctly interpreting the base64 data.
- S3 path documents: Reference a document stored in an S3 bucket. Specify the S3 path and the media_type.
{
"content": [
{
"type": "text",
"text": "Analyze this document from S3"
},
{
"type": "document",
"source": {
"type": "s3_path",
"path": "s3://bucket-name/path/to/document.xlsx",
"media_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" // For .xlsx; use application/vnd.ms-excel for .xls
},
"document_title": "Sales Data Q1" // Optional
}
],
"role": "user"
}
Note: S3 path support for documents is not yet available and will be added soon. The media_type field is mandatory and crucial for correctly interpreting the document.
The document_title field is optional but recommended, as it can help the agent identify and refer to specific documents, especially when multiple documents are provided. This field remains at the top level of the document object.
- Use document_title: Provide a descriptive title for each document. This field is at the top level of the document object.
- Specify media_type: Always include the correct media_type (e.g., application/pdf, text/csv, application/msword) in the source object. This is essential for proper processing and should correspond to one of the Supported File Types.
- Respect Limits: Do not exceed the 5-document limit per message or the 4.5 MB file size limit.
- Choose the Right Source Type:
  - Use source.type: "url" with source.url for publicly accessible documents.
  - Use source.type: "s3_path" with source.path for documents in S3 (once the feature is available).
  - Use source.type: "base64" with source.data for smaller files or when direct content embedding is necessary. Ensure proper base64 encoding.
- Clear Prompts: Accompany documents with clear text prompts instructing the agent on what to do with them.
- Supported Formats: Ensure your documents are in one of the supported file formats and the media_type accurately reflects this.
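The limits and media types above can be enforced before a request is sent. The sketch below uses hypothetical helper names (document_item, document_message). It assumes that 4.5 MB means 4.5 × 1024 × 1024 bytes measured before base64 encoding, which this page does not specify. The extension-to-media-type table uses the standard MIME types for the listed formats.

```python
import base64
import os
from typing import Optional

MAX_FILE_BYTES = int(4.5 * 1024 * 1024)  # assumption: limit applies to raw bytes
MAX_DOCUMENTS = 5

MEDIA_TYPES = {
    ".pdf": "application/pdf",
    ".csv": "text/csv",
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xls": "application/vnd.ms-excel",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".html": "text/html",
    ".txt": "text/plain",
    ".md": "text/markdown",
}

def document_item(filename: str, data: bytes, title: Optional[str] = None) -> dict:
    """Build one base64 document item, checking file type and size first."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in MEDIA_TYPES:
        raise ValueError(f"unsupported file type: {filename}")
    if len(data) > MAX_FILE_BYTES:
        raise ValueError(f"{filename} exceeds the 4.5 MB limit")
    item = {
        "type": "document",
        "source": {
            "type": "base64",
            "data": base64.b64encode(data).decode("ascii"),
            "media_type": MEDIA_TYPES[ext],
        },
    }
    if title:
        item["document_title"] = title  # top-level field, outside source
    return item

def document_message(prompt: str, documents: list) -> dict:
    """Combine a text prompt with up to five document items."""
    if len(documents) > MAX_DOCUMENTS:
        raise ValueError("at most 5 documents per message")
    return {"role": "user", "content": [{"type": "text", "text": prompt}, *documents]}

notes = document_item("notes.txt", b"hello", title="Meeting Notes")
request = document_message("What are the main points in this text file?", [notes])
```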
Best Practices
- System Prompts
  - Be specific about the agent's role
  - Define clear boundaries
  - Include error handling instructions
- Tool Selection
  - Choose tools that complement each other
  - Provide clear tool descriptions
  - Limit tools to necessary ones only
- Error Handling
  - Include retry logic
  - Provide fallback options
  - Handle tool failures gracefully
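The error handling points above (retry, fallback, graceful failure) can be combined in one small wrapper. This is a generic sketch with a hypothetical call_with_retry helper; it is not a Tool Agent feature, and the retry count and backoff values are placeholders to tune for your tools.

```python
import time

def call_with_retry(tool, args: dict, retries: int = 3,
                    base_delay: float = 0.5, fallback=None):
    """Call tool(**args); retry with exponential backoff, then return fallback."""
    for attempt in range(retries):
        try:
            return tool(**args)
        except Exception:
            # Broad catch keeps the sketch short; narrow it to the errors
            # your tool can actually raise in real code.
            if attempt < retries - 1:
                time.sleep(base_delay * 2 ** attempt)
    return fallback

def always_fails(**_kwargs):
    raise RuntimeError("tool unavailable")

# After three failed attempts the caller gets the fallback instead of an exception.
result = call_with_retry(always_fails, {"query": "status"}, base_delay=0,
                         fallback="Tool unavailable, please try again later.")
```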
Best Practices for Structured Output
- Schema Design
  - Keep schemas focused and specific
  - Use clear property names
  - Include descriptions for complex fields
  - Mark required fields appropriately
- Data Validation
  - Use appropriate data types
  - Include format specifications when needed
  - Consider adding enum values for restricted choices
- Error Handling
  - Provide fallback values for optional fields
  - Handle missing or invalid data gracefully
  - Include validation messages in the schema
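Two of these practices, enum values for restricted choices and fallback values for optional fields, are sketched below. The order_schema, FALLBACKS table, and normalize helper are hypothetical examples and do not appear in the Tool Agent API.

```python
# Hypothetical output_schema with an enum-restricted field.
order_schema = {
    "type": "object",
    "properties": {
        "status": {
            "type": "string",
            "enum": ["pending", "shipped", "delivered"],
            "description": "Current order status",
        },
        "notes": {"type": "string", "description": "Free-form remarks"},
    },
    "required": ["status"],
}

# Fallback values applied when an optional field is absent.
FALLBACKS = {"notes": ""}

def normalize(output: dict, schema: dict, fallbacks: dict) -> dict:
    """Fill optional fields with fallbacks, then enforce required and enum."""
    result = {**fallbacks, **output}  # values in output take precedence
    for key in schema.get("required", []):
        if key not in result:
            raise ValueError(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        allowed = spec.get("enum")
        if allowed and key in result and result[key] not in allowed:
            raise ValueError(f"{key} must be one of {allowed}")
    return result

order = normalize({"status": "shipped"}, order_schema, FALLBACKS)
```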
Limitations
- Tool execution timeout is fixed
- Limited to predefined tools (multiply, semantic_search, mcp, structured_output)
Future Enhancements
- Additional tool types planned
- Enhanced memory capabilities
- Dynamic tool loading