Tool Agent

What is an Agent?

An agent is an AI system that can perceive its environment, make decisions, and take actions to achieve specific goals. In our context, agents are specialized LLM-powered systems that can:

Understand natural language inputs
Plan sequences of actions
Use tools to accomplish tasks
Maintain context through conversations
Generate coherent responses

Tool Agent Architecture

Tool Agents use a ReAct (Reasoning + Acting) pattern combined with function calling capabilities from LLM providers. This enables them to:

Reason about what tools to use
Execute tools with appropriate parameters
Observe results
Plan next steps
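
The reason, act, observe cycle above can be pictured with a small sketch. This is an illustration only, not the platform's implementation: `fake_llm`, `TOOLS`, `run_agent`, and the message shapes are hypothetical stand-ins, and a real agent would call an LLM provider's function-calling API instead of the scripted stub used here.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> Python callable.
TOOLS: dict[str, Callable[..., object]] = {
    "multiply": lambda a, b: a * b,
}


def fake_llm(history: list[dict]) -> dict:
    """Scripted stand-in for an LLM with function calling (assumption).

    It requests the multiply tool once, then answers from the observation.
    """
    observations = [m for m in history if m["role"] == "tool"]
    if not observations:
        return {"type": "tool_call", "name": "multiply", "args": {"a": 6, "b": 7}}
    return {"type": "final", "text": f"The result is {observations[-1]['content']}."}


def run_agent(question: str, llm=fake_llm, max_steps: int = 5) -> str:
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        decision = llm(history)  # reason: choose a tool or give the final answer
        if decision["type"] == "final":
            return decision["text"]
        result = TOOLS[decision["name"]](**decision["args"])  # act: run the tool
        history.append({"role": "tool", "content": result})  # observe: record result
    raise RuntimeError("agent did not finish within max_steps")
```

The `max_steps` cap is a design choice in this sketch so that a model that keeps requesting tools cannot loop forever.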

Agent Workflow

Configuration Parameters

Essential Parameters

Parameter	Description	Required	Example
`llm_model_id`	Bedrock model identifier	Yes	`anthropic.claude-3-haiku-20240307-v1:0`
`system_prompt`	Instructions for the agent	Yes	"You are a helpful assistant..."
`tools`	Array of available tools	Yes	See tools configuration
`inference_config`	LLM parameters	No	`{"max_tokens": 4000}`

Tool Configuration

Each tool requires:

{
    "tool_type": "function|rag",
    "name": "tool_name",
    "description": "What the tool does",
    "func_name": "function_identifier"
}
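
Before submitting a configuration, it can help to check each tool entry locally. The sketch below assumes the four keys shown above are the required ones for `function` and `rag` tools; the helper name `missing_tool_keys` is hypothetical, and `structured_output` tools use a different shape (an `output_schema` instead of `func_name`), so they are not covered here.

```python
# Keys taken from the tool entry shown above (function and rag tools).
REQUIRED_TOOL_KEYS = ("tool_type", "name", "description", "func_name")


def missing_tool_keys(tool: dict) -> list[str]:
    """Return the required keys that are absent or empty in a tool entry."""
    return [key for key in REQUIRED_TOOL_KEYS if not tool.get(key)]
```

An empty list means the entry has every key; anything else names what still needs to be filled in.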

Max Retries

When an agent invocation errors for any reason (e.g., the input is too large or there is a network issue), the client you are using to make the request could time out. To reduce this risk, set max_retries in the inference_config to a low number (e.g., 2).
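
As a minimal sketch, the setting can be added to an existing configuration without modifying the original dictionary. The helper name `with_max_retries` is hypothetical; only the `max_retries` key inside `inference_config` comes from the text above.

```python
import copy


def with_max_retries(agent_config: dict, max_retries: int = 2) -> dict:
    """Return a copy of the config with inference_config.max_retries set."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    updated = copy.deepcopy(agent_config)  # leave the caller's config untouched
    updated.setdefault("inference_config", {})["max_retries"] = max_retries
    return updated
```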

Example Configurations

Basic Example

{
    "llm_model_id": "anthropic.claude-3-haiku-20240307-v1:0",
    "system_prompt": "You are a helpful assistant that uses available tools to answer questions.",
    "tools": [
        {
            "tool_type": "function",
            "name": "multiply",
            "description": "Multiplies two numbers",
            "func_name": "multiply_numbers"
        }
    ],
    "inference_config": {
        "max_tokens": 4000,
        "temperature": 0.7
    }
}

Example Usage

This basic configuration enables the agent to perform multiplication operations. More complex configurations can be created by adding additional tools and customizing parameters.


Structured Output Tools

Experimental Feature

Structured Output on Tool Agent is currently an experimental feature. The API and functionality may change in future releases.

Tool Agents can be configured to output structured data in JSON format using the structured_output tool type. This is useful when you need to extract specific information in a consistent format or transform unstructured text into structured data.

Configuration Examples

Person Information

{
    "tool_type": "structured_output",
    "name": "structured_output",
    "description": "Extracts structured information about a person from text",
    "output_schema": {
        "type": "object",
        "properties": {
            "name": {
                "type": "string",
                "description": "Full name of the person"
            },
            "age": {
                "type": "integer",
                "description": "Age of the person"
            },
            "occupation": {
                "type": "string",
                "description": "Person's job or profession"
            }
        },
        "required": ["name"]
    }
}

Sample Output

// Input: "John Doe is a 35-year-old software engineer"
// Output:
{
    "name": "John Doe",
    "age": 35,
    "occupation": "software engineer"
}
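
Since the feature is experimental, it may be worth checking the returned object against the schema on the client side. The following is a deliberately small sketch, not a full JSON Schema validator: it only checks `required` and top-level `type`, and the names `check_output` and `PERSON_SCHEMA` are hypothetical.

```python
# Mapping from JSON Schema type names to Python types.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}

# Trimmed copy of the person schema shown above.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "occupation": {"type": "string"},
    },
    "required": ["name"],
}


def check_output(output: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the output passed."""
    problems = []
    for name in schema.get("required", []):
        if name not in output:
            problems.append(f"missing required field: {name}")
    for name, value in output.items():
        spec = schema.get("properties", {}).get(name)
        if spec is None:
            continue  # fields not described by the schema are ignored here
        expected = JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for numeric types
        bool_as_number = isinstance(value, bool) and spec["type"] in ("integer", "number")
        if not isinstance(value, expected) or bool_as_number:
            problems.append(f"{name}: expected {spec['type']}")
    return problems
```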

Product Information

{
    "tool_type": "structured_output",
    "name": "structured_output",
    "description": "Extracts product information from text",
    "output_schema": {
        "type": "object",
        "properties": {
            "product_name": { "type": "string" },
            "price": { "type": "number" },
            "currency": { "type": "string" },
            "in_stock": { "type": "boolean" }
        },
        "required": ["product_name", "price"]
    }
}

Event Detail with nested schemas

{
    "tool_type": "structured_output",
    "name": "structured_output",
    "description": "Extracts event information from text",
    "output_schema": {
        "type": "object",
        "properties": {
            "event_name": { "type": "string" },
            "date": { "type": "string", "format": "date" },
            "location": { "type": "string" },
            "attendees": {
                "type": "array",
                "items": { "type": "string" }
            }
        },
        "required": ["event_name", "date"]
    }
}

Working with Images

Tool Agents support processing images alongside text inputs. Images are provided within the content array. Each image is represented by an item with type: "image", containing a source object that specifies the origin (URL, S3 path, or base64), media_type, and the actual image data or reference.

Images can be provided in three primary ways:

URL-based images: Provide a direct HTTPS URL to the image. The media_type field indicates the image format.

{
    "content": [
        {
            "type": "text",
            "text": "What's in this image?"
        },
        {
            "type": "image",
            "source": {
                "type": "url",
                "url": "https://example.com/image.jpg",
                "media_type": "image/jpeg" // Supported: image/jpeg, image/png, image/tiff, etc.
            }
        }
    ],
    "role": "user"
}

Note: For TIFF images (media_type: "image/tiff"), the URL must end with a .tiff or .tif extension.

S3 path images: Reference an image stored in an S3 bucket. Specify the S3 path and the media_type.

{
    "content": [
        {
            "type": "text",
            "text": "Analyze this image from S3"
        },
        {
            "type": "image",
            "source": {
                "type": "s3_path",
                "path": "s3://bucket-name/path/to/image.png",
                "media_type": "image/png" // Supported: image/jpeg, image/png, image/tiff, etc.
            }
        }
    ],
    "role": "user"
}

Note: For TIFF images (media_type: "image/tiff"), the S3 path must point to a file with a .tiff or .tif extension.

Base64-encoded images: Embed the image data directly as a base64 encoded string. The media_type is crucial here.

{
    "content": [
        {
            "type": "text",
            "text": "Describe this embedded image"
        },
        {
            "type": "image",
            "source": {
                "type": "base64",
                "data": "<base64-encoded-image-data>",
                "media_type": "image/png" // Supported: image/jpeg, image/png, image/tiff, etc.
            }
        }
    ],
    "role": "user"
}

Note: The media_type field is mandatory and crucial for correctly interpreting the base64 data. For image/tiff, the image data itself is validated (e.g., for correct TIFF format based on its magic number).
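
A base64 image message can be assembled from raw bytes as sketched below. The helper name `image_message` is hypothetical, and the TIFF check is only a client-side approximation of the magic-number validation mentioned above (it tests for the standard little-endian and big-endian TIFF headers); the service's actual validation may differ.

```python
import base64

# Standard TIFF headers: little-endian ("II") and big-endian ("MM").
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")


def image_message(image_bytes: bytes, media_type: str, prompt: str) -> dict:
    """Build a user message that embeds an image as base64."""
    if media_type == "image/tiff" and not image_bytes.startswith(TIFF_MAGIC):
        raise ValueError("data does not start with a TIFF magic number")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                    "media_type": media_type,
                },
            },
        ],
    }
```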

Best Practices for Images

Specify media_type: Always include the correct media_type (e.g., image/jpeg, image/png, image/tiff) in the source object. This is essential for proper processing.
Supported Formats: While many standard image formats can be indicated via media_type, ensure your specific use case is tested. JPG, PNG, and TIFF are explicitly supported and validated.
TIFF Specifics:
- For URL or S3 sources of TIFF images, ensure the file path ends with .tiff or .tif.
- For base64 encoded TIFF images, the data integrity (magic number) is checked.
Image Source: Clearly define the image source with its type (url, s3_path, base64) and the corresponding data field (url, path, data).
Size and Prompts: Keep image sizes reasonable (refer to any documented limits) to avoid timeouts and include clear text prompts to guide the agent's analysis.
Costs: Be mindful of potential costs and rate limits when processing multiple or large images.
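
Several of these practices can be checked before a request is sent. The sketch below is one possible pre-flight check under the rules stated above (source type to data field mapping, mandatory media_type, TIFF extension for URL and S3 references); the function name `image_source_problems` is hypothetical and the service may apply additional rules.

```python
from urllib.parse import urlparse

# Source type -> name of the field carrying the image data or reference.
DATA_FIELD = {"url": "url", "s3_path": "path", "base64": "data"}


def image_source_problems(source: dict) -> list[str]:
    """Return a list of problems found in an image source object."""
    field = DATA_FIELD.get(source.get("type"))
    if field is None:
        return [f"unknown source type: {source.get('type')!r}"]
    problems = []
    if not source.get(field):
        problems.append(f"missing '{field}' for source type '{source['type']}'")
    if not source.get("media_type"):
        problems.append("missing media_type")
    if source.get("media_type") == "image/tiff" and field in ("url", "path"):
        # urlparse separates any query string, so only the path is inspected
        path = urlparse(source.get(field) or "").path.lower()
        if not path.endswith((".tiff", ".tif")):
            problems.append("TIFF references must end with .tiff or .tif")
    return problems
```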

Working with Documents

Tool Agents can process various types of documents as part of their input. This allows you to provide rich context to the agent for tasks like summarization, question answering based on specific texts, or data extraction. Documents are included in the content array of a message, similar to images. Each document is represented by an item with type: "document", containing a source object that specifies the origin (URL, S3 path, or base64), media_type, and the actual document data or reference. The document_title is an optional top-level field that can help the agent identify specific documents.

Supported File Types: PDF, CSV, XLS, XLSX, HTML, TXT, MD. The media_type field in the source object should correspond to the actual format of the document (e.g., application/pdf for PDF, text/csv for CSV).
Maximum File Size: 4.5 MB
Document Limit: Up to 5 documents can be included per message.
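
The file type and limit rules above can be checked locally before building a message. This is a sketch under stated assumptions: the media types are the standard MIME types for each listed extension, "4.5 MB" is interpreted as 4.5 * 1024 * 1024 bytes (the document does not say which definition applies), and the names `media_type_for` and `check_limits` are hypothetical.

```python
from pathlib import Path

# Standard MIME types for the supported file extensions.
MEDIA_TYPES = {
    ".pdf": "application/pdf",
    ".csv": "text/csv",
    ".xls": "application/vnd.ms-excel",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".html": "text/html",
    ".txt": "text/plain",
    ".md": "text/markdown",
}
MAX_FILE_BYTES = int(4.5 * 1024 * 1024)  # assumption: MB means 1024 * 1024 bytes
MAX_DOCUMENTS = 5


def media_type_for(filename: str) -> str:
    """Look up the media_type for a file name by its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported file type: {suffix or filename}")
    return MEDIA_TYPES[suffix]


def check_limits(sizes_in_bytes: list[int]) -> None:
    """Raise ValueError if the per-message or per-file limit is exceeded."""
    if len(sizes_in_bytes) > MAX_DOCUMENTS:
        raise ValueError(f"at most {MAX_DOCUMENTS} documents per message")
    for size in sizes_in_bytes:
        if size > MAX_FILE_BYTES:
            raise ValueError(f"file of {size} bytes exceeds the 4.5 MB limit")
```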

Documents can be provided in three primary ways:

URL-based documents: Provide a direct HTTPS URL to the document. The media_type field indicates the document format.

{
    "content": [
        {
            "type": "text",
            "text": "Summarize the attached report."
        },
        {
            "type": "document",
            "source": {
                "type": "url",
                "url": "https://example.com/annual_report.pdf",
                "media_type": "application/pdf" // E.g., application/pdf, text/csv, etc.
            },
            "document_title": "Annual Report 2023" // Optional
        }
    ],
    "role": "user"
}

Base64-encoded documents: Embed the document data directly as a base64 encoded string. The media_type is crucial here.

{
    "content": [
        {
            "type": "text",
            "text": "What are the main points in this text file?"
        },
        {
            "type": "document",
            "source": {
                "type": "base64",
                "data": "<base64-encoded-document-content>",
                "media_type": "text/plain" // E.g., text/plain, application/pdf, etc.
            },
            "document_title": "Meeting Notes" // Optional
        }
    ],
    "role": "user"
}

Note: The media_type field is mandatory and crucial for correctly interpreting the base64 data.
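
A document item of this shape can be built from raw bytes as in the sketch below. The helper name `document_item` is hypothetical; the field layout follows the example above, with document_title kept at the top level of the item rather than inside source.

```python
import base64
from typing import Optional


def document_item(data: bytes, media_type: str, title: Optional[str] = None) -> dict:
    """Build a base64 document item for the content array."""
    item = {
        "type": "document",
        "source": {
            "type": "base64",
            "data": base64.b64encode(data).decode("ascii"),
            "media_type": media_type,
        },
    }
    if title is not None:
        item["document_title"] = title  # top-level field, not part of source
    return item
```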

S3 path documents: Reference a document stored in an S3 bucket. Specify the S3 path and the media_type.

{
    "content": [
        {
            "type": "text",
            "text": "Analyze this document from S3"
        },
        {
            "type": "document",
            "source": {
                "type": "s3_path",
                "path": "s3://bucket-name/path/to/document.xlsx",
                "media_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" // Must match the file format (XLSX here)
            },
            "document_title": "Sales Data Q1" // Optional
        }
    ],
    "role": "user"
}

Note: S3 path support for documents is not yet available and will be added soon. The media_type field is mandatory and crucial for correctly interpreting the document.

The document_title field is optional but recommended, as it can help the agent identify and refer to specific documents, especially when multiple documents are provided. This field remains at the top level of the document object.

Best Practices for Documents

Use document_title: Provide a descriptive title for each document. This field is at the top-level of the document object.
Specify media_type: Always include the correct media_type (e.g., application/pdf, text/csv, text/plain) in the source object. This is essential for proper processing and should correspond to one of the Supported File Types.
Respect Limits: Do not exceed the 5-document limit per message or the 4.5 MB file size limit.
Choose the Right Source Type:
- Use source.type: "url" with source.url for publicly accessible documents.
- Use source.type: "s3_path" with source.path for documents in S3 (once feature is available).
- Use source.type: "base64" with source.data for smaller files or when direct content embedding is necessary. Ensure proper base64 encoding.
Clear Prompts: Accompany documents with clear text prompts instructing the agent on what to do with them.
Supported Formats: Ensure your documents are in one of the supported file formats and the media_type accurately reflects this.

Best Practices

System Prompts
- Be specific about the agent's role
- Define clear boundaries
- Include error handling instructions
Tool Selection
- Choose tools that complement each other
- Provide clear tool descriptions
- Limit tools to necessary ones only
Error Handling
- Include retry logic
- Provide fallback options
- Handle tool failures gracefully
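
For the error handling points above, one common client-side pattern is retry with a fallback value. The sketch below is generic Python and makes no claim about how the platform retries internally; `call_with_retry` and its parameters are hypothetical names.

```python
import time


def call_with_retry(tool, *args, retries: int = 2, delay_s: float = 0.0, fallback=None):
    """Call tool(*args); retry on any exception, then return fallback."""
    for attempt in range(retries + 1):
        try:
            return tool(*args)
        except Exception:
            if attempt < retries:
                time.sleep(delay_s * (2 ** attempt))  # simple exponential backoff
    return fallback  # every attempt failed: degrade gracefully
```

Catching every exception keeps the sketch short; in practice it is usually better to retry only on errors known to be transient.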

Best Practices for Structured Output

Schema Design
- Keep schemas focused and specific
- Use clear property names
- Include descriptions for complex fields
- Mark required fields appropriately
Data Validation
- Use appropriate data types
- Include format specifications when needed
- Consider adding enum values for restricted choices
Error Handling
- Provide fallback values for optional fields
- Handle missing or invalid data gracefully
- Include validation messages in the schema
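
The fallback advice above can be applied on the client after the agent responds. This is a minimal sketch with a hypothetical helper name, `apply_fallbacks`: it fills only optional fields that the schema declares and refuses to paper over missing required ones.

```python
def apply_fallbacks(output: dict, schema: dict, fallbacks: dict) -> dict:
    """Fill absent optional fields from fallbacks; never invent required ones."""
    required = set(schema.get("required", []))
    missing_required = sorted(required - set(output))
    if missing_required:
        raise ValueError(f"missing required fields: {missing_required}")
    filled = dict(output)  # copy so the original output is not modified
    for name in schema.get("properties", {}):
        if name not in filled and name in fallbacks:
            filled[name] = fallbacks[name]
    return filled
```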

Limitations

Tool execution timeout is fixed
Limited to predefined tools (multiply, semantic_search, structured_output)

Streaming Support

Tool agents support streaming responses through the unified /v1/agents/{agent_id}/versions/{version_id-or-latest}/invoke-stream endpoint.

Tip: When using streaming, responses come in chunks. Each chunk is a valid JSON object containing a portion of the complete response. The concatenation of all chunks forms the complete text response, which is not in JSON format.

Tip: Because streaming returns data chunk by chunk, failures that occur at runtime are difficult to report to the client. The returned status code might be 200 even when an error occurred. We do our best to return the error in the response, but silent failures can still occur, so check the logs for errors.
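
Client-side handling of a stream might look like the sketch below. The chunk schema is not specified in this document, so the `text` and `error` field names are assumptions made purely for illustration, as is the idea that chunks arrive one JSON object per line; `collect_stream` is a hypothetical helper. Adjust the field names to match what the endpoint actually returns.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Join the text portions of streamed JSON chunks into one string."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)  # each chunk is a JSON object
        if "error" in chunk:  # hypothetical error field
            # A 200 status does not guarantee success, so inspect every chunk.
            raise RuntimeError(f"stream reported an error: {chunk['error']}")
        parts.append(chunk.get("text", ""))  # hypothetical text field
    return "".join(parts)
```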

Future Enhancements

Additional tool types planned
Enhanced memory capabilities
Dynamic tool loading
