
Tool Agent

What is an Agent?

An agent is an AI system that can perceive its environment, make decisions, and take actions to achieve specific goals. In our context, agents are specialized LLM-powered systems that can:

  • Understand natural language inputs
  • Plan sequences of actions
  • Use tools to accomplish tasks
  • Maintain context through conversations
  • Generate coherent responses

Tool Agent Architecture

Tool Agents use a ReAct (Reasoning + Acting) pattern combined with function calling capabilities from LLM providers. This enables them to:

  1. Reason about what tools to use
  2. Execute tools with appropriate parameters
  3. Observe results
  4. Plan next steps
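
The four steps above can be sketched as a simple loop. This is a minimal illustration, not the Tool Agent implementation: `fake_model`, `TOOLS`, and the message shapes are hypothetical stand-ins for an LLM with function calling.

```python
from typing import Any, Callable

# Hypothetical tool registry: tool name -> Python callable.
TOOLS: dict[str, Callable[..., Any]] = {
    "multiply": lambda a, b: a * b,
}


def fake_model(history: list[dict[str, Any]]) -> dict[str, Any]:
    """Stand-in for an LLM call: requests a tool once, then answers."""
    observations = [m for m in history if m["role"] == "tool"]
    if not observations:
        # Step 1: reason about which tool to use.
        return {"type": "tool_call", "name": "multiply", "args": {"a": 6, "b": 7}}
    # Step 4: plan the next step; here, finish with an answer.
    return {"type": "final", "text": f"The result is {observations[-1]['content']}."}


def run_agent(question: str, max_steps: int = 5) -> str:
    history: list[dict[str, Any]] = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        decision = fake_model(history)
        if decision["type"] == "final":
            return decision["text"]
        # Step 2: execute the tool with the chosen parameters.
        result = TOOLS[decision["name"]](**decision["args"])
        # Step 3: observe the result by appending it to the history.
        history.append({"role": "tool", "content": result})
    return "Stopped: step limit reached."


print(run_agent("What is 6 times 7?"))
```

The `max_steps` cap is a design choice in this sketch: it keeps the loop from running indefinitely if the model never produces a final answer.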

Agent Workflow

Configuration Parameters

Essential Parameters

| Parameter | Description | Required | Example |
|---|---|---|---|
| llm_model_id | Bedrock model identifier | Yes | anthropic.claude-3-haiku-20240307-v1:0 |
| system_prompt | Instructions for the agent | Yes | "You are a helpful assistant..." |
| tools | Array of available tools | Yes | See Tool Configuration |
| inference_config | LLM parameters | No | {"max_tokens": 4000} |

Tool Configuration

Each tool requires:

{
  "tool_type": "function|rag|mcp",
  "name": "tool_name",
  "description": "What the tool does",
  "func_name": "function_identifier"
}
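
A tool entry can be checked before it is sent. The sketch below is a client-side check written for illustration; it assumes only the four keys and three `tool_type` values shown above, and the helper name `tool_config_problems` is hypothetical. Tools of type structured_output, described later on this page, use a different shape and are not covered.

```python
REQUIRED_TOOL_KEYS = ("tool_type", "name", "description", "func_name")
KNOWN_TOOL_TYPES = {"function", "rag", "mcp"}


def tool_config_problems(tool: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the entry looks valid."""
    problems = [f"missing key: {key}" for key in REQUIRED_TOOL_KEYS if key not in tool]
    tool_type = tool.get("tool_type")
    if tool_type is not None and tool_type not in KNOWN_TOOL_TYPES:
        problems.append(f"unknown tool_type: {tool_type}")
    return problems


multiply_tool = {
    "tool_type": "function",
    "name": "multiply",
    "description": "Multiplies two numbers",
    "func_name": "multiply_numbers",
}
print(tool_config_problems(multiply_tool))
```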

Example Configurations

Basic Example

{
  "llm_model_id": "anthropic.claude-3-haiku-20240307-v1:0",
  "system_prompt": "You are a helpful assistant that uses available tools to answer questions.",
  "tools": [
    {
      "tool_type": "function",
      "name": "multiply",
      "description": "Multiplies two numbers",
      "func_name": "multiply_numbers"
    }
  ],
  "inference_config": {
    "max_tokens": 4000,
    "temperature": 0.7
  }
}

Example Usage

This basic configuration enables the agent to perform multiplication operations. More complex configurations can be created by adding additional tools and customizing parameters.
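
One way to build a more complex configuration is to start from the basic one and derive a copy with an extra tool and adjusted inference settings. The sketch below is illustrative only: the helpers `missing_params` and `extend_config` are hypothetical, and the second tool's `func_name` is an assumed value rather than a documented one.

```python
import copy
import json

REQUIRED_PARAMS = ("llm_model_id", "system_prompt", "tools")

base_config = {
    "llm_model_id": "anthropic.claude-3-haiku-20240307-v1:0",
    "system_prompt": "You are a helpful assistant that uses available tools to answer questions.",
    "tools": [
        {
            "tool_type": "function",
            "name": "multiply",
            "description": "Multiplies two numbers",
            "func_name": "multiply_numbers",
        }
    ],
    "inference_config": {"max_tokens": 4000, "temperature": 0.7},
}


def missing_params(config: dict) -> list[str]:
    """Return the required parameters that are absent or empty."""
    return [key for key in REQUIRED_PARAMS if not config.get(key)]


def extend_config(config: dict, extra_tool: dict, **inference_overrides) -> dict:
    """Return a copy of config with one more tool and updated inference settings."""
    extended = copy.deepcopy(config)
    extended["tools"].append(extra_tool)
    extended.setdefault("inference_config", {}).update(inference_overrides)
    return extended


# Assumed second tool: the func_name value is illustrative, not documented.
search_tool = {
    "tool_type": "rag",
    "name": "semantic_search",
    "description": "Searches a knowledge base for relevant passages",
    "func_name": "semantic_search",
}

config = extend_config(base_config, search_tool, temperature=0.2)
print(json.dumps(config, indent=2))
```

Working on a deep copy leaves `base_config` unchanged, so several variants can be derived from the same starting point.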


Structured Output Tools

Experimental Feature

Structured Output on Tool Agent is currently an experimental feature. The API and functionality may change in future releases.

Tool Agents can be configured to output structured data in JSON format using the structured_output tool type. This is useful when you need to extract specific information in a consistent format or transform unstructured text into structured data.

Configuration Examples

  1. Person Information

    {
      "tool_type": "structured_output",
      "name": "structured_output",
      "description": "Extracts structured information about a person from text",
      "output_schema": {
        "type": "object",
        "properties": {
          "name": {
            "type": "string",
            "description": "Full name of the person"
          },
          "age": {
            "type": "integer",
            "description": "Age of the person"
          },
          "occupation": {
            "type": "string",
            "description": "Person's job or profession"
          }
        },
        "required": ["name"]
      }
    }

    Sample Output

    // Input: "John Doe is a 35-year-old software engineer"
    // Output:
    {
      "name": "John Doe",
      "age": 35,
      "occupation": "software engineer"
    }

  2. Product Information

    {
      "tool_type": "structured_output",
      "name": "structured_output",
      "description": "Extracts product information from text",
      "output_schema": {
        "type": "object",
        "properties": {
          "product_name": { "type": "string" },
          "price": { "type": "number" },
          "currency": { "type": "string" },
          "in_stock": { "type": "boolean" }
        },
        "required": ["product_name", "price"]
      }
    }

  3. Event Detail with nested schemas

    {
      "tool_type": "structured_output",
      "name": "structured_output",
      "description": "Extracts event information from text",
      "output_schema": {
        "type": "object",
        "properties": {
          "event_name": { "type": "string" },
          "date": { "type": "string", "format": "date" },
          "location": { "type": "string" },
          "attendees": {
            "type": "array",
            "items": { "type": "string" }
          }
        },
        "required": ["event_name", "date"]
      }
    }
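
Because the feature is experimental, it can be useful to check the agent's output against the schema on the client side. The sketch below is a deliberately small checker that covers only the `required` list and top-level `type` keywords; it is not a full JSON Schema validator, and the helper name `schema_problems` is hypothetical.

```python
from typing import Any

# Maps JSON Schema type keywords to Python types (top-level checks only).
_PYTHON_TYPES: dict[str, tuple] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "array": (list,),
    "object": (dict,),
}


def _matches(value: Any, json_type: str) -> bool:
    expected = _PYTHON_TYPES.get(json_type)
    if expected is None:
        return True  # unknown type keywords are not checked in this sketch
    if isinstance(value, bool) and json_type != "boolean":
        return False  # bool is a subclass of int in Python, so reject it explicitly
    return isinstance(value, expected)


def schema_problems(output: dict, schema: dict) -> list[str]:
    """Return problems found in output; an empty list means no problems were detected."""
    problems = [
        f"missing required field: {name}"
        for name in schema.get("required", [])
        if name not in output
    ]
    for name, spec in schema.get("properties", {}).items():
        if name in output and not _matches(output[name], spec.get("type", "")):
            problems.append(f"{name}: expected {spec.get('type')}")
    return problems


PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "occupation": {"type": "string"},
    },
    "required": ["name"],
}

print(schema_problems({"name": "John Doe", "age": 35, "occupation": "software engineer"}, PERSON_SCHEMA))
print(schema_problems({"age": "35"}, PERSON_SCHEMA))
```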

Working with Images

Tool Agents support processing images alongside text inputs. Images are provided within the content array. Each image is represented by an item with type: "image", containing a source object that specifies the origin (URL, S3 path, or base64), media_type, and the actual image data or reference.

Images can be provided in three primary ways:

  1. URL-based images: Provide a direct HTTPS URL to the image. The media_type field indicates the image format.

    {
      "content": [
        {
          "type": "text",
          "text": "What's in this image?"
        },
        {
          "type": "image",
          "source": {
            "type": "url",
            "url": "https://example.com/image.jpg",
            "media_type": "image/jpeg" // Supported: image/jpeg, image/png, image/tiff, etc.
          }
        }
      ],
      "role": "user"
    }

    Note: For TIFF images (media_type: "image/tiff"), the URL must end with a .tiff or .tif extension.

  2. S3 path images: Reference an image stored in an S3 bucket. Specify the S3 path and the media_type.

    {
      "content": [
        {
          "type": "text",
          "text": "Analyze this image from S3"
        },
        {
          "type": "image",
          "source": {
            "type": "s3_path",
            "path": "s3://bucket-name/path/to/image.png",
            "media_type": "image/png" // Supported: image/jpeg, image/png, image/tiff, etc.
          }
        }
      ],
      "role": "user"
    }

    Note: For TIFF images (media_type: "image/tiff"), the S3 path must point to a file with a .tiff or .tif extension.

  3. Base64-encoded images: Embed the image data directly as a base64 encoded string. The media_type is crucial here.

    {
      "content": [
        {
          "type": "text",
          "text": "Describe this embedded image"
        },
        {
          "type": "image",
          "source": {
            "type": "base64",
            "data": "<base64-encoded-image-data>",
            "media_type": "image/png" // Supported: image/jpeg, image/png, image/tiff, etc.
          }
        }
      ],
      "role": "user"
    }

    Note: The media_type field is mandatory and crucial for correctly interpreting the base64 data. For image/tiff, the image data itself is validated (e.g., for correct TIFF format based on its magic number).
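
A base64 image item can be assembled from raw bytes as sketched below. The TIFF check here is a client-side approximation written for illustration, based on the two standard TIFF byte-order headers; how the service itself validates the data is not shown on this page, and the helper names `image_item` and `user_message` are hypothetical.

```python
import base64

# Standard TIFF headers: little-endian ("II") and big-endian ("MM") byte order.
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")


def image_item(image_bytes: bytes, media_type: str) -> dict:
    """Build a base64 image item; reject TIFF data without a TIFF magic number."""
    if media_type == "image/tiff" and not image_bytes.startswith(TIFF_MAGIC):
        raise ValueError("data does not start with a TIFF magic number")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "data": base64.b64encode(image_bytes).decode("ascii"),
            "media_type": media_type,
        },
    }


def user_message(prompt: str, item: dict) -> dict:
    """Wrap a text prompt and one image item in a user message."""
    return {"content": [{"type": "text", "text": prompt}, item], "role": "user"}


# Placeholder bytes that merely start with a TIFF header; not a real image.
sample = b"II*\x00" + b"\x00" * 8
message = user_message("Describe this embedded image", image_item(sample, "image/tiff"))
print(message["content"][1]["source"]["media_type"])
```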

Best Practices for Images
  • Specify media_type: Always include the correct media_type (e.g., image/jpeg, image/png, image/tiff) in the source object. This is essential for proper processing.
  • Supported Formats: While many standard image formats can be indicated via media_type, ensure your specific use case is tested. JPG, PNG, and TIFF are explicitly supported and validated.
  • TIFF Specifics:
    • For URL or S3 sources of TIFF images, ensure the file path ends with .tiff or .tif.
    • For base64 encoded TIFF images, the data integrity (magic number) is checked.
  • Image Source: Clearly define the image source with its type (url, s3_path, base64) and the corresponding data field (url, path, data).
  • Size and Prompts: Keep image sizes reasonable (refer to any documented limits) to avoid timeouts, and include clear text prompts to guide the agent's analysis.
  • Costs: Be mindful of potential costs and rate limits when processing multiple or large images.

Working with Documents

Tool Agents can process various types of documents as part of their input. This allows you to provide rich context to the agent for tasks like summarization, question answering based on specific texts, or data extraction. Documents are included in the content array of a message, similar to images. Each document is represented by an item with type: "document", containing a source object that specifies the origin (URL, S3 path, or base64), media_type, and the actual document data or reference. The document_title is an optional top-level field that can help the agent identify specific documents.

  • Supported File Types: PDF, CSV, DOC, DOCX, XLS, XLSX, HTML, TXT, MD. The media_type field in the source object should correspond to the actual format of the document (e.g., application/pdf for PDF, text/csv for CSV).
  • Maximum File Size: 4.5 MB
  • Document Limit: Up to 5 documents can be included per message.

Documents can be provided in three primary ways:

  1. URL-based documents: Provide a direct HTTPS URL to the document. The media_type field indicates the document format.

    {
      "content": [
        {
          "type": "text",
          "text": "Summarize the attached report."
        },
        {
          "type": "document",
          "source": {
            "type": "url",
            "url": "https://example.com/annual_report.pdf",
            "media_type": "application/pdf" // E.g., application/pdf, text/csv, etc.
          },
          "document_title": "Annual Report 2023" // Optional
        }
      ],
      "role": "user"
    }

  2. Base64-encoded documents: Embed the document data directly as a base64 encoded string. The media_type is crucial here.

    {
      "content": [
        {
          "type": "text",
          "text": "What are the main points in this text file?"
        },
        {
          "type": "document",
          "source": {
            "type": "base64",
            "data": "<base64-encoded-document-content>",
            "media_type": "text/plain" // E.g., text/plain, application/pdf, etc.
          },
          "document_title": "Meeting Notes" // Optional
        }
      ],
      "role": "user"
    }

    Note: The media_type field is mandatory and crucial for correctly interpreting the base64 data.

  3. S3 path documents: Reference a document stored in an S3 bucket. Specify the S3 path and the media_type.

    {
      "content": [
        {
          "type": "text",
          "text": "Analyze this document from S3"
        },
        {
          "type": "document",
          "source": {
            "type": "s3_path",
            "path": "s3://bucket-name/path/to/document.xlsx",
            "media_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" // Must match the file format; .xls files use application/vnd.ms-excel
          },
          "document_title": "Sales Data Q1" // Optional
        }
      ],
      "role": "user"
    }

    Note: S3 path support for documents is planned but not yet available. The media_type field is mandatory and crucial for correctly interpreting the document.

The document_title field is optional but recommended, as it can help the agent identify and refer to specific documents, especially when multiple documents are provided. This field remains at the top level of the document object.
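
The size and count limits can be enforced before a message is sent. The sketch below is illustrative: it assumes the 4.5 MB limit applies to the raw file bytes and that 1 MB means 1024 * 1024 bytes, neither of which is confirmed on this page, and the helper names `document_item` and `document_message` are hypothetical.

```python
import base64
from typing import Optional

MAX_DOCUMENT_BYTES = int(4.5 * 1024 * 1024)  # assumption: 1 MB = 1024 * 1024 bytes
MAX_DOCUMENTS_PER_MESSAGE = 5


def document_item(data: bytes, media_type: str, title: Optional[str] = None) -> dict:
    """Build a base64 document item, rejecting files over the size limit."""
    if len(data) > MAX_DOCUMENT_BYTES:
        raise ValueError(f"document is {len(data)} bytes; limit is {MAX_DOCUMENT_BYTES}")
    item = {
        "type": "document",
        "source": {
            "type": "base64",
            "data": base64.b64encode(data).decode("ascii"),
            "media_type": media_type,
        },
    }
    if title is not None:
        item["document_title"] = title  # top-level field, not inside source
    return item


def document_message(prompt: str, documents: list) -> dict:
    """Wrap a prompt and up to five document items in a user message."""
    if len(documents) > MAX_DOCUMENTS_PER_MESSAGE:
        raise ValueError(f"{len(documents)} documents given; limit is {MAX_DOCUMENTS_PER_MESSAGE}")
    return {"content": [{"type": "text", "text": prompt}, *documents], "role": "user"}


notes = document_item(b"Agenda: budget review", "text/plain", title="Meeting Notes")
message = document_message("What are the main points in this text file?", [notes])
print(message["content"][1]["document_title"])
```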

Best Practices for Documents
  • Use document_title: Provide a descriptive title for each document. This field is at the top-level of the document object.
  • Specify media_type: Always include the correct media_type (e.g., application/pdf, text/csv, application/msword) in the source object. This is essential for proper processing and should correspond to one of the Supported File Types.
  • Respect Limits: Do not exceed the 5-document limit per message or the 4.5 MB file size limit.
  • Choose the Right Source Type:
    • Use source.type: "url" with source.url for publicly accessible documents.
    • Use source.type: "s3_path" with source.path for documents in S3 (once feature is available).
    • Use source.type: "base64" with source.data for smaller files or when direct content embedding is necessary. Ensure proper base64 encoding.
  • Clear Prompts: Accompany documents with clear text prompts instructing the agent on what to do with them.
  • Supported Formats: Ensure your documents are in one of the supported file formats and the media_type accurately reflects this.

Best Practices

  1. System Prompts

    • Be specific about the agent's role
    • Define clear boundaries
    • Include error handling instructions
  2. Tool Selection

    • Choose tools that complement each other
    • Provide clear tool descriptions
    • Limit tools to necessary ones only
  3. Error Handling

    • Include retry logic
    • Provide fallback options
    • Handle tool failures gracefully

Best Practices for Structured Output

  1. Schema Design

    • Keep schemas focused and specific
    • Use clear property names
    • Include descriptions for complex fields
    • Mark required fields appropriately
  2. Data Validation

    • Use appropriate data types
    • Include format specifications when needed
    • Consider adding enum values for restricted choices
  3. Error Handling

    • Provide fallback values for optional fields
    • Handle missing or invalid data gracefully
    • Include validation messages in the schema
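
Fallback values and enum restrictions can also be applied on the client side after the agent responds. The sketch below is illustrative: the `normalize` helper, the `enum` list on `currency`, and the fallback values are all assumptions added for this example and are not part of the product schema shown earlier.

```python
def normalize(output: dict, schema: dict, fallbacks: dict) -> dict:
    """Fill missing optional fields and replace out-of-enum values with fallbacks."""
    required = set(schema.get("required", []))
    cleaned = dict(output)
    for name, spec in schema.get("properties", {}).items():
        # Required fields are never filled in; a missing one should be reported instead.
        if name in required or name not in fallbacks:
            continue
        value = cleaned.get(name)
        outside_enum = "enum" in spec and value not in spec["enum"]
        if value is None or outside_enum:
            cleaned[name] = fallbacks[name]
    return cleaned


# Product schema with an assumed enum on currency, for illustration only.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
        "in_stock": {"type": "boolean"},
    },
    "required": ["product_name", "price"],
}

FALLBACKS = {"currency": "USD", "in_stock": False}

raw = {"product_name": "Lamp", "price": 19.99, "currency": "usd"}
print(normalize(raw, PRODUCT_SCHEMA, FALLBACKS))
```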

Limitations

  • The tool execution timeout is fixed and cannot be configured
  • Limited to predefined tools (multiply, semantic_search, mcp, structured_output)

Future Enhancements

  • Additional tool types planned
  • Enhanced memory capabilities
  • Dynamic tool loading