
Tool Agent

What is an Agent?

An agent is an AI system that can perceive its environment, make decisions, and take actions to achieve specific goals. In our context, agents are specialized LLM-powered systems that can:

  • Understand natural language inputs
  • Plan sequences of actions
  • Use tools to accomplish tasks
  • Maintain context through conversations
  • Generate coherent responses

Tool Agent Architecture

Tool Agents use a ReAct (Reasoning + Acting) pattern combined with function calling capabilities from LLM providers. This enables them to:

  1. Reason about what tools to use
  2. Execute tools with appropriate parameters
  3. Observe results
  4. Plan next steps

Agent Workflow
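
The four steps listed under Tool Agent Architecture can be sketched as a simple loop. This is a minimal illustration and not the Tool Agent's actual implementation: `TOOLS`, `fake_llm`, and `run_agent` are hypothetical names, and `fake_llm` is a stand-in for a real model call with function calling.

```python
from typing import Callable

# Hypothetical tool registry: maps a tool name to a plain Python callable.
TOOLS: dict[str, Callable[..., object]] = {
    "multiply": lambda a, b: a * b,
}


def fake_llm(history: list[dict]) -> dict:
    """Stand-in for a model call (assumption: a real agent calls an LLM here).

    Requests the multiply tool once, then answers using the observation.
    """
    observations = [m for m in history if m["role"] == "tool"]
    if not observations:
        return {"type": "tool_call", "name": "multiply", "args": {"a": 6, "b": 7}}
    return {"type": "final", "text": f"The result is {observations[-1]['content']}."}


def run_agent(question: str, llm=fake_llm, max_steps: int = 5) -> str:
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        decision = llm(history)  # 1. reason about which tool to use
        if decision["type"] == "final":
            return decision["text"]
        result = TOOLS[decision["name"]](**decision["args"])  # 2. execute the tool
        history.append({"role": "tool", "content": result})  # 3. observe the result
        # 4. the next iteration plans the following step from the updated history
    return "Stopped: step limit reached."


print(run_agent("What is 6 times 7?"))
```

The step limit is a design choice of this sketch: it keeps a misbehaving model from looping forever.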

Configuration Parameters

Essential Parameters

| Parameter | Description | Required | Example |
|---|---|---|---|
| llm_model_id | Bedrock model identifier | Yes | anthropic.claude-3-haiku-20240307-v1:0 |
| system_prompt | Instructions for the agent | Yes | "You are a helpful assistant..." |
| tools | Array of available tools | Yes | See tools configuration |
| inference_config | LLM parameters | No | {"max_tokens": 4000} |

Tool Configuration

Each tool requires:

{
  "tool_type": "function|rag|mcp",
  "name": "tool_name",
  "description": "What the tool does",
  "func_name": "function_identifier"
}
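
A tool entry can be checked against the template above before it is sent to the agent. The sketch below is an assumption about how a caller might do this, not part of the Tool Agent API: `REQUIRED_KEYS`, `KNOWN_TOOL_TYPES`, and `check_tool` are hypothetical names, and the check covers only the three tool types shown in the template.

```python
# Keys the template above lists for every tool entry.
REQUIRED_KEYS = ("tool_type", "name", "description", "func_name")
# Tool types named in the template (assumption: other types need other keys).
KNOWN_TOOL_TYPES = {"function", "rag", "mcp"}


def check_tool(tool: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry looks well formed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in tool]
    tool_type = tool.get("tool_type")
    if tool_type is not None and tool_type not in KNOWN_TOOL_TYPES:
        problems.append(f"unknown tool_type: {tool_type}")
    return problems


print(check_tool({"tool_type": "function", "name": "multiply"}))
```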

Example Configurations

Basic Example

{
  "llm_model_id": "anthropic.claude-3-haiku-20240307-v1:0",
  "system_prompt": "You are a helpful assistant that uses available tools to answer questions.",
  "tools": [
    {
      "tool_type": "function",
      "name": "multiply",
      "description": "Multiplies two numbers",
      "func_name": "multiply_numbers"
    }
  ],
  "inference_config": {
    "max_tokens": 4000,
    "temperature": 0.7
  }
}

Example Usage

This basic configuration enables the agent to perform multiplication operations. More complex configurations can be created by adding additional tools and customizing parameters.
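
To make the role of `func_name` concrete, the sketch below shows one way a configured tool name could be resolved to a Python function. How the Tool Agent actually resolves `func_name` is not described here, so the `FUNCTIONS` registry, the body of `multiply_numbers`, and `call_tool` are all assumptions for illustration.

```python
import json


def multiply_numbers(a: float, b: float) -> float:
    """Hypothetical implementation behind func_name "multiply_numbers"."""
    return a * b


# Assumption: func_name is looked up in a registry like this one.
FUNCTIONS = {"multiply_numbers": multiply_numbers}

# The tools array from the basic example, trimmed to what this sketch needs.
config = json.loads("""
{
  "tools": [
    {"tool_type": "function", "name": "multiply",
     "description": "Multiplies two numbers", "func_name": "multiply_numbers"}
  ]
}
""")


def call_tool(config: dict, tool_name: str, **arguments):
    """Find the tool by name in the config, then call the function it points to."""
    for tool in config["tools"]:
        if tool["name"] == tool_name:
            return FUNCTIONS[tool["func_name"]](**arguments)
    raise KeyError(f"no tool named {tool_name!r}")


print(call_tool(config, "multiply", a=6, b=7))  # prints 42
```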


Structured Output Tools

Experimental Feature

Structured Output on Tool Agent is currently an experimental feature. The API and functionality may change in future releases.

Tool Agents can be configured to output structured data in JSON format using the structured_output tool type. This is useful when you need to extract specific information in a consistent format or transform unstructured text into structured data.

Configuration Example

{
  "tool_type": "structured_output",
  "name": "structured_output",
  "description": "Extracts structured information about a person from text",
  "output_schema": {
    "type": "object",
    "properties": {
      "name": {
        "type": "string",
        "description": "Full name of the person"
      },
      "age": {
        "type": "integer",
        "description": "Age of the person"
      },
      "occupation": {
        "type": "string",
        "description": "Person's job or profession"
      }
    },
    "required": ["name"]
  }
}

Sample Output

Input: "John Doe is a 35-year-old software engineer"

Output:

{
  "name": "John Doe",
  "age": 35,
  "occupation": "software engineer"
}
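
Because the feature is experimental, it can be worth checking the returned object against the schema on the caller's side. The following is a minimal hand-rolled check that covers only `required` and the basic `type` keyword; it is not a full JSON Schema validator, and `TYPE_MAP` and `check_output` are hypothetical names.

```python
# Maps JSON Schema type names to Python types (only the scalar types used here).
TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_output(output: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the output passed."""
    errors = [f"missing required field: {name}"
              for name in schema.get("required", []) if name not in output]
    for name, value in output.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue  # extra fields are ignored in this sketch
        expected = TYPE_MAP.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for numeric fields
        bool_as_number = isinstance(value, bool) and spec.get("type") != "boolean"
        if expected and (not isinstance(value, expected) or bool_as_number):
            errors.append(f"{name}: expected {spec['type']}")
    return errors


person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "occupation": {"type": "string"},
    },
    "required": ["name"],
}

sample = {"name": "John Doe", "age": 35, "occupation": "software engineer"}
print(check_output(sample, person_schema))  # prints []
```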

Usage Examples

  1. Sample: Product information

    {
      "tool_type": "structured_output",
      "name": "structured_output",
      "description": "Extracts product information from text",
      "output_schema": {
        "type": "object",
        "properties": {
          "product_name": { "type": "string" },
          "price": { "type": "number" },
          "currency": { "type": "string" },
          "in_stock": { "type": "boolean" }
        },
        "required": ["product_name", "price"]
      }
    }
  2. Sample: Event details with nested schemas

    {
      "tool_type": "structured_output",
      "name": "structured_output",
      "description": "Extracts event information from text",
      "output_schema": {
        "type": "object",
        "properties": {
          "event_name": { "type": "string" },
          "date": { "type": "string", "format": "date" },
          "location": { "type": "string" },
          "attendees": {
            "type": "array",
            "items": { "type": "string" }
          }
        },
        "required": ["event_name", "date"]
      }
    }
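
The event schema above uses two keywords that a caller may want to verify on the returned data: `"format": "date"` and an array with string `items`. Many JSON Schema validators treat `format` as advisory, so the sketch below checks it explicitly. It assumes dates arrive as `YYYY-MM-DD`; `is_full_date` and `check_event` are hypothetical helper names.

```python
import re
from datetime import date


def is_full_date(value: str) -> bool:
    """True for a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2024-02-30
    except ValueError:
        return False
    return True


def check_event(event: dict) -> list[str]:
    """Check the date format and the attendees array of an extracted event."""
    errors = []
    if not is_full_date(str(event.get("date", ""))):
        errors.append("date: expected YYYY-MM-DD")
    attendees = event.get("attendees", [])
    if not isinstance(attendees, list) or not all(isinstance(a, str) for a in attendees):
        errors.append("attendees: expected a list of strings")
    return errors


print(check_event({"event_name": "Launch", "date": "2024-06-01", "attendees": ["Ana", "Raj"]}))
```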

Working with Images

Tool Agents support processing images alongside text inputs. Images can be provided in three ways:

  1. URL-based images:

    {
      "content": [
        {
          "type": "input_text",
          "text": "What's in this image?"
        },
        {
          "type": "input_image",
          "image_url": "https://example.com/image.jpg"
        }
      ],
      "role": "user"
    }
  2. S3 file path (Experimental):

    {
      "content": [
        {
          "type": "input_text",
          "text": "Analyze this image"
        },
        {
          "type": "input_image",
          "image_path": "s3://bucket-name/path/to/image.jpg"
        }
      ],
      "role": "user"
    }
  3. Base64-encoded images:

    {
      "content": [
        {
          "type": "input_text",
          "text": "Describe this image"
        },
        {
          "type": "input_image",
          "image_bytes": "<base64-encoded-image>"
        }
      ],
      "role": "user"
    }

Best Practices for Images

  • Ensure images are in common formats (JPG, PNG)
  • Keep image sizes reasonable to avoid timeouts
  • Include clear text prompts to guide the analysis
  • Consider rate limits and costs when processing multiple images
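
The base64 variant of the message shown earlier in this section can be assembled with the Python standard library. This is a sketch under one assumption: that `image_bytes` expects a plain base64 string with no data-URI prefix. `image_message` is a hypothetical helper name.

```python
import base64


def image_message(prompt: str, image_bytes: bytes) -> dict:
    """Build a user message with a text prompt and one base64-encoded image."""
    # Assumption: the field takes plain base64 text, not a "data:" URI.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "content": [
            {"type": "input_text", "text": prompt},
            {"type": "input_image", "image_bytes": encoded},
        ],
        "role": "user",
    }


# In practice the bytes would come from a file, for example:
#   with open("photo.png", "rb") as f: raw = f.read()
raw = b"\x89PNG placeholder bytes"
message = image_message("Describe this image", raw)
print(message["content"][1]["image_bytes"])
```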

Best Practices

  1. System Prompts

    • Be specific about the agent's role
    • Define clear boundaries
    • Include error handling instructions
  2. Tool Selection

    • Choose tools that complement each other
    • Provide clear tool descriptions
    • Limit tools to necessary ones only
  3. Error Handling

    • Include retry logic
    • Provide fallback options
    • Handle tool failures gracefully
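
The error handling points above (retry logic, fallback options, graceful failure) can be combined in a small wrapper around a tool call. This is a generic sketch on the caller's side, not a Tool Agent feature; `call_with_retry` and its parameters are hypothetical.

```python
import time
from typing import Callable


def call_with_retry(tool: Callable[[], object], retries: int = 3,
                    delay_seconds: float = 0.0, fallback: object = None) -> object:
    """Call the tool up to `retries` times, then return the fallback value."""
    for attempt in range(retries):
        try:
            return tool()
        except Exception:
            # Wait longer after each failure (exponential backoff), except after the last.
            if attempt < retries - 1:
                time.sleep(delay_seconds * (2 ** attempt))
    return fallback  # every attempt failed: degrade instead of raising


calls = {"n": 0}


def flaky() -> str:
    """Fails twice, then succeeds, to imitate a temporary outage."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


print(call_with_retry(flaky))  # prints ok
```

Catching every `Exception` keeps the sketch short; a real caller would usually retry only on errors known to be transient.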

Best Practices for Structured Output

  1. Schema Design

    • Keep schemas focused and specific
    • Use clear property names
    • Include descriptions for complex fields
    • Mark required fields appropriately
  2. Data Validation

    • Use appropriate data types
    • Include format specifications when needed
    • Consider adding enum values for restricted choices
  3. Error Handling

    • Provide fallback values for optional fields
    • Handle missing or invalid data gracefully
    • Include validation messages in the schema
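
Two of the points above, enum values for restricted choices and fallback values for optional fields, can be illustrated together. The `enum` keyword is standard JSON Schema; the separate `fallbacks` mapping and the `apply_fallbacks` function are assumptions of this sketch, applied by the caller after the agent responds.

```python
def apply_fallbacks(output: dict, schema: dict, fallbacks: dict) -> dict:
    """Fill missing or enum-violating optional fields; fail on required ones."""
    result = dict(output)
    required = schema.get("required", [])
    for name, spec in schema["properties"].items():
        allowed = spec.get("enum")
        invalid = allowed is not None and result.get(name) not in allowed
        if name not in result or invalid:
            if name in required:
                raise ValueError(f"required field missing or invalid: {name}")
            result[name] = fallbacks.get(name)
    return result


product_schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "currency": {"type": "string", "enum": ["USD", "EUR"]},
        "in_stock": {"type": "boolean"},
    },
    "required": ["product_name"],
}

raw = {"product_name": "Lamp", "currency": "dollars"}
print(apply_fallbacks(raw, product_schema, {"currency": "USD", "in_stock": False}))
```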

Limitations

  • The tool execution timeout is fixed and cannot be changed through configuration
  • Limited to predefined tools (multiply, semantic_search, mcp, structured_output)

Future Enhancements

  • Additional tool types planned
  • Enhanced memory capabilities
  • Dynamic tool loading