Understanding Context Enrichment Actions

This document provides an overview of the Context Enrichment actions available in Knowledge Enrichment and explains how each works, with examples.

Introduction

The Context Enrichment API provides various actions to extract insights, generate metadata, and create vector representations from content. Actions are available for both images and text documents; which ones apply depends on the content type.

Image Actions

Image Description

The Image Description action analyzes an image and generates a textual description of its contents.

How it works:

  • The API uses AI models to identify objects, scenes, and activities in the image
  • It synthesizes these elements into a coherent description
  • The result is returned as a natural language text string

Example:

"A blue Honda CR-V SUV with visible damage to the front bumper parked in a driveway."

Image Classification

Image Classification categorizes an image into one or more predefined classes.

How it works:

  • You provide at least two classification classes (categories)
  • The AI model analyzes the image content and determines the best matching class
  • The result is the name of the matching classification

Example: If you provide classes like "damaged_vehicle", "undamaged_vehicle", and "not_a_vehicle", the API might return:

"damaged_vehicle"

Image Embeddings

Image Embeddings convert visual information into a high-dimensional vector representation.

How it works:

  • The image is processed through an AI model designed to extract visual features
  • The model converts these features into a dense vector (typically 512-1024 dimensions)
  • These vectors place visually similar images closer together in the vector space
  • The result is an array of floating-point numbers

Example:

[0.021, -0.065, 0.127, 0.036, -0.198, ... ]

These embeddings can be used for:

  • Finding visually similar images
  • Building image search systems
  • Clustering similar images together
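
To make "closer together in the vector space" concrete, here is a minimal, dependency-free Python sketch of cosine similarity between embedding vectors. The short four-dimensional vectors are made up for readability; real embeddings have hundreds of dimensions:

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real image embeddings.
suv_photo = [0.021, -0.065, 0.127, 0.036]
similar_suv = [0.025, -0.060, 0.120, 0.040]
unrelated = [-0.190, 0.210, -0.005, -0.140]

print(cosine_similarity(suv_photo, similar_suv))  # close to 1.0
print(cosine_similarity(suv_photo, unrelated))    # much lower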

Named Entity Recognition (Image)

This action identifies specific entities visible in images, such as people, organizations, and locations.

How it works:

  • The model analyzes the image to detect text and visual entities
  • It categorizes detected entities into predefined types
  • The result is a structured object containing entity types and values

Example:

{
  "organization": ["Honda"],
  "product": ["CR-V"],
  "person": [],
  "location": ["driveway"],
  "object": ["car", "bumper"]
}

Image Metadata Generation

This action creates structured metadata about the image based on its contents and any provided example metadata.

How it works:

  • You can provide example metadata structure to guide the generation
  • The model analyzes the image and extracts relevant information
  • It structures the information following your metadata templates
  • The result is a structured JSON object containing the metadata

Example:

{
  "car_metadata": {
    "manufacturer": "Honda",
    "model": "CR-V",
    "color": "blue",
    "damage_identified": {
      "car_part": "bumper",
      "damage_type": "cracked",
      "damage_severity": "mild"
    }
  }
}

Text Actions

Text Summarization

Text Summarization condenses long documents into brief summaries that capture key information.

How it works:

  • The model analyzes the document to identify main topics and important information
  • It generates a concise summary highlighting the essential points
  • The result is a natural language summary text

Example:

"This policy document outlines Hyland's employee conduct guidelines, including confidentiality requirements, acceptable use of company resources, and disciplinary procedures for violations."

Text Classification

Similar to image classification, this action categorizes text into predefined classes.

How it works:

  • You provide at least two classification classes
  • The model analyzes the text content and determines the best matching class
  • The result is the name of the matching classification

Example: If you provide classes like "policy_document", "technical_manual", and "marketing_material", the API might return:

"policy_document"

Text Embeddings

Text Embeddings convert text into numerical vector representations that capture semantic meaning.

How it works:

  • The text is processed through language models that understand context and meaning
  • The model generates vectors where semantically similar texts are closer together
  • The result is an array of floating-point numbers (typically 768-1536 dimensions)

Example:

[0.041, 0.082, -0.153, 0.027, 0.194, ... ]

Text embeddings enable:

  • Semantic search capabilities
  • Document similarity comparison
  • Content recommendation systems
  • Clustering similar documents

Named Entity Recognition (Text)

This action identifies and categorizes named entities mentioned in text documents.

How it works:

  • The model processes the text to identify entities like people, organizations, locations
  • It categorizes each entity into predefined types
  • The result is a structured object containing entity types and values

Example:

{
  "person": ["John Smith", "Jane Doe"],
  "organization": ["Hyland Software", "HR Department"],
  "date": ["2023-06-15", "January 1, 2024"],
  "location": ["Westlake, OH"]
}

Text Metadata Generation

This action creates structured metadata for text documents, particularly useful for PDFs and long-form content.

How it works:

  • You can provide example metadata structures to guide generation
  • The model analyzes the document content to extract relevant information
  • It organizes the information according to your metadata template
  • The result is a structured JSON object whose keys follow a flat namespace:field convention, as shown below

Example:

{
  "document:title": "2024 Employee Handbook",
  "document:date": "2024-01-15",
  "document:type": "policy",
  "document:category": "Human Resources",
  "entity:company": "Hyland Software, Inc.",
  "entity:organization": "HR Department",
  "keywords:tags": "policy|procedures|guidelines|employees",
  "summary:text": "This handbook outlines company policies and procedures for all employees."
}
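
Because the keys follow a flat namespace:field convention, they are easy to regroup into a nested structure. A minimal Python sketch, assuming only the key format shown in the example above:

from collections import defaultdict

flat_metadata = {
    "document:title": "2024 Employee Handbook",
    "document:type": "policy",
    "entity:company": "Hyland Software, Inc.",
    "keywords:tags": "policy|procedures|guidelines|employees",
}

# Regroup "namespace:field" keys into nested dicts, and split the
# pipe-delimited tags into a proper list.
nested = defaultdict(dict)
for key, value in flat_metadata.items():
    namespace, field = key.split(":", 1)
    nested[namespace][field] = value
nested["keywords"]["tags"] = nested["keywords"]["tags"].split("|")

print(dict(nested))
# {'document': {'title': '2024 Employee Handbook', 'type': 'policy'},
#  'entity': {'company': 'Hyland Software, Inc.'},
#  'keywords': {'tags': ['policy', 'procedures', 'guidelines', 'employees']}}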

Working with Embeddings

Embeddings are particularly powerful for building intelligent applications:

What are embeddings?

  • Vector representations that encode semantic meaning of content
  • High-dimensional arrays of floating-point numbers
  • Similar content has similar vector representations

How to use embeddings effectively:

  1. Vector storage: Store embeddings in vector databases like Pinecone, Weaviate, or FAISS
  2. Similarity search: Find related content by computing vector similarity, typically cosine similarity (see the sketch after this list)
  3. Clustering: Group similar content by clustering vectors
  4. Recommendation systems: Recommend similar content based on vector proximity
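
Here is a minimal similarity-search sketch for step 2. It uses numpy (an assumption; any linear-algebra library works) and an in-memory array in place of a real vector database:

import numpy as np

def top_k_similar(query: np.ndarray, library: np.ndarray, k: int = 3) -> list[int]:
    # Cosine similarity of the query against every stored embedding,
    # computed as a normalized dot product in one vectorized step.
    library_norm = library / np.linalg.norm(library, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    scores = library_norm @ query_norm
    # Indices of the k highest-scoring items, best match first.
    return list(np.argsort(scores)[::-1][:k])

# Toy stand-ins for embeddings returned by the API.
library = np.random.rand(100, 768)
query = np.random.rand(768)
print(top_k_similar(query, library))  # e.g. [41, 7, 93]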

Example application flow:

  1. Generate embeddings for your content library using the API
  2. Store both content and embeddings in your database
  3. When a user searches or views content, find similar items by comparing embeddings
  4. Display semantically related content to enhance user experience

By combining these different enrichment actions, you can build sophisticated knowledge management systems that understand content at a deeper level than traditional keyword-based approaches.

Working with Text Summarization

Text summarization is a powerful tool for managing large volumes of content:

Why use text summarization?

  • Quickly extract key information from lengthy documents
  • Improve content discovery and browsing experiences
  • Create document previews for users
  • Generate metadata for document indexing

Best practices for summarization:

  1. Consider document length: Longer documents often benefit from more detailed summaries
  2. Use in combination with classification: Classify documents first to provide context for better summaries
  3. Validate summary quality: Periodically review summaries to ensure they capture key information
  4. Consider multi-stage summarization: For very large documents, you may want to summarize sections first, then combine them
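
A sketch of the multi-stage approach from tip 4. The summarize() helper is a hypothetical stand-in for a call to the Text Summarization action; the chunk-then-combine logic is the part being illustrated:

def summarize(text: str) -> str:
    # Hypothetical stand-in for a Text Summarization API call.
    return text[:80] + "..."

def summarize_large_document(text: str, chunk_size: int = 4000) -> str:
    # Stage 1: split the document into chunks and summarize each chunk.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Stage 2: summarize the concatenated partial summaries.
    return summarize("\n".join(partial_summaries))

print(summarize_large_document("Section 1: Confidentiality. " * 1000))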

Example application flow:

  1. Upload a collection of large PDF documents to your system
  2. For each document, generate a summary using the Text Summarization API
  3. Store the summaries alongside the documents
  4. Present the summaries in your UI to help users quickly identify relevant content
  5. Only load the full document when a user decides to engage with it

Working with Named Entity Recognition

Named Entity Recognition (NER) helps extract structured information from unstructured content:

How to leverage NER effectively:

  • Build advanced search capabilities by entity type
  • Create entity-based navigation systems
  • Generate metadata automatically
  • Identify relationships between entities

Common entity types:

  1. People: Names of individuals
  2. Organizations: Companies, agencies, institutions
  3. Locations: Countries, cities, addresses
  4. Dates/Times: Temporal references
  5. Product names: Names of products or services
  6. Monetary values: Currency amounts
  7. Percentages: Numerical percentages

Example application flow:

  1. Process a document collection with the NER API
  2. Index all extracted entities in a searchable database (see the sketch after this list)
  3. Create entity graphs showing relationships between documents
  4. Enable faceted search by entity type
  5. Highlight entities within document viewers
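
A minimal sketch of steps 2 and 4: index NER output into an in-memory inverted index that supports faceted lookup by entity type (a real system would use a search engine or database rather than a dict):

from collections import defaultdict

# Inverted index: (entity_type, value) -> set of document IDs.
entity_index = defaultdict(set)

def index_document(doc_id: str, ner_result: dict[str, list[str]]) -> None:
    for entity_type, values in ner_result.items():
        for value in values:
            entity_index[(entity_type, value)].add(doc_id)

index_document("handbook-2024", {
    "person": ["John Smith", "Jane Doe"],
    "organization": ["Hyland Software", "HR Department"],
})

# Faceted lookup: which documents mention this organization?
print(entity_index[("organization", "Hyland Software")])  # {'handbook-2024'}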

Working with Classification

Classification helps organize and structure content collections:

Strategic uses of classification:

  • Automatically route documents to appropriate departments or workflows
  • Create taxonomies for better content organization
  • Filter content by category for improved findability
  • Identify outliers or misclassified content

Tips for effective classification:

  1. Define clear categories: Ensure classes are mutually exclusive and collectively exhaustive
  2. Use hierarchical classifications: Start broad and get more specific
  3. Include "Other" category: Allow for content that doesn't fit established categories
  4. Consider multi-label classification: Some content may belong to multiple categories
  5. Periodically review and refine: Classification schemes should evolve as content changes

Example application flow:

  1. Define a set of business-relevant categories
  2. Use the Classification API to automatically categorize incoming documents (see the routing sketch after this list)
  3. Apply category tags to content for filtering and organization
  4. Monitor classification accuracy and refine categories as needed
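
A minimal routing sketch for step 2: map each class returned by the API onto a downstream queue, with a catch-all for content that fits no category (tip 3 above). The classify() helper is a hypothetical stand-in for the Classification action:

def classify(text: str, classes: list[str]) -> str:
    # Hypothetical stand-in for a Text Classification API call.
    return classes[0]

ROUTES = {
    "policy_document": "hr-review-queue",
    "technical_manual": "docs-team-queue",
    "marketing_material": "marketing-queue",
}

def route_document(text: str) -> str:
    label = classify(text, list(ROUTES) + ["other"])
    # Fall back to manual triage for unrecognized content.
    return ROUTES.get(label, "manual-triage-queue")

print(route_document("Employees must keep credentials confidential..."))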

Working with Metadata Generation

Metadata Generation creates structured information that powers intelligent content systems:

Benefits of automated metadata:

  • Consistent metadata application across large content collections
  • Discovery of hidden attributes and relationships
  • Enhanced search and filtering capabilities
  • Reduced manual tagging requirements

Metadata strategy best practices:

  1. Define a metadata schema: Create a consistent structure for your metadata
  2. Provide examples: Use example metadata to guide generation
  3. Combine with other enrichments: Use classification, NER, and summarization to inform metadata
  4. Validate and enhance: Use AI-generated metadata as a starting point, then refine

Example application flow:

  1. Define a metadata schema relevant to your business needs
  2. Process content through the Metadata Generation API with example templates
  3. Store structured metadata alongside content
  4. Use metadata fields to power search, filtering, and recommendations
  5. Allow for human review and enhancement of generated metadata
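
A sketch of the review step (step 5): diff generated metadata against the expected schema keys so reviewers only see gaps and surprises. The expected keys mirror the example output shown earlier:

EXPECTED_KEYS = {"document:title", "document:date", "document:type",
                 "document:category", "keywords:tags", "summary:text"}

generated = {
    "document:title": "2024 Employee Handbook",
    "document:date": "2024-01-15",
    "keywords:tags": "policy|procedures|guidelines|employees",
}

missing = EXPECTED_KEYS - generated.keys()
unexpected = generated.keys() - EXPECTED_KEYS

print("fields to fill in by hand:", sorted(missing))
print("fields to double-check:", sorted(unexpected))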

Combining Context Enrichment Actions

The true power of Context Enrichment comes from combining multiple actions:

Powerful action combinations:

  1. Classification + Summarization: Customize summaries based on content type
  2. NER + Metadata Generation: Use identified entities to populate metadata fields
  3. Embeddings + Classification: Build specialized vector spaces for different content categories
  4. Image Description + Text Summarization: Create unified representations of multimedia content

Example integrated workflow:

  1. Content enters your system
  2. Classification determines content type and processing path
  3. NER extracts structured entities
  4. Metadata Generation creates a rich metadata profile
  5. Summarization generates a concise description
  6. Embeddings enable similarity-based retrieval
  7. All enrichments are stored alongside the original content
  8. User interfaces leverage this enriched context for improved experiences
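
A compact sketch of the workflow above as a single enrichment pass. Each one-line helper is a hypothetical stand-in for the corresponding API action; the point is the shape of the record stored in step 7:

# Hypothetical stand-ins for the individual enrichment actions.
def classify(text, classes): return classes[0]
def extract_entities(text): return {"organization": ["Hyland Software"]}
def generate_metadata(text): return {"document:type": "policy"}
def summarize(text): return text[:60] + "..."
def embed(text): return [0.041, 0.082, -0.153]

def enrich_document(doc_id: str, text: str) -> dict:
    # One pass through steps 2-6; the result is stored with the content.
    return {
        "id": doc_id,
        "content": text,
        "class": classify(text, ["policy_document", "technical_manual", "other"]),
        "entities": extract_entities(text),
        "metadata": generate_metadata(text),
        "summary": summarize(text),
        "embedding": embed(text),
    }

print(enrich_document("doc-1", "This policy document outlines employee conduct..."))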

By strategically combining these enrichment actions, you can create content systems that truly understand the meaning and context of your information assets.

Building AI Agents with Context Enrichment

Context enrichment actions provide the foundational capabilities needed to develop sophisticated AI agents:

What is an AI Agent?

  • An autonomous system that perceives its environment, makes decisions, and takes actions
  • Uses AI models to understand context and generate appropriate responses
  • Capable of reasoning, planning, and adapting to new situations
  • Can interact with both human users and other systems

How Context Enrichment Powers AI Agents:

  • Enhanced Perception: Image and text understanding capabilities help agents perceive their environment
  • Knowledge Extraction: NER and metadata generation help agents extract structured knowledge
  • Memory Systems: Embeddings enable efficient storage and retrieval of information
  • Reasoning Frameworks: Classification and summarization support decision-making processes

Key Agent Capabilities Enabled by Context Enrichment:

  1. Document Understanding

    • Agents can process and understand complex documents using text enrichment actions
    • Extracted entities and metadata become part of the agent's knowledge base
    • Summaries allow agents to quickly grasp document content
  2. Visual Processing

    • Image description provides agents with "vision" capabilities
    • Image classification helps agents categorize visual information
    • Image-based NER extracts text and other entities from visual content
  3. Knowledge Representation

    • Embeddings create a semantic space for the agent's knowledge
    • Classification provides taxonomic structure to information
    • Metadata generation creates structured knowledge representations
  4. Contextual Memory

    • Embeddings enable similarity-based memory retrieval
    • Classification helps segment memory into relevant domains
    • Metadata provides structured memory indexing

Building an Agent: Example Architecture

  1. Input Processing Layer

    • Uses image and text enrichment actions to understand user inputs
    • Generates embeddings of user queries for context matching
  2. Knowledge Base

    • Stores enriched content from documents and images
    • Organizes information using classifications and metadata
    • Uses embedding vectors for similarity-based retrieval
  3. Reasoning Engine

    • Combines retrieved knowledge with user context
    • Uses classification to determine appropriate response strategies
    • Leverages summary generation for concise outputs
  4. Action Generation

    • Produces responses based on processed information
    • Maintains context through persistent memory of interactions
    • Continuously learns from new interactions

Implementation Example: Document Assistant Agent

  1. User uploads a complex contract document

  2. Agent processes the document using:

    • Text Classification to identify document type
    • NER to extract parties, dates, and key terms
    • Metadata Generation to create structured representation
    • Summarization to create an overview
    • Embeddings to enable semantic search
  3. User asks questions about the contract

  4. Agent:

    • Converts question to embeddings
    • Retrieves relevant sections using similarity search
    • Uses extracted metadata to provide structured answers
    • Generates summaries of complex clauses when needed
    • Maintains conversation context through session embeddings
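
A minimal sketch of the retrieval loop in step 4: embed the question, score stored section embeddings by cosine similarity, and pass the best-matching sections to the answer step. The embed() helper is a hypothetical stand-in for the Text Embeddings action:

import math

def embed(text: str) -> list[float]:
    # Hypothetical stand-in for a Text Embeddings API call; it just hashes
    # characters into a tiny fixed-size vector for the sake of the example.
    vec = [0.0] * 8
    for i, ch in enumerate(text):
        vec[i % 8] += ord(ch) / 1000.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Section embeddings computed when the contract was first processed.
sections = {
    "termination clause": embed("Either party may terminate this agreement..."),
    "payment terms": embed("Invoices are due within 30 days of receipt..."),
}

def retrieve(question: str, k: int = 1) -> list[str]:
    q = embed(question)
    ranked = sorted(sections, key=lambda name: cosine(q, sections[name]), reverse=True)
    return ranked[:k]

print(retrieve("When do invoices have to be paid?"))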

By combining context enrichment actions with agent architectures, you can create AI systems that not only understand content but can also reason about it and take appropriate actions based on user needs.