Understanding Context Enrichment Actions

This document provides an overview of the Context Enrichment actions available in Knowledge Enrichment and explains how each works, with examples.

Introduction

The Context Enrichment API provides various actions to extract insights, generate metadata, and create vector representations from content. Actions are available for both images and text documents; which ones apply depends on the content type.

Image Actions

Image Description

The Image Description action analyzes an image and generates a textual description of its contents.

How it works:

  • The API uses AI models to identify objects, scenes, and activities in the image
  • It synthesizes these elements into a coherent description
  • The result is returned as a natural language text string

Example:

"A blue Honda CR-V SUV with visible damage to the front bumper parked in a driveway."

Image Classification

Image Classification categorizes an image into one or more predefined classes.

How it works:

  • You provide at least two classification classes (categories)
  • The AI model analyzes the image content and determines the best matching class
  • The result is the name of the matching classification

Example: If you provide classes like "damaged_vehicle", "undamaged_vehicle", and "not_a_vehicle", the API might return:

"damaged_vehicle"

Image Embeddings

Image Embeddings convert visual information into a high-dimensional vector representation.

How it works:

  • The image is processed through an AI model designed to extract visual features
  • The model converts these features into a dense vector (typically 512-1024 dimensions)
  • These vectors place visually similar images closer together in the vector space
  • The result is an array of floating-point numbers

Example:

[0.021, -0.065, 0.127, 0.036, -0.198, ... ]

These embeddings can be used for:

  • Finding visually similar images
  • Building image search systems
  • Clustering similar images together
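
To make "closer together in the vector space" concrete, here is a minimal, dependency-free Python sketch of cosine similarity between embedding vectors. The short four-dimensional vectors are made up for readability; real embeddings have hundreds of dimensions:

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real image embeddings.
suv_photo = [0.021, -0.065, 0.127, 0.036]
similar_suv = [0.025, -0.060, 0.120, 0.040]
unrelated = [-0.190, 0.210, -0.005, -0.140]

print(cosine_similarity(suv_photo, similar_suv))  # close to 1.0
print(cosine_similarity(suv_photo, unrelated))    # much lower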

Named Entity Recognition (Image)

This action identifies specific entities visible in images, such as people, organizations, and locations.

How it works:

  • The model analyzes the image to detect text and visual entities
  • It categorizes detected entities into predefined types
  • The result is a structured object containing entity types and values

Example:

{
  "organization": ["Honda"],
  "product": ["CR-V"],
  "person": [],
  "location": ["driveway"],
  "object": ["car", "bumper"]
}

Image Metadata Generation

This action creates structured metadata about the image based on its contents and any provided example metadata.

How it works:

  • You can provide example metadata structure to guide the generation
  • The model analyzes the image and extracts relevant information
  • It structures the information following your metadata templates
  • The result is a structured JSON object containing the metadata

Example:

{
  "car_metadata": {
    "manufacturer": "Honda",
    "model": "CR-V",
    "color": "blue",
    "damage_identified": {
      "car_part": "bumper",
      "damage_type": "cracked",
      "damage_severity": "mild"
    }
  }
}

Text Actions

Text Summarization

Text Summarization condenses long documents into brief summaries that capture key information.

How it works:

  • The model analyzes the document to identify main topics and important information
  • It generates a concise summary highlighting the essential points
  • The result is a natural language summary text

Example:

"This policy document outlines Hyland's employee conduct guidelines, including confidentiality requirements, acceptable use of company resources, and disciplinary procedures for violations."

Text Classification

Similar to image classification, this action categorizes text into predefined classes.

How it works:

  • You provide at least two classification classes
  • The model analyzes the text content and determines the best matching class
  • The result is the name of the matching classification

Example: If you provide classes like "policy_document", "technical_manual", and "marketing_material", the API might return:

"policy_document"

Text Embeddings

Text Embeddings convert text into numerical vector representations that capture semantic meaning.

How it works:

  • The text is processed through language models that understand context and meaning
  • The model generates vectors where semantically similar texts are closer together
  • The result is an array of floating-point numbers (typically 768-1536 dimensions)

Example:

[0.041, 0.082, -0.153, 0.027, 0.194, ... ]

Text embeddings enable:

  • Semantic search capabilities
  • Document similarity comparison
  • Content recommendation systems
  • Clustering similar documents

Named Entity Recognition (Text)

This action identifies and categorizes named entities mentioned in text documents.

How it works:

  • The model processes the text to identify entities like people, organizations, locations
  • It categorizes each entity into predefined types
  • The result is a structured object containing entity types and values

Example:

{
  "person": ["John Smith", "Jane Doe"],
  "organization": ["Hyland Software", "HR Department"],
  "date": ["2023-06-15", "January 1, 2024"],
  "location": ["Westlake, OH"]
}

Text Metadata Generation

This action creates structured metadata for text documents, particularly useful for PDFs and long-form content.

How it works:

  • You can provide example metadata structures to guide generation
  • The model analyzes the document content to extract relevant information
  • It organizes the information according to your metadata template
  • The result is a structured JSON object whose keys follow a flat namespace:field convention, as shown below

Example:

{
  "document:title": "2024 Employee Handbook",
  "document:date": "2024-01-15",
  "document:type": "policy",
  "document:category": "Human Resources",
  "entity:company": "Hyland Software, Inc.",
  "entity:organization": "HR Department",
  "keywords:tags": "policy|procedures|guidelines|employees",
  "summary:text": "This handbook outlines company policies and procedures for all employees."
}
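
Because the keys follow a flat namespace:field convention, they are easy to regroup into a nested structure. A minimal Python sketch, assuming only the key format shown in the example above:

from collections import defaultdict

flat_metadata = {
    "document:title": "2024 Employee Handbook",
    "document:type": "policy",
    "entity:company": "Hyland Software, Inc.",
    "keywords:tags": "policy|procedures|guidelines|employees",
}

# Regroup "namespace:field" keys into nested dicts, and split the
# pipe-delimited tags into a proper list.
nested = defaultdict(dict)
for key, value in flat_metadata.items():
    namespace, field = key.split(":", 1)
    nested[namespace][field] = value
nested["keywords"]["tags"] = nested["keywords"]["tags"].split("|")

print(dict(nested))
# {'document': {'title': '2024 Employee Handbook', 'type': 'policy'},
#  'entity': {'company': 'Hyland Software, Inc.'},
#  'keywords': {'tags': ['policy', 'procedures', 'guidelines', 'employees']}}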

Working with Embeddings

Embeddings are particularly powerful for building intelligent applications:

What are embeddings?

  • Vector representations that encode semantic meaning of content
  • High-dimensional arrays of floating-point numbers
  • Similar content has similar vector representations

How to use embeddings effectively:

  1. Vector storage: Store embeddings in vector databases like Pinecone, Weaviate, or FAISS
  2. Similarity search: Find related content by computing vector similarity, typically cosine similarity (see the sketch after this list)
  3. Clustering: Group similar content by clustering vectors
  4. Recommendation systems: Recommend similar content based on vector proximity
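
Here is a minimal similarity-search sketch for step 2. It uses numpy (an assumption; any linear-algebra library works) and an in-memory array in place of a real vector database:

import numpy as np

def top_k_similar(query: np.ndarray, library: np.ndarray, k: int = 3) -> list[int]:
    # Cosine similarity of the query against every stored embedding,
    # computed as a normalized dot product in one vectorized step.
    library_norm = library / np.linalg.norm(library, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    scores = library_norm @ query_norm
    # Indices of the k highest-scoring items, best match first.
    return list(np.argsort(scores)[::-1][:k])

# Toy stand-ins for embeddings returned by the API.
library = np.random.rand(100, 768)
query = np.random.rand(768)
print(top_k_similar(query, library))  # e.g. [41, 7, 93]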

Example application flow:

  1. Generate embeddings for your content library using the API
  2. Store both content and embeddings in your database
  3. When a user searches or views content, find similar items by comparing embeddings
  4. Display semantically related content to enhance user experience

By combining these different enrichment actions, you can build sophisticated knowledge management systems that understand content at a deeper level than traditional keyword-based approaches.

Working with Text Summarization

Text summarization is a powerful tool for managing large volumes of content:

Why use text summarization?

  • Quickly extract key information from lengthy documents
  • Improve content discovery and browsing experiences
  • Create document previews for users
  • Generate metadata for document indexing

Best practices for summarization:

  1. Consider document length: Longer documents often benefit from more detailed summaries
  2. Use in combination with classification: Classify documents first to provide context for better summaries
  3. Validate summary quality: Periodically review summaries to ensure they capture key information
  4. Consider multi-stage summarization: For very large documents, you may want to summarize sections first, then combine them
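
A sketch of the multi-stage approach from tip 4. The summarize() helper is a hypothetical stand-in for a call to the Text Summarization action; the chunk-then-combine logic is the part being illustrated:

def summarize(text: str) -> str:
    # Hypothetical stand-in for a Text Summarization API call.
    return text[:80] + "..."

def summarize_large_document(text: str, chunk_size: int = 4000) -> str:
    # Stage 1: split the document into chunks and summarize each chunk.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Stage 2: summarize the concatenated partial summaries.
    return summarize("\n".join(partial_summaries))

print(summarize_large_document("Section 1: Confidentiality. " * 1000))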

Example application flow:

  1. Upload a collection of large PDF documents to your system
  2. For each document, generate a summary using the Text Summarization API
  3. Store the summaries alongside the documents
  4. Present the summaries in your UI to help users quickly identify relevant content
  5. Only load the full document when a user decides to engage with it

Working with Named Entity Recognition

Named Entity Recognition (NER) helps extract structured information from unstructured content:

How to leverage NER effectively:

  • Build advanced search capabilities by entity type
  • Create entity-based navigation systems
  • Generate metadata automatically
  • Identify relationships between entities

Common entity types:

  1. People: Names of individuals
  2. Organizations: Companies, agencies, institutions
  3. Locations: Countries, cities, addresses
  4. Dates/Times: Temporal references
  5. Product names: Names of products or services
  6. Monetary values: Currency amounts
  7. Percentages: Numerical percentages

Example application flow:

  1. Process a document collection with the NER API
  2. Index all extracted entities in a searchable database (see the sketch after this list)
  3. Create entity graphs showing relationships between documents
  4. Enable faceted search by entity type
  5. Highlight entities within document viewers
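
A minimal sketch of steps 2 and 4: index NER output into an in-memory inverted index that supports faceted lookup by entity type (a real system would use a search engine or database rather than a dict):

from collections import defaultdict

# Inverted index: (entity_type, value) -> set of document IDs.
entity_index = defaultdict(set)

def index_document(doc_id: str, ner_result: dict[str, list[str]]) -> None:
    for entity_type, values in ner_result.items():
        for value in values:
            entity_index[(entity_type, value)].add(doc_id)

index_document("handbook-2024", {
    "person": ["John Smith", "Jane Doe"],
    "organization": ["Hyland Software", "HR Department"],
})

# Faceted lookup: which documents mention this organization?
print(entity_index[("organization", "Hyland Software")])  # {'handbook-2024'}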

Working with Classification

Classification helps organize and structure content collections:

Strategic uses of classification:

  • Automatically route documents to appropriate departments or workflows
  • Create taxonomies for better content organization
  • Filter content by category for improved findability
  • Identify outliers or misclassified content

Tips for effective classification:

  1. Define clear categories: Ensure classes are mutually exclusive and collectively exhaustive
  2. Use hierarchical classifications: Start broad and get more specific
  3. Include "Other" category: Allow for content that doesn't fit established categories
  4. Consider multi-label classification: Some content may belong to multiple categories
  5. Periodically review and refine: Classification schemes should evolve as content changes

Example application flow:

  1. Define a set of business-relevant categories
  2. Use the Classification API to automatically categorize incoming documents (see the routing sketch after this list)
  3. Apply category tags to content for filtering and organization
  4. Monitor classification accuracy and refine categories as needed
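
A minimal routing sketch for step 2: map each class returned by the API onto a downstream queue, with a catch-all for content that fits no category (tip 3 above). The classify() helper is a hypothetical stand-in for the Classification action:

def classify(text: str, classes: list[str]) -> str:
    # Hypothetical stand-in for a Text Classification API call.
    return classes[0]

ROUTES = {
    "policy_document": "hr-review-queue",
    "technical_manual": "docs-team-queue",
    "marketing_material": "marketing-queue",
}

def route_document(text: str) -> str:
    label = classify(text, list(ROUTES) + ["other"])
    # Fall back to manual triage for unrecognized content.
    return ROUTES.get(label, "manual-triage-queue")

print(route_document("Employees must keep credentials confidential..."))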

Working with Metadata Generation

Metadata Generation creates structured information that powers intelligent content systems:

Benefits of automated metadata:

  • Consistent metadata application across large content collections
  • Discovery of hidden attributes and relationships
  • Enhanced search and filtering capabilities
  • Reduced manual tagging requirements

Metadata strategy best practices:

  1. Define a metadata schema: Create a consistent structure for your metadata
  2. Provide examples: Use example metadata to guide generation
  3. Combine with other enrichments: Use classification, NER, and summarization to inform metadata
  4. Validate and enhance: Use AI-generated metadata as a starting point, then refine

Example application flow:

  1. Define a metadata schema relevant to your business needs
  2. Process content through the Metadata Generation API with example templates
  3. Store structured metadata alongside content
  4. Use metadata fields to power search, filtering, and recommendations
  5. Allow for human review and enhancement of generated metadata
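
A sketch of the review step (step 5): diff generated metadata against the expected schema keys so reviewers only see gaps and surprises. The expected keys mirror the example output shown earlier:

EXPECTED_KEYS = {"document:title", "document:date", "document:type",
                 "document:category", "keywords:tags", "summary:text"}

generated = {
    "document:title": "2024 Employee Handbook",
    "document:date": "2024-01-15",
    "keywords:tags": "policy|procedures|guidelines|employees",
}

missing = EXPECTED_KEYS - generated.keys()
unexpected = generated.keys() - EXPECTED_KEYS

print("fields to fill in by hand:", sorted(missing))
print("fields to double-check:", sorted(unexpected))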

Combining Context Enrichment Actions

The true power of Context Enrichment comes from combining multiple actions:

Powerful action combinations:

  1. Classification + Summarization: Customize summaries based on content type
  2. NER + Metadata Generation: Use identified entities to populate metadata fields
  3. Embeddings + Classification: Build specialized vector spaces for different content categories
  4. Image Description + Text Summarization: Create unified representations of multimedia content

Example integrated workflow:

  1. Content enters your system
  2. Classification determines content type and processing path
  3. NER extracts structured entities
  4. Metadata Generation creates a rich metadata profile
  5. Summarization generates a concise description
  6. Embeddings enable similarity-based retrieval
  7. All enrichments are stored alongside the original content
  8. User interfaces leverage this enriched context for improved experiences
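
A compact sketch of the workflow above as a single enrichment pass. Each one-line helper is a hypothetical stand-in for the corresponding API action; the point is the shape of the record stored in step 7:

# Hypothetical stand-ins for the individual enrichment actions.
def classify(text, classes): return classes[0]
def extract_entities(text): return {"organization": ["Hyland Software"]}
def generate_metadata(text): return {"document:type": "policy"}
def summarize(text): return text[:60] + "..."
def embed(text): return [0.041, 0.082, -0.153]

def enrich_document(doc_id: str, text: str) -> dict:
    # One pass through steps 2-6; the result is stored with the content.
    return {
        "id": doc_id,
        "content": text,
        "class": classify(text, ["policy_document", "technical_manual", "other"]),
        "entities": extract_entities(text),
        "metadata": generate_metadata(text),
        "summary": summarize(text),
        "embedding": embed(text),
    }

print(enrich_document("doc-1", "This policy document outlines employee conduct..."))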

By strategically combining these enrichment actions, you can create content systems that truly understand the meaning and context of your information assets.

Building AI Agents with Context Enrichment

Context enrichment actions provide the foundational capabilities needed to develop sophisticated AI agents:

What is an AI Agent?

  • An autonomous system that perceives its environment, makes decisions, and takes actions
  • Uses AI models to understand context and generate appropriate responses
  • Capable of reasoning, planning, and adapting to new situations
  • Can interact with both human users and other systems

How Context Enrichment Powers AI Agents:

  • Enhanced Perception: Image and text understanding capabilities help agents perceive their environment
  • Knowledge Extraction: NER and metadata generation help agents extract structured knowledge
  • Memory Systems: Embeddings enable efficient storage and retrieval of information
  • Reasoning Frameworks: Classification and summarization support decision-making processes

Key Agent Capabilities Enabled by Context Enrichment:

  1. Document Understanding

    • Agents can process and understand complex documents using text enrichment actions
    • Extracted entities and metadata become part of the agent's knowledge base
    • Summaries allow agents to quickly grasp document content
  2. Visual Processing

    • Image description provides agents with "vision" capabilities
    • Image classification helps agents categorize visual information
    • Image-based NER extracts text and other entities from visual content
  3. Knowledge Representation

    • Embeddings create a semantic space for the agent's knowledge
    • Classification provides taxonomic structure to information
    • Metadata generation creates structured knowledge representations
  4. Contextual Memory

    • Embeddings enable similarity-based memory retrieval
    • Classification helps segment memory into relevant domains
    • Metadata provides structured memory indexing

Building an Agent: Example Architecture

  1. Input Processing Layer

    • Uses image and text enrichment actions to understand user inputs
    • Generates embeddings of user queries for context matching
  2. Knowledge Base

    • Stores enriched content from documents and images
    • Organizes information using classifications and metadata
    • Uses embedding vectors for similarity-based retrieval
  3. Reasoning Engine

    • Combines retrieved knowledge with user context
    • Uses classification to determine appropriate response strategies
    • Leverages summary generation for concise outputs
  4. Action Generation

    • Produces responses based on processed information
    • Maintains context through persistent memory of interactions
    • Continuously learns from new interactions

Implementation Example: Document Assistant Agent

  1. User uploads a complex contract document

  2. Agent processes the document using:

    • Text Classification to identify document type
    • NER to extract parties, dates, and key terms
    • Metadata Generation to create structured representation
    • Summarization to create an overview
    • Embeddings to enable semantic search
  3. User asks questions about the contract

  4. Agent:

    • Converts question to embeddings
    • Retrieves relevant sections using similarity search
    • Uses extracted metadata to provide structured answers
    • Generates summaries of complex clauses when needed
    • Maintains conversation context through session embeddings
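
A minimal sketch of the retrieval loop in step 4: embed the question, score stored section embeddings by cosine similarity, and pass the best-matching sections to the answer step. The embed() helper is a hypothetical stand-in for the Text Embeddings action:

import math

def embed(text: str) -> list[float]:
    # Hypothetical stand-in for a Text Embeddings API call; it just hashes
    # characters into a tiny fixed-size vector for the sake of the example.
    vec = [0.0] * 8
    for i, ch in enumerate(text):
        vec[i % 8] += ord(ch) / 1000.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Section embeddings computed when the contract was first processed.
sections = {
    "termination clause": embed("Either party may terminate this agreement..."),
    "payment terms": embed("Invoices are due within 30 days of receipt..."),
}

def retrieve(question: str, k: int = 1) -> list[str]:
    q = embed(question)
    ranked = sorted(sections, key=lambda name: cosine(q, sections[name]), reverse=True)
    return ranked[:k]

print(retrieve("When do invoices have to be paid?"))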

By combining context enrichment actions with agent architectures, you can create AI systems that not only understand content but can also reason about it and take appropriate actions based on user needs.