AI-Ready Content Starts Here - Introducing Knowledge Enrichment

July 7, 2025 · 8 min read

Nabih Metri

Product Manager

AI isn’t failing because models aren’t good enough. It’s failing because the data isn’t.

Today, we’re launching Knowledge Enrichment, a breakthrough API that transforms fragmented, unstructured enterprise content into deeply structured, semantically rich, AI-ready data. Powered by Hyland’s proven Document Filters and AI. It delivers what developers and architects have been missing, clean, contextual, vectorized output across 600+ file types, ready for direct integration into your AI stack.

If you're building AI Agents, retrieval-augmented generation (RAG) systems, or enterprise LLMs, stop what you’re doing. This changes everything.

Why AI Is Only As Good As Its Input

Large language models are powerful, but they work best when the content they see is accurate, complete, and grounded in the original source.

That’s not how most enterprise content looks. It’s buried in PDFs, spreadsheets, presentations, and proprietary formats. It lives across ECMs, network drives, and cloud repositories. And even when you extract it, traditional methods often strip away layout, context, and structure, leaving behind only raw text.

Some teams try to solve this by prompting LLMs to infer structure, meaning, or metadata from that text. But LLMs don’t actually know what was in the file, they’re guessing. Their outputs are based on pattern recognition and probability, not ground truth.

Knowledge Enrichment does the opposite. It pulls structure, semantics, and content directly from the file itself using deterministic extraction, not AI hallucination. Backed by Hyland Document Filters, it knows what was a table, what was a heading, what was a footer, and what content belonged together, with no inference required.

The result is structured outputs you can trust, and clean, contextual content your LLMs don’t have to guess about.

Under the Hood: How Knowledge Enrichment Transforms Content

Knowledge Enrichment is built on two core components:

1. Hyland Document Filters - The Content Intelligence Foundation

At the foundation of Knowledge Enrichment is Hyland Document Filters, the same high-performance SDK trusted by leading security tools, compliance platforms, and search products. It processes over 600 file formats across documents, emails, CAD files, and more, with zero reliance on external services or cloud APIs.

Document Filters doesn’t just extract text. It:

Preserves layout and hierarchy (e.g., columns, tables, headers)
Maintains semantic zones (titles, footnotes, sidebars)
Identifies metadata, annotations, embedded objects, and more

This is the raw substrate Knowledge Enrichment builds on; a consistent, portable, and accurate foundation for AI structuring.

2. Contextual Structuring and Semantic Enrichment

On top of that, Knowledge Enrichment performs deep analysis and transforms content into structured outputs using AI where needed:

Capability	Description
Entity Extraction	Identifies named entities (people, orgs, places) with labeling and context
Table Structuring	Extracts and normalizes tables with headers, rows, and data types
Contextual Chunking	Segments content into meaningfully grouped sections based on structure and semantics
Summarization	Generates document-level summaries
Classification	Tags documents by type, topic, or intent using AI classifiers
Contextual Metadata	Derives rich metadata by interpreting document meaning, layout, and semantics, far beyond basic file properties
Embeddings Generation	Creates vector representations for content, enabling RAG and clustering

These outputs are returned in Markdown or JSON formats and are ready for direct ingestion into:

Vector databases (e.g., Pinecone, Weaviate, FAISS)
Data lakes (e.g., Delta Lake, S3, ADLS)
Data catalogs (e.g., Unity Catalog, Informatica, Collibra)
MLOps pipelines (e.g., LangChain, LlamaIndex, Haystack)
LLMs (e,g., OpenAI, Claude, Llama)

Real-World Workflows: Where Knowledge Enrichment Delivers

Insurance – Claims Intake and Routing

Scanned documents, handwritten notes, and damage photos are all enriched into structured JSON with extracted metadata (claim number, policy ID, repair estimate) and chunked content for routing or summarization. LLMs consume the cleaned data to assist adjusters or validate claims.

Legal – Discovery Preparation and Context Extraction

Contracts, pleadings, and emails are parsed into structured sections with semantic labeling and summaries. The output can be ingested into eDiscovery tools or used to fine-tune LLMs that assist with clause comparison or timeline construction.

Financial Services – RAG-Powered Agent Workflows

Annual reports, prospectuses, and policies are vectorized and embedded with layout-preserved markup. RAG agents can then retrieve precisely the right section for question answering or summarization with full traceability back to the source.

Built for Developers: API Access from Day One

Knowledge Enrichment is designed as a developer-first API. Every feature is exposed via simple REST endpoints.

Example: Convert a file to markdown and get chunked content with embeddings

curl -L 'https://knowledge-enrichment.ai.experience.hyland.com/latest/api/data-curation/presign' \
-H 'Content-Type: application/json' \
-H 'Accept: text/json' \
-H 'Authorization: Bearer <token>' \
-d '{
  "normalization": {
    "quotations": true
  },
  "chunking": true,
  "embedding": true,
}'

The response contains structured content in Markdown, as well as contextually chunked content and embeddings. Each element is aligned by page and coordinate so you can trace insights back to the original document with full fidelity. See more information on this endpoint in the documentation.

Note: The above UI was created to showcase the capabilities of Knowledge Enrichment and is not part of the Knowledge Enrichment product.

Example: Generate a summary and additional metadata of a file

curl -L 'https://cin-context-api.experience.hyland.com/context/api/content/process' \
-H 'Content-Type: application/json' \
-H 'Accept: text/plain' \
-H 'Authorization: Bearer <token>' \
-d '{
  "objectKeys": [
    "string"
  ],
  "actions": [
    "image-description, image-metadata-generation"
  ],
  "kSimilarMetadata": [
    {
  "estimate_details": {
    "job_number": "R-2024-0568",
    "creation_date": "2024-06-15",
    "expiration_date": "2024-07-15",
    "estimate_total": "8,750.00",
    "status": "pending"
  },
  "property": {
    "address": "123 Main Street",
    "city": "Springfield",
    "state": "IL",
    "zip": "62701",
    "year_built": "1995",
    "roof_size_sqft": "2,400"
  },
  "damage_assessment": {
    "damage_cause": "hail_storm",
    "date_of_damage": "2024-05-20",
    "affected_areas": "southwest_slope|ridge_caps|flashing",
    "severity": "moderate",
    "roof_condition": "significantly damaged",
    "potential_cause": "recent storm or wind event",
    "damage_types": "water intrusion|shingle displacement|structural exposure",
    "additional_risk": "potential hidden damage in surrounding roof areas",
    "urgency_level": "moderate to high"
  },
  "repair_scope": {
    "materials": "asphalt_shingles|underlayment|flashing",
    "warranty_period": "15 years",
    "estimated_completion_time": "3 days",
    "repair_recommendations": [
      "replace damaged shingles",
      "inspect and repair flashing",
      "clean gutters"
    ]
  }
}
  ]
}'

The response contains a JSON representation of the information we asked for from the document, including a summary and additional metadata interpreted from the document. See more information on this endpoint in the documentation.

Note: The above UI was created to showcase the capabilities of Knowledge Enrichment and is not part of the Knowledge Enrichment product.

Flexible Deployment Across the Enterprise

Whether you’re building an LLM-powered copilot or modernizing legacy document automation, Knowledge Enrichment unlocks new possibilities.

Popular Patterns:

RAG Pipelines – Enrich large document sets, vectorize, and index
Pre-Training Data Prep – Generate structured corpora from enterprise content
Compliance Automation – Extract key fields for audit, alerting, and validation
LLM Fine-Tuning – Create summaries and entity-tagged data for better models

No Centralization Needed: Enrich Content Where It Lives

Knowledge Enrichment doesn't require content centralization to deliver value. It's built to operate across:

ECM platforms (Hyland, OpenText, SharePoint, Box)
Cloud storage (S3, Azure Blob, Google Cloud Storage)
On-prem document stores (file shares, FTP, local repositories)

Its lightweight API model allows you to point it at content wherever it lives, without migration.

600+ File Formats, 0 Headaches

Enterprise content is messy. That’s why Knowledge Enrichment inherits the full breadth of file support from Document Filters, including:

Documents: DOCX, PDF, PPTX, RTF, ODT, EPUB
Spreadsheets: XLSX, CSV, NUMBERS
Emails: MSG, EML, OLK14
CAD & Engineering: DWG, DXF, DGN
Text and Markup: XML, JSON, HTML
Raster Image: PSD, DCM
Vector Image: INDD, VSDX

Using other services outside of Document Filters, Knowledge Enrichment is able to support additional formats, including:

Images: JPEG, PNG, TIFF
Audio: FLAC, M4A, MP3, WAV
Video: MP4, WebM

Each format is parsed with layout, structure, and content zones preserved, giving AI systems a richer, more accurate context to reason with.

Fuel for Agents, Automation, and the Next Generation of AI

As enterprise AI shifts from model development to agent deployment, the need for structured, explainable, and semantically meaningful content becomes critical. Knowledge Enrichment creates the substrate that enables agents to:

Understand document context, not just isolated facts
Respond based on intent-rich information rather than unstructured noise
Justify answers with traceable references back to source documents

Whether you’re building a smart assistant for internal operations or an external-facing customer support AI, the difference between helpful and harmful will be the data layer underneath.

Get Started: Structure Smarter, Deploy Faster

Knowledge Enrichment is available now as part of the Content Innovation Cloud.

The future of enterprise AI won’t be defined by prompts, it’ll be defined by the data you feed the models. With Knowledge Enrichment, you finally have the ability to deliver content that’s not just extracted, but structured, enriched, and ready for reasoning.

Let your AI start smarter. Let your content speak with context.

Explore the full developer documentation or express interest in getting access.

Why AI Is Only As Good As Its Input​

Under the Hood: How Knowledge Enrichment Transforms Content​

1. Hyland Document Filters - The Content Intelligence Foundation​

2. Contextual Structuring and Semantic Enrichment​

Real-World Workflows: Where Knowledge Enrichment Delivers​

Insurance – Claims Intake and Routing​

Legal – Discovery Preparation and Context Extraction​

Financial Services – RAG-Powered Agent Workflows​

Built for Developers: API Access from Day One​

Example: Convert a file to markdown and get chunked content with embeddings​

Example: Generate a summary and additional metadata of a file​

Flexible Deployment Across the Enterprise​

Popular Patterns:​

No Centralization Needed: Enrich Content Where It Lives​

600+ File Formats, 0 Headaches​

Fuel for Agents, Automation, and the Next Generation of AI​

Get Started: Structure Smarter, Deploy Faster​