Hyland Document Filters Blog

Document Filters 26.1 Release

February 18, 2026 · 2 min read

Nabih Metri

Product Manager

AI and analytics workflows rely on two things: faithful conversions and output you can trust downstream. Document Filters 26.1 improves both, empowering developers to build validated, high-precision applications. This release accelerates proof-of-concept phases by unlocking Markdown output for trial users, enabling immediate testing of LLM ingestion pipelines. It also enhances downstream AI capabilities with word-level location tracking in MDAST, providing the granular context needed for citation and grounding. Additionally, expanded format coverage for healthcare imaging ensures that specialized medical records are processed with the same reliability as standard business documents.

Document Filters 25.4 Release

November 12, 2025 · 2 min read

Nabih Metri

Product Manager

Every detail matters when preparing documents for AI and analytics. The 25.4 release of Document Filters refines how precision meets performance by introducing new controls for rendering, OCR, and text extraction that make every output cleaner and more usable. This release brings flexible PDF rendering modes for display or print consistency, embedded image OCR across Excel and PDF formats for deeper text capture, and fine-grained exclusion margins to remove unwanted headers and footers from extracted text. Together, these enhancements create cleaner data, sharper visual fidelity, and more reliable results for AI and other systems.

Document Filters 25.3 Release

August 20, 2025 · 3 min read

Nabih Metri

Product Manager

Every document hides more insight than first meets the eye. Screenshots and diagrams often lock away critical text, healthcare records carry complex structures, and tables can be deceptively hard to detect. With Document Filters 25.3, these blind spots are brought into the open. This release adds OCR for embedded images, HL7 medical record processing with preserved structure, and smarter table detection in PDFs. Together, these advances capture more context earlier, reduce manual cleanup, and deliver higher-quality inputs for AI and analytics.

AI-Ready Content Starts Here - Introducing Knowledge Enrichment

July 7, 2025 · 8 min read

Nabih Metri

Product Manager

AI isn’t failing because models aren’t good enough. It’s failing because the data isn’t.

Today, we’re launching Knowledge Enrichment, a breakthrough API that transforms fragmented, unstructured enterprise content into deeply structured, semantically rich, AI-ready data. Powered by Hyland’s proven Document Filters and AI, it delivers what developers and architects have been missing: clean, contextual, vectorized output across 600+ file types, ready for direct integration into your AI stack.

If you’re building AI agents, retrieval-augmented generation (RAG) systems, or enterprise LLMs, stop what you’re doing. This changes everything.

Document Filters 25.2 Release

May 14, 2025 · 4 min read

Nabih Metri

Product Manager

Document Filters 25.2 advances our shift-left strategy by enhancing traceability, data integrity, and extensibility within content pipelines. With this release, Document Filters becomes the first solution to embed positional metadata directly into Markdown for all our supported formats, setting a new benchmark for transparency and explainability in AI and search-driven workflows. We’ve also improved Markdown’s handling of complex tables, enabling seamless extraction of structured data from even the most irregular layouts. In addition, table extraction is now supported for XFA PDFs, a long-standing challenge for automation and compliance initiatives. Finally, a new custom OCR callback interface gives teams the freedom to integrate any OCR engine into their workflow, unlocking multilingual, domain-specific, and image-heavy content for broader automation. Each of these updates contributes to cleaner, more connected data earlier in the process—reducing errors, manual fixes, and integration complexity. Let’s take a closer look at what’s new.

Beyond Magic Numbers: The Complexity of File Type Identification

March 5, 2025 · 8 min read

Ben Truscott

Document Filters Principal Engineer

In the realm of enterprise software, managing and processing files from diverse sources is a common challenge. Whether you're developing AI-driven solutions, building compliance-focused applications, or ensuring data security, the ability to accurately identify file types is crucial. The files you encounter could be anything—from legacy documents dating back to the 1980s and 1990s to modern formats uploaded from smartphones or cloud services.

When people think about identifying a file type, they often assume that the first few bytes—commonly known as a "magic number"—are enough to determine what kind of file they’re dealing with. While this works for some formats, it’s far from a general rule. Modern file formats frequently use container structures that obscure their actual content. For example, many file types—including Microsoft Office documents (DOCX, XLSX, PPTX) and EPUB ebooks—are essentially ZIP archives with structured data inside. Similarly, older Microsoft formats like DOC and XLS rely on the Compound File Binary (CFB) format, which acts like a mini file system within a file. At a glance, these container formats don’t immediately reveal what kind of document they hold.
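To make the container problem concrete, here is a minimal Python sketch using only the standard library. It is not how Document Filters identifies files; it only illustrates why the leading bytes alone are not enough. The function name `identify` and the returned labels are hypothetical, and the check for parts such as `word/document.xml` relies on conventional OOXML part names, which a robust detector would not assume (it would follow the package's content types and relationships instead).

```python
import io
import zipfile

# Leading-byte signatures ("magic numbers") for the two containers discussed above.
ZIP_MAGIC = b"PK\x03\x04"
CFB_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def identify(data: bytes) -> str:
    """Return a coarse, illustrative label for the given file contents."""
    if data.startswith(CFB_MAGIC):
        # DOC, XLS, PPT, MSG and others all share this signature; telling them
        # apart means parsing the compound file's internal directory.
        return "cfb-container"
    if data.startswith(ZIP_MAGIC):
        return _identify_zip(data)
    return "unknown"


def _identify_zip(data: bytes) -> str:
    """Look inside a ZIP archive to see what kind of document it holds."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = set(archive.namelist())
            # EPUB stores its media type in a member named "mimetype".
            if "mimetype" in names:
                if archive.read("mimetype").strip() == b"application/epub+zip":
                    return "epub"
            # OOXML packages carry a [Content_Types].xml part.
            if "[Content_Types].xml" in names:
                # Assumption: the main part uses its conventional name.
                if "word/document.xml" in names:
                    return "docx"
                if "xl/workbook.xml" in names:
                    return "xlsx"
                if "ppt/presentation.xml" in names:
                    return "pptx"
                return "ooxml-unknown"
    except zipfile.BadZipFile:
        # The signature matched, but the archive could not be read.
        return "zip-damaged"
    return "zip"
```

Calling `identify(open(path, "rb").read())` on a DOCX and on an EPUB exercises the same `PK\x03\x04` branch, yet returns different labels only because the sketch opens the archive and inspects its members.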

Document Filters 25.1 Release

February 19, 2025 · 4 min read

Nabih Metri

Product Manager

Document Filters 25.1 continues our 'shift-left' strategy by enhancing structured data extraction earlier in the pipeline, reducing the need for downstream corrections and transformations. This release introduces expanded structured output with improved heading detection and list recognition across multiple formats, ensuring cleaner, more reliable data for AI/ML workflows and business applications. PDF processing is also more precise, with better list mapping and automatic text unwrapping for a more natural reading experience. Additionally, file type identification now works even when only part of a file is available, minimizing unnecessary data transfers and improving processing efficiency. By delivering cleaner, more structured data from the start, Document Filters 25.1 helps streamline integration, reduce complexity, and optimize large-scale document workflows. Let’s take a closer look at what’s new.

Document Filters 24.4 Release

November 13, 2024 · 4 min read

Nabih Metri

Product Manager

The latest release of Hyland Document Filters introduces features that streamline document processing and enhance efficiency, supporting the broader 'shift-left' strategy. By empowering users to control data earlier in the workflow, these updates reduce complexity and improve performance across AI/ML applications. New content cleaning options simplify data preparation, making it easier to generate machine-friendly content, while the Simplified JSON output format accelerates data extraction and processing. Additionally, the new text-mode Markdown support lowers resource consumption, allowing for more efficient handling of large documents. With the addition of a Python package, users can also integrate Document Filters seamlessly into development workflows, enhancing overall productivity and workflow efficiency.

Shifting Left with Document Filters – A Vision for the Future

September 26, 2024 · 6 min read

Nabih Metri

Product Manager

As the digital landscape continues to evolve, the need for efficient document processing has never been greater. Applications demand accurate, structured data that's ready for immediate use, reducing the steps required to prepare it. At Hyland, we've embraced this challenge with our Document Filters product. Our strategy? Shift left.

Document Filters 24.3 Release

August 21, 2024 · 4 min read

Nabih Metri

Product Manager

We're excited to announce the latest release of Document Filters, packed with powerful new features designed to enhance your document processing capabilities. This update introduces a JSON Output Type for structured data handling, a Markdown Output Type for streamlined document conversion, advanced PDF Table Extraction for improved data accuracy, and MSI Installer Sub-File Extraction for comprehensive file analysis. Additionally, we've added community-inspired support for Hancom Hangul HWPX text extraction and HD rendering. Read on to discover how these new features can elevate your workflows and drive better results.