Getting started with Document Filters

Document Filters is an all-in-one SDK that helps software developers embed file identification, data extraction, document transformation, and format conversion capabilities into their applications.

Document Filters is implemented as a set of Dynamic Link Libraries (DLLs) on Windows and Shared Objects (SOs) on UNIX-based systems. Document Filters allows an application developer to perform the following actions:

  • Identify almost any type of file
  • Extract text and metadata from hundreds of different document formats
  • Extract sub-documents and attachments from many document and archive formats, including MS Office documents and ZIP, RAR, 7-Zip, ISO, CAB, PST, and OST files
  • Convert the most popular document formats to High-Definition output that preserves styles, layout, and images. Supported output modes include several image types, HTML, PDF, TIFF, and Structured XML
  • Apply Canvas and Drawing functions to achieve document markup, permanent annotations and redaction
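
Because the SDK ships as DLLs on Windows and shared objects on UNIX-based systems, an application that locates the native library itself needs to pick a platform-specific file name. The following is a minimal sketch of that selection only. The helper `shared_library_filename` and the base name `examplefilters` are hypothetical and are not taken from the SDK; check the SDK distribution for the actual library names.

```python
import platform
from typing import Optional


def shared_library_filename(base_name: str, system: Optional[str] = None) -> str:
    """Return the conventional native-library file name for a platform.

    base_name is a hypothetical placeholder, not the SDK's real library name.
    system defaults to the value reported by platform.system().
    """
    system = system or platform.system()
    if system == "Windows":
        # Windows: Dynamic Link Library, no "lib" prefix.
        return f"{base_name}.dll"
    if system == "Darwin":
        # General macOS convention; not confirmed for this SDK.
        return f"lib{base_name}.dylib"
    # Linux and other UNIX-based systems: shared object with "lib" prefix.
    return f"lib{base_name}.so"


if __name__ == "__main__":
    # Prints the file name that would be looked up on the current platform.
    print(shared_library_filename("examplefilters"))
```

The optional `system` argument exists only so the naming logic can be exercised for a platform other than the one the script runs on.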

Sample code and header definitions are available in C++, C#, HTML, Java, and Python at https://github.com/Hyland/DocumentFilters.