Getting started with Document Filters
Document Filters is an all-in-one SDK that helps software developers embed file identification, data extraction, document transformation, and format conversion capabilities into their applications.
Document Filters is implemented as a set of Dynamic Link Libraries (DLLs) on Windows and Shared Objects (SOs) on UNIX-based systems. It allows an application developer to perform the following actions:
- Identify almost any type of file
- Extract text and metadata from hundreds of different document formats
- Extract sub-documents and attachments from many document and archive formats, including Microsoft Office documents and ZIP, RAR, 7-Zip, ISO, CAB, PST, and OST files
- Convert the most popular document formats to High-Definition output (with styles, layout and images). Supported modes include several image types, HTML, PDF, TIFF and Structured XML
- Apply Canvas and Drawing functions to achieve document markup, permanent annotations and redaction
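To make the first capability in the list concrete, the sketch below shows the general idea behind signature-based file identification: comparing a file's leading bytes against known "magic numbers". It is a conceptual illustration only and does not use or represent the Document Filters API. The names `SIGNATURES` and `identify` are hypothetical, and a production identifier inspects far more than the first few bytes.

```python
# Conceptual illustration only: this is NOT the Document Filters API.
# SIGNATURES and identify() are hypothetical names used for this sketch.
from typing import Optional

# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"PK\x03\x04", "ZIP container"),
    (b"Rar!\x1a\x07", "RAR"),
    (b"7z\xbc\xaf\x27\x1c", "7-Zip"),
    (b"MSCF", "CAB"),
    (b"!BDN", "PST/OST"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file"),
]


def identify(data: bytes) -> Optional[str]:
    """Return a format name if the leading bytes match a known signature."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    # Classify two in-memory samples; no files are read in this sketch.
    print(identify(b"%PDF-1.7\n..."))
    print(identify(b"plain text with no signature"))
```

A leading-bytes check alone cannot tell apart formats that share a container. For example, modern Office documents are ZIP archives, so a real identifier also has to look inside the container before naming the format.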
Sample code and header definitions are available in C++, C#, HTML, Java, and Python at https://github.com/Hyland/DocumentFilters.