Extractor interface
The Extractor interface allows you to extract the content of a document and/or enumerate its sub-documents, such as email attachments and the entries of ZIP archives.
To obtain this interface, call the DocumentFilters.GetExtractor method. The Extractor interface contains the following methods and properties.
Extractor::Close method | The Close method releases the document resources referenced by this Extractor object. |
Extractor::Compare method | The Compare method compares two documents and returns the differences. |
Extractor::CopyTo method | The CopyTo method extracts the binary content of the sub-document to a file. |
Extractor::EOF property | The EOF property is only valid for documents where the SupportsText property is TRUE. The EOF property will be set to TRUE when no more text can be extracted from the document with calls to GetText. If the document needs to be re-read, call Close and Open first. |
Extractor::FileType property | The FileType property is the document format code, as listed in the Document Format Codes chart. An overload returns the format name as a string. |
Extractor::GetFirstImage method | The GetFirstImage method obtains a SubFile object representing the first embedded image of the current document when converting using classic HTML. |
Extractor::GetFirstPage method | The GetFirstPage method returns the first page object of an opened document. The document must be opened in image mode (IGR_FORMAT_IMAGE). |
Extractor::GetFirstSubFile method | The GetFirstSubFile method obtains a SubFile object representing the first sub-document of the current document. |
Extractor::GetHashMD5 method | The GetHashMD5 method obtains a string representing the calculated MD5 hash of the current document, for use as a unique identifier. |
Extractor::GetHashSHA1 method | The GetHashSHA1 method obtains a string representing the calculated SHA-1 hash of the current document, for use as a unique identifier. |
Extractor::GetNextImage method | The GetNextImage method obtains a SubFile object representing the next embedded image of the current document when converting using classic HTML. |
Extractor::GetNextPage method | The GetNextPage method returns the next page object of an opened document. The document must be opened in image mode (IGR_FORMAT_IMAGE). |
Extractor::GetNextSubFile method | The GetNextSubFile method obtains a SubFile object representing the next sub-document of the current document. |
Extractor::GetPage method | The GetPage method returns the page at the given index, where the page index is 0-based. An exception is raised if the index is invalid. |
Extractor::GetPageCount method | The GetPageCount method returns the number of pages in the current document. The document must be opened in image mode for the page count to be populated. |
Extractor::GetRootBookmark method | The GetRootBookmark method returns a Bookmark node representing the top-most node of the bookmark hierarchy. The root bookmark has only Children data; it has no title or destination properties. |
Extractor::GetSubFile method | The GetSubFile method obtains a SubFile object representing the nominated sub-file of the current document. |
Extractor::GetText method | The GetText method extracts the next portion of text content from the document. |
Extractor::Images property | The Images property provides an enumerable collection of SubFile objects representing the embedded images of the current document when converting using classic HTML. |
Extractor::Localize property | Utility function that allows for localization of metadata without providing a callback. Any localization options must be set before an |
Extractor::MimeType property | The MimeType property returns the MIME type of the file. |
Extractor::Open method | The Open method opens a document for processing. |
Extractor::PageCount property | The PageCount property returns the number of pages in the current document. The document must be opened in image mode for the page count to be populated. |
Extractor::Pages property | The Pages property provides an enumerable collection of pages for an opened document. The document must be opened in image mode (IGR_FORMAT_IMAGE). |
Extractor::SaveTo method | The SaveTo method extracts the entire text content of the document in a single call. The text may be saved to a file with the given name or via an instance of an IStream (COM) object. |
Extractor::SubFiles property | The SubFiles property returns an enumerable collection of SubFile objects. |
Extractor::getFileType method | The getFileType method returns extended information about the file type. |
Extractor::getSupportsHTML method | The getSupportsHTML method returns TRUE if the document can be converted to classic HTML. |
Extractor::getSupportsSubFiles property | The getSupportsSubFiles property is TRUE if the document is a compound or archive document, potentially with sub-documents. |
Extractor::getSupportsText method | The getSupportsText method returns TRUE if text content can be extracted from the document. It must return TRUE before the Extractor::SaveTo and Extractor::GetText methods can be called. |
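
The table describes two calling patterns: reading text in portions with GetText until EOF is TRUE, and walking sub-documents with GetFirstSubFile and GetNextSubFile. The Python sketch below illustrates both patterns against a small in-memory stand-in. StubExtractor, read_all_text and iter_subfiles are hypothetical names invented for this example, not part of the library. The exact signatures used here are assumptions (for instance, that GetText takes a maximum length, and that GetNextSubFile returns None when no sub-documents remain); consult the per-method pages for the real bindings.

```python
from typing import Iterator, List, Optional


class StubExtractor:
    """In-memory stand-in that mimics the members described in the table.

    This is NOT the real library binding; signatures are assumptions.
    Sub-documents are modelled as plain strings rather than SubFile objects.
    """

    def __init__(self, text: str, subfile_names: Optional[List[str]] = None) -> None:
        self._text = text
        self._pos = 0
        self._subfiles = list(subfile_names or [])
        self._sub_index = 0
        self.closed = False

    def getSupportsText(self) -> bool:
        return True

    @property
    def EOF(self) -> bool:
        # TRUE once no more text can be returned by GetText.
        return self._pos >= len(self._text)

    def GetText(self, max_length: int) -> str:
        # Return the next portion of text, at most max_length characters.
        chunk = self._text[self._pos:self._pos + max_length]
        self._pos += len(chunk)
        return chunk

    def GetFirstSubFile(self) -> Optional[str]:
        self._sub_index = 0
        return self.GetNextSubFile()

    def GetNextSubFile(self) -> Optional[str]:
        # Assumption: None signals that the enumeration is exhausted.
        if self._sub_index >= len(self._subfiles):
            return None
        name = self._subfiles[self._sub_index]
        self._sub_index += 1
        return name

    def Close(self) -> None:
        self.closed = True


def read_all_text(extractor, chunk_size: int = 4096) -> str:
    """Collect all text by calling GetText until EOF is TRUE."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not extractor.getSupportsText():
        raise ValueError("document does not support text extraction")
    parts = []
    while not extractor.EOF:
        parts.append(extractor.GetText(chunk_size))
    return "".join(parts)


def iter_subfiles(extractor) -> Iterator[str]:
    """Yield each sub-document using the GetFirst/GetNext pattern."""
    sub = extractor.GetFirstSubFile()
    while sub is not None:
        yield sub
        sub = extractor.GetNextSubFile()
```

A possible way to use the sketch, releasing the document with Close even if extraction fails:

```python
doc = StubExtractor("Hello, world", ["a.txt", "b.png"])
try:
    text = read_all_text(doc, chunk_size=5)
    names = list(iter_subfiles(doc))
finally:
    doc.Close()
```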