Getting Started with .NET
Hyland Document Filters provides robust document processing capabilities that can be easily integrated into your .NET applications. Follow these instructions to set up your environment.
Installing the Bindings
The Document Filters .NET bindings are available on NuGet under the package name Hyland.DocumentFilters, which provides the easiest and recommended way to integrate Document Filters into your .NET applications.
If needed, you can also access the source code for the bindings in the bindings/dotnet directory of the Document Filters GitHub repository. This allows for modifications and manual builds when customization is required.
Note
Earlier versions used Perceptive.DocumentFilters as a pre-compiled project, which remains included in the package. To upgrade a project to use Hyland.DocumentFilters, update the project reference and change the namespace accordingly.
Note
The Document Filters DLLs are built for a specific architecture. When your application targets AnyCPU, the process architecture is only determined at runtime, so both the 32-bit and 64-bit versions of the Document Filters DLLs must be present.
If you're using the NuGet package and targeting .NET Core, the selection of the appropriate DLLs is automated. However, if you are targeting .NET Framework, you must ensure that the DLLs are discoverable according to the Default Probing rules for Unmanaged (native) libraries.
Using Published Bindings
To use the official Document Filters .NET bindings, add the Hyland.DocumentFilters package from NuGet to your project using Visual Studio or the command line.
Via the Command Line:
Run the following command in your terminal:
dotnet add package Hyland.DocumentFilters
Via Visual Studio:
Follow these steps to add the bindings via Visual Studio:
- Open your project in Visual Studio.
- In the Solution Explorer, right-click on your project and select Manage NuGet Packages.
- In the NuGet Package Manager window, select the Browse tab.
- In the search box, type Hyland.DocumentFilters.
- Once found, click the Hyland.DocumentFilters package and select the desired version.
- Click Install to add the package to your project.
- After installation, the package will be listed under your project’s Dependencies in Solution Explorer, and you can start using the Document Filters API.
Installing from NuGet makes it straightforward to pick up new stable releases of the bindings as they are published.
Building the Bindings
If you need to modify the .NET bindings or build them manually, follow these steps:
- Clone the Document Filters GitHub repository.
- Navigate to the bindings/dotnet directory.
- Open the solution file in Visual Studio or another preferred IDE.
- Make any necessary changes to the code.
- Build the project to generate the custom .NET bindings for your application.
Once built, you can reference the custom bindings in your .NET project manually, or package them into a local NuGet package for reuse.
Initializing and Calling Document Filters
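A minimal sketch of the initialization step, assuming the Api class is exposed under the Hyland.DocumentFilters namespace and that Initialize takes the license key and the resource path as two strings. Check the API reference for the exact signature before relying on it.

```csharp
// Assumption: the bindings expose their types under the Hyland.DocumentFilters namespace.
using Hyland.DocumentFilters;

// Create the Api object and initialize it with a license key.
// Assumption: Initialize(licenseKey, resourcePath) with two string arguments.
var api = new Api();
api.Initialize("YOUR_LICENSE_KEY_HERE", ".");
```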
The preceding code brings the DocumentFilters namespace into scope, then creates a new Api singleton and initializes it with a license key by calling Initialize. Replace "YOUR_LICENSE_KEY_HERE" with your actual license key.
The second argument controls where Document Filters should look for resources, such as configuration files and fonts. Passing "." tells it to look in the same directory as the Document Filters shared libraries.
Note: ISYSdf11.dll must either be in the same folder as the currently executing assembly or be discoverable through the Default Probing rules for Unmanaged (native) libraries.
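If initialization fails to load the native library, a quick diagnostic along these lines may help. It is only a sketch that uses standard .NET APIs; it treats AppContext.BaseDirectory as a stand-in for the folder of the executing assembly, which is an assumption that holds for typical deployments but not for every hosting model.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Report the architecture the process is actually running as.
// This matters for AnyCPU builds, where it is decided at runtime.
Console.WriteLine($"Process architecture: {RuntimeInformation.ProcessArchitecture}");

// Assumption: the application's base directory is where the native DLL is expected.
string nativePath = Path.Combine(AppContext.BaseDirectory, "ISYSdf11.dll");

if (File.Exists(nativePath))
{
    Console.WriteLine($"Found {nativePath}");
}
else
{
    // Not finding the file here is not necessarily an error: the DLL may
    // still be located through the native library probing rules.
    Console.WriteLine($"ISYSdf11.dll was not found in {AppContext.BaseDirectory}");
}
```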
Extracting Text
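A minimal sketch of text extraction. The names GetText, EndOfStream, and BodyAndMeta come from the description in this section; the GetExtractor factory method, the OpenType enum name, the EndOfStream property form, and the 4096-character read size are assumptions made for illustration.

```csharp
using System;
using Hyland.DocumentFilters;

var api = new Api();
api.Initialize("YOUR_LICENSE_KEY_HERE", ".");

// Assumption: GetExtractor(path) creates an extractor for a file on disk.
using (var doc = api.GetExtractor("filename.doc"))
{
    // Assumption: BodyAndMeta is a member of an enum named OpenType.
    doc.Open(OpenType.BodyAndMeta);

    // Read the text in chunks until the extractor reports the end of the stream.
    // Assumption: GetText takes a maximum character count; 4096 is arbitrary.
    while (!doc.EndOfStream)
    {
        Console.Write(doc.GetText(4096));
    }
}
```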
The preceding code loads the file filename.doc into an extractor named doc. Because doc is declared in a using block, the extractor is closed when doc goes out of scope.
It then opens the extractor with BodyAndMeta, indicating that we want to extract both the text (body) and the metadata of the document.
Finally, it calls GetText in a loop until the extractor reports EndOfStream.
Converting a Document
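A minimal sketch of converting a document to PDF. CanvasType.PDF, FormatImage, and RenderPages are named in this section; the GetExtractor and MakeOutputCanvas method names, the OpenType enum name, and the output file name output.pdf are assumptions made for illustration.

```csharp
using Hyland.DocumentFilters;

var api = new Api();
api.Initialize("YOUR_LICENSE_KEY_HERE", ".");

// Assumption: GetExtractor(path) creates an extractor for a file on disk.
using (var doc = api.GetExtractor("filename.doc"))
// Assumption: MakeOutputCanvas(outputPath, canvasType) creates the output canvas;
// "output.pdf" is a hypothetical file name.
using (var canvas = api.MakeOutputCanvas("output.pdf", CanvasType.PDF))
{
    // Opening for image-based output causes the document to be paginated.
    // Assumption: FormatImage is a member of an enum named OpenType.
    doc.Open(OpenType.FormatImage);

    // Render every page of the document onto the canvas.
    canvas.RenderPages(doc);
}
```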
The preceding code loads the file filename.doc into an extractor named doc. Because doc is declared in a using block, the extractor is closed when doc goes out of scope.
It also creates a new canvas object of type CanvasType.PDF and stores it as canvas. The canvas is likewise closed automatically when it goes out of scope.
It then opens the extractor with FormatImage, indicating that we want to convert the file to image-based output. This causes the file to be paginated.
Finally, it calls RenderPages to output every page from doc onto the canvas. You could also iterate over the pages manually and call RenderPage for each one.
Did you know?
You can render more than one document to a canvas. If you want to stitch multiple files together, simply load each document into its own Extractor, then call RenderPage or RenderPages onto a single canvas.
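The stitching idea could be sketched as follows. As before, GetExtractor, MakeOutputCanvas, and the OpenType enum name are assumptions, and the input and output file names are hypothetical.

```csharp
using Hyland.DocumentFilters;

var api = new Api();
api.Initialize("YOUR_LICENSE_KEY_HERE", ".");

// Hypothetical input files to combine, in output order.
string[] inputs = { "first.doc", "second.doc" };

// One canvas receives the pages of every input document.
// Assumption: MakeOutputCanvas(outputPath, canvasType); "combined.pdf" is hypothetical.
using (var canvas = api.MakeOutputCanvas("combined.pdf", CanvasType.PDF))
{
    foreach (var path in inputs)
    {
        // Each document gets its own extractor, closed before the next one opens.
        using (var doc = api.GetExtractor(path))
        {
            doc.Open(OpenType.FormatImage);
            canvas.RenderPages(doc);
        }
    }
}
```

Keeping the canvas in the outer using block means the combined output is finalized once, after the last document has been rendered.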