Getting Started with Java
Hyland Document Filters provides robust document processing capabilities that can be easily integrated into your Java applications. Follow these instructions to set up your environment.
Clone and Include the Document Filters Repository
The Document Filters GitHub repository contains the necessary files and libraries.
Installing the Bindings
The Java bindings JAR file, ISYS11df.jar, can be found in the bindings/java/lib directory of the Document Filters GitHub repository. While the same JAR file can be used across all platforms, you will need to obtain the appropriate native binaries for each platform you wish to support. The native binaries are included in the release ZIP files for each platform.
Note
The run.sh and run.cmd scripts included with the Java samples in the Document Filters GitHub repository automatically handle downloading the release binaries for the current platform.
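Because the native binary differs per platform, it can help to confirm which file name the JVM will look for on the current machine. The following is a minimal sketch using the standard System.mapLibraryName call. The class name NativeLibraryName and the default native folder are hypothetical, chosen only for illustration, and are not part of Document Filters.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Document Filters): reports the
// platform-specific file name the JVM derives for the ISYS11df library.
class NativeLibraryName {
    static final String LIBRARY = "ISYS11df";

    // Maps the bare library name to the platform file name, for example
    // a .dll on Windows or a lib*.so on Linux.
    static String fileName() {
        return System.mapLibraryName(LIBRARY);
    }

    // Returns true if the mapped file exists in the given directory.
    static boolean isPresentIn(Path directory) {
        return Files.isRegularFile(directory.resolve(fileName()));
    }

    public static void main(String[] args) {
        // "native" is an assumed folder name; pass your own as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "native");
        System.out.println("Expected native library: " + fileName());
        System.out.println("Present in " + dir.toAbsolutePath() + ": " + isPresentIn(dir));
    }
}
```

Running this against the folder where you unpacked a release ZIP gives a quick indication of whether the binaries for the current platform are in place.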
Integrating with Maven
- Add the JAR and native binaries: Since ISYS11df.jar is not hosted on Maven Central, you'll need to include the JAR file and native binaries manually.
- Add the dependencies: Copy ISYS11df.jar into your project directory (e.g., a libs folder).
- Update your pom.xml:

  <dependencies>
    <!-- Add the Document Filters JAR as a system-scoped dependency -->
    <dependency>
      <groupId>com.perceptive</groupId>
      <artifactId>documentfilters</artifactId>
      <version>11.0</version>
      <scope>system</scope>
      <systemPath>${project.basedir}/libs/ISYS11df.jar</systemPath>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <!-- Copy the native library into the build directory so java.library.path can point at it -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-antrun-plugin</artifactId>
        <version>1.8</version>
        <executions>
          <execution>
            <phase>process-resources</phase>
            <!-- Without the run goal the execution is bound to the phase but does nothing -->
            <goals>
              <goal>run</goal>
            </goals>
            <configuration>
              <target>
                <copy file="path/to/native/binaries/ISYS11df.dll" todir="${project.build.directory}/native/"/>
              </target>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

- Configure Native Libraries: Point the java.library.path system property at the directory containing the native binaries, for example by passing -Djava.library.path=target/native when launching the JVM. The property is typically read when the JVM starts, so changing it later from application code generally has no effect.
Integrating with Gradle
- Add the JAR and native binaries: Copy ISYS11df.jar to your project's libs directory.
- Update your build.gradle:

  dependencies {
      // Add the Document Filters JAR as a local file dependency
      implementation files('libs/ISYS11df.jar')
  }

  task copyNativeLibs(type: Copy) {
      from 'path/to/native/binaries'
      into "$buildDir/nativeLibs"
  }

  // Ensure native libraries are available to the run task
  // (the run task is provided by the application plugin)
  run {
      dependsOn copyNativeLibs
      systemProperty 'java.library.path', "$buildDir/nativeLibs"
  }

- Configure Native Libraries: As with Maven, java.library.path must point at the directory containing the native binaries. The run block in build.gradle sets it for the Gradle run task.
Integrating with Ant
- Add the JAR and native binaries: Place ISYS11df.jar and the native binaries in your project folder (e.g., lib and native folders).
- Update build.xml:

  <project name="DocumentFiltersProject" basedir="." default="run">
    <path id="classpath">
      <pathelement location="lib/ISYS11df.jar"/>
      <!-- Add the directory containing your compiled classes here as well -->
    </path>
    <target name="run">
      <java classname="com.perceptive.App" fork="true">
        <classpath refid="classpath"/>
        <jvmarg value="-Djava.library.path=./native"/>
      </java>
    </target>
  </project>

- Configure Native Libraries: Set java.library.path using the jvmarg element so that it points to the directory containing the native binaries.
Initializing and calling Document Filters
Once the package is installed, you can begin using it in your application.
import com.perceptive.documentfilters.*;

public class App {
    private static final String LICENSE_KEY = "YOUR_LICENSE_KEY_HERE";

    public static void main(String[] args) {
        try {
            DocumentFilters api = new DocumentFilters();
            api.Initialize(LICENSE_KEY, ".");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Explanation:
- The code imports the Document Filters package, com.perceptive.documentfilters.
- A new DocumentFilters instance is created and initialized using a license key. Replace "YOUR_LICENSE_KEY_HERE" with your actual license key.
- The second parameter, ".", specifies the directory for configuration files and resources, such as fonts.
Note: ISYS11df.(dll/so/dylib) will be loaded by a call to System.loadLibrary("ISYS11df"). For more details, refer to System.loadLibrary.
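If the native library cannot be found, System.loadLibrary throws UnsatisfiedLinkError. This is an Error rather than an Exception, so a catch (Exception e) block does not intercept it. The sketch below is a standalone diagnostic that attempts the load and, on failure, lists the directories on java.library.path. The class NativeLoadCheck and its method names are hypothetical and not part of Document Filters.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic helper (not part of Document Filters).
class NativeLoadCheck {
    // Splits java.library.path into its individual directories.
    static List<String> libraryPathEntries() {
        String path = System.getProperty("java.library.path", "");
        return Arrays.asList(path.split(File.pathSeparator));
    }

    // Attempts to load a native library by name. Returns false instead of
    // propagating UnsatisfiedLinkError, and prints where the JVM looked.
    static boolean tryLoad(String name) {
        try {
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError e) {
            System.err.println("Could not load " + System.mapLibraryName(name) + ": " + e.getMessage());
            System.err.println("Directories on java.library.path:");
            for (String entry : libraryPathEntries()) {
                System.err.println("  " + entry);
            }
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("ISYS11df loadable: " + tryLoad("ISYS11df"));
    }
}
```

Run it with the same -Djava.library.path value you intend to use for your application to see whether that setting resolves the library.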
Extracting Text
Once the Document Filters library is initialized, you can begin extracting text from documents. The following Java code snippet demonstrates how to load a document and extract its contents using the Document Filters API. This example focuses on extracting text from a Word document (.doc file).
import com.perceptive.documentfilters.*;

public class App {
    private static final String LICENSE_KEY = "YOUR_LICENSE_KEY_HERE";

    public static void main(String[] args) {
        try {
            DocumentFilters api = new DocumentFilters();
            api.Initialize(LICENSE_KEY, ".");

            try (Extractor doc = api.GetExtractor("filename.doc")) {
                doc.Open(isys_docfilters.IGR_BODY_AND_META);
                while (!doc.getEOF()) {
                    String text = doc.GetText(4096);
                    // print rather than println, so no line break is inserted between chunks
                    System.out.print(text);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Explanation:
- The code initializes DocumentFilters and loads the document filename.doc into an Extractor instance.
- It opens the extractor with IGR_BODY_AND_META to extract both the document body and metadata.
- The GetText method reads the document's content in chunks of up to 4096 characters, looping until EOF (end of file) is reached.
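When the text is needed as a single string rather than printed, the same chunked loop can append to a StringBuilder. The sketch below shows only the loop pattern. TextSource is a hypothetical stand-in that mirrors the getEOF and GetText calls used in the example so that the code runs without the library. It is not part of the Document Filters API, and in real code the Extractor would take its place.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical stand-in for the two Extractor calls used in the example.
// It is NOT part of the Document Filters API.
interface TextSource {
    boolean getEOF();
    String GetText(int maxLength);
}

class ChunkCollector {
    // Reads chunks until end of file and joins them without separators,
    // so words that straddle a chunk boundary stay intact.
    static String readAll(TextSource source, int chunkSize) {
        StringBuilder text = new StringBuilder();
        while (!source.getEOF()) {
            text.append(source.GetText(chunkSize));
        }
        return text.toString();
    }

    // In-memory source that hands out pre-split chunks, for demonstration only.
    static TextSource fromChunks(String... parts) {
        Deque<String> queue = new ArrayDeque<>(Arrays.asList(parts));
        return new TextSource() {
            @Override public boolean getEOF() { return queue.isEmpty(); }
            @Override public String GetText(int maxLength) { return queue.poll(); }
        };
    }

    public static void main(String[] args) {
        String all = readAll(fromChunks("Hello, ", "wor", "ld"), 4096);
        System.out.println(all); // prints "Hello, world"
    }
}
```

Appending chunks directly avoids the stray line breaks that printing each chunk on its own line would introduce.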
Converting a Document
After initializing the Document Filters library, you can convert documents into different formats, such as PDF. The following Java code snippet demonstrates how to load a Word document (.doc file) and convert it into a PDF using the Document Filters API.
import com.perceptive.documentfilters.*;

public class App {
    private static final String LICENSE_KEY = "YOUR_LICENSE_KEY_HERE";

    public static void main(String[] args) {
        try {
            DocumentFilters api = new DocumentFilters();
            api.Initialize(LICENSE_KEY, ".");

            try (Extractor doc = api.GetExtractor("filename.doc");
                 Canvas canvas = api.MakeOutputCanvas("output.pdf", isys_docfilters.IGR_DEVICE_IMAGE_PDF, "")) {
                doc.Open(isys_docfilters.IGR_FORMAT_IMAGE);
                for (int pageIndex = 0, pageCount = doc.GetPageCount(); pageIndex < pageCount; ++pageIndex) {
                    try (Page page = doc.GetPage(pageIndex)) {
                        canvas.RenderPage(page);
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Explanation:
- This code converts filename.doc to a PDF by rendering each page into a Canvas.
- The extractor is opened with IGR_FORMAT_IMAGE, which prepares the document for image-based output and triggers pagination.
- Each page of the document is rendered using RenderPage, looping over all pages reported by GetPageCount.
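The example hard-codes output.pdf as the destination. When converting arbitrary inputs, one option is to derive the output name from the input path. The following is a minimal sketch using only java.nio.file. The class OutputNames and the method pdfNameFor are hypothetical helpers, not part of Document Filters.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Document Filters): derives the PDF output
// path from an input path by swapping the file extension.
class OutputNames {
    static Path pdfNameFor(Path input) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // A leading dot (e.g. ".profile") is treated as part of the name, not an extension.
        String base = dot > 0 ? name.substring(0, dot) : name;
        // Keep the output next to the input file.
        return input.resolveSibling(base + ".pdf");
    }

    public static void main(String[] args) {
        System.out.println(pdfNameFor(Paths.get("filename.doc"))); // prints "filename.pdf"
    }
}
```

The resulting path, converted with toString(), could then be supplied as the first argument to MakeOutputCanvas in place of the literal "output.pdf".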