Getting Started with C++¶
Hyland Document Filters allows you to integrate powerful document processing capabilities into your C++ applications. Below are instructions on how to set up your application depending on your build system, such as CMake, Visual Studio, or Make.
Clone and Include the Document Filters Repository¶
The Document Filters GitHub repository contains the necessary header files and libraries. If you are not using CMake's FetchContent, clone the repository manually into your project:
git clone https://github.com/Hyland/DocumentFilters.git
The primary header file required for your work is DocumentFiltersObjects.h, which can be found in the bindings/cpp{VERSION}/include directory. When using CMake's FetchContent, there is no need to manually clone the repository (details are provided below).
Note
Using the C++ bindings is not mandatory; your application can interact directly with the C API instead. In that case, follow the instructions in Getting Started with C.
Choosing a C++ binding version¶
The Document Filters library offers bindings for C++17, located in the bindings/cpp17 directory, as well as deprecated bindings for C++11 in the bindings/cpp11 directory. The C++17 bindings deliver comprehensive Object API functionality, including annotations and document comparison, while the C++11 bindings offer limited support. Because the C++11 bindings are deprecated, use them only when a C++17-capable compiler is unavailable. The main differences between the two, with a migration strategy for each, are listed below.
Memory Management:
- C++11 bindings: Require manual memory management with raw pointers.
- C++17 bindings: Automatic object lifetime management, eliminating manual delete calls. Objects are internally reference counted and can be safely stored on the stack or heap without the caller needing to track their lifetime.
- Migration strategy: Remove manual memory management, including explicit calls to delete and wrapping objects in smart pointers. Instead, place objects on the stack and let them manage their own lifetime.
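To make the ownership model above concrete, here is a minimal sketch of a reference-counted value type. NativeResource and Handle are hypothetical stand-ins written for illustration; they are not types from the Document Filters bindings, and the real implementation may differ.

```cpp
#include <memory>

// Hypothetical stand-in for a native resource owned by a binding object.
// It counts how many resources are currently alive.
struct NativeResource {
    explicit NativeResource(int& live) : live_(live) { ++live_; }
    ~NativeResource() { --live_; }
    NativeResource(const NativeResource&) = delete;
    NativeResource& operator=(const NativeResource&) = delete;
private:
    int& live_;
};

// Hypothetical value-type handle: copies share one resource, and the
// resource is released when the last copy goes out of scope.
class Handle {
public:
    explicit Handle(int& live) : impl_(std::make_shared<NativeResource>(live)) {}
    long use_count() const { return impl_.use_count(); }
private:
    std::shared_ptr<NativeResource> impl_;
};

// Returns the number of live resources after all handles have left scope.
inline int demo_handle_lifetime() {
    int live = 0;
    {
        Handle a(live);  // resource created
        Handle b = a;    // copy shares the same resource
        (void)b;
    }                    // both handles destroyed, resource released
    return live;
}
```

With a type shaped like this, callers never write delete: copying a handle is cheap, and cleanup happens when the last copy is destroyed.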
String Handling:
- C++11 bindings: Provide a mix of ANSI (narrow) strings and wide strings.
- C++17 bindings: Returned strings are normalized to std::wstring, with utility functions for converting to and from UTF-8. Function parameters can be provided as either UTF-8 or wide strings.
- Migration strategy: Use the provided string conversion functions to convert between string types if your application does not work natively with std::wstring.
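For applications that store text as UTF-8, the conversion from std::wstring looks roughly like the sketch below. The function wide_to_utf8 is a hypothetical helper written for illustration, not one of the conversion utilities shipped with the bindings; prefer the library's own functions where they are available.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the Document Filters API): encodes a
// wide string as UTF-8. Handles both 32-bit wchar_t (typical on Linux)
// and 16-bit wchar_t with surrogate pairs (typical on Windows).
inline std::string wide_to_utf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);
        // Combine a UTF-16 surrogate pair into a single code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t lo = static_cast<std::uint32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```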
Object API Coverage:
- C++11 bindings: Limited coverage, restricted to text extraction, subfile handling, and rendering. Advanced features such as annotations and document comparison are not supported directly in the C++11 bindings and must be achieved via the C API.
- C++17 bindings: Provides full coverage of the Object API, including advanced features like annotations and document comparison, with feature parity across .NET, Python, and Java.
- Migration strategy: No required changes unless leveraging new features.
Stream Handling:
- C++11 bindings: Streams are handled via raw pointers and manual read/write operations.
- C++17 bindings: Streams are supported using modern C++ constructs, including std::istream and std::ostream, allowing for seamless integration with standard C++ I/O streams.
- Migration strategy: Replace raw pointer-based stream handling with std::istream and std::ostream-based implementations. This will ensure more efficient and safer stream operations.
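One practical benefit of std::istream-based interfaces is that file-backed and in-memory sources become interchangeable. The sketch below shows the idea with a hypothetical helper, read_all_bytes; it uses only the standard library and does not call the Document Filters bindings.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: reads every remaining byte from any std::istream.
// The caller decides whether the stream is a file, a string buffer, or
// some other source.
inline std::string read_all_bytes(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

The same function accepts a std::ifstream opened with std::ios::binary or a std::istringstream holding data already in memory, so code written against std::istream does not need separate paths for each source.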
Collection Iteration:
- C++11 bindings: Collections like lists or arrays are often iterated manually, typically using raw loops or index-based access.
- C++17 bindings: The bindings take advantage of range-based loops and standard C++ iterator patterns, improving readability and reducing potential off-by-one errors.
- Migration strategy: Migrate to range-based for-loops or use iterators for collections. This will make the code more concise and easier to maintain.
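The difference between the two iteration styles can be sketched with a small stand-in collection. PageList below is hypothetical and exists only for illustration; the point is that any type exposing begin() and end() works with a range-based for loop.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical collection type. Exposing begin()/end() is all a
// range-based for loop needs.
class PageList {
public:
    explicit PageList(std::vector<std::wstring> names) : names_(std::move(names)) {}
    std::vector<std::wstring>::const_iterator begin() const { return names_.begin(); }
    std::vector<std::wstring>::const_iterator end() const { return names_.end(); }
    std::size_t size() const { return names_.size(); }
    const std::wstring& at(std::size_t i) const { return names_.at(i); }
private:
    std::vector<std::wstring> names_;
};

// Index-based style: the loop bounds must be written by hand.
inline std::size_t total_length_indexed(const PageList& pages) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < pages.size(); ++i) total += pages.at(i).size();
    return total;
}

// Range-based style: no index and no bounds to get wrong.
inline std::size_t total_length_ranged(const PageList& pages) {
    std::size_t total = 0;
    for (const auto& name : pages) total += name.size();
    return total;
}
```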
Callbacks:
- C++11 bindings: Callbacks are implemented using function pointers, requiring manual management and less flexibility when passing stateful functions.
- C++17 bindings: Callbacks can now be implemented using std::function, allowing the use of lambdas and stateful objects to handle callbacks more elegantly.
- Migration strategy: Replace function pointers with std::function where callbacks are used. This allows greater flexibility, enabling the use of lambdas or member functions with bound state.
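As an illustration of why std::function is more flexible than a plain function pointer, the sketch below passes a lambda that captures local state. PageCallback and for_each_page are hypothetical names chosen for this example, not part of the Document Filters API.

```cpp
#include <functional>
#include <vector>

// Hypothetical callback signature for illustration.
using PageCallback = std::function<void(int page_index)>;

// Hypothetical function that invokes the callback once per page.
inline void for_each_page(int page_count, const PageCallback& on_page) {
    for (int i = 0; i < page_count; ++i) on_page(i);
}

// The lambda captures `seen` by reference, which a plain function pointer
// cannot do without a separate user-data argument.
inline std::vector<int> collect_page_indices(int page_count) {
    std::vector<int> seen;
    for_each_page(page_count, [&seen](int index) { seen.push_back(index); });
    return seen;
}
```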
Lambda Improvements:
- C++11 bindings: Basic lambda support exists, but capturing more complex objects or state (like move-only types) requires more manual management.
- C++17 bindings: C++17 lambdas support move-only types and generalized lambda captures, which allow for more flexible handling of objects within closures.
- Migration strategy: Refactor complex lambdas, especially those that handle state, to take advantage of C++17’s generalized captures. This reduces boilerplate and makes callbacks and functional programming patterns easier to manage.
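A generalized capture lets a closure take ownership of a move-only object such as a std::unique_ptr. The following is a minimal sketch; make_appender is a hypothetical name and the example uses only the standard library.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Returns a closure that owns its buffer. The init-capture
// `buf = std::move(buffer)` moves the unique_ptr into the lambda.
inline auto make_appender(std::unique_ptr<std::wstring> buffer) {
    return [buf = std::move(buffer)](const std::wstring& chunk) -> std::size_t {
        buf->append(chunk);   // modifies the owned string
        return buf->size();   // total length so far
    };
}
```

Note that a closure holding a move-only capture is itself move-only, so it cannot be stored in a std::function, which requires a copyable callable. Keep such closures in an auto variable or pass them as template arguments.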
Filesystem Support:
- C++11 bindings: File handling may rely on older APIs or require third-party libraries to handle filesystem operations like directory iteration and path manipulations.
- C++17 bindings: C++17 brings native support for filesystem operations via the std::filesystem library, which includes functions for path manipulation, file I/O, and directory traversal.
- Migration strategy: Replace any custom or third-party filesystem libraries with std::filesystem. This simplifies file and directory handling with a standardized and efficient API.
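A typical use of std::filesystem alongside document processing is collecting the files to process. The sketch below lists regular files with a given extension in one directory; has_extension and find_documents are hypothetical helpers, not part of the bindings.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: true when `file` has exactly the given extension,
// e.g. ".doc". The comparison is case-sensitive.
inline bool has_extension(const fs::path& file, const std::string& ext) {
    return file.extension() == fs::path(ext);
}

// Hypothetical helper: returns the regular files in `dir` (non-recursive)
// whose extension matches `ext`.
inline std::vector<fs::path> find_documents(const fs::path& dir, const std::string& ext) {
    std::vector<fs::path> matches;
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.is_regular_file() && has_extension(entry.path(), ext)) {
            matches.push_back(entry.path());
        }
    }
    return matches;
}
```

fs::directory_iterator throws std::filesystem::filesystem_error if the directory cannot be opened, so callers may want to wrap find_documents in a try-catch block.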
Integrating with CMake¶
If you're using CMake, you can either manually clone the repository or use the FetchContent module to fetch it automatically without cloning. Here's how:
Using FetchContent¶
CMake's FetchContent allows you to fetch the repository directly without manually cloning it. Add the following to your CMakeLists.txt:
include(FetchContent)
FetchContent_Declare(
HylandDocumentFiltersCpp17
GIT_REPOSITORY https://github.com/Hyland/DocumentFilters.git
GIT_TAG <desired-tag-or-branch>
SOURCE_SUBDIR bindings/cpp17
)
FetchContent_MakeAvailable(HylandDocumentFiltersCpp17)
Once included, link the DocumentFilters::Cpp17 library to your target:
target_link_libraries(your_target PRIVATE DocumentFilters::Cpp17)
Manual Cloning with CMake¶
If you prefer to manually clone the repository, first clone the repo into your project and then link it manually:
git clone https://github.com/Hyland/DocumentFilters.git
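After cloning, add the bindings directory to your CMakeLists.txt and link the DocumentFilters::Cpp17 library to your target:
add_subdirectory(path/to/DocumentFilters/bindings/cpp17)
target_link_libraries(your_target PRIVATE DocumentFilters::Cpp17)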
Integrating with Visual Studio projects (vcxproj-based)¶
To use Hyland Document Filters in a Visual Studio project, follow these steps:
Include the DocumentFiltersCpp17 project in your Solution¶
- Right-click on your Solution in Solution Explorer, then select Add > Existing Project.
- Navigate to the bindings/cpp17 directory and select the DocumentFiltersCpp17.vcxproj file.
- In your project, right-click on the project name, then select Add > Reference. Ensure that DocumentFiltersCpp17 is checked.
Include the Header Files¶
- Go to Project > Properties > Configuration Properties > C/C++ > General > Additional Include Directories.
- Add the paths to the bindings/cpp17/include and bindings/c/include directories, as follows: c:\path\to\DocumentFilters\bindings\cpp17\include;c:\path\to\DocumentFilters\bindings\c\include;
Note
You need to include both the C++ and C include directories in your project.
Link the ISYS11df Library¶
- No additional steps are required to link the ISYS11df library, as it is already a dependency of the DocumentFiltersCpp17 project.
- However, ensure that the ISYS11df.dll file is accessible at runtime. You can either copy it to your project output directory or make sure it is available on your system's PATH.
Integrating with Makefile projects¶
If you're using a manual build system, such as Make with a Makefile, you will need to specify the include paths and link the required libraries yourself.
Include Header Files, Build the Object API, and Link Libraries in a Makefile¶
You can modify your Makefile to compile the source files and link the required libraries as specified in the CMake script. Below is an example Makefile setup:
CXX = g++
CXXFLAGS = -std=c++17 -Ipath/to/DocumentFilters/bindings/c/include -Ipath/to/DocumentFilters/bindings/cpp17/include
LDFLAGS = -Lpath/to/DocumentFiltersBinaries -lISYS11df
SOURCES = src/DocumentFiltersObjects.cpp src/DocFiltersAnnotations.cpp src/DocFiltersBookmark.cpp \
src/DocFiltersCanvas.cpp src/DocFiltersCommon.cpp src/DocFiltersCompareDocumentSettings.cpp \
src/DocFiltersCompareDocumentSource.cpp src/DocFiltersCompareResultDifference.cpp \
src/DocFiltersCompareResultDifferenceDetail.cpp src/DocFiltersCompareResults.cpp \
src/DocFiltersCompareSettings.cpp src/DocFiltersDateTime.cpp src/DocFiltersExtractor.cpp \
src/DocFiltersFormat.cpp src/DocFiltersFormElement.cpp src/DocFiltersHyperlink.cpp \
src/DocFiltersOption.cpp src/DocFiltersPage.cpp src/DocFiltersPageElement.cpp \
src/DocFiltersPagePixels.cpp src/DocFiltersRenderPageProperties.cpp src/DocFiltersStreams.cpp \
src/DocFiltersStrings.cpp src/DocFiltersSubFile.cpp src/DocFiltersWord.cpp
HEADERS = include/DocumentFiltersObjects.h
# Library object files only, so main.o is not archived (notdir is a GNU Make function)
OBJECTS = $(notdir $(SOURCES:.cpp=.o))
all: my_cpp_app
# Rule to build the DocumentFilters library
libDocumentFilters.a: $(SOURCES) $(HEADERS)
	$(CXX) $(CXXFLAGS) -c $(SOURCES)
	ar rcs libDocumentFilters.a $(OBJECTS)
# Link the library to the main application
my_cpp_app: main.o libDocumentFilters.a
	$(CXX) -o my_cpp_app main.o libDocumentFilters.a $(LDFLAGS)
main.o: main.cpp
	$(CXX) $(CXXFLAGS) -c main.cpp
clean:
	rm -f *.o my_cpp_app libDocumentFilters.a
Compile the Application¶
After setting up the Makefile, you can build your application by running:
make
With this approach, you'll need to manually download the Document Filters Release Binaries and specify their location as a link library directory, referred to above as path/to/DocumentFiltersBinaries.
Initializing and calling Document Filters¶
After setting up the necessary includes and linking the required libraries, the next step is to initialize the Document Filters library. Initialization must happen before any document is processed. Below is a sample code snippet demonstrating how to perform this initialization in your C++ application.
#include <exception>
#include <iostream>
#include <DocumentFiltersObjects.h>
static const char* LICENSE_KEY = "YOUR_LICENSE_KEY_HERE";
int main() {
    try {
        // Create the API object and initialize it with the license key
        // and the resource directory
        Hyland::DocFilters::Api api;
        api.Initialize(LICENSE_KEY, ".");
    } catch (const std::exception& error) {
        std::cout << "Initialization error: " << error.what() << std::endl;
        return 1;
    }
    std::cout << "Document Filters initialized successfully!" << std::endl;
    return 0;
}
Explanation:
- License Key: The LICENSE_KEY is provided as part of the initialization process and is required to unlock the full functionality of the Document Filters library. Replace "YOUR_LICENSE_KEY_HERE" with your actual license key.
- Resource Directory: The second parameter to Initialize, ".", specifies the directory where Document Filters should look for resources such as configuration files and fonts. Here, "." refers to the current working directory, which this example assumes is where the shared libraries reside. You can adjust this path based on your setup.
- Error Handling: The try-catch block captures any exceptions thrown during initialization and outputs the error message. This ensures you can handle initialization failures gracefully.
The DocumentFilters objects are now initialized and ready to be used for various document processing tasks, such as extraction, rendering, or comparison.
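Rather than hard-coding the license key, some applications may prefer to read it from the environment. The following is a minimal sketch of that idea; license_from_env is a hypothetical helper, and the environment variable name used in the usage note is an assumption, not something defined by Document Filters.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the value of the environment variable
// `name`, or `fallback` when the variable is unset or empty.
inline std::string license_from_env(const char* name, const std::string& fallback) {
    const char* value = std::getenv(name);
    return (value != nullptr && *value != '\0') ? std::string(value) : fallback;
}
```

The result could then be passed to initialization in place of the constant, for example api.Initialize(license_from_env("DOCFILTERS_LICENSE", LICENSE_KEY).c_str(), "."), where DOCFILTERS_LICENSE is an illustrative variable name.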
Extracting Text¶
Once the Document Filters library is initialized, you can begin extracting text from documents. The following C++ code snippet demonstrates how to load a document and extract its contents using the Document Filters API. This example focuses on extracting text from a Word document (.doc file).
#include <exception>
#include <iostream>
#include <DocumentFiltersObjects.h>
static const char* LICENSE_KEY = "YOUR_LICENSE_KEY_HERE";
int main() {
    try {
        Hyland::DocFilters::Api api;
        api.Initialize(LICENSE_KEY, ".");
        // Load the document for text extraction
        auto doc = api.GetExtractor(L"filename.doc");
        // Open the document for both body text and metadata extraction
        doc.Open(IGR_BODY_AND_META);
        // Loop through the document until EOF and extract text
        while (!doc.getEOF()) {
            auto text = doc.GetText();
            std::wcout << text << std::endl;
        }
    } catch (const std::exception& error) {
        std::cout << "Error: " << error.what() << std::endl;
        return 1;
    }
    return 0;
}
Explanation:
- Document Loading: The document is loaded using api.GetExtractor, which returns an object for extracting content. In this case, "filename.doc" is the input Word document. Update the file path accordingly.
- Opening the Document: The doc.Open(IGR_BODY_AND_META) method is called to open the document for text extraction. The IGR_BODY_AND_META flag specifies that both the document body and metadata should be extracted.
- Text Extraction: The code enters a loop, repeatedly calling doc.GetText() until the getEOF() method indicates the end of the document. The extracted text is output using std::wcout to handle wide characters.
- Automatic Resource Management: Since the document object is automatically managed, it will be closed when it goes out of scope, ensuring proper cleanup without manual intervention.
This approach efficiently extracts text from documents while managing resources automatically.
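If you need the whole document as a single string instead of printing each chunk, the same loop can be wrapped in a small helper. The sketch below is written as a template so that it compiles without the library; the member names getEOF and GetText are taken from the example above, and FakeExtractor is a hypothetical stand-in used only to exercise the helper.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Accumulates all text chunks from any object that exposes getEOF() and
// GetText() with the shapes used in the extraction example.
template <typename ExtractorLike>
std::wstring read_all_text(ExtractorLike& doc) {
    std::wstring all;
    while (!doc.getEOF()) {
        all += doc.GetText();
    }
    return all;
}

// Hypothetical stand-in for an extractor, for illustration and testing.
struct FakeExtractor {
    std::vector<std::wstring> chunks;
    std::size_t next = 0;
    bool getEOF() const { return next >= chunks.size(); }
    std::wstring GetText() { return chunks[next++]; }
};
```

For very large documents, accumulating everything in memory may not be desirable; streaming each chunk to its destination, as the example above does with std::wcout, keeps memory use bounded.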
Converting a Document¶
After initializing the Document Filters library, you can convert documents into different formats, such as PDF. The following C++ code snippet demonstrates how to load a Word document (.doc file) and convert it into a PDF using the Document Filters API.
#include <iostream>
#include <memory>
#include <DocumentFiltersObjects.h>
static const char* LICENSE_KEY = "YOUR_LICENSE_KEY_HERE";
int main() {
    try {
        Hyland::DocFilters::Api api;
        api.Initialize(LICENSE_KEY, ".");
        // Load the document for conversion
        auto doc = api.GetExtractor(L"filename.doc");
        // Create an output canvas for PDF rendering
        auto canvas = api.MakeOutputCanvas(L"result.pdf", IGR_DEVICE_IMAGE_PDF, "");
        // Open the document in image format for rendering
        doc.Open(IGR_FORMAT_IMAGE);
        // Loop through the document pages and render each one to the PDF canvas
        for (auto&& page : doc.pages()) {
            canvas.RenderPage(page);
        }
    } catch (const std::exception& error) {
        std::cout << "Error: " << error.what() << std::endl;
        return 1;
    }
    return 0;
}
Explanation:
- Document Loading: The document is loaded using api.GetExtractor with the path "filename.doc". Replace this with your actual file path.
- Canvas Creation: A PDF canvas is created with the MakeOutputCanvas function, specifying the output file as "result.pdf". The IGR_DEVICE_IMAGE_PDF flag is used to indicate that the output format is PDF.
- Opening the Document: The document is opened using doc.Open(IGR_FORMAT_IMAGE), which prepares the document for rendering as an image-based format (used for PDF conversion).
- Page Rendering: The code iterates through each page of the document using a range-based loop. For each page, the canvas.RenderPage(page) method is called to render it onto the PDF canvas.
- Automatic Resource Management: Both the doc and canvas objects are automatically managed and will be closed when they go out of scope, ensuring proper cleanup.
This approach efficiently converts documents to PDF format while handling resources automatically.
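When converting many files, the output name is usually derived from the input name rather than hard-coded. A minimal sketch using std::filesystem follows; pdf_output_path is a hypothetical helper and is not part of the Document Filters API.

```cpp
#include <filesystem>

// Hypothetical helper: returns the input path with its extension replaced
// by ".pdf", e.g. "filename.doc" becomes "filename.pdf". The parameter is
// taken by value so the caller's path is left unchanged.
inline std::filesystem::path pdf_output_path(std::filesystem::path input) {
    return input.replace_extension(".pdf");
}
```

Assuming MakeOutputCanvas accepts a wide string for the output file, as the example above suggests, the result could be passed as pdf_output_path(input).wstring().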