
Getting Started with C++

Hyland Document Filters allows you to integrate powerful document processing capabilities into your C++ applications. Below are instructions on how to set up your application depending on your build system, such as CMake, Visual Studio, or Make.

Clone and Include the Document Filters Repository

The Document Filters GitHub repository contains the necessary header files and libraries. If you are not using CMake's FetchContent, clone the repository manually into your project:

git clone https://github.com/Hyland/DocumentFilters.git

The primary header file required for your work is DocumentFiltersObjects.h, which can be found in the bindings/cpp{VERSION}/include directory. When using CMake's FetchContent, there is no need to manually clone the repository (details are provided below).

Note

Using the C++ bindings is not mandatory; your application can interact directly with the C API instead. To do so, follow the instructions in Getting Started with C.

Choosing a C++ binding version

The Document Filters library offers bindings for C++17, located in the bindings/cpp17 directory, as well as deprecated bindings for C++11, found in the bindings/cpp11 directory. The C++17 bindings deliver comprehensive Object API functionality, including annotations and document comparison, while the C++11 bindings offer limited support. Because the C++11 bindings are deprecated, use them only when a modern C++ compiler is unavailable.

Memory Management:

  • C++11 bindings: Requires manual memory management with raw pointers.
  • C++17 bindings: Automatic object lifetime management, eliminating manual delete calls. Objects are internally reference counted and can be safely stored on the stack or heap without the caller needing to track their lifetime.
  • Migration strategy: Remove manual memory management, including explicit calls to delete and smart-pointer wrappers around the objects. Place objects on the stack instead and let them manage their own lifetime.
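The ownership difference described above can be illustrated without the library itself. The following is a minimal sketch, assuming the C++17 bindings behave like copyable handles over a reference-counted implementation; Resource and Handle are hypothetical stand-ins, not Document Filters types.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a native resource; not a Document Filters type.
struct Resource {
    explicit Resource(std::string n) : name(std::move(n)) { ++live_count; }
    ~Resource() { --live_count; }
    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;

    std::string name;
    static inline int live_count = 0;  // number of resources currently alive
};

// Hypothetical value-type handle: copies share one reference-counted Resource.
class Handle {
public:
    explicit Handle(std::string name)
        : impl_(std::make_shared<Resource>(std::move(name))) {}

    const std::string& name() const { return impl_->name; }

private:
    std::shared_ptr<Resource> impl_;
};

// C++11-style usage: the caller owns the pointer and must delete it.
inline int live_after_manual_delete() {
    Resource* raw = new Resource("manual");
    delete raw;  // forgetting this line would leak the resource
    return Resource::live_count;
}

// C++17-style usage: handles live on the stack and release themselves.
inline int live_inside_scope() {
    Handle a("automatic");
    Handle b = a;                 // copying shares ownership, no second resource
    return Resource::live_count;  // one resource while a and b are in scope
}
```

In the second function nothing is deleted by hand: the shared resource is released when the last handle goes out of scope.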

String Handling:

  • C++11 bindings: Provide a mix of ANSI strings and wide strings.
  • C++17 bindings: Returned strings are normalized to std::wstring, with utility functions for converting to and from UTF-8. Function parameters can be provided as either UTF-8 or wide strings.
  • Migration strategy: Use the provided string conversion functions to convert between string types if your application does not work natively with std::wstring.
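To show what such a conversion involves, here is a hand-written sketch of UTF-8 to std::wstring decoding. The helper utf8_to_wide is hypothetical and is not the conversion function shipped with the bindings; in real code, prefer the library's own utilities. It performs only basic structural validation and does not reject overlong encodings.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not the bindings' own conversion function):
// decodes UTF-8 into std::wstring with basic structural validation only.
inline std::wstring utf8_to_wide(const std::string& in) {
    std::wstring out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;

        if constexpr (sizeof(wchar_t) == 2) {
            // 16-bit wchar_t (for example on Windows): emit a surrogate pair.
            if (cp > 0xFFFF) {
                cp -= 0x10000;
                out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
                out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
                continue;
            }
        }
        out.push_back(static_cast<wchar_t>(cp));
    }
    return out;
}
```

The branch on sizeof(wchar_t) is needed because wchar_t is 16 bits wide on some platforms and 32 bits on others.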

Object API Coverage:

  • C++11 bindings: Limited coverage, restricted to text extraction, subfile handling, and rendering. Advanced features such as annotations and document comparison are not supported directly in the C++11 bindings and must be achieved via the C API.
  • C++17 bindings: Provides full coverage of the Object API, including advanced features like annotations and document comparison, with feature parity across .NET, Python, and Java.
  • Migration strategy: No required changes unless leveraging new features.

Stream Handling:

  • C++11 bindings: Streams are handled via raw pointers and manual read/write operations.
  • C++17 bindings: Streams are supported using modern C++ constructs, including std::istream and std::ostream, allowing for seamless integration with standard C++ I/O streams.
  • Migration strategy: Replace raw pointer-based stream handling with std::istream and std::ostream-based implementations. This will ensure more efficient and safer stream operations.
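As an illustration of the stream-based style, the sketch below copies data between any std::istream and std::ostream using only the standard library. The helper names copy_stream and read_all are hypothetical and do not come from the bindings; how the bindings themselves accept streams is not shown here.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies everything left in `in` to `out` in fixed-size chunks.
// Returns the number of bytes copied.
inline std::size_t copy_stream(std::istream& in, std::ostream& out) {
    char buffer[4096];
    std::size_t total = 0;
    while (in) {
        in.read(buffer, sizeof buffer);
        const std::streamsize got = in.gcount();  // bytes actually read
        if (got <= 0) break;
        out.write(buffer, got);
        total += static_cast<std::size_t>(got);
    }
    return total;
}

// Reads the remaining contents of any input stream into a string.
inline std::string read_all(std::istream& in) {
    std::ostringstream buffer;
    copy_stream(in, buffer);
    return buffer.str();
}
```

Because both helpers take stream references, the same code works with std::ifstream, std::istringstream, or any other standard stream.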

Collection Iteration:

  • C++11 bindings: Collections like lists or arrays are often iterated manually, typically using raw loops or index-based access.
  • C++17 bindings: The bindings take advantage of range-based loops and standard C++ iterator patterns, improving readability and reducing potential off-by-one errors.
  • Migration strategy: Migrate to range-based for-loops or use iterators for collections. This will make the code more concise and easier to maintain.
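The two iteration styles can be compared side by side. The sketch below uses a hypothetical PageList type (the real collection classes in the bindings may differ) to show that exposing begin() and end() is all a collection needs to work with a range-based loop.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical collection type; the bindings' own collection classes may differ.
class PageList {
public:
    explicit PageList(std::vector<std::string> names) : names_(std::move(names)) {}

    // Index-based access, as C++11-style code would use.
    std::size_t count() const { return names_.size(); }
    const std::string& at(std::size_t i) const { return names_.at(i); }

    // begin()/end() are all a type needs to work with range-based for.
    auto begin() const { return names_.begin(); }
    auto end() const { return names_.end(); }

private:
    std::vector<std::string> names_;
};

// Before: manual index loop.
inline std::size_t total_length_indexed(const PageList& pages) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < pages.count(); ++i) {
        total += pages.at(i).size();
    }
    return total;
}

// After: range-based loop with no index to get wrong.
inline std::size_t total_length_ranged(const PageList& pages) {
    std::size_t total = 0;
    for (const auto& name : pages) {
        total += name.size();
    }
    return total;
}
```

Both functions compute the same result; the second has no loop bound or index to maintain.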

Callbacks:

  • C++11 bindings: Callbacks are implemented using function pointers, requiring manual management and less flexibility when passing stateful functions.
  • C++17 bindings: Callbacks can now be implemented using std::function, allowing the use of lambdas and stateful objects to handle callbacks more elegantly.
  • Migration strategy: Replace function pointers with std::function where callbacks are used. This allows greater flexibility, enabling the use of lambdas or member functions with bound state.
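The difference between the two callback styles can be sketched with the standard library alone. All names below (RawCallback, ItemCallback, for_each_item_raw, for_each_item, count_raw) are hypothetical and only illustrate the pattern; they are not signatures from the bindings.

```cpp
#include <functional>
#include <string>
#include <vector>

// Function-pointer style: state must travel through a separate void* argument.
using RawCallback = void (*)(const std::string& item, void* context);

inline void for_each_item_raw(const std::vector<std::string>& items,
                              RawCallback callback, void* context) {
    for (const auto& item : items) callback(item, context);
}

// std::function style: the callable carries its own state.
using ItemCallback = std::function<void(const std::string& item)>;

inline void for_each_item(const std::vector<std::string>& items,
                          const ItemCallback& callback) {
    for (const auto& item : items) callback(item);
}

// Free function for the raw interface; the counter arrives as a void*.
inline void count_raw(const std::string&, void* context) {
    ++*static_cast<int*>(context);
}
```

With the std::function version, a caller can pass a capturing lambda such as [&count](const std::string&) { ++count; } and avoid the unchecked cast that the void* version requires.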

Lambda Improvements:

  • C++11 bindings: Basic lambda support exists, but capturing more complex objects or state (like move-only types) requires more manual management.
  • C++17 bindings: A C++17 compiler supports generalized (init) lambda captures, introduced in C++14, which allow closures to capture move-only types and make handling objects within closures more flexible.
  • Migration strategy: Refactor complex lambdas, especially those that handle state, to take advantage of generalized captures. This reduces boilerplate and makes callbacks and functional programming patterns easier to manage.
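A generalized capture can be shown in a few lines. The sketch below moves a std::unique_ptr into a closure; make_prefixer is a hypothetical helper used only for illustration.

```cpp
#include <memory>
#include <string>
#include <utility>

// Builds a closure that owns a move-only std::unique_ptr via an init capture.
inline auto make_prefixer(std::unique_ptr<std::string> prefix) {
    return [owned = std::move(prefix)](const std::string& text) {
        return *owned + text;  // the closure keeps the string alive
    };
}
```

A closure that captures a std::unique_ptr is itself move-only, so it cannot be stored in a std::function, which requires a copyable callable. Pass such closures as template parameters, or capture a std::shared_ptr when the callback must be copyable.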

Filesystem Support:

  • C++11 bindings: File handling may rely on older APIs or require third-party libraries to handle filesystem operations like directory iteration and path manipulations.
  • C++17 bindings: C++17 brings native support for filesystem operations via the std::filesystem library, which includes functions for path manipulation, file I/O, and directory traversal.
  • Migration strategy: Replace any custom or third-party filesystem libraries with std::filesystem. This simplifies file and directory handling with a standardized and efficient API.

Integrating with CMake

If you're using CMake, you can either manually clone the repository or use the FetchContent module to automatically fetch it without cloning. Here’s how:

Using FetchContent

CMake's FetchContent allows you to fetch the repository directly without manually cloning it. Add the following to your CMakeLists.txt:

CMakeLists.txt
include(FetchContent)

FetchContent_Declare(
    HylandDocumentFiltersCpp17
    GIT_REPOSITORY https://github.com/Hyland/DocumentFilters.git
    GIT_TAG <desired-tag-or-branch>
    SOURCE_SUBDIR bindings/cpp17
)

FetchContent_MakeAvailable(HylandDocumentFiltersCpp17)

Once included, link the DocumentFilters::Cpp17 library to your target:

CMakeLists.txt
target_link_libraries(your_target PRIVATE DocumentFilters::Cpp17)

Manual Cloning with CMake

If you prefer to manually clone the repository, first clone the repo into your project and then link it manually:

git clone https://github.com/Hyland/DocumentFilters.git

After cloning, add the bindings directory to your build with add_subdirectory in your CMakeLists.txt, then link DocumentFilters::Cpp17 to your target as in the FetchContent example:

CMakeLists.txt
add_subdirectory(path/to/DocumentFilters/bindings/cpp17)

Integrating with Visual Studio projects (vcxproj-based)

To use Hyland Document Filters in a Visual Studio project, follow these steps:

Include the DocumentFiltersCpp17 project in your Solution

  • Right-click on your Solution in Solution Explorer, then select Add > Existing Project.

  • Navigate to the bindings/cpp17 directory and select the DocumentFiltersCpp17.vcxproj file.

  • In your project, right-click on the project name, then select Add > Reference. Ensure that DocumentFiltersCpp17 is checked.

Include the Header Files

  • Go to Project > Properties > Configuration Properties > C/C++ > General > Additional Include Directories.

  • Add the path to the bindings/cpp17/include and bindings/c/include directories, as follows:

    c:\path\to\DocumentFilters\bindings\cpp17\include;c:\path\to\DocumentFilters\bindings\c\include;
    

    Note

    You need to include both the C++ and C include directories in your project.

  • No additional steps are required to manually link the ISYS11df library, as it is already a dependency of the DocumentFiltersCpp17 project.

  • However, ensure that the ISYS11df.dll file is accessible at runtime. You can either copy it to your project output directory or make sure it is available in your system’s PATH.

Integrating with Makefile projects

If you're using a manual build system, such as Make with a Makefile, you will need to manually specify the include paths and link the required libraries.

You can modify your Makefile to compile the source files and link the required libraries as specified in the CMake script. Below is an example Makefile setup:

Makefile
CXX = g++
CXXFLAGS = -std=c++17 -Ipath/to/DocumentFilters/bindings/c/include -Ipath/to/DocumentFilters/bindings/cpp17/include
LDFLAGS = -Lpath/to/DocumentFiltersBinaries -lISYS11df
SOURCES = src/DocumentFiltersObjects.cpp src/DocFiltersAnnotations.cpp src/DocFiltersBookmark.cpp \
          src/DocFiltersCanvas.cpp src/DocFiltersCommon.cpp src/DocFiltersCompareDocumentSettings.cpp \
          src/DocFiltersCompareDocumentSource.cpp src/DocFiltersCompareResultDifference.cpp \
          src/DocFiltersCompareResultDifferenceDetail.cpp src/DocFiltersCompareResults.cpp \
          src/DocFiltersCompareSettings.cpp src/DocFiltersDateTime.cpp src/DocFiltersExtractor.cpp \
          src/DocFiltersFormat.cpp src/DocFiltersFormElement.cpp src/DocFiltersHyperlink.cpp \
          src/DocFiltersOption.cpp src/DocFiltersPage.cpp src/DocFiltersPageElement.cpp \
          src/DocFiltersPagePixels.cpp src/DocFiltersRenderPageProperties.cpp src/DocFiltersStreams.cpp \
          src/DocFiltersStrings.cpp src/DocFiltersSubFile.cpp src/DocFiltersWord.cpp
HEADERS = include/DocumentFiltersObjects.h

all: my_cpp_app

# Rule to build the DocumentFilters library.
# Note: Make requires each recipe line to start with a tab character, not spaces.
libDocumentFilters.a: $(SOURCES) $(HEADERS)
    $(CXX) $(CXXFLAGS) -c $(SOURCES)
    ar rcs libDocumentFilters.a *.o

# Link the library to the main application
my_cpp_app: main.o libDocumentFilters.a
    $(CXX) -o my_cpp_app main.o libDocumentFilters.a $(LDFLAGS)

main.o: main.cpp
    $(CXX) $(CXXFLAGS) -c main.cpp

clean:
    rm -f *.o my_cpp_app libDocumentFilters.a

Compile the Application

After setting up the Makefile, you can build your application by running:

make

With this approach, you'll need to manually download the Document Filters Release Binaries and specify their location as a link library directory, referred to above as path/to/DocumentFiltersBinaries.

Initializing and calling Document Filters

After setting up the necessary includes and linking the required libraries, the next step is to initialize the Document Filters library, which prepares it to process documents. The following snippet shows how to perform this initialization in your C++ application.

#include <iostream>
#include <memory>
#include <DocumentFiltersObjects.h>

static const char* LICENSE_KEY = "YOUR_LICENSE_KEY_HERE";

int main() {
    try { 
        Hyland::DocFilters::Api api;
        api.Initialize(LICENSE_KEY, ".");
    } catch (const std::exception& error) {
        std::cout << "Initialization error: " << error.what() << std::endl;
        return 1;
    }

    std::cout << "Document Filters initialized successfully!" << std::endl;
    return 0;
}

Explanation:

  • License Key: The LICENSE_KEY is provided as part of the initialization process and is required to unlock the full functionality of the Document Filters library. Replace "YOUR_LICENSE_KEY_HERE" with your actual license key.

  • Resource Directory: The second parameter to Initialize, ".", specifies the directory where Document Filters should look for resources such as configuration files and fonts. Here, "." refers to the current working directory, so the example assumes the application is run from the directory where the shared libraries reside. Adjust this path to match your setup.

  • Error Handling: The try-catch block captures any exceptions thrown during initialization and outputs the error message. This ensures you can handle initialization failures gracefully.

The DocumentFilters objects are now initialized and ready to be used for various document processing tasks, such as extraction, rendering, or comparison.
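The samples on this page hard-code the license key for brevity. One possible alternative, sketched below, reads the key from an environment variable at startup. The helper license_key_from_env is hypothetical and the variable name is whatever the caller chooses; neither is a Document Filters convention.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Reads a license key from the named environment variable.
// Hypothetical helper: the variable name is chosen by the caller and is
// not defined by Document Filters.
inline std::string license_key_from_env(const char* variable_name) {
    const char* value = std::getenv(variable_name);
    if (value == nullptr || *value == '\0') {
        throw std::runtime_error(
            std::string("environment variable not set: ") + variable_name);
    }
    return value;
}
```

The returned string's c_str() could then be passed where the sample passes LICENSE_KEY. Because the helper throws a std::runtime_error, the existing catch block for std::exception would also report a missing key.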

Extracting Text

Once the Document Filters library is initialized, you can begin extracting text from documents. The following C++ code snippet demonstrates how to load a document and extract its contents using the Document Filters API. This example focuses on extracting text from a Word document (.doc file).

#include <iostream>
#include <memory>
#include <DocumentFiltersObjects.h>

static const char* LICENSE_KEY = "YOUR_LICENSE_KEY_HERE";

int main() {
    try {
        Hyland::DocFilters::Api api;
        api.Initialize(LICENSE_KEY, ".");

        // Load the document for text extraction
        auto doc = api.GetExtractor(L"filename.doc");

        // Open the document for both body text and metadata extraction
        doc.Open(IGR_BODY_AND_META);

        // Loop through the document until EOF and extract text
        while (!doc.getEOF()) {
            auto text = doc.GetText();
            std::wcout << text << std::endl;
        }

    } catch (const std::exception& error) {
        std::cout << "Error: " << error.what() << std::endl;
        return 1;
    }

    return 0;
}

Explanation:

  • Document Loading: The document is loaded using api.GetExtractor, which returns an object for extracting content. In this case, "filename.doc" is the input Word document. Update the file path accordingly.

  • Opening the Document: The doc.Open(IGR_BODY_AND_META) method is called to open the document for text extraction. The IGR_BODY_AND_META flag specifies that both the document body and metadata should be extracted.

  • Text Extraction: The code enters a loop, repeatedly calling doc.GetText() until the getEOF() method indicates the end of the document. The extracted text is output using std::wcout to handle wide characters.

  • Automatic Resource Management: Since the document object is automatically managed, it will be closed when it goes out of scope, ensuring proper cleanup without manual intervention.

This approach efficiently extracts text from documents while managing resources automatically.

Converting a Document

After initializing the Document Filters library, you can convert documents into different formats, such as PDF. The following C++ code snippet demonstrates how to load a Word document (.doc file) and convert it into a PDF using the Document Filters API.

#include <iostream>
#include <memory>
#include <DocumentFiltersObjects.h>

static const char* LICENSE_KEY = "YOUR_LICENSE_KEY_HERE";

int main() {
    try {
        Hyland::DocFilters::Api api;
        api.Initialize(LICENSE_KEY, ".");

        // Load the document for conversion
        auto doc = api.GetExtractor(L"filename.doc");

        // Create an output canvas for PDF rendering
        auto canvas = api.MakeOutputCanvas(L"result.pdf", IGR_DEVICE_IMAGE_PDF, "");

        // Open the document in image format for rendering
        doc.Open(IGR_FORMAT_IMAGE);

        // Loop through the document pages and render each one to the PDF canvas
        for (auto&& page : doc.pages()) {
            canvas.RenderPage(page);
        }

    } catch (const std::exception& error) {
        std::cout << "Error: " << error.what() << std::endl;
        return 1;
    }

    return 0;
}

Explanation:

  • Document Loading: The document is loaded using api.GetExtractor with the path "filename.doc". Replace this with your actual file path.

  • Canvas Creation: A PDF canvas is created with the MakeOutputCanvas function, specifying the output file as "result.pdf". The IGR_DEVICE_IMAGE_PDF flag is used to indicate that the output format is PDF.

  • Opening the Document: The document is opened using doc.Open(IGR_FORMAT_IMAGE), which prepares the document for rendering as an image-based format (used for PDF conversion).

  • Page Rendering: The code iterates through each page of the document using a range-based loop. For each page, the canvas.RenderPage(page) method is called to render it onto the PDF canvas.

  • Automatic Resource Management: Both the doc and canvas objects are automatically managed and will be closed when they go out of scope, ensuring proper cleanup.

This approach efficiently converts documents to PDF format while handling resources automatically.