Exploring the Document Comparison APIs
Hyland Document Filters 24.2 introduces the Document Comparison APIs. These APIs let developers build document comparison into their applications, making it easier to identify and manage changes across various document types.
Key Features of the Document Comparison APIs
The Document Comparison APIs in Hyland Document Filters 24.2 offer several advanced features that cater to the diverse needs of document management and processing. Here are the highlights:
Text Content Comparison
The APIs compare the text content of two documents at word-level granularity, allowing users to detect and highlight insertions, deletions, and modifications. Whether you're comparing contract versions or tracking changes in a collaborative document, this makes each textual change visible.
Reading Order Analysis
Maintaining the logical flow of content is crucial in document comparison. The APIs analyze the reading order to ensure that the comparison respects the intended sequence of information. This is particularly useful for complex documents like reports and academic papers where the order of content significantly impacts comprehension.
Independence from File Types
One of the standout features of these APIs is their ability to compare documents regardless of their file types. Hyland Document Filters support a wide range of formats, including DOCX, PDF, HTML, and more. This flexibility means you can compare a Word document with a PDF file seamlessly, making it easier to manage diverse document types within your workflow.
Formatting and Pagination Considerations
The APIs are equipped to handle differences in formatting and pagination. Users can choose to include or exclude headers, footers, and fields in the comparison, providing control over which elements are considered. This level of customization ensures that the comparison results are tailored to your specific needs, whether you're focusing on content changes or layout adjustments.
Benefits of Using Hyland Document Filters for Document Comparison
Accuracy and Reliability
Hyland Document Filters are renowned for their precision and reliability. The Document Comparison APIs leverage this robust foundation to deliver accurate comparison results, ensuring that even the smallest changes are detected and reported.
Scalability
Whether you're comparing a handful of documents or processing thousands daily, the APIs are designed to scale with your needs. Their performance and efficiency make them suitable for enterprise-level applications.
Integration Flexibility
The APIs are designed to be easily integrated into various applications, whether desktop, web, or mobile. This flexibility allows you to incorporate document comparison capabilities into your existing workflows with minimal disruption.
Implementing Document Comparison
Integrating the Document Comparison APIs into your application is straightforward. Here's a high-level overview of how you can get started:
- Initialize the API: Begin by setting up the Document Filters environment and initializing the Document Comparison API.
Hyland.DocumentFilters.Api api = new Hyland.DocumentFilters.Api();
api.Initialize("License Code", ".");
- Load the Documents: Load the documents you wish to compare. The API supports various input formats, making it easy to work with different document types.
Extractor doc1 = api.OpenExtractor("path/to/document1.docx", OpenMode.Paginated);
Extractor doc2 = api.OpenExtractor("path/to/document2.pdf", OpenMode.Paginated);
- Perform the Comparison: Execute the comparison operation and retrieve the results. The API provides detailed output, highlighting the differences between the documents.
using (var compare = doc1.Compare(doc2)) {
    // Work with compare results
}
- Process the Results: Utilize the comparison results within your application. You can display the differences, generate reports, or trigger further processing based on the detected changes.
// Runs inside the using block from the previous step, where compare is in scope.
while (compare.MoveNext())
{
    var diff = compare.Current;
    // Work with diff...
}
As you enumerate over the items in a compare result, the diffs you act on will be one of the following types:
- Equal - Text is the same in both documents. These diff items are not included by default but can be enabled with a flag.
- Insert - Text exists in the new/revised document but not in the old/original document.
- Delete - Text exists in the old/original document but not in the new/revised document.
The markup loop later in this article also skips items whose type is NextBatch.
Using this information, we can create side-by-side comparison views that show the edits to both the original and revised documents.
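To make the three diff types concrete, here is a minimal, self-contained C# sketch of a word-level comparison built on a longest-common-subsequence table. It is not how Document Filters implements comparison, and it does not call the SDK: DiffKind, WordDiffSketch, and Compare are hypothetical names used only for illustration, and the includeEqual parameter simply mirrors the idea that Equal items are opt-in.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration; not a Document Filters type.
public enum DiffKind { Equal, Insert, Delete }

public static class WordDiffSketch
{
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    // Classifies each word as Equal, Insert, or Delete by walking a
    // longest-common-subsequence (LCS) table built over the two word lists.
    public static List<(DiffKind Kind, string Text)> Compare(
        string original, string revised, bool includeEqual = false)
    {
        string[] a = original.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries);
        string[] b = revised.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries);

        // lcs[i, j] = length of the LCS of a[i..] and b[j..].
        var lcs = new int[a.Length + 1, b.Length + 1];
        for (int i = a.Length - 1; i >= 0; i--)
            for (int j = b.Length - 1; j >= 0; j--)
                lcs[i, j] = a[i] == b[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        var result = new List<(DiffKind Kind, string Text)>();
        int x = 0, y = 0;
        while (x < a.Length && y < b.Length)
        {
            if (a[x] == b[y])
            {
                // Shared word: only reported when the caller opts in.
                if (includeEqual) result.Add((DiffKind.Equal, a[x]));
                x++;
                y++;
            }
            else if (lcs[x + 1, y] >= lcs[x, y + 1])
            {
                // Dropping the original word keeps the longest match: a deletion.
                result.Add((DiffKind.Delete, a[x++]));
            }
            else
            {
                // Otherwise the revised word is new: an insertion.
                result.Add((DiffKind.Insert, b[y++]));
            }
        }

        // Whatever remains on either side is a pure deletion or insertion.
        while (x < a.Length) result.Add((DiffKind.Delete, a[x++]));
        while (y < b.Length) result.Add((DiffKind.Insert, b[y++]));
        return result;
    }
}
```

For example, comparing "the quick brown fox" with "the slow brown fox jumps" yields a Delete for "quick" followed by Inserts for "slow" and "jumps". A production comparison also has to account for reading order and page geometry, which this sketch ignores.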

To produce the marked-up version, we can use the Document Filters Canvas object.
While enumerating the diffs, you may encounter pages that have no differences. In addition, as content is added or removed, the page indexes of the original and revised documents can get out of sync. Handling both cases requires a careful approach. Here's how we can manage this:
- Create Two Output Canvases: One for the original document and one for the revised document.
- Track the Current Page: Maintain a record of the current page being processed.
- Ensure Page Rendering: As we enumerate the diffs, ensure that all pages up to the "current page" have been rendered.
- Mark Up Diffs: Proceed to mark up the differences on the current page once rendering is confirmed.
This method ensures that the comparison view remains accurate and consistent, regardless of the number of differences or content shifts between the original and revised documents.
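private void ensurePage(int pageIndex, Extractor doc, Canvas canvas, ref int currentPage, bool isLeft)
{
    // Render every page that has not been rendered yet, up to and including pageIndex.
    while (currentPage < pageIndex)
    {
        using var page = doc.GetPage(++currentPage);
        canvas.RenderPage(page);
    }
    // Semi-transparent red for the original (left), green for the revised (right).
    canvas.SetBrush(isLeft ? 0x50ff0000 : 0x5000ff00, 1);
    canvas.SetPen(0, 0, 0);
}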
The ensurePage function takes the pageIndex of the page being rendered, the Extractor containing either the original or revised document, the destination Canvas, the page index of the last page rendered (i.e., the current page on the canvas), and a boolean indicating whether the document is the left document (i.e., the original).
The function renders all pages up to pageIndex and then sets the brush to be either red or green with slight transparency, depending on whether the document is the original.
// left and right are the Extractors for the original and revised documents,
// and compareResult is the result of left.Compare(right).
// FromRectF is a helper (not shown) assumed to convert hit.Bounds into the
// rectangle type that Canvas.Rect accepts.
using Canvas leftOutputCanvas = api.MakeOutputCanvas(LeftOutput, CanvasType.PDF);
using Canvas rightOutputCanvas = api.MakeOutputCanvas(RightOutput, CanvasType.PDF);
int currentLeft = -1, currentRight = -1;
while (compareResult.MoveNext())
{
    // Only Insert and Delete items are marked up.
    if (compareResult.Current.Type == DifferenceType.Equal || compareResult.Current.Type == DifferenceType.NextBatch)
        continue;
    foreach (var hit in compareResult.Current.Details)
    {
        if (compareResult.Current.Type == DifferenceType.Delete)
        {
            ensurePage(hit.PageIndex, left, leftOutputCanvas, ref currentLeft, true);
            leftOutputCanvas.Rect(FromRectF(hit.Bounds));
        }
        if (compareResult.Current.Type == DifferenceType.Insert)
        {
            ensurePage(hit.PageIndex, right, rightOutputCanvas, ref currentRight, false);
            rightOutputCanvas.Rect(FromRectF(hit.Bounds));
        }
    }
}
// Render any pages after the last difference. Page indexes are zero-based,
// so the last page of each document is PageCount - 1.
ensurePage(left.PageCount - 1, left, leftOutputCanvas, ref currentLeft, true);
ensurePage(right.PageCount - 1, right, rightOutputCanvas, ref currentRight, false);
Upon completion, you will have two PDFs. The LeftOutput will contain every page from the original document, marked up with text that does not exist in the revised document. Similarly, the RightOutput will contain every page from the revised document, marked up with text that does not exist in the original document.
If you are building a viewing component, you may also want to keep track of the hit.PageIndex properties to provide page synchronization as the user navigates between the differences.
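One way a viewer might use those page indexes is sketched below, under stated assumptions. The sketch does not call the SDK: Side, PageSyncSketch, and BuildNavigation are hypothetical names, and the input is assumed to be a list of (side, page index) pairs recorded while enumerating the hits. It pairs each difference with the most recently seen page on the other side, which is only an approximation of true alignment.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for illustration; not a Document Filters type.
public enum Side { Left, Right }

public static class PageSyncSketch
{
    // Turns the hits, in enumeration order, into a list of navigation stops.
    // Each stop is the pair of pages a side-by-side viewer could show together.
    public static List<(int LeftPage, int RightPage)> BuildNavigation(
        IEnumerable<(Side Side, int PageIndex)> hits)
    {
        var stops = new List<(int LeftPage, int RightPage)>();
        int left = 0, right = 0;

        foreach (var hit in hits)
        {
            // Advance the side the hit belongs to; the other side keeps its
            // last known page (an approximation, not an exact alignment).
            if (hit.Side == Side.Left) left = hit.PageIndex;
            else right = hit.PageIndex;

            // Skip consecutive duplicates so that several hits on the same
            // page pair produce a single stop.
            var stop = (left, right);
            if (stops.Count == 0 || stops[stops.Count - 1] != stop)
                stops.Add(stop);
        }
        return stops;
    }
}
```

Recording a pair for every Delete hit (Side.Left) and Insert hit (Side.Right) would give the viewer an ordered list of stops to step through with "next difference" and "previous difference" controls.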
Conclusion
The Document Comparison APIs in Hyland Document Filters 24.2 give developers and organizations a practical way to streamline their document management processes. With text content comparison, reading order analysis, and independence from file types, these APIs cover the main tasks involved in detecting and managing document changes. By integrating them into your applications, you can add document comparison to your existing workflows and help maintain the integrity and accuracy of your documents.