Using the Data Curation API
Prerequisites
The following information is required to call the Data Curation API:
- Token endpoint for your OAuth instance
- Client Id and Client Secret for authentication
- API endpoint for your Data Curation API instance
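Because all of these values are needed before the first call, a script can check for them up front. The following is a minimal sketch only: the environment variable names (`DC_TOKEN_ENDPOINT` and so on) and the `loadSettings` helper are hypothetical and are not defined by the Data Curation API.

```javascript
// Hypothetical environment variable names; rename them to suit your setup.
const REQUIRED_SETTINGS = {
  tokenEndpoint: "DC_TOKEN_ENDPOINT",
  clientId: "DC_CLIENT_ID",
  clientSecret: "DC_CLIENT_SECRET",
  apiEndpoint: "DC_API_ENDPOINT",
};

// Reads the prerequisite values from an env-like object and throws a
// single error that lists every variable that is missing or blank.
function loadSettings(env = process.env) {
  const settings = {};
  const missing = [];
  for (const [key, variable] of Object.entries(REQUIRED_SETTINGS)) {
    const value = env[variable];
    if (typeof value === "string" && value.trim() !== "") {
      settings[key] = value.trim();
    } else {
      missing.push(variable);
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing required settings: ${missing.join(", ")}`);
  }
  return settings;
}
```

Calling `loadSettings()` with no argument reads from `process.env`; passing a plain object instead makes the check easy to try out without touching the real environment.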
Using the API
You can call the Data Curation API using one of the following methods:
- Using a Node.js upload script (see Using the Upload Script).
- Sending HTTP requests directly (see Using an HTTP Request Tool).
Using the Upload Script
You must have Node.js installed on your workstation to run the upload script.
See Upload Script for more information on how to configure and use the upload script, including a sample upload.js script to help you get started.
Using an HTTP Request Tool
Once you have acquired an access token (see Authentication), you can call the Data Curation API to upload files to an AWS S3 bucket for your pipeline and download the results. For full reference information on the Data Curation API's endpoints, see Endpoints.
Use an HTTP request tool such as Bruno to simplify the process of making the sequential API calls.
To make calls to the Data Curation API:
- Open your preferred HTTP request tool.
- Enter a request similar to the following example to create presigned URLs for uploading files to an AWS S3 bucket and retrieving the results:
POST <base_url>/api/data-curation/presign HTTP/1.1
Host: knowledge-enrichment.ai.experience.hyland.com
Content-Type: application/json
Accept: application/json
Authorization: Bearer <access_token>
Content-Length: 138
where the placeholders represent the following:
Placeholder | Description
<base_url> | The URL of your host environment.
<access_token> | The access token you retrieved when you completed authentication.
To fine-tune your request, you can specify additional parameters in the request body. If you don't specify additional parameters, the following default values are used:
{
"normalization": {
"quotations": true,
"dashes": true
},
"chunking": true,
"chunk_size": 1000,
"embedding": true,
"json_schema": false
}
A response similar to the following example is displayed:
{
"job_id": "6e1bb8a0-2bc3-43a2-b3a6-e87975799c8d",
"put_url": "https://data-curation-api-dev-step-1-drop.s3.amazonaws.com/ABCXYZ",
"get_url": "https://data-curation-api-dev-step-3-results.s3.amazonaws.com/ABCXYZ"
}
For more information about presigned URLs, see the Amazon Documentation.
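If you script this step instead of using an HTTP request tool, the presign request could be sketched in Node.js roughly as follows. This is a sketch under assumptions, not the product's own client: the function names and the `overrides` parameter are hypothetical, the built-in `fetch` requires Node.js 18 or later, and sending the documented default values explicitly is assumed to behave the same as omitting them.

```javascript
// Default request-body values as listed in the documentation.
const DEFAULT_OPTIONS = {
  normalization: { quotations: true, dashes: true },
  chunking: true,
  chunk_size: 1000,
  embedding: true,
  json_schema: false,
};

// Builds the URL and fetch() options for the presign call.
// `overrides` (hypothetical) replaces individual default values.
function buildPresignRequest(baseUrl, accessToken, overrides = {}) {
  const body = {
    ...DEFAULT_OPTIONS,
    ...overrides,
    // Merge the nested object separately so a partial override keeps
    // the remaining normalization defaults.
    normalization: {
      ...DEFAULT_OPTIONS.normalization,
      ...(overrides.normalization || {}),
    },
  };
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/data-curation/presign`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Accept: "application/json",
        Authorization: `Bearer ${accessToken}`,
      },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request and returns the parsed response, which is expected
// to contain job_id, put_url, and get_url.
async function requestPresignedUrls(baseUrl, accessToken, overrides = {}) {
  const { url, init } = buildPresignRequest(baseUrl, accessToken, overrides);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Presign request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

For example, `requestPresignedUrls(baseUrl, accessToken, { chunk_size: 500 })` would change only `chunk_size` and keep the other defaults.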
- Create a PUT request to the put_url you just created to upload a file to the AWS S3 bucket.
When creating the request to upload a file, also note the following:
  - The content type header must be application/octet-stream.
  - Bruno does not support sending files directly through HTTP requests, but you can still send files using a script.
If you are uploading a file using a script, you can use the following example as a starting point:
// Bruno script: attach a local file as the raw request body.
const fs = require("fs");

// Path of the file to upload (replace with your own file).
const attachmentFilename = "C:\\path\\to\\file.pdf";
const attachment = fs.readFileSync(attachmentFilename);
const attachmentLength = attachment.length;

// req is the request object available to Bruno scripts.
req.setHeader("Content-Type", "application/octet-stream");
req.setHeader("Content-Length", attachmentLength);
req.setBody(attachment);
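Outside Bruno, the same upload could be sketched as plain Node.js. The helper names `buildUploadRequest` and `uploadFile` are hypothetical, and the built-in `fetch` assumes Node.js 18 or later. The sketch leaves the Content-Length header to `fetch`, which derives it from the buffer.

```javascript
const fs = require("fs");

// Builds the fetch() options for uploading a local file to put_url.
function buildUploadRequest(filePath) {
  const fileBytes = fs.readFileSync(filePath);
  return {
    method: "PUT",
    // The content type must be application/octet-stream.
    headers: { "Content-Type": "application/octet-stream" },
    body: fileBytes,
  };
}

// Uploads the file and throws if the bucket rejects the request.
async function uploadFile(putUrl, filePath) {
  const response = await fetch(putUrl, buildUploadRequest(filePath));
  if (!response.ok) {
    throw new Error(`Upload failed with HTTP ${response.status}`);
  }
}
```

A call such as `await uploadFile(presign.put_url, "C:\\path\\to\\file.pdf")` would then upload the file, where `presign` stands for the parsed response from the presign step.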
- To check the status of the file you uploaded, use the job_id from the earlier response to enter a request similar to the following example:
GET <base_url>/api/data-curation/status/<jobId> HTTP/1.1
Host: knowledge-enrichment.ai.experience.hyland.com
Accept: text/json
Authorization: Bearer <access_token>
A response similar to the following example is displayed:
{
"jobId": "6e1bb8a0-2bc3-43a2-b3a6-e87975799c8d",
"status": "Done"
}
Note: The initial status for a file is Wait For Upload, while the finished status is Done.
- Create a GET request to the get_url you created earlier to download the results from the AWS S3 bucket.
If the results are available, a JSON response with text from the file similar to the following example is displayed:
{
"markdown": {
"output": "Document Text",
"chunks_with_embeddings": [
{
"chunk": "Chunk Text",
"embeddings": [
-0.042955305427312851, 0.077558189630508423, 0.0026660626754164696
]
}
]
},
"json": {
"output": {
"type": "root",
"children": [
{
"type": "paragraph",
"children": [
{
"type": "text",
"value": "Document Text"
}
]
}
]
}
}
}
Note: The example response includes extra line breaks for readability.
If the file is not processed successfully, an error response similar to the following example is displayed:
{
"message": "Error: The file was not supported, corrupt, or blank."
}
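The last two steps (checking the job status and downloading the results) could be sketched in Node.js along the following lines. All function names and the `fetchImpl`, `intervalMs`, and `maxAttempts` options are hypothetical. The sketch only recognizes the Done status, because no failure status values are listed here. It also assumes that the presigned get_url needs no Authorization header and that an error payload contains only a `message` field, as in the example above.

```javascript
// Polls the status endpoint until the job reports "Done".
// fetchImpl, intervalMs, and maxAttempts are hypothetical options.
async function waitForJob(baseUrl, accessToken, jobId, options = {}) {
  const { fetchImpl = fetch, intervalMs = 2000, maxAttempts = 30 } = options;
  const root = baseUrl.replace(/\/+$/, "");
  const url = `${root}/api/data-curation/status/${encodeURIComponent(jobId)}`;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await fetchImpl(url, {
      headers: { Accept: "text/json", Authorization: `Bearer ${accessToken}` },
    });
    if (!response.ok) {
      throw new Error(`Status request failed with HTTP ${response.status}`);
    }
    const { status } = await response.json();
    if (status === "Done") {
      return;
    }
    // Wait before the next check, except after the final attempt.
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Job ${jobId} was not Done after ${maxAttempts} status checks`);
}

// Downloads and parses the results from get_url. No Authorization header
// is sent, on the assumption that the presigned URL authorizes itself.
async function downloadResults(getUrl, fetchImpl = fetch) {
  const response = await fetchImpl(getUrl);
  if (!response.ok) {
    throw new Error(`Download failed with HTTP ${response.status}`);
  }
  return JSON.parse(await response.text());
}

// Converts a results payload into { text, vector } pairs and throws when
// the payload looks like the error response (a message and nothing else).
function extractChunks(payload) {
  const isError =
    typeof payload.message === "string" && !payload.markdown && !payload.json;
  if (isError) {
    throw new Error(payload.message);
  }
  const chunks =
    (payload.markdown && payload.markdown.chunks_with_embeddings) || [];
  return chunks.map(({ chunk, embeddings }) => ({ text: chunk, vector: embeddings }));
}
```

A typical sequence would be `await waitForJob(baseUrl, accessToken, jobId)` followed by `extractChunks(await downloadResults(getUrl))`. The polling interval and attempt limit are arbitrary starting values and may need tuning for large files.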