
Using the Data Curation API

Prerequisites

The following information is required to call the Data Curation API:

  • Token endpoint for your OAuth instance
  • Client Id and Client Secret for authentication
  • API endpoint for your Data Curation API instance

Using the API

You can call the Data Curation API using one of the following methods:

Using the Upload Script

You must have Node.js installed on your workstation to run the upload script.

See Upload Script for more information on how to configure and use your upload script, including a sample upload.js script to help you get started.

Using an HTTP Request Tool

After you acquire an access token (see Authentication), you can make calls to the Data Curation API to upload files to an AWS S3 bucket for your pipeline and download the results. For full reference information on the Data Curation API's endpoints, see Endpoints.

tip

Use an HTTP request tool such as Bruno to simplify the process of making the sequential API calls.

To make calls to the Data Curation API:

  1. Open your preferred HTTP request tool.

  2. Enter a request similar to the following example to create presigned URLs for uploading files to an AWS S3 bucket and retrieving the results:

    POST <base_url>/api/data-curation/presign HTTP/1.1
    Host: knowledge-enrichment.ai.experience.hyland.com
    Content-Type: application/json
    Accept: application/json
    Authorization: Bearer <access_token>
    Content-Length: 138

    where the placeholders represent the following:

    Placeholder         Description
    <base_url>          The URL of your host environment.
    <access_token>      The access token you retrieved when you completed authentication.

    To fine-tune your request, you can specify additional parameters in the request body. If you don't specify additional parameters, the following default values are used:

    {
      "normalization": {
        "quotations": true,
        "dashes": true
      },
      "chunking": true,
      "chunk_size": 1000,
      "embedding": true,
      "json_schema": false
    }

    A response similar to the following example is displayed:

    {
      "job_id": "6e1bb8a0-2bc3-43a2-b3a6-e87975799c8d",
      "put_url": "https://data-curation-api-dev-step-1-drop.s3.amazonaws.com/ABCXYZ",
      "get_url": "https://data-curation-api-dev-step-3-results.s3.amazonaws.com/ABCXYZ"
    }

    For more information about presigned URLs, see the Amazon documentation.
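
    If you are scripting the workflow instead of using an HTTP request tool, the presign call can be made with a short script. The following is a minimal sketch, assuming Node.js 18 or later (for the built-in fetch) and placeholder baseUrl and accessToken values that you supply yourself; the request body overrides only one of the defaults shown above:

    // Minimal sketch: request presigned upload and download URLs from the presign endpoint.
    const baseUrl = "https://knowledge-enrichment.ai.experience.hyland.com";
    const accessToken = "<access_token>";

    async function createPresignedUrls() {
      const response = await fetch(`${baseUrl}/api/data-curation/presign`, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Accept: "application/json",
          Authorization: `Bearer ${accessToken}`,
        },
        // Override only the options you need; omitted options fall back to the defaults shown above.
        body: JSON.stringify({
          chunking: true,
          chunk_size: 500,
        }),
      });

      if (!response.ok) {
        throw new Error(`Presign request failed: ${response.status}`);
      }

      // The response contains job_id, put_url, and get_url.
      return response.json();
    }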

  3. Create a PUT request to the put_url you just created to upload a file to the AWS S3 bucket.

    When creating the request to upload a file, also note the following:

    • The Content-Type header must be set to application/octet-stream.
    • Bruno does not support sending files directly through HTTP requests, but you can still send files using a script.

    If you are uploading a file using a script, you can use the following example as a starting point:

    const fs = require("fs");

    const attachmentFilename = "C:\\path\\to\\file.pdf";
    const attachment = fs.readFileSync(attachmentFilename);
    const attachmentLength = attachment.length;

    req.setHeader("Content-Type", "application/octet-stream");
    req.setHeader("Content-Length", attachmentLength);

    req.setBody(attachment);
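
    Outside of Bruno, the same upload can be performed from a standalone script. The following is a minimal sketch, assuming Node.js 18 or later and a put_url value returned by the presign call:

    const fs = require("fs");

    // Upload a local file to the presigned put_url as a binary stream.
    async function uploadFile(putUrl, filePath) {
      const fileBuffer = fs.readFileSync(filePath);

      const response = await fetch(putUrl, {
        method: "PUT",
        // fetch calculates the content length from the buffer automatically.
        headers: { "Content-Type": "application/octet-stream" },
        body: fileBuffer,
      });

      if (!response.ok) {
        throw new Error(`Upload failed: ${response.status}`);
      }
    }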
  4. To check the status of the file you uploaded, enter a request similar to the following example, using the job_id from the earlier response:

    GET <base_url>/api/data-curation/status/<jobId> HTTP/1.1
    Host: knowledge-enrichment.ai.experience.hyland.com
    Accept: text/json
    Authorization: Bearer <access_token>

    A response similar to the following example is displayed:

    {
      "jobId": "6e1bb8a0-2bc3-43a2-b3a6-e87975799c8d",
      "status": "Done"
    }

    note

    The initial status for a file is Wait For Upload, while the finished status is Done.
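
    If you are scripting the workflow, you can poll this endpoint until the job reports Done. The following is a minimal sketch, assuming Node.js 18 or later and the same baseUrl and accessToken placeholders as the earlier examples:

    // Poll the status endpoint until the job is Done, giving up after maxAttempts checks.
    async function waitForJob(baseUrl, accessToken, jobId, maxAttempts = 30) {
      for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
        const response = await fetch(`${baseUrl}/api/data-curation/status/${jobId}`, {
          headers: { Authorization: `Bearer ${accessToken}` },
        });
        const { status } = await response.json();

        if (status === "Done") {
          return;
        }

        // Wait a few seconds before checking again.
        await new Promise((resolve) => setTimeout(resolve, 5000));
      }

      throw new Error(`Job ${jobId} did not finish in time`);
    }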

  5. Create a GET request to the get_url you created earlier to download the results from the AWS S3 bucket.

    If the results are available, a JSON response with text from the file similar to the following example is displayed:

    {
      "markdown": {
        "output": "Document Text",
        "chunks_with_embeddings": [
          {
            "chunk": "Chunk Text",
            "embeddings": [
              -0.042955305427312851, 0.077558189630508423, 0.0026660626754164696
            ]
          }
        ]
      },
      "json": {
        "output": {
          "type": "root",
          "children": [
            {
              "type": "paragraph",
              "children": [
                {
                  "type": "text",
                  "value": "Document Text"
                }
              ]
            }
          ]
        }
      }
    }

    note

    The example response includes extra line breaks for readability.

    If the file is not processed successfully, an error response similar to the following example is displayed:

    {
      "message": "Error: The file was not supported, corrupt, or blank."
    }