The ALE Learnset Manager provides RESTful web services for managing field extractors, batch stream sets, and configuration. It enables applications to train, learn, and extract data from documents using machine learning capabilities.
The ALE Server exposes its services through a REST API. Any application that can send and receive data over HTTP or HTTPS can consume it, including applications written in standard development languages such as Java, C++, or C#.
This documentation does not always link to the return types used by requests.
Where a type is mentioned in a description, see the Data Model section
for details on that type.
To connect to the ALE Server, append ALM/service to the server base URL. For example, the base address for communicating with the ALE Server looks similar to:
"http://{serverName}:{PortNumber}/ALM/service"
The ALE Server uses session-based authentication with role-based access control.
Most management endpoints (Configuration Service, LearnSet Manager Service, Scheduler Service)
require the admin role. The core Field Extractor and Batch Stream Set services
are accessible to all authenticated users.
The Field Extractor Service (/extractor) provides operations for creating, managing, and using field extractors.
Field extractors are used to train and extract structured data from documents.
GZIP compression is enabled on this service.
To create a new field extractor, use the method PUT /extractor.
Optional query parameters: id, useUTF8 (boolean),
usePositionalInformationForClassification (boolean), and persistent (boolean).
Returns the extractor ID as a string.
RequestCreateExtractor.Address = "http://{serverName}:{PortNumber}/ALM/service/extractor";
Response createResponse = client.SendPut(RequestCreateExtractor);
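
The request above can be sketched in Python using only the standard library. This is a minimal sketch, not the product's client library: the function name `build_create_extractor_request`, the host `ale.example.com`, and the ID `invoice-v1` are illustrative, and passing booleans as the strings `true`/`false` is an assumption about how the server parses query parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_create_extractor_request(base_url, extractor_id=None, use_utf8=None,
                                   use_positional=None, persistent=None):
    """Build (but do not send) a PUT /extractor request."""
    params = {}
    if extractor_id is not None:
        params["id"] = extractor_id
    # Assumption: boolean parameters are sent as "true" / "false".
    for name, value in (
        ("useUTF8", use_utf8),
        ("usePositionalInformationForClassification", use_positional),
        ("persistent", persistent),
    ):
        if value is not None:
            params[name] = "true" if value else "false"
    url = f"{base_url}/extractor"
    if params:
        url += "?" + urlencode(params)
    return Request(url, method="PUT")


request = build_create_extractor_request(
    "http://ale.example.com:8080/ALM/service",  # hypothetical server
    extractor_id="invoice-v1",
    use_utf8=True,
)
print(request.get_method(), request.full_url)
```

Sending the request, for example with `urllib.request.urlopen`, requires an authenticated session. Per the description above, the response body is the extractor ID as a string.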
To check if an extractor exists, use HEAD /extractor/{id}.
To delete an extractor, use DELETE /extractor/{id}.
To declare fields for a field extractor, use the method POST /extractor/{id}/fields
where {id} is the identifier of the field extractor.
A list of FieldDeclaration objects is required.
To retrieve declared fields, use GET /extractor/{id}/fields.
RequestDeclareFields.Address = "http://{serverName}:{PortNumber}/ALM/service/extractor/{id}/fields";
RequestDeclareFields.Content = FieldDeclarations.toJson();
Response declareResponse = client.SendPost(RequestDeclareFields);
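
As a rough illustration of the request body, the sketch below builds a list of FieldDeclaration objects as JSON. The property names and allowed field types come from the Data Model section. The helper names, the sample fields, and the `application/json` content type are assumptions made for this example.

```python
import json
from urllib.request import Request

# Field types listed for FieldDeclaration in the Data Model section.
ALLOWED_TYPES = {"int", "amount", "date", "string", "phrase"}


def field_declaration(field_id, name, field_type, required=False,
                      fmt=None, constant=None):
    """Return one FieldDeclaration as a plain dict."""
    if field_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported field type: {field_type!r}")
    declaration = {"fieldId": field_id, "name": name,
                   "type": field_type, "required": required}
    # format and constant are optional, so include them only when set.
    if fmt is not None:
        declaration["format"] = fmt
    if constant is not None:
        declaration["constant"] = constant
    return declaration


def build_declare_fields_request(base_url, extractor_id, declarations):
    """Build (but do not send) POST /extractor/{id}/fields with a JSON list body."""
    body = json.dumps(declarations).encode("utf-8")
    return Request(
        f"{base_url}/extractor/{extractor_id}/fields",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed media type
        method="POST",
    )


# Hypothetical fields for an invoice extractor.
declarations = [
    field_declaration(1, "InvoiceNumber", "string", required=True),
    field_declaration(2, "TotalAmount", "amount"),
]
request = build_declare_fields_request(
    "http://ale.example.com:8080/ALM/service", "invoice-v1", declarations)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```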
After declaring fields, the extractor can be trained with document data:
- GET /extractor/{id}/streamset/{streamSetId}/learn: trains from a batch stream set. Optional parameter relearnable (boolean). Returns the class count.
- POST /extractor/{id}/learn: trains from a list of DocumentAdapter objects. Returns the class count.
- POST /extractor/{id}/relearn: relearns a class from documents.
To locate a specific field value within a document, use POST /extractor/{id}/fieldtargets
with the query parameters fieldValue and fieldType and a
DocumentAdapter as the request body.
Returns a list of WordInfo groups.
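
The following sketch shows how the query parameters and the request body might be combined in a single POST. The function name and sample values are hypothetical, and the JSON content type is an assumption.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_field_targets_request(base_url, extractor_id, field_value,
                                field_type, document):
    """Build (but do not send) POST /extractor/{id}/fieldtargets."""
    # urlencode escapes characters such as spaces and commas in the value.
    query = urlencode({"fieldValue": field_value, "fieldType": field_type})
    path_id = quote(str(extractor_id), safe="")
    url = f"{base_url}/extractor/{path_id}/fieldtargets?{query}"
    return Request(
        url,
        data=json.dumps(document).encode("utf-8"),  # DocumentAdapter as JSON
        headers={"Content-Type": "application/json"},  # assumed media type
        method="POST",
    )


# Placeholder DocumentAdapter; a real one carries the word and page lists.
document = {"id": 1, "fileName": "invoice.tif", "words": [], "pages": []}
request = build_field_targets_request(
    "http://ale.example.com:8080/ALM/service",  # hypothetical server
    "invoice-v1", "INV 1001", "string", document)
print(request.get_method(), request.full_url)
```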
The trained extractor model can be downloaded and uploaded as a binary file:
- GET /extractor/{id}/file/extractor: downloads the extractor file.
- HEAD /extractor/{id}/file/extractor: checks if the extractor file exists.
- POST /extractor/{id}/file/extractor: uploads an extractor file (multipart attachment).
To extract data from a document, use the extraction endpoint with a trained field extractor.
The service returns ExtractedData objects with extracted field values and confidence scores.
- GET /extractor/{id}/streamset/{streamSetId}/extract/{docNum}
- POST /extractor/{id}/extract with a DocumentAdapter body.
- POST /extractor/{id}/extract with a .pos file attachment.

- GET /extractor/{id}/usePositionalInformation and PUT /extractor/{id}/usePositionalInformation
- GET /extractor/{id}/utf8 and PUT /extractor/{id}/utf8
The Batch Stream Set Service (/streamset) manages stream sets for batch processing of documents.
To create a new batch stream set, use the method PUT /streamset.
Optional query parameters: id and persistent (boolean). Returns the stream set ID.
RequestCreateStreamSet.Address = "http://{serverName}:{PortNumber}/ALM/service/streamset";
Response createResponse = client.SendPut(RequestCreateStreamSet);
To check if a stream set exists, use HEAD /streamset/{id}.
To delete a stream set, use DELETE /streamset/{id}.
Stream sets support uploading, downloading, and checking of PTB, CBM, and ZIP files:
- PTB: upload with POST /streamset/{id}/file/ptb, download with GET /streamset/{id}/file/ptb, check with HEAD /streamset/{id}/file/ptb
- CBM: upload with POST /streamset/{id}/file/cbm, download with GET /streamset/{id}/file/cbm, check with HEAD /streamset/{id}/file/cbm
- ZIP: download with GET /streamset/{id}/file/zip, check with HEAD /streamset/{id}/file/zip
To add documents to a stream set, use POST /streamset/{id}/document.
Accepts either a DocumentAdapter JSON body
or multipart .pos/.ival file attachments. Returns a boolean indicating success.
To retrieve field declarations for a stream set, use GET /streamset/{id}/fields.
To set field declarations, use POST /streamset/{id}/fields with a list of
FieldDeclaration objects.
GET /streamset/{id}/utf8 and PUT /streamset/{id}/utf8

The Configuration Service provides endpoints for managing server configuration, including authorized user management.
To retrieve the list of authorized users, use GET /config/authorizedUsers.
To update the list, use POST /config/authorizedUsers.
To check whether the authorized users list is editable, use GET /config/authorizedUsers/editable.
The LearnSet Manager Service (/learnset) manages training projects, document classes,
training documents, and field learning. All endpoints require the admin role unless otherwise noted.
Projects are the top-level containers for organizing document classes and training data.
- PUT /learnset/project: creates a new project with optional parameters name, useUTF8, usePositionalInformationForClassification, useClassification, threshold, distance. Returns the project ID.
- PUT /learnset/project/{projectId}
- GET /learnset/project: returns a list of all projects.
- GET /learnset/project/{projectId}: returns the project details.
- DELETE /learnset/project/{projectId}
- GET /learnset/project/{projectId}/isLearnable: checks if the project has enough data to learn.
- GET /learnset/project/{projectId}/learn: trains the project. Returns the class count.
- HEAD /learnset/project/{projectId}/updateStreamSet: creates or updates the batch stream set for the project.
- GET /learnset/project/{projectId}/numDocs: returns the total number of training documents.
- DELETE /learnset/project/{projectId}/doc/oldest/{count}: deletes the oldest training documents.
- GET /learnset/project/{projectId}/fields: returns the field declarations for the project.
- POST /learnset/project/{projectId}/fields: sets the field declarations.
- GET /learnset/project/{projectId}/fields/statistics: returns field value statistics.
- GET /learnset/project/{projectId}/classfields: returns class-specific field declarations.

Document classes organize training documents into categories within a project.
- PUT /learnset/project/{projectId}/class: creates a new document class. Returns the class ID.
- PUT /learnset/project/{projectId}/class/{classId}
- GET /learnset/project/{projectId}/class
- GET /learnset/project/{projectId}/class/{classId}
- DELETE /learnset/project/{projectId}/class/{classId}
- POST /learnset/project/{projectId}/class/{classId}/fields: sets visible fields for a class.
- GET /learnset/project/{projectId}/class/{classId}/fields

Training documents are used to teach the extractor to recognize and extract fields.
- POST /learnset/project/{projectId}/class/{classId}/doc/: creates a training document. Returns the document ID.
- GET /learnset/project/{projectId}/class/{classId}/doc: returns metadata for all training documents in a class.
- GET /learnset/project/{projectId}/class/{classId}/doc/{docId}: returns the full document adapter.
- GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/meta
- DELETE /learnset/project/{projectId}/class/{classId}/doc/{docId}
- PUT /learnset/project/{projectId}/class/{classId}/doc/{docId}/class: moves a document to a different class.
- DELETE /learnset/project/{projectId}/class/{classId}/doc/oldest/{count}
- GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/image: returns the document image (binary).
- POST /learnset/project/{projectId}/class/{classId}/doc/{docId}/image
- GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/pos: returns positional data.
- POST /learnset/project/{projectId}/class/{classId}/doc/{docId}/pos
- GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/fields: returns field values for a document.
- POST /learnset/project/{projectId}/class/{classId}/doc/{docId}/fields

These endpoints help verify field locations and validate training data quality.
- GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/locations/value: locates a field value in a document.
- GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/locations/fields
- GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/check/fieldtargets
- GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/check/extraction
- GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/check
- GET /learnset/project/{projectId}/check/fieldtargets
- GET /learnset/project/{projectId}/check/extraction
- GET /learnset/project/{projectId}/check

- POST /learnset/project/{projectId}/docs: uploads a training set from a ZIP file.
- GET /learnset/project/{projectId}/docs: downloads the training set as a ZIP file.
- POST /learnset/project/{projectId}/upload: starts an asynchronous upload. Returns the upload ID.
- GET /learnset/project/{projectId}/upload/{uploadId}
- DELETE /learnset/project/{projectId}/upload/{uploadId}

Test sets allow validating extractor accuracy against a separate set of documents.
- POST /learnset/project/{projectId}/testset: creates a test set from uploaded files.
- PUT /learnset/project/{projectId}/testset: creates a test set from a server path.
- POST /learnset/project/{projectId}/testset/{testSetId}
- HEAD /learnset/project/{projectId}/testset/{testSetId}
- DELETE /learnset/project/{projectId}/testset/{testSetId}
- GET /learnset/project/{projectId}/testset/{testSetId}
- GET /learnset/project/{projectId}/testset/{testSetId}/doc/{docId}/meta
- GET /learnset/project/{projectId}/testset/{testSetId}/doc/{docId}/image
- GET /learnset/project/{projectId}/testset/{testSetId}/doc/{docId}/pos
- GET /learnset/project/{projectId}/testset/{testSetId}/doc/{docId}/extract
- POST /learnset/project/{projectId}/class/{classId}/doc/{docId}/class: copies a test document into the training set.
Check if OCR processing is available on the server: GET /learnset/ocr/available.
This endpoint is accessible to all users.
The LearnSet Scheduler Service manages scheduled learning tasks for projects.
All endpoints require the admin role.
- POST /learnset/projects/learn-scheduler/start: activates a learning scheduler with a LearnsetSchedulerProperties body specifying timeInterval, timeUnit (MINUTE, HOUR, DAY), startTimeOfDay (HH:mm:ss), and activateScheduling (Y/N).
- DELETE /learnset/projects/learn-scheduler/{schedulerId}
- GET /learnset/projects/learn-scheduler/{schedulerName}
- GET /learnset/projects/learn-scheduler
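
A LearnsetSchedulerProperties body could be assembled and validated on the client as sketched below. The four property names and their value formats are taken from the description above. The function name is hypothetical, and serializing the body as JSON is an assumption.

```python
import json
import re

TIME_UNITS = {"MINUTE", "HOUR", "DAY"}


def scheduler_properties(time_interval, time_unit, start_time_of_day, activate):
    """Return a LearnsetSchedulerProperties body as a JSON string."""
    if time_unit not in TIME_UNITS:
        raise ValueError(f"timeUnit must be one of {sorted(TIME_UNITS)}")
    # startTimeOfDay uses a 24-hour HH:mm:ss format.
    if not re.fullmatch(r"([01]\d|2[0-3]):[0-5]\d:[0-5]\d", start_time_of_day):
        raise ValueError("startTimeOfDay must be HH:mm:ss")
    return json.dumps({
        "timeInterval": time_interval,
        "timeUnit": time_unit,
        "startTimeOfDay": start_time_of_day,
        "activateScheduling": "Y" if activate else "N",
    })


# Example: relearn every 6 hours, starting at 02:30:00.
print(scheduler_properties(6, "HOUR", "02:30:00", True))
```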
Requests that lack proper authentication receive a 403 response code.
Attempts to access a resource that does not exist generate a 404 response code.
All other errors usually produce a 5xx response code.
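
A client might translate these response codes into categories before deciding how to react. The sketch below is one way to do that. The function name and category labels are illustrative and not part of the API.

```python
def classify_status(status_code):
    """Map an HTTP status code to the error categories described above."""
    if 200 <= status_code < 300:
        return "success"
    if status_code == 403:
        return "not authorized"
    if status_code == 404:
        return "not found"
    if 500 <= status_code < 600:
        return "server error"
    # Any other code is not covered by the description above.
    return "unexpected"


for code in (200, 403, 404, 500):
    print(code, classify_status(code))
```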
DocumentAdapter represents a document with its words, pages, and field values.
| Property | Type | Description |
|---|---|---|
| id | int | Document ID |
| fileName | string | File name where document is located |
| companyFieldValue | string | Company field value (used for classification) |
| words | list of WordInfo | List of words with positioning info |
| pages | list of PageInfo | List of pages |
| fields | list of FieldInfo | List of fields (used for learning only) |

WordInfo represents a word in a document with its position.
| Property | Type | Description |
|---|---|---|
| pageNumber | int | Page number |
| word | string | The word text |
| boundingBox | BoundingBox | Position information |

PageInfo represents a page in a document.
| Property | Type | Description |
|---|---|---|
| rotationAngle | float | Rotation angle (0–360 degrees) |
| rotationOrigin | int | Rotation origin: 0=none, 1=upper-left, 2=upper-right, 3=lower-right, 4=lower-left, 5=center |
| boundingBox | BoundingBox | Page bounding box |

BoundingBox defines a rectangular area on a page.
| Property | Type | Description |
|---|---|---|
| left | int | Left coordinate |
| top | int | Top coordinate |
| right | int | Right coordinate |
| bottom | int | Bottom coordinate |
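
The DocumentAdapter, WordInfo, PageInfo, and BoundingBox structures nest inside each other. As a minimal sketch, assuming the JSON property names match the tables above, a document for extraction (word list and page list only) could be built as follows. The helper names, coordinates, and page numbering are invented for illustration.

```python
import json


def bounding_box(left, top, right, bottom):
    return {"left": left, "top": top, "right": right, "bottom": bottom}


def word_info(page_number, word, box):
    return {"pageNumber": page_number, "word": word, "boundingBox": box}


def page_info(box, rotation_angle=0.0, rotation_origin=0):
    # rotation_origin 0 means "none" according to the PageInfo table.
    return {"rotationAngle": rotation_angle,
            "rotationOrigin": rotation_origin,
            "boundingBox": box}


def document_adapter(doc_id, file_name, words, pages, fields=None):
    """Build a DocumentAdapter dict; fields are only needed for learning."""
    doc = {"id": doc_id, "fileName": file_name, "words": words, "pages": pages}
    if fields is not None:
        doc["fields"] = fields
    return doc


# Hypothetical one-page document with two words.
page = page_info(bounding_box(0, 0, 2480, 3508))
words = [
    word_info(1, "Invoice", bounding_box(200, 80, 380, 110)),
    word_info(1, "INV-1001", bounding_box(400, 80, 560, 110)),
]
document = document_adapter(1, "invoice.tif", words, [page])
print(json.dumps(document, indent=2))
```

For training, the same structure would additionally carry a `fields` list of FieldInfo objects.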

FieldDeclaration declares a field that the extractor should recognize and extract.
| Property | Type | Description |
|---|---|---|
| fieldId | int | Unique field identifier |
| name | string | Field name |
| type | string | Field type: int, amount, date, string, or phrase |
| format | string | Optional format specification |
| constant | string | Optional constant value |
| required | boolean | Whether the field is required |

FieldInfo represents a field value in a document, used for training.
| Property | Type | Description |
|---|---|---|
| fieldId | int | Field ID (references FieldDeclaration) |
| value | string | Field value |
| pageNumber | int | Page number where the field is located |
| location | BoundingBox | Field location on the page |

ExtractedData contains the extraction results for a single field, including candidate values sorted by confidence.
| Property | Type | Description |
|---|---|---|
| confidence | float | Overall confidence score (0–1) |
| fieldId | int | Field ID |
| identifier | string | Field name |
| data | list of DataCell | Sorted candidate values (best first) |

DataCell is a single extraction candidate with its value, position, and confidence.
| Property | Type | Description |
|---|---|---|
| content | string | Extracted field value |
| wordIndex | int | Word index in the document |
| page | int | Page number |
| boundingBox | BoundingBox | Position of the extracted value |
| confidence | float | Confidence score (0–1) |
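
Because the candidate list is sorted best first, a client can take the first DataCell of each ExtractedData entry as the proposed value. The sketch below assumes the response has already been parsed from JSON into Python dicts using the property names from the tables above. The sample data and the `min_confidence` threshold are invented for illustration.

```python
def best_candidates(extracted, min_confidence=0.0):
    """Return {identifier: (content, confidence)} using each field's top DataCell."""
    result = {}
    for field in extracted:
        cells = field.get("data") or []
        if not cells:
            continue  # no candidate was found for this field
        top = cells[0]  # the data list is sorted best first
        if top["confidence"] >= min_confidence:
            result[field["identifier"]] = (top["content"], top["confidence"])
    return result


# Hypothetical parsed response for two fields.
sample = [
    {"confidence": 0.93, "fieldId": 1, "identifier": "InvoiceNumber",
     "data": [
         {"content": "INV-1001", "wordIndex": 12, "page": 1, "confidence": 0.93,
          "boundingBox": {"left": 400, "top": 80, "right": 560, "bottom": 110}},
         {"content": "1001", "wordIndex": 40, "page": 1, "confidence": 0.31,
          "boundingBox": {"left": 90, "top": 900, "right": 150, "bottom": 930}},
     ]},
    {"confidence": 0.0, "fieldId": 2, "identifier": "TotalAmount", "data": []},
]

print(best_candidates(sample, min_confidence=0.5))
```

Fields below the threshold or without candidates are left out of the result, so the caller can route them to manual review.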

The current released version is ALE Learnset Manager 25.1.
For more details on the version history, refer to the product Release Notes: https://docs.hyland.com/r/Brainware/ALE-Learnset-Manager/25.1/ALE-Learnset-Manager-Release-Notes
The resources use a data model that is supported by a set of client-side libraries that are made available on the files and libraries page.
| name | path | methods | description |
|---|---|---|---|
| BatchStreamSetService | | | Creation and management of batch stream sets. |
| FieldExtractorService | | | Training of field extractors and extraction of fields. |
| LearnSetManagerService | | | |
| LearnsetNoTouchModeService | | | |
| LearnsetSchedulerService | | | |

| type | description |
|---|---|
| BoundingBox | Container to carry the positional information of word. |
| Candidate | Container to carry the candidate related properties for inserting into the staging table. |
| ClassFieldDeclaration | Container to carry the information of fields of the Document Class. |
| DataCell | Simple container to carry extracted data string for a single cell. Used by ExtractedData. |
| DocumentAdapter | Carrier for document information. When training an extractor, make sure to fill the word list, the page list and the field list. When extracting fields from a document you only need to fill the word list and page list. |
| DocumentClass | Information about a document class. |
| DocumentHeaderField | Container to carry the corrected header fieldName and header fieldValue as part of a request. |
| DocumentImportStatus | Container to carry the document import status information. |
| DocumentUploadStatus | Container to carry the upload status of the documents. It tracks the total number of documents, the number imported, and their status. |
| ExtendedClassFieldStatistics | Field statistics for a single class, covering how many values have been found at all for a field within that class and for how many of those values a target can be located. |
| ExtractedData | Contains the extraction result for a single field. There are usually multiple candidates which are provided as a list of DataCell. |
| FieldDeclaration | Declaration of a field that can be extracted. |
| FieldInfo | Container to carry field information. |
| FieldLocations | Container to carry the information of word locations respective to the field. |
| FieldStatistics | Basic field statistics, covering how many values exist for a given field in a project or class. |
| LearnsetDocumentAddRequest | Container to carry all the properties required to facilitate the request payload for adding documents from the staging tables to the learnset tables. |
| LearnsetSchedulerProperties | Container to carry the properties of a global scheduler. |
| PageInfo | Container to carry the page orientation and positional information. |
| Project | Container to carry the information of a Project created in the ALM application. |
| StagingDataAddRequest | Container to carry all the properties required to facilitate the request payload for adding document data to staging tables. |
| TmpALMDocument | Container to carry all the required properties for populating TMPALMDOCUMENT table's data. |
| TmpALMField | Container to carry all the required properties for populating TMPALMFIELDS table's data. |
| TrainingDocumentIncident | Description of a failed plausibility check on a training document. |
| TrainingDocumentMetaData | Metadata about a stored training document. |
| TrainingSetCheckResult | Result of a training set plausibility check, including found incidents and field statistics. |
| WordInfo | Container to carry word information. |