About ALE Learnset Manager

The ALE Learnset Manager provides RESTful web services for managing field extractors, batch stream sets, and configuration. It enables applications to train, learn, and extract data from documents using machine learning capabilities.

The ALE Server exposes its services through a REST API. Any application that can issue HTTP or HTTPS requests, for example one written in Java, C++, or C#, can use the API to send data to and receive data from the ALE Server.

This documentation does not always link to the types returned by a request. Where a type is only named in a description, see the Data Model Reference section for details on that type.

How-to

Connecting to the Server

To connect to the ALE Server, append ALM/service to the server base URL. For example, the base address for communicating with the ALE Server looks similar to:

   "http://{serverName}:{PortNumber}/ALM/service"

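As a minimal sketch of the URL construction described above, the following Python helper joins a server name, a port, and a resource path. The function name service_url and the host ale.example.com with port 8080 are placeholders chosen for illustration, not values defined by the ALE Server.

```python
from urllib.parse import quote


def service_url(server_name: str, port: int, resource: str = "", scheme: str = "http") -> str:
    """Build an ALM service URL such as http://host:port/ALM/service/extractor."""
    base = f"{scheme}://{server_name}:{port}/ALM/service"
    # Strip slashes so that "/extractor" and "extractor" give the same URL.
    resource = resource.strip("/")
    # Percent-encode unsafe characters but keep "/" as the path separator.
    return f"{base}/{quote(resource)}" if resource else base


print(service_url("ale.example.com", 8080))
print(service_url("ale.example.com", 8080, "/extractor/my extractor/fields"))
```
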
Authentication & Authorization

The ALE Server uses session-based authentication with role-based access control. Most management endpoints (Configuration Service, LearnSet Manager Service, Scheduler Service) require the admin role. The core Field Extractor and Batch Stream Set services are accessible to all authenticated users.
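
This section does not state how a session is established or tracked. The sketch below assumes, without confirmation from the source, that the session is carried in an HTTP cookie; under that assumption a cookie-aware opener from the Python standard library keeps the session across requests. The login step itself is deliberately left out because its endpoint is not documented here.

```python
import urllib.request
from http.cookiejar import CookieJar


def make_session_opener():
    """Return an opener that stores cookies and sends them on later requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_session_opener()
# Assumption: after a successful login sent through opener.open(...), the
# server sets a session cookie, which `jar` then replays automatically.
print(len(jar))  # number of cookies currently stored
```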

Field Extractor Service

The Field Extractor Service (/extractor) provides operations for creating, managing, and using field extractors. Field extractors are used to train and extract structured data from documents. GZIP compression is enabled on this service.

Creating a Field Extractor

To create a new field extractor, use the method PUT /extractor. Optional query parameters: id, useUTF8 (boolean), usePositionalInformationForClassification (boolean), and persistent (boolean). Returns the extractor ID as a string.

  RequestCreateExtractor.Address = "http://{serverName}:{PortNumber}/ALM/service/extractor";
  Response createResponse = client.SendPut(RequestCreateExtractor);
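
The same call can be sketched in Python with the standard library. The server address, the helper name create_extractor_request, and the encoding of booleans as "true"/"false" are assumptions made for illustration; only the path, the method, and the parameter names come from the description above. The request is built but not sent.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080/ALM/service"  # placeholder server address

ALLOWED = {"id", "useUTF8", "usePositionalInformationForClassification", "persistent"}


def create_extractor_request(**options):
    """Build (but do not send) the PUT /extractor request."""
    unknown = set(options) - ALLOWED
    if unknown:
        raise ValueError(f"unknown query parameters: {sorted(unknown)}")
    # Assumption: booleans are passed as the strings "true" and "false".
    params = {k: (str(v).lower() if isinstance(v, bool) else v) for k, v in options.items()}
    url = f"{BASE_URL}/extractor"
    if params:
        url += "?" + urlencode(params)
    return urllib.request.Request(url, method="PUT")


req = create_extractor_request(id="invoices", useUTF8=True)
print(req.get_method(), req.full_url)
# Sending is left to the caller, for example:
#   with urllib.request.urlopen(req) as resp:
#       extractor_id = resp.read().decode("utf-8")
```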

Checking and Deleting Extractors

To check if an extractor exists, use HEAD /extractor/{id}. To delete an extractor, use DELETE /extractor/{id}.
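
A possible way to wrap these two calls in Python is sketched below. It assumes that a HEAD request for an existing extractor returns a 2xx status, and it relies on the documented 404 code for a missing resource. The helper names and the server address are illustrative.

```python
import urllib.request

BASE_URL = "http://localhost:8080/ALM/service"  # placeholder server address


def extractor_request(extractor_id: str, method: str) -> urllib.request.Request:
    """Build (but do not send) a HEAD or DELETE request for /extractor/{id}."""
    if method not in ("HEAD", "DELETE"):
        raise ValueError("method must be HEAD or DELETE")
    return urllib.request.Request(f"{BASE_URL}/extractor/{extractor_id}", method=method)


def exists_from_status(status: int) -> bool:
    """Interpret the status code of a HEAD /extractor/{id} response."""
    if 200 <= status < 300:  # assumption: success means the extractor exists
        return True
    if status == 404:  # documented code for a resource that does not exist
        return False
    raise RuntimeError(f"unexpected status {status}")


print(extractor_request("invoices", "HEAD").get_method())
print(exists_from_status(200), exists_from_status(404))
```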

Declaring Fields

To declare fields for a field extractor, use the method POST /extractor/{id}/fields where {id} is the identifier of the field extractor. A list of FieldDeclaration objects is required. To retrieve declared fields, use GET /extractor/{id}/fields.

  RequestDeclareFields.Address = "http://{serverName}:{PortNumber}/ALM/service/extractor/{id}/fields";
  RequestDeclareFields.Content = FieldDeclarations.toJson();
  Response declareResponse = client.SendPost(RequestDeclareFields);
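
For illustration, the request body could be assembled in Python as shown below. The JSON keys mirror the FieldDeclaration properties listed in the Data Model Reference; that the wire format uses exactly these keys, and that the service expects the application/json content type, are assumptions. The field IDs and names are made up.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/ALM/service"  # placeholder server address


def declare_fields_request(extractor_id, declarations):
    """Build (but do not send) POST /extractor/{id}/fields with a JSON body."""
    body = json.dumps(declarations).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/extractor/{extractor_id}/fields",
        data=body,
        method="POST",
        # Assumption: the service accepts JSON with this content type.
        headers={"Content-Type": "application/json"},
    )


# Illustrative field declarations; the IDs and names are made up.
fields = [
    {"fieldId": 1, "name": "InvoiceNumber", "type": "string", "required": True},
    {"fieldId": 2, "name": "TotalAmount", "type": "amount", "required": False},
]
req = declare_fields_request("invoices", fields)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```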

Training and Learning

After declaring fields, the extractor can be trained with document data:

Learn from Batch Stream Set: GET /extractor/{id}/streamset/{streamSetId}/learn - trains from a batch stream set. Optional parameter relearnable (boolean). Returns the class count.
Learn from Documents: POST /extractor/{id}/learn - trains from a list of DocumentAdapter objects. Returns the class count.
Relearn: POST /extractor/{id}/relearn - relearns a class from documents.
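
The two learning variants listed above can be sketched as Python request builders. The helper names, the server address, the "true"/"false" encoding of relearnable, and the JSON content type are assumptions; the requests are only built, not sent.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080/ALM/service"  # placeholder server address


def learn_from_stream_set_request(extractor_id, stream_set_id, relearnable=None):
    """Build GET /extractor/{id}/streamset/{streamSetId}/learn."""
    url = f"{BASE_URL}/extractor/{extractor_id}/streamset/{stream_set_id}/learn"
    if relearnable is not None:
        # Assumption: the boolean is passed as "true" or "false".
        url += "?" + urlencode({"relearnable": str(relearnable).lower()})
    return urllib.request.Request(url, method="GET")


def learn_from_documents_request(extractor_id, documents):
    """Build POST /extractor/{id}/learn with a list of DocumentAdapter dicts."""
    return urllib.request.Request(
        f"{BASE_URL}/extractor/{extractor_id}/learn",
        data=json.dumps(documents).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},  # assumed content type
    )


print(learn_from_stream_set_request("invoices", "set1", relearnable=True).full_url)
print(learn_from_documents_request("invoices", []).get_method())
```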

Field Targeting

To locate a specific field value within a document, use POST /extractor/{id}/fieldtargets with query parameters fieldValue, fieldType, and a DocumentAdapter as the request body. Returns a list of WordInfo groups.

Extractor File Management

The trained extractor model can be downloaded and uploaded as a binary file:

Download: GET /extractor/{id}/file/extractor - downloads the extractor file.
Check: HEAD /extractor/{id}/file/extractor - checks if the extractor file exists.
Upload: POST /extractor/{id}/file/extractor - uploads an extractor file (multipart attachment).
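
The upload is described only as a multipart attachment. The sketch below shows one way to encode a single file as multipart/form-data with the Python standard library. The part name "file", the file name extractor.bin, and the choice of multipart/form-data rather than another multipart subtype are all assumptions that should be checked against the server.

```python
import uuid
import urllib.request

BASE_URL = "http://localhost:8080/ALM/service"  # placeholder server address


def encode_multipart(field_name: str, file_name: str, payload: bytes):
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload_extractor_request(extractor_id: str, payload: bytes, field_name: str = "file"):
    """Build (but do not send) POST /extractor/{id}/file/extractor."""
    body, content_type = encode_multipart(field_name, "extractor.bin", payload)
    return urllib.request.Request(
        f"{BASE_URL}/extractor/{extractor_id}/file/extractor",
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )


req = upload_extractor_request("invoices", b"\x00\x01binary model bytes")
print(req.get_method(), req.full_url, len(req.data))
```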

Extracting Data

To extract data from a document, use the extraction endpoint with a trained field extractor. The service returns ExtractedData objects with extracted field values and confidence scores.

Extract from Batch: GET /extractor/{id}/streamset/{streamSetId}/extract/{docNum}
Extract from Document (JSON): POST /extractor/{id}/extract with a DocumentAdapter body.
Extract from .pos File: POST /extractor/{id}/extract with a .pos file attachment.
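
To illustrate how a client might consume the result, the sketch below picks the best candidate per field from a list of ExtractedData objects. The sample response is made up and merely follows the ExtractedData and DataCell property tables in the Data Model Reference; that the service returns a JSON array with exactly these keys is an assumption.

```python
import json

# Made-up response body shaped after the ExtractedData and DataCell tables.
SAMPLE_RESPONSE = """
[
  {"confidence": 0.92, "fieldId": 1, "identifier": "InvoiceNumber",
   "data": [
     {"content": "INV-1001", "wordIndex": 12, "page": 1, "confidence": 0.92,
      "boundingBox": {"left": 100, "top": 40, "right": 180, "bottom": 55}},
     {"content": "1001", "wordIndex": 30, "page": 1, "confidence": 0.35,
      "boundingBox": {"left": 90, "top": 300, "right": 120, "bottom": 315}}
   ]},
  {"confidence": 0.20, "fieldId": 2, "identifier": "TotalAmount", "data": []}
]
"""


def best_values(extracted, min_confidence=0.5):
    """Map each field identifier to its best candidate above a confidence floor."""
    result = {}
    for field in extracted:
        candidates = field.get("data") or []
        if not candidates:
            continue  # nothing was found for this field
        best = candidates[0]  # candidates are documented as sorted best first
        if best.get("confidence", 0.0) >= min_confidence:
            result[field["identifier"]] = best["content"]
    return result


print(best_values(json.loads(SAMPLE_RESPONSE)))
```

The confidence floor of 0.5 is an arbitrary illustrative value; a suitable threshold depends on the documents and on how costly a wrong value is.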

Extractor Settings

Positional Information: GET /extractor/{id}/usePositionalInformation and PUT /extractor/{id}/usePositionalInformation
UTF-8 Mode: GET /extractor/{id}/utf8 and PUT /extractor/{id}/utf8

Batch Stream Set Service

The Batch Stream Set Service (/streamset) manages stream sets for batch processing of documents.

Creating and Managing Batch Stream Sets

To create a new batch stream set, use the method PUT /streamset. Optional query parameters: id and persistent (boolean). Returns the stream set ID.

  RequestCreateStreamSet.Address = "http://{serverName}:{PortNumber}/ALM/service/streamset";
  Response createResponse = client.SendPut(RequestCreateStreamSet);

To check if a stream set exists, use HEAD /streamset/{id}. To delete a stream set, use DELETE /streamset/{id}.

File Operations

Stream sets support uploading, downloading, and checking PTB and CBM files. ZIP files can be downloaded and checked, but not uploaded:

PTB Files: Upload with POST /streamset/{id}/file/ptb, download with GET /streamset/{id}/file/ptb, check with HEAD /streamset/{id}/file/ptb
CBM Files: Upload with POST /streamset/{id}/file/cbm, download with GET /streamset/{id}/file/cbm, check with HEAD /streamset/{id}/file/cbm
ZIP Files: Download with GET /streamset/{id}/file/zip, check with HEAD /streamset/{id}/file/zip
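
The method matrix above lends itself to a small guard that rejects unsupported combinations before a request is sent. The following Python sketch assumes a placeholder server address and passes any upload body through unchanged, since the upload encoding is not specified here.

```python
import urllib.request

BASE_URL = "http://localhost:8080/ALM/service"  # placeholder server address

# Methods per file kind, as listed above: ZIP files cannot be uploaded.
FILE_METHODS = {
    "ptb": {"GET", "HEAD", "POST"},
    "cbm": {"GET", "HEAD", "POST"},
    "zip": {"GET", "HEAD"},
}


def stream_set_file_request(stream_set_id, kind, method, data=None):
    """Build (but do not send) a request for /streamset/{id}/file/{kind}."""
    kind, method = kind.lower(), method.upper()
    if method not in FILE_METHODS.get(kind, set()):
        raise ValueError(f"{method} is not supported for {kind!r} files")
    url = f"{BASE_URL}/streamset/{stream_set_id}/file/{kind}"
    return urllib.request.Request(url, data=data, method=method)


print(stream_set_file_request("set1", "ptb", "head").get_method())
print(stream_set_file_request("set1", "ZIP", "GET").full_url)
```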

Adding Documents

To add documents to a stream set, use POST /streamset/{id}/document. Accepts either a DocumentAdapter JSON body or multipart .pos/.ival file attachments. Returns a boolean indicating success.
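
A DocumentAdapter JSON body for this call might be assembled as in the sketch below. The keys follow the DocumentAdapter, WordInfo, PageInfo, and BoundingBox tables in the Data Model Reference. Whether page numbers start at 0 or 1 and which unit the coordinates use are not stated here, so the sample values are purely illustrative.

```python
import json


def make_box(left, top, right, bottom):
    """BoundingBox as a plain dict."""
    return {"left": left, "top": top, "right": right, "bottom": bottom}


def make_word(text, page_number, box):
    """WordInfo as a plain dict."""
    return {"pageNumber": page_number, "word": text, "boundingBox": box}


def make_page(rotation_angle, rotation_origin, box):
    """PageInfo as a plain dict."""
    return {"rotationAngle": rotation_angle, "rotationOrigin": rotation_origin, "boundingBox": box}


def make_document(doc_id, file_name, words, pages, fields=None):
    """DocumentAdapter as a plain dict; `fields` is only needed for learning."""
    doc = {"id": doc_id, "fileName": file_name, "words": words, "pages": pages}
    if fields is not None:
        doc["fields"] = fields
    return doc


doc = make_document(
    1,
    "invoice-001.tif",  # made-up file name
    words=[make_word("Invoice", 1, make_box(10, 10, 80, 25))],
    pages=[make_page(0.0, 0, make_box(0, 0, 2480, 3508))],
)
print(json.dumps(doc, indent=2))
```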

Field Declarations

To retrieve field declarations for a stream set, use GET /streamset/{id}/fields. To set field declarations, use POST /streamset/{id}/fields with a list of FieldDeclaration objects.

Stream Set Settings

UTF-8 Mode: GET /streamset/{id}/utf8 and PUT /streamset/{id}/utf8

Configuration Service

The Configuration Service provides endpoints for managing server configuration, including authorized user management.

Authorized Users

To retrieve the list of authorized users, use GET /config/authorizedUsers. To update the list, use POST /config/authorizedUsers. To check whether the authorized users list is editable, use GET /config/authorizedUsers/editable.

LearnSet Manager Service

The LearnSet Manager Service (/learnset) manages training projects, document classes, training documents, and field learning. All endpoints require the admin role unless otherwise noted.

Project Management

Projects are the top-level containers for organizing document classes and training data.

Create Project: PUT /learnset/project - creates a new project with optional parameters: name, useUTF8, usePositionalInformationForClassification, useClassification, threshold, distance. Returns the project ID.
Update Project: PUT /learnset/project/{projectId}
List Projects: GET /learnset/project - returns a list of all projects.
Get Project: GET /learnset/project/{projectId} - returns the project details.
Delete Project: DELETE /learnset/project/{projectId}
Check Learnable: GET /learnset/project/{projectId}/isLearnable - checks if the project has enough data to learn.
Learn Project: GET /learnset/project/{projectId}/learn - trains the project. Returns the class count.
Update Stream Set: HEAD /learnset/project/{projectId}/updateStreamSet - creates or updates the batch stream set for the project.
Count Documents: GET /learnset/project/{projectId}/numDocs - returns the total number of training documents.
Delete Oldest Docs: DELETE /learnset/project/{projectId}/doc/oldest/{count} - deletes the oldest training documents.
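
A common sequence is to check whether a project is learnable before training it. The Python sketch below expresses that sequence against an injected call(method, path) function so it can run without a server. It assumes that isLearnable answers with the text "true" or "false" and that learn returns the class count as plain text; neither response format is confirmed by this document.

```python
def learn_if_ready(project_id, call):
    """Train the project if it is learnable; return the class count or None.

    `call(method, path)` must perform the HTTP request and return the decoded
    response body as text.
    """
    base = f"/learnset/project/{project_id}"
    # Assumption: the endpoint answers with "true" or "false".
    if str(call("GET", f"{base}/isLearnable")).strip().lower() != "true":
        return None
    # Assumption: the class count is returned as plain text.
    return int(call("GET", f"{base}/learn"))


def fake_call(method, path):
    """Stand-in for a real HTTP call so the sketch runs without a server."""
    return "true" if path.endswith("/isLearnable") else "4"


print(learn_if_ready("p1", fake_call))
```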

Project Fields

Get Fields: GET /learnset/project/{projectId}/fields - returns the field declarations for the project.
Set Fields: POST /learnset/project/{projectId}/fields - sets the field declarations.
Field Statistics: GET /learnset/project/{projectId}/fields/statistics - returns field value statistics.
Class Fields: GET /learnset/project/{projectId}/classfields - returns class-specific field declarations.

Document Class Management

Document classes organize training documents into categories within a project.

Create Class: PUT /learnset/project/{projectId}/class - creates a new document class. Returns the class ID.
Update Class: PUT /learnset/project/{projectId}/class/{classId}
List Classes: GET /learnset/project/{projectId}/class
Get Class: GET /learnset/project/{projectId}/class/{classId}
Delete Class: DELETE /learnset/project/{projectId}/class/{classId}
Set Class Fields: POST /learnset/project/{projectId}/class/{classId}/fields - sets visible fields for a class.
Get Class Fields: GET /learnset/project/{projectId}/class/{classId}/fields

Training Document Management

Training documents are used to teach the extractor to recognize and extract fields.

Add Document: POST /learnset/project/{projectId}/class/{classId}/doc/ - creates a training document. Returns the document ID.
List Documents: GET /learnset/project/{projectId}/class/{classId}/doc - returns metadata for all training documents in a class.
Get Document: GET /learnset/project/{projectId}/class/{classId}/doc/{docId} - returns the full document adapter.
Get Metadata: GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/meta
Delete Document: DELETE /learnset/project/{projectId}/class/{classId}/doc/{docId}
Move Document: PUT /learnset/project/{projectId}/class/{classId}/doc/{docId}/class - moves a document to a different class.
Delete Oldest: DELETE /learnset/project/{projectId}/class/{classId}/doc/oldest/{count}

Document Image & Position Data

Get Image: GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/image - returns the document image (binary).
Update Image: POST /learnset/project/{projectId}/class/{classId}/doc/{docId}/image
Get .pos File: GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/pos - returns positional data.
Update .pos File: POST /learnset/project/{projectId}/class/{classId}/doc/{docId}/pos

Document Field Values

Get Fields: GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/fields - returns field values for a document.
Set Fields: POST /learnset/project/{projectId}/class/{classId}/doc/{docId}/fields

Field Targeting & Validation

These endpoints help verify field locations and validate training data quality.

Locate Field Value: GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/locations/value - locates a field value in a document.
Get All Field Locations: GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/locations/fields
Check Field Targets (Doc): GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/check/fieldtargets
Check Extraction (Doc): GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/check/extraction
Check All (Doc): GET /learnset/project/{projectId}/class/{classId}/doc/{docId}/check
Check Field Targets (Project): GET /learnset/project/{projectId}/check/fieldtargets
Check Extraction (Project): GET /learnset/project/{projectId}/check/extraction
Check All (Project): GET /learnset/project/{projectId}/check
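
The check endpoints above follow one pattern: an optional document scope followed by /check and an optional check kind. A minimal Python sketch of that pattern, with a hypothetical helper name and made-up IDs, could look like this:

```python
# Suffix appended after /check for each kind of check listed above.
CHECK_KINDS = {"all": "", "fieldtargets": "/fieldtargets", "extraction": "/extraction"}


def check_path(project_id, kind="all", class_id=None, doc_id=None):
    """Return the resource path of a project-level or document-level check."""
    if kind not in CHECK_KINDS:
        raise ValueError(f"unknown check kind: {kind!r}")
    if (class_id is None) != (doc_id is None):
        raise ValueError("class_id and doc_id must be given together")
    path = f"/learnset/project/{project_id}"
    if doc_id is not None:
        # Document-level checks are nested below the class and the document.
        path += f"/class/{class_id}/doc/{doc_id}"
    return path + "/check" + CHECK_KINDS[kind]


print(check_path("p1"))
print(check_path("p1", "extraction", class_id="c1", doc_id="d7"))
```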

Training Set Import & Export

Upload Training Set: POST /learnset/project/{projectId}/docs - uploads a training set from a ZIP file.
Download Training Set: GET /learnset/project/{projectId}/docs - downloads the training set as a ZIP file.
Start Async Upload: POST /learnset/project/{projectId}/upload - starts an asynchronous upload. Returns the upload ID.
Check Upload Status: GET /learnset/project/{projectId}/upload/{uploadId}
Abort Upload: DELETE /learnset/project/{projectId}/upload/{uploadId}
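
An asynchronous upload is typically followed by polling the status endpoint until the import has finished. The sketch below shows a generic polling loop in Python. Because the properties of the returned status object are not listed in this document, both the status fetch and the completion test are supplied by the caller; the "done" key in the demo is invented.

```python
import time


def wait_for_upload(fetch_status, is_finished, interval=2.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until is_finished(status) is true and return the final status.

    fetch_status() would issue GET /learnset/project/{projectId}/upload/{uploadId}
    and return the decoded status object.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if is_finished(status):
            return status
        if clock() >= deadline:
            raise TimeoutError("upload did not finish in time")
        sleep(interval)


# Demo with canned statuses instead of a server; the "done" key is made up.
canned = iter([{"done": False}, {"done": False}, {"done": True}])
final = wait_for_upload(lambda: next(canned), lambda s: s["done"], sleep=lambda _: None)
print(final)
```

If the timeout is reached, a client may want to cancel the import with the abort endpoint listed above.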

Test Document Management

Test sets allow validating extractor accuracy against a separate set of documents.

Create Test Set (Files): POST /learnset/project/{projectId}/testset - creates a test set from uploaded files.
Create Test Set (Path): PUT /learnset/project/{projectId}/testset - creates a test set from a server path.
Add Documents: POST /learnset/project/{projectId}/testset/{testSetId}
Check Exists: HEAD /learnset/project/{projectId}/testset/{testSetId}
Delete Test Set: DELETE /learnset/project/{projectId}/testset/{testSetId}
List Test Documents: GET /learnset/project/{projectId}/testset/{testSetId}
Get Test Doc Metadata: GET /learnset/project/{projectId}/testset/{testSetId}/doc/{docId}/meta
Get Test Doc Image: GET /learnset/project/{projectId}/testset/{testSetId}/doc/{docId}/image
Get Test Doc .pos: GET /learnset/project/{projectId}/testset/{testSetId}/doc/{docId}/pos
Extract from Test Doc: GET /learnset/project/{projectId}/testset/{testSetId}/doc/{docId}/extract
Copy to Training Set: POST /learnset/project/{projectId}/testset/{testSetId}/doc/{docId}/class - copies a test document into the training set.

OCR Availability

Check if OCR processing is available on the server: GET /learnset/ocr/available. This endpoint is accessible to all users.

LearnSet Scheduler Service

The LearnSet Scheduler Service manages scheduled learning tasks for projects. All endpoints require the admin role.

Start Scheduler: POST /learnset/projects/learn-scheduler/start - activates a learning scheduler with a LearnsetSchedulerProperties body specifying timeInterval, timeUnit (MINUTE, HOUR, DAY), startTimeOfDay (HH:mm:ss), and activateScheduling (Y/N).
Delete Scheduler: DELETE /learnset/projects/learn-scheduler/{schedulerId}
Get Scheduler: GET /learnset/projects/learn-scheduler/{schedulerName}
List Schedulers: GET /learnset/projects/learn-scheduler
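
A LearnsetSchedulerProperties body for the start call could be built and validated as in the Python sketch below. Only the four properties named above are covered; the type may have further properties that are not listed in this document, and the requirement that timeInterval is a positive integer is an assumption.

```python
import json
import re

TIME_UNITS = {"MINUTE", "HOUR", "DAY"}
TIME_OF_DAY = re.compile(r"([01]\d|2[0-3]):[0-5]\d:[0-5]\d")  # HH:mm:ss, 24-hour


def scheduler_properties(time_interval, time_unit, start_time_of_day, activate):
    """Return a dict with the four documented scheduler properties."""
    if not isinstance(time_interval, int) or time_interval <= 0:
        raise ValueError("timeInterval must be a positive integer")  # assumed rule
    if time_unit not in TIME_UNITS:
        raise ValueError(f"timeUnit must be one of {sorted(TIME_UNITS)}")
    if not TIME_OF_DAY.fullmatch(start_time_of_day):
        raise ValueError("startTimeOfDay must have the form HH:mm:ss")
    return {
        "timeInterval": time_interval,
        "timeUnit": time_unit,
        "startTimeOfDay": start_time_of_day,
        "activateScheduling": "Y" if activate else "N",
    }


print(json.dumps(scheduler_properties(6, "HOUR", "02:30:00", True)))
```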

Error Codes

Unauthorized requests, that is, requests without proper authentication, receive a 403 response code.
Attempts to access a resource that does not exist generate a 404 response code.
All other errors usually produce a 5xx response code.
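
A client can translate these codes into a small set of outcomes. The category names in the following Python sketch are arbitrary labels chosen for illustration:

```python
def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse outcome based on the rules above."""
    if 200 <= status < 300:
        return "ok"
    if status == 403:
        return "unauthorized"
    if status == 404:
        return "not_found"
    if 500 <= status < 600:
        return "server_error"
    return "other"  # codes this section does not describe


for code in (200, 403, 404, 500):
    print(code, classify_status(code))
```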

Data Model Reference

DocumentAdapter

Represents a document with its words, pages, and field values.

Property	Type	Description
id	int	Document ID
fileName	string	File name where document is located
companyFieldValue	string	Company field value (used for classification)
words	list of WordInfo	List of words with positioning info
pages	list of PageInfo	List of pages
fields	list of FieldInfo	List of fields (used for learning only)

WordInfo

Represents a word in a document with its position.

Property	Type	Description
pageNumber	int	Page number
word	string	The word text
boundingBox	BoundingBox	Position information

PageInfo

Represents a page in a document.

Property	Type	Description
rotationAngle	float	Rotation angle (0 to 360 degrees)
rotationOrigin	int	Rotation origin: 0=none, 1=upper-left, 2=upper-right, 3=lower-right, 4=lower-left, 5=center
boundingBox	BoundingBox	Page bounding box
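
The numeric rotationOrigin codes are easier to read in client code when wrapped in an enumeration. The Python sketch below mirrors the values from the table; the class and function names are illustrative and not part of the service.

```python
from enum import IntEnum


class RotationOrigin(IntEnum):
    """Rotation origin codes as listed in the PageInfo table."""
    NONE = 0
    UPPER_LEFT = 1
    UPPER_RIGHT = 2
    LOWER_RIGHT = 3
    LOWER_LEFT = 4
    CENTER = 5


def page_info(rotation_angle, origin, bounding_box):
    """Build a PageInfo dict after validating the angle and the origin code."""
    if not 0 <= rotation_angle <= 360:
        raise ValueError("rotationAngle must be between 0 and 360 degrees")
    return {
        "rotationAngle": float(rotation_angle),
        "rotationOrigin": int(RotationOrigin(origin)),  # raises on unknown codes
        "boundingBox": bounding_box,
    }


box = {"left": 0, "top": 0, "right": 2480, "bottom": 3508}  # made-up page size
print(page_info(90, RotationOrigin.CENTER, box))
```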

BoundingBox

Defines a rectangular area on a page.

Property	Type	Description
left	int	Left coordinate
top	int	Top coordinate
right	int	Right coordinate
bottom	int	Bottom coordinate

FieldDeclaration

Declares a field that the extractor should recognize and extract.

Property	Type	Description
fieldId	int	Unique field identifier
name	string	Field name
type	string	Field type: `int`, `amount`, `date`, `string`, or `phrase`
format	string	Optional format specification
constant	string	Optional constant value
required	boolean	Whether the field is required

FieldInfo

Represents a field value in a document, used for training.

Property	Type	Description
fieldId	int	Field ID (references FieldDeclaration)
value	string	Field value
pageNumber	int	Page number where the field is located
location	BoundingBox	Field location on the page

ExtractedData

Contains the extraction results for a single field, including candidate values sorted by confidence.

Property	Type	Description
confidence	float	Overall confidence score (0 to 1)
fieldId	int	Field ID
identifier	string	Field name
data	list of DataCell	Sorted candidate values (best first)

DataCell

A single extraction candidate with its value, position, and confidence.

Property	Type	Description
content	string	Extracted field value
wordIndex	int	Word index in the document
page	int	Page number
boundingBox	BoundingBox	Position of the extracted value
confidence	float	Confidence score (0 to 1)

Changelog

The current released version is ALE Learnset Manager 25.1.

For more details on the version history, please refer to the product Release Notes - https://docs.hyland.com/r/Brainware/ALE-Learnset-Manager/25.1/ALE-Learnset-Manager-Release-Notes

The resources use a data model that is supported by a set of client-side libraries that are made available on the files and libraries page.

name	path	methods	description
BatchStreamSetService	`/streamset`	`PUT`	Creation and management of batch stream sets.
	`/streamset/{id}`	`DELETE HEAD`
	`/streamset/{id}/document`	`POST`
	`/streamset/{id}/fields`	`GET POST`
	`/streamset/{id}/utf8`	`GET PUT`
	`/streamset/{id}/file/cbm`	`GET HEAD POST`
	`/streamset/{id}/file/ptb`	`GET HEAD POST`
	`/streamset/{id}/file/zip`	`GET HEAD`
FieldExtractorService	`/extractor`	`PUT`	Training of field extractors and extraction of fields.
	`/extractor/{id}`	`DELETE HEAD`
	`/extractor/{id}/extract`	`POST`
	`/extractor/{id}/fields`	`GET POST`
	`/extractor/{id}/fieldtargets`	`POST`
	`/extractor/{id}/learn`	`POST`
	`/extractor/{id}/relearn`	`POST`
	`/extractor/{id}/usePositionalInformation`	`GET PUT`
	`/extractor/{id}/utf8`	`GET PUT`
	`/extractor/{id}/file/extractor`	`GET HEAD POST`
	`/extractor/{id}/streamset/{streamSetId}/learn`	`GET`
	`/extractor/{id}/streamset/{streamSetId}/extract/{docNum}`	`GET`
LearnSetManagerService	`/learnset/project`	`GET PUT`
	`/learnset/ocr/available`	`GET`
	`/learnset/project/{projectId}`	`DELETE GET PUT`
	`/learnset/project/{projectId}/check`	`GET`
	`/learnset/project/{projectId}/class`	`GET PUT`
	`/learnset/project/{projectId}/classfields`	`GET`
	`/learnset/project/{projectId}/docs`	`GET POST`
	`/learnset/project/{projectId}/fields`	`GET POST`
	`/learnset/project/{projectId}/isLearnable`	`GET`
	`/learnset/project/{projectId}/learn`	`GET`
	`/learnset/project/{projectId}/numDocs`	`GET`
	`/learnset/project/{projectId}/testset`	`POST PUT`
	`/learnset/project/{projectId}/updateStreamSet`	`HEAD`
	`/learnset/project/{projectId}/upload`	`POST`
	`/learnset/project/{projectId}/check/extraction`	`GET`
	`/learnset/project/{projectId}/check/fieldtargets`	`GET`
	`/learnset/project/{projectId}/class/{classId}`	`DELETE GET PUT`
	`/learnset/project/{projectId}/fields/statistics`	`GET`
	`/learnset/project/{projectId}/testset/{testSetId}`	`DELETE GET HEAD POST`
	`/learnset/project/{projectId}/upload/{uploadId}`	`DELETE GET`
	`/learnset/project/{projectId}/class/{classId}/doc`	`GET POST`
	`/learnset/project/{projectId}/class/{classId}/fields`	`GET POST`
	`/learnset/project/{projectId}/doc/oldest/{noOfDocToBeDeleted}`	`DELETE`
	`/learnset/project/{projectId}/class/{classId}/doc/{docId}`	`DELETE GET`
	`/learnset/project/{projectId}/class/{classId}/doc/oldest/{noOfDocToBeDeleted}`	`DELETE`
	`/learnset/project/{projectId}/class/{classId}/doc/{docId}/check`	`GET`
	`/learnset/project/{projectId}/class/{classId}/doc/{docId}/class`	`PUT`
	`/learnset/project/{projectId}/class/{classId}/doc/{docId}/fields`	`GET POST`
	`/learnset/project/{projectId}/class/{classId}/doc/{docId}/image`	`GET POST`
	`/learnset/project/{projectId}/class/{classId}/doc/{docId}/meta`	`GET`
	`/learnset/project/{projectId}/class/{classId}/doc/{docId}/pageCnt`	`GET`
	`/learnset/project/{projectId}/class/{classId}/doc/{docId}/pos`	`GET POST`
	`/learnset/project/{projectId}/testset/{testSetId}/doc/{docId}/class`	`POST`
	`/learnset/project/{projectId}/testset/{testSetId}/doc/{docId}/extract`	`GET`
	`/learnset/project/{projectId}/testset/{testSetId}/doc/{docId}/image`	`GET`
	`/learnset/project/{projectId}/testset/{testSetId}/doc/{docId}/meta`	`GET`
	`/learnset/project/{projectId}/testset/{testSetId}/doc/{docId}/pos`	`GET`
	`/learnset/project/{projectId}/class/{classId}/doc/{docId}/check/extraction`	`GET`
	`/learnset/project/{projectId}/class/{classId}/doc/{docId}/check/fieldtargets`	`GET`
	`/learnset/project/{projectId}/class/{classId}/doc/{docId}/locations/fields`	`GET`
	`/learnset/project/{projectId}/class/{classId}/doc/{docId}/locations/value`	`GET`
	`/learnset/project/{projectId}/testset/{testSetId}/doc/{docId}/locations/value`	`GET`
	`/learnset/project/{projectId}/testset/{testSetId}/doc/{docId}/extract/forceLearning/{learn}`	`GET`
LearnsetNoTouchModeService	`/no-touch-mode/learnset/add-staging-data`	`POST`
	`/no-touch-mode/learnset/project/doc`	`POST`
LearnsetSchedulerService	`/learnset/projects/learn-scheduler`	`GET`
	`/learnset/projects/learn-scheduler/start`	`POST`
	`/learnset/projects/learn-scheduler/{schedulerId}`	`DELETE`
	`/learnset/projects/learn-scheduler/{schedulerName}`	`GET`

JSON

type	description
BoundingBox	Container to carry the positional information of a word.
Candidate	Container to carry the candidate related properties for inserting into the staging table.
ClassFieldDeclaration	Container to carry the information of fields of the Document Class.
DataCell	Simple container to carry extracted data string for a single cell. Used by ExtractedData.
DocumentAdapter	Carrier for document information. When training an extractor, make sure to fill the word list, the page list and the field list. When extracting fields from a document you only need to fill the word list and page list.
DocumentClass	Information about a document class.
DocumentHeaderField	Container to carry the corrected header fieldName and header fieldValue as part of a request.
DocumentImportStatus	Container to carry the document import status information.
DocumentUploadStatus	Container to carry the upload status information of the documents. It tracks the total number of documents, the number of imported documents, and their status.
ExtendedClassFieldStatistics	Field statistics for a single class, covering how many values have been found at all for a field within that class and for how many of those values a target can be located.
ExtractedData	Contains the extraction result for a single field. There are usually multiple candidates which are provided as a list of DataCell.
FieldDeclaration	Declaration of a field that can be extracted.
FieldInfo	Container to carry field information.
FieldLocations	Container to carry the information of word locations respective to the field.
FieldStatistics	Basic field statistics, covering how many values exist for a given field in a project or class.
LearnsetDocumentAddRequest	Container to carry all the properties required for the request payload that adds documents from the staging tables to the learnset tables.
LearnsetSchedulerProperties	Container to carry the properties of a global scheduler.
PageInfo	Container to carry the page orientation and positional information.
Project	Container to carry the information of a Project created in the ALM application.
StagingDataAddRequest	Container to carry all the properties required for the request payload that adds document data to the staging tables.
TmpALMDocument	Container to carry all the required properties for populating TMPALMDOCUMENT table's data.
TmpALMField	Container to carry all the required properties for populating TMPALMFIELDS table's data.
TrainingDocumentIncident	Description of a failed plausibility check on a training document.
TrainingDocumentMetaData	Metadata about a stored training document.
TrainingSetCheckResult	Result of a training set plausibility check, including found incidents and field statistics.
WordInfo	Container to carry word information.