Processing Options
All pipeline behavior can be customized by supplying a JSON options object in the body of the /presign request. Every field is optional; omitting a field falls back to the environment-level default.
POST /presign
Content-Type: application/json
Authorization: Bearer <token>
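As a minimal sketch of how a client might assemble this request, the Python snippet below builds the POST with the standard library. The base URL `https://api.example.com` and the `build_presign_request` helper are hypothetical placeholders, not part of the documented API; substitute your environment's host and token.

```python
import json
import urllib.request

# Hypothetical values: replace with your environment's host and bearer token.
BASE_URL = "https://api.example.com"
TOKEN = "<token>"


def build_presign_request(options: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /presign request carrying the options object."""
    body = json.dumps(options).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/presign",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {TOKEN}",
        },
    )


request = build_presign_request({"chunking": True, "chunk_size": 1500})
# To actually send it: urllib.request.urlopen(request)
```

Passing an empty dict sends `{}`, which leaves every option at its default.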
Reference
The following sections detail the processing options of the /presign request body.
normalization
Type: object | Default: environment-configured
Controls Unicode normalization applied to extracted text before further processing. Normalization replaces visually similar characters with their canonical ASCII equivalents, which improves downstream chunking, search, and embedding quality.
| Field | Type | Default | Description |
|---|---|---|---|
| quotations | boolean | true | Replaces "smart" (curly) quotation marks and apostrophes with their straight, ASCII equivalents (" and '). |
| dashes | boolean | true | Replaces en-dashes (–) and em-dashes (—) with a standard ASCII hyphen-minus (-). |
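To illustrate the effect of the two flags, here is a rough Python sketch of an equivalent client-side transformation. The character sets in `QUOTE_MAP` and `DASH_MAP` are assumptions for illustration; the service's actual mapping may cover more characters.

```python
# Assumed character sets; the pipeline's real mapping is not documented here.
QUOTE_MAP = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}
DASH_MAP = {
    "\u2013": "-",  # en-dash
    "\u2014": "-",  # em-dash
}


def normalize(text: str, quotations: bool = True, dashes: bool = True) -> str:
    """Apply only the replacements whose flag is enabled."""
    table = {}
    if quotations:
        table.update(QUOTE_MAP)
    if dashes:
        table.update(DASH_MAP)
    return text.translate(str.maketrans(table))
```

With both flags enabled, `normalize("\u201cHi\u201d \u2013 it\u2019s")` yields `"Hi" - it's`; with `dashes=False` the en-dash is left untouched.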
chunking
Type: boolean | Default: false
Enables or disables the chunking stage of the pipeline. When false, the extracted text is returned as a single block and the embedding stage is skipped regardless of the embedding flag.
chunking_strategy
Type: string | Default: context
The algorithm used to split text into chunks when chunking is true.
| Value | Description |
|---|---|
| context | Text-aware chunking that respects sentence and paragraph boundaries, producing semantically coherent chunks. |
| fixed | Fixed-size chunking that splits text into uniform-sized chunks. Use when consistent chunk sizes are required. |
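The difference between the two strategies can be sketched in Python as follows. This is an illustration only: `chunk_fixed` and `chunk_context` are hypothetical helpers, and the paragraph-packing approach is an assumed simplification of text-aware chunking, not the service's actual algorithm.

```python
def chunk_fixed(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive slices of chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def chunk_context(text: str, chunk_size: int) -> list[str]:
    """Greedily pack whole paragraphs into chunks of up to chunk_size characters.

    A paragraph longer than chunk_size is kept whole rather than split.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

In this sketch, fixed chunking may cut through the middle of a paragraph, while context chunking only breaks at paragraph boundaries, at the cost of uneven chunk lengths.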
chunk_size
Type: integer | Default: model-configured
Target character count for each chunk when chunking is true. Must be a positive integer no greater than the selected embedding model's maximum chunk size. Values outside this range or non-integer values fall back to the model's configured default chunk size.
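The fallback rule above can be expressed as a small Python sketch. The `model_max` and `model_default` parameters are hypothetical stand-ins for the selected embedding model's limits, which this reference does not list.

```python
def resolve_chunk_size(value, model_max: int, model_default: int) -> int:
    """Return value if it is a positive integer no greater than model_max,
    otherwise fall back to model_default."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_integer = isinstance(value, int) and not isinstance(value, bool)
    if is_integer and 0 < value <= model_max:
        return value
    return model_default
```

For example, with an assumed maximum of 2048 and default of 1000, a requested size of 1500 is kept, while 0, 5000, 1500.5, or "1500" all resolve to 1000.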
embedding
Type: boolean | Default: false
Enables or disables the embedding stage of the pipeline. Requires chunking to also be true; if chunking is disabled, no embeddings are generated.
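The dependency between the chunking and embedding flags can be sketched in Python like this. The `effective_stages` helper is hypothetical, and it assumes the documented default of false for both flags rather than any environment-specific override.

```python
def effective_stages(options: dict) -> dict:
    """Report which stages would run for a given options object."""
    chunking = bool(options.get("chunking", False))
    # Embedding only runs when chunking is also enabled.
    embedding = bool(options.get("embedding", False)) and chunking
    return {"chunking": chunking, "embedding": embedding}
```

Under these assumptions, `{"embedding": True}` on its own produces no embeddings, because chunking stays disabled.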
embeddings_model
Type: string | Default: environment default
Identifier of the embedding model to use. Must be one of the models available in the environment's configured allow-list. When omitted, the environment's default model is used (for example, cohere.embed-multilingual-v3).
json_schema
Type: string or false | Default: false
Controls whether a structured JSON representation of the document is included in the pipeline output, and which schema variant to use. Set to false or omit the field to exclude JSON output entirely.
| Value | Description |
|---|---|
| false | No JSON output. |
| MDAST | Markdown Abstract Syntax Tree: a structured representation of the document's Markdown content following the MDAST specification. |
| FULL | Full document JSON including all extracted metadata, structural elements, and content. |
| PIPELINE | Internal pipeline representation intended for debugging and integration testing. Includes intermediate processing artifacts. |
pii
Type: object or false | Default: false
Controls Personally Identifiable Information (PII) processing. Set to false or omit the field to skip PII processing entirely.
When PII processing is enabled, supply an object with the following fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| mode | string | Yes | n/a | Sets the processing mode (see below). |
| entity_redaction | boolean | No | false | Controls whether named entities (such as people, organisations, or locations) are also redacted. Takes effect only when mode is redaction. |
mode values:
| Value | Description |
|---|---|
| detection | Identifies and annotates PII entities in the output without modifying the source text. |
| redaction | Replaces detected PII with placeholder tokens, removing sensitive data from the pipeline output. |
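A client may want to check a pii value before sending it. The Python sketch below encodes the constraints described above. The `validate_pii` helper and its messages are hypothetical, and how the service itself responds to an invalid pii object is not specified here.

```python
def validate_pii(pii) -> list[str]:
    """Return a list of problems with a pii option value; an empty list means none found."""
    if pii is False or pii is None:
        return []  # PII processing is skipped entirely
    if not isinstance(pii, dict):
        return ["pii must be false or an object"]
    problems = []
    mode = pii.get("mode")
    if mode not in ("detection", "redaction"):
        problems.append("mode is required and must be 'detection' or 'redaction'")
    if pii.get("entity_redaction", False) and mode != "redaction":
        problems.append("entity_redaction requires mode 'redaction'")
    return problems
```

For instance, `{"mode": "detection", "entity_redaction": True}` is flagged by this check, since entity redaction only applies in redaction mode.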
Examples
The following examples show common /presign request bodies.
All defaults (minimal request)
{}
All options explicitly set
{
  "normalization": {
    "quotations": true,
    "dashes": true
  },
  "chunking": true,
  "chunking_strategy": "context",
  "chunk_size": 2000,
  "embedding": true,
  "embeddings_model": "cohere.embed-multilingual-v3",
  "json_schema": "PIPELINE",
  "pii": {
    "mode": "redaction",
    "entity_redaction": false
  }
}
PII detection only
{
  "normalization": {
    "quotations": true,
    "dashes": true
  },
  "chunking": true,
  "chunk_size": 1500,
  "embedding": true,
  "json_schema": false,
  "pii": {
    "mode": "detection"
  }
}
Chunking only (no embeddings)
{
  "normalization": {
    "quotations": true,
    "dashes": true
  },
  "chunking": true,
  "chunking_strategy": "fixed",
  "chunk_size": 1000,
  "embedding": false,
  "json_schema": "MDAST",
  "pii": false
}