Processing Options

All pipeline behavior can be customized by supplying a JSON options object in the body of the /presign request. Every field is optional; omitting a field falls back to the environment-level default.

POST /presign
Content-Type: application/json
Authorization: Bearer <token>

Reference

The following sections detail the processing options of the /presign request body.

`normalization`

Type: object | Default: environment-configured

Controls Unicode normalization applied to extracted text before further processing. Normalization replaces visually similar characters with their canonical ASCII equivalents, which improves downstream chunking, search, and embedding quality.

Field	Type	Default	Description
`quotations`	boolean	`true`	Replaces "smart" (curly) quotation marks and apostrophes with their straight, ASCII equivalents (`"` and `'`).
`dashes`	boolean	`true`	Replaces en-dashes (`–`) and em-dashes (`—`) with a standard ASCII hyphen-minus (`-`).

`chunking`

Type: boolean | Default: false

Enables or disables the chunking stage of the pipeline. When false, the extracted text is returned as a single block and the embedding stage is skipped regardless of the embedding flag.

`chunking_strategy`

Type: string | Default: context

The algorithm used to split text into chunks when chunking is true.

Value	Description
`context`	Text-aware chunking that respects sentence and paragraph boundaries, producing semantically coherent chunks.
`fixed`	Fixed-size chunking that splits text into uniform-sized chunks. Use when consistent chunk sizes are required.

`chunk_size`

Type: integer

Target character count for each chunk when chunking is true. Must be a positive integer no greater than the selected embedding model's maximum chunk size. Values outside this range or non-integer values fall back to the model's configured default chunk size.

`embedding`

Type: boolean | Default: false

Enables or disables the embedding stage of the pipeline. Requires chunking to also be true; if chunking is disabled, no embeddings are generated.

`embeddings_model`

Type: string | Default: environment default

Identifier of the embedding model to use. Must be one of the models available in the environment's configured allow-list. When omitted, the environment's default model is used (for example, cohere.embed-multilingual-v3).

`json_schema`

Type: string | Default: false

Controls whether a structured JSON representation of the document is included in the pipeline output, and which schema variant to use. Set to false or omit the field to exclude JSON output entirely.

Value	Description
`false`	No JSON output.
`MDAST`	Markdown Abstract Syntax Tree — a structured representation of the document's Markdown content following the MDAST specification.
`FULL`	Full document JSON including all extracted metadata, structural elements, and content.
`PIPELINE`	Internal pipeline representation intended for debugging and integration testing. Includes intermediate processing artifacts.

`pii`

Type: object | Default: false

Controls Personally Identifiable Information (PII) processing. Set to false or omit the field to skip PII processing entirely.

When PII processing is enabled, supply an object with the following fields:

Field	Type	Required	Default	Description
`mode`	string	Yes	—	Sets the processing mode (see below).
`entity_redaction`	boolean	No	`false`	Controls whether named entities (such as people, organisations, or locations) are also redacted. Requires `mode` to also be `redaction`.

mode values:

Value	Description
`detection`	Identifies and annotates PII entities in the output without modifying the source text.
`redaction`	Replaces detected PII with placeholder tokens, removing sensitive data from the pipeline output.

Examples

The following sections display examples of some common types of /presign request bodies.

All defaults (minimal request)

{}

All options explicitly set

{
  "normalization": {
    "quotations": true,
    "dashes": true
  },
  "chunking": true,
  "chunking_strategy": "context",
  "chunk_size": 2000,
  "embedding": true,
  "embeddings_model": "cohere.embed-multilingual-v3",
  "json_schema": "PIPELINE",
  "pii": {
    "mode": "redaction",
    "entity_redaction": false
  }
}

PII detection only

{
  "normalization": {
    "quotations": true,
    "dashes": true
  },
  "chunking": true,
  "chunk_size": 1500,
  "embedding": true,
  "json_schema": false,
  "pii": {
    "mode": "detection"
  }
}

Chunking only (no embeddings)

{
  "normalization": {
    "quotations": true,
    "dashes": true
  },
  "chunking": true,
  "chunking_strategy": "fixed",
  "chunk_size": 1000,
  "embedding": false,
  "json_schema": "MDAST",
  "pii": false
}

Reference​

normalization​

chunking​

chunking_strategy​

chunk_size​

embedding​

embeddings_model​

json_schema​

pii​

Examples​

All defaults (minimal request)​

All options explicitly set​

PII detection only​

Chunking only (no embeddings)​