POST v1/extract
POST
https://api.subworkflow.ai/v1/extract
Upload a file for extraction
Summary
- Only splits the file and generates dataset items and a filtering index.
- If you want to enable searching, use the v1/vectorize endpoint instead.
- If you want to vectorize an existing dataset already uploaded through v1/extract, you can use the v1/datasets/:id/vectorize endpoint to trigger the vectorize process.
- The total upload file size limit is determined by your subscription plan, but there is also a technical limit of 100 MB for this endpoint. Use v1/upload-session for files larger than 100 MB.
- The ability to upload also depends on your remaining data storage allocation. If the uploaded file, on top of your current usage, would exceed the maximum for your subscription, the upload will fail. Delete existing datasets to free up capacity or upgrade your subscription.
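Given the 100 MB technical limit above, a client can route larger files to v1/upload-session before attempting the upload. A minimal TypeScript sketch; the helper name chooseUploadEndpoint is ours, not part of the API:

```typescript
// 100 MB technical limit on v1/extract (per the summary above).
const MAX_EXTRACT_BYTES = 100 * 1024 * 1024;

// Hypothetical helper: pick the endpoint based on the file's byte size.
function chooseUploadEndpoint(fileSizeBytes: number): string {
  return fileSizeBytes <= MAX_EXTRACT_BYTES
    ? "https://api.subworkflow.ai/v1/extract"
    : "https://api.subworkflow.ai/v1/upload-session";
}
```

Note that even a file under 100 MB can still fail if it exceeds your remaining storage allocation, so handle upload errors regardless.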
Parameters
| name | type | location | required | description |
|---|---|---|---|---|
| content-type | string | header | required | You must set the request content type to multipart/form-data |
| file | file | body | required | The file you want to upload. Accepted file formats: pdf, docx, pptx, xlsx. Max 100 MB. |
| expiresInDays | number | body | optional | Overrides the default expiration time for the resulting dataset, in days from the creation date. |
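Since the request is multipart/form-data, both body parameters can go into the same FormData. A sketch assuming expiresInDays is sent as an ordinary form field alongside the file; buildExtractForm is a hypothetical helper name:

```typescript
// Build the multipart body for v1/extract.
// Assumption: expiresInDays is an ordinary form field next to the file.
function buildExtractForm(file: Blob, expiresInDays?: number): FormData {
  const form = new FormData();
  form.append("file", file);
  if (expiresInDays !== undefined) {
    // Multipart form values are transmitted as strings.
    form.append("expiresInDays", String(expiresInDays));
  }
  return form;
}
```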
Response
Once the file is uploaded successfully, you'll receive a jobs response which displays the details of the job tracking the extract request. Take note of the following properties:
- id - you can use this with v1/jobs/:id to get an updated version of the job.
- datasetId - the dataset record created for the uploaded file. You'll need this to fetch the dataset and dataset items when the job is finished.
- status - the progress of the job. "SUCCESS" and "ERROR" are the finished states you should check for.
- Success
- 400 Error
- 404 Error
{
"type": "object",
"properties": {
"success": { "type": "boolean" },
"total": { "type": "number" },
"data": {
"type": "object",
"properties": {
"id": {
"type": "string"
},
"datasetId": {
"type": "string"
},
"type": {
"type": "string",
"enum": ["datasets/extract","datasets/vectorize"]
},
"status": {
"type": "string",
"enum": ["NOT_STARTED","IN_PROGRESS","SUCCESS","ERROR"]
},
"statusText": {
"type": "string"
},
"startedAt": {
"type": "number"
},
"finishedAt": {
"type": "number"
},
"canceledAt": {
"type": "number"
},
"createdAt": {
"type": "number"
},
"updatedAt": {
"type": "number"
}
}
}
}
}
{
  "type": "object",
  "properties": {
    "success": { "type": "boolean" },
    "error": { "type": "string" }
  }
}
{
  "type": "object",
  "properties": {
    "success": { "type": "boolean" },
    "error": { "type": "string" }
  }
}
Example
- Curl
- JS/TS
curl -X POST https://api.subworkflow.ai/v1/extract \
  --header 'x-api-key: <YOUR-API-KEY>' \
  --header 'Content-Type: multipart/form-data' \
  --form "file=@/path/to/file"
// `fileInput` is a File/Blob you already have (e.g. from an <input type="file">).
const formdata = new FormData();
formdata.append("file", fileInput);

const req = await fetch("https://api.subworkflow.ai/v1/extract", {
  method: "POST",
  headers: {
    // Do not set Content-Type manually here: when the body is a FormData,
    // fetch sets multipart/form-data with the correct boundary for you.
    "x-api-key": "<YOUR-API-KEY>",
  },
  body: formdata,
});
// note: you can poll https://api.subworkflow.ai/v1/jobs/<jobId> for updates
{
"success": true,
"total": 1,
"data": {
"id": "dsj_5fwR7qoMXracJQaf",
"datasetId": "ds_VV08ECeQBQgDoVn6",
"type": "datasets/extract",
"status": "IN_PROGRESS",
"statusText": null,
"startedAt": null,
"finishedAt": null,
"canceledAt": null,
"createdAt": 1761910647113,
"updatedAt": 1761910647113
}
}
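The polling flow mentioned above (fetch v1/jobs/:id until status reaches "SUCCESS" or "ERROR") can be sketched in TypeScript. waitForJob and the 2-second interval are our own names and defaults, and the assumed response shape follows the success example:

```typescript
// The two finished states documented for the job status.
const FINISHED = new Set(["SUCCESS", "ERROR"]);

function isFinished(status: string): boolean {
  return FINISHED.has(status);
}

// Hypothetical helper: poll v1/jobs/:id until the job finishes.
// A minimal sketch; real code would add timeouts, backoff, and error handling.
async function waitForJob(jobId: string, apiKey: string, intervalMs = 2000) {
  for (;;) {
    const res = await fetch(`https://api.subworkflow.ai/v1/jobs/${jobId}`, {
      headers: { "x-api-key": apiKey },
    });
    const { data } = await res.json();
    if (isFinished(data.status)) {
      // data.datasetId identifies the dataset to fetch once status is SUCCESS.
      return data;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```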