gettxt.AI API Documentation

Introduction

The gettxt.AI API provides powerful text extraction, summarization, and translation capabilities for various document types including PDFs, images, audio files, and videos.

Authentication

All API requests require authentication using an API key. Include your API key in the request headers as follows:


headers: {
  "x-api-key": "YOUR_API_KEY"
}
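
As a minimal sketch, the header above could be assembled in Python as follows. The helper name build_headers and the GETTXT_API_KEY environment variable are hypothetical, chosen only for illustration; they are not part of the API.

```python
import os


def build_headers(api_key=None):
    """Return the HTTP headers for a gettxt.AI request."""
    # GETTXT_API_KEY is a hypothetical environment variable name (assumption).
    key = api_key or os.environ.get("GETTXT_API_KEY", "")
    if not key:
        raise ValueError("an API key is required")
    return {"Content-Type": "application/json", "x-api-key": key}
```

Reading the key from the environment keeps it out of source control; pass it explicitly if you manage secrets some other way.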

Endpoint: /extract

This endpoint allows you to extract text from various document types, with options for summarization and translation.

Request

Method: POST

URL: https://gettxt.ai/api/extract/

Request Body


{
  "documentUris": [
    "https://example.com/document1.pdf",
    "https://example.com/document2.pdf",
    "https://example.com/document3.pdf"
  ],
  "summarize": false,
  "translate": "de",
  "newDocumentIndicator": "[New Document]",
  "outputFormat": "text"
}

documentUris (required): An array of URLs pointing to the documents you want to process.
summarize (optional): Set to true if you want to generate summaries of the extracted text.
translate (optional): Specify a language code (e.g. es, en, de) to translate the extracted text.
newDocumentIndicator (optional): A string to be inserted between documents in the all_text response field.
outputFormat (optional): The desired output format, either "text" or "markdown". Defaults to "text".
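
The parameters above can be checked on the client before a request is sent. The following is a sketch only: the function name build_payload and the decision to omit unset optional fields are assumptions, not documented behavior.

```python
ALLOWED_FORMATS = {"text", "markdown"}


def build_payload(document_uris, summarize=False, translate=None,
                  new_document_indicator=None, output_format="text"):
    """Build the JSON body for /extract, rejecting obviously invalid input."""
    if not isinstance(document_uris, list) or not document_uris:
        raise ValueError("documentUris must be a non-empty array of URLs")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError('outputFormat must be "text" or "markdown"')
    payload = {
        "documentUris": document_uris,
        "summarize": summarize,
        "outputFormat": output_format,
    }
    # Optional fields are left out when unset (an assumption about the API).
    if translate is not None:
        payload["translate"] = translate
    if new_document_indicator is not None:
        payload["newDocumentIndicator"] = new_document_indicator
    return payload
```

Serialize the result with json.dumps before sending it as the request body.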

Response

The API returns a JSON object with the following structure:


{
  "creditsUsed": 1,
  "creditsRemaining": 99,
  "totalWordCount": 500,
  "all_text": "Extracted text from all documents...",
  "documents": [
    {
      "documentUri": "https://example.com/document.pdf",
      "status": "succeeded",
      "createdDateTime": "2023-06-01T12:00:00Z",
      "lastUpdatedDateTime": "2023-06-01T12:01:00Z",
      "wordCount": 500,
      "extractedText": "Full extracted text...",
      "shortSummary": "Short summary if requested...",
      "longSummary": "Long summary if requested...",
      "translatedText": "Translated text if requested..."
    }
  ],
  "timestamp": "2023-06-01T12:01:00Z"
}
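
One way to consume this structure is sketched below. The function name summarize_response is hypothetical. The sketch assumes that fields such as translatedText may be absent when not requested, and that any status other than "succeeded" means the document failed; the documentation shows only the "succeeded" value.

```python
def summarize_response(response):
    """Collect per-document text and remaining credits from a parsed response."""
    texts = {}
    failed = []
    for doc in response.get("documents", []):
        # "succeeded" is the only status value shown in this documentation;
        # treating anything else as a failure is an assumption.
        if doc.get("status") == "succeeded":
            # Prefer the translation when one was requested and returned.
            texts[doc["documentUri"]] = (
                doc.get("translatedText") or doc.get("extractedText", "")
            )
        else:
            failed.append(doc.get("documentUri"))
    return {
        "texts": texts,
        "failed": failed,
        "creditsRemaining": response.get("creditsRemaining"),
    }
```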

Error Codes

400 Bad Request
- Invalid request: documentUris must be an array of URLs
- Invalid request: documentUris array cannot be empty
- Some files have unsupported formats
- Some files are not accessible
- File size exceeds maximum allowed limit
- Invalid request: file type not supported
- Connection refused or timed out when accessing files
401 Unauthorized
- Invalid API key
- Authentication failed with external service
403 Forbidden
- Insufficient credits
- Access denied by external service
404 Not Found
- Resource not found in external service
413 Payload Too Large
- File size exceeds service limits
429 Too Many Requests
- Service rate limit exceeded. Please try again later.
500 Internal Server Error
- Audio processing failed
- Video processing failed
- File processing failed
- Transcription failed
- Translation failed
- Summarization failed
- Error during document file handling
- Service configuration error
- External service error
503 Service Unavailable
- External service temporarily unavailable
504 Gateway Timeout
- External service request timed out
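
A client might group these status codes by the action they call for. The grouping below is a sketch based on general HTTP conventions, not on documented gettxt.AI behavior; the function name and the returned labels are hypothetical.

```python
def classify_status(status_code):
    """Map an HTTP status code from the list above to a suggested client action."""
    if status_code in (429, 503, 504):
        return "retry_later"          # transient: rate limit, outage, or timeout
    if status_code in (401, 403):
        return "check_key_or_credits" # invalid key, access denied, or no credits
    if status_code in (400, 404, 413):
        return "fix_request"          # bad input, missing resource, or oversized file
    if status_code >= 500:
        return "server_error"         # processing failed on the service side
    if 200 <= status_code < 300:
        return "ok"
    return "unknown"
```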

Examples

Note: Document types can be mixed in a single request, but there is a maximum limit of 10 documents per request.
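
To stay within the 10-document limit, a longer list of URIs can be split into batches and sent as separate requests. This is a minimal sketch; the name batch_uris is hypothetical.

```python
MAX_DOCUMENTS_PER_REQUEST = 10


def batch_uris(uris, size=MAX_DOCUMENTS_PER_REQUEST):
    """Split a list of document URIs into consecutive batches of at most `size`."""
    return [uris[i:i + size] for i in range(0, len(uris), size)]
```

Each batch then becomes the documentUris array of its own request.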

Basic Text Extraction

curl -X POST https://gettxt.ai/api/extract/ \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "documentUris": [
      "https://example.com/document.pdf",
      "https://example.com/image.jpg",
      "https://example.com/presentation.pptx"
    ],
    "outputFormat": "text"
  }'

Text Extraction with Summarization and Translation

curl -X POST https://gettxt.ai/api/extract/ \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "documentUris": [
      "https://example.com/document.pdf",
      "https://example.com/image.jpg",
      "https://example.com/presentation.pptx"
    ],
    "summarize": true,
    "translate": "de",
    "outputFormat": "text"
  }'
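
The same kind of request can be made from Python using only the standard library. This is a sketch under stated assumptions: the function names build_request and extract are hypothetical, and the 300-second timeout is an arbitrary choice, not a documented value.

```python
import json
import urllib.request

API_URL = "https://gettxt.ai/api/extract/"


def build_request(api_key, document_uris, **options):
    """Create a POST request for /extract without sending it."""
    body = {"documentUris": document_uris, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def extract(api_key, document_uris, **options):
    """Send the request and return the parsed JSON response."""
    request = build_request(api_key, document_uris, **options)
    # The timeout is an arbitrary illustrative value (assumption).
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)
```

For example, extract("YOUR_API_KEY", ["https://example.com/document.pdf"], summarize=True, translate="de") sends a request with summarization and German translation. Error responses surface as urllib.error.HTTPError, whose code attribute holds the HTTP status.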

Rate Limits and Usage

The API uses a credit-based system. Each document processed consumes one credit. Ensure you have sufficient credits before making requests. You can check your remaining credits in the API response.
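
Because each document costs one credit, the cost of a request can be estimated before sending it. The helper names below are hypothetical; credits_remaining would come from the creditsRemaining field of an earlier response.

```python
def credits_needed(document_uris):
    """One credit per document, as described above."""
    return len(document_uris)


def can_afford(credits_remaining, document_uris):
    """Return True if the remaining credits cover every document in the request."""
    return credits_remaining >= credits_needed(document_uris)
```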

Requests may also be rate limited to ensure fair usage. If you exceed the limit, the API returns a 429 Too Many Requests error. Contact support for details on rate limits and how to raise them for your account.
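
A common way to handle 429 responses is to retry with exponential backoff. The sketch below is a general convention, not documented gettxt.AI behavior: the delays, the attempt count, and the function name are assumptions, and send stands for any callable you supply that returns a (status_code, body) pair.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a status other than 429 or attempts run out."""
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
    return status, body
```

The sleep parameter is injectable so the retry logic can be exercised in tests without real waiting.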

Supported Languages

Document & Image Transcription:

The following languages are supported for document and image transcription:

Abaza (abq)
Abkhazian (ab)
Achinese (ace)
Acoli (ach)
Adangme (ada)
Adyghe (ady)
Afar (aa)
Afrikaans (af)
Akan (ak)
Albanian (sq)
Algonquin (alq)
Angika (Devanagari) (anp)
Arabic (ar)
Asturian (ast)
Asu (Tanzania) (asa)
Avaric (av)
Awadhi-Hindi (Devanagari) (awa)
Aymara (ay)
Azerbaijani (Latin) (az)
Bafia (ksf)
Bagheli (bfy)
Bambara (bm)
Bashkir (ba)
Basque (eu)
Belarusian (Cyrillic) (be, be-cyrl)
Belarusian (Latin) (be, be-latn)
Bemba (Zambia) (bem)
Bena (Tanzania) (bez)
Bhojpuri-Hindi (Devanagari) (bho)
Bikol (bik)
Bini (bin)
Bislama (bi)
Bodo (Devanagari) (brx)
Bosnian (Latin) (bs)
Brajbha (bra)
Breton (br)
Bulgarian (bg)
Bundeli (bns)
Buryat (Cyrillic) (bua)
Catalan (ca)
Cebuano (ceb)
Chamling (rab)
Chamorro (ch)
Chechen (ce)
Chhattisgarhi (Devanagari) (hne)
Chiga (cgg)
Chinese Simplified (zh-Hans)
Chinese Traditional (zh-Hant)
Choctaw (cho)
Chukot (ckt)
Chuvash (cv)
Cornish (kw)
Corsican (co)
Cree (cr)
Creek (mus)
Crimean Tatar (Latin) (crh)
Croatian (hr)
Crow (cro)
Czech (cs)
Danish (da)
Dargwa (dar)
Dari (prs)
Dhimal (Devanagari) (dhi)
Dogri (Devanagari) (doi)
Duala (dua)
Dungan (dng)
Dutch (nl)
Efik (efi)
English (en)
Erzya (Cyrillic) (myv)
Estonian (et)
Faroese (fo)
Fijian (fj)
Filipino (fil)
Finnish (fi)
French (fr)
Frisian (fy)
Fulah (ff)
Ga (gaa)
Galician (gl)
Ganda (lg)
Georgian (ka)
German (de)
Greek (el)
Guarani (gn)
Gujarati (gu)
Haitian (ht)
Hausa (ha)
Hawaiian (haw)
Hebrew (he)
Hindi (hi)
Hungarian (hu)
Icelandic (is)
Igbo (ig)
Indonesian (id)
Inuktitut (iu)
Irish (ga)
Italian (it)
Japanese (ja)
Javanese (jv)
Kannada (kn)
Kashmiri (ks)
Kazakh (kk)
Khmer (km)
Korean (ko)

Note: This list is partial. For a complete and up-to-date list, please refer to our API documentation.

Handwritten Text:

Handwritten text recognition is supported for the following languages:

English (en)
Chinese Simplified (zh-Hans)
French (fr)
German (de)
Italian (it)
Thai (preview) (th)
Japanese (ja)
Korean (ko)
Portuguese (pt)
Spanish (es)
Russian (preview) (ru)
Arabic (preview) (ar)

Translations, Summaries & Audio / Video Transcription:

Supports over 50 languages including Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

Supported File Types

Audio: MP3, MPGA, M4A, WAV
Video: MP4, AVI, MKV, MOV, FLV, WMV, MPEG, 3GP, WEBM, MTS, M2TS, TS
Documents: PDF, DOCX, XLSX, PPTX, HTML, EPUB
Images: JPG, JPEG, PNG, BMP, TIFF, HEIF
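
A client-side pre-check against this list might look like the sketch below. It classifies a URI by its file extension only, which is an assumption: the service may determine file types differently. The function name file_category is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {
    "audio": {"mp3", "mpga", "m4a", "wav"},
    "video": {"mp4", "avi", "mkv", "mov", "flv", "wmv", "mpeg", "3gp",
              "webm", "mts", "m2ts", "ts"},
    "documents": {"pdf", "docx", "xlsx", "pptx", "html", "epub"},
    "images": {"jpg", "jpeg", "png", "bmp", "tiff", "heif"},
}


def file_category(uri):
    """Return the category for a URI's extension, or None if it is not listed."""
    # Only the URL path is inspected, so query strings are ignored.
    extension = PurePosixPath(urlparse(uri).path).suffix.lstrip(".").lower()
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if extension in extensions:
            return category
    return None
```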

Supported Max File Sizes

Audio: 100 MB
Video: 100 MB
Documents: 500 MB per file or 2,000 pages
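
If the file size is known in advance, these limits can be checked before submitting. This sketch assumes 1 MB means 1024 x 1024 bytes, which the documentation does not specify, and it does not cover the page limit. The function name is hypothetical.

```python
MB = 1024 * 1024  # assumption: the documentation does not define "MB"

MAX_BYTES = {
    "audio": 100 * MB,
    "video": 100 * MB,
    "documents": 500 * MB,
}


def within_size_limit(category, size_bytes):
    """Return True if the size is within the listed limit for the category."""
    limit = MAX_BYTES.get(category)
    # Categories with no listed limit (such as images) are not restricted here.
    return limit is None or size_bytes <= limit
```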

Additional Rules and Requirements

Files should be directly accessible; URIs that require authentication or that invoke interactive scripts before the file can be accessed aren't supported.
Image dimensions must be between 50 pixels x 50 pixels and 10,000 pixels x 10,000 pixels.
If your PDFs are password-locked, you must remove the lock before submission.
The minimum height of the text to be extracted is 12 pixels for a 1024 x 768 pixel image. This dimension corresponds to about 8 point text at 150 dots per inch (DPI).
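
The image dimension rule above can be checked locally once the width and height are known. This is a minimal sketch that treats both bounds as inclusive, which is an assumption; the function name is hypothetical, and how the dimensions are read from the file is left to the caller.

```python
MIN_SIDE = 50
MAX_SIDE = 10_000


def image_dimensions_ok(width, height):
    """Return True if both sides fall within the documented pixel range."""
    return MIN_SIDE <= width <= MAX_SIDE and MIN_SIDE <= height <= MAX_SIDE
```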