Language Identification

Identifies the language of a piece of text.

The Language Identification API analyzes a piece of text that you provide and returns the language of the text.

You can use Language Identification to determine the correct language settings to use for other Haven OnDemand APIs, such as Sentiment Analysis or Entity Extraction.

Quick Start

You must provide input text. The following example adds the text as plain text:


The API returns the language and the encoding, and details of the UTF-8 character ranges that the input text includes.

{
  "language": "english",
  "language_iso639_2b": "ENG",
  "encoding": "UTF8",
  "unicode_scripts": [
    "Basic Latin"
  ]
}

You can also provide the text in a file. In File mode, Haven OnDemand uses the Text Extraction API to extract the text from the file and then uses the extracted text in the API.

Note: API input is subject to a maximum size quota. If you upload text or a file that is too large, the API returns an error. For more information, see Rate Limiting, Quotas, Data Expiry, and Maximums.

You must provide a minimum of three words for language identification. However, you can improve the accuracy by providing more text. The amount of text that you must provide for accurate language identification depends on both the language and the type of text. For UTF-8 encoded languages that use a unique script, the Language Identification API might be able to identify the language using only a few characters. For other languages, the API might need a few sentences to accurately identify the language, and it might need a large paragraph to distinguish between two similar languages.

The amount of text required also depends on the type of text. For example, it is difficult to identify the language from a list of places, numbers, and names. If your text contains these things, you might need to provide more text to identify the language. For natural language text, such as a news article, the API can usually detect the language from fewer characters.
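A client can check the three-word minimum before it sends a request. The following is a minimal sketch, assuming that words are separated by whitespace; how the API itself counts words is not described here, so text in scripts that do not use spaces may be counted differently. The function name has_enough_words is hypothetical.

```python
MIN_WORDS = 3  # minimum number of words stated in this documentation


def has_enough_words(text: str, minimum: int = MIN_WORDS) -> bool:
    """Return True if text has at least `minimum` whitespace-separated words.

    Assumption: splitting on whitespace approximates how the API counts words.
    """
    return len(text.split()) >= minimum


print(has_enough_words("Hello world"))
print(has_enough_words("New tests on human bones"))
```

Passing this check only means the request meets the stated minimum. As described above, longer input is still likely to give a more accurate result.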

A full list of supported languages appears in the response definition below.


This API requires an authentication token to be supplied in the following parameter:

Parameter Description
apikey The API key to use to authenticate the API request.

This API accepts the following parameters:

Name Type Description
binary A file containing the document to process. Multipart POST only.
string A Haven OnDemand reference obtained from either the Expand Container or Store Object API. The corresponding document is passed to the API.
string The text to process. You must provide a minimum of three words.
string A publicly accessible HTTP URL from which the document can be retrieved.
Name Type Description
boolean Set to true to get additional metadata information on the identified language. Default value: false.
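As an illustration of how these parameters might be combined into a request, the sketch below builds a GET URL with a query string. Only apikey is a parameter name confirmed by this page. The endpoint URL is a placeholder, and the names text and additional_metadata are assumptions made for the example, because the parameter names are not shown in the table above.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): substitute the real URL for the service.
ENDPOINT = "https://api.example.com/identifylanguage"


def build_request_url(api_key: str, text: str, additional_metadata: bool = False) -> str:
    """Build a GET URL carrying the API key, the input text, and the metadata flag.

    Only `apikey` is a documented parameter name; `text` and
    `additional_metadata` are assumed names used for illustration.
    """
    params = {
        "apikey": api_key,
        "text": text,
        "additional_metadata": "true" if additional_metadata else "false",
    }
    # urlencode percent-encodes values and joins them with "&".
    return f"{ENDPOINT}?{urlencode(params)}"


print(build_request_url("YOUR_API_KEY", "Hello world again"))
```

Encoding the values with urlencode keeps spaces and non-ASCII characters in the input text from breaking the URL.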

This API returns a JSON response that is described by the model below. The model is presented both as an easy-to-read abstract definition and as a formal JSON schema.

Asynchronous Use

Additional requests are required to get the result if this API is invoked asynchronously.

You can use /1/job/status/<job-id> to get the status of the job, including results if the job is finished.

You can also use /1/job/result/<job-id>, which waits until the job has finished and then returns the result.
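The two job paths above can be wrapped in small helpers. This is a sketch under stated assumptions: the host in BASE_URL is a placeholder, and the format of the status response is not described here, so the caller supplies both the function that fetches a URL and the function that decides whether the job has finished. All function names are hypothetical.

```python
import time
from typing import Any, Callable, Optional

BASE_URL = "https://api.example.com"  # placeholder host (assumption)


def job_status_url(job_id: str) -> str:
    """URL that reports the job status, including results if the job is finished."""
    return f"{BASE_URL}/1/job/status/{job_id}"


def job_result_url(job_id: str) -> str:
    """URL that waits until the job has finished and then returns the result."""
    return f"{BASE_URL}/1/job/result/{job_id}"


def poll_until_done(
    job_id: str,
    fetch: Callable[[str], Any],
    is_finished: Callable[[Any], bool],
    interval: float = 1.0,
    max_attempts: int = 10,
) -> Optional[Any]:
    """Poll the status URL until is_finished(response) is true.

    Returns the finished response, or None if max_attempts is reached first.
    """
    for attempt in range(max_attempts):
        response = fetch(job_status_url(job_id))
        if is_finished(response):
            return response
        if attempt < max_attempts - 1:
            time.sleep(interval)  # wait before asking again
    return None
```

Polling the status path suits clients that need to do other work between checks. A client that only needs the final answer can call the result path once instead.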

This is an abstract definition of the response that describes each of the properties that might be returned.
Language Identification Response {
encoding ( enum<Encoding> ) The identified encoding of the input text.
language ( enum<Language> ) The identified language of the input text.
language_iso639_2b ( enum<Language_iso639_2b> ) The ISO 639-2/B code for the identified language of the input text, or "UND" if the language could not be identified.
unicode_scripts ( array[string] , optional) The UTF-8 character ranges that your input text includes.
enum<Language Identification Response:Encoding> {
enum<Language Identification Response:Language> {
'afrikaans' , 'albanian' , 'amharic' , 'arabic' , 'armenian' , 'azeri' , 'basque' , 'belorussian' , 'bengali' , 'berber' , 'breton' , 'bulgarian' , 'burmese' , 'catalan' , 'cherokee' , 'chinese' , 'croatian' , 'czech' , 'danish' , 'dutch' , 'english' , 'esperanto' , 'estonian' , 'faroese' , 'finnish' , 'french' , 'gaelic' , 'georgian' , 'german' , 'greek' , 'greenlandic' , 'gujarati' , 'hebrew' , 'hindi' , 'hungarian' , 'icelandic' , 'indonesian' , 'isan' , 'italian' , 'japanese' , 'kannada' , 'kazakh' , 'khmer' , 'korean' , 'kurdish' , 'latin' , 'latvian' , 'lithuanian' , 'luxembourgish' , 'macedonian' , 'malayalam' , 'maltese' , 'maori' , 'mongolian' , 'nepali' , 'norwegian' , 'oriya' , 'pashto' , 'persian' , 'polish' , 'portuguese' , 'romanian' , 'russian' , 'serbian' , 'sindhi' , 'singhalese' , 'slovak' , 'slovenian' , 'somali' , 'spanish' , 'swahili' , 'swedish' , 'syriac' , 'tagalog' , 'tajik' , 'tamil' , 'telugu' , 'thai' , 'tibetan' , 'turkish' , 'ukrainian' , 'urdu' , 'uyghur' , 'uzbek' , 'vietnamese' , 'welsh' , 'unknown'
enum<Language Identification Response:Language_iso639_2b> {
'AFR' , 'ALB' , 'AMH' , 'ARA' , 'ARM' , 'AZE' , 'BAQ' , 'BEL' , 'BEN' , 'BER' , 'BRE' , 'BUL' , 'BUR' , 'CAT' , 'CHR' , 'CHI' , 'HRV' , 'CZE' , 'DAN' , 'DUT' , 'ENG' , 'EPO' , 'EST' , 'FAO' , 'FIN' , 'FRE' , 'GLE' , 'GEO' , 'GER' , 'GRE' , 'KAL' , 'GUJ' , 'HEB' , 'HIN' , 'HUN' , 'ICE' , 'IND' , 'ITA' , 'JPN' , 'KAN' , 'KAZ' , 'KHM' , 'KOR' , 'KUR' , 'LAO' , 'LAT' , 'LAV' , 'LIT' , 'LTZ' , 'MAC' , 'MAL' , 'MLT' , 'MAO' , 'MON' , 'NEP' , 'NPI' , 'NOR' , 'ORI' , 'PER' , 'POL' , 'POR' , 'PUS' , 'RUM' , 'RUS' , 'SRP' , 'SND' , 'SIN' , 'SLO' , 'SLV' , 'SOM' , 'SPA' , 'SWA' , 'SWE' , 'SYR' , 'TGL' , 'TGK' , 'TAM' , 'TEL' , 'THA' , 'TIB' , 'TUR' , 'UKR' , 'URD' , 'UIG' , 'UZB' , 'VIE' , 'WEL' , 'UND'
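A client typically needs to tell an identified language apart from the "UND" case described above. The following is a minimal sketch that reads the response fields listed in the definition. It assumes the response body arrives as a JSON string, and the function name identified_language is hypothetical.

```python
import json
from typing import Optional


def identified_language(response_body: str) -> Optional[str]:
    """Return the identified language name, or None if it could not be identified.

    The check relies on the "UND" code in language_iso639_2b.
    """
    data = json.loads(response_body)
    if data.get("language_iso639_2b") == "UND":
        return None
    return data.get("language")


sample = (
    '{"language": "english", "language_iso639_2b": "ENG", '
    '"encoding": "UTF8", "unicode_scripts": ["Basic Latin"]}'
)
print(identified_language(sample))  # prints "english"
```

Returning None for the unidentified case lets the caller decide whether to retry with more text or fall back to a default language.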
Model Schema
This is a JSON schema that describes the syntax of the response.
    "properties": {
        "encoding": {
            "enum": [
        "language": {
            "enum": [
        "language_iso639_2b": {
            "enum": [
        "unicode_scripts": {
            "items": {
                "type": "string"
            "type": "array"
    "required": [
    "type": "object"
Example inputs for this API:
Identify English
New tests on human bones hidden in a Spanish cave for some 400,000 years set a new record for the oldest human DNA sequence ever decoded—and may scramble the scientific picture of our early relatives.
Identify German
Neue Versuche an menschlichen Knochen in einer spanischen Höhle versteckt für einige 400.000 Jahre einen neuen Rekord für den ältesten menschlichen DNA-Sequenz immer decodiert und kann den wissenschaftlichen Bild unserer frühen Verwandten klettern.

To use this API with your own data and in your own applications, you need an API key. You can create one on the API Keys page of your account.
