Entity Extraction

Extracts entities (words, phrases, or blocks of information) from your input text.

The Entity Extraction API allows you to find useful snippets of information from a larger body of text. The snippets of information (known as entities) can be words, phrases, or other blocks of information, such as a phone number. You provide the text to analyze and choose the kind of information that you want to find. The API provides a set of entity types, which includes people names, place names, company names, phone numbers, dates, Web addresses, and credit card numbers.

The API returns a list of the extracted entities, along with information about the type of matches found, and the position in the text where the entity occurs.

Quick Start

Entity extraction can extract the names of famous people from data, which can help to categorize your content. For example:

/1/api/[async|sync]/extractentities/v2?text=Who is better? Lebron James or Kobe Bryant?&entity_type=people_eng

This request returns the following entities and their locations:

{
  "entities": [
    {
      "normalized_text": "LeBron James",
      "type": "people_eng",
      "matches": [
        {
          "offset": 15,
          "original_text": "Lebron James",
          "original_length": 12
        }
      ],
      "score": 0.3161,
      "additional_information": {
        "person_profession": [
          "basketball player"
        ],
        "person_date_of_birth": "30/12/1984",
        "wikidata_id": 36159,
        "wikipedia_eng": "http://en.wikipedia.org/wiki/LeBron_James",
        "image": "https://upload.wikimedia.org/wikipedia/commons/3/3f/LeBron_James_at_GSW.jpg"
      },
      "components": [],
      "normalized_length": 12
    },
    {
      "normalized_text": "Kobe Bryant",
      "type": "people_eng",
      "matches": [
        {
          "offset": 31,
          "original_text": "Kobe Bryant",
          "original_length": 11
        }
      ],
      "score": 0.2853,
      "additional_information": {
        "person_profession": [
          "basketball player"
        ],
        "person_date_of_birth": "23/08/1978",
        "wikidata_id": 25369,
        "wikipedia_eng": "http://en.wikipedia.org/wiki/Kobe_Bryant",
        "image": "https://upload.wikimedia.org/wikipedia/commons/2/22/KBryant8.jpg"
      },
      "components": [],
      "normalized_length": 11
    }
  ]
}
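
The offset and original_length values in each match can be used to find the entity in the text that you submitted. The following Python sketch assumes that offsets count characters from the start of the input text, which agrees with the sample above. The helper name locate_matches is illustrative and is not part of the API.

```python
import json

def locate_matches(text, response):
    """Return (normalized_text, slice_of_input) pairs for every match."""
    found = []
    for entity in response["entities"]:
        for match in entity["matches"]:
            start = match["offset"]
            end = start + match["original_length"]
            # Slice the submitted text rather than trusting original_text,
            # so the caller can see exactly what was matched and where.
            found.append((entity["normalized_text"], text[start:end]))
    return found

text = "Who is better? Lebron James or Kobe Bryant?"

# Trimmed copy of the sample response shown above.
response = json.loads("""
{"entities": [
  {"normalized_text": "LeBron James",
   "matches": [{"offset": 15, "original_text": "Lebron James", "original_length": 12}]},
  {"normalized_text": "Kobe Bryant",
   "matches": [{"offset": 31, "original_text": "Kobe Bryant", "original_length": 11}]}
]}
""")

for name, span in locate_matches(text, response):
    print(name, "->", span)
```

Note that normalized_text ("LeBron James") can differ from the text that actually appeared in the input ("Lebron James").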

Another example use is to extract dates from your documents. For example:

/1/api/[async|sync]/extractentities/v2?text=The event took place on the 25th of April 2014&entity_type=date_eng

{
  "entities": [
    {
      "normalized_text": "25th of April 2014",
      "type": "date_eng",
      "normalized_length": 18,
      "score": 1,
      "components": [],
      "matches": [
        {
          "offset": 28,
          "original_text": "25th of April 2014",
          "original_length": 18
        }
      ],
      "normalized_date": "04/25/2014"
    }
  ]
}
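
The normalized_date value in the sample above is in MM/DD/YYYY order, so it can be converted to a date object before further processing. This is a minimal Python sketch; the function name parse_normalized_date is an illustrative choice, and the code assumes that the field is simply absent for entities that are not dates.

```python
from datetime import datetime

def parse_normalized_date(entity):
    """Return a datetime.date for a date entity, or None when the field is absent."""
    value = entity.get("normalized_date")
    if value is None:
        return None
    # MM/DD/YYYY, as in the sample response above.
    return datetime.strptime(value, "%m/%d/%Y").date()

entity = {
    "normalized_text": "25th of April 2014",
    "type": "date_eng",
    "normalized_date": "04/25/2014",
}
print(parse_normalized_date(entity).isoformat())
```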

The API can also extract entities from remote files or Web pages, as well as from uploaded files. It uses the Text Extraction API to extract the text from the file, and passes it to the entity extraction process. For example, the following request extracts entities from a URL:

/1/api/[async|sync]/extractentities/v2?url=http://www.bbc.co.uk/news/business/companies/&entity_type=companies_eng

The following example uses a POST request to upload a file:

curl -X POST http://api.havenondemand.com/1/api/[async|sync]/extractentities/v2 --form "file=@myfile.doc" --form "entity_type=companies_eng"

{
  "entities": [
    {
      "normalized_text": "Google Inc",
      "type": "companies_eng",
      "matches": [
        {
          "offset": 34,
          "original_text": "Google",
          "original_length": 6
        }
      ],
      "score": 0.2572,
      "additional_information": {
        "wikidata_id": 95,
        "wikipedia_eng": "http://en.wikipedia.org/wiki/Google",
        "company_ric": [
          "GOOGL*.MX",
          "GOOGL.OQ",
          "GGQ7.F",
          "GGQ6.F",
          "GOOG34.SA",
          "GOOG.OQ",
          "GGQ1.F",
          "GOOG*.MX"
        ],
        "company_google": [
          "NASDAQ:GOOGL",
          "FRA:GGQ7",
          "FRA:GGQ6",
          "BVMF:GOOG34",
          "NASDAQ:GOOG",
          "FRA:GGQ1"
        ],
        "company_bloomberg": [
          "GOOGL*:MM",
          "GOOGL:US",
          "GGQ7:GR",
          "GGQ6:GR",
          "GOOG34:BZ",
          "GOOG:US",
          "GGQ1:GR",
          "GOOG*:MM"
        ],
        "company_yahoo": [
          "GOOGL*.MX",
          "GOOGL",
          "GGQ7.F",
          "GGQ6.F",
          "GOOG34.SA",
          "GOOG",
          "GGQ1.F",
          "GOOG*.MX"
        ],
        "company_wikipedia": [
          "NASDAQ:GOOG",
          "NASDAQ:GOOGL"
        ],
        "url_homepage": "google.com"
      },
      "components": [],
      "normalized_length": 10
    },
    {
      "normalized_text": "Tesco PLC",
      "type": "companies_eng",
      "matches": [
        {
          "offset": 96,
          "original_text": "Tesco",
          "original_length": 5
        }
      ],
      "score": 0.2516,
      "additional_information": {
        "wikidata_id": 487494,
        "wikipedia_eng": "http://en.wikipedia.org/wiki/Tesco",
        "company_ric": [
          "TCO1.F",
          "TSCDY.PK",
          "TSCO.L",
          "TSCDF.PK",
          "TSCON.MX",
          "TCO.F"
        ],
        "company_google": [
          "FRA:TCO1",
          "OTCMKTS:TSCDY",
          "LON:TSCO",
          "OTCMKTS:TSCDF",
          "FRA:TCO"
        ],
        "company_bloomberg": [
          "TCO1:GR",
          "TSCDY:US",
          "TSCO:LN",
          "TSCDF:US",
          "TSCON:MM",
          "TCO:GR"
        ],
        "company_yahoo": [
          "TCO1.F",
          "TSCDY",
          "TSCO.L",
          "TSCDF",
          "TSCON.MX",
          "TCO.F"
        ],
        "company_wikipedia": [
          "ISE:TCO",
          "LSE:TSCO"
        ],
        "url_homepage": "http://www.tesco.com"
      },
      "components": [],
      "normalized_length": 9
    }
  ]
}

The Entity Extraction API supports multiple inputs of the same type, for example, multiple file inputs:

curl -X POST http://api.havenondemand.com/1/api/[async|sync]/extractentities/v2 --form "file=@myfile1.doc" --form "file=@myfile2.pdf" --form "entity_type=companies_eng"
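
When you send more than one input, the response model includes an optional documentIndex property that states which document contains each match. The following Python sketch groups entity names by that index. The response data here is hand-written for illustration and is not captured API output, and treating a missing documentIndex as 0 is an assumption.

```python
from collections import defaultdict

def group_by_document(response):
    """Map each 0-based document index to the entity names found in that document."""
    groups = defaultdict(list)
    for entity in response["entities"]:
        # documentIndex is optional; assume a single document (index 0) when absent.
        index = entity.get("documentIndex", 0)
        groups[index].append(entity["normalized_text"])
    return dict(groups)

# Illustrative data only: two uploaded documents, three extracted entities.
response = {"entities": [
    {"normalized_text": "Google Inc", "documentIndex": 0},
    {"normalized_text": "Tesco PLC", "documentIndex": 1},
    {"normalized_text": "Google Inc", "documentIndex": 1},
]}
print(group_by_document(response))
```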

For more information about the additional information that entity types can return, see Additional Entity Information.

Synchronous
https://api.havenondemand.com/1/api/sync/extractentities/v2
Asynchronous
https://api.havenondemand.com/1/api/async/extractentities/v2
Authentication

This API requires an authentication token to be supplied in the following parameter:

Parameter Description
apikey The API key to use to authenticate the API request.
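
As a rough illustration, the following Python sketch builds a GET request URL that carries the apikey parameter alongside the other parameters. The function name build_url and the HOD_APIKEY environment variable are hypothetical. Sending array parameters as repeated query fields is an assumption that mirrors the repeated file fields in the curl example.

```python
import os
from urllib.parse import urlencode

def build_url(mode, params, apikey):
    """Build an Entity Extraction URL for the sync or async endpoint."""
    if mode not in ("sync", "async"):
        raise ValueError("mode must be 'sync' or 'async'")
    # doseq=True repeats the field name for list values (assumed array encoding)
    # and percent-encodes spaces and punctuation in the text.
    query = urlencode({**params, "apikey": apikey}, doseq=True)
    return f"https://api.havenondemand.com/1/api/{mode}/extractentities/v2?{query}"

# HOD_APIKEY is a hypothetical variable name; substitute your own key handling.
apikey = os.environ.get("HOD_APIKEY", "your-api-key")
url = build_url(
    "sync",
    {"text": "Who is better? Lebron James or Kobe Bryant?", "entity_type": "people_eng"},
    apikey,
)
print(url)
```

Reading the key from the environment keeps it out of source control, and encoding the query string avoids problems with the spaces and question marks in the text value.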
Parameters

This API accepts the following parameters:

Required
Name Type Description
file array<binary> A file containing the document to process. Multipart POST only.
reference array<string> A Haven OnDemand reference obtained from either the Expand Container or Store Object API. The corresponding document is passed to the API.
text array<string> The text content to process.
url array<string> A publicly accessible HTTP URL from which the document to process can be retrieved.
entity_type array<enum> The type of entity to extract from the specified text. See Additional Entity Information.
Optional
Name Type Description
show_alternatives boolean Set to true to return multiple entries when a particular string has multiple matches (for example, London, UK and London, Ontario). Default value: false.
Enumeration Types

This API's parameters use the enumerations described below:

entity_type
The type of entity to extract from the specified text.
people_eng English Notable People
Matches the names of over 118,000 famous people from the present day and history. For example, Barack Obama.
places_eng English Place Names
Matches English place names, and common abbreviations and alternative forms. For example, London. The place names data is compiled from www.geonames.org.
companies_eng English Company Names
Matches over 84,000 English company names. For example, Hewlett Packard Enterprise.
compliance_eng Compliance (English-Language)
Matches terms that you might want to flag as profanity or restricted content, such as swearing, racism, children, employment, health, drunkenness, drugs, alcohol, money, and sex.
drugs_eng Pharmaceutical Drug Names
Matches pharmaceutical drug names.
films_eng Films (English-Language)
Matches over 320,000 film titles.
holidays_eng Holidays (English-Language)
Matches public holidays. For example, Christmas.
languages_eng Languages (English-Language)
Matches languages.
medical_conditions_eng Medical Conditions (English-Language)
Matches medical conditions. For example, Cardiovascular disease.
organizations_eng Organization Names (English-Language)
Matches organization names.
professions_eng Professions (English-Language)
Matches professions.
teams_eng Sports Teams (English-Language)
Matches sports team names.
universities_eng Universities (English-Language)
Matches institutions of higher education.
address_au Australian Addresses
Matches Australian addresses. For example, Shop 17, Winnellie Shopping Centre, 347 Stuart Hwy, Winnellie, NT, 0820.
address_ca Canadian Addresses
Matches Canadian addresses. For example, 240 4th Avenue S.W., Suite 600, Calgary, Alberta T2P 4H4, Canada.
address_de German Addresses
Matches addresses from Germany. For example, Postfach 10 01 65, 32547, Bad Oeynhausen, GERMANY.
address_es Spanish Addresses
Matches addresses from Spain. For example, Av. de las Cortes de Cádiz, s/n, C. C. El Corte Inglés, 11011, Cádiz.
address_fr French Addresses
Matches addresses from France. For example, 3, Avenue Denis Semeria, Saint-Jean-Cap-Ferrat, Provence-Alpes-Côte d'Azur, 06230, France.
address_gb UK Addresses
Matches UK addresses. For example, Unit D, Acorn Business Park, Ling Road, Tower Park, Poole, Dorset, BH12 4NZ.
address_it Italian Addresses
Matches Italian addresses. For example, Strada del Masarone 67, 13900 Biella (MI).
address_us US Addresses
Matches US addresses. For example, 30 South Wacker Drive, 22nd Floor, Chicago, IL 60606.
address_zh Chinese Addresses
Matches Chinese addresses in both Simplified Chinese and English. For example, 武汉市汉口建设大道568号新世界国贸大厦I座9楼910室.
person_fullname_eng English Full Person Name
Matches multiple person name components in pairs or longer strings, such as Firstname Lastname, Lastname Suffix, Firstname née Lastname. For example, Agnes (née Kozell).
person_name_component_eng Person Name Components
Matches individual first names and last names as commonly found in English-speaking countries. For example, Paul.
pii Personal Identifying Information
A superset entity type that aggregates the address, phone number, credit card, full person name, IP address, email address, Social Security, and National Insurance entity types. For entity types that have multiple locales, such as addresses and phone numbers, the PII entity type includes all locales.
pii_ext Extended Personal Identifying Information
An extended PII superset that in addition to the types matched by the PII type, also includes all bank account and driving license entity types except for US driving license. If required, you can use the driverslicense_us entity type along with pii_ext. However, US driving licenses have a wide range of formats, which might result in false positives.
number_phone_au Australian Phone Numbers
Matches Australian landline, mobile, and other phone numbers. For example, +61 3 34 45 56 67.
number_phone_ca Canadian Phone Numbers
Matches Canadian phone numbers, in undelimited forms or using spaces, hyphens, or dots as delimiters. It includes alphanumeric phone numbers. For example, 867-920-9209.
number_phone_gb UK Phone Numbers
Matches UK landline, mobile, freephone, and business phone numbers. It can also match area codes. For example, 01223 123456.
number_phone_us US Phone Numbers
Matches US phone numbers, in undelimited forms or using spaces, hyphens, or dots as delimiters. It includes alphanumeric phone numbers. For example, 598-3113 ext. 123.
number_phone_de German Phone Numbers
Matches German phone numbers, in undelimited forms or using spaces, hyphens, or dots as delimiters. It includes alphanumeric phone numbers. For example, 089 651285-299.
number_phone_fr French Phone Numbers
Matches French phone numbers, in undelimited forms or using spaces, hyphens, or dots as delimiters. It includes alphanumeric phone numbers. For example, +33 140633900.
number_phone_it Italian Phone Numbers
Matches Italian phone numbers, in undelimited forms or using spaces, hyphens, or dots as delimiters. It includes alphanumeric phone numbers. For example, 03 9494 4949.
number_phone_es Spanish Phone Numbers
Matches Spanish phone numbers, in undelimited forms or using spaces, hyphens, or dots as delimiters. It includes alphanumeric phone numbers. For example, (+34) 942 733 625.
number_phone_zh Chinese Phone Numbers
Matches Chinese phone numbers, in undelimited forms or using spaces, hyphens, or dots as delimiters. It includes alphanumeric phone numbers. For example, 021 26037128.
date_eng English Dates
Matches English dates, including month names (long and short forms), days of the week, years, times of the day, and dates formed from a combination of these formats. It also matches days and periods relative to the current date, and season names. For example, Saturday, January 5th, 2008.
date_ger German Dates
Matches German dates, including month names (long and short forms), days of the week, years, times of the day, and dates formed from a combination of these formats. It also matches days and periods relative to the current date, and season names. For example, Samstag, 5. Januar, 2008.
date_fre French Dates
Matches French dates, including month names (long and short forms), days of the week, years, times of the day, and dates formed from a combination of these formats. It also matches days and periods relative to the current date, and season names. For example, Samedi, 5 Janvier, 2008.
date_ita Italian Dates
Matches Italian dates, including month names (long and short forms), days of the week, years, times of the day, and dates formed from a combination of these formats. It also matches days and periods relative to the current date, and season names. For example, sabato 5 di gennaio del 2008.
date_spa Spanish Dates
Matches Spanish dates, including month names (long and short forms), days of the week, years, times of the day, and dates formed from a combination of these formats. It also matches days and periods relative to the current date, and season names. For example, Sabado 5 de enero de 2008.
date_chi Chinese Dates
Matches Chinese (Simplified & Traditional) dates, including month names (long and short forms), days of the week, years, times of the day, and dates formed from a combination of these formats. It also matches days and periods relative to the current date, and season names. For example, 十二月二十四日晚上六時.
internet Internet Addresses
Matches host names, IP addresses (IPv4, IPv6, and IPv4-mapped), and HTTP and HTTPS addresses. It also matches addresses for other Internet protocols including file, FTP, news, Telnet, and Gopher. For example, www.havenondemand.com.
internet_email Internet Email Addresses
Matches email addresses. For example, jane.smith@example.com.
ip_address Internet IP Addresses
Matches IP addresses (IPv4, IPv6). For example, 192.168.49.50.
number_cc Credit Card Numbers
Matches 12-digit to 19-digit credit card numbers, in undelimited forms or using hyphens or spaces as delimiters. For example, 3799 123456 78901.
nationalinsurance_gb UK National Insurance Numbers
Matches UK National Insurance numbers, in undelimited forms or using hyphens or spaces as delimiters. Matches are case sensitive (all letters must be in upper case to match). For example, AB 12 34 56 C.
socialsecurity_us US Social Security Numbers
Matches US Social Security numbers, in undelimited forms or using hyphens or spaces as delimiters. For example, 354 81 9114.
socialinsurance_ca Canadian Social Insurance Numbers
Matches Canadian Social Insurance numbers, in undelimited forms or using hyphens or spaces as delimiters. For example, 123-456-789.
licenseplate_us US Vehicle License Plates
Matches vehicle license plate numbers for all US states. For example, ABC D12.
licenseplate_gb UK Vehicle License Plates
Matches vehicle registration numbers for the UK. Matches are case sensitive (all letters must be in upper case to match). For example, AB07 XYZ.
licenseplate_fr French Vehicle License Plates
Matches vehicle registration numbers for France. For example, 1234 AB 56.
licenseplate_de German Vehicle License Plates
Matches vehicle number plates for Germany. For example, AB CD 1234.
licenseplate_ca Canadian Vehicle License Plates
Matches vehicle license plate numbers for all Canadian provinces. For example, ABC-1234.
driverslicense_us US Driver's License Numbers
Matches driver's license numbers for each US state. For example, F255-123-45-678-0.
driverslicense_gb UK Driver's License Numbers
Matches driver's license numbers for the UK. Matches are case sensitive (all letters must be in upper case to match). For example, ROBIN756024CJ8UL02.
driverslicense_fr French Driver's License Numbers
Matches driver's license numbers for France. For example, 010123456789.
driverslicense_de German Driver's License Numbers
Matches driver's license numbers for Germany. For example, G1234567890.
driverslicense_ca Canadian Driver's License Numbers
Matches driver's license numbers for each Canadian province. For example, A1234-12345-67890.
bankaccount_ca Canadian Bank Account Numbers
Matches Canadian bank account numbers. For example, 123-123456-789.
bankaccount_fr French Bank Account Numbers
Matches French bank account numbers. For example, 20041 01005 0500013M026 06.
bankaccount_gb UK Bank Account Numbers
Matches UK bank account numbers. The account numbers must include a sort code. For example, 60-16-13 31926819.
bankaccount_ie Irish Bank Account Numbers
Matches Irish bank account numbers. For example, 93-11-52 12345678.
bankaccount_us US Bank Account Numbers
Matches US bank account numbers. For example, 15-1234/6226 1234567890.
bankaccount_de German Bank Account Numbers
Matches German bank account numbers. For example, 150-500-12 1234567.
file_hash 32-digit and 40-digit hexadecimal file hashes
Matches 32-digit and 40-digit hexadecimal file hashes.
organizations Organization Names
This entity_type will be deprecated. Please use organizations_eng.
languages Languages
This entity_type will be deprecated. Please use languages_eng.
professions Professions
This entity_type will be deprecated. Please use professions_eng.
universities Universities
This entity_type will be deprecated. Please use universities_eng.
profanities Profanities and Compliance
This entity_type will be deprecated. Please use compliance_eng.
films Films
This entity_type will be deprecated. Please use films_eng.
teams Sports Teams
This entity_type will be deprecated. Please use teams_eng.
holidays Holidays
This entity_type will be deprecated. Please use holidays_eng.
medical_conditions Medical Conditions
This entity_type will be deprecated. Please use medical_conditions_eng.
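
If existing code still sends the deprecated names, a small lookup table can translate them before a request is made. The following Python sketch takes its mapping directly from the list above; the function name modernize_entity_types is illustrative.

```python
# Deprecated entity_type values and their replacements, as listed above.
DEPRECATED_ENTITY_TYPES = {
    "organizations": "organizations_eng",
    "languages": "languages_eng",
    "professions": "professions_eng",
    "universities": "universities_eng",
    "profanities": "compliance_eng",
    "films": "films_eng",
    "teams": "teams_eng",
    "holidays": "holidays_eng",
    "medical_conditions": "medical_conditions_eng",
}

def modernize_entity_types(entity_types):
    """Replace deprecated names, keeping the original order and dropping duplicates."""
    result = []
    for name in entity_types:
        current = DEPRECATED_ENTITY_TYPES.get(name, name)
        if current not in result:
            result.append(current)
    return result

print(modernize_entity_types(["films", "people_eng", "profanities", "films_eng"]))
# ['films_eng', 'people_eng', 'compliance_eng']
```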

This API returns a JSON response that is described by the model below. This single model is presented both as an easy-to-read abstract definition and as the formal JSON schema.

Asynchronous Use

Additional requests are required to get the result if this API is invoked asynchronously.

You can use /1/job/status/<job-id> to get the status of the job, including results if the job is finished.

You can also use /1/job/result/<job-id>, which waits until the job has finished and then returns the result.
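
The following Python sketch builds the two job URLs for a given job ID. It rests on assumptions that this page does not confirm: that the job endpoints live on the same host as the API endpoints, that they accept the same apikey parameter, and that the job ID is taken from the response to the asynchronous call. The function name job_urls is illustrative.

```python
from urllib.parse import quote, urlencode

# Assumed to be the same host as the sync and async endpoints.
API_ROOT = "https://api.havenondemand.com"

def job_urls(job_id, apikey):
    """Return the status and result URLs for an asynchronous job."""
    safe_id = quote(job_id, safe="")          # keep the ID to one path segment
    query = urlencode({"apikey": apikey})     # assumed to be required here too
    return {
        "status": f"{API_ROOT}/1/job/status/{safe_id}?{query}",
        "result": f"{API_ROOT}/1/job/result/{safe_id}?{query}",
    }

urls = job_urls("example-job-id", "your-api-key")
print(urls["status"])
print(urls["result"])
```

Because the result URL waits until the job has finished, a client that only needs the final output can call it once instead of polling the status URL.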

Model
This is an abstract definition of the response that describes each of the properties that might be returned.
Entity Extraction Response {
entities ( array[Entities] ) The details of extracted items.
}
Entity Extraction Response:Entities {
additional_information ( Additional_information , optional)
components ( array[Components] ) The details of a component of a match.
normalized_date ( string , optional) A normalized date for extracted date entities, in the format MM/DD/YYYY.
normalized_length ( number ) The length of the extracted entity, after character normalization.
normalized_text ( string ) The name of the matched entity. In some cases, many values can match the same entity (for example, alternative names for people and places). The normalized_text value is the normalized name for these aliases.
matches ( array[Matches] ) List of original texts that matched an entity, along with their offsets.
score ( number ) The confidence score of the match (higher scores indicate a more likely match).
type ( enum<Type> ) The type of the extracted entity.
documentIndex ( integer , optional) 0-based integer that states which document contains the match, in cases when more than one document is sent.
}
enum<Entity Extraction Response:Entities:Type> {
'people_eng' , 'places_eng' , 'companies_eng' , 'organizations_eng' , 'organizations' , 'languages_eng' , 'languages' , 'drugs_eng' , 'professions_eng' , 'professions' , 'address_au' , 'address_ca' , 'address_de' , 'address_es' , 'address_fr' , 'address_gb' , 'address_it' , 'address_us' , 'address_zh' , 'person_fullname_eng' , 'person_name_component_eng/lastname' , 'person_name_component_eng/firstname' , 'person_name_component_eng' , 'pii' , 'pii_ext' , 'number_phone_au' , 'number_phone_ca' , 'number_phone_gb' , 'number_phone_us' , 'number_phone_de' , 'number_phone_fr' , 'number_phone_it' , 'number_phone_es' , 'number_phone_zh' , 'number_phone_au/landline' , 'number_phone_au/mobile' , 'number_phone_au/other' , 'number_phone_gb/landline' , 'number_phone_gb/mobile' , 'number_phone_gb/freephone' , 'number_phone_gb/business' , 'number_phone_gb/non_geographic' , 'number_phone_gb/personal' , 'number_phone_it/landline' , 'number_phone_it/mobile' , 'number_phone_it/other' , 'number_phone_de/landline' , 'number_phone_de/mobile' , 'number_phone_de/other' , 'number_phone_fr/landline' , 'number_phone_fr/mobile' , 'number_phone_fr/other' , 'number_phone_es/landline' , 'number_phone_es/mobile' , 'number_phone_es/other' , 'number_phone_zh/landline' , 'number_phone_zh/mobile' , 'number_phone_zh/tollfree' , 'date_eng' , 'date_ger' , 'date_fre' , 'date_ita' , 'date_spa' , 'date_chi' , 'holidays_eng' , 'holidays' , 'internet' , 'internet/host' , 'internet/https' , 'internet/file' , 'internet/ftp' , 'internet/news' , 'internet/telnet' , 'internet/gopher' , 'internet_email' , 'ip_address' , 'medical_conditions_eng' , 'medical_conditions' , 'number_cc' , 'nationalinsurance_gb' , 'socialsecurity_us' , 'socialinsurance_ca' , 'teams_eng' , 'teams' , 'licenseplate_us' , 'licenseplate_gb' , 'licenseplate_fr' , 'licenseplate_de' , 'licenseplate_ca' , 'driverslicense_us' , 'driverslicense_gb' , 'driverslicense_fr' , 'driverslicense_de' , 'driverslicense_ca' , 'bankaccount_ca' , 
'bankaccount_fr' , 'bankaccount_gb' , 'bankaccount_de' , 'bankaccount_ie' , 'bankaccount_us' , 'universities_eng' , 'universities' , 'compliance_eng' , 'profanities' , 'file_hash' , 'films_eng' , 'films'
}
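
Several values in this enumeration, such as number_phone_gb/mobile and internet/https, appear to combine a requested entity type with a more specific subtype after a slash. Assuming that reading is correct, the following Python sketch separates the two parts so that results can be grouped by the base type. The function name split_entity_type is illustrative.

```python
def split_entity_type(type_value):
    """Split 'base/subtype' into (base, subtype); subtype is None when there is no slash."""
    base, separator, subtype = type_value.partition("/")
    return base, (subtype if separator else None)

print(split_entity_type("number_phone_gb/mobile"))  # ('number_phone_gb', 'mobile')
print(split_entity_type("people_eng"))              # ('people_eng', None)
```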
Entity Extraction Response:Entities:Additional_information {
person_profession ( array[string] , optional) The professions of the extracted entity. Applies only to the people_eng entity.
person_date_of_birth ( string , optional) The date of birth of the extracted entity. Applies only to the people_eng entity.
person_date_of_death ( string , optional) The date of death of the extracted entity. Applies only to the people_eng entity.
lon ( number , optional) The longitude value of the extracted entity. Applies only to the places_eng entity.
lat ( number , optional) The latitude value of the extracted entity. Applies only to the places_eng entity.
country ( string , optional) The country of the extracted entity. Applies only to the universities and teams entities.
wikidata_id ( number , optional) The Wikidata ID related to the extracted entity. Applies to the companies_eng, people_eng, places_eng, drugs_eng, and organizations entities.
wikipedia_eng ( string , optional) The name of a Wikipedia page related to the extracted entity. Applies to the companies_eng, people_eng, places_eng, drugs_eng, and organizations entities.
image ( string , optional) The URL for an image related to the extracted entity on Wikipedia commons. Applies to the people_eng, places_eng, drugs_eng, and organizations entities.
company_ric ( array[string] , optional) The Reuters instrument code of the extracted entity. Applies only to the companies_eng entity.
company_google ( array[string] , optional) The Google finance code of the extracted entity. Applies only to the companies_eng entity.
company_yahoo ( array[string] , optional) The Yahoo finance code of the extracted entity. Applies only to the companies_eng entity.
company_wikipedia ( array[string] , optional) The Wikipedia ticker code of the extracted entity. Applies only to the companies_eng entity.
disease_icd10 ( array[string] , optional) The URL for an International Classification of Diseases page related to the disease. Applies only to the medical_conditions entity.
disease_diseasesdb ( array[string] , optional) The URL for a Diseases Database page related to the disease. Applies only to the medical_conditions entity.
url_homepage ( string , optional) The URL of a home page related to the extracted entity. Applies only to the companies_eng entity.
film_director ( array[string] , optional) A list of directors for the film. Applies only to the films entity.
film_producer ( array[string] , optional) A list of producers for the film. Applies only to the films entity.
film_writer ( array[string] , optional) A list of writers for the film. Applies only to the films entity.
film_starring ( array[string] , optional) A list of people that star in this film. Applies only to the films entity.
film_composer ( array[string] , optional) A list of people that have composed soundtrack music for the film. Applies only to the films entity.
film_studio ( array[string] , optional) A list of studios for the film. Applies only to the films entity.
film_year ( number , optional) The year the film was released. Applies only to the films entity.
film_runtime ( number , optional) The duration of the film given in minutes. Applies only to the films entity.
film_country ( array[string] , optional) A list of countries that the film was released in. Applies only to the films entity.
film_language ( array[string] , optional) The languages that the film was produced in. Applies only to the films entity.
film_screenwriter ( array[string] , optional) A list of screenwriters for the film. Applies only to the films entity.
film_imdb ( array[string] , optional) The URLs of the description of the film on http://www.imdb.com. Applies only to the films entity.
film_dir_photography ( array[string] , optional) A list of photography directors for the film. Applies only to the films entity.
film_distributor ( array[string] , optional) A list of distributors for the film. Applies only to the films entity.
film_budget ( string , optional) The budget amount for the film. Applies only to the films entity.
film_gross ( string , optional) The gross amount for the film. Applies only to the films entity.
film_genre ( array[string] , optional) A list of the film's genres. Applies only to the films entity.
place_timezone ( number , optional) The UTC timezone of the extracted entity. Applies only to the places_eng entity.
place_population ( integer , optional) The population of the extracted entity. Applies only to the places_eng entity.
place_country_code ( string , optional) The country code of the extracted entity. Applies only to the places_eng entity.
place_region1 ( string , optional) The first region of the extracted entity. Applies only to the places_eng entity.
place_region2 ( string , optional) The second region of the extracted entity. Applies only to the places_eng entity.
place_elevation ( integer , optional) The elevation of the extracted entity. Applies only to the places_eng entity.
place_type ( string , optional) The place type of the extracted entity. Applies only to the places_eng entity.
place_continent ( string , optional) The continent of the extracted entity. Applies only to the places_eng entity.
team_sport ( array[string] , optional) The name of the sport played by the team. Applies only to the teams entity.
team_league ( string , optional) The name of the league the team plays in. Applies only to the teams entity.
language_family ( string , optional) The language family for the extracted entity. Applies only to the languages entity.
language_iso639_1 ( string , optional) The ISO 639-1 language code for the extracted entity. Applies only to the languages entity.
language_iso639_2 ( array[string] , optional) The ISO 639-2 language codes for the extracted entity. Applies only to the languages entity.
language_iso639_3 ( array[string] , optional) The ISO 639-3 language codes for the extracted entity. Applies only to the languages entity.
language_script ( array[string] , optional) The language script for the extracted entity. Applies only to the languages entity.
language_group ( array[string] , optional) The language group for the extracted entity. Applies only to the languages entity.
language_official_country_code ( array[string] , optional) The official country code for the extracted entity. Applies only to the languages entity.
language_nativespeakers ( number , optional) The number of native speakers for the extracted entity. Applies only to the languages entity.
}
Entity Extraction Response:Entities:Components {
original_length ( number ) The original length of the component.
original_text ( string ) The original text of the component.
type ( enum<Type> ) The type of the component.
}
enum<Entity Extraction Response:Entities:Components:Type> {
'AMPM' , 'CHECKSUM' , 'DATE' , 'DAYSAFTER' , 'DAYSBEFORE' , 'DAYWEEKMONTH' , 'DOMAIN' , 'EW' , 'FIXED_DATE' , 'HOST' , 'HOUR12' , 'HOUR24' , 'LAT_DECIMAL' , 'LAT_DEGREES' , 'LAT_MINUTES' , 'LAT_SECONDS' , 'LOCAL' , 'LONG_DECIMAL' , 'LONG_DEGREES' , 'LONG_MINUTES' , 'LONG_SECONDS' , 'MINUTES' , 'MONTH' , 'NS' , 'NUMBER' , 'PASSWORD' , 'PORT' , 'REL_DAY' , 'REL_EASTER' , 'REL_MONTH' , 'REL_REL_WEEK' , 'REL_WEEK' , 'REL_WEEKFROMNOW' , 'REL_YEAR' , 'SECONDS' , 'TIMEZONE' , 'USER' , 'WEEK_OF_MONTH' , 'WEEKDAY' , 'WEEKDAYAFTER' , 'WEEKDAYBEFORE' , 'WEEKSAFTER' , 'WEEKSBEFORE' , 'YEAR' , 'YEARSHORT'
}
Entity Extraction Response:Entities:Matches {
original_length ( number ) The original length of the extracted entity.
original_text ( string ) The original text of the extracted entity.
offset ( number ) The offset from the start of the original input text to the matched original text.
}
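
Because higher scores indicate a more likely match, one simple way to use the model is to discard low-scoring entities and rank the rest. The following Python sketch does that. The threshold of 0.3 is an arbitrary illustration, since this page does not state the range of the score, and the function name top_entities is not part of the API.

```python
def top_entities(response, min_score=0.0):
    """Return entities scoring at least min_score, highest score first."""
    kept = [entity for entity in response["entities"] if entity["score"] >= min_score]
    return sorted(kept, key=lambda entity: entity["score"], reverse=True)

# Scores taken from the people_eng sample response in the Quick Start.
response = {"entities": [
    {"normalized_text": "Kobe Bryant", "type": "people_eng", "score": 0.2853},
    {"normalized_text": "LeBron James", "type": "people_eng", "score": 0.3161},
]}

for entity in top_entities(response, min_score=0.3):
    print(entity["normalized_text"], entity["score"])
```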
Model Schema
This is a JSON schema that describes the syntax of the response. See json-schema.org for a complete reference.
{
    "entitytype": {
        "enum": [
            "people_eng",
            "places_eng",
            "companies_eng",
            "organizations_eng",
            "organizations",
            "languages_eng",
            "languages",
            "drugs_eng",
            "professions_eng",
            "professions",
            "address_au",
            "address_ca",
            "address_de",
            "address_es",
            "address_fr",
            "address_gb",
            "address_it",
            "address_us",
            "address_zh",
            "person_fullname_eng",
            "person_name_component_eng/lastname",
            "person_name_component_eng/firstname",
            "person_name_component_eng",
            "pii",
            "pii_ext",
            "number_phone_au",
            "number_phone_ca",
            "number_phone_gb",
            "number_phone_us",
            "number_phone_de",
            "number_phone_fr",
            "number_phone_it",
            "number_phone_es",
            "number_phone_zh",
            "number_phone_au/landline",
            "number_phone_au/mobile",
            "number_phone_au/other",
            "number_phone_gb/landline",
            "number_phone_gb/mobile",
            "number_phone_gb/freephone",
            "number_phone_gb/business",
            "number_phone_gb/non_geographic",
            "number_phone_gb/personal",
            "number_phone_it/landline",
            "number_phone_it/mobile",
            "number_phone_it/other",
            "number_phone_de/landline",
            "number_phone_de/mobile",
            "number_phone_de/other",
            "number_phone_fr/landline",
            "number_phone_fr/mobile",
            "number_phone_fr/other",
            "number_phone_es/landline",
            "number_phone_es/mobile",
            "number_phone_es/other",
            "number_phone_zh/landline",
            "number_phone_zh/mobile",
            "number_phone_zh/tollfree",
            "date_eng",
            "date_ger",
            "date_fre",
            "date_ita",
            "date_spa",
            "date_chi",
            "holidays_eng",
            "holidays",
            "internet",
            "internet/host",
            "internet/https",
            "internet/file",
            "internet/ftp",
            "internet/news",
            "internet/telnet",
            "internet/gopher",
            "internet_email",
            "ip_address",
            "medical_conditions_eng",
            "medical_conditions",
            "number_cc",
            "nationalinsurance_gb",
            "socialsecurity_us",
            "socialinsurance_ca",
            "teams_eng",
            "teams",
            "licenseplate_us",
            "licenseplate_gb",
            "licenseplate_fr",
            "licenseplate_de",
            "licenseplate_ca",
            "driverslicense_us",
            "driverslicense_gb",
            "driverslicense_fr",
            "driverslicense_de",
            "driverslicense_ca",
            "bankaccount_ca",
            "bankaccount_fr",
            "bankaccount_gb",
            "bankaccount_de",
            "bankaccount_ie",
            "bankaccount_us",
            "universities_eng",
            "universities",
            "compliance_eng",
            "profanities",
            "file_hash",
            "films_eng",
            "films"
        ]
    },
    "componenttype": {
        "enum": [
            "AMPM",
            "CHECKSUM",
            "DATE",
            "DAYSAFTER",
            "DAYSBEFORE",
            "DAYWEEKMONTH",
            "DOMAIN",
            "EW",
            "FIXED_DATE",
            "HOST",
            "HOUR12",
            "HOUR24",
            "LAT_DECIMAL",
            "LAT_DEGREES",
            "LAT_MINUTES",
            "LAT_SECONDS",
            "LOCAL",
            "LONG_DECIMAL",
            "LONG_DEGREES",
            "LONG_MINUTES",
            "LONG_SECONDS",
            "MINUTES",
            "MONTH",
            "NS",
            "NUMBER",
            "PASSWORD",
            "PORT",
            "REL_DAY",
            "REL_EASTER",
            "REL_MONTH",
            "REL_REL_WEEK",
            "REL_WEEK",
            "REL_WEEKFROMNOW",
            "REL_YEAR",
            "SECONDS",
            "TIMEZONE",
            "USER",
            "WEEK_OF_MONTH",
            "WEEKDAY",
            "WEEKDAYAFTER",
            "WEEKDAYBEFORE",
            "WEEKSAFTER",
            "WEEKSBEFORE",
            "YEAR",
            "YEARSHORT"
        ]
    },
    "properties": {
        "entities": {
            "items": {
                "properties": {
                    "additional_information": {
                        "properties": {
                            "person_profession": {
                                "items": {
                                    "type": "string"
                                },
                                "type": "array"
                            },
                            "person_date_of_birth": {
                                "type": "string"
                            },
                            "person_date_of_death": {
                                "type": "string"
                            },
                            "lon": {
                                "type": "number"
                            },
                            "lat": {
                                "type": "number"
                            },
                            "country": {
                                "type": "string"
                            },
                            "wikidata_id": {
                                "type": "number"
                            },
                            "wikipedia_eng": {
                                "type": "string"
                            },
                            "image": {
                                "type": "string"
                            },
                            "company_ric": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "company_google": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "company_yahoo": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "company_wikipedia": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "disease_icd10": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "disease_diseasesdb": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "url_homepage": {
                                "type": "string"
                            },
                            "film_director": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "film_producer": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "film_writer": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "film_starring": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "film_composer": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "film_studio": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "film_year": {
                                "type": "number"
                            },
                            "film_runtime": {
                                "type": "number"
                            },
                            "film_country": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "film_language": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "film_screenwriter": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "film_imdb": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "film_dir_photography": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "film_distributor": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "film_budget": {
                                "type": "string"
                            },
                            "film_gross": {
                                "type": "string"
                            },
                            "film_genre": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "place_timezone": {
                                "type": "number"
                            },
                            "place_population": {
                                "type": "integer"
                            },
                            "place_country_code": {
                                "type": "string"
                            },
                            "place_region1": {
                                "type": "string"
                            },
                            "place_region2": {
                                "type": "string"
                            },
                            "place_elevation": {
                                "type": "integer"
                            },
                            "place_type": {
                                "type": "string"
                            },
                            "place_continent": {
                                "type": "string"
                            },
                            "team_sport": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "team_league": {
                                "type": "string"
                            },
                            "language_family": {
                                "type": "string"
                            },
                            "language_iso639_1": {
                                "type": "string"
                            },
                            "language_iso639_2": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "language_iso639_3": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "language_script": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "language_group": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "language_official_country_code": {
                                "type": "array",
                                "items": {
                                    "type": "string"
                                }
                            },
                            "language_nativespeakers": {
                                "type": "number"
                            }
                        },
                        "type": "object"
                    },
                    "components": {
                        "items": {
                            "properties": {
                                "original_length": {
                                    "type": "number"
                                },
                                "original_text": {
                                    "type": "string"
                                },
                                "type": {
                                    "$ref": "#/componenttype"
                                }
                            },
                            "required": [
                                "original_length",
                                "original_text",
                                "type"
                            ],
                            "type": "object"
                        },
                        "type": "array"
                    },
                    "normalized_date": {
                        "type": "string"
                    },
                    "normalized_length": {
                        "type": "number"
                    },
                    "normalized_text": {
                        "type": "string"
                    },
                    "matches": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "original_length": {
                                    "type": "number"
                                },
                                "original_text": {
                                    "type": "string"
                                },
                                "offset": {
                                    "type": "number"
                                }
                            },
                            "required": [
                                "original_text",
                                "original_length",
                                "offset"
                            ]
                        }
                    },
                    "score": {
                        "type": "number"
                    },
                    "type": {
                        "$ref": "#/entitytype"
                    },
                    "documentIndex": {
                        "type": "integer"
                    }
                },
                "required": [
                    "normalized_text",
                    "matches",
                    "type",
                    "normalized_length",
                    "score",
                    "components"
                ],
                "type": "object"
            },
            "type": "array"
        }
    },
    "required": [
        "entities"
    ],
    "type": "object"
}
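As an illustration of the required lists in the schema above, the following sketch reports which required properties are absent from a parsed response. It is a hand-written check rather than a full JSON Schema validator: it does not check types or enum values, and the function name missing_fields is hypothetical.

```python
# Required property names, copied from the "required" lists in the schema.
ENTITY_REQUIRED = ("normalized_text", "matches", "type",
                   "normalized_length", "score", "components")
MATCH_REQUIRED = ("original_text", "original_length", "offset")
COMPONENT_REQUIRED = ("original_length", "original_text", "type")


def missing_fields(response):
    """Return paths of required properties that are absent from the response."""
    if "entities" not in response:
        return ["entities"]
    problems = []
    for i, entity in enumerate(response["entities"]):
        for name in ENTITY_REQUIRED:
            if name not in entity:
                problems.append(f"entities[{i}].{name}")
        for j, match in enumerate(entity.get("matches", [])):
            for name in MATCH_REQUIRED:
                if name not in match:
                    problems.append(f"entities[{i}].matches[{j}].{name}")
        for j, component in enumerate(entity.get("components", [])):
            for name in COMPONENT_REQUIRED:
                if name not in component:
                    problems.append(f"entities[{i}].components[{j}].{name}")
    return problems


# Entity values taken from the Quick Start response (additional_information omitted,
# because the schema does not list it as required).
sample = {
    "entities": [
        {
            "normalized_text": "LeBron James",
            "type": "people_eng",
            "matches": [
                {"offset": 15, "original_text": "Lebron James", "original_length": 12}
            ],
            "score": 0.3161,
            "components": [],
            "normalized_length": 12,
        }
    ]
}

print(missing_fields(sample))
```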
https://api.havenondemand.com/1/api/sync/extractentities/v2
Examples
See this API for yourself - select one of our examples below.
Quote
President Barack Obama paid tribute to anti-apartheid hero Nelson Mandela as he flew to South Africa on Friday but played down expectations of a meeting with the ailing black leader during an Africa tour promoting democracy and food security
Quote
The new website http://www.havenondemand.com converts your raw content in all its forms into a data resource that you can search and analyze.
Web Site
Extract company names from BBC Business News Web Site
Web Site
Get the dates of birth and a picture of all the people mentioned in this movie review
Parameters
Required
Name          Type
entity_type   array
Optional
Name                Type      Default
show_alternatives   boolean   False



To try this API with your own data and use it in your own applications, you need an API Key. You can create an API Key from your account page - API Keys.
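The following sketch shows one way to assemble a synchronous request URL from the parameters on this page. It only builds the URL and does not send a request. The query parameter name apikey and the "true" serialization of show_alternatives are assumptions that this page does not confirm, and build_request_url is a hypothetical helper. Because entity_type is an array, the sketch repeats the parameter once per value, which is also an assumption about how the API accepts arrays.

```python
from urllib.parse import urlencode

# Synchronous endpoint as listed on this page.
ENDPOINT = "https://api.havenondemand.com/1/api/sync/extractentities/v2"


def build_request_url(text, entity_types, api_key, show_alternatives=False):
    """Build a GET URL for the synchronous Entity Extraction endpoint."""
    params = [("text", text)]
    # Assumed: array parameters are passed by repeating the parameter name.
    params += [("entity_type", entity_type) for entity_type in entity_types]
    if show_alternatives:
        # Assumed serialization of the boolean; the default is False.
        params.append(("show_alternatives", "true"))
    # Assumed parameter name for the API Key.
    params.append(("apikey", api_key))
    return ENDPOINT + "?" + urlencode(params)


url = build_request_url(
    "Who is better? Lebron James or Kobe Bryant?",
    ["people_eng"],
    "YOUR_API_KEY",  # placeholder, replace with your own key
)
print(url)
```

urlencode percent-encodes the text value, so spaces and question marks in the input do not break the query string.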


Version 2 (2015-11-30)

This page outlines the changes to the Entity Extraction API from the previous version.

  • The response format has been improved.

    In particular, the response now contains a matches array, which lists the details of every individual match in the input text for a particular entity.

    For example, if the input text contains both Barack Obama and President Obama, the response displays the matching entity only once, but the matches property contains details of both matches.

  • The unique_entities parameter has been removed. The improved response format means that this parameter is no longer necessary.
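To illustrate the matches array described in the first item above, the following sketch groups the matched text under each entity. The response fragment is hand-built for illustration and is not real API output. It assumes that both mentions resolve to one entity, as the example above describes, and the helper name match_summary is hypothetical.

```python
def match_summary(entities):
    """Map each entity's normalized_text to the original_text of all its matches."""
    return {
        entity["normalized_text"]: [m["original_text"] for m in entity["matches"]]
        for entity in entities
    }


text = "Barack Obama spoke first. President Obama left early."

# Hypothetical response fragment: one entity with two matches.
mentions = ["Barack Obama", "President Obama"]
entities = [
    {
        "normalized_text": "Barack Obama",
        "type": "people_eng",
        "matches": [
            {
                "offset": text.index(mention),  # position of the mention in the input
                "original_text": mention,
                "original_length": len(mention),
            }
            for mention in mentions
        ],
    }
]

for name, originals in match_summary(entities).items():
    print(name, len(originals), originals)
```

Because each entity appears only once, counting the entries in matches gives the number of times that entity occurs in the input text.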

