Public Text Indexes

The following data sets contain content that you can search, for example by using the Query Text Index, Find Similar, and Find Related Concepts APIs. The sections below list the fields that results from each public data set return. For more information about the Haven OnDemand index field types, see Index Field Types.
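
As a rough sketch of how one of these indexes might be queried, the Python snippet below builds a Query Text Index request URL. The endpoint path and the `text`, `indexes`, and `apikey` parameter names are assumptions recalled from the Haven OnDemand API and are not stated on this page, so check them against the API reference before use. The `print` and `print_fields` parameters are the ones named in the default print fields footnote on this page; passing several field names as a comma-separated value is also an assumption. The snippet only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Query Text Index API (not given on this page).
QUERY_TEXT_INDEX_URL = "https://api.havenondemand.com/1/api/sync/querytextindex/v1"


def build_query_url(text, index, api_key, print_fields=None):
    """Return a request URL that searches one public index for `text`."""
    params = {
        "text": text,       # assumed parameter name for the query text
        "indexes": index,   # assumed parameter name; an index name such as wiki_eng
        "apikey": api_key,  # assumed parameter name; placeholder key
    }
    if print_fields:
        # Ask for specific fields instead of the default print fields.
        params["print"] = "fields"
        params["print_fields"] = ",".join(print_fields)  # assumed separator
    return QUERY_TEXT_INDEX_URL + "?" + urlencode(params)


default_url = build_query_url("solar energy", "wiki_eng", "YOUR_API_KEY")
fields_url = build_query_url(
    "solar energy", "wiki_eng", "YOUR_API_KEY",
    print_fields=["title", "wikipedia_category"],
)
print(default_url)
print(fields_url)
```

Leaving `print_fields` unset keeps the request minimal, in which case the default print fields for the index apply.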

Wikipedia

The Wikipedia data sets provide a searchable index of content from www.wikipedia.org. The content is available in the following languages:

Title Description Index Name
English Wikipedia https://en.wikipedia.org/ wiki_eng
French Wikipedia https://fr.wikipedia.org/ wiki_fra
Spanish Wikipedia https://es.wikipedia.org/ wiki_spa
Chinese Wikipedia https://zh.wikipedia.org/ wiki_chi
German Wikipedia https://de.wikipedia.org/ wiki_ger
Italian Wikipedia https://it.wikipedia.org/ wiki_ita

For Wikipedia, all fields of Parametric type are default print fields [1], if they are present in the referenced articles.

Field Name JSON Type Haven OnDemand Index Type Field Description
The following fields are available for all the Wikipedia content.
content String Index  
content_length Number Store Only  
modified_date Timestamp Date  
reference String Reference  
title String Index  
wikidata Number Numeric  
wikipedia_alias String Index Aliases of the title of this page (obtained from aliases in Wikidata), returned as pipe-separated strings, for example, {"wikipedia_alias": ["Barack Obama|President Obama|Barack Hussein Obama"]}.
wikipedia_alttitle String Index Alternative titles for this page (obtained from pages that redirect here), returned as pipe-separated strings.
wikipedia_category String Parametric Categories into which this page has been placed. Multiple categories are returned as separate items in a list, for example, {"wikipedia_category": ["Government", "Government institutions"]}.
wikipedia_id Number Numeric The internal Wikipedia ID of the page. Each page has a unique ID, but some IDs are unused because their pages were deleted.
wikipedia_rank Number Rank A measure of the importance of this page (obtained from the number of links to this page and its redirects).
wikipedia_type Enum Parametric A broad classification of this page (for example, person, place, book), translated into the language of the index.
The following fields are available for entries for people.  
person_date_of_birth Date Date  
person_date_of_death Date Date  
person_profession String Parametric  
The following fields are available for entries for places.  
lat Number Numeric Latitude.
lon Number Numeric Longitude.
place_country_code Enum Parametric  
place_elevation Number Numeric Height above sea level in meters.
place_population Number Numeric  
place_region1 String Index First level region of the item (US state, UK country, French region, and so on).
place_region2 String Index Second level region of the item (US county, UK county, French département, and so on).

[1] Default print fields are fields that appear in responses when the print parameter is set to fields and the print_fields parameter is not set. The title and reference fields are default print fields in all queries. Some Public Text Indexes have additional default print fields.
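
The alias and category fields are shaped differently, which matters when reading results: aliases arrive as pipe-separated strings, while categories arrive as separate list items. The sketch below assumes a result document has already been decoded from JSON into a Python dict whose field values are lists, as in the examples in the field table above. The `doc` value and the helper names are hypothetical.

```python
def split_aliases(doc):
    """Flatten pipe-separated wikipedia_alias strings into a list of aliases."""
    aliases = []
    for value in doc.get("wikipedia_alias", []):
        aliases.extend(part.strip() for part in value.split("|") if part.strip())
    return aliases


def categories(doc):
    """Return wikipedia_category values, which already arrive as separate list items."""
    return list(doc.get("wikipedia_category", []))


# Hypothetical document built from the examples in the field table.
doc = {
    "wikipedia_alias": ["Barack Obama|President Obama|Barack Hussein Obama"],
    "wikipedia_category": ["Government", "Government institutions"],
}

print(split_aliases(doc))
print(categories(doc))
```

Both helpers return an empty list when the field is absent, which is useful because these fields appear only if they are present in the referenced article.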

News

The News data set provides a searchable index of abstracts from leading news websites, updated in real time.

The News data sets are available in the languages shown below. The index has a start date of 2 May 2014.

Title Description Index Name
News - English News feed in English news_eng

The English News data set includes information from the following news websites:

  • New York Times
  • The Telegraph
  • The Guardian
  • Fox News
  • Reuters
  • Huffington Post
  • Wall Street Journal
  • CNN
  • BBC
  • Sky
  • Houston Chronicle
  • RT.com
  • The Australian
  • News.com.au
  • LA Times

News - French News feed in French news_fra

The French News data set includes information from the following news websites:

  • Le Monde
  • L'Express
  • Le Figaro
  • Le Nouvel Observateur
  • Le Point
  • Rue89
  • L'Internaute
  • France 24

News - Italian News feed in Italian news_ita

The Italian News data set includes information from the following news websites:

  • Corriere Della Sera
  • TG COM 24
  • TG COM
  • Adnkronos
  • Il Sole 24 Ore
  • Il Messaggero
  • La Stampa
  • La Repubblica
  • ANSA

News - German News feed in German news_ger

The German News data set includes information from the following news websites:

  • Die Welt
  • Berliner Morgenpost
  • Spiegel Online
  • N-TV
  • Deutsche Welle

The following table describes the fields that are available for all the News content.

Field Name JSON Type Haven OnDemand Index Type
company String Parametric
content String Index
created_date Timestamp Date
person String Parametric
place String Parametric
reference String Reference
rss_url String Parametric
rss_rank Number Rank
rss_copyright String Store Only
rss_source String Parametric
rss_category String Parametric
title String Index

World Factbook

The World Factbook index is an index of the CIA World Factbook, which contains a variety of structured and unstructured information for each country.

Title Description Index Name
CIA World Factbook CIA World Factbook world_factbook

The following table describes the fields that are available for all the World Factbook content.

The field names in this text index are based on the field names in the CIA World Factbook. In the text index, spaces in the original field name are converted to underscores, and all field names have the prefix wfb_. For information about the content in each field, see https://www.cia.gov/library/publications/the-world-factbook/docs/notesanddefs.html
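
The naming rule can be sketched in a few lines of Python. The function name `wfb_field_name` is hypothetical, and the code applies only the rule stated above (spaces to underscores, then the `wfb_` prefix). Some names in the table below, such as `wfb_GDP__purchasing_power_parity`, reflect extra character handling that this page does not describe, so treat the table as the authority.

```python
def wfb_field_name(factbook_name):
    """Apply the stated rule: spaces become underscores, then add the wfb_ prefix."""
    return "wfb_" + factbook_name.replace(" ", "_")


# Factbook field names whose converted form appears in the table below.
for name in ["Natural resources", "Economy - overview", "Median age"]:
    print(name, "->", wfb_field_name(name))
```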

Field Name JSON Type Haven OnDemand Index Type
wfb_Background String Index
wfb_Location String Index
wfb_Climate String Index
wfb_Terrain String Index
wfb_Natural_resources String Index
wfb_Natural_hazards String Index
wfb_Ethnic_groups String Index
wfb_Religions String Index
wfb_Dependency_status String Index
wfb_Constitution String Index
wfb_Legal_system String Index
wfb_Legislative_branch String Index
wfb_Judicial_branch String Index
wfb_Political_parties_and_leaders String Index
wfb_International_organization_participation String Index
wfb_Flag_description String Index
wfb_Economy_-_overview String Index
wfb_Industries-_overview String Index
wfb_Agriculture_-_products String Index
wfb_Exports_-_commodities String Index
wfb_Imports_-_commodities String Index
wfb_Illicit_drugs String Index
wfb_Area Number Numeric
wfb_Land_boundaries Number Numeric
wfb_Coastline Number Numeric
wfb_Irrigated_land Number Numeric
wfb_Total_renewable_water_resources Number Numeric
wfb_Population Number Numeric
wfb_Median_age Number Numeric
wfb_Population_growth_rate Number Numeric
wfb_Birth_rate Number Numeric
wfb_Death_rate Number Numeric
wfb_Net_migration_rate Number Numeric
wfb_Infant_mortality_rate Number Numeric
wfb_Total_fertility_rate Number Numeric
wfb_Health_expenditures Number Numeric
wfb_Physicians_density Number Numeric
wfb_Hospital_bed_density Number Numeric
wfb_Obesity_-_adult_prevalence_rate Number Numeric
wfb_Unemployment Number Numeric
wfb__youth_ages_15-24 Number Numeric
wfb_School_life_expectancy_(primary_to_tertiary_education) Number Numeric
wfb_GDP__purchasing_power_parity Number Numeric
wfb_GDP__official_exchange_rate Number Numeric
wfb_GDP_-_real_growth_rate Number Numeric
wfb_GDP_-_per_capita__PPP Number Numeric
wfb_Gross_national_saving Number Numeric
wfb_Labor_force Number Numeric
wfb_Inflation_rate__consumer_prices Number Numeric
wfb_Public_debt Number Numeric
wfb_Central_bank_discount_rate Number Numeric
wfb_Stock_of_domestic_credit Number Numeric
wfb_Exports Number Numeric
wfb_Imports Number Numeric
wfb_Electricity_-_production Number Numeric
wfb_Electricity_-_consumption Number Numeric
reference String Reference

Patents

The Patents data set provides a searchable index of US patents filed since January 2002.

Title Description Index Name
US Patents US Patents 2002-2014 patents

The following table describes the fields that are available for all the Patents content.

For Patents, all fields of Parametric type are default print fields (see footnote 1 in the Wikipedia section), if they are present in the referenced articles.

Field Name JSON Type Haven OnDemand Index Type Field Description
patent_pub_date String Date Publication date of the patent.
patent_pub_country_code Enum Parametric Country in which the patent was published. In this index, the field has only one value: us.
patent_inventor String Parametric, Index  
patent_app_number String Parametric Application number of the patent.
patent_app_date String Date Application date of the patent.
patent_assignee_code Enum Parametric  
patent_assignee Number Parametric Person or business granted ownership interest in a patent application by assignment from the inventor.
patent_ipc_class Enum Parametric International Patent Classification.
patent_domestic_class String Store Only Domestic patent classification, as issued by the U.S. Patent and Trademark Office (USPTO).
patent_national_class String Store Only Patent classification, as issued by the U.S. Patent and Trademark Office (USPTO), with cross-references.
patent_image String Store Only  
reference String Reference  

Arxiv

Arxiv is an index of the collection of over 900,000 scientific papers made available at http://arxiv.org.

Title Description Index Name
Arxiv Arxiv scientific papers arxiv

The following table describes the fields that are available for all the Arxiv content.

Field Name JSON Type Haven OnDemand Index Type
content String Index
arxiv_identifier String Parametric
reference String Reference

Transport

The Transport data set provides a searchable list of airports and train stations.

Title Description Index Name
Transport World airports and stations transport

The following table describes the fields that are available for all the Transport content.

For Transport, all fields of Parametric type are default print fields (see footnote 1 in the Wikipedia section), if they are present in the referenced articles.

Field Name JSON Type Haven OnDemand Index Type Field Description
title String Index  
place_location String Index Location of the item, generally a town or city.
place_region1 String Index First level region of the item (US state, UK country, French region, and so on).
place_region3 String Index Third level region of the item (for example, UK borough).
place_country_code Enum Parametric The two-letter ISO 3166-1 alpha-2 country code.
transport_code String Parametric The national code of this item.
transport_faa String Parametric The airport's Federal Aviation Administration code (for US airports), three or four alphanumeric characters.
transport_iata String Parametric The airport's International Air Transport Association code, three letters.
transport_icao String Parametric The airport's International Civil Aviation Organization code, four letters.
transport_line String Parametric Populated for metro lines, for example, Line C, Line 12, Blue Line, Milan Metro Line 2.
transport_nlc Number Numeric National Location Code. A four-digit number allocated to railway stations and ticket issuing points in the UK. Only present if place_country_code is 'gb'.
transport_system String Parametric The transport system of which this item is part (for example, Paris Metro).
transport_type Enum Parametric Type of public transport (rail, underground, and so on).
transport_zone String Parametric Local transport zone.
lat Number Numeric Latitude.
lon Number Numeric Longitude.
place_elevation Number Numeric Height above sea level in meters.
transport_volume Number Numeric Yearly passenger volume.
wikidata Number Numeric  
date_opening Timestamp Date  
wikipedia_eng String Store Only  
alias_eng String Store Only  
wikipedia_image String Store Only  
transport_platforms String Store Only The number of platforms.
place_postalcode String Store Only The postal code in local format (for example, ZIP code or postcode).
transport_operator String Store Only The operating company of this item.
reference String Reference  
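
The code formats described in the table above lend themselves to a simple sanity check on returned documents. The following Python sketch is illustrative only: it assumes each field value is a plain string or number in a decoded result dict, accepts letters in either case, and uses a hypothetical helper name, `invalid_codes`. None of these details are specified on this page.

```python
import re

# Patterns follow the code formats stated in the field descriptions above.
CODE_PATTERNS = {
    "transport_iata": re.compile(r"[A-Za-z]{3}"),      # three letters
    "transport_icao": re.compile(r"[A-Za-z]{4}"),      # four letters
    "transport_faa": re.compile(r"[A-Za-z0-9]{3,4}"),  # three or four alphanumerics
}


def invalid_codes(doc):
    """Return names of code fields in `doc` that do not match the described format."""
    bad = []
    for field, pattern in CODE_PATTERNS.items():
        value = doc.get(field)
        if value is not None and not pattern.fullmatch(str(value)):
            bad.append(field)
    # transport_nlc is described as present only when place_country_code is 'gb'.
    if "transport_nlc" in doc and doc.get("place_country_code") != "gb":
        bad.append("transport_nlc")
    return bad


# Hypothetical documents for illustration.
heathrow = {"transport_iata": "LHR", "transport_icao": "EGLL", "place_country_code": "gb"}
suspect = {"transport_iata": "LH", "transport_nlc": 1234, "place_country_code": "fr"}

print(invalid_codes(heathrow))
print(invalid_codes(suspect))
```

A check like this only flags values that contradict the documented formats; it does not confirm that a code is actually assigned to an airport or station.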
