Supported Document and Container File Formats

Haven OnDemand uses HPE KeyView to provide a range of text extraction and format conversion APIs against a wide range of data formats.

The sections below describe the different functions that are supported for various file formats.

The tables in this section use the following symbols to describe format support:

Symbol Description
Y Format is supported.
N Format is not supported.
P Partial metadata is extracted from this format. Some non-standard fields are not extracted. (ExtractMetadata only)
T Only text is extracted from this format. Formatting information is not extracted. (Near-Native Viewing only)

Note: The file extensions in this section are for illustration only. The APIs detect the file type of a file, rather than using the file extension to derive the format.
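
As a rough illustration of what detecting a format from content rather than from the extension can look like, the sketch below matches the leading bytes of a file against a few widely documented signatures. The names `SIGNATURES` and `sniff_format` are hypothetical, and this is not the detection logic that the APIs use; it only shows the general idea.

```python
# Illustrative only: a handful of well-known file signatures ("magic bytes").
# This is NOT how the APIs detect formats; SIGNATURES and sniff_format are
# hypothetical names introduced for this sketch.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"PK\x03\x04", "ZIP"),
    (b"\x1f\x8b", "GZ"),
    (b"7z\xbc\xaf\x27\x1c", "7Z"),
    (b"Rar!\x1a\x07", "RAR"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def sniff_format(data: bytes) -> str:
    """Return a format label based on the first bytes of the content."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "UNKNOWN"


if __name__ == "__main__":
    # A PDF saved with a misleading extension is still recognized by content.
    print(sniff_format(b"%PDF-1.7\n"))
```

A leading signature alone is often not enough. For example, DOCX, XLSX, and PPTX files are ZIP packages, so they share the ZIP signature, and a real detector has to inspect the content further to tell them apart.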

Text Extraction and Conversion APIs

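The tables in this section describe the formats supported by the View Document and Text Extraction APIs.

The Text Extraction API also includes metadata extraction and subfile extraction for many file formats. This support is also listed in the tables below.

The formats are divided into the following groups: Word Processing Formats, Presentation Formats, Spreadsheet Formats, Mail Formats, Text and Markup Formats, Computer-Aided Design Formats, Other Formats, and Graphic Formats.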

Word Processing Formats

Format Common Extensions View Document Text Extraction Metadata Extraction Subfile Extraction
Adobe FrameMaker Interchange Format MIF Y Y N N
Apple iChat Log ICHAT Y Y N N
Apple iWork Pages GZ Y Y Y N
Applix Words AW Y Y N N
Corel WordPerfect Linux; Corel WordPerfect Windows WPS, WPD Y Y P N
Corel WordPerfect Macintosh WPM Y Y N N
DisplayWrite IP Y Y N N
Folio Flat File FFF Y Y Y N
Founder Chinese E-paper Basic CEB N Y N N
Fujitsu Oasys OA2 Y Y P N
Haansoft Hangul HWP N Y N N
HWP T Y Y Y
Health Level 7 HL7 Y Y Y N
IBM DCA/RFT (Revisable Form Text) DC Y Y N N
JustSystems Ichitaro JTD Y Y P N
Lotus AMI Pro; Lotus Word Pro SAM, LWP Y Y P N
Lotus AMI Professional Write Plus AMI Y Y N N
Lotus SmartMaster MWP Y Y N N
Microsoft Word PC; Microsoft Word Windows versions 1.0 and 2.0 DOC Y Y N N
Microsoft Word Macintosh 4 to 6, and 98 DOC Y Y Y N
Microsoft Word Windows [1]; Microsoft Word Windows XML; Microsoft Word Macintosh 2001, v.X, and 2004 DOC, DOT, DOCM, DOCX, DOTX, DOTM Y Y Y Y
Microsoft Works WPS Y Y N N
Microsoft Windows Write WRI Y Y N N
OASIS Open Document Format ODT, SXW, STW Y Y Y Y
Omni Outliner OO3, OPML, OOUTLINE Y Y N N
OpenOffice Writer; StarOffice Writer SXW, ODT T Y Y N
Open Publication Structure eBook EPUB Y Y Y N
Skype Log DBB Y Y N N
WordPad RTF Y Y P N
XML Paper Specification XPS T Y N N
XyWrite XY4 Y Y N N
Yahoo! Instant Messenger DAT Y Y N N

[1] Subfile extraction is not supported for versions earlier than Microsoft Word 97.
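
For anyone consuming these tables programmatically, the sketch below shows one possible way to encode a few rows and decode the Y/N/P/T symbols. The row values are copied from the table above; the names `SYMBOLS`, `COLUMNS`, `ROWS`, and `support_for` are hypothetical and are not part of any API.

```python
# A minimal sketch of a support-matrix lookup. All names here are
# hypothetical; only the row values come from the table above.
SYMBOLS = {
    "Y": "supported",
    "N": "not supported",
    "P": "partial metadata",
    "T": "text only",
}

COLUMNS = (
    "view_document",
    "text_extraction",
    "metadata_extraction",
    "subfile_extraction",
)

# One flag per column, in COLUMNS order.
ROWS = {
    "EPUB": "YYYN",  # Open Publication Structure eBook
    "XPS": "TYNN",   # XML Paper Specification
    "RTF": "YYPN",   # WordPad
    "CEB": "NYNN",   # Founder Chinese E-paper Basic
}


def support_for(extension: str) -> dict:
    """Decode the support flags for one extension into readable labels."""
    flags = ROWS[extension.upper()]
    return {column: SYMBOLS[flag] for column, flag in zip(COLUMNS, flags)}


if __name__ == "__main__":
    print(support_for("xps"))
```

Keying by extension is a simplification for the sketch. Several extensions in the tables map to more than one format (DOC and HWP, for example), and the APIs identify formats from file content, so a production lookup would need a key based on the detected format.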

Presentation Formats

Format Common Extensions View Document Text Extraction Metadata Extraction Subfile Extraction
Apple iWork Keynote GZ Y Y Y N
Applix Presents AG Y Y N N
Corel Presentations SHW Y Y N N
Extensible Forms Description Language XFD, XFDL Y Y Y N
Lotus Freelance Graphics PRZ Y Y N N
Lotus Freelance Graphics 2 PRE Y Y N N
Macromedia Flash SWF Y Y N N
Microsoft OneNote ONE, ONETOC2 Y Y N Y
Microsoft PowerPoint Macintosh PPT, PPS, POT Y Y P N
Microsoft PowerPoint PC; Microsoft PowerPoint Windows 95 PPT Y Y P N
Microsoft PowerPoint Windows 97 to 2003 PPT, PPS, POT Y Y P Y
Microsoft PowerPoint Windows XML 2007 to 2013 PPTX, PPTM, POTX, POTM, PPSX, PPSM Y Y Y Y
Microsoft Publisher PUB T Y Y Y
OASIS Open Document Format SXD, SXI, ODG, ODP Y Y Y Y
OpenOffice Impress; StarOffice Impress SXI, SXP, ODP T Y Y N

Spreadsheet Formats

Format Common Extensions View Document Text Extraction Metadata Extraction Subfile Extraction
Apple iWork Numbers GZ Y Y Y N
Applix Spreadsheets AS Y Y N N
Comma Separated Values CSV Y Y N N
Corel Quattro Pro WB2, WB3 Y Y P N
Corel Quattro Pro QPW N Y P N
Data Interchange Format DIF Y Y N N
Lotus 1-2-3 123 Y Y P N
Lotus 1-2-3 WK4 Y Y N N
Lotus 1-2-3 Charts 123 Y N N N
Microsoft Excel Charts XLS Y N N N
Microsoft Excel Windows 2.2 to 2003; Microsoft Excel Windows XML 2007 to 2013; Microsoft Excel Macintosh 98 to 2004 XLS, XLW, XLT, XLSX, XLTX, XLSM, XLTM Y Y Y Y
Microsoft Excel Binary Format XLSB Y Y N N
Microsoft Works Spreadsheet S30, S40 Y Y N N
OASIS Open Document Format ODS, SXC, STC Y Y Y Y
OpenOffice Calc; StarOffice Calc SXC, ODS, OTS T Y Y N

Mail Formats

Format Common Extensions View Document Text Extraction Metadata Extraction Subfile Extraction
Documentum EMCMF EMCMF Y N Y Y
Domino XML Language [1] DXL Y N Y Y
GroupWise FileSurf GWFS Y N Y Y
Legato Extender ONM Y N Y Y
Microsoft Outlook MSG, OFT T Y Y Y
Microsoft Outlook DBX DBX Y N Y Y
Microsoft Outlook Express EML T Y Y Y
Microsoft Outlook for Macintosh OLM Y N N Y
Text Mail (MIME) various T Y Y Y
various T Y Y Y
Transport Neutral Encapsulation Format various Y N Y Y

[1] Only supports non-encrypted embedded files.

Text and Markup Formats

Format Extension View Document Text Extraction Metadata Extraction Subfile Extraction
ANSI; ASCII; Unicode Text TXT Y Y N N
HTML HTM Y Y P N
Unicode HTML; XHTML HTM Y Y Y N
Microsoft Visio XML VDX, VTX T Y Y N
MIME HTML MHT Y Y Y N
Rich Text Format RTF Y Y P N
XML (generic); Microsoft Excel Windows XML; Microsoft Word Windows XML XML T Y Y N

Computer-Aided Design Formats

Format Extension View Document Text Extraction Metadata Extraction Subfile Extraction
AutoCAD Drawing DWG Y [1] Y Y N
AutoCAD Drawing Exchange DXF T Y Y N
CATIA formats CAT [2] N Y Y N
Microsoft Visio 4 to 2010 VSD Y Y Y Y [3]
Microsoft Visio 4 to 2010 VSS, VST Y Y Y N
Microsoft Visio 2013 VSDM, VSSM, VSTM, VSDX, VSSX, VSTX Y Y Y Y

[1] Graphic rendering is supported for versions R13, R14, R15, and R18 (2004). For other versions, only text extraction is supported.

[2] All CAT file extensions, for example CATDrawing, CATProduct, CATPart, and so on.

[3] Extraction of embedded OLE objects is supported for Text Extraction on Windows only.

Other Formats

Format Extension View Document Text Extraction Metadata Extraction Subfile Extraction
Adobe PDF PDF Y Y Y Y [1]
Microsoft Project MPP Y Y Y Y

[1] Includes support for extraction of subfiles from PDF Portfolio documents.

Graphic Formats

Format Extension View Document Text Extraction Metadata Extraction Subfile Extraction
Computer Graphics Metafile CGM Y Y N N
CorelDRAW [1] CDR Y N N N
DCX Fax System DCX Y N N N
Digital Imaging & Communications in Medicine (DICOM) DCM N N Y N
Encapsulated PostScript (raster) EPS Y N N N
Enhanced Metafile EMF Y Y Y N
GIF GIF Y N Y N
JBIG2 JBIG2 Y N N N
JPEG JPEG Y N Y N
Lotus AMIDraw Graphics SDW Y N N N
Lotus Pic PIC Y Y N N
Macintosh Raster PIC, PCT Y N N N
MacPaint PNTG Y N N N
Microsoft Office Drawing MSO Y N N N
Omni Graffle GRAFFLE N Y Y N
PC PaintBrush PCX Y N N N
Portable Network Graphics PNG Y N Y N
SGI RGB Image RGB Y N N N
Sun Raster Image RS Y N N N
Tagged Image File TIFF Y N Y N
Truevision Targa TGA Y N N N
Windows Animated Cursor ANI Y N N N
Windows Bitmap BMP Y N Y N
Windows Icon Cursor ICO Y N N N
Windows Metafile WMF Y Y N N
WordPerfect Graphics 1; WordPerfect Graphics 2 WPG Y N N N

[1] CDR/CDR with TIFF header.

Expand Container

This section describes the archive formats that the Expand Container API can extract files from.

Note: The Expand Container API can also extract subfiles (such as embedded documents) from file types listed in the previous sections with a Y in the Subfile Extraction column.

Format Extension Expand Container
7-Zip 7Z Y
GZIP GZ Y
PKZIP and WinZip ZIP Y
RAR archive RAR Y
Tape Archive TAR Y
Lotus Notes database NSF Y
Mailbox [1] MBX Y
Microsoft Entourage Database various Y
Microsoft Outlook Offline Storage File OST Y
Microsoft Outlook Personal Folder PST Y

[1] MBX files created by Eudora Email and Mozilla Thunderbird are supported. MBX files created by other common mail applications are typically filtered, converted, and displayed.
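
To make the idea of expanding a container concrete, the sketch below lists the files inside a ZIP archive using Python's standard `zipfile` module. It is a local analogue only and does not call the Expand Container API; the helper names `list_zip_members` and `build_sample_zip` and the sample file names are hypothetical.

```python
import io
import zipfile


def list_zip_members(data: bytes) -> list:
    """Return the file names stored in a ZIP archive, skipping directory entries."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [info.filename for info in archive.infolist() if not info.is_dir()]


def build_sample_zip() -> bytes:
    """Build a small in-memory ZIP so the example needs no files on disk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("report.txt", "hello")
        archive.writestr("nested/notes.txt", "world")
    return buffer.getvalue()


if __name__ == "__main__":
    for name in list_zip_members(build_sample_zip()):
        print(name)
```

The other archive types in the table would each need their own reader. Python's standard library covers TAR and GZIP (`tarfile`, `gzip`), while formats such as 7-Zip, RAR, PST, and NSF require third-party tooling.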