public class SXWPFWordExtractorDecorator extends AbstractOOXMLExtractor
This extractor will be better than the classic docx extractor for some use cases and worse for others.
Fields inherited from class AbstractOOXMLExtractor: config, EMBEDDED_RELATIONSHIPS, extractor

| Constructor and Description |
|---|
| SXWPFWordExtractorDecorator(org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context, XWPFEventBasedWordExtractor extractor) |

| Modifier and Type | Method and Description |
|---|---|
| protected void | buildXHTML(org.apache.tika.sax.XHTMLContentHandler xhtml): Populates the XHTMLContentHandler object received as parameter. |
| protected List<org.apache.poi.openxml4j.opc.PackagePart> | getMainDocumentParts(): This returns all items that might contain embedded objects: main document, headers, footers, comments, etc. |
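
To make the scope of getMainDocumentParts() more concrete, here is a minimal standalone sketch of selecting the parts of a .docx package that might contain embedded objects. It is not the Tika or POI implementation. The class name DocxPartNameSketch, its methods, and the name pattern are hypothetical. The pattern assumes the conventional part names that Word writes (word/document.xml, word/header1.xml, and so on), and treating footnotes and endnotes as part of the "etc." above is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class DocxPartNameSketch {

    // Hypothetical pattern: conventional names of parts that may carry embedded objects.
    // A real package resolves these parts through relationships, not file names.
    private static final Pattern CANDIDATE = Pattern.compile(
            "word/(document|header\\d*|footer\\d*|comments|footnotes|endnotes)\\.xml");

    // Returns true if the part name matches one of the candidate part kinds.
    static boolean mightContainEmbeddedObjects(String partName) {
        return CANDIDATE.matcher(partName).matches();
    }

    // Keeps only the candidate parts, preserving the input order.
    static List<String> selectCandidateParts(List<String> partNames) {
        List<String> selected = new ArrayList<>();
        for (String name : partNames) {
            if (mightContainEmbeddedObjects(name)) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "word/document.xml",
                "word/styles.xml",
                "word/header1.xml",
                "word/footer1.xml",
                "word/comments.xml",
                "word/media/image1.png");
        // Prints the candidate parts, one per line.
        for (String name : selectCandidateParts(names)) {
            System.out.println(name);
        }
    }
}
```

Matching on names keeps the sketch free of dependencies. An extractor working on a real package would follow the package relationships instead, since part names are only a convention.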

Methods inherited from class AbstractOOXMLExtractor: getDocument, getEmbeddedPartMetadataMap, getJustFileName, getMetadataExtractor, getXHTML, handleEmbeddedFile, loadLinkedRelationships

public SXWPFWordExtractorDecorator(org.apache.tika.metadata.Metadata metadata,
org.apache.tika.parser.ParseContext context,
XWPFEventBasedWordExtractor extractor)

protected void buildXHTML(org.apache.tika.sax.XHTMLContentHandler xhtml)
throws SAXException,
org.apache.xmlbeans.XmlException,
IOException
Description copied from class: AbstractOOXMLExtractor
Populates the XHTMLContentHandler object received as parameter.
buildXHTML in class AbstractOOXMLExtractor
Throws: SAXException, org.apache.xmlbeans.XmlException, IOException

protected List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts()
getMainDocumentParts in class AbstractOOXMLExtractor

Copyright © 2007–2024 The Apache Software Foundation. All rights reserved.