Package org.apache.poi.xwpf.extractor
Class XWPFWordExtractor
java.lang.Object
    org.apache.poi.xwpf.extractor.XWPFWordExtractor
- All Implemented Interfaces:
Closeable, AutoCloseable, POITextExtractor, POIXMLTextExtractor
public class XWPFWordExtractor extends Object implements POIXMLTextExtractor
Helper class to extract text from an OOXML Word file
-
-
Field Summary
Fields
Modifier and Type  Field
static List<XWPFRelation>  SUPPORTED_TYPES
-
Constructor Summary
Constructors
XWPFWordExtractor(OPCPackage container)
XWPFWordExtractor(XWPFDocument document)
-
Method Summary
Modifier and Type  Method  Description
void  appendBodyElementText(StringBuilder text, IBodyElement e)
void  appendParagraphText(StringBuilder text, XWPFParagraph paragraph)
XWPFDocument  getDocument()  Returns opened document
XWPFDocument  getFilesystem()
String  getText()
boolean  isCloseFilesystem()
void  setCloseFilesystem(boolean doCloseFilesystem)
void  setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)  Should we concatenate phonetic runs in extraction.
void  setFetchHyperlinks(boolean fetch)  Should we also fetch the hyperlinks, when fetching the text content? Default is to only output the hyperlink label, and not the contents
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.poi.ooxml.extractor.POIXMLTextExtractor
checkMaxTextSize, close, getCoreProperties, getCustomProperties, getExtendedProperties, getMetadataTextExtractor, getPackage
-
-
-
-
Field Detail
-
SUPPORTED_TYPES
public static final List<XWPFRelation> SUPPORTED_TYPES
-
-
Constructor Detail
-
XWPFWordExtractor
public XWPFWordExtractor(OPCPackage container) throws IOException
- Throws:
IOException
-
XWPFWordExtractor
public XWPFWordExtractor(XWPFDocument document)
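The two constructors can be exercised as in the following minimal sketch, which assumes Apache POI 5.x (poi-ooxml) is on the classpath. The class name ExtractorConstructorsSketch, its helper methods, the temporary file name, and the sample text are hypothetical and exist only for illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class ExtractorConstructorsSketch {

    // Hypothetical helper: builds a one-paragraph document in memory.
    static XWPFDocument sampleDocument() {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("Hello from POI");
        return doc;
    }

    // Uses XWPFWordExtractor(XWPFDocument).
    static String extractFromDocument() throws IOException {
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(sampleDocument())) {
            return extractor.getText();
        }
    }

    // Uses XWPFWordExtractor(OPCPackage) on a temporary .docx file.
    static String extractFromPackage() throws IOException, InvalidFormatException {
        File file = File.createTempFile("extractor-sketch", ".docx");
        try {
            // Write the sample document to disk so it can be reopened as a package.
            try (XWPFDocument doc = sampleDocument();
                 OutputStream out = new FileOutputStream(file)) {
                doc.write(out);
            }
            // Open read-only; text extraction does not need write access.
            OPCPackage pkg = OPCPackage.open(file, PackageAccess.READ);
            try (XWPFWordExtractor extractor = new XWPFWordExtractor(pkg)) {
                return extractor.getText();
            }
        } finally {
            file.delete();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(extractFromDocument());
        System.out.print(extractFromPackage());
    }
}
```

Both paths are expected to yield text containing the sample sentence. The OPCPackage variant is the one that declares IOException, so callers opening files need to handle it.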
-
-
Method Detail
-
setFetchHyperlinks
public void setFetchHyperlinks(boolean fetch)
Sets whether hyperlinks are also fetched when extracting the text content. By default, only the hyperlink label is output, not the contents.
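As a minimal sketch of opting in, assuming Apache POI 5.x on the classpath: the flag is set before getText() is called. The class FetchHyperlinksSketch and the method extractWithLinks are hypothetical names. How the link is rendered in the output is not specified on this page, so the sketch makes no claim about it.

```java
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class FetchHyperlinksSketch {

    // Hypothetical helper: extracts text with hyperlink fetching enabled.
    // Closing the extractor is assumed to also release the document's package.
    static String extractWithLinks(XWPFDocument document) throws IOException {
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(document)) {
            // Opt in before extracting; the default outputs labels only.
            extractor.setFetchHyperlinks(true);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("See the project site");
        System.out.print(extractWithLinks(doc));
    }
}
```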
-
setConcatenatePhoneticRuns
public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
Should we concatenate phonetic runs in extraction. Default is true
- Parameters:
concatenatePhoneticRuns - If phonetic runs should be concatenated
-
getText
public String getText()
- Specified by:
getText in interface POITextExtractor
-
appendBodyElementText
public void appendBodyElementText(StringBuilder text, IBodyElement e)
-
appendParagraphText
public void appendParagraphText(StringBuilder text, XWPFParagraph paragraph)
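Because appendParagraphText writes into a caller-supplied StringBuilder, it can be used to extract only selected paragraphs instead of the whole document. The following is a minimal sketch assuming Apache POI 5.x. The class ParagraphTextSketch, the method firstParagraphs, and the newline separator are choices of this example, not part of the API.

```java
import java.io.IOException;
import java.util.List;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class ParagraphTextSketch {

    // Hypothetical helper: collects the text of at most 'limit' leading paragraphs.
    static String firstParagraphs(XWPFDocument document, int limit) throws IOException {
        StringBuilder text = new StringBuilder();
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(document)) {
            List<XWPFParagraph> paragraphs = document.getParagraphs();
            int count = Math.min(limit, paragraphs.size());
            for (int i = 0; i < count; i++) {
                extractor.appendParagraphText(text, paragraphs.get(i));
                text.append('\n'); // separator added by this sketch
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("first");
        doc.createParagraph().createRun().setText("second");
        doc.createParagraph().createRun().setText("third");
        System.out.print(firstParagraphs(doc, 2));
    }
}
```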
-
getDocument
public XWPFDocument getDocument()
Description copied from interface: POIXMLTextExtractor
Returns opened document
- Specified by:
getDocument in interface POITextExtractor
- Specified by:
getDocument in interface POIXMLTextExtractor
- Returns:
the opened document
-
setCloseFilesystem
public void setCloseFilesystem(boolean doCloseFilesystem)
- Specified by:
setCloseFilesystem in interface POITextExtractor
-
isCloseFilesystem
public boolean isCloseFilesystem()
- Specified by:
isCloseFilesystem in interface POITextExtractor
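This page does not describe the close-filesystem flag, so the following sketch relies on an assumption taken from the POITextExtractor interface: the flag controls whether closing the extractor also closes the backing document. Under that assumption, and assuming Apache POI 5.x, a caller that wants to keep using the XWPFDocument after extraction could disable it. The class KeepDocumentOpenSketch and the method extractAndKeepOpen are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class KeepDocumentOpenSketch {

    // Hypothetical helper: extracts text without closing the caller's document.
    static String extractAndKeepOpen(XWPFDocument document) throws IOException {
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(document)) {
            // Assumption: with the flag off, close() leaves the document usable.
            extractor.setCloseFilesystem(false);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("Still open afterwards");
            System.out.print(extractAndKeepOpen(doc));

            // The caller still owns the document and can keep editing or saving it.
            doc.createParagraph().createRun().setText("Added after extraction");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            doc.write(out);
        }
    }
}
```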
-
getFilesystem
public XWPFDocument getFilesystem()
- Specified by:
getFilesystem in interface POITextExtractor
-
-