public class PdfTextExtractor extends Object
Constructor | Description |
---|---|
PdfTextExtractor(PdfReader reader) | Creates a new Text Extractor object, using a TextAssembler as the render listener. |
PdfTextExtractor(PdfReader reader, boolean usePdfMarkupElements) | Creates a new Text Extractor object, using a TextAssembler as the render listener. |
PdfTextExtractor(PdfReader reader, TextAssembler renderListener) | Creates a new Text Extractor object. |
Modifier and Type | Method | Description |
---|---|---|
String | getTextFromPage(int page) | Gets the text from a page. |
String | getTextFromPage(int page, boolean useContainerMarkup) | Gets the text from the page. |
void | processContent(byte[] contentBytes, PdfDictionary resources, PdfContentStreamHandler handler) | Processes PDF syntax. |
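Since getTextFromPage takes a 1-based page number and can throw IOException, a caller will usually loop from 1 up to the page count. The sketch below shows that loop against a hypothetical stand-in interface, PageTextSource, which is not part of this API and exists only so the example compiles on its own. With a real extractor one would presumably pass a method reference such as extractor::getTextFromPage and take the page count from the reader (for example PdfReader.getNumberOfPages(), which is not documented in this section).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class PageTextDemo {

    /** Hypothetical stand-in for the getTextFromPage(int) method; not part of the documented API. */
    @FunctionalInterface
    interface PageTextSource {
        String getTextFromPage(int page) throws IOException;
    }

    /** Collects the text of pages 1..pageCount; page numbers are 1-based. */
    static List<String> extractAll(PageTextSource source, int pageCount) throws IOException {
        List<String> pages = new ArrayList<>(pageCount);
        for (int page = 1; page <= pageCount; page++) {
            // Any IOException from a single page propagates to the caller.
            pages.add(source.getTextFromPage(page));
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        // Fake source used purely for demonstration.
        PageTextSource fake = page -> "text of page " + page;
        List<String> pages = extractAll(fake, 3);
        System.out.println(pages);
    }
}
```

Keeping the loop behind a small interface is a design choice for this sketch only: it lets the page-iteration logic be exercised without a PDF file on hand.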
public PdfTextExtractor(PdfReader reader)
Creates a new Text Extractor object, using a TextAssembler as the render listener.
Parameters:
reader - the reader with the PDF

public PdfTextExtractor(PdfReader reader, boolean usePdfMarkupElements)
Creates a new Text Extractor object, using a TextAssembler as the render listener.
Parameters:
reader - the reader with the PDF
usePdfMarkupElements - whether to use higher-level tags for PDF markup entities

public PdfTextExtractor(PdfReader reader, TextAssembler renderListener)
Creates a new Text Extractor object.
Parameters:
reader - the reader with the PDF
renderListener - the render listener that will be used to analyze renderText operations and provide the resultant text

public String getTextFromPage(int page) throws IOException
Gets the text from a page.
Parameters:
page - the 1-based page number
Throws:
IOException - on error

public String getTextFromPage(int page, boolean useContainerMarkup) throws IOException
Gets the text from the page.
Parameters:
page - the page number we are interested in
useContainerMarkup - whether to insert tags for PDF markup container elements (not really HTML at the moment)
Throws:
IOException - on error

public void processContent(byte[] contentBytes, PdfDictionary resources, PdfContentStreamHandler handler)
Processes PDF syntax.
Parameters:
contentBytes - the bytes of a content stream
resources - the resources that come with the content stream
handler - interprets events caused by recognition of operations in a content stream

Copyright © 2018. All rights reserved.