public class PdfTextExtractor extends Object
| Constructor and Description |
|---|
| PdfTextExtractor(PdfReader reader): Creates a new Text Extractor object, using a TextAssembler as the render listener. |
| PdfTextExtractor(PdfReader reader, boolean usePdfMarkupElements): Creates a new Text Extractor object, using a TextAssembler as the render listener. |
| PdfTextExtractor(PdfReader reader, TextAssembler renderListener): Creates a new Text Extractor object. |

| Modifier and Type | Method and Description |
|---|---|
| String | getTextFromPage(int page): Gets the text from a page. |
| String | getTextFromPage(int page, boolean useContainerMarkup): Gets the text from a page. |
| void | processContent(byte[] contentBytes, PdfDictionary resources, PdfContentStreamHandler handler): Processes PDF syntax. |

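The page-oriented call listed above can be sketched as a simple loop over 1-based page numbers. This is a minimal illustration, not the library's own code: `PageTextSource` and `extractAll` are hypothetical names introduced here so the sketch runs without the PDF library or a PDF file. `PageTextSource` only mirrors the documented shape of `getTextFromPage(int page) throws IOException`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PageTextDemo {

    /**
     * Hypothetical stand-in (not part of the library) with the same shape as
     * the documented getTextFromPage(int page) method.
     */
    @FunctionalInterface
    public interface PageTextSource {
        String getTextFromPage(int page) throws IOException;
    }

    /** Collects the text of pages 1..pageCount; page numbers are 1-based. */
    public static List<String> extractAll(PageTextSource source, int pageCount)
            throws IOException {
        if (pageCount < 0) {
            throw new IllegalArgumentException("pageCount must not be negative");
        }
        List<String> pages = new ArrayList<>(pageCount);
        for (int page = 1; page <= pageCount; page++) {
            pages.add(source.getTextFromPage(page));
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        // Fake source so the sketch runs on its own. With the real classes one
        // would presumably pass extractor::getTextFromPage and the reader's
        // page count instead (assumption, not shown in this page).
        PageTextSource fake = page -> "text of page " + page;
        for (String text : extractAll(fake, 3)) {
            System.out.println(text);
        }
    }
}
```

Because `getTextFromPage` declares `IOException`, the loop lets it propagate so that the caller decides whether a failed page aborts the whole extraction.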
public PdfTextExtractor(PdfReader reader)

Creates a new Text Extractor object, using a TextAssembler as the render listener.

Parameters:
reader - the reader with the PDF

public PdfTextExtractor(PdfReader reader, boolean usePdfMarkupElements)

Creates a new Text Extractor object, using a TextAssembler as the render listener.

Parameters:
reader - the reader with the PDF
usePdfMarkupElements - whether to use higher-level tags for PDF markup entities

public PdfTextExtractor(PdfReader reader, TextAssembler renderListener)

Creates a new Text Extractor object.

Parameters:
reader - the reader with the PDF
renderListener - the render listener that will be used to analyze renderText operations and provide the resultant text

@Nonnull public String getTextFromPage(int page) throws IOException

Gets the text from a page.

Parameters:
page - the 1-based page number
Throws:
IOException - on error

@Nonnull public String getTextFromPage(int page, boolean useContainerMarkup) throws IOException

Gets the text from a page.

Parameters:
page - the number of the page we are interested in
useContainerMarkup - whether to insert tags for PDF markup container elements (not really HTML at the moment)
Throws:
IOException - on error

public void processContent(byte[] contentBytes, PdfDictionary resources, PdfContentStreamHandler handler)

Processes PDF syntax.

Parameters:
contentBytes - the bytes of a content stream
resources - the resources that come with the content stream
handler - interprets events caused by recognition of operations in a content stream

Copyright © 2019. All rights reserved.