I have been playing around with PDFBox and its PDFTextStripperByArea class.
I was able to detect whether text is bold or italic, but I'm unable to get any underline information.
As far as I understand, underlining in a PDF is done by drawing lines. So in theory I should be able to get some information about the lines drawn near the text. Given this information, I could then work out whether the text is underlined or sits in a table cell.
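To make the idea concrete, here is a minimal sketch of the geometric check I have in mind. It assumes I could already obtain the drawn lines as rectangles in the same coordinate space as the text (which is exactly the part I am missing), with y growing downward. The class UnderlineHeuristic, its method names, and its thresholds are made up for illustration and are not part of PDFBox.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;

/** Hypothetical helper, not part of PDFBox: decides whether a drawn line looks like an underline. */
final class UnderlineHeuristic {
    // Assumed tolerances, expressed relative to the text height; they would need tuning.
    static final double MAX_GAP_FACTOR = 0.3;        // how far the line may sit from the bottom of the text
    static final double MAX_THICKNESS_FACTOR = 0.15; // how thick the line may be
    static final double MIN_COVERAGE = 0.8;          // fraction of the text width the line must span

    /** Returns true if any of the candidate lines looks like an underline of the text box. */
    static boolean isUnderlined(Rectangle2D text, List<Rectangle2D> lines) {
        for (Rectangle2D line : lines) {
            if (looksLikeUnderline(text, line)) {
                return true;
            }
        }
        return false;
    }

    static boolean looksLikeUnderline(Rectangle2D text, Rectangle2D line) {
        double textHeight = text.getHeight();

        // The line must be thin and wider than it is tall (this rules out vertical table borders).
        boolean thinAndHorizontal = line.getHeight() <= textHeight * MAX_THICKNESS_FACTOR
                && line.getWidth() > line.getHeight();

        // Distance from the bottom edge of the text to the top edge of the line (y grows downward).
        double gap = line.getMinY() - text.getMaxY();
        boolean nearBottom = Math.abs(gap) <= textHeight * MAX_GAP_FACTOR;

        // Horizontal overlap between the text box and the line.
        double overlap = Math.min(text.getMaxX(), line.getMaxX())
                - Math.max(text.getMinX(), line.getMinX());
        boolean coversText = overlap >= text.getWidth() * MIN_COVERAGE;

        return thinAndHorizontal && nearBottom && coversText;
    }
}
```

Calling UnderlineHeuristic.isUnderlined(textBounds, lineRects) for each word or run of text would flag likely underlines. A horizontal table rule directly under a cell's text would pass the same test, so telling the two cases apart would probably need an extra check, for example whether the line extends well beyond the text on both sides.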
Here is my code so far:
    List<TextPosition> textPos = charactersByArticle.get(index);
    for (TextPosition t : textPos) {
        // Not every font carries a descriptor, so guard against null.
        PDFontDescriptor descriptor = t.getFont().getFontDescriptor();
        if (descriptor != null) {
            if (descriptor.getFontWeight() > BOLD_WEIGHT) {
                isBold = true;
            }
            if (descriptor.isItalic()) {
                isItalic = true;
            }
        }
    }
I have tried inspecting the PDGraphicsState object that is processed in the processEncodedText method of the PDFStreamEngine class, but I found no information about lines there.
Any suggestions as to where this information could be retrieved from?
Question from: https://stackoverflow.com/questions/13948853/pdf-find-out-if-text-is-underlined-or-a-table-cell