Can PDFNet also detect whether or not a character is a superscript or subscript?
Looking at the PDF reference guide, it seems that
superscripts are usually produced by setting the "Text Rise" parameter.
Would this be the only way to detect a superscript or subscript?
I have found the function GState::GetTextRise(), but it always returns
0, even for the elements in the attached file that are clearly
superscripts. Am I doing something wrong, or is this the wrong way
to determine whether text is a subscript or superscript?
It is sometimes possible to recognize superscript or subscript text
from the text-rise parameter in the graphics state (i.e.
GState::GetTextRise()). However, this is not very reliable, because many
PDF documents draw superscript/subscript text by setting an explicit
text matrix instead of using the text-rise parameter.
Using the CosEdit utility (www.pdftron.com/cosedit), you can peek into the
content stream of the given PDF:
/TT0 1 Tf
-0.0009 Tc 0.0009 Tw 0 Ts 100 Tz 0 Tr 12 0 0 12 90 703.92 Tm
/TT0 1 Tf
0.0038 Tc 0 Tw 7.98 0 0 7.98 187.62 714.42 Tm
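As you can see, the superscript '1' is represented using a separate text
run with its own text matrix (Tm): its effective font size is 7.98 instead
of 12 and its baseline is raised from 703.92 to 714.42, while the text
rise (Ts) stays 0.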
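To see why GetTextRise() reports 0 here, it helps to compute where each run actually lands. The PDF specification combines the font size (Tfs), horizontal scaling (Th), text rise (Trise) and the text matrix into the text rendering matrix Trm = [Tfs*Th 0 0 Tfs 0 Trise] x Tm x CTM. The following is only a sketch: it assumes the CTM is the identity and ignores horizontal scaling (which affects x only), and the names TextMatrix, RunGeometry and ComputeGeometry are hypothetical, not part of the PDFNet API.

```cpp
#include <cassert>
#include <cmath>

// Operands of the Tm operator: [a b c d e f].
struct TextMatrix {
    double a, b, c, d, e, f;
};

// Vertical size and baseline of a text run in user space.
struct RunGeometry {
    double height;      // effective glyph height (scaled font size)
    double baseline_y;  // y coordinate of the text origin
};

// Derives the run geometry from the second and third rows of
// Trm = [Tfs*Th 0 0 Tfs 0 Trise] x Tm, assuming an identity CTM.
//   row 2: [0 Tfs] x Tm   -> (Tfs*c, Tfs*d), whose length is the glyph height
//   row 3: [0 Trise 1] x Tm -> origin y = Trise*d + f
RunGeometry ComputeGeometry(double font_size, double text_rise,
                            const TextMatrix& tm) {
    RunGeometry g;
    g.height = std::fabs(font_size) * std::hypot(tm.c, tm.d);
    // Text rise is expressed in unscaled text space, so Tm scales it.
    g.baseline_y = text_rise * tm.d + tm.f;
    return g;
}
```

Fed the two runs from the stream above (Tf size 1, Ts 0), this gives a height of 12 and a baseline of 703.92 for the body text, and 7.98 and 714.42 for the '1'. The superscript ends up smaller and higher purely because of Tm, so the text rise never comes into play.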
As a result, comparing the bounding box of the text run
(element.GetBBox()) with the bounding boxes of neighboring elements
may be a better way to identify text as superscript or subscript.
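A minimal C++ sketch of that bounding-box comparison follows. Box is a stand-in for the rectangle reported by element.GetBBox() (assume you copy its coordinates into it), and neighbor is assumed to be the adjacent run on the same line. Classify and the 0.85/0.15 thresholds are hypothetical, illustrative choices, not PDFNet defaults, and would need tuning on real documents.

```cpp
#include <cassert>

// Axis-aligned box in PDF user space (y grows upward). Hypothetical
// stand-in for the rectangle filled in by element.GetBBox().
struct Box {
    double x1, y1, x2, y2;  // assumes x1 < x2 and y1 < y2
};

enum class Position { Normal, Superscript, Subscript };

// Classifies `run` relative to a neighboring run on the same line.
Position Classify(const Box& run, const Box& neighbor,
                  double max_height_ratio = 0.85,
                  double min_shift_ratio = 0.15) {
    assert(max_height_ratio > 0 && min_shift_ratio >= 0);
    double h_run = run.y2 - run.y1;
    double h_nb = neighbor.y2 - neighbor.y1;
    if (h_run <= 0 || h_nb <= 0) return Position::Normal;

    // Superscripts and subscripts are normally set in a smaller size.
    if (h_run / h_nb > max_height_ratio) return Position::Normal;

    // Vertical offset between box centers, relative to the neighbor's height.
    double c_run = (run.y1 + run.y2) / 2;
    double c_nb = (neighbor.y1 + neighbor.y2) / 2;
    double shift = (c_run - c_nb) / h_nb;

    if (shift > min_shift_ratio) return Position::Superscript;
    if (shift < -min_shift_ratio) return Position::Subscript;
    return Position::Normal;
}
```

Normalizing by the neighbor's height keeps the thresholds independent of the body font size. Checking the size ratio first means a run that is merely positioned differently but set in the same size (for example, text on an adjacent line) is not flagged.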