Product: PDF SDK
Product Version: Python 3
Please give a brief summary of your issue:
I need to get the underlying text elements that correspond to the TextExtractor output. I’ve already matched elements to lines and words using their bounding boxes. In some cases the elements overlap, e.g. a textual watermark in the background with normal text in the foreground, so matching on bounding boxes alone can produce collisions.
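To illustrate the collision, here is a minimal sketch of the kind of bounding-box matching I mean. It is plain Python and does not call the SDK: `Rect`, `overlap_ratio`, `match_elements`, the element names, and the coordinates are all hypothetical stand-ins for the boxes I read from the extractor output and from the page elements.

```python
from typing import Dict, List, NamedTuple


class Rect(NamedTuple):
    """Axis-aligned box in page coordinates (hypothetical stand-in type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def overlap_ratio(inner: Rect, outer: Rect) -> float:
    """Return the fraction of `inner`'s area that lies inside `outer`."""
    w = min(inner.x2, outer.x2) - max(inner.x1, outer.x1)
    h = min(inner.y2, outer.y2) - max(inner.y1, outer.y1)
    if w <= 0 or h <= 0:
        return 0.0  # the boxes do not intersect
    area = (inner.x2 - inner.x1) * (inner.y2 - inner.y1)
    return (w * h) / area if area > 0 else 0.0


def match_elements(word_box: Rect, element_boxes: Dict[str, Rect],
                   threshold: float = 0.5) -> List[str]:
    """Names of all elements whose box covers at least `threshold` of the word."""
    return [name for name, box in element_boxes.items()
            if overlap_ratio(word_box, box) >= threshold]


# Made-up coordinates: a word from the extracted text, one foreground
# text element, and a large watermark element behind it.
word = Rect(100, 500, 160, 512)
elements = {
    "foreground_text": Rect(100, 500, 300, 512),
    "watermark_text": Rect(50, 300, 550, 700),
}

matches = match_elements(word, elements)
print(matches)  # both elements match, so geometry alone is ambiguous
```

The word lies entirely inside both boxes, so no overlap threshold can separate the two candidates.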
Please describe your issue and provide steps to reproduce it:
Is there any extra information I can use to tell which elements correspond to the foreground or background text (or are associated with a form XObject)? The TextExtractor class is able to pull out the text cleanly.
Please provide a link to a minimal sample where the issue is reproducible: