I need to get the underlying text elements that correspond to the TextExtractor output. I’ve already matched elements to lines and words using their bounding boxes. In some cases the elements overlap, e.g. a textual watermark in the background with normal text in the foreground. If I rely on bounding boxes alone, a single word can match more than one element.
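
To illustrate the collision, here is a rough sketch using plain tuples rather than real PDFNet objects. The `Rect` type, the `overlap_ratio` and `candidates` helpers, the 0.5 threshold, and all coordinates are made up for the example; this is not my actual matching code, just the general idea.

```python
from typing import Dict, List, NamedTuple


class Rect(NamedTuple):
    """Axis-aligned box in page coordinates (hypothetical stand-in type)."""
    x1: float
    y1: float
    x2: float
    y2: float


def area(r: Rect) -> float:
    # Clamp to zero so an empty intersection yields no area.
    return max(0.0, r.x2 - r.x1) * max(0.0, r.y2 - r.y1)


def overlap_ratio(word: Rect, element: Rect) -> float:
    """Fraction of the word's area that is covered by the element's box."""
    inter = Rect(
        max(word.x1, element.x1),
        max(word.y1, element.y1),
        min(word.x2, element.x2),
        min(word.y2, element.y2),
    )
    word_area = area(word)
    return area(inter) / word_area if word_area > 0 else 0.0


def candidates(word: Rect, elements: Dict[str, Rect], threshold: float = 0.5) -> List[str]:
    """Names of all elements whose box covers enough of the word's box."""
    return [name for name, box in elements.items()
            if overlap_ratio(word, box) >= threshold]


# A word from the extractor, plus two elements found at the same place:
# the real text run and a large watermark that spans most of the page.
word = Rect(100, 500, 160, 512)
elements = {
    "foreground_text": Rect(100, 500, 160, 512),
    "watermark_text": Rect(50, 300, 550, 700),
}

print(candidates(word, elements))
```

Both elements fully cover the word's box, so geometry alone cannot say which one the word came from.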

Is there any extra information I can use to tell which elements belong to the foreground or the background text (or are associated with a form XObject)? The TextExtractor class is able to pull out the text cleanly.

