Element BBox for text in Form Object does not match TextExtraction Word BBox

david3 · August 29, 2024, 7:06am

Product: PDF SDK

Product Version: Python 3.10

Please give a brief summary of your issue:
The value returned by Element.GetBBox() doesn’t match the Word.GetBBox() value from the TextExtraction output.

Please describe your issue and provide steps to reproduce it:
I am trying to match the text extraction output to the text elements. It works well for text that is not within a form object; however, matching by BBox intersection doesn’t work for elements that are within a form object.

I’m reading from the form as follows:
reader.Begin(obj, page.GetResourceDict())
# iterate through elements, grab text elements, look at element.GetBBox()
reader.End()

The bounding boxes aren’t far off from one another, so I wonder if there’s a missing transform somewhere?
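
One way to test the missing-transform idea is sketched below. It assumes the offset comes from the form XObject's optional /Matrix entry (six numbers [a b c d e f], identity by default per the PDF specification), and that the element boxes read inside the form are still in form space; whether Element.GetBBox() already accounts for this is not confirmed here. The helpers transform_rect and rects_intersect are hypothetical names, not SDK calls, and rectangles are plain (x1, y1, x2, y2) tuples.

```python
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def transform_rect(rect: Rect, m: Sequence[float]) -> Rect:
    """Map a rectangle through a PDF matrix [a b c d e f].

    Uses the PDF convention x' = a*x + c*y + e, y' = b*x + d*y + f,
    then returns the axis-aligned box around the four mapped corners.
    """
    a, b, c, d, e, f = m
    x1, y1, x2, y2 = rect
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    xs = [a * x + c * y + e for x, y in corners]
    ys = [b * x + d * y + f for x, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))


def rects_intersect(r1: Rect, r2: Rect) -> bool:
    """True when two normalized rectangles overlap with non-zero area."""
    return r1[0] < r2[2] and r2[0] < r1[2] and r1[1] < r2[3] and r2[1] < r1[3]


# Assumed example values: an element box in form space and a form /Matrix
# that translates by (3, 4).
element_box = (0.0, 0.0, 10.0, 5.0)
form_matrix = (1.0, 0.0, 0.0, 1.0, 3.0, 4.0)
page_box = transform_rect(element_box, form_matrix)
```

If the mapped box then intersects the Word box from text extraction where the unmapped one did not, the form matrix is the likely cause; if not, the difference lies elsewhere.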

Ryan · August 29, 2024, 9:02pm

To investigate further, could you please provide the following information?

1. Input file(s)
2. Generated output
3. Code and settings used to generate (2) from (1)
4. Screenshots showing the output, clearly indicating what you expected to get instead and which application/browser was used to view it