How to extract cropped text

Anatoly_Kudrevatukh · October 22, 2013, 5:35pm

Q: The text can be cropped by the visible page dimensions, but we need to retrieve all of the text, even if it lies outside the page bounds. Is there any way to do this using the TextExtractor?

A: You can do it by adjusting the crop box on the page (page.SetCropBox(big_rect)) prior to text extraction.

If you want to know the exact crop box that includes all elements, you can use ElementReader to get a union of all the bounding boxes on the page.

Alternatively, according to the PDF spec the maximum page dimensions should be 14,400 by 14,400 units, so you can use that as the crop box value.
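
To illustrate the two ways of choosing big_rect, here is a minimal sketch in plain Python. It does not call the SDK at all: boxes are plain (x1, y1, x2, y2) tuples standing in for the bounding boxes you would collect from the page elements, and the helper names (normalize, union_bbox, crop_box_for_extraction) are made up for this example. Anchoring the fallback box at the origin is also an assumption; shift it if your content sits at negative coordinates.

```python
# Sketch only: plain (x1, y1, x2, y2) tuples stand in for SDK rectangle objects.
# All function names here are hypothetical helpers, not part of any library.

MAX_PAGE_UNITS = 14400  # maximum page dimension mentioned in the answer


def normalize(box):
    """Reorder corners so that x1 <= x2 and y1 <= y2."""
    x1, y1, x2, y2 = box
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def union_bbox(boxes):
    """Return the smallest rectangle enclosing every box, or None if there are none."""
    normalized = [normalize(b) for b in boxes]
    if not normalized:
        return None
    return (
        min(b[0] for b in normalized),
        min(b[1] for b in normalized),
        max(b[2] for b in normalized),
        max(b[3] for b in normalized),
    )


def crop_box_for_extraction(boxes):
    """Use the union of the element boxes; fall back to the maximum page size."""
    union = union_bbox(boxes)
    if union is None:
        # Assumption: the fallback box starts at the origin.
        return (0, 0, MAX_PAGE_UNITS, MAX_PAGE_UNITS)
    return union


# Example: the second box starts left of the page, the third has swapped y values.
boxes = [(10, 20, 200, 40), (-50, 700, 120, 720), (580, 900, 900, 880)]
print(crop_box_for_extraction(boxes))  # (-50, 20, 900, 900)
```

The resulting tuple would then be turned into a rectangle and passed to page.SetCropBox before running the text extraction.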