Rebuild bad PDF structure


I have a badly structured PDF file that I need to read and rebuild. Running Adobe Acrobat’s optimizer on the 100 MB file expands it into a working set of more than 2 GB, and the run eventually fails. While Acrobat is processing, it reports that the nesting of the page geometry boxes is wrong and that fonts are not shared across pages; that is, each page has its own copy of Times New Roman. I did not construct this PDF with the PDFNet SDK, but I would like to use a simple script to read over the elements and fix these two problems.

How would I re-nest the bounding boxes and rebuild the font information so that it is shared across all pages? I’m using Python, by the way.

Spencer Rathbun

L & D Mail Masters, Inc.

110 Security Pkwy

New Albany, IN 47150

812.981.7161 X.171

Fax: 812.981.7169

Did you try using ‘pdftron.PDF.Optimizer’ on the file? For a sample use case, check out the Optimizer sample project (included as part of the PDFNet SDK).

If this doesn’t help, we would probably need to take a look at the file for a more specific recommendation.
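On the box-nesting half of the question, here is a rough sketch in plain Python of what "re-nesting" could mean. It assumes the rule from the PDF specification that the crop, bleed, trim, and art boxes should not extend beyond the media box, and that a box which does is effectively reduced to its intersection with the media box. Acrobat may be checking something stricter (for example, TrimBox inside BleedBox), which is not handled here. The function names and the dict-of-tuples layout are hypothetical and are not part of PDFNet; reading the boxes from each page and writing them back would still go through whatever PDF library you use.

```python
# Rectangles are (llx, lly, urx, ury) tuples in PDF user-space units.
# All names below are illustrative helpers, not PDFNet API calls.

def normalize(rect):
    """Return the rectangle with its corners ordered lower-left, upper-right."""
    x0, y0, x1, y1 = rect
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


def intersect(inner, outer):
    """Clip `inner` to `outer`; return None if the two do not overlap."""
    ix0, iy0, ix1, iy1 = normalize(inner)
    ox0, oy0, ox1, oy1 = normalize(outer)
    x0, y0 = max(ix0, ox0), max(iy0, oy0)
    x1, y1 = min(ix1, ox1), min(iy1, oy1)
    if x0 >= x1 or y0 >= y1:
        return None
    return (x0, y0, x1, y1)


def renest_boxes(boxes):
    """Clip CropBox, BleedBox, TrimBox and ArtBox to the MediaBox.

    `boxes` maps box names to rectangles; only "MediaBox" is required.
    A box that does not overlap the MediaBox at all is dropped, so the
    viewer falls back to that box's default.
    """
    media = normalize(boxes["MediaBox"])
    fixed = {"MediaBox": media}
    for name in ("CropBox", "BleedBox", "TrimBox", "ArtBox"):
        if name in boxes:
            clipped = intersect(boxes[name], media)
            if clipped is not None:
                fixed[name] = clipped
    return fixed


if __name__ == "__main__":
    page = {
        "MediaBox": (0, 0, 612, 792),
        "CropBox": (-10, -10, 622, 802),   # spills outside the media box
        "TrimBox": (18, 18, 594, 774),     # already nested correctly
    }
    print(renest_boxes(page))
```

Applied to every page, this would pull any oversized box back inside the media box while leaving correctly nested boxes untouched. It does nothing about the duplicated fonts, which is the part the Optimizer is meant to address.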