Detecting broken/corrupt PDF documents


We have a customer who is receiving a number of bad PDF files. When these files are opened in Acrobat, they display fine, but on closing the file you are prompted to save it, which means that Acrobat repaired an error in the file. These PDF files process OK, but the resulting output file has some content missing. When I process the same file again with the PDFTron Optimizer enabled with default settings, the resulting output PDF looks good.

Generally, customers would like us to error out on bad files rather than receive partially incorrect PDF files. Is there any way to accomplish this? Is there some way to detect cases like this? Or is this just a case of garbage in, garbage out? Thanks in advance!


We will take a look at these files. In the meantime, you can detect corrupt files that were repaired on load by calling pdftron.PDF.PDFDoc.IsModified() right after opening the document, before making any changes of your own. If IsModified() returns true at that point, the file contained corruption that caused the document to be changed while it was being loaded. You can then flag files of this type for review or reject them.
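
As a rough illustration, the check could be wrapped in a small batch helper like the one below. This is only a sketch, not an official sample: it assumes the PDFNet Python bindings (the PDFNetPython3 package) are installed and that PDFNet.Initialize() has already been called elsewhere in your application. The helper names open_with_pdfnet and find_repaired_files are made up for this example.

```python
from typing import Callable, Iterable, List


def open_with_pdfnet(path: str):
    """Open a PDF with the PDFNet Python bindings (assumed to be installed)."""
    # Assumption: PDFNetPython3 is available and PDFNet.Initialize(...) was
    # already called. The import is done lazily so that the helper below can
    # be exercised without the SDK present.
    from PDFNetPython3 import PDFDoc
    doc = PDFDoc(path)
    doc.InitSecurityHandler()
    return doc


def find_repaired_files(paths: Iterable[str],
                        open_doc: Callable = open_with_pdfnet) -> List[str]:
    """Return the paths whose documents report IsModified() right after opening."""
    flagged = []
    for path in paths:
        doc = open_doc(path)
        # Nothing has been edited yet, so a True result here is taken to mean
        # that the library had to repair the file while loading it.
        if doc.IsModified():
            flagged.append(path)
    return flagged
```

Calling find_repaired_files(["a.pdf", "b.pdf"]) would return the subset of those paths that should be set aside for review. Whether to reject such files outright or just log them is a policy decision for your workflow.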