How do I remove pages in an existing PDF based on their content?

Aaron_Gravesdale · May 29, 2009, 11:02pm

Q: I have PDF files with disclaimer pages scattered throughout them,
which I would like to remove programmatically. The pages always
have the same wording, but where they appear in the files is random.
Essentially, the files are composed of multiple individual documents,
each of which is followed by a disclaimer page. However, the
disclaimer page may be the 2nd or 3rd page so its appearance is
unpredictable. Will PDFNet SDK (http://www.pdftron.com/pdfnet) allow
me to programmatically remove these pages?
------
A: You can use PDFNet SDK to programmatically remove pages in an
existing PDF based on their content.

As a starting point, you may want to take a look at the TextExtract
sample project (http://www.pdftron.com/pdfnet/samplecode.html#TextExtract).
Using pdftron.PDF.TextExtractor you should be able to extract the text
of each page and detect the pages which should be removed.
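
The detection step could look roughly like the sketch below. It is not
PDFNet code: it assumes you have already pulled each page's text out
with TextExtractor and put the results in a plain list of strings. The
names normalize, is_disclaimer_page and find_disclaimer_pages are
hypothetical helpers made up for this illustration. Whitespace and case
are normalized because extracted text may wrap lines differently from
the wording you compare against.

```python
import re


def normalize(text):
    """Collapse whitespace runs and lowercase the text."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_disclaimer_page(page_text, disclaimer_text):
    """True when the known disclaimer wording appears in the page text."""
    return normalize(disclaimer_text) in normalize(page_text)


def find_disclaimer_pages(page_texts, disclaimer_text):
    """Return the 1-based page numbers whose text contains the disclaimer.

    page_texts is assumed to hold one extracted string per page, in
    page order (a stand-in for whatever the text extractor returns).
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if is_disclaimer_page(text, disclaimer_text)
    ]
```

A substring test is used rather than an exact comparison so that a page
number or footer on the disclaimer page does not prevent a match. If the
wording could also appear inside a normal page, a stricter check (for
example on the page's total text length) would be needed.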

Once you know which page to remove, you can simply call
pdfdoc.PageRemove(pgitr), as shown in the PDFPage sample project.
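
One thing to watch when removing several pages by number: each removal
shifts the numbers of the pages after it. A simple way around this is to
remove from the highest page number down. The sketch below shows the
idea on a plain Python list standing in for the document; remove_pages
is a hypothetical helper, and with PDFNet the deletion inside the loop
would be the PageRemove call mentioned above.

```python
def remove_pages(pages, page_numbers):
    """Remove the given 1-based page numbers from pages, in place.

    pages is a plain list used as a stand-in for a document. Working
    from the highest number down keeps the remaining numbers valid,
    because deleting a page only shifts the pages that come after it.
    """
    for number in sorted(set(page_numbers), reverse=True):
        del pages[number - 1]
    return pages
```

For example, removing pages 2 and 5 from a five-page document in this
order deletes page 5 first, so page 2 is still page 2 when its turn
comes.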