How to remove Tagged content from a PDF?

Question:

We are having some issues with a 3rd party tool and how it is failing to handle Tagged PDF files. How can we remove the Tagged data in hopes that the other tool will work properly?

Answer:

There are three ways to remove the Tagged data, each involving a larger modification of the PDF than the one before.

  1. Set the MarkInfo/Marked boolean to false, or just delete the MarkInfo entry from the document root (as we do here, since Marked defaults to false anyway).
Obj root = pdfDoc.GetRoot();
root.Erase("MarkInfo");
  2. Do (1) above, and then also delete the Logical Structure (the StructTreeRoot entry).
Obj root = pdfDoc.GetRoot();
root.Erase("MarkInfo");
root.Erase("StructTreeRoot");
  3. Do (1) and (2) above, and then use the ElementEdit sample code to read each page's content and NOT write back the following element types.
e_marked_content_begin
e_marked_content_point
e_marked_content_end
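
The third option could look roughly like the following in C#. This is a minimal sketch, not the full code mentioned below: it assumes the PDFNet .NET API as used in the ElementEdit sample (ElementReader, ElementWriter, Element.Type), and the method name StripMarkedContent is hypothetical. It only rewrites page-level content streams and does not descend into form XObjects, which the ElementEdit sample handles separately.

```csharp
using pdftron.PDF;
using pdftron.SDF;

static class TaggedPdfCleanup
{
    // Hypothetical helper: removes the Tagged PDF entries from the document
    // root, then rewrites every page without its marked-content elements.
    public static void StripMarkedContent(PDFDoc pdfDoc)
    {
        // Steps (1) and (2): drop MarkInfo and the logical structure tree.
        Obj root = pdfDoc.GetRoot();
        root.Erase("MarkInfo");
        root.Erase("StructTreeRoot");

        // Step (3): copy each page's elements, skipping marked-content ones.
        using (ElementReader reader = new ElementReader())
        using (ElementWriter writer = new ElementWriter())
        {
            for (PageIterator itr = pdfDoc.GetPageIterator(); itr.HasNext(); itr.Next())
            {
                Page page = itr.Current();
                reader.Begin(page);
                // Replace the existing page content rather than appending to it.
                writer.Begin(page, ElementWriter.WriteMode.e_replacement, false);

                Element element;
                while ((element = reader.Next()) != null)
                {
                    Element.Type type = element.GetType();
                    bool isMarkedContent =
                        type == Element.Type.e_marked_content_begin ||
                        type == Element.Type.e_marked_content_point ||
                        type == Element.Type.e_marked_content_end;

                    // Only the marked-content operators are dropped; the
                    // content they enclose is still written back.
                    if (!isMarkedContent)
                    {
                        writer.WriteElement(element);
                    }
                }

                writer.End();
                reader.End();
            }
        }
    }
}
```

After calling this on an open PDFDoc, save the document as usual so the rewritten content streams are persisted.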

Full code available upon request.
