PDF glyphs appear correctly on some devices but are missing on others

Question:

How do I repair a PDF file that does not have embedded fonts and shows missing glyphs on some devices?

I notice that if I select, copy, and paste the text from the PDF in any PDF reader, I get garbage output. That is, the text sometimes looks correct, depending on the device, but the extracted text is always meaningless.

Answer:

The issue is that the font used by the PDF

a) is not embedded in the original document, and
b) does not have a proper ToUnicode map; instead, the text uses indices into the specific (non-embedded) font file to identify glyphs.

(a) is fairly common, and PDFTron is normally able to find a substitute font. However, (b) makes substitution impossible, because the glyph indices are only meaningful for one specific font file. As a result, the file can only be displayed correctly on a system that has the exact font file available.
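
To check whether a given file matches this diagnosis, something like the following C# sketch could list each font used by page text and report whether it is embedded and whether its font dictionary has a ToUnicode entry. This is a rough sketch, not a definitive tool: the class name FontAudit and the license key placeholder are assumptions, and text inside form XObjects is not visited.

```csharp
using System;
using System.Collections.Generic;
using pdftron;
using pdftron.PDF;
using pdftron.SDF;

class FontAudit // hypothetical class name, for illustration only
{
	static void Main(string[] args)
	{
		PDFNet.Initialize("YOUR_LICENSE_KEY"); // placeholder, supply your own key

		using (PDFDoc doc = new PDFDoc(args[0])) // args[0]: path of the PDF to inspect
		using (ElementReader reader = new ElementReader())
		{
			if (!doc.InitSecurityHandler()) return; // password protected

			HashSet<string> seen = new HashSet<string>();
			for (PageIterator itr = doc.GetPageIterator(); itr.HasNext(); itr.Next())
			{
				reader.Begin(itr.Current());
				Element element;
				while ((element = reader.Next()) != null)
				{
					// Only text elements carry a font in their graphics state.
					if (element.GetType() != Element.Type.e_text) continue;

					Font font = element.GetGState().GetFont();
					string name = font.GetName();
					if (!seen.Add(name)) continue; // report each font name once

					// A missing ToUnicode entry alone is not always a problem;
					// it is the combination with a non-embedded font that is.
					bool hasToUnicode = font.GetSDFObj().FindObj("ToUnicode") != null;
					Console.WriteLine("{0}: embedded={1}, ToUnicode={2}",
						name, font.IsEmbedded(), hasToUnicode);
				}
				reader.End();
			}
		}
	}
}
```

A font reported with embedded=False and ToUnicode=False is a likely candidate for the problem described above.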

The following are possible next steps to resolve this issue.

  1. Contact the author and have them embed the font in the PDF.
  2. If you have access to the font, then you can use our Optimizer class server-side to embed the missing font.
  3. If you have access to the font but do not want to modify the PDF (perhaps it is digitally signed), then you would have to make the font available at the time of viewing the file.

Solution (1) is the ideal solution, as only the author of the PDF knows what it should look like.

For solution (2), you can use our Optimizer class to attempt to repair PDF files with bad font references:

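// Requires "using pdftron.PDF;" and "using pdftron.SDF;" at the top of the file,
// and PDFNet.Initialize() must have been called before this code runs.
using (PDFDoc doc = new PDFDoc(input_pdf_path))
{
	if (!doc.InitSecurityHandler()) return; // password protected

	// Ask the optimizer to embed fonts, keeping only the glyphs actually used.
	Optimizer.TextSettings textSettings = new Optimizer.TextSettings();
	textSettings.EmbedFonts(true);
	textSettings.SubsetFonts(true);

	Optimizer.OptimizerSettings optimizerSettings = new Optimizer.OptimizerSettings();
	optimizerSettings.SetTextSettings(textSettings);

	// Rewrites the document in memory using the settings above.
	Optimizer.Optimize(doc, optimizerSettings);

	doc.Save(output_pdf_path, SDFDoc.SaveOptions.e_linearized);
}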

WARNING: For the above to work, you must run it on a device that has the correct font file. If it is run on a server that does not have the correct font file, the text may never look correct on any device afterwards.
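
For solution (3), one possible approach is to register the font file with PDFNet before opening the document, so that the PDF itself is left untouched. The following is a minimal sketch under some assumptions: the font name "ExampleFont-Regular", the font path, the output file name, and the license key are all placeholders, and the font name must match the name the PDF actually refers to.

```csharp
using pdftron;
using pdftron.PDF;

class ViewWithLocalFont // hypothetical class name, for illustration only
{
	static void Main(string[] args)
	{
		PDFNet.Initialize("YOUR_LICENSE_KEY"); // placeholder, supply your own key

		// Map the font name referenced by the PDF to a local font file.
		// Both values below are placeholders; register the mapping before
		// any document that needs it is opened.
		PDFNet.AddFontSubst("ExampleFont-Regular", "C:/fonts/ExampleFont-Regular.ttf");

		using (PDFDoc doc = new PDFDoc(args[0])) // args[0]: path of the PDF to view
		using (PDFDraw draw = new PDFDraw())
		{
			if (!doc.InitSecurityHandler()) return; // password protected

			// Render the first page to an image to check that the glyphs appear.
			draw.SetDPI(92);
			draw.Export(doc.GetPage(1), "page1.png");
		}
	}
}
```

Because the document is only read and never saved, any digital signature on it stays intact. The same mapping would need to be set up on every device or server that displays the file.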