Reload font cache after initialization

sofusstray · March 16, 2023, 12:18pm

Product: PDFTron Python SDK

Product Version: PDFNetPython3 version 9.5.0.post1

Please give a brief summary of your issue:
We convert .docx files to .pdf and want to support fonts available on Google Fonts without requiring our clients to embed the fonts beforehand. We do this by checking the fonts listed in fontTable.xml inside the .docx file and downloading the missing ones from Google Fonts.
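As a rough illustration of the font check described above, here is a minimal sketch using only the Python standard library. It assumes the font table sits at the usual word/fontTable.xml path inside the .docx archive; fonts_in_docx and missing_fonts are hypothetical helper names, and the set of installed font names is something the caller has to supply.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/fontTable.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def fonts_in_docx(docx_path):
    """Return the sorted font names declared in the .docx font table."""
    with zipfile.ZipFile(docx_path) as archive:
        try:
            xml_bytes = archive.read("word/fontTable.xml")
        except KeyError:
            return []  # the archive has no font table part
    root = ET.fromstring(xml_bytes)
    names = {font.get(f"{{{W_NS}}}name") for font in root.iter(f"{{{W_NS}}}font")}
    names.discard(None)  # ignore <w:font> elements without a name attribute
    return sorted(names)


def missing_fonts(docx_path, installed_names):
    """Return fonts the document declares that are not in installed_names.

    installed_names is a caller-supplied collection of font family names
    (an assumption of this sketch); the comparison ignores case.
    """
    installed = {name.lower() for name in installed_names}
    return [n for n in fonts_in_docx(docx_path) if n.lower() not in installed]
```

The names returned by missing_fonts would then be the ones to fetch before the conversion runs.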

This works without issue, but we call PDFNet.Initialize() once when our function starts up and then want to support running conversions at any time through our API. PDFNet.Initialize() also loads all available fonts, and fonts added after the initialization step aren’t recognised. We cannot use PDFNet.Terminate() because you can’t initialize again after terminating.

As such, I want to ask if it’s possible for PDFTron to refresh and reload the available fonts at any point.

Please describe your issue and provide steps to reproduce it:

1. Download Roboto from Google Fonts and put it into the /usr/share/fonts/truetype folder on Linux.
2. Call PDFNet.Initialize().
3. Convert a .docx that uses Roboto to PDF and see that the font also appears in the converted PDF.
4. Download Open Sans from Google Fonts and put it into the same folder, in the same function, after the initialization step.
5. Convert a .docx that uses Open Sans to PDF and see that the font does not appear in the converted PDF, because it was added after PDFNet.Initialize() loaded the fonts.

Please provide a link to a minimal sample where the issue is reproducible:

from PDFNetPython3 import Convert, SDFDoc, PDFNet, PDFDoc
PDFNet.Initialize()  # fonts already in the font folder are loaded here
pdfdoc = PDFDoc()
Convert.OfficeToPDF(pdfdoc, "roboto.docx")  # assuming roboto.docx exists and uses Roboto
# Open Sans is downloaded into the font folder at this point, after initialization
Convert.OfficeToPDF(pdfdoc, "opensans.docx")  # assuming opensans.docx exists and uses Open Sans
pdfdoc.Save("fonts.pdf", SDFDoc.e_linearized)
pdfdoc.Close()

Ryan · March 16, 2023, 11:43pm

We recommend running DOCX to PDF conversion in a child process, especially if you are processing client-provided DOCX files. That child process can run just the one conversion, with the fonts you want, and then terminate.

Is running child processes an option for you?
If not, please elaborate.

sofusstray · March 17, 2023, 8:27am

Hi Ryan.

Yes, that was the initial intention. To clarify further, we’re using AWS Lambda containers to run the conversion. The issue is that a single container is kept alive for 15 minutes and will handle further requests once it has finished the first one.

For instance, if one client wants to convert their docx to PDF, a Lambda container is spawned and runs the conversion. If another client wants their docx converted a minute or two later, the same Lambda container handles that request too. But because this container has already run PDFNet.Initialize() as part of its startup process, it won’t pick up any custom fonts beyond those of the first client to spawn the container.
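The warm-container behaviour can be modelled with a toy sketch. Nothing here calls PDFTron: _initialize_once is a hypothetical stand-in for a one-time PDFNet.Initialize() that snapshots the fonts it sees, and the event fields are made up for illustration. It only shows why module-level state from the first request carries over to the second.

```python
# Module-level state lives as long as the container, not just one request.
_initialized = False
_fonts_seen_at_init = ()


def _initialize_once(fonts_on_disk):
    """Hypothetical stand-in for a one-time init that snapshots fonts."""
    global _initialized, _fonts_seen_at_init
    if not _initialized:
        _fonts_seen_at_init = tuple(sorted(fonts_on_disk))
        _initialized = True


def handler(event, context=None):
    # "fonts_on_disk" and "font" are hypothetical event fields for this sketch.
    _initialize_once(event["fonts_on_disk"])
    wanted = event["font"]
    # Only fonts present during the first call count as available.
    return {"font": wanted, "available": wanted in _fonts_seen_at_init}
```

In this model a first request for Roboto succeeds, while a later request for Open Sans in the same container reports the font as unavailable even though it is on disk by then.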

We do have a hacky workaround in place, where we use the subprocess library to call the function that runs PDFTron, but having to completely reinitialize PDFTron together with all our other libraries takes longer and is not a very elegant solution. It works, but I’m wondering if there is some way to just tell PDFTron to load the fonts again without having to spawn an entirely new child process.

Ryan · March 20, 2023, 7:05pm

I’m not sure I follow the issue with launching a child process.

My suggestion is to move just your OfficeToPDF code into a child process.

Your parent process can contain all your other libraries loaded once.
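One way this split might look is sketched below. The parent imports only the standard library, and PDFNetPython3 is imported solely inside the child program, so each conversion starts with a fresh PDFNet.Initialize(). The PDFTRON_LICENSE_KEY environment variable, the convert_in_child helper, and the timeout value are assumptions made for this sketch, not part of the SDK.

```python
import subprocess
import sys

# Source of the child program. Only the child imports PDFNetPython3, so the
# fonts on disk at launch time are the ones PDFNet sees.
CHILD_SOURCE = """
import os
import sys
from PDFNetPython3 import Convert, PDFDoc, PDFNet, SDFDoc

docx_path, pdf_path = sys.argv[1:3]
key = os.environ.get("PDFTRON_LICENSE_KEY")  # hypothetical variable name
if key:
    PDFNet.Initialize(key)
else:
    PDFNet.Initialize()
doc = PDFDoc()
Convert.OfficeToPDF(doc, docx_path, None)
doc.Save(pdf_path, SDFDoc.e_linearized)
doc.Close()
"""


def convert_in_child(docx_path, pdf_path, child_source=CHILD_SOURCE, timeout=120):
    """Run one conversion in a fresh Python process and wait for it to exit."""
    result = subprocess.run(
        [sys.executable, "-c", child_source, docx_path, pdf_path],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(f"conversion failed: {result.stderr.strip()}")
```

The parent would download any missing fonts first and then call convert_in_child("opensans.docx", "opensans.pdf"); the child_source parameter exists so the launcher can be exercised without the SDK installed.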

“It works, but I’m wondering if there is some way to just tell PDFTron to load the fonts again without having to spawn an entirely new child process.”

No, not currently, and there is no plan to add support for this.