Product:
Apryse SDK with Structured Output module
Please give a brief summary of your issue:
When converting a PDF to DOCX, the fonts in the DOCX do not match the fonts in the PDF
Please describe your issue and provide steps to reproduce it:
When converting a PDF to DOCX using Apryse SDK Structured Output module, the fonts in the resulting DOCX document do not match the fonts used in the original PDF, even though those fonts are installed on both the conversion machine and the machine used to open the DOCX.
Here is the Discord thread that contains sample documents and a script to reproduce the issue: Discord
Hello,
Thanks for reaching out! I am unable to access the files in that link. Are you able to upload the source document, the resulting document, and the code here? If not, could you create a ticket at https://support.apryse.com and add them to the ticket?
Thanks for your patience and understanding.
Hi Nicholas,
here are the files:
Conversion_Test (1).pdf (60.6 KB)
Conversion_Test (1).docx (35.6 KB)
And the script (it looks like I can only attach 2 files, so I'm pasting it here):
#---------------------------------------------------------------------------------------
# Copyright (c) 2001-2023 by Apryse Software Inc. All Rights Reserved.
# Consult LICENSE.txt regarding license information.
#---------------------------------------------------------------------------------------
require 'date'
require './app/sdks/apryse/arm64/PDFNetC/Lib/PDFNetRuby'
require 'etc'
include PDFNetRuby
require 'open-uri'
require 'fileutils'

$stdout.sync = true

# Relative path to the folder containing the test files.
$inputPath = "../../Downloads/"
$outputPath = "../../Downloads/"

# Returns true if the fc-list output mentions font_name anywhere (substring match).
def font_installed?(font_name)
  `fc-list`.include?(font_name)
end

def main()
  font_name = 'Inter'
  if font_installed?(font_name)
    puts "Font '#{font_name}' is installed."
  else
    puts "Font '#{font_name}' is not installed."
  end

  # The first step in every application using PDFNet is to initialize the
  # library. The library is usually initialized only once, but calling
  # Initialize() multiple times is also fine.
  username = Etc.getlogin
  puts "The script is being run by: #{username}"
  PDFNet.Initialize("key")
  PDFNet.AddResourceSearchPath("app/sdks/apryse/arm64/PDFNetC/Lib/")

  unless StructuredOutputModule.IsModuleAvailable()
    puts ""
    puts "Unable to run the sample: PDFTron SDK Structured Output module not available."
    puts "-----------------------------------------------------------------------------"
    puts "The Structured Output module is an optional add-on, available for download"
    puts "at https://docs.apryse.com/documentation/core/info/modules/. If you have already"
    puts "downloaded this module, ensure that the SDK is able to find the required files"
    puts "using the PDFNet::AddResourceSearchPath() function."
    puts ""
    return
  end

  begin
    # Convert PDF document to Word
    puts "Converting PDF to Word"
    $outputFile = $outputPath + "Conversion_Test.docx"
    Convert.ToWord($inputPath + "Conversion_Test.pdf", $outputFile)
    puts "Result saved in " + $outputFile
  rescue => error
    puts "Unable to convert PDF document to Word, error: " + error.message
  end

  PDFNet.Terminate
  puts "Done."
end

main()
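One caveat about the font check in the script above: it is a plain substring match on the full fc-list output, so 'Inter' would also match a family such as "Inter Tight" or any font file path that happens to contain "Inter". Below is a rough sketch of a stricter check that compares exact family names. It assumes the text comes from `fc-list : family`, which should print one line per font with comma-separated family names; the helper names font_families and family_installed? are made up for illustration and are not part of the Apryse SDK.

```ruby
# Split the text printed by `fc-list : family` into individual family names.
# Assumption: each line holds one or more comma-separated family names.
def font_families(fc_list_output)
  fc_list_output.each_line
                .flat_map { |line| line.strip.split(",") }
                .map(&:strip)
                .reject(&:empty?)
                .uniq
end

# True only if some family name equals font_name exactly (ignoring case),
# so "Inter Tight" alone does not count as "Inter".
def family_installed?(font_name, fc_list_output)
  font_families(fc_list_output).any? { |family| family.casecmp?(font_name) }
end

# Possible usage on a machine with fontconfig installed:
#   puts family_installed?("Inter", `fc-list : family`)
```

This only confirms that fontconfig sees the family on the conversion machine. It does not show which fonts the Structured Output module actually picks during conversion.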
Please let me know if there’s anything else I can help with.
Thank you!
Inga