Xerox pdf documents are images

brad_curtis · May 24, 2017, 1:30pm

Hi,

We are in the process of evaluating your product. We are currently using Inkscape to transform PDFs to SVG and HTML files. Inkscape is very slow and processor-intensive, so we are looking for alternative solutions.

The performance of PDFTron is fantastic. However, we are having issues processing some Xerox PDF documents. Inkscape is able to transform the documents into "text" for both HTML and SVG formats. Your product is embedding an image of the document.

We are wondering whether your product could be updated to handle these documents. They account for about 20% of our document corpus, so this issue would prevent us from moving forward with a purchase.

thanks
Brad

Ryan · May 24, 2017, 6:42pm

Can you post a sample document here? If not, please email one to support.

Also, which PDFTron solution are you looking at? I assume WebViewer, our HTML5-based viewer? Though even that has two backends (PDF and XOD).

Are you looking to support both mobile and desktop browsers?

Or are you using our PDF2HTML converter? Or PDF2SVG?

Please clarify exactly what tool/api you are using.

brad_curtis · May 26, 2017, 9:49pm

Ryan,

We are working with our customer to get a redacted version. We are using the PDFNet SDK.

// Requires: using System.IO;
// Output folder is named after the file, minus its 4-character extension (e.g. ".pdf").
string foldername = fileName.Substring(0, fileName.Length - 4);
string contentDir = Path.Combine(@"C:\DiscoverTech\ConversionServer\Content", foldername);
if (!Directory.Exists(contentDir))
    Directory.CreateDirectory(contentDir);
pdftron.PDF.Convert.ToSvg(in_doc, Path.Combine(contentDir, fileName + ".svg"));

// HTML output goes into an "html" subfolder of the same directory.
string htmlDir = Path.Combine(contentDir, "html");
if (!Directory.Exists(htmlDir))
    Directory.CreateDirectory(htmlDir);

pdftron.PDF.Convert.ToHtml(in_doc, htmlDir);

thanks
Brad

Ryan · May 29, 2017, 8:12pm

Thank you for the clarification. We would definitely need to see the source PDF file before commenting further on your particular question, but I think the following will help you.

Our PDF2HTML converter always converts all non-text graphics to an image, since plain HTML cannot handle vector content (except text).

Our PDF2SVG converter, by default, runs with the Flatten Content setting set to fast. This means areas of complex vector content, such as patterns, might get converted into an image. Otherwise, browsers can stall trying to render these complex vector paths. This can be turned off with the following.
Convert.SVGOutputOptions.SetFlattenContent(Convert.FlattenFlag.e_off)
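
For context, here is a minimal sketch of how that option might be passed to the SVG conversion. It assumes the ToSvg overload that accepts an SVGOutputOptions argument, and that PDFNet has already been initialized elsewhere. The class name, method name, and outputPath parameter are placeholders for illustration only, not part of the SDK.

```csharp
using pdftron.PDF;

public static class SvgExport
{
    // Hypothetical helper: converts an already-opened PDFDoc to SVG with
    // content flattening turned off, so complex vector areas are kept as
    // vectors instead of being converted to images.
    // Assumes PDFNet.Initialize() was called before this method runs.
    public static void ToSvgWithoutFlattening(PDFDoc doc, string outputPath)
    {
        var options = new pdftron.PDF.Convert.SVGOutputOptions();

        // Same flag as the one-liner above, set on an options instance.
        options.SetFlattenContent(pdftron.PDF.Convert.FlattenFlag.e_off);

        // Pass the options as the third argument to the conversion call.
        pdftron.PDF.Convert.ToSvg(doc, outputPath, options);
    }
}
```

Keep in mind the trade-off mentioned above: with flattening off, pages that contain very complex vector content may render slowly in some browsers.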

I also highly recommend that you check out our HTML5 Canvas WebViewer solution. It supports full vector content, plus many additional features, which make it a better fit than SVG. Also, if you are targeting desktop browsers only, you can view the PDF file directly without any conversion.
https://blog.pdftron.com/2013/08/08/how-to-integrate-a-pdf-viewer-in-html5-apps/

https://blog.pdftron.com/2015/11/10/pdfnetjs-html5-pdf-viewer-and-editor/

http://www.pdftron.com/webviewer/index.html