PDF Semantic Difference

johanne.ongchangco · March 31, 2025, 11:30am

WebViewer Version: 10.8.0

Do you have an issue with a specific file(s)? Yes
Can you reproduce using one of our samples or online demos? Yes

Please give a brief summary of your issue:
PDF Compare does not finish loading when DocumentViewer.startSemanticDiff is called with specific PDFs.
The second PDF is larger (not sure if that is related).

Please describe your issue and provide steps to reproduce it:
Please provide a link to a minimal sample where the issue is reproducible:

In the JS Semantic PDF Files Comparison Demo (Apryse WebViewer):
Load version 1 and version 2, then click Compare.
The page loader keeps showing indefinitely.
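A minimal client-side sketch for keeping the loader from spinning forever while this is investigated: race the comparison promise against a timeout and always clear the loading state. It assumes the comparison call returns a Promise; `withTimeout`, `compareWithFeedback`, `runComparison`, `setLoading`, and `showError` are hypothetical names standing in for your own UI code, not WebViewer APIs.

```typescript
// Rejects if `promise` does not settle within `ms` milliseconds.
async function withTimeout<T>(promise: Promise<T>, ms: number, label = "operation"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    // Whichever settles first wins.
    return await Promise.race([promise, timeout]);
  } finally {
    // Always clear the timer so it cannot fire after the race is decided.
    clearTimeout(timer);
  }
}

// Hypothetical UI wrapper: the loader is hidden on success, failure, or timeout.
async function compareWithFeedback(
  runComparison: () => Promise<void>,
  setLoading: (on: boolean) => void,
  showError: (message: string) => void,
  ms = 60_000,
): Promise<boolean> {
  setLoading(true);
  try {
    await withTimeout(runComparison(), ms, "Semantic comparison");
    return true;
  } catch (err) {
    showError(err instanceof Error ? err.message : String(err));
    return false;
  } finally {
    setLoading(false);
  }
}
```

`runComparison` could wrap the `startSemanticDiff` call from your own integration; check the WebViewer API reference for that method's exact signature and return value in your version. Note that a timeout only hides the symptom: if the comparison fails with a console error, catching that error directly is the better signal.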

system · March 31, 2025, 11:30am

Hello, I’m AiRon, an automated tech support bot.
While you wait for one of our customer support representatives, I can help you with your question.

It seems you’re encountering an issue with the DocumentViewer.startSemanticDiff method in WebViewer version 10.8.0, specifically when comparing two PDF files of different sizes. This can sometimes happen if WebViewer struggles to process larger documents or if there is a mismatch in the document structure.

Some things you can try:

Ensure both PDF files are correctly formatted and not corrupted.

Check if the issue persists with smaller versions of the files or with other PDF files to isolate the problem to these specific documents.

Try increasing the memory allocation or performance settings in your environment to handle larger files.

Update to the latest version of WebViewer, as newer versions might have fixes for performance issues with document comparison.

Consider breaking down the larger document into smaller sections and compare them separately to identify any specific section causing the issue.

Here are some resources that might help you:

PDF Comparison Guide - Learn more about how PDF comparison works and common troubleshooting steps.

Multi-Viewer Mode Documentation - Understand how to enable and use document comparison features effectively.

Version 10.8.0 Changelog - Check for any known issues or fixes related to semantic diff in this version.

Optimization Techniques - Tips on optimizing WebViewer for large documents and intensive operations.

Hopefully, this helps you solve your problem while you wait for a human to get back to you.

johanne.ongchangco · April 4, 2025, 12:23am

Any update on this? Thanks

mickael.vanstraelen · April 4, 2025, 8:25am

Hello Johanne.

After investigating, it seems the issue comes from your PDFs.

It seems the text bounding boxes are corrupted, which prevents WebViewer from detecting any text in the documents. That is why, when you open the documents in WebViewer, you cannot select any text.

The feature you are using compares the text of the two documents; when it cannot find any text, it throws an error.
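Since the comparison relies on extractable text, one defensive option is to check each document for text before starting the diff and show a clear message instead. A minimal sketch, assuming a `loadPageText(pageNumber)` function that resolves to a page's text (WebViewer's `Core.Document` exposes a `loadPageText` method, but verify its behaviour in your version); `findPagesWithoutText` and `hasExtractableText` are hypothetical helpers.

```typescript
type LoadPageText = (pageNumber: number) => Promise<string>;

// Pure check: returns the 1-based page numbers whose text is empty or whitespace-only.
function findPagesWithoutText(pageTexts: string[]): number[] {
  const missing: number[] = [];
  pageTexts.forEach((text, index) => {
    if (text.trim().length === 0) missing.push(index + 1);
  });
  return missing;
}

// Loads page text in order and resolves true as soon as any page has text.
async function hasExtractableText(loadPageText: LoadPageText, pageCount: number): Promise<boolean> {
  for (let page = 1; page <= pageCount; page++) {
    const text = await loadPageText(page);
    if (text.trim().length > 0) return true; // one page with text is enough
  }
  return false;
}
```

If `hasExtractableText` resolves to `false` for either document, the UI can skip the comparison and tell the user the PDF has no selectable text (for example, suggesting OCR), rather than starting a diff that will fail.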

Best Regards,
Mickaël.

johanne.ongchangco · April 7, 2025, 3:57am

I have tagged this for moderation. Thanks

johanne.ongchangco · April 8, 2025, 11:20am

Is there any way WebViewer can handle these hidden issues? Acrobat, other viewers, and even WebViewer itself display the PDFs without showing any errors. Only the semantic comparison fails, and neither our clients nor we would know why, because we already validated the PDFs and no issues were reported. Can WebViewer handle this internally when a PDF contains incompatible elements?

mickael.vanstraelen · April 8, 2025, 11:38am

Hi Johanne.

Unfortunately, in this case, the PDF itself is the issue.

The issue is present in both WebViewer and Adobe: the bounding boxes are corrupted, which makes it difficult for WebViewer to detect the text. Adobe barely handles it.
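To illustrate what corrupted bounding boxes can look like in practice, here is a hypothetical heuristic that flags text boxes with non-finite coordinates or zero area. It assumes you can obtain per-character boxes as `{ x1, y1, x2, y2 }` objects from your own text-extraction step; the `TextBox` type and the `isDegenerate` and `corruptedBoxRatio` functions are illustrative only, not an Apryse API, and this is not necessarily how WebViewer detects the problem.

```typescript
// Hypothetical shape for one text bounding box (two opposite corners).
interface TextBox { x1: number; y1: number; x2: number; y2: number; }

// A box is suspicious if any coordinate is NaN/Infinity or it encloses no area.
function isDegenerate(box: TextBox): boolean {
  const coords = [box.x1, box.y1, box.x2, box.y2];
  if (!coords.every((c) => Number.isFinite(c))) return true;
  // abs() accepts the corners in either order.
  const width = Math.abs(box.x2 - box.x1);
  const height = Math.abs(box.y2 - box.y1);
  return width === 0 || height === 0;
}

// Fraction of boxes that look corrupted (0 when there are no boxes).
function corruptedBoxRatio(boxes: TextBox[]): number {
  if (boxes.length === 0) return 0;
  return boxes.filter(isDegenerate).length / boxes.length;
}
```

A high ratio on a page that visibly contains text would be consistent with the selection problem described here.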

Here is the difference when selecting text in our demo.pdf versus yours:

What you could try is running OCR (Optical Character Recognition) on the documents. Here is the relevant API documentation:

https://sdk.apryse.com/api/web/Core.PDFNet.OCRModule.html#.applyOCRJsonToPDF__anchor

Let me know if you have any questions.

Best Regards,
Mickaël.

johanne.ongchangco · April 8, 2025, 12:05pm

Is it expected that the comparison in the showcase behaves the same way, showing the loader indefinitely with no error prompt or other feedback?

mickael.vanstraelen · April 8, 2025, 12:44pm

Hi Johanne,

We agree this error should be handled better than with a console error and an infinite loader.

This has been reported and will be reviewed by the product team.

Best Regards,
Mickaël.