PDF Comparison with OCR Layer in WebViewer

a.kehtarian · September 3, 2023, 8:09pm

Please give a brief summary of your issue:
Hello,

I’m encountering an issue when comparing PDF files that include an OCR (Optical Character Recognition) layer: the compare functionality does not work.
I also see the following error in the console:
Uncaught (in promise) Exception:
    Message: map::at: key not found
    Filename:
    Function:
    Linenumber:

Is there a feature or capability in WebViewer that allows me to compare documents with these OCR layers effectively?

maggiev · September 5, 2023, 11:08pm

Hi a.kehtarian,

Technically it’s possible. If you’re trying out our demo, the scanned PDF must first be processed at https://showcase.apryse.com/ocr-module so that its text is extracted. It can then be downloaded (as a searchable PDF) and used for a semantic compare at https://showcase.apryse.com/semantic-text-compare/.

You can find more documentation about text extraction here.
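A quick way to see whether a PDF already carries a text layer that the viewer can read (and therefore whether the OCR step described above is needed before a semantic compare) is to ask the loaded document for the text of each page. The following is a minimal sketch, assuming WebViewer's `Core.Document` exposes `getPageCount()` and `loadPageText(pageNumber)` (returning a promise of the page text). The `TextSource` interface and the `pagesWithoutText` / `findPagesWithoutText` helpers are hypothetical names introduced here so the check can run against a stub outside the browser.

```typescript
// Hypothetical minimal view of a WebViewer document. We ASSUME the real
// Core.Document provides getPageCount() and loadPageText(pageNumber);
// modelling only these two calls lets the check run against a stub.
interface TextSource {
  getPageCount(): number;
  loadPageText(pageNumber: number): Promise<string>;
}

// Returns the 1-based numbers of pages whose extracted text is empty or
// whitespace only. Such pages have no text the viewer could read, so a
// text-based compare has nothing to work with on them.
function pagesWithoutText(pageTexts: string[]): number[] {
  const missing: number[] = [];
  pageTexts.forEach((text, index) => {
    if (text.trim().length === 0) {
      missing.push(index + 1);
    }
  });
  return missing;
}

// Loads the text of every page in order and reports which pages lack text.
async function findPagesWithoutText(doc: TextSource): Promise<number[]> {
  const texts: string[] = [];
  for (let page = 1; page <= doc.getPageCount(); page++) {
    texts.push(await doc.loadPageText(page));
  }
  return pagesWithoutText(texts);
}
```

An empty result suggests the existing text layer is visible to the viewer; pages listed in the result are the ones that would still need to go through the OCR module first.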

Regards,
Maggie V.

a.kehtarian · September 6, 2023, 12:26pm

Hi Maggie,

Thank you for your response.

If I understand correctly, we first need to run text extraction (OCR) on the document that already has an OCR layer, and only then compare it.

Then my question is why WebViewer cannot read the text that is already present in our document’s OCR layer.

Other PDF readers (Adobe Acrobat, the Chrome PDF add-in, …) recognize the existing OCR layer and can find the text.

My question is whether WebViewer can also read the OCR layer that is already there, without my having to run OCR on the document again with the OCR module.

Best regards,
Amir Kehtarian