Segmentation Fault OCR Linux with NodeJS

Product: NodeJS + PDFNet/OCRModule

Product Version: 10.4.0

Please give a brief summary of your issue:
Running the OCRModule on Linux results in a segmentation fault.

Please describe your issue and provide steps to reproduce it:
We are using the @pdftron/pdfnet-node@10.4.0 package and downloaded the OCR module from https://www.pdftron.com/downloads/OCRModuleLinux.tar.gz?_ga=2.168404713.938843601.1696230339-585641438.1676276555

Running the OCRModule samples on our server produces the following error:

PDFNet is running in demo mode.

Package: base
Package: idp
{
  message: 'Exception: \n' +
    '\t Message: Error using OCR module Add-On. Child process was terminated by signal: 0XB\n' +
    'Page conversion failed.\n' +
    '\t Conditional expression: false\n' +
    '\t Version      : 10.4.0-cae97e1\n' +
    '\t Platform     : Linux\n' +
    '\t Architecture : AMD64\n' +
    '\t Filename     : OCRModule.cpp\n' +
    '\t Function     : ProcessPDF\n' +
    '\t Linenumber   : 70\n',
  type: 'PDFWorkerError'
}

If we run the module directly via ./OCRModule, we get the following:

> ./OCRModule
Segmentation fault

When I run the very same application with the Windows library on my own computer, the samples work without any problem, and running the exe shows the following:

> OCRModule.exe
PDFNet OCR Module 10.4.0-cae97e182c.
This module was not designed to run as a standalone application: its features are available via the Apryse SDK.

Any idea what might be causing the issue on the Linux server?
Maybe lack of permissions? Not compiled for AMD64 architecture?

In case it helps, here is the output of uname --a and part of the hostnamectl output:

> uname --a
Linux 5.10.0-25-amd64 #1 SMP Debian 5.10.191-1 (2023-08-16) x86_64 GNU/Linux
> hostnamectl
    Virtualization: kvm
  Operating System: Debian GNU/Linux 11 (bullseye)
            Kernel: Linux 5.10.0-25-amd64
      Architecture: x86-64

Thanks!


I just tried the IRIS module instead, downloading it from https://www.pdftron.com/downloads/IRISOCRModuleLinux.tar.gz?_ga=2.98623958.938843601.1696230339-585641438.1676276555 and renaming it to OCRModule (the file was not detected under its original name). The new error is the following:

Iris available: false
Package: base
Package: idp
{
  message: 'Exception: \n' +
    '\t Message: Error using OCR module Add-On. External module is not recognized as a valid executable format for this CPU architecture (/path/to/Lib/Linux/OCRModule)\n' +
    'Page conversion failed.\n' +
    '\t Conditional expression: false\n' +
    '\t Version      : 10.4.0-cae97e1\n' +
    '\t Platform     : Linux\n' +
    '\t Architecture : AMD64\n' +
    '\t Filename     : OCRModule.cpp\n' +
    '\t Function     : ProcessPDF\n' +
    '\t Linenumber   : 70\n',
  type: 'PDFWorkerError'
}

The "Iris available: false" line comes from a log statement I added to OCRTest.js:

PDFNet.addResourceSearchPath(LibPath);

const useIRIS = await PDFNet.OCRModule.isIRISModuleAvailable();
console.log("Iris available:", useIRIS); // <------ This line added --------
if (!(await PDFNet.OCRModule.isModuleAvailable())) {
    console.log('\nUnable to run OCRTest: Apryse SDK OCR module not available.');
    console.log('---------------------------------------------------------------');
    console.log('The OCR module is an optional add-on, available for download');
    console.log('at http://www.pdftron.com/. If you have already downloaded this');
    console.log('module, ensure that the SDK is able to find the required files');
    console.log('using the PDFNet.addResourceSearchPath() function.\n');

    return;
}

So it seems the IRIS module is not recognized either.


Both modules need execute permission for the user account running them; from the error, it sounds like that is not the case for the IRIS module.

Also, the file name is important, so renaming the IRIS module to OCRModule might confuse the SDK about which module it is talking to and cause issues.

Keep the original file name and use addResourceSearchPath to point the SDK at the directory where you put the IRIS module.

As for the seg fault, no one else has reported such an issue.

What happens if you call ./OCRModule directly from the command line? What is the full output to terminal?
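
For anyone landing here with the same errors, the two suspicions raised so far (missing execute permission, or a binary built for a different CPU architecture) can be checked from Node with something like the sketch below. It is only a sketch under a few assumptions: inspectBinary is a made-up helper name, not part of the Apryse SDK; it uses nothing but Node's built-in fs module; and it only recognizes two e_machine codes. A clean result does not prove the file is intact, since only the first 20 bytes are inspected.

```javascript
const fs = require('fs');

// e_machine values from the ELF specification (only two common ones listed).
const ELF_MACHINES = { 0x3e: 'x86-64', 0xb7: 'aarch64' };

// Hypothetical helper: reports whether the current user may execute the file
// and what the ELF header says about its target architecture.
function inspectBinary(filePath) {
  let executable = true;
  try {
    fs.accessSync(filePath, fs.constants.X_OK);
  } catch {
    executable = false;
  }

  // Read the first 20 bytes: e_ident (16), e_type (2), e_machine (2).
  const header = Buffer.alloc(20);
  const fd = fs.openSync(filePath, 'r');
  let bytesRead;
  try {
    bytesRead = fs.readSync(fd, header, 0, 20, 0);
  } finally {
    fs.closeSync(fd);
  }

  // ELF files start with the magic bytes 0x7F 'E' 'L' 'F'.
  const isElf = bytesRead === 20 && header.readUInt32BE(0) === 0x7f454c46;
  if (!isElf) {
    return { executable, isElf, is64Bit: false, machine: null };
  }

  const is64Bit = header[4] === 2;      // EI_CLASS: 1 = 32-bit, 2 = 64-bit
  const littleEndian = header[5] === 1; // EI_DATA: 1 = little, 2 = big endian
  const code = littleEndian ? header.readUInt16LE(18) : header.readUInt16BE(18);
  const machine = ELF_MACHINES[code] || `unknown (0x${code.toString(16)})`;
  return { executable, isElf, is64Bit, machine };
}
```

Calling inspectBinary('./OCRModule') from the Lib directory should, for a usable Linux AMD64 build, report executable: true, isElf: true, is64Bit: true and machine: 'x86-64'. Anything else points at permissions or a wrong download rather than at the SDK.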


As I said in the original post, the only output it gives us is the Segmentation fault message.

Regarding the permissions, everyone has permission to execute the file:

ocrdoceostore@hl1203:~/www/NodeJSTest/PDFNetNodeJS/node_modules/@pdftron/pdfnet-node-samples/samples/OCRTest/Lib$ ls -lah
total 29M
drwxr-xr-x 2 ocrdoceostore ocrdoceostoregrp 4.0K Oct  3 09:44 .
drwxr-xr-x 4 ocrdoceostore ocrdoceostoregrp 4.0K Oct  3 09:46 ..
-rwxr-xr-x 1 ocrdoceostore ocrdoceostoregrp  28M Sep 29 13:59 OCRModule

Can you try running ldd OCRModule and provide the output?


The output is the following:

ocrdoceostore@hl1203:~/www/NodeJSTest/PDFNetNodeJS/node_modules/@pdftron/pdfnet-node-samples/samples/OCRTest/Lib$ ldd OCRModule
        statically linked

I suspect you either downloaded the wrong binary or the file got corrupted along the way. For comparison, this is what I get:

ldd OCRModule
        linux-vdso.so.1 (0x00007ffdb54c3000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007feca7640000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007feca763a000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007feca7617000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007feca74c8000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007feca72d6000)
        /lib64/ld-linux-x86-64.so.2 (0x00007feca7657000)

Can you please download it again from here and let me know how that works for you?


Yep, it seems that downloading the Linux module to my Windows laptop and then uploading it to the server via FTP corrupted the file in some way: the checksums of the two copies were different. Downloading the module directly to the server with wget made it work, with the same output as yours.

Thanks!
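
In case it is useful to someone else, the checksum comparison can also be done from Node itself, so the same script runs on both the Windows and the Linux side. This is a minimal sketch, assuming SHA-256 as the digest (any digest works as long as both sides use the same one); sha256OfFile and matchesDigest are illustrative names, not SDK functions, and only Node's built-in crypto and fs modules are used.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Returns the hex-encoded SHA-256 digest of a file's raw bytes.
function sha256OfFile(filePath) {
  const hash = crypto.createHash('sha256');
  hash.update(fs.readFileSync(filePath));
  return hash.digest('hex');
}

// Compares a file against a digest computed elsewhere.
// Surrounding whitespace and letter case in the expected value are ignored.
function matchesDigest(filePath, expectedHex) {
  return sha256OfFile(filePath) === expectedHex.trim().toLowerCase();
}
```

Run sha256OfFile on the copy that was downloaded and again on the copy that ended up on the server; if the two hex strings differ, the transfer changed the file and the module should be downloaded again directly on the target machine.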
