Another issue with segfaults when calling OfficeToPDF

Product: PDFNetC, PDFNetRuby, PDFNetPython

Product Version: Both PDFNetC64_2023-06-26_stable and v10.2.0

Please give a brief summary of your issue:
(Think of this as an email subject)

I’m not sure whether this is related to an issue that was resolved recently (see topic), but we ran into another file that causes segfaults.

Unfortunately, this time I’m unable to narrow down exactly which feature in the .docx file is the root cause. After testing various hypotheses, I noticed that the segfault still occurs even after deleting all the text content from the file.

Although I was able to reproduce the issue after deleting all contents and even removing data using the document inspector, I hesitate to post the file publicly. I’ll DM a copy of the test file to any Apryse team member who requests it.

Please describe your issue and provide steps to reproduce it:
(The more descriptive your answer, the faster we are able to help you)

The issue reproduces with the same simple test script as before:

# frozen_string_literal: true

require '/usr/local/PDFNetC/Lib/PDFNetRuby'

PDFNetRuby::PDFNet.Initialize(ENV.fetch('PDFTRON_LICENSE_KEY'))

filepath = ARGV.first
doc = PDFNetRuby::PDFDoc.new
doc.Lock
PDFNetRuby::Convert.OfficeToPDF(doc, filepath, nil)
doc.Unlock

As mentioned above, I’ll provide the test file to an Apryse team member when requested. In the meantime, here is some of the output I’ve been seeing. So far, the core dump output ends with one of the following messages:

  • malloc(): invalid size (unsorted)
  • malloc(): unsorted double linked list corrupted
  • corrupted double-linked list
  • corrupted size vs. prev_size

Here are some full outputs:

tmp/testing.rb:10: [BUG] Segmentation fault at 0x0000556f07633000
ruby 3.2.1 (2023-02-08 revision 31819e82c8) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0003 p:---- s:0015 e:000014 CFUNC  :OfficeToPDF
c:0002 p:0043 s:0008 E:001430 EVAL   tmp/testing.rb:10 [FINISH]
c:0001 p:0000 s:0003 E:002640 DUMMY  [FINISH]

-- Ruby level backtrace information ----------------------------------------
tmp/testing.rb:10:in `<main>'
tmp/testing.rb:10:in `OfficeToPDF'

-- Machine register context ------------------------------------------------
 RIP: 0x00007f2331b75741 RBP: 0x0000556f0701aa10 RSP: 0x00007ffd8810d918
 RAX: 0x0000556f06f70dd0 RBX: 0x0000556f06f6c758 RCX: 0x0000556f06f70da8
 RDX: 0xffffffffff93dde8 RDI: 0x0000556f07632fe0 RSI: 0x0000556f07632fe8
  R8: 0xfffffffffffffff0  R9: 0x0000556f06f70d20 R10: 0x0000556f06f70dc8
 R11: 0x0000556f06f70dd0 R12: 0x0000556f06989b08 R13: 0x0000000000000000
 R14: 0x0000000000000000 R15: 0x0000556f06ce0730 EFL: 0x0000000000010282

-- C level backtrace information -------------------------------------------
malloc(): invalid size (unsorted)

tmp/testing.rb:10: [BUG] Segmentation fault at 0x0000555e0e976000
ruby 3.2.1 (2023-02-08 revision 31819e82c8) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0003 p:---- s:0015 e:000014 CFUNC  :OfficeToPDF
c:0002 p:0043 s:0008 E:0001a0 EVAL   tmp/testing.rb:10 [FINISH]
c:0001 p:0000 s:0003 E:002030 DUMMY  [FINISH]

-- Ruby level backtrace information ----------------------------------------
tmp/testing.rb:10:in `<main>'
tmp/testing.rb:10:in `OfficeToPDF'

-- Machine register context ------------------------------------------------
 RIP: 0x00007fe9ed41974f RBP: 0x0000555e0e35dab0 RSP: 0x00007ffedd099548
 RAX: 0x0000555e0e2b3ef0 RBX: 0x0000555e0e2af878 RCX: 0x0000555e0e2b3ec8
 RDX: 0xffffffffff93df68 RDI: 0x0000555e0e975f80 RSI: 0x0000555e0e975f88
  R8: 0xfffffffffffffff0  R9: 0x0000555e0e2b3e40 R10: 0x0000555e0e2b3ee8
 R11: 0x0000555e0e2b3ef0 R12: 0x0000555e0dc6ad68 R13: 0x0000000000000000
 R14: 0x0000000000000000 R15: 0x0000555e0e023b40 EFL: 0x0000000000010286

-- C level backtrace information -------------------------------------------
malloc(): unsorted double linked list corrupted

tmp/testing.rb:10: [BUG] Segmentation fault at 0x0000563d36ac9000
ruby 3.2.1 (2023-02-08 revision 31819e82c8) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0003 p:---- s:0015 e:000014 CFUNC  :OfficeToPDF
c:0002 p:0043 s:0008 E:001820 EVAL   tmp/testing.rb:10 [FINISH]
c:0001 p:0000 s:0003 E:001950 DUMMY  [FINISH]

-- Ruby level backtrace information ----------------------------------------
tmp/testing.rb:10:in `<main>'
tmp/testing.rb:10:in `OfficeToPDF'

-- Machine register context ------------------------------------------------
 RIP: 0x00007f077401974a RBP: 0x0000563d3678c620 RSP: 0x00007ffcb70b1538
 RAX: 0x0000563d3678d480 RBX: 0x0000563d3678d4c8 RCX: 0x0000563d3678d458
 RDX: 0xffffffffffcc44d8 RDI: 0x0000563d36ac8fa0 RSI: 0x0000563d36ac8fa8
  R8: 0xffffffffffffffe0  R9: 0x0000563d3678d480 R10: 0x0000563d3678d478
 R11: 0x0000563d3678d480 R12: 0x0000563d361e3fd8 R13: 0x0000000000000000
 R14: 0x0000000000000000 R15: 0x0000563d3659b3b0 EFL: 0x0000000000010282

-- C level backtrace information -------------------------------------------
corrupted size vs. prev_size

These were all produced with the nightly “PDFNetC64_2023-06-26_stable.” I have also tested with v10.2.0 from your download page and I see similar segfaulting behavior. In addition, I tested with the Python3 SDK hosted at your PyPI index and with a test program built against the C++ SDK. They all result in segfaults.

Please provide a link to a minimal sample where the issue is reproducible:

See above.


I’ll also post the output from running a C++ test program with Valgrind. Here is the test program:

#include <iostream>
#include <cstdlib>
#include <string>
#include <PDF/Convert.h>
#include <PDF/PDFDoc.h>
#include <PDF/PDFNet.h>
#include <SDF/ObjSet.h>

using namespace pdftron;
using namespace PDF;
using namespace SDF;

int
main(int argc, char** argv)
{
        try
        {
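                // Guard against a missing argument before reading argv[1].
                if (argc < 2)
                {
                        std::cerr << "usage: " << argv[0] << " <office-file>" << std::endl;
                        return EXIT_FAILURE;
                }
                std::string filepath(argv[1]);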
                PDFNet::Initialize(std::getenv("PDFTRON_LICENSE_KEY"));
                PDFDoc doc;
                doc.Lock();
                Convert::OfficeToPDF(doc, UString(filepath), nullptr);
                doc.Save(filepath + ".pdf", SDFDoc::e_incremental);
                doc.Unlock();
                doc.Close();
        }
        catch (Common::Exception& e)
        {
                std::cerr << e << std::endl;
                return EXIT_FAILURE;
        }

        return EXIT_SUCCESS;
}

The output is too large to include inline, so see attached file:
valgrind.txt.gz (125.9 KB)


Hi James,

Thank you for the detailed information about the crash you are seeing on your side. It is really appreciated. Could you please forward the specific file that reproduces the issue to support@apryse.com? I can then take a look on my side.

Thank you.


Hi Shakthi. I just sent the file to support@apryse.com. Thanks!


I am having the same issue when calling OfficeToPDF on a specific file with the Python SDK. Are there any remediations? Or, failing that, how do I handle this error? Because I am using Python, it doesn’t seem like I can handle this gracefully. I can’t catch a segfault, so my entire program dies, which is quite bad. I can filter out that file ahead of time, but I can’t know which new files will cause segfaults.
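
One possible way to contain a crash like this is to run each conversion in a separate worker process and check how that process ended, so a segfault only kills the worker. The following is a minimal sketch in Ruby (the language of the test script in this thread), assuming a POSIX system. `run_isolated` is a hypothetical helper name, not part of the PDFNet API, and the worker command is whatever script performs the OfficeToPDF call.

```ruby
# frozen_string_literal: true

require 'rbconfig'

# Hypothetical helper (not part of PDFNet): runs `command` in a child process
# and reports whether it exited normally or was terminated by a signal.
def run_isolated(*command)
  pid = Process.spawn(*command)
  _, status = Process.wait2(pid)

  if status.signaled?
    # The child was killed by a signal (for example SEGV or ABRT).
    { ok: false, signal: Signal.signame(status.termsig) }
  else
    # The child exited on its own; a zero exit code counts as success.
    { ok: status.success?, exit_code: status.exitstatus }
  end
end
```

With the test script from this thread, the call would look something like `run_isolated(RbConfig.ruby, 'tmp/testing.rb', filepath)`, and the parent can log or skip the file when `ok` is false. Note that the reported signal is often ABRT rather than SEGV, since both Ruby’s [BUG] handler and glibc’s heap-corruption checks typically end by calling abort. The same idea should carry over to Python’s subprocess module, where a negative `returncode` indicates the child was terminated by a signal.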


@lucas It took them a little over a week after I sent them my test file, but they eventually rolled out a fix in the 2023-07-13 v10.2 nightly release. Unfortunately, we got more segfaults just a few days later and were then told to upgrade to 2023-07-26 v10.2. It’s been OK since, but we’re honestly still on high alert, just hoping it doesn’t happen again.


Thank you for the update @jamesl !


@jamesl Do you know how I would install that version for the Python SDK?


I just upgraded to the 10.3.0 (Aug 2nd) release and am getting the same segfaults.


Oh boy. Yeah, definitely open a new topic. Hopefully somebody from the Apryse support team responds and they’re able to release a new patch.

As for installing the nightly releases, we built the wrapper by following the directions in their README with the BUILD_PDFNetRuby=ON option. For Python, I’m assuming the only difference is setting BUILD_PDFNetPython=ON instead.


We got hit with this again earlier this week, and I just opened a new topic. Just letting you know in case the same issue is affecting us both.


Hello Lucas, this issue was determined to be file-specific, and a fix has been included in the latest version of the SDK. If you are still experiencing a segfault when trying to convert a DOCX file, please forward the file and any further details to support@apryse.com.

Thank you.
