PDF File Size Exponentially Grows on Continued Processing

Product: Apryse SDK

Product Version: 10.1.1

Please give a brief summary of your issue:

After several rounds of re-processing a PDF file that contains images, the output file grows roughly 10 times in size on each save.

Please describe your issue and provide steps to reproduce it:

I am building a Django application in Python, and I have an endpoint where I pass a PDF file blob to the backend for processing. The PDF processing is meant to standardize every page to an 8.5" x 11" format, scaling down large pages/images that exceed those dimensions, and simply placing content that is smaller than those dimensions on a standard-size page. Some image normalization is also desirable to improve the user experience around processing, viewing, and downloading the relevant PDFs. Here is my Python function doing the processing work:

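from io import BytesIO

# Assumes the pip-installed Apryse package; adjust the import to match your install.
from apryse_sdk import *


def resize_pdf_pages(file_field, standard_width=612, standard_height=792):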
    new_doc = PDFDoc()

    with file_field.open(mode='rb') as file:
        # Read through the handle returned by the context manager
        file_bytes = file.read()

        byte_arr = bytearray(file_bytes)
        doc = PDFDoc(byte_arr, len(byte_arr))
        doc.InitSecurityHandler()

        page_resized = False

        for page in range(1, doc.GetPageCount() + 1):
            current_page = doc.GetPage(page)
            media_box = current_page.GetMediaBox()
            page_width = media_box.x2
            page_height = media_box.y2

            # Simply copy page if it is standard sized
            if page_width == standard_width and page_height == standard_height:
                new_doc.InsertPages(page, doc, page, page, PDFDoc.e_none)
            else:
                page_resized = True
                # Scale based on larger dimension, if needed, otherwise it will
                # already fit on the standard page size
                s = Stamper(Stamper.e_absolute_size, page_width, page_height)

                if page_width > standard_width or page_height > standard_height:
                    s = Stamper(Stamper.e_relative_scale, 1, 1)

                new_page = new_doc.PageCreate(Rect(0, 0, standard_width, standard_height))
                new_doc.PagePushBack(new_page)

                s.SetAsBackground(True)
                s.SetAlignment(Stamper.e_horizontal_center, Stamper.e_vertical_center)
                s.StampPage(new_doc, current_page, PageSet(page))

        # The with block closes the file on exit, so only the source document needs closing
        doc.Close()

    if page_resized:
        image_settings = ImageSettings()
        image_settings.SetCompressionMode(ImageSettings.e_jpeg)
        image_settings.SetQuality(10)
        image_settings.SetImageDPI(300, 200)
        image_settings.ForceRecompression(True)

        opt_settings = OptimizerSettings()
        opt_settings.SetColorImageSettings(image_settings)
        opt_settings.SetGrayscaleImageSettings(image_settings)

        Optimizer.Optimize(new_doc, opt_settings)

        buffer = new_doc.Save(SDFDoc.e_linearized)
    else:
        buffer = new_doc.Save(0)

    new_doc.Close()
    buffer = BytesIO(buffer)
    return buffer

As you can see, I skip the normalization work when I can, which is only the case when every page is already standard size. On the front end, after a number of save operations (usually around 5) to the same Django model that stores the file (each of which invokes this function), the PDF file size starts growing by a factor of 10 on each save. Could this be caused by the optimization code, which runs on every page whenever the PDF contains a page larger or smaller than the standard size? After calling this function, I also clean up the returned buffer with buffer.close().
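For reference, the sizing rule described above (shrink pages that exceed 612 x 792 points, leave smaller ones at their original size) can be sketched without the SDK. This is only an illustration: the helper names `is_standard_size` and `fit_scale` and the 0.5 point tolerance are assumptions, not part of the function above, which compares the dimensions with exact equality.

```python
STANDARD_WIDTH = 612.0   # 8.5 in at 72 points per inch
STANDARD_HEIGHT = 792.0  # 11 in at 72 points per inch


def is_standard_size(width, height, tolerance=0.5):
    """Return True when the page is within `tolerance` points of 612 x 792.

    The tolerance is an assumed value; it guards against page boxes whose
    floating-point dimensions are very close to, but not exactly, standard.
    """
    return (abs(width - STANDARD_WIDTH) <= tolerance
            and abs(height - STANDARD_HEIGHT) <= tolerance)


def fit_scale(width, height):
    """Return the uniform scale factor that fits a page inside 612 x 792.

    Pages that already fit keep their original size (factor 1.0); larger
    pages are scaled down by whichever dimension overflows the most.
    """
    if width <= 0 or height <= 0:
        raise ValueError("page dimensions must be positive")
    return min(1.0, STANDARD_WIDTH / width, STANDARD_HEIGHT / height)
```

For example, a 1224 x 1584 point page gets a factor of 0.5, while a 300 x 400 point page keeps a factor of 1.0.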

I would also like to add that I am using the demo license while developing locally.

Thank you for contacting us about this. One suggestion is to first import all the pages in one call (see sample 1 in the sample code here) and then apply your stamping logic on the new document.

If you are still experiencing the size issue, can you please modify the sample project to illustrate the issue?
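One way to illustrate the issue outside Django is to feed the output of each pass back in as the next input and record the sizes. The sketch below is SDK-independent and makes some assumptions: `measure_growth` and `growth_ratios` are hypothetical helper names, and `process` stands for any callable that takes PDF bytes and returns PDF bytes.

```python
from typing import Callable, List


def measure_growth(process: Callable[[bytes], bytes],
                   data: bytes,
                   rounds: int = 8) -> List[int]:
    """Run `process` repeatedly on its own output and record each size.

    The first entry is the size of the original input; each following
    entry is the size after one more pass.
    """
    sizes = [len(data)]
    for _ in range(rounds):
        data = process(data)
        sizes.append(len(data))
    return sizes


def growth_ratios(sizes: List[int]) -> List[float]:
    """Return the size ratio between each pair of consecutive passes."""
    return [after / before
            for before, after in zip(sizes, sizes[1:])
            if before > 0]
```

To use this with the function from the question, a thin adapter that accepts and returns bytes would be needed (not shown here). A ratio that stays near 1.0 and then jumps to about 10 would pinpoint the pass at which the growth starts.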

Thank you in advance.