Product: Apryse SDK
Product Version: 10.1.1
Please give a brief summary of your issue:
After a PDF file containing images has been re-processed several times, the output file size starts growing roughly tenfold on each save.
Please describe your issue and provide steps to reproduce it:
I am building a Django application in Python with an endpoint that receives a PDF file blob for backend processing. The processing is meant to standardize every page to 8.5" x 11" (612 x 792 points): pages or images that exceed those dimensions are scaled down, and pages that are smaller are simply placed on a standard-size page. Some image normalization is also desirable to improve the experience of processing, viewing, and downloading the PDFs. Here is my Python function doing the processing work:
    from io import BytesIO

    # Import path assumed (pip package apryse-sdk); older installs use PDFNetPython3.
    from apryse_sdk import *


    def resize_pdf_pages(file_field, standard_width=612, standard_height=792):
        new_doc = PDFDoc()
        with file_field.open(mode='rb') as file:
            file_bytes = file.read()
            byte_arr = bytearray(file_bytes)
            doc = PDFDoc(byte_arr, len(byte_arr))
            doc.InitSecurityHandler()
            page_resized = False
            for page in range(1, doc.GetPageCount() + 1):
                current_page = doc.GetPage(page)
                media_box = current_page.GetMediaBox()
                page_width = media_box.x2
                page_height = media_box.y2
                # Simply copy page if it is standard sized
                if page_width == standard_width and page_height == standard_height:
                    new_doc.InsertPages(page, doc, page, page, PDFDoc.e_none)
                else:
                    page_resized = True
                    # Scale based on larger dimension, if needed, otherwise it will
                    # already fit on the standard page size
                    s = Stamper(Stamper.e_absolute_size, page_width, page_height)
                    if page_width > standard_width or page_height > standard_height:
                        s = Stamper(Stamper.e_relative_scale, 1, 1)
                    new_page = new_doc.PageCreate(Rect(0, 0, standard_width, standard_height))
                    new_doc.PagePushBack(new_page)
                    s.SetAsBackground(True)
                    s.SetAlignment(Stamper.e_horizontal_center, Stamper.e_vertical_center)
                    s.StampPage(new_doc, current_page, PageSet(page))
            # The with block closes the file on exit
            doc.Close()
        if page_resized:
            # Recompress all color and grayscale images as low-quality JPEG
            image_settings = ImageSettings()
            image_settings.SetCompressionMode(ImageSettings.e_jpeg)
            image_settings.SetQuality(10)
            image_settings.SetImageDPI(300, 200)
            image_settings.ForceRecompression(True)
            opt_settings = OptimizerSettings()
            opt_settings.SetColorImageSettings(image_settings)
            opt_settings.SetGrayscaleImageSettings(image_settings)
            Optimizer.Optimize(new_doc, opt_settings)
            buffer = new_doc.Save(SDFDoc.e_linearized)
        else:
            buffer = new_doc.Save(0)
        new_doc.Close()
        buffer = BytesIO(buffer)
        return buffer
As you can see, I skip the normalization work when I can, which is only possible when every page is already standard size. In the front end, after a number of save operations (usually around 5) to the same Django model that stores the file (each of which invokes this function), the PDF file size starts growing roughly tenfold per save. Could this be caused by the optimization code being run again on every page whenever the PDF contains a page larger or smaller than the standard size? After calling this function, I also clean up the returned buffer with buffer.close().
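To reproduce the growth outside the front end, the repeated saves can be simulated by feeding each output back in as the next input and recording the sizes. The following is only a minimal sketch: `track_size_growth` and `growth_ratios` are hypothetical helper names, and `process` stands for any bytes-in, bytes-out wrapper. Since `resize_pdf_pages` expects a Django file field, it would need a small adapter to be used here.

```python
from typing import Callable, List


def track_size_growth(process: Callable[[bytes], bytes],
                      pdf_bytes: bytes,
                      passes: int = 8) -> List[int]:
    """Run `process` repeatedly on its own output; return each output size in bytes."""
    sizes = []
    current = pdf_bytes
    for _ in range(passes):
        current = process(current)  # output of this pass is the input of the next
        sizes.append(len(current))
    return sizes


def growth_ratios(sizes: List[int]) -> List[float]:
    """Ratio of each size to the previous one; values near 10 would match the reported symptom."""
    return [after / before for before, after in zip(sizes, sizes[1:]) if before > 0]
```

Printing `growth_ratios(track_size_growth(adapter, original_bytes))` should show at which pass the size starts to jump, which would make it easier to tell whether the stamping step or the optimizer step is responsible.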
I would also like to add that I am using the demo license while developing locally.
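To make the intended sizing rule explicit, here is a small SDK-free sketch of it. It differs from my function above in two ways that are assumptions on my part: it derives the page size from all four media box coordinates rather than reading x2 and y2 directly, and it compares against the standard size with a tolerance (0.5 points, chosen arbitrarily) instead of exact equality. All names are hypothetical.

```python
import math

STANDARD_WIDTH = 612.0   # 8.5 in at 72 points per inch
STANDARD_HEIGHT = 792.0  # 11 in at 72 points per inch


def box_size(x1, y1, x2, y2):
    """Width and height of a box, independent of where its origin sits."""
    return abs(x2 - x1), abs(y2 - y1)


def is_standard(width, height, tol=0.5):
    """True if the page is standard size within `tol` points (tolerance is an assumption)."""
    return (math.isclose(width, STANDARD_WIDTH, abs_tol=tol)
            and math.isclose(height, STANDARD_HEIGHT, abs_tol=tol))


def fit_scale(width, height):
    """Uniform scale factor that fits the page on a standard page; 1.0 if it already fits."""
    if width <= 0 or height <= 0:
        raise ValueError("page dimensions must be positive")
    return min(1.0, STANDARD_WIDTH / width, STANDARD_HEIGHT / height)
```

For example, a 1224 x 1584 point page would be scaled by 0.5, while a 300 x 400 point page would keep its size and only be centered on the standard page.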