Product: PDFTron.NETCore.Windows.x64
Product Version: 11.3.0
Please give a brief summary of your issue:
We are currently using the following API to create a PDFDoc from a PDF file stream:
var pdfDocument = new PDFDoc(pdfFileStream);
We use this pdfDocument to extract the annotations from the pdf file.
However, when working with a stream from a large PDF file (greater than 2GB in size), we encountered the error “Stream was too long.” when attempting to save the document back to the stream:
pdfDocument.Save(pdfFileStream, pdftron.SDF.SDFDoc.SaveOptions.e_linearized);
It was recommended that we use a file path instead of a stream for files larger than 2GB. However, in our case we are working with Azure Blob Storage and do not have a local file path; we only have access to the file stream.
Is it possible to continue using streams for files over 2GB when working with PDFTron Server?
Are there alternative approaches or optimizations available for handling large PDF files via streams, especially when file path access is not an option?
Regards,
Zaid.
Hello, I’m AiRon, an automated tech support bot 
While you wait for one of our customer support representatives, I can help you with your question.
When dealing with PDF files over 2GB through streams, .NET itself imposes limits: for example, a MemoryStream is backed by a byte array whose length cannot exceed Int32.MaxValue (roughly 2GB), which is what produces errors like “Stream was too long.” When working with Azure Blob Storage, where direct file paths are unavailable, you need to consider alternative approaches to manage large files efficiently.
Some things you can try:
- Instead of using a single large stream, consider breaking down the PDF into smaller chunks. This can help manage memory usage and avoid stream size limitations.
- Download the blob to a temporary local file on disk, process it there by file path, and then delete it after processing (see the sketch after this list).
- Consider using the PDFTron SDK to handle the PDF file directly from Azure Blob Storage by leveraging Azure’s SDK to manage large files more efficiently.
- Explore using a memory-mapped file to simulate file path access, allowing the PDFTron library to work with file-like access without needing an actual file path.
- Check if the PDFTron library provides any specific methods optimized for handling large files or streams, especially in cloud environments.
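As a minimal sketch of the temporary-file approach above, assuming the Azure.Storage.Blobs package; the connection string, container, and blob names are placeholders, and the annotation-extraction step is left as a comment:

using System.IO;
using Azure.Storage.Blobs;
using pdftron.PDF;
using pdftron.SDF;

// PDFNet.Initialize(...) is assumed to have been called elsewhere, as in your existing setup.
// Placeholder values; substitute your own storage account details.
string connectionString = "<storage-connection-string>";
var blobClient = new BlobClient(connectionString, "my-container", "large.pdf");

// Download the large blob to a temporary file so the document can be opened by path,
// avoiding the ~2GB limit on managed streams.
string tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".pdf");
try
{
    blobClient.DownloadTo(tempPath);            // streams the blob to disk in chunks

    using (var doc = new PDFDoc(tempPath))      // file-path constructor, no in-memory buffer
    {
        doc.InitSecurityHandler();
        // ... extract annotations here ...
        doc.Save(tempPath, SDFDoc.SaveOptions.e_linearized);
    }

    // Optionally upload the processed file back to Blob Storage.
    blobClient.Upload(tempPath, overwrite: true);
}
finally
{
    if (File.Exists(tempPath))
        File.Delete(tempPath);
}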
Hopefully, this helps you solve your problem while you wait for a human to get back to you.
Hello,
When working with streams of 2GB or more, you can run into limitations simply because the stream is too large to handle properly; this can happen for various reasons, such as limits within .NET itself.
If you are receiving such large file streams, the automated response above offers some good workarounds that may work for you. Highlighting some of these: