Please give a brief summary of your issue:
I’m trying to convert a single page of a PDF to an image (eventual goal is a base64 encoded string) with the .NET API. But finding a way to do this that is cross-platform, in memory, and that actually works has been a challenge.
The current (working) code I have looks like this:
using var pdfDraw = new PDFDraw(ThumbnailResolutionDpi);
string tempFile = Path.GetRandomFileName() + ".jpg";
var hintSet = new ObjSet();
Obj encoderParam = hintSet.CreateDict();
encoderParam.PutNumber("Quality", ThumbnailUtils.ThumbnailJpegQuality);
pdfDraw.Export(firstPage, tempFile, "JPEG", encoderParam);
byte[] imageBytes = await File.ReadAllBytesAsync(tempFile, token);
File.Delete(tempFile);
result.ThumbnailBase64 = Convert.ToBase64String(imageBytes);
Where firstPage is a PDFTron.PDF.Page object.
The issue is… this involves writing a pointless temp file to disk, reading from it, then deleting it. Is there a way I can do this without relying on Export (or any platform-specific code)?
I tried the ExportAsPng() and GetBitmap() methods described here, but didn’t have any luck. I’m probably missing something simple there.
Hello, I’m AiRon, an automated tech support bot. While you wait for one of our customer support representatives, I can help you with your question.
When converting a PDF to an image in the .NET API without writing to disk, you can use the Export method with an in-memory stream. This avoids the need for temporary files and provides a cross-platform solution.
Some things you can try:
Use PDFDraw to render the page to an in-memory stream instead of a file.
Initialize an in-memory stream (for example, a MemoryFilter) and pass it to the Export method.
Convert the MemoryStream data to a byte array, then encode it to a Base64 string.
Ensure proper disposal of objects to manage resources efficiently.
Here are some resources that might help you:
.NET Core PDF library integration: This guide provides comprehensive instructions to integrate Apryse SDK into .NET Core applications.
PDFDraw API Documentation: Detailed documentation on how to use the PDFDraw class for rendering PDF pages.
Working with PDF Forms: Learn about handling PDF forms using Apryse SDK, which might offer insights into working with PDF content.
Barcode Extraction Guide: Though focused on barcodes, this guide might provide useful tips on handling data in memory.
Hopefully, this helps you solve your problem while you wait for a human to get back to you.
I did try this before, using the MemoryFilter class. There is one issue I have with this though. To construct a MemoryFilter, I need to specify the buffer size in advance (unlike a .NET MemoryStream). Of course, I don’t know the exact size the image will be yet. In the example you linked, the buffer size was set to 64 KiB. I’m curious if there’s a good heuristic I should be following here.
Although, the documentation for MemoryFilter says “MemoryFilter makes sure that there is enough space to accomodate all data”. So is that buffer size parameter not actually important for a sink? If so, this works great for me, I would just comment in my code so that’s clear.
Just tested it out, and can confirm. The MemoryFilter worked A-OK even with a buffer size of “0”. The fact that it’s a required parameter is just a bit misleading. This works perfectly for me, thanks for your help!
Actually, while I’m here, one other thing tripped me up. The value of the is_input parameter doesn’t do what I’d expect. I thought true = sink (aka an input stream). I must have misunderstood something there.
Sorry if the documentation is unclear. The buffer size parameter is just the initial capacity of the stream. It will automatically grow as needed.
As for the is_input flag, false can be thought of as meaning sink, that is, the memory filter is being filled with data. true then means it is a source (to be read from).
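For anyone landing on this thread later, here is a rough sketch of how the pieces discussed above might fit together: export to a MemoryFilter used as a sink, then Base64-encode the bytes. It is not verified against a specific SDK version. The class and method names (ThumbnailSketch, RenderPageToBase64) are made up for illustration, the pdftron.* namespaces are assumed, and the Export overload that takes a Filter plus encoder parameters and the GetBuffer() call returning a byte[] are assumptions that should be checked against the API reference for your version.

```csharp
using System;
using pdftron.Filters;
using pdftron.PDF;
using pdftron.SDF;

// Hypothetical helper, not part of the SDK.
public static class ThumbnailSketch
{
    // Renders a single page to JPEG in memory and returns it as a Base64 string.
    // Assumes the SDK has already been initialized elsewhere.
    public static string RenderPageToBase64(Page page, double dpi, int jpegQuality)
    {
        using var pdfDraw = new PDFDraw(dpi);

        // Same encoder hints as the temp-file version earlier in the thread.
        var hintSet = new ObjSet();
        Obj encoderParam = hintSet.CreateDict();
        encoderParam.PutNumber("Quality", jpegQuality);

        // The first argument is only the initial capacity; per the thread the
        // buffer grows as needed, so 0 is acceptable.
        // is_input: false means the filter is a sink (it gets filled with data).
        var sink = new MemoryFilter(0, false);

        // Assumption: an Export overload accepting a Filter and encoder params.
        pdfDraw.Export(page, sink, "JPEG", encoderParam);

        // Assumption: GetBuffer() returns the written bytes as a byte[].
        byte[] imageBytes = sink.GetBuffer();
        return Convert.ToBase64String(imageBytes);
    }
}
```

Usage would then be something like result.ThumbnailBase64 = ThumbnailSketch.RenderPageToBase64(firstPage, ThumbnailResolutionDpi, ThumbnailUtils.ThumbnailJpegQuality); with no temp file involved.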