[Best Practices] How can I export inline images from the current document?

anon86032500 · May 8, 2014, 11:14am

Hello Support,

I would like some details about extracting images from a PDF document. I use the following code:

// Skip everything except images and inline images.
if (element.GetType() != Element.Type.e_image && element.GetType() != Element.Type.e_inline_image) continue;

// Transform the image's (1,1) corner into page coordinates.
var ctm = element.GetCTM();
double x2 = 1, y2 = 1;
ctm.Mult(ref x2, ref y2);

var result = (new pdftron.PDF.Image(element.GetXObject())).GetBitmap();

I found a topic in the group saying that element.GetXObject() is not the right approach for inline images, so I used this instead: Link. The problem is that the exported image is rotated. On the page it looks normal, but the extracted image is flipped. What is the best practice for extracting images in their unrotated, original form?

Regards

Daniel

agravesdale · May 8, 2014, 6:07pm

These SDK articles describe the correct way to extract inline images:

https://groups.google.com/d/msg/pdfnet-sdk/K1Po0-1vylU/uaeedGmYdKoJ
https://groups.google.com/d/msg/pdfnet-sdk/R_a0P8blPP8/plCls22kZ_AJ

It may be that the document reverses the image in the PDF content stream. If you forward the document to support@pdftron.com, we can take a closer look.

Another option for you could be to instead rasterize the page with PDFDraw:

http://www.pdftron.com/pdfnet/samplecode.html#PDFDraw

In that case, you would be certain that rotation would be identical to the original.

agravesdale · May 15, 2014, 1:40am

If you extract the image using ImageExtract, it will show you the coordinates of the image. This is how you could detect that the image is displayed upside-down:

→ Image: 1
Width: 451
Height: 294
BPC: 8
Coords: x1=137.43509, y1=336.31201, x2=353.51819, y2=195.25781
→ Image: 2
Width: 440
Height: 342
BPC: 8
Coords: x1=137.3755, y1=483.56018, x2=348.53619, y2=647.90338

The first image is upside-down: its y1 coordinate is larger than its y2 coordinate. If you want to flip such images automatically, you could detect this case and post-process the output. For example, using ImageMagick the command would be:

convert.exe upside_down_image.jpg -flip corrected_image.jpg
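
The y1 > y2 check described above could be sketched in C# as follows. This is only an illustration, not code from the ImageExtract sample: the ImageOrientation class and its method names are hypothetical, and it assumes that (x1, y1) and (x2, y2) are the page positions of the image's (0,0) and (1,1) corners, that the image is not rotated by 90 degrees or skewed, and that the page itself is not rotated.

```csharp
using System;

public static class ImageOrientation
{
    // Hypothetical helper: true when the (1,1) corner lands below the
    // (0,0) corner, i.e. the stored image must be flipped vertically.
    public static bool NeedsVerticalFlip(double y1, double y2)
    {
        return y1 > y2;
    }

    // Hypothetical helper: true when the (1,1) corner lands to the left
    // of the (0,0) corner, i.e. the stored image is mirrored horizontally.
    public static bool NeedsHorizontalFlip(double x1, double x2)
    {
        return x1 > x2;
    }

    // Both flips together are equivalent to a 180 degree rotation.
    public static string Describe(double x1, double y1, double x2, double y2)
    {
        bool flipX = NeedsHorizontalFlip(x1, x2);
        bool flipY = NeedsVerticalFlip(y1, y2);
        if (flipX && flipY) return "rotated 180";
        if (flipY) return "flipped vertically";
        if (flipX) return "flipped horizontally";
        return "upright";
    }
}
```

With the coordinates listed above, Describe reports "flipped vertically" for Image 1 and "upright" for Image 2. A comparison of two corners cannot distinguish 90 degree rotations, so it should be treated as a first check only.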

anon86032500 · May 15, 2014, 8:48am

Is that the complete interpretation of the coordinates, or are there other possible cases? When y1<y2 and x1<x2, the picture is not rotated, and when x1<2 the image is upside-down and must be rotated by 180 degrees? If these are the only possible cases, I can rotate the image after extracting it.

agravesdale · May 15, 2014, 8:19pm

If these are the only possible cases, I can rotate the image after extracting it.

Note also that the coordinates as calculated by the ImageExtract sample fail to account for any page rotation. It’s possible that page rotation could be used in a PDF to compensate for image rotation. (This should be very rare, but is possible.) You could compensate for the rotation when calculating coordinates (https://groups.google.com/d/msg/pdfnet-sdk/4sPgTwkaAoE/shXolsUDUs0J) or simply detect page rotation (http://www.pdftron.com/pdfnet/PDFNet/html/M_pdftron_PDF_Page_GetRotation.htm) and manually inspect the images to determine if further processing is required.
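
One way to fold page rotation into the check is sketched below. It is an assumption-laden illustration rather than the method from the linked posts: the DisplayedOrientation class is hypothetical, it takes the four linear components of the image CTM (the image's horizontal unit vector maps to (a, b), its vertical one to (c, d)), and it expects the page rotation as clockwise degrees, so converting the value returned by Page.GetRotation to an integer is left to the caller.

```csharp
using System;

public static class DisplayedOrientation
{
    // Hypothetical helper. Returns whether the displayed image is a mirror
    // image of the stored data, and the counterclockwise rotation (0, 90,
    // 180 or 270) applied on top of that once page rotation is included.
    public static (bool Mirrored, int RotationCcw) Classify(
        double a, double b, double c, double d, int pageRotation)
    {
        // A negative determinant means the transform contains a reflection.
        // Page rotation is a pure rotation, so it cannot cancel this out.
        bool mirrored = (a * d - b * c) < 0;

        // Counterclockwise angle of the image's horizontal axis on the page,
        // snapped to the nearest multiple of 90 degrees.
        double angle = Math.Atan2(b, a) * 180.0 / Math.PI;
        int snapped = (int)Math.Round(angle / 90.0) * 90;

        // Page rotation is clockwise, so it subtracts from the CCW angle.
        int total = (((snapped - pageRotation) % 360) + 360) % 360;
        return (mirrored, total);
    }
}
```

For example, an image drawn with a 180 degree rotation on a page that is itself rotated by 180 degrees comes out as (false, 0), so no post-processing would be needed. If Image 1 above had an axis-aligned CTM (a of about 216.08 and d of about -141.05, taken from the coordinate differences), the result would be (true, 0), which corresponds to a vertical flip.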

Ryan · January 13, 2016, 7:39pm

The C# ImageExtractTest sample has been updated to handle inline images, including flipping. Below are the relevant code snippets.

if (element.GetType() == Element.Type.e_inline_image)
{
    // Decode the inline image to 8-bit RGB and wrap it in a regular Image
    // so that it can be exported.
    Image2RGB image2rgb = new Image2RGB(element);
    FilterReader image_reader = new FilterReader(image2rgb);
    pdftron.PDF.Image image = pdftron.PDF.Image.Create(doc, image_reader,
        element.GetImageWidth(), element.GetImageHeight(), 8, ColorSpace.CreateDeviceRGB());

    image.Export(filename);
}

static byte[] FlipImage(Element element)
{
    // Decode the image to raw 8-bit RGB (3 bytes per pixel).
    Image2RGB image2rgb = new Image2RGB(element);
    int width = element.GetImageWidth();
    int height = element.GetImageHeight();
    int out_data_sz = width * height * 3;
    int stride = width * 3;
    FilterReader reader = new FilterReader(image2rgb);
    byte[] image_data = new byte[out_data_sz];
    byte[] flipped_data = new byte[out_data_sz];
    reader.Read(image_data);
    // Copy each source row to the opposite row position, reversing the row order.
    for (int row = 0; row < height; ++row)
    {
        Buffer.BlockCopy(image_data, row * stride, flipped_data, out_data_sz - (stride * (row + 1)), stride);
    }
    return flipped_data;
}
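
To check the row-reversal logic without a PDF at hand, the same loop can be run on a plain byte buffer. The sketch below is not part of the sample: RowFlip.FlipRows is a hypothetical helper that takes already decoded pixel data plus its dimensions, and it assumes rows are stored top to bottom with no padding between them.

```csharp
using System;

public static class RowFlip
{
    // Hypothetical helper: returns a copy of the buffer with the row order
    // reversed (a vertical flip). Pixels within each row are left untouched.
    public static byte[] FlipRows(byte[] data, int width, int height, int bytesPerPixel)
    {
        int stride = width * bytesPerPixel;
        if (data.Length != stride * height)
            throw new ArgumentException("Buffer size does not match the image dimensions.");

        byte[] flipped = new byte[data.Length];
        for (int row = 0; row < height; ++row)
        {
            // Source row 'row' goes to destination row 'height - 1 - row'.
            Buffer.BlockCopy(data, row * stride, flipped, (height - 1 - row) * stride, stride);
        }
        return flipped;
    }
}
```

For a 2 x 2 image with one byte per pixel, FlipRows(new byte[] { 1, 2, 3, 4 }, 2, 2, 1) yields { 3, 4, 1, 2 }. The destination offset used here is the same value as out_data_sz - (stride * (row + 1)) in FlipImage, only written in terms of the row index.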