I want to know some details about extracting images from a PDF document. I use the following code:
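// Skip every element that is not an image XObject or an inline image
if (element.GetType() != Element.Type.e_image && element.GetType() != Element.Type.e_inline_image) continue;
// The CTM maps the image's unit square onto the page:
// (0,0) lands at (ctm.m_h, ctm.m_v) and (1,1) lands at (x2, y2)
var ctm = element.GetCTM();
double x2 = 1, y2 = 1;
ctm.Mult(ref x2, ref y2);
// Per the topic mentioned below, GetXObject() is not suitable for inline images
var result = (new pdftron.PDF.Image(element.GetXObject())).GetBitmap();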
I found a topic in the group saying that element.GetXObject() is not the right way to handle inline images, so I used this: (link). The problem is that the exported image is flipped: on the page it looks normal, but the extracted image is upside-down. So what is the best practice for extracting images in
It may be that the document draws the image reversed in the PDF content stream (for example, through a transformation matrix that flips it). If you forward the document to support@pdftron.com, we can take a closer look.
Another option is to rasterize the page with PDFDraw instead:
If you extract the images using the ImageExtract sample, it prints the coordinates of each image. This is how you could detect that an image is displayed upside-down:
The first image is upside-down: its y1 coordinate is larger than its y2 coordinate. If you want to flip such images automatically, you could detect this case and post-process the output. For example, using ImageMagick the command would be:
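If you would rather do the post-processing in C# than with an external tool, a minimal sketch of the detect-and-flip idea could look like this. IsUpsideDown and FlipVertical are hypothetical helper names, not PDFNet APIs, and the pixel buffer is assumed to be row-major with one int per pixel; only the y1 > y2 test comes from the reply above.

```csharp
using System;

// Hypothetical helpers (not part of PDFNet) for post-processing an image
// that the ImageExtract sample reports as upside-down.
public static class ImageFlip
{
    // (x1, y1) is where the image's (0,0) corner lands on the page and
    // (x2, y2) is where its (1,1) corner lands, as printed by ImageExtract.
    // A y1 larger than y2 means the image is drawn upside-down.
    public static bool IsUpsideDown(double y1, double y2) => y1 > y2;

    // Reverses the row order of a row-major pixel buffer, i.e. a vertical
    // mirror (the same effect as ImageMagick's -flip option).
    public static int[] FlipVertical(int[] pixels, int width, int height)
    {
        if (pixels.Length != width * height)
            throw new ArgumentException("Buffer size does not match width * height.");

        var flipped = new int[pixels.Length];
        for (int row = 0; row < height; row++)
        {
            // Copy source row 'row' into destination row 'height - 1 - row'.
            Array.Copy(pixels, row * width, flipped, (height - 1 - row) * width, width);
        }
        return flipped;
    }
}
```

If you already hold the System.Drawing.Bitmap returned by GetBitmap(), calling Bitmap.RotateFlip(RotateFlipType.RotateNoneFlipY) on it performs the same vertical mirror in place.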
In reply to Aaron's message of Thursday, 15 May 2014, 03:40:50 UTC+2: Is that the complete interpretation of the coordinates, or are there other possible cases? When y1<y2 and x1<x2, the picture is not rotated, and when x1<2, the image is upside-down and must be rotated by 180 degrees? If these are the only possible cases, I can rotate the image after extracting it.
Note also that the coordinates calculated by the ImageExtract sample do not account for page rotation. A PDF could use page rotation to compensate for image rotation (this should be very rare, but it is possible). You could compensate for the rotation when calculating coordinates (https://groups.google.com/d/msg/pdfnet-sdk/4sPgTwkaAoE/shXolsUDUs0J) or simply detect page rotation (http://www.pdftron.com/pdfnet/PDFNet/html/M_pdftron_PDF_Page_GetRotation.htm) and manually inspect the images to determine whether further processing is required.
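On the follow-up question of whether the corner coordinates cover every case: they do not. A 90° clockwise rotation also gives y1 > y2 with x1 < x2, just like the upside-down case, and a 90° counterclockwise rotation gives the same signs as a left-right mirror. A fuller check inspects the linear part [a b c d] of the CTM and folds in the page rotation mentioned above. This is a hedged sketch: ImageOrientation and FromCtm are hypothetical names, it assumes an unsheared CTM whose rotation is a multiple of 90°, and it assumes PDFNet's Matrix2D exposes those entries as m_a, m_b, m_c and m_d (worth confirming in the API reference).

```csharp
using System;

// Hypothetical helper (not part of PDFNet): describes how to turn the
// extracted bitmap so it matches the page, from the CTM entries a, b, c, d
// and the page rotation in degrees (clockwise, as in the PDF /Rotate key).
public readonly struct ImageOrientation
{
    // True if the placement includes a mirror: flip the bitmap vertically first...
    public bool Mirrored { get; }
    // ...then rotate it counterclockwise by this many degrees (0, 90, 180 or 270).
    public int RotationCcw { get; }

    public ImageOrientation(bool mirrored, int rotationCcw)
    {
        Mirrored = mirrored;
        RotationCcw = rotationCcw;
    }

    // Assumes no shear: the CTM is a scale, an optional mirror and a
    // rotation by a multiple of 90 degrees.
    public static ImageOrientation FromCtm(double a, double b, double c, double d, int pageRotation = 0)
    {
        // A negative determinant means the transform contains a reflection.
        bool mirrored = a * d - b * c < 0;

        // Direction of the image's rows (its x axis) in PDF user space.
        double degrees = Math.Atan2(b, a) * 180.0 / Math.PI;
        int snapped = (int)Math.Round(degrees / 90.0) * 90;

        // A clockwise page rotation cancels that much counterclockwise
        // image rotation; normalize the result into [0, 360).
        int rotation = ((snapped - pageRotation) % 360 + 360) % 360;
        return new ImageOrientation(mirrored, rotation);
    }
}
```

In the loop from the question this might be called as ImageOrientation.FromCtm(ctm.m_a, ctm.m_b, ctm.m_c, ctm.m_d, pageDegrees), where pageDegrees is the value from Page.GetRotation() converted to degrees. The upside-down image from the thread comes out as "mirrored, no rotation", which is exactly a vertical flip.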