How do I find the compression method used on a PDF image?

Q:

I’m parsing a PDF before converting it to SVG and need to determine the format of each embedded image (GIF/PNG/TIFF). I can’t figure out which property exposes the image name or type. Or are all images embedded in a PDF converted to PNG, even if they originated as JPG?

Here is what I have so far. I would expect something like one of the two commented-out calls to exist, but I can’t find anything like them.

using (PDFDoc doc = new PDFDoc(fileName)) {
    for (int j = 1; j <= doc.getPageCount(); j++) {
        Page page = doc.getPage(j);
        using (ElementReader reader = new ElementReader()) {
            reader.begin(page);
            Element elem;
            while ((elem = reader.next()) != null) {
                // Handle both inline images and image XObjects.
                if (elem.getType() == Element.e_inline_image ||
                    elem.getType() == Element.e_image) {
                    //System.out.println("******** Image Type: " + elem.getImageType());
                    //System.out.println("******** Image Filename: " + elem.getImageFileName());
                    int totalResolution = elem.getImageHeight() * elem.getImageWidth();
                    System.out.println("******** Image Resolution: " + totalResolution);
                }
            }
        }
    }
}

A:

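PDF does not store images as GIF, PNG, or TIFF files. Instead, image data is stored as a stream encoded with one of the standard PDF filters: FlateDecode (zlib), DCTDecode (JPEG), JPXDecode (JPEG 2000), and so on. An image that originated as a JPEG is therefore usually still stored as DCTDecode rather than being converted to PNG. You can refer to the PDF specification for more information: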

http://partners.adobe.com/public/developer/en/pdf/PDFReference.pdf - Table 3.5 - Standard Filters

To determine the filter type of a specific image, you can do something like this:

Image img(elem.GetImageXObject());
BaseCompressionType type = GetBaseCompressionType(img); // helper defined below

enum BaseCompressionType {
    e_base_jbig2,
    e_base_ccitt,
    e_base_jpeg,
    e_base_jp2,
    e_base_flate,
    e_base_lzw,
    e_base_run_length,
    e_base_other
};

#include <string>

// Maps a PDF filter name (without the leading slash) to a compression type.
BaseCompressionType GetCompressionTypeFromName(const std::string& name)
{
    if (name == "JPXDecode") return e_base_jp2;
    else if (name == "DCTDecode") return e_base_jpeg;
    else if (name == "JBIG2Decode") return e_base_jbig2;
    else if (name == "CCITTFaxDecode") return e_base_ccitt;
    else if (name == "FlateDecode") return e_base_flate;
    else if (name == "LZWDecode") return e_base_lzw;
    else if (name == "RunLengthDecode") return e_base_run_length;
    else return e_base_other;
}

// Inspects the image's Filter entry, which is either a single name
// or an array of names (a filter chain).
BaseCompressionType GetBaseCompressionType(Image img)
{
    Obj xobject = img.GetSDFObj();
    BaseCompressionType ret = e_base_other;
    if (!xobject) return ret;

    Obj f = xobject.FindObj("Filter");
    if (!f) return ret; // no filter: the stream is stored uncompressed
    if (f.IsName())
        return GetCompressionTypeFromName(f.GetName());
    else if (f.IsArray())
    {
        int sz = static_cast<int>(f.Size());
        for (int i = 0; i < sz; ++i) {
            if (f.GetAt(i).IsName()) {
                BaseCompressionType t = GetCompressionTypeFromName(f.GetAt(i).GetName());
                if (t != e_base_other) {
                    ret = t;
                    // An image-specific codec identifies the image format, so stop here.
                    if (ret == e_base_jpeg || ret == e_base_jp2 || ret == e_base_jbig2 || ret == e_base_ccitt)
                        return ret;
                }
            }
        }
    }
    return ret;
}
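
Since the question uses the Java API, here is a rough Java sketch of the same filter-chain logic. It deliberately works on a plain list of filter names rather than on PDFNet objects, so how you pull the names out of the image's Filter entry is left to you. The class name FilterChain, the BaseCompression enum, and both method names are made up for this illustration and are not part of any PDFTron API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a PDFTron class: classifies a chain of PDF filter names.
final class FilterChain {
    enum BaseCompression { JBIG2, CCITT, JPEG, JP2, FLATE, LZW, RUN_LENGTH, OTHER }

    // Maps a single filter name (without the leading slash) to a compression type.
    static BaseCompression fromName(String name) {
        switch (name) {
            case "JPXDecode":       return BaseCompression.JP2;
            case "DCTDecode":       return BaseCompression.JPEG;
            case "JBIG2Decode":     return BaseCompression.JBIG2;
            case "CCITTFaxDecode":  return BaseCompression.CCITT;
            case "FlateDecode":     return BaseCompression.FLATE;
            case "LZWDecode":       return BaseCompression.LZW;
            case "RunLengthDecode": return BaseCompression.RUN_LENGTH;
            default:                return BaseCompression.OTHER;
        }
    }

    // Walks the chain in order. An image-specific codec is returned as soon as
    // it is seen; otherwise the last recognized generic filter is returned.
    static BaseCompression classify(List<String> filters) {
        BaseCompression result = BaseCompression.OTHER;
        for (String name : filters) {
            BaseCompression t = fromName(name);
            if (t == BaseCompression.OTHER) continue; // e.g. ASCII85Decode
            result = t;
            if (t == BaseCompression.JPEG || t == BaseCompression.JP2
                    || t == BaseCompression.JBIG2 || t == BaseCompression.CCITT) {
                return result;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classify(Arrays.asList("ASCII85Decode", "DCTDecode"))); // JPEG
        System.out.println(classify(Arrays.asList("LZWDecode", "FlateDecode")));   // FLATE
    }
}
```

An image with no Filter entry corresponds to an empty list here, which yields OTHER, matching the behaviour of the function above.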

When exporting images from PDF, the pdftron.PDF.Image.Export() method will choose the appropriate format based on the embedded image’s format. Alternatively, you can use Image.ExportAsTiff or Image.ExportAsPng to enforce a given format.
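
If you force a format yourself and need to name the output files, a small mapping from compression type to file extension may help. The sketch below is only one plausible convention and makes no claim about how Image.Export() decides internally: the ExportNaming class, its enum, and the choice of "tif" for bi-level codecs and "png" as the lossless fallback are all assumptions made for this example.

```java
// Hypothetical helper, not a PDFTron class: picks an output file extension
// from the compression type detected for an embedded image.
final class ExportNaming {
    enum BaseCompression { JBIG2, CCITT, JPEG, JP2, FLATE, LZW, RUN_LENGTH, OTHER }

    // Assumed convention: keep JPEG and JPEG 2000 data in their native
    // containers, use TIFF for bi-level fax-style codecs, PNG for the rest.
    static String extensionFor(BaseCompression c) {
        switch (c) {
            case JPEG:  return "jpg";
            case JP2:   return "jp2";
            case CCITT:
            case JBIG2: return "tif";
            default:    return "png";
        }
    }

    // Builds a file name such as "page3_img1.jpg" from a base name.
    static String outputName(String base, BaseCompression c) {
        return base + "." + extensionFor(c);
    }
}
```

For example, ExportNaming.outputName("page3_img1", ExportNaming.BaseCompression.FLATE) gives a ".png" name, which would pair naturally with Image.ExportAsPng.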