Q: I'm trying to write a program that, given a PDF, looks at every
element on a page and reports whether the page contains any color.
I can see how to get color space info and text color, but I can't
figure out how to detect whether an image is in color or contains only
black-and-white values. The end result should be a page-by-page report
of whether each page contains color.
Can I implement this using PDFNet SDK?
-----
A: You can use PDFNet to determine whether a page contains any color.
Text and paths/vectors can be handled in the same way (e.g. using
element.GetGState().GetFillColorSpace()/GetStrokeColorSpace()/
GetFillColor()/GetStrokeColor(), similar to the code in
ElementReaderAdv - http://www.pdftron.com/net/samplecode.html#ElementReaderAdv).
For images, you can use image.GetImageData() to obtain the uncompressed
data stream. To make it easier to deal with different image formats
(color spaces, channels, sample sizes, etc.), you can use the Image2RGB
filter to normalize all image data to RGB. For example:
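// In C++
// Extract and convert image data to RGB 8-bpc format.
Image2RGB img_conv(image_element);
FilterReader reader(img_conv);
unsigned char buf[1024];  // receives the converted RGB bytes
size_t n;                 // number of bytes returned by each Read() call
while ((n = reader.Read(buf, sizeof(buf))) != 0) {
    ...
}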
// C# / Java code is similar
Image2RGB img_conv = new Image2RGB(image_element);
FilterReader reader = new FilterReader(img_conv);
while (...) {  // loop until no more data is returned
    reader.Read(....)
}
A simple and quick way to differentiate between color and grayscale
images is to check whether the R, G, and B values of each pixel are
equal (if pix[0] == pix[1] && pix[1] == pix[2] for every pixel, the
image is grayscale or mono; otherwise the image contains some color).
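
The per-pixel check can be sketched as a small helper that is independent
of PDFNet. This is only an illustration: the class name RgbColorScanner,
its methods, and the tolerance parameter are hypothetical and not part of
the SDK. It assumes the bytes arrive as interleaved 8-bit R, G, B samples
(which is what the Image2RGB filter is described as producing above), and
it keeps leftover bytes between calls because a 1024-byte buffer is not a
multiple of 3, so a pixel can be split across two reads.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not a PDFNet class): scans interleaved 8-bit RGB
// bytes fed in arbitrarily sized chunks and records whether any pixel has
// channels that differ by more than `tolerance`.
class RgbColorScanner {
public:
    // tolerance = 0 means strict equality (R == G and G == B). A small
    // positive value is an assumption that may help with lossy images,
    // where gray pixels can come out slightly tinted.
    explicit RgbColorScanner(int tolerance = 0)
        : tolerance_(tolerance), filled_(0), has_color_(false) {}

    // Feed the next chunk. Returns true as soon as a colored pixel has
    // been seen; scanning stops early after that.
    bool Feed(const unsigned char* data, std::size_t size) {
        for (std::size_t i = 0; i < size && !has_color_; ++i) {
            pixel_[filled_++] = data[i];
            if (filled_ == 3) {  // a full R, G, B triple is available
                filled_ = 0;
                has_color_ = IsColored(pixel_[0], pixel_[1], pixel_[2]);
            }
        }
        return has_color_;
    }

    bool HasColor() const { return has_color_; }

private:
    bool IsColored(int r, int g, int b) const {
        return std::abs(r - g) > tolerance_ ||
               std::abs(g - b) > tolerance_ ||
               std::abs(r - b) > tolerance_;
    }

    int tolerance_;
    unsigned char pixel_[3];  // bytes of the pixel currently being assembled
    std::size_t filled_;      // how many bytes of pixel_ are valid (0..2)
    bool has_color_;
};
```

With something like this, each chunk returned by reader.Read() in the
loop above could be passed to Feed() along with the number of bytes read,
and the loop could stop as soon as Feed() returns true, since a single
colored pixel is enough to mark the page as containing color.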