How do I convert color PDF images to B/W?

Aaron_Gravesdale · July 16, 2008, 5:18pm

Q: I want to convert all images in a PDF to black and white images. Is
this possible? If so, is it possible for you to send me an example of how
to do this?
----
A: To render PDF pages as a monochrome (black & white) image, you only
need to set the 'ColorSpace' and 'BPC' parameters in the encoder 'hints'
dictionary.

For example (in C#, C/C++/Java/VB is similar):

ObjSet hint_set = new ObjSet(); // A collection of rendering 'hints'

// Initialize the render 'hint' parameter that controls the
// rendering process. In this case we tell the rasterizer to export
// the image as a grayscale, 1 Bit Per Component (BPC) image.
Obj hint = hint_set.CreateDict();
hint.PutName("ColorSpace", "Gray");
hint.PutNumber("BPC", 1);
...
pdfdraw.Export(page, "my.tif", "TIF", hint);

For full sample code, please take a look at the PDFDraw sample project
that comes as part of the SDK
(http://www.pdftron.com/net/samplecode.html#PDFDraw; search for "mono_hint").

If you would like to render a PDF page as a grayscale image
(instead of a 1 BPC monochrome image), simply omit the "BPC" parameter.
For example:

// 'gray_hint' tells the rasterizer to export the image as grayscale.
Obj gray_hint = hint_set.CreateDict();
gray_hint.PutName("ColorSpace", "Gray");
pdfdraw.Export(page, "my.png", "PNG", gray_hint);
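For reference, a color-to-grayscale conversion usually computes a weighted luminance for every pixel. The sketch below is plain C# and does not call PDFNet; the rasterizer's own formula may differ. It assumes 8-bit interleaved RGB input, and the class and method names (GrayConverter, RgbToGray) are hypothetical. It uses the common ITU-R BT.601 weights (0.299, 0.587, 0.114) in integer form.

```csharp
using System;

public static class GrayConverter
{
    // Hypothetical helper: converts interleaved 8-bit RGB samples
    // (R,G,B,R,G,B,...) into one 8-bit gray sample per pixel using the
    // ITU-R BT.601 luma weights 0.299, 0.587 and 0.114, scaled by 1000
    // and rounded to the nearest integer.
    public static byte[] RgbToGray(byte[] rgb)
    {
        if (rgb == null) throw new ArgumentNullException(nameof(rgb));
        if (rgb.Length % 3 != 0)
            throw new ArgumentException("RGB buffer length must be a multiple of 3.", nameof(rgb));

        var gray = new byte[rgb.Length / 3];
        for (int i = 0, p = 0; p < gray.Length; i += 3, p++)
        {
            int luma = 299 * rgb[i] + 587 * rgb[i + 1] + 114 * rgb[i + 2];
            gray[p] = (byte)((luma + 500) / 1000); // at most (255000 + 500) / 1000 = 255
        }
        return gray;
    }
}
```

For example, a pure-red pixel (255, 0, 0) maps to gray 76 and white (255, 255, 255) stays 255. Green contributes most to perceived brightness, so it gets the largest weight.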

Aaron_Gravesdale · July 17, 2008, 6:59pm

Q: I have PDFs that contain colour images and text. I want to loop
through all pages and convert all colour images to black and white
images. The output should be a new PDF with B/W images and text.

I'm using VB.NET.

So, if possible, could you give me a small example of how to:

read an image from the input stream,
convert it to black and white, and
write it to the output stream?

Thanks!
----
A: You can traverse and extract all images from a PDF document as
shown in the ImageExtract sample project. Since you are developing
under .NET, you can use System.Drawing.Bitmap to convert the extracted
bitmaps to grayscale/monochrome (http://www.bobpowell.net/grayscale.htm
or http://www.bobpowell.net/onebit.htm). A perhaps less elegant
alternative is to use ImageMagick for the image conversion.

The following discussion applies if you would also like to
replace the original images in the PDF with new (converted,
downscaled, ...) images:

You can then embed the converted image (i.e. the replacement image) in
the existing PDF using Image.Create(bitmap) (please see the AddImage
sample project for concrete sample code).

The final step is to swap the old PDF image with the new one.

One approach to this problem is illustrated in the following
pseudocode.

You can access all image XObjects used on the page through the page
resource dictionary (page.GetResources()):

pdftron.SDF.Obj res = page.GetResources();
if (res != null) {
  Obj images = res.FindObj("XObject");
  if (images != null) {
    DictIterator itr = images.GetDictIterator();
    for (; itr.HasNext(); itr.Next()) {
      Obj image = itr.Value();
      // Check whether the XObject is an image or a form; we are looking
      // only for raster images.
      Obj subtype = image.FindObj("Subtype");
      if (subtype != null && subtype.GetName() == "Image") { // We have a raster image
        pdftron.PDF.Image old_img = new pdftron.PDF.Image(image);
        pdftron.PDF.Image replacement_img = ... // convert/resample the old image or get a new image
        // ... and now replace the old image with the new one
        image.GetDoc().Swap(image.GetObjNum(),
            replacement_img.GetSDFObj().GetObjNum());
        // That's it!
      }
    }
  }
}

Another good example to look at is the JBIG2 sample project
(http://www.pdftron.com/net/samplecode.html#JBIG2). In this sample, all
monochrome images are replaced with JBIG2-optimized images. The main
difference from the above example is that the swapping procedure works on
the entire document, whereas the above example is limited to images
directly referenced from a given page (e.g. images referenced in Form
XObjects or annotations will not be processed).