Currently we’re using the PDFNet SDK (http://www.pdftron.com/pdfnet)
mainly to extract vector graphics, text, and images. We’re trying to
determine whether the PDFNet SDK can clip the content with an
arbitrarily rotated clipping region before we extract it. We would like
to extract only the content inside the crop box, with any content that
crosses one or more edges of the crop region trimmed accordingly. I’m
aware of the page crop and media boxes, but the page can only be rotated
in multiples of 90 degrees, and these boxes do not seem to have any
effect on the extracted content.
A: PDFNet does not offer a general polygon clipping API. Also, it is
not clear how such clipping would work for raster images, text, or form
XObjects. For paths (vector art), you could implement the required
functionality using a third-party library such as GPC (General Polygon
Clipper, http://www.cs.manchester.ac.uk/~toby/alan/software).
The library is small (two files), and the actual implementation
should be fairly short: use PDFNet to extract the vector outlines
(using element.GetPathPoints(), GetPointCount(), GetPathTypes(), and
GetPathTypesCount()), then use GPC to compute the intersection between
each path and the required clipping region.
How do I compute the intersection between the path in PDF and an arbitrary clipping region?
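When the clipping region is only a rotated crop box, it is always convex, so for closed, filled contours a general clipper is not strictly required: the classic Sutherland-Hodgman algorithm clips a polygon against a convex window one edge at a time. The Python sketch below uses that substitute technique, not GPC's API. rotated_box, clip_polygon, and polygon_area are hypothetical helpers, and the crop box is assumed to be rotated by an arbitrary angle about its center in a y-up coordinate space such as PDF user space. For a concave subject the output can contain zero-width connecting edges, and stroked open paths would need segment clipping instead; those are the cases where GPC (or a similar general clipper) remains the more robust choice.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotated_box(x0: float, y0: float, x1: float, y1: float,
                angle_deg: float) -> List[Point]:
    """Corners of the box [x0, x1] x [y0, y1] rotated about its center.

    Returned counter-clockwise, assuming y points up (as in PDF user space).
    """
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return [(cx + (x - cx) * cos_a - (y - cy) * sin_a,
             cy + (x - cx) * sin_a + (y - cy) * cos_a) for x, y in corners]


def _cross(a: Point, b: Point, p: Point) -> float:
    """> 0 if p is left of the directed edge a->b, < 0 if right, 0 if on it."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
    """Point where segment p->q crosses the infinite line through a and b."""
    dp, dq = _cross(a, b, p), _cross(a, b, q)
    t = dp / (dp - dq)  # only called when p and q are on opposite sides
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def clip_polygon(subject: List[Point], clip: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: clip `subject` against a convex CCW polygon `clip`."""
    output = list(subject)
    n = len(clip)
    for i in range(n):
        a, b = clip[i], clip[(i + 1) % n]
        polygon, output = output, []
        if not polygon:
            break  # nothing left to clip
        prev = polygon[-1]
        for cur in polygon:
            cur_in = _cross(a, b, cur) >= 0
            prev_in = _cross(a, b, prev) >= 0
            if cur_in:
                if not prev_in:
                    output.append(_intersect(prev, cur, a, b))  # entering
                output.append(cur)
            elif prev_in:
                output.append(_intersect(prev, cur, a, b))  # leaving
            prev = cur
    return output


def polygon_area(poly: List[Point]) -> float:
    """Absolute area via the shoelace formula (0.0 for an empty polygon)."""
    s = 0.0
    for i in range(len(poly)):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % len(poly)]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0
```

For example, clipping the square (0, 0)-(10, 10) against the same box rotated by 45 degrees yields an octagon with area 100(2√2 - 2), about 82.84; an empty list means the path lies entirely outside the rotated crop region.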