Q:
Our scenario is this:
· Input file is a layered PDF (normally one page, but could be more)
· We need to check that a particular layer has live (not outlined) text on it
· We know the layer name we are looking for will contain the word ‘artwork’
· Therefore, we want to attempt to extract text only on this particular layer (if it is found)
· If the extracted text is empty, we will fail the process, otherwise we continue
Is there a recommended approach to this? My developers have been struggling a little with this, as there doesn’t appear to be a way to extract text from only one layer.
A:
Yes, this is somewhat tricky. One thing that comes to mind is that you can extract the required text layer into a temp page, then use ‘pdftron.PDF.TextExtractor’ to get the text from that page.
To extract the layer you can use the approach shown in the ElementEdit sample: http://www.pdftron.com/pdfnet/samplecode.html#ElementEdit
To copy the elements you would initialize ElementReader with an OCG Context, similar to the way PDFDraw is set up in the PDFLayers sample (http://www.pdftron.com/pdfnet/samplecode.html#PDFLayers):
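// Start from the document's default layer configuration.
Config init_cfg = doc.GetOCGConfig();
Context ctx = new Context(init_cfg);
// Hide every layer, then show only the one of interest.
// 'ocg' is the pdftron.PDF.OCG.Group for the 'artwork' layer.
ctx.ResetStates(false);
ctx.SetState(ocg, true);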
…
reader.Begin(page, ctx);
…
// Copy only the elements that are visible in the given OCG context.
if (element.IsOCVisible()) {
    writer.WriteElement(element);
}
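
Putting the pieces together, the whole check could look roughly like the sketch below. This has not been run against your files, so treat it as a starting point rather than a finished implementation. The class and method names (ArtworkLayerCheck, IsArtworkLayer, FindArtworkLayer, ExtractArtworkText) are made up for illustration, and it assumes that a page created with doc.PageCreate() but never added to the document can be used as the temp page.

```csharp
using System;
using System.Text;
using pdftron.PDF;
using pdftron.PDF.OCG;
using pdftron.SDF;

// Hypothetical helper class; the names are illustrative, not part of PDFNet.
static class ArtworkLayerCheck
{
    // Case-insensitive test for the word 'artwork' in a layer name.
    public static bool IsArtworkLayer(string name)
    {
        return name != null &&
               name.IndexOf("artwork", StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Returns the first layer whose name contains 'artwork', or null.
    static Group FindArtworkLayer(PDFDoc doc)
    {
        Obj ocgs = doc.GetOCGs();   // array of all OCGs, as in the PDFLayers sample
        if (ocgs == null) return null;
        for (int i = 0; i < ocgs.Size(); ++i)
        {
            Group g = new Group(ocgs.GetAt(i));
            if (IsArtworkLayer(g.GetName())) return g;
        }
        return null;
    }

    // Returns the text found with only the artwork layer switched on,
    // or null if the document has no such layer.
    public static string ExtractArtworkText(PDFDoc doc)
    {
        Group ocg = FindArtworkLayer(doc);
        if (ocg == null) return null;

        Context ctx = new Context(doc.GetOCGConfig());
        ctx.ResetStates(false);     // hide all layers
        ctx.SetState(ocg, true);    // show only the artwork layer

        StringBuilder text = new StringBuilder();
        ElementReader reader = new ElementReader();
        ElementWriter writer = new ElementWriter();
        TextExtractor txt = new TextExtractor();

        for (PageIterator itr = doc.GetPageIterator(); itr.HasNext(); itr.Next())
        {
            Page page = itr.Current();
            // Assumption: a scratch page that is never pushed into the document.
            Page temp = doc.PageCreate(page.GetMediaBox());

            reader.Begin(page, ctx);
            writer.Begin(temp);
            Element element;
            while ((element = reader.Next()) != null)
            {
                // Copy only what is visible in the OCG context.
                if (element.IsOCVisible()) writer.WriteElement(element);
            }
            writer.End();
            reader.End();

            txt.Begin(temp);
            text.Append(txt.GetAsText());
        }
        return text.ToString();
    }
}
```

A null result means no layer with ‘artwork’ in its name was found. Otherwise you would fail the process when string.IsNullOrWhiteSpace(result) is true and continue when it is not. Two things to watch for: content that is not assigned to any layer is normally always visible, so it will most likely be copied to the temp page as well, and text nested inside form XObjects is copied or skipped together with the whole form, because the loop does not step into forms.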