Extracting text from highlight annotation

volodymyr.liutkin · November 24, 2023, 11:53am

Good day. What is the proper way to extract plain text from a highlight annotation object or from its XFDF?

kkim · November 24, 2023, 7:21pm

Hi there,

Thanks for reaching out to the WebViewer forums.

To get the text under a highlight annotation, read the annotation's quads, select the area each quad covers with documentViewer.select(), and then read the selection back with documentViewer.getSelectedText():

const { annotationManager, documentViewer } = instance.Core;

// Assuming a highlight annotation is currently selected in the viewer
const highlightAnnotation = annotationManager.getSelectedAnnotations()[0];

if (highlightAnnotation) {
  const pageNumber = highlightAnnotation.PageNumber;
  // Select each quad diagonally, from corner 1 (x1, y1) to corner 3 (x3, y3)
  const textsUnderHighlight = highlightAnnotation.getQuads().map(quad => {
    const selectionStartPoint = { x: quad.x1, y: quad.y1, pageNumber };
    const selectionEndPoint = { x: quad.x3, y: quad.y3, pageNumber };

    documentViewer.select(selectionStartPoint, selectionEndPoint);
    return documentViewer.getSelectedText();
  });
  console.log(textsUnderHighlight);
}
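The same loop can be wrapped in a small reusable function that returns one string instead of an array. This is only a sketch: `getHighlightText` is a hypothetical helper name, not part of the WebViewer API, and it assumes `documentViewer.select()` and `documentViewer.getSelectedText()` behave as in the snippet above. It returns an empty string when no annotation (or an annotation without quads) is passed.

```javascript
// Hypothetical helper (not a WebViewer API): collects the text under every quad
// of a highlight annotation, using the same select/getSelectedText approach as above.
function getHighlightText(documentViewer, annotation) {
  // Nothing selected, or an annotation without quads: nothing to extract.
  if (!annotation || typeof annotation.getQuads !== 'function') {
    return '';
  }

  const pageNumber = annotation.PageNumber;
  const parts = annotation.getQuads().map((quad) => {
    // Select diagonally across the quad, from corner 1 to corner 3.
    const start = { x: quad.x1, y: quad.y1, pageNumber };
    const end = { x: quad.x3, y: quad.y3, pageNumber };
    documentViewer.select(start, end);
    return documentViewer.getSelectedText().trim();
  });

  // Drop empty pieces and join the text of all quads with single spaces.
  return parts.filter((part) => part.length > 0).join(' ');
}

// Usage sketch inside WebViewer:
// const { annotationManager, documentViewer } = instance.Core;
// const [annotation] = annotationManager.getSelectedAnnotations();
// console.log(getHighlightText(documentViewer, annotation));
```

Joining with spaces is a design choice for readability; keep the array from the original snippet if you need the text of each quad separately.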

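In the highlight annotation's XFDF, the text is stored as the trn-annot-preview value inside the JSON in the bytes attribute of the <trn-custom-data> tag (in the example below, "mportant F"):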

<?xml version="1.0" encoding="UTF-8" ?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve"><pdf-info xmlns="http://www.pdftron.com/pdfinfo"
        version="2" import-version="4" />
    <fields />
    <annots>
        <highlight page="0" rect="123.566,239.305,392.054,310.801" color="#FFCD45" flags="print"
            name="f51aff45-5560-77bf-e8c0-c637b0e91522" title="Guest" subject="Highlight" date="D:20231124140457-05'00'"
            creationdate="D:20231124140457-05'00'"
            coords="123.566,310.8009,392.05400000000003,310.8009,123.566,239.3049000000001,392.05400000000003,239.3049000000001">
            <trn-custom-data
                bytes="{&quot;trn-annot-preview&quot;:&quot;mportant F&quot;,&quot;trn-associated-number&quot;:&quot;1&quot;}" />
        </highlight>
    </annots>
    <pages>
        <defmtx matrix="1,0,0,-1,0,792" />
    </pages>
</xfdf>
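If you only have the XFDF string, the preview text can be pulled out of the `bytes` attribute by decoding the XML entities and parsing the JSON. A minimal sketch, assuming the XFDF is laid out like the sample above; `getPreviewFromXfdf` and `decodeXmlEntities` are hypothetical helper names, and the regex-based parsing is a shortcut that only suits this flat layout (in the browser, `DOMParser` is the more robust choice). Note that this is a stored preview, so it may not match the full highlighted text; the quad-based approach reads the text from the page itself.

```javascript
// Hypothetical helper (not a WebViewer API): decodes the XML entities that appear
// in attribute values such as bytes="{&quot;trn-annot-preview&quot;:...}".
function decodeXmlEntities(text) {
  return text
    .replace(/&#x([0-9a-fA-F]+);/g, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)))
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&'); // last, so "&amp;quot;" becomes "&quot;" rather than '"'
}

// Hypothetical helper: returns the trn-annot-preview text of every <highlight>
// element that carries <trn-custom-data>. Regex-based; it does not validate XML.
function getPreviewFromXfdf(xfdf) {
  const previews = [];
  // Non-greedy match so that several highlights are kept apart.
  const highlightRe = /<highlight\b[\s\S]*?<\/highlight>/g;
  for (const [highlight] of xfdf.matchAll(highlightRe)) {
    const bytesMatch = /<trn-custom-data\b[^>]*\bbytes="([^"]*)"/.exec(highlight);
    if (!bytesMatch) continue; // no custom data stored for this highlight
    const customData = JSON.parse(decodeXmlEntities(bytesMatch[1]));
    if (typeof customData['trn-annot-preview'] === 'string') {
      previews.push(customData['trn-annot-preview']);
    }
  }
  return previews;
}
```

Run against the sample XFDF above, this returns an array with the single preview string "mportant F".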

Best regards,
Kevin Kim

volodymyr.liutkin · November 29, 2023, 1:51pm

Also, don't forget to call documentViewer.clearSelection() after extracting the text; otherwise the text stays selected in the viewer.
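One way to make sure the selection is always cleared, even if extraction throws partway through, is a try/finally wrapper. This is a sketch: `withTemporarySelection` is a hypothetical name, and it only assumes that `documentViewer.clearSelection()` exists as described in the post above.

```javascript
// Hypothetical wrapper (not a WebViewer API): runs a callback that may call
// documentViewer.select(), then always clears the selection afterwards,
// even if the callback throws, so no text is left selected in the viewer.
function withTemporarySelection(documentViewer, callback) {
  try {
    return callback(documentViewer);
  } finally {
    documentViewer.clearSelection();
  }
}

// Usage sketch inside WebViewer, with highlightAnnotation as in the snippet above:
// const texts = withTemporarySelection(documentViewer, (dv) =>
//   highlightAnnotation.getQuads().map((quad) => {
//     const pageNumber = highlightAnnotation.PageNumber;
//     dv.select({ x: quad.x1, y: quad.y1, pageNumber }, { x: quad.x3, y: quad.y3, pageNumber });
//     return dv.getSelectedText();
//   })
// );
```

The callback's return value is passed through unchanged, so the wrapper can be added around existing extraction code without restructuring it.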