Extracting text and numbers from a selected area of a PDF

I’m currently working on a feature that allows users to extract text and numbers from a selected area of a PDF document. The user will be able to choose the area by dragging a rectangle over the desired region. I have already implemented an API that prints the rectangle and retrieves text based on the provided coordinates, which is functioning correctly. However, I now need to enable users to interactively select the coordinates by dragging the rectangle. It’s important to note that the interaction should not be affected by rotation or zooming.

I’m using ReactJS for this project, and I would appreciate guidance on which APIs can assist me in achieving this functionality.

Thank you!

Hi there,

Thank you for contacting the WebViewer forums.

We have a guide on extracting the text under an annotation:

You could use a rectangle annotation, let the user resize it, and then extract the text underneath it in response to the annotationChanged event:
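
If you would rather keep calling your existing coordinate-based API, here is a rough sketch of turning the drawn rectangle into PDF page coordinates. Annotation positions are stored in page space, so they should not change with the zoom level. The helper name rectToPdfCoords is made up for illustration, and I am assuming that getX/getY/getWidth/getHeight on the annotation and getPDFCoordinates(pageNumber, x, y) on the document behave as the API reference describes. Please verify this against your WebViewer version, especially for rotated pages.

```javascript
// Hypothetical helper (not part of WebViewer): converts a rectangle
// annotation's viewer page coordinates into PDF page coordinates.
// Assumption: `doc` is the object returned by documentViewer.getDocument()
// and exposes getPDFCoordinates(pageNumber, x, y) returning { x, y }.
function rectToPdfCoords(doc, annot) {
  const pageNumber = annot.PageNumber;
  const x = annot.getX();
  const y = annot.getY();

  // Convert two opposite corners of the rectangle.
  const p1 = doc.getPDFCoordinates(pageNumber, x, y);
  const p2 = doc.getPDFCoordinates(
    pageNumber,
    x + annot.getWidth(),
    y + annot.getHeight()
  );

  // Normalise so that (x1, y1) is always the lower-left corner,
  // whichever way the axes end up after conversion.
  return {
    pageNumber,
    x1: Math.min(p1.x, p2.x),
    y1: Math.min(p1.y, p2.y),
    x2: Math.max(p1.x, p2.x),
    y2: Math.max(p1.y, p2.y),
  };
}
```

The returned object can then be passed to whatever endpoint already accepts coordinates.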

Best regards,
Kevin Kim

I was not able to extract text from the annotation. Please suggest a solution.

Hi there,

Using the above guides, I was able to create a rectangle annotation and get the text extracted underneath:

documentViewer.addEventListener('annotationsLoaded', async () => {
  console.log('annots loaded');

  // start from a clean slate: remove any annotations already in the document
  const annotations = annotationManager.getAnnotationsList();
  annotationManager.deleteAnnotations(annotations);
  console.log(annotations);

  // create the rectangle whose underlying text we want to extract
  const annot = new Annotations.RectangleAnnotation({
    PageNumber: 1,
    X: 70,
    Y: 450,
    Width: 200,
    Height: 100,
    StrokeColor: new Annotations.Color(0, 255, 0, 1),
  });

  annotationManager.addAnnotation(annot);
  annotationManager.redrawAnnotation(annot);

  await PDFNet.initialize();
  const doc = await documentViewer.getDocument().getPDFDoc();

  // export the viewer's annotations as XFDF
  const annots = await annotationManager.exportAnnotations();

  // Run PDFNet methods with memory management
  await PDFNet.runWithCleanup(async () => {

    // lock the document before a write operation
    // runWithCleanup will auto unlock when complete
    await doc.lock();

    // import annotations to PDFNet
    const fdf_doc = await PDFNet.FDFDoc.createFromXFDF(annots);
    await doc.fdfUpdate(fdf_doc);

    // read the text under the first annotation on page 1
    const page = await doc.getPage(1);
    const rect = await page.getCropBox();
    const annotation = await page.getAnnot(0);
    const te = await PDFNet.TextExtractor.create();
    await te.begin(page, rect);
    const textData = await te.getTextUnderAnnot(annotation);
    console.log(textData);
  });
});

Please try the method above and let us know if you still experience issues.

Best regards,
Kevin Kim

Would it be possible for the user to draw the rectangle themselves and then extract the text beneath it? Basically, I want the rectangle to come from the user rather than being a predefined one.

Thanks

Hi there,

For the RectangleCreateTool, we have the annotationAdded event:
https://docs.apryse.com/api/web/Core.Tools.RectangleCreateTool.html#event:annotationAdded__anchor

If you want to allow the user to resize the rectangle, you would probably want a prompt to confirm and then extract the text underneath.
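
As a minimal sketch of that flow, the snippet below listens for annotationAdded on the rectangle tool and only runs the extraction once the user confirms. The names extractOnRectangleDrawn, extractText and confirmFn are placeholders for illustration, and I am assuming the tool is registered under the name 'AnnotationCreateRectangle' and that the event handler receives the new annotation:

```javascript
// Hypothetical wiring helper (not a WebViewer API).
// instance:    the resolved WebViewer instance
// extractText: your own function that extracts text under an annotation
// confirmFn:   returns true when the user accepts the drawn rectangle
function extractOnRectangleDrawn(instance, extractText, confirmFn = () => true) {
  const { documentViewer } = instance.Core;

  // Assumption: the rectangle create tool uses this tool name.
  const tool = documentViewer.getTool('AnnotationCreateRectangle');

  tool.addEventListener('annotationAdded', (annotation) => {
    // Only extract once the user is happy with the rectangle.
    if (confirmFn(annotation)) {
      extractText(annotation);
    }
  });
}
```

In a browser, confirmFn could be as simple as () => window.confirm('Extract text from this area?').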

Although it takes more effort, if you want to react when the user resizes the rectangle annotation, you could listen for the annotationChanged event and check that the changed annotation is a rectangle, for example:
if (annotations[0] instanceof instance.Core.Annotations.RectangleAnnotation)
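
To expand on that check, here is a rough sketch of an annotationChanged listener that reacts to rectangles the user adds or resizes and skips annotations that were merely imported. rectanglesToExtract, watchRectangles and onRectangle are illustrative names, not WebViewer APIs. I am assuming the handler receives (annotations, action, info) with action being 'add', 'modify' or 'delete', and info.imported set for imported annotations:

```javascript
// Hypothetical filter: returns the rectangle annotations from an
// annotationChanged payload that were added or modified by the user.
function rectanglesToExtract(annotations, action, info, RectangleAnnotation) {
  // Skip annotations loaded from the document or imported via XFDF.
  if (info && info.imported) return [];
  // Deletions have nothing to extract.
  if (action !== 'add' && action !== 'modify') return [];
  return annotations.filter((a) => a instanceof RectangleAnnotation);
}

// Hypothetical wiring: calls onRectangle(rect, action) for each match.
function watchRectangles(instance, onRectangle) {
  const { annotationManager, Annotations } = instance.Core;
  annotationManager.addEventListener(
    'annotationChanged',
    (annotations, action, info) => {
      rectanglesToExtract(
        annotations,
        action,
        info,
        Annotations.RectangleAnnotation
      ).forEach((rect) => onRectangle(rect, action));
    }
  );
}
```

Note that 'modify' can fire more than once while a rectangle is being adjusted, so you may want to debounce the extraction or wait for an explicit confirmation.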

Best regards,
Kevin Kim

By pressing the ‘R’ key on my keyboard, I can create rectangle annotations in my PDF, and I can extract text from these annotations exactly as I want. However, I would like this tool to be enabled by default whenever I open a PDF. It would be even better if the option were displayed alongside the thumbnails. Despite going through the documentation thoroughly, I have been unable to find how to do this. I am using React for my project. If you could provide guidance or assistance in finding this option, I would be extremely grateful.

Thank you very much!

If you would like to enable the rectangle annotation create tool on document load, you can use the following APIs:
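
As a minimal sketch, assuming instance.UI.setToolMode and the 'AnnotationCreateRectangle' tool name are available in your version, something like the following switches to the rectangle tool once the document has loaded. enableRectangleToolOnLoad is just an illustrative name:

```javascript
// Hypothetical helper (not a WebViewer API): activates the rectangle
// create tool every time a document finishes loading.
function enableRectangleToolOnLoad(instance) {
  const { documentViewer } = instance.Core;

  documentViewer.addEventListener('documentLoaded', () => {
    // Assumption: 'AnnotationCreateRectangle' is the rectangle tool's name.
    instance.UI.setToolMode('AnnotationCreateRectangle');
  });
}

// In a React component this would typically be called once the viewer
// is ready, for example:
//   WebViewer({ path: '/webviewer/lib' }, viewerRef.current)
//     .then((instance) => enableRectangleToolOnLoad(instance));
// (the path and ref above are placeholders for your own setup)
```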

We do not currently have a way to customize the thumbnail panel, but you can choose to customize your UI by following the advanced UI customization guide:

Best regards,
Kevin Kim