I want to extract text and numbers from a PDF

rakeshkind2021 · June 23, 2023, 7:04am

I’m currently working on a feature that allows users to extract text and numbers from a selected area of a PDF document. The user will be able to choose the area by dragging a rectangle over the desired region. I have already implemented an API that prints the rectangle and retrieves text based on the provided coordinates, which is functioning correctly. However, I now need to enable users to interactively select the coordinates by dragging the rectangle. It’s important to note that the interaction should not be affected by rotation or zooming.

I’m using ReactJS for this project, and I would appreciate guidance on which APIs can assist me in achieving this functionality.
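A rectangle annotation drawn in WebViewer already carries its geometry in its `PageNumber`, `X`, `Y`, `Width` and `Height` properties (the same properties used in the sample code later in this thread). As far as I know, these are page coordinates in points with a top-left origin, so they should not change when the viewer is zoomed or rotated. The sketch below turns such an annotation into a bottom-left-origin PDF rectangle that a coordinate-based extraction API could accept. `toPdfRect` and the `{ page, x1, y1, x2, y2 }` payload shape are hypothetical, and the simple y-flip assumes the page has no crop-box offset and no intrinsic rotation.

```javascript
// Hypothetical helper: convert a WebViewer rectangle annotation (top-left origin,
// page points) into a PDF-style rectangle (bottom-left origin).
// Assumption: the page has no crop-box offset and no intrinsic /Rotate value.
function toPdfRect(annot, pageHeight) {
  const x1 = annot.X;
  const x2 = annot.X + annot.Width;
  // Flip the vertical axis: the annotation's top edge becomes the larger PDF y.
  const y2 = pageHeight - annot.Y;
  const y1 = pageHeight - (annot.Y + annot.Height);
  return { page: annot.PageNumber, x1, y1, x2, y2 };
}

// Example with a plain object standing in for a RectangleAnnotation
// on a US Letter page (792 points tall).
const rect = toPdfRect({ PageNumber: 1, X: 70, Y: 450, Width: 200, Height: 100 }, 792);
console.log(rect); // { page: 1, x1: 70, y1: 242, x2: 270, y2: 342 }
```

In the viewer, `pageHeight` can likely be read from `documentViewer.getDocument().getPageInfo(annot.PageNumber).height`. For pages with a crop-box offset or rotation, the coordinate-conversion helpers on `Core.Document` (for example `getPDFCoordinates`, if your version provides it) are safer than the manual flip.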

Thank you!

kkim · June 23, 2023, 8:53pm

Hi there,

Thank you for contacting the WebViewer forums.

We have a guide on extracting text based on an annotation:

Perhaps you can use a rectangle annotation, let the user resize it, and then extract the text underneath it in response to the annotationChanged event:
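A minimal sketch of that idea, assuming the `annotationChanged` listener receives `(annotations, action, { imported })` with `action` being `'add'`, `'modify'` or `'delete'`: the handler ignores imported annotations and deletions, keeps only rectangle annotations, and passes each one to a callback. `createRectangleChangeHandler` and `onRectangle` are hypothetical names; the callback is where the text extraction would go.

```javascript
// Minimal sketch: build an 'annotationChanged' listener that forwards only
// user-created or user-resized rectangle annotations to a callback.
// `Annotations` is the WebViewer namespace (instance.Core.Annotations);
// `onRectangle` is a hypothetical callback, e.g. one that extracts the text.
function createRectangleChangeHandler(Annotations, onRectangle) {
  return (annotations, action, info = {}) => {
    // Skip annotations imported from the file/XFDF, and skip deletions.
    if (info.imported || (action !== 'add' && action !== 'modify')) {
      return;
    }
    annotations
      .filter((annot) => annot instanceof Annotations.RectangleAnnotation)
      .forEach((annot) => onRectangle(annot, action));
  };
}

// Possible wiring inside WebViewer(...).then((instance) => { ... }):
// const { annotationManager, Annotations } = instance.Core;
// annotationManager.addEventListener(
//   'annotationChanged',
//   createRectangleChangeHandler(Annotations, (annot) => console.log(annot.getRect()))
// );
```

Passing the `Annotations` namespace in, rather than reading a global, keeps the filter logic independent of how the viewer instance is obtained (for example inside a React `useEffect`).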

Best regards,
Kevin Kim

rakeshkind2021 · June 27, 2023, 2:14pm

I could not extract text from the annotation. Please suggest a solution.

kkim · June 27, 2023, 5:10pm

Hi there,

Using the guides above, I was able to create a rectangle annotation and extract the text underneath it:

// Assumes `instance` is the object resolved by WebViewer({ ..., fullAPI: true }, element)
const { documentViewer, annotationManager, Annotations, PDFNet } = instance.Core;

documentViewer.addEventListener('annotationsLoaded', async () => {
  console.log('annots loaded');
  // Remove the existing annotations so the new rectangle is the only one
  const annotations = annotationManager.getAnnotationsList();
  annotationManager.deleteAnnotations(annotations);
  console.log(annotations);

  // Create a green rectangle annotation on page 1
  const annot = new Annotations.RectangleAnnotation({
    PageNumber: 1,
    X: 70,
    Y: 450,
    Width: 200,
    Height: 100,
    StrokeColor: new Annotations.Color(0, 255, 0, 1),
  });

  annotationManager.addAnnotation(annot);
  annotationManager.redrawAnnotation(annot);

  await PDFNet.initialize();
  const doc = await documentViewer.getDocument().getPDFDoc();

  // Export annotations from the viewer as XFDF
  const annots = await annotationManager.exportAnnotations();

  // Run PDFNet methods with memory management
  await PDFNet.runWithCleanup(async () => {
    // Lock the document before a write operation;
    // runWithCleanup will auto-unlock when complete
    await doc.lock();

    // Import the annotations into the PDFNet document
    const fdf_doc = await PDFNet.FDFDoc.createFromXFDF(annots);
    await doc.fdfUpdate(fdf_doc);

    const page = await doc.getPage(1);
    const rect = await page.getCropBox();
    // The rectangle is expected to be the first annotation on the page
    const annotation = await page.getAnnot(0);
    const te = await PDFNet.TextExtractor.create();
    await te.begin(page, rect);
    const textData = await te.getTextUnderAnnot(annotation);
    console.log(textData);
  });
});

Please try the method above and let us know if you still experience issues.

Best regards,
Kevin Kim

rakeshkind2021 · June 28, 2023, 10:36am

Is it possible for users to draw the rectangle themselves and then extract the text beneath it? Basically, I want the rectangle to come from the user, not a prebuilt one.

Thanks

kkim · June 28, 2023, 2:43pm

Hi there,

For the RectangleCreateTool, we have the annotationAdded event:
https://docs.apryse.com/api/web/Core.Tools.RectangleCreateTool.html#event:annotationAdded__anchor

If you want to allow the user to resize the rectangle, you would probably want to show a confirmation prompt first and then extract the text underneath.

Although it takes more effort, if you want to catch the event when the user resizes the rectangle annotation, you could listen for the annotationChanged event and check that the changed annotation is a rectangle annotation, i.e.:
if (annotations[0] instanceof instance.Core.Annotations.RectangleAnnotation)
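A minimal sketch of the annotationAdded approach with a confirmation prompt. It assumes `Tools.ToolNames.RECTANGLE` names the default rectangle tool (`'AnnotationCreateRectangle'`) and that the tool supports `addEventListener`/`removeEventListener`; `attachRectangleExtraction`, `extractText` and `confirmFn` are hypothetical names. In the browser, `confirmFn` could simply be `window.confirm`, and `extractText` could wrap the PDFNet `TextExtractor` code from the earlier reply.

```javascript
// Minimal sketch: when the user finishes drawing a rectangle with the
// RectangleCreateTool, ask for confirmation and then run a text-extraction
// callback. `extractText` and `confirmFn` are hypothetical injected functions.
function attachRectangleExtraction(documentViewer, Tools, extractText, confirmFn) {
  const tool = documentViewer.getTool(Tools.ToolNames.RECTANGLE);
  const onAdded = (annotation) => {
    // Only extract once the user confirms the drawn area.
    if (confirmFn('Extract the text inside this rectangle?')) {
      extractText(annotation);
    }
  };
  tool.addEventListener('annotationAdded', onAdded);
  // Return a cleanup function that detaches the listener again.
  return () => tool.removeEventListener('annotationAdded', onAdded);
}
```

The returned cleanup function can be called when the React component that hosts the viewer unmounts, so the listener is not attached twice.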

Best regards,
Kevin Kim

rakeshkind2021 · June 29, 2023, 4:11am

By pressing the ‘R’ key on my keyboard, I can create rectangle annotations in my PDF, and I can extract text from them exactly as I want. However, I would like this tool to be enabled by default whenever I open my PDF, ideally displayed alongside the thumbnails. Despite going through the documentation, I have not been able to find this option. I am using React for my project. Any guidance on finding this option would be greatly appreciated.

Thank you very much!

kkim · June 29, 2023, 2:08pm

If you would like to enable the rectangle annotation create tool on document load, you can use the following APIs:

documentLoaded event: Apryse Documentation
setToolMode: Apryse Documentation
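A minimal sketch combining those two APIs, assuming `instance` is the object resolved by `WebViewer(...)` and that `Tools.ToolNames.RECTANGLE` names the rectangle create tool: every time a document finishes loading, the viewer switches to the rectangle tool so the user can start drawing without pressing ‘R’. `enableRectangleToolOnLoad` is a hypothetical name.

```javascript
// Minimal sketch: switch to the rectangle annotation tool every time a
// document finishes loading, so the user can start drawing immediately.
// `instance` is assumed to be the object resolved by WebViewer(...).
function enableRectangleToolOnLoad(instance) {
  const { documentViewer, Tools } = instance.Core;
  documentViewer.addEventListener('documentLoaded', () => {
    documentViewer.setToolMode(documentViewer.getTool(Tools.ToolNames.RECTANGLE));
  });
}
```

If you prefer the UI-level API, `instance.UI.setToolMode(Tools.ToolNames.RECTANGLE)` should have the same effect. In a React app, either call would go inside the `.then((instance) => ...)` callback of the `WebViewer(...)` call in your `useEffect`.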

We do not currently have a way to customize the thumbnail panel, but you can choose to customize your UI by following the advanced UI customization guide:

Best regards,
Kevin Kim