How can I get the page numbers of specific text (headers only) in a PDF so that I can use them in my custom ToC to navigate to those pages?

Product: PDFTron WebViewer (Web/JS)

Product Version: v10.2.3

Please give a brief summary of your issue: Unable to get the page numbers for specific text in the PDF

Please describe your issue and provide steps to reproduce it:
In a web project, I'm using WebViewer to display documents, with a different view depending on which API I'm using. I'm creating a custom ToC based on the document's header sections (h1, h2, etc., or ##, ###) and need to find their page numbers so that I can display them in the ToC and use setCurrentPage to navigate to the right page.

Basically, I want to get the correct page numbers for specific header texts in the PDF.

Hello Jon,

Thank you for contacting WebViewer Forums.

To get the page number of specific text, you can use the textSearchInit() function (see Apryse WebViewer Class: DocumentViewer):

await documentViewer.textSearchInit(
  'header', // the text to look for
  instance.Core.Search.Mode.WHOLE_WORD,
  {
    fullSearch: true, // search the whole document, not just the first match
    onResult: (result) => {
      // pageNum is the page on which this match was found
      console.log(result.pageNum);
    },
  }
);

Best Regards,
Luke Dam
Web Development Support Engineer
Apryse Software Inc.

Hi Luke, thanks for your answer. Do I need to specify a header pattern? I'm receiving data in .md format that is converted to PDF in WebViewer, and I want to display the header texts (##, ###, etc.) in my custom ToC side panel.

Basically, how do I specify the pattern in textSearchInit() so that it matches only header texts in the PDF?

Hello Jon,

Here is more information on the mode parameters: Apryse WebViewer Namespace: Search

However, it sounds like you want a side panel with outlines, similar to the one in our demo: Table of Contents Editing | Apryse WebViewer

In that case, the outlines implementation would be a better fit. Documentation here: Add | Apryse Documentation

Best Regards,
Luke Dam
Web Development Support Engineer
Apryse Software Inc.

Hi Luke, I know there is an outlines implementation, but I still want a separate side panel that extracts headers from the incoming .md file and generates a ToC from them. The problem is the page numbers: when the file is converted to PDF, whitespace and padding are added, so the page numbers no longer match those of the original .md file.

So how do I actually find the page numbers of these header texts in the PDF, so that clicking an entry in the ToC navigates to that page?
Sample snippet of my code:

 React.useEffect(() => {
        if (webViewerInstance) {
            const { documentViewer } = webViewerInstance.Core;
            documentViewer.addEventListener('documentLoaded', async () => {
                const pageNumbers: number[] = [];
                await documentViewer.textSearchInit(
                    '##\\s.*', // search for headers
                    webViewerInstance.Core.Search.Mode.PAGE_STOP | webViewerInstance.Core.Search.Mode.HIGHLIGHT,
                    {
                        fullSearch: true,
                        onResult: (result) => {
                            if (result.resultCode === webViewerInstance.Core.Search.ResultCode.FOUND) {
                                // result.pageNum contains the page number of the found header
                                pageNumbers.push(result.pageNum);
                                console.log('Found header on page:', result.pageNum);
                            }
                        },
                    }
                );
                setHeaderPageNumbers(pageNumbers);
            });
        }
    }, [webViewerInstance]);

Hi Jon,

Thank you for clarifying.

You can set the search mode to regex and look for the string value that way.
It would look something like this for h1, h2 headers and ##, ###:

await documentViewer.textSearchInit(
  // Backslashes are doubled so that \s survives inside the JavaScript string literal
  '(<h[12][^>]*>.*?</h[12]>|##\\s.*|###\\s.*)',
  instance.Core.Search.Mode.REGEX,
  {
    fullSearch: true,
    onResult: (result) => {
      console.log(result.pageNum);
    },
  }
);

Please modify the pattern to your preference. Hope this helps!
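
One possible way to adapt the pattern, in case the converted PDF no longer contains the literal ## markers: take the heading texts from the .md source and search for each of them instead. The sketch below is only an illustration under that assumption. extractHeadings, escapeRegExp and headingPattern are hypothetical helper names, not part of WebViewer, and the generated pattern assumes that REGEX mode accepts JavaScript-style syntax such as \s+.

```typescript
interface MarkdownHeading {
  level: number; // 1 for "#", 2 for "##", and so on
  text: string;  // heading text without the leading/trailing # markers
}

// Collect ATX-style headings ("# Title" up to "###### Title") from Markdown source.
// Simplifications: lines inside fenced code blocks are skipped, and Setext
// ("underlined") headings are not handled.
function extractHeadings(markdown: string): MarkdownHeading[] {
  const headings: MarkdownHeading[] = [];
  let inFence = false;
  for (const line of markdown.split(/\r?\n/)) {
    // A line starting with three or more backticks or tildes toggles a code fence.
    if (/^\s*(`{3,}|~{3,})/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;
    // Up to 3 spaces of indent, 1-6 "#", whitespace, the text, optional closing "#"s.
    const match = /^ {0,3}(#{1,6})\s+(.+?)(?:\s+#+)?\s*$/.exec(line);
    if (match) {
      headings.push({ level: match[1].length, text: match[2] });
    }
  }
  return headings;
}

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a pattern that matches the heading text literally while tolerating
// extra whitespace between words (the PDF conversion may change spacing).
function headingPattern(heading: MarkdownHeading): string {
  return heading.text.trim().split(/\s+/).map(escapeRegExp).join('\\s+');
}
```

Each string returned by headingPattern could then be passed as the first argument to textSearchInit() with the REGEX mode. Note that a heading's text may also occur in body text, so the first match is not guaranteed to be the heading itself.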

Best Regards,
Luke Dam
Web Development Support Engineer
Apryse Software Inc.
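
A note on collecting the results: in the React snippet earlier in this thread, setHeaderPageNumbers() runs right after textSearchInit() is called, which may be before the onResult callbacks have fired. Below is a minimal sketch of one way to wait for the search to finish. It assumes that the textSearchInit() options include onDocumentEnd and onError callbacks and that each result carries resultCode, pageNum and resultStr; please check these against the DocumentViewer documentation for your version. ViewerLike, TocEntry and findHeaderPages are hypothetical names used only for illustration.

```typescript
// Minimal structural types standing in for the WebViewer objects (assumed shapes).
interface SearchResult {
  resultCode: number;
  pageNum: number;
  resultStr: string;
}

interface SearchOptions {
  fullSearch: boolean;
  onResult: (result: SearchResult) => void;
  onDocumentEnd: () => void;
  onError: (error: unknown) => void;
}

interface ViewerLike {
  textSearchInit(pattern: string, mode: number, options: SearchOptions): void;
}

interface TocEntry {
  text: string;    // matched text, trimmed
  pageNum: number; // page on which the match was found
}

// Wrap the callback-based search in a Promise that resolves only after the
// whole document has been searched, so callers can safely await the results.
function findHeaderPages(
  viewer: ViewerLike,
  pattern: string,
  mode: number,
  foundCode: number,
): Promise<TocEntry[]> {
  return new Promise((resolve, reject) => {
    const entries: TocEntry[] = [];
    viewer.textSearchInit(pattern, mode, {
      fullSearch: true,
      onResult: (result) => {
        // Keep only real matches; other result codes mark page or document ends.
        if (result.resultCode === foundCode) {
          entries.push({ text: result.resultStr.trim(), pageNum: result.pageNum });
        }
      },
      onDocumentEnd: () => resolve(entries),
      onError: (error) => reject(error),
    });
  });
}
```

Inside the documentLoaded handler this could be called with documentViewer, the regex pattern, instance.Core.Search.Mode.REGEX and instance.Core.Search.ResultCode.FOUND (a type cast may be needed to satisfy ViewerLike). Once the promise resolves, the entries can be stored in state, and a click on a ToC entry can call documentViewer.setCurrentPage(entry.pageNum).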