What coordinate space are the values in the JSON output of Tabular Data extraction?

Question
When using DataExtractionModule.DataExtractionEngine.e_tabular, what exactly do the Rect coordinates in the JSON output mean?

Answer
The coordinates are in what might be called page space: the origin is the bottom-left corner of the page as you would see it in a PDF viewer (that is, after page rotation and cropping are applied), with the +X axis going right and the +Y axis going up.

If you want to convert the coordinates to PDF user space, transform them with the inverse of the page's default matrix, Page.GetDefaultMatrix().Inverse(). See the following code for an example.

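// pdfDoc is the PDFDoc that was processed; pageNo is the page the JSON values belong to
Page page = pdfDoc.GetPage(pageNo);
// The default matrix maps user space to page space, so its inverse maps page space back to user space
Matrix2D mtx = page.GetDefaultMatrix().Inverse();
// In a loop, read each `Rect` and `Baseline` value from the JSON into the doubles x1, y1, x2, y2
// and transform it from `page space` to `user space`
mtx.Mult(ref x1, ref y1);
mtx.Mult(ref x2, ref y2);
// Now you could do something like add a square annotation over the Rect
Rect rect = new Rect(x1, y1, x2, y2);
rect.Normalize(); // the corner order can flip when the page is rotated
pdftron.PDF.Annots.Square sq = pdftron.PDF.Annots.Square.Create(pdfDoc.GetSDFDoc(), rect);
sq.SetColor(new ColorPt(1, 0, 1), 3);
sq.RefreshAppearance();
page.AnnotPushBack(sq);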
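
To make the effect of the inverse matrix concrete, here is a minimal, self-contained sketch that does not use the SDK at all. The `Affine` struct is a hypothetical stand-in for Matrix2D, and the matrix values are hand-built for an imagined 612 x 792 page rotated 90 degrees clockwise; they are an assumption for illustration, not values read from GetDefaultMatrix().

```csharp
using System;

// Hypothetical stand-in for a 2D affine matrix; this is NOT the SDK's Matrix2D class.
public readonly struct Affine
{
    public readonly double A, B, C, D, H, V;

    public Affine(double a, double b, double c, double d, double h, double v)
    { A = a; B = b; C = c; D = d; H = h; V = v; }

    // x' = A*x + C*y + H,  y' = B*x + D*y + V
    public (double X, double Y) Apply(double x, double y) =>
        (A * x + C * y + H, B * x + D * y + V);

    public Affine Inverse()
    {
        double det = A * D - B * C;
        if (det == 0) throw new InvalidOperationException("Matrix is not invertible.");
        // Invert the 2x2 linear part, then map the translation through it and negate.
        double ia = D / det, ib = -B / det, ic = -C / det, id = A / det;
        return new Affine(ia, ib, ic, id, -(ia * H + ic * V), -(ib * H + id * V));
    }
}

public static class Demo
{
    // Transforms both corners of a page-space rect and reorders them so that
    // (x1, y1) is the lower-left corner and (x2, y2) the upper-right corner.
    public static double[] ToUserRect(Affine pageToUser, double x1, double y1, double x2, double y2)
    {
        var p = pageToUser.Apply(x1, y1);
        var q = pageToUser.Apply(x2, y2);
        return new[]
        {
            Math.Min(p.X, q.X), Math.Min(p.Y, q.Y),
            Math.Max(p.X, q.X), Math.Max(p.Y, q.Y)
        };
    }

    public static void Main()
    {
        // Assumed user-to-page matrix for the imagined rotated page: x' = y, y' = 612 - x.
        var userToPage = new Affine(0, -1, 1, 0, 0, 612);
        var pageToUser = userToPage.Inverse();

        double[] r = ToUserRect(pageToUser, 100, 500, 300, 560);
        Console.WriteLine(string.Join(", ", r));
    }
}
```

Without the Min/Max step the transformed corners would come out as (112, 100) and (52, 300), with the first corner to the right of the second. That is why it is worth normalizing the rectangle after the transform when a page is rotated.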

Hello, I’m AiRon, an automated tech support bot :robot:
While you wait for one of our customer support representatives, I can help you with your question.


The coordinates in the JSON output of Apryse’s Tabular Data Extraction are in page space coordinates. This means that the origin is at the bottom-left corner of the page, with the +Y axis extending upwards and the +X axis extending to the right, similar to how you would view the page in a PDF viewer. If you need to convert these coordinates into PDF user space, you would apply the matrix returned by Page.GetDefaultMatrix().Inverse() to each point.

Some things you can try:

  • Use the Page.GetDefaultMatrix().Inverse() method to convert page space coordinates to PDF user space coordinates.

  • Apply the inverse matrix transformation to each coordinate point (x, y) in your JSON output to get the user space coordinates.

  • Use these converted coordinates for further processing, such as adding annotations or extracting specific regions.
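
The steps above can be sketched as a small walk over the JSON. This is only an illustration under stated assumptions: the sample fragment and the idea that each rectangle is stored as a four-number array under a key named "rect" are hypothetical, so check them against your actual output. The `toUser` delegate is a placeholder for the call to Matrix2D.Mult with the inverse default matrix, and it is the identity here so that the sketch runs without the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class RectWalker
{
    // Recursively yields every four-number array stored under a property named "rect".
    // The key name and array layout are assumptions about the JSON, not a documented schema.
    public static IEnumerable<double[]> FindRects(JsonElement node)
    {
        if (node.ValueKind == JsonValueKind.Object)
        {
            foreach (JsonProperty prop in node.EnumerateObject())
            {
                if (prop.Name.Equals("rect", StringComparison.OrdinalIgnoreCase)
                    && prop.Value.ValueKind == JsonValueKind.Array
                    && prop.Value.GetArrayLength() == 4)
                {
                    var coords = new double[4];
                    int i = 0;
                    foreach (JsonElement number in prop.Value.EnumerateArray())
                        coords[i++] = number.GetDouble();
                    yield return coords;
                }
                else
                {
                    foreach (double[] nested in FindRects(prop.Value))
                        yield return nested;
                }
            }
        }
        else if (node.ValueKind == JsonValueKind.Array)
        {
            foreach (JsonElement item in node.EnumerateArray())
                foreach (double[] nested in FindRects(item))
                    yield return nested;
        }
    }

    public static void Main()
    {
        // Hypothetical fragment for illustration; real output contains more fields.
        string json = "{\"pages\":[{\"elements\":[{\"type\":\"table\",\"rect\":[100,500,300,560]}]}]}";

        // Placeholder for the page-space to user-space transform (identity in this sketch).
        Func<double, double, (double X, double Y)> toUser = (x, y) => (x, y);

        using JsonDocument doc = JsonDocument.Parse(json);
        foreach (double[] r in FindRects(doc.RootElement))
        {
            var (x1, y1) = toUser(r[0], r[1]);
            var (x2, y2) = toUser(r[2], r[3]);
            Console.WriteLine($"{x1}, {y1}, {x2}, {y2}");
        }
    }
}
```

In real code the delegate body would apply the inverse default matrix of the page that the element belongs to, since each page can have its own rotation and crop box.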



Hopefully, this helps you solve your problem while you wait for a human to get back to you.
