Question
When using DataExtractionModule.DataExtractionEngine.e_tabular, what exactly do the Rect coordinates in the JSON output mean?
Answer
The coordinates are in what might be called page space: the origin is the bottom-left corner of the page as you would see it in a PDF viewer (that is, after page rotation and cropping are applied), with the +X axis pointing right and the +Y axis pointing up.
To convert these coordinates to PDF user space, multiply them by the inverse of the page's default matrix, Page.GetDefaultMatrix().Inverse(). The following code shows an example.
// Given: pdfDoc is the PDFDoc that was processed, and pageNo is the 1-based page number
Page page = pdfDoc.GetPage(pageNo);
Matrix2D mtx = page.GetDefaultMatrix().Inverse();
// For each `Rect` or `Baseline` found in the JSON, read its values into the doubles
// x1, y1, x2, y2, then transform both points from `page space` to `user space`
mtx.Mult(ref x1, ref y1);
mtx.Mult(ref x2, ref y2);
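// Now you could, for example, add an annotation over the Rect
Rect rect = new Rect(x1, y1, x2, y2);
rect.Normalize(); // on rotated pages the transformed corners may come out in swapped order
pdftron.PDF.Annots.Square sq = pdftron.PDF.Annots.Square.Create(pdfDoc.GetSDFDoc(), rect);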
sq.SetColor(new ColorPt(1, 0, 1), 3);
sq.RefreshAppearance();
page.AnnotPushBack( sq );
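To see what the inverse matrix does numerically, here is a minimal, self-contained sketch that does not use PDFNet at all. The `Affine` struct is a hypothetical stand-in for a 2D affine matrix, and the user-to-page matrix shown is an assumption chosen for illustration (a 612-unit-wide page rotated 90 degrees clockwise), not a value read from the SDK.

```csharp
using System;

// Hypothetical stand-in for a 2D affine matrix; this is NOT a PDFNet type.
// Maps (x, y) to (A*x + C*y + H, B*x + D*y + V).
public readonly struct Affine
{
    public readonly double A, B, C, D, H, V;

    public Affine(double a, double b, double c, double d, double h, double v)
    {
        A = a; B = b; C = c; D = d; H = h; V = v;
    }

    public (double X, double Y) Apply(double x, double y)
        => (A * x + C * y + H, B * x + D * y + V);

    public Affine Inverse()
    {
        double det = A * D - B * C;
        if (det == 0) throw new InvalidOperationException("Matrix is not invertible.");
        return new Affine(D / det, -B / det, -C / det, A / det,
                          (C * V - D * H) / det, (B * H - A * V) / det);
    }
}

public static class Demo
{
    public static void Main()
    {
        // Assumed user-to-page matrix for a 612-wide page rotated 90 degrees clockwise:
        // page x = user y, page y = 612 - user x.
        var userToPage = new Affine(0, -1, 1, 0, 0, 612);
        var pageToUser = userToPage.Inverse();

        // Two opposite corners of a rectangle given in page space.
        var (x1, y1) = pageToUser.Apply(100, 50);
        var (x2, y2) = pageToUser.Apply(300, 150);

        // Re-sort the corners so the rectangle is left/bottom/right/top again.
        Console.WriteLine(
            $"{Math.Min(x1, x2)}, {Math.Min(y1, y2)}, {Math.Max(x1, x2)}, {Math.Max(y1, y2)}");
    }
}
```

With this assumed matrix the first transformed corner ends up to the right of the second one, which is why the corners are re-sorted before use. On an unrotated, uncropped page the default matrix would be the identity and the two coordinate spaces would coincide.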