PDFNet v6.5 RightToLeft support in TextExtractor

anon86032500 · May 20, 2015, 8:30am

Hello support,
Today I tried the new right-to-left support in TextExtractor. I don't understand how the change affects my code, so I hope you can clarify.

I set PdfViewControl.SetRightToLeftLanguage(true). The picture shows the order of the points I get from the current selection of a word in a PDF document. The order has changed, which is what I expect. Then I ran the text extractor with the following code:

```
using (var textExtractor = new TextExtractor())
{
    textExtractor.SetRightToLeftLanguage(false);
    textExtractor.Begin(
        1,
        new Rect(new Point(153, 214), new Point(183, 222)),
        TextExtractor.ProcessingFlags.e_no_invisible_text | TextExtractor.ProcessingFlags.e_no_dup_remove);
    // ... iterate over the extracted lines here ...
}
```

With RightToLeft set to false, I get the correct line from the PDF.

```
for (var pdfLine = textExtractor.GetFirstLine(); pdfLine.IsValid(); pdfLine = pdfLine.GetNextLine())
{
    // inspect pdfLine.GetBBox() and pdfLine.GetQuad() here
}
```

pdfLine.GetBBox() returns the same rectangle as the one I passed in, new Rect(new Point(153,214), new Point(183,222)), and pdfLine.GetQuad() returns the same points, in the same order, as the first column in the picture.
When I set both PdfViewControl.SetRightToLeftLanguage(true) and TextExtractor.SetRightToLeftLanguage(true) and run the same code, I get a pdfLine whose BBox and Quad array have the same y1, y2 values but different x1, x2 values.
My question is: how must I set the Rect passed to textExtractor.Begin() to get the correct PDF line?
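
To make the symptom concrete, here is a minimal sketch of the comparison described above: it takes the rectangle that was requested and the bounding box that came back, and reports which axis differs. It does not use PDFNet at all. `Box`, `BBoxCheck`, `Describe`, and the tolerance of 0.5 are hypothetical names and values chosen for illustration, and the x values of the second box are placeholders, not numbers from the actual document.

```csharp
using System;

// Hypothetical stand-in for a PDF rectangle; this is NOT the PDFNet Rect type.
public readonly struct Box
{
    public readonly double X1, Y1, X2, Y2;

    public Box(double x1, double y1, double x2, double y2)
    {
        X1 = x1; Y1 = y1; X2 = x2; Y2 = y2;
    }
}

public static class BBoxCheck
{
    // Describes which axes differ between the requested clip rectangle
    // and the bounding box that came back. The tolerance is an assumed value.
    public static string Describe(Box requested, Box returned, double tol = 0.5)
    {
        bool xSame = Math.Abs(requested.X1 - returned.X1) <= tol
                  && Math.Abs(requested.X2 - returned.X2) <= tol;
        bool ySame = Math.Abs(requested.Y1 - returned.Y1) <= tol
                  && Math.Abs(requested.Y2 - returned.Y2) <= tol;

        if (xSame && ySame) return "match";
        if (ySame) return "x differs";
        if (xSame) return "y differs";
        return "x and y differ";
    }

    public static void Main()
    {
        // The clip rectangle from the post.
        var clip = new Box(153, 214, 183, 222);

        // Placeholder for the right-to-left result: same y range,
        // made-up x range (the real values are only in the picture).
        var rightToLeft = new Box(400, 214, 430, 222);

        Console.WriteLine(Describe(clip, clip));
        Console.WriteLine(Describe(clip, rightToLeft));
    }
}
```

In real code the two boxes would be filled from the four coordinates of the clip rectangle and of pdfLine.GetBBox(). Logging the result once with RightToLeft off and once with it on gives a compact way to report the difference.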

I hope you can understand what I mean.

Best regards

Daniel

Anatoly_Kudrevatukh · May 26, 2015, 12:47am

Hello Daniel,

Thank you for reporting the issue. It appears there is an error in the TextExtractor right-to-left bbox/quad computation logic.

It will most likely be fixed in the next official release.

Regards,
Anatoly.