Is there a way to do a diacritic-insensitive or collation-based text search in PDFNet?

agravesdale · May 28, 2014, 2:05am

Q:

I use the method -(BOOL)FindText:(NSString*)searchString MatchCase:(BOOL)matchCase MatchWholeWord:(BOOL)matchWholeWord SearchUp:(BOOL)searchUp RegExp:(BOOL)regExp;
on PDFViewCtrl, but this method does not handle accents well.

French users expect the search to find “exécutif” if they search for “executif”, or to find “EXÉCUTIF” if they search for “exécutif” or “executif”.

In other languages, the rules differ. Usually, string comparison and ordering in Unicode are specified by a collation that defines these rules.

Is there a way to do a diacritic-insensitive or collation-based text search in PDFViewCtrl?

A:
We do not provide collation-based text search in PDFNet; however, you could implement this functionality on your end. You can use TextExtractor (http://www.pdftron.com/pdfnet/mobile/docs/iOS/src/interface_text_extractor.html) to extract an entire page’s text, along with bounding boxes. You could then perform a diacritic-insensitive search on that text and use the character positions of each match, along with the bounding box information, to programmatically select the appropriate characters.
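As a rough, language-neutral sketch of the search step (written in Python for brevity, not with the PDFNet API): the page text is folded by stripping combining marks and case, while an index map remembers which original character each folded character came from. The names `fold_with_map` and `find_insensitive` are hypothetical, and the page text is assumed to have already been extracted as a plain string. The returned index ranges are what you would use to look up the matching bounding boxes. Note that this is simple diacritic folding, not a full locale-aware collation.

```python
import unicodedata

def fold_with_map(text):
    """Return (folded, index_map); index_map[i] is the index in `text`
    of the character that produced folded[i]."""
    folded_chars = []
    index_map = []
    for i, ch in enumerate(text):
        # NFD splits "é" into "e" + U+0301; the combining mark is dropped.
        for piece in unicodedata.normalize("NFD", ch):
            if unicodedata.combining(piece):
                continue
            # casefold() may expand one character into several (e.g. "ß").
            for low in piece.casefold():
                folded_chars.append(low)
                index_map.append(i)
    return "".join(folded_chars), index_map

def find_insensitive(page_text, query):
    """Yield (start, end) index pairs into page_text, end exclusive."""
    folded_text, index_map = fold_with_map(page_text)
    folded_query, _ = fold_with_map(query)
    if not folded_query:
        return
    pos = folded_text.find(folded_query)
    while pos != -1:
        start = index_map[pos]
        end = index_map[pos + len(folded_query) - 1] + 1
        # Include combining marks that trail the last matched character.
        while end < len(page_text) and unicodedata.combining(page_text[end]):
            end += 1
        yield start, end
        pos = folded_text.find(folded_query, pos + 1)

if __name__ == "__main__":
    text = "Le comité EXÉCUTIF"
    for start, end in find_insensitive(text, "executif"):
        print(start, end, text[start:end])
```

On iOS, the same folding idea is available through Foundation's diacritic-insensitive and case-insensitive string comparison options, so the mapping back to original indices is the only part that needs custom code.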

Alternatively, if you are able to express the diacritic variants as regular expressions (e.g., encoding “é” as “[ée]”, or even as “.”, which is simpler but matches any character), then you could use the existing PDFNet TextSearch in regular expression mode (http://www.pdftron.com/pdfnet/mobile/docs/iOS/src/interface_text_search.html).
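One possible way to build such a pattern from the user's query is sketched below in Python. The `VARIANTS` table and the function names are hypothetical and cover only a few French letters; it is also an assumption that the regular expression flavor used by TextSearch accepts plain character classes like these, and that the search is run with case matching turned off.

```python
import re
import unicodedata

# Hypothetical, deliberately small table of accented variants per base letter.
VARIANTS = {
    "a": "àâä",
    "c": "ç",
    "e": "éèêë",
    "i": "îï",
    "o": "ôö",
    "u": "ùûü",
}

def strip_marks(ch):
    """Remove combining marks from a single character ("é" becomes "e")."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(p for p in decomposed if not unicodedata.combining(p))

def accent_insensitive_pattern(query):
    """Turn a plain query into a regex that tolerates the listed accents."""
    parts = []
    for ch in query:
        base = strip_marks(ch).lower()
        if base in VARIANTS:
            # Expand the letter into a class of its base form and variants.
            parts.append("[" + base + VARIANTS[base] + "]")
        else:
            # Escape everything else so it is matched literally.
            parts.append(re.escape(ch))
    return "".join(parts)

if __name__ == "__main__":
    pattern = accent_insensitive_pattern("executif")
    print(pattern)
    print(re.search(pattern, "Le comité EXÉCUTIF", re.IGNORECASE))
```

The generated string is what would be passed as the search string with the RegExp flag set. Escaping the remaining characters matters because the user's query may itself contain characters such as “.” or “(” that are special in regular expressions.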