We get this error while using TextExtractor.GetAsText on some documents.
Is there a safe way to bypass or ignore this problem? We would be happy if the page in question simply could not be extracted and returned an empty string, for instance. Ideally, whatever can be extracted would still be extracted.
Here, the program terminates abruptly and we cannot catch the exception at all:
terminate called after throwing an instance of 'pdftron::Common::Exception'
what(): Exception:
Message: Bad UTF16 - leading low surrogate
Conditional expression: hiUnit <= 0xDBFF
Version : 7.1.0.74119
Platform : Linux
Architecture : AMD64
Filename : UnicodeUtils.cpp
Function : CodePoint_from_UTF16Nat_Surrogate
Linenumber : 1305
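For anyone reading the log above: the failing check `hiUnit <= 0xDBFF` is the upper bound of the UTF-16 high-surrogate range (0xD800 to 0xDBFF). A "leading low surrogate" means the first unit of a pair fell in the low-surrogate range (0xDC00 to 0xDFFF) instead. Below is a minimal, self-contained sketch of that validation. It is not PDFTron's implementation; the names `is_high_surrogate`, `is_low_surrogate`, `combine_surrogates` and `kReplacement` are hypothetical, and substituting U+FFFD for a bad pair is an assumption about what lenient handling could look like.

```cpp
#include <cstdint>

// UTF-16 surrogate ranges as defined by the Unicode Standard.
constexpr bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical lenient policy: substitute U+FFFD instead of throwing.
constexpr char32_t kReplacement = 0xFFFD;

// Combine a (high, low) surrogate pair into one code point.
// If the units are not a valid pair (for example the first unit is a
// low surrogate, as in the log above), return the replacement character.
constexpr char32_t combine_surrogates(char16_t hi, char16_t lo) {
    return (is_high_surrogate(hi) && is_low_surrogate(lo))
        ? static_cast<char32_t>(0x10000
              + ((static_cast<char32_t>(hi) - 0xD800) << 10)
              + (static_cast<char32_t>(lo) - 0xDC00))
        : kReplacement;
}
```

With this sketch, `combine_surrogates(0xD83D, 0xDE00)` yields U+1F600, while the swapped order `combine_surrogates(0xDE00, 0xD83D)` yields U+FFFD rather than an error. Whether the SDK can be made to behave this way is a question for the vendor.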
No, it is fine. The one on the site is literally one of the nightly builds. We just update the website link when it's deemed important, either for new feature releases or if there is an important fix.
We tried with the nightly build, but we are still getting this error:
terminate called after throwing an instance of 'pdftron::Common::Exception'
what(): Exception:
Message: Bad UTF16 - leading low surrogate
Conditional expression: hiUnit <= 0xDBFF
Version : 7.1.1.74739
Platform : Linux
Architecture : AMD64
Filename : UnicodeUtils.cpp
Function : CodePoint_from_UTF16Nat_Surrogate
Linenumber : 1305