How do I detect/remove Hyphens from text using TextExtractor?

Aaron_Gravesdale · August 21, 2008, 5:12pm

Q: Is there a way to remove hyphens when using TextExtractor to
traverse PDF text with the Lines and Words iterators?
------
A: TextExtractor.GetAsText() has a boolean option that can be used to
enable removal of the hyphen character and merging of text runs to form a
single word.

When you extract words line by line, this option is not available
because it wouldn't be possible to return correct positioning
information for a word that is split across two lines. Instead, you
can use line.EndsWithHyphen(), which tests whether the line of text
ends with a hyphen (i.e. '-'). If EndsWithHyphen() returns true, you can
remove the trailing hyphen and treat the first word on the next line as
the continuation of the same word.
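
As a rough illustration of that merge step, here is a minimal Python sketch. It does not call the PDF library at all: each line is modelled as a hypothetical (words, ends_with_hyphen) pair, where the flag stands in for the value you would get from line.EndsWithHyphen(), and merge_hyphenated is an illustrative name rather than part of the API.

```python
from typing import List, Tuple

# Hypothetical stand-in for an extracted line: the words on the line plus
# the flag that line.EndsWithHyphen() would report for it.
Line = Tuple[List[str], bool]


def merge_hyphenated(lines: List[Line]) -> List[str]:
    """Return all words in reading order, rejoining words split by a line-end hyphen."""
    merged: List[str] = []
    carry = ""  # first half of a word that was split across a line break

    for words, ends_with_hyphen in lines:
        words = list(words)  # copy so the caller's data is not modified

        # Glue the pending fragment onto the first word of this line.
        if carry and words:
            words[0] = carry + words[0]
            carry = ""

        # Hold back the last word of a hyphen-terminated line, minus the hyphen.
        if ends_with_hyphen and words:
            last = words.pop()
            carry = last[:-1] if last.endswith("-") else last

        merged.extend(words)

    # A fragment with no following line is emitted without its hyphen.
    if carry:
        merged.append(carry)
    return merged
```

For example, passing [(["text", "extrac-"], True), (["tion", "works"], False)] yields ["text", "extraction", "works"]. The merged word no longer has a single bounding box, so if you need positions you would have to keep the boxes of both halves yourself.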