I am trying to parse a PDF that was generated from a CAD drawing, and it seems that the text is stored as e_path elements (parsing e_text did not return anything). How can I identify which paths in the drawing are text? For example, with text inside a circle, how do I know whether an e_path represents the circle or a character?
Unfortunately, it is common for CAD software to do this: what looks like text is actually just vector paths, not real text that can be searched and selected.
I would highly recommend resolving this at the source if possible. Perhaps there is an export option that keeps the text as actual text.
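If not, then the best course of action is to use an Optical Character Recognition (OCR) program to determine the text. OCR works on a rendered image of the page, so you do not have to decide path by path whether an e_path is a circle or a character.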
Tesseract is a great open source OCR tool.
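As a rough sketch of that approach, the Python below rasterizes the first page and passes the image to Tesseract. It assumes the pdftoppm (Poppler) and tesseract command-line tools are installed and on your PATH. The file names, the 300 DPI setting, and page segmentation mode 11 (sparse text) are only guesses at a starting point for a drawing, and the helper names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def render_command(pdf_path, out_prefix, dpi=300):
    """Build a pdftoppm command that renders page 1 to <out_prefix>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", "-f", "1", "-l", "1",
            "-singlefile", str(pdf_path), str(out_prefix)]


def ocr_command(image_path, lang="eng", psm=11):
    """Build a tesseract command that prints the recognized text to stdout."""
    # psm 11 ("sparse text") is an assumption; other modes may suit your drawing better.
    return ["tesseract", str(image_path), "stdout", "-l", lang, "--psm", str(psm)]


def ocr_first_page(pdf_path, workdir="."):
    """Render page 1 of the PDF to a PNG and return the text Tesseract finds."""
    for tool in ("pdftoppm", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} was not found on PATH")
    prefix = Path(workdir) / "page1"  # hypothetical output name
    subprocess.run(render_command(pdf_path, prefix), check=True)
    image_path = Path(str(prefix) + ".png")
    result = subprocess.run(ocr_command(image_path), check=True,
                            capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    print(ocr_first_page("drawing.pdf"))  # "drawing.pdf" is a placeholder
```

If small labels come out garbled, rendering at a higher DPI or cropping to the region of interest before OCR may help.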