This is regarding insufficient PDF data extraction using the Data Extraction module.

Product: DataExtractionModule - OCRModule.exe & StructuredOutput.exe

Product Version: OCRModule - 10.11.8.4197, StructuredOutput - 10.0.18108.1

Please give a brief summary of your issue: Unable to get some types of data after extracting from a PDF using the provided module.

Please describe your issue and provide steps to reproduce it:
When we extract data from some PDFs using the provided module, the output JSON does not contain shape data such as lines, rectangles, or circles. Also, when a paragraph or text box is extracted, we do not see any border color for it in the extracted JSON.

Kindly explain why this is so, or whether we are using the wrong module.

Warm regards,
Jatin


Thank you for contacting us about this. While the Data Extraction module supports extracting document structure (including headers, footers, and font style), it cannot extract shapes or styling information such as table border color.

I have raised this with the team as a feature request for future consideration.

For more information about the capabilities of the module, please refer to the documentation here.
