Data Science and Machine Learning projects

2020 research

Projects developed during an internship at UniCredit, covering OCR, NLP, and document classification.

OCR Pipeline

  • Implemented a generic, configurable OCR web service for extracting text from scanned documents.
  • Built configurable modules for each stage: image-processing-based preprocessing, interchangeable OCR engines, and language-sensitive word-correction post-processing.
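
The modular layout described above can be illustrated with a minimal sketch. It is not the service's actual code: the registry names (PREPROCESSORS, ENGINES), the config keys, the toy correction tables, and the use of bytes as a stand-in for an image are all assumptions made for illustration.

```python
from typing import Callable, Dict

Image = bytes  # stand-in for a real image type such as a NumPy array

# Hypothetical registry of preprocessing modules (image -> image).
PREPROCESSORS: Dict[str, Callable[[Image], Image]] = {
    # Placeholder: a real module might deskew or binarize the scan.
    "identity": lambda img: img,
}

# Hypothetical registry of OCR engine wrappers (image -> text).
ENGINES: Dict[str, Callable[[Image], str]] = {
    # Placeholder engine: pretends the image bytes are already text.
    "fake": lambda img: img.decode("utf-8"),
}

# Toy per-language correction tables; a real module would rely on a
# dictionary or language model instead.
CORRECTIONS = {"en": {"teh": "the"}, "de": {"udn": "und"}}


def correct_words(text: str, lang: str) -> str:
    """Replace known misreadings word by word for the given language."""
    table = CORRECTIONS.get(lang, {})
    return " ".join(table.get(word, word) for word in text.split())


def run_pipeline(image: Image, config: dict) -> str:
    """Run preprocessing, OCR, and optional correction as selected by config."""
    for name in config.get("preprocess", []):
        image = PREPROCESSORS[name](image)
    text = ENGINES[config["engine"]](image)
    if config.get("correct", False):
        text = correct_words(text, config.get("lang", "en"))
    return text


config = {"preprocess": ["identity"], "engine": "fake", "lang": "en", "correct": True}
result = run_pipeline(b"teh scanned page", config)
print(result)
```

Keeping each stage behind a name in a registry lets a request switch engines or skip correction by changing the config only, which is one plausible reading of "configurable" here.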

Garnishment Document Classification and Enrichment

  • Tested and compared different enrichment models.
  • Implemented deep learning techniques for document classification and named-entity recognition.
  • Dockerized, integrated, and tested the web services.

Stamp Recognition and Information Extraction

  • Recognized specific stamps in scanned documents and extracted their date and time.
  • Implemented with Keras, scikit-learn, OpenCV.
  • Achieved 83% accuracy.