OCR invoice processing

  • Algorithms
  • Python

Algorithms for universal OCR invoice processing

Our Data Science department developed algorithms that improve and extend Optical Character Recognition (OCR) of company invoices.

Thanks to the new algorithms, the technology can correctly process invoices with any formatting, not only those that follow a predetermined format.

The result of the work was an API that receives a company's invoice as one or more pages in image format. A basic template of bounding boxes and expected text in PDF format must have been added for that company's invoices beforehand. The algorithm recognizes the table borders and the necessary data in the table. It then processes the data: it recalculates amounts and percentages and notes whether the values in the invoice are correct. This solution makes it possible to automate fiscal accounting.
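The recalculation step can be illustrated with a minimal sketch. It is not the project's actual code: the names LineItem, round_money and check_invoice, the one-cent tolerance, and the single tax rate per invoice are all assumptions made for the example. It takes values that OCR has already extracted, recomputes each line amount, the tax, and the total, and collects a note for every value that does not match.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")  # assumed rounding unit and comparison tolerance


@dataclass
class LineItem:
    """One table row as read by OCR (hypothetical structure)."""
    description: str
    quantity: Decimal
    unit_price: Decimal
    stated_amount: Decimal  # amount printed on the invoice


def round_money(value: Decimal) -> Decimal:
    """Round to two decimal places, half up."""
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


def check_invoice(items, tax_rate_percent, stated_tax, stated_total,
                  tolerance=CENT):
    """Recalculate amounts and return notes for values that disagree."""
    notes = []
    subtotal = Decimal("0")

    for number, item in enumerate(items, start=1):
        expected = round_money(item.quantity * item.unit_price)
        if abs(expected - item.stated_amount) > tolerance:
            notes.append(
                f"line {number}: stated {item.stated_amount}, "
                f"recalculated {expected}"
            )
        # Use the recalculated amount so one bad line does not
        # also distort the tax and total checks.
        subtotal += expected

    expected_tax = round_money(subtotal * tax_rate_percent / Decimal("100"))
    if abs(expected_tax - stated_tax) > tolerance:
        notes.append(f"tax: stated {stated_tax}, recalculated {expected_tax}")

    expected_total = subtotal + expected_tax
    if abs(expected_total - stated_total) > tolerance:
        notes.append(
            f"total: stated {stated_total}, recalculated {expected_total}"
        )

    return notes


# Example: the second line amount was misread (16.80 instead of 16.50).
items = [
    LineItem("Paper", Decimal("2"), Decimal("10.00"), Decimal("20.00")),
    LineItem("Pens", Decimal("3"), Decimal("5.50"), Decimal("16.80")),
]
notes = check_invoice(
    items,
    tax_rate_percent=Decimal("20"),
    stated_tax=Decimal("7.30"),
    stated_total=Decimal("43.80"),
)
for note in notes:
    print(note)
```

Decimal is used instead of float so that monetary values compare exactly after rounding. A real invoice may carry several tax rates or discounts, which this sketch does not cover.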

Development time
2 weeks, 1 developer