OVERVIEW OF OCR TOOLS FOR THE TASK OF RECOGNIZING TABLES AND GRAPHS IN DOCUMENTS

Authors

DOI:

https://doi.org/10.20535/2708-4930.3.2022.265200

Keywords:

OCR, PDF files, FastText, detection, recognition, deep learning, technical documents.

Abstract

This study provides an overview of OCR tools for recognizing tables and graphs in documents. Digitizing paper documents has many advantages for both individuals and businesses. Digitization requires OCR (optical character recognition) software, which scans documents and makes their text readable by a computer. The recognized text can then be converted to formats supported by Microsoft Word or Google Docs. OCR software is increasingly a necessity rather than an optional utility. It creates searchable, editable text from printed documents, scanned photos and books, and PDF files.

Currently, there is an active trend toward the digitalization of documents, and there is great demand for solutions that can automate the processing of large volumes of documents with high accuracy. A particular case is the processing of PDF files, whether scanned documents or files generated by software editors. OCR solutions aim to increase the efficiency of processing and analyzing digital documents using artificial intelligence. Both government agencies and businesses can use these solutions. The developed systems can be a valuable addition to CRM systems and can be integrated in place of existing document processing modules or used as a standalone solution.

Although existing OCR solutions can efficiently recognize text, the recognition of graphical elements, such as charts and tables, is still maturing. Solutions that increase the accuracy of visual data recognition can be valuable for processing technical documents, such as scientific, financial, and other analytical documents.

Published

2022-12-23