Comparative Analysis of Google Vision OCR with Tesseract on Newspaper Text Recognition

Authors

DOI:

https://doi.org/10.69616/mcs.v1i1.178

Keywords:

Comparative Analysis of OCR, Google Vision, Optical Character Recognition, Tesseract

Abstract

Optical Character Recognition (OCR) is a technique for converting images of text into machine-readable text. Two OCR algorithms are currently well known and widely used: Google Vision OCR and Tesseract. This study compares the two so that developers can more easily decide which algorithm is better suited to the system they intend to build. The research follows a Research and Development (R&D) method with the following stages: literature study, needs analysis, dataset collection and expansion, architecture design and application modeling, system implementation, testing and evaluation, and drawing conclusions. The accuracy, precision, and sensitivity of each algorithm are computed from a confusion matrix. The study concludes that Google Vision OCR outperforms Tesseract, because its accuracy, sensitivity, and precision are all higher than those of Tesseract.
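
As a rough illustration of the confusion-matrix evaluation mentioned in the abstract, the sketch below computes accuracy, precision, and sensitivity (recall) from raw counts using the standard textbook formulas. The abstract does not say how recognized characters are mapped to true positives, false positives, and false negatives, so the `ConfusionCounts` class, the `ocr_metrics` function, and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Raw confusion-matrix counts (hypothetical container, not from the paper)."""
    tp: int      # items correctly recognized
    fp: int      # items reported by the OCR engine that are wrong or spurious
    fn: int      # items in the ground truth that the engine missed
    tn: int = 0  # often undefined for OCR output, so it defaults to 0


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def ocr_metrics(c: ConfusionCounts) -> dict:
    """Standard accuracy, precision, and sensitivity from confusion counts."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
    }


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not results from the study.
    example = ConfusionCounts(tp=90, fp=5, fn=5)
    for name, value in ocr_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Running the same function on the counts obtained for each engine would give directly comparable scores, assuming both engines are evaluated against the same ground-truth text.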

Published

2024-07-19

Issue

Section

Articles