Convolutions vs. Sequences: Understanding performances of neural-based methods for automatic Baybayin script recognition
24 February 2023
Cerwin Dexter L. Dela Rosa, Kreed Zion Lorenzo G. Lagunilla, Jomari V. Ramos, Austin Kenneth V. San Pedro, Joseph Marvin R. Imperial
Proceedings Volume 12590, Third International Conference on Computer Vision and Information Technology (CVIT 2022); 1259006 (2023) https://doi.org/10.1117/12.2669907
Event: 2022 3rd International Conference on Computer Vision and Information Technology (CVIT 2022), 2022, Beijing, China
Abstract
Common approaches to vision-based tasks such as character and object recognition use Convolutional Neural Networks (CNNs) due to their practicality in processing images and their theoretical grounding. In this work, we take a different perspective on the task of Baybayin script recognition by exploring Vision Transformers (ViT), a new paradigm for processing images inspired by the Transformer model. We compare the performance of CNNs and ViT and analyze model confidence on a set of test images using Local Interpretable Model-Agnostic Explanations (LIME). Results show that, performance-wise, convolution-based architectures (CNNs) still outperform sequence-based methods (ViT) at discriminating Baybayin scripts, with accuracies of 84.5% and 48.8% respectively, a nearly twofold difference.
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Cerwin Dexter L. Dela Rosa, Kreed Zion Lorenzo G. Lagunilla, Jomari V. Ramos, Austin Kenneth V. San Pedro, and Joseph Marvin R. Imperial "Convolutions vs. Sequences: Understanding performances of neural-based methods for automatic Baybayin script recognition", Proc. SPIE 12590, Third International Conference on Computer Vision and Information Technology (CVIT 2022), 1259006 (24 February 2023); https://doi.org/10.1117/12.2669907
KEYWORDS
Visual process modeling
Transformers
Optical character recognition
Image processing
Convolutional neural networks
Artificial neural networks
Convolution
