Optimizing OCR accuracy for bi-tonal, noisy scans of degraded Arabic documents
25 May 2005
Paul Herceg, Benjamin Huyck, Christopher Johnson, Linda Van Guilder, Amlan Kundu
Abstract
Acquiring foreign language from degraded hardcopy documents is of interest to military and border control applications. Bi-tonal image scans are desirable because file size is small. However, the nature of hardcopy degradations and the scanner or image enhancement software capabilities used directly affect the quality of the captured image and the extent of language acquisition. We applied a collection of manual treatments to hardcopy Arabic documents to develop a corpus of bi-tonal images. We then used this corpus in an exploratory study to derive conclusions about how bi-tonal images could be enhanced. This paper discusses the manually degraded Arabic document corpus, the image enhancement study, and the significant optical character recognition (OCR) improvements obtained with simple scanner driver adjustments.
© (2005) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Paul Herceg, Benjamin Huyck, Christopher Johnson, Linda Van Guilder, and Amlan Kundu "Optimizing OCR accuracy for bi-tonal, noisy scans of degraded Arabic documents", Proc. SPIE 5817, Visual Information Processing XIV, (25 May 2005); https://doi.org/10.1117/12.606447
KEYWORDS
Optical character recognition
Image enhancement
Scanners
Image processing
Image quality
Printing
RGB color model