Machine-assisted human classification of segmented characters for OCR testing and training
14 April 1993
R. Allen Wilkinson, Michael D. Garris, Jon C. Geist
Proceedings Volume 1906, Character Recognition Technologies; (1993) https://doi.org/10.1117/12.143622
Event: IS&T/SPIE's Symposium on Electronic Imaging: Science and Technology, 1993, San Jose, CA, United States
Abstract
NIST needed a large set of segmented characters for use as a test set for the First Census Optical Character Recognition (OCR) Systems Conference. A machine-assisted human classification system was developed to expedite the process. The testing set consists of 58,000 digits and 10,000 upper- and lowercase characters entered on forms by high school students and is distributed as Testdata 1. A machine system was able to recognize a majority of the characters, but all system decisions required human verification. The NIST recognition system was augmented with human verification to produce the testing database. This augmented system consists of four parts: the recognition system, a checking pass, a correcting pass, and a clean-up pass. The recognition system was developed at NIST. The checking pass verifies that an image is in the correct class. The correcting pass allows classes to be changed. The clean-up pass forces the system to stabilize by making all images either accepted with verified classifications or rejected. In developing the testing set, we discovered that segmented characters can be ambiguous even without context bias. This ambiguity can be caused by over-segmentation or by the way a person writes. For instance, it is possible to create four ambiguous characters to represent all ten digits. This means that a quoted accuracy rate for a set of segmented characters is meaningless without reference to human performance on the same set of characters. This is different from the case of isolated fields, where most of the ambiguity can be overcome by using context, which is available in the non-segmented image. For instance, in the First Census OCR Conference, one system achieved a forced decision error rate for digits of 1.6% while 21 other systems achieved error rates of 3.2% to 5.1%. This statement cannot be evaluated until human performance on the same set of characters presented one at a time without context has been measured.
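The four-part workflow described in the abstract (recognition, checking pass, correcting pass, clean-up pass) can be illustrated with a minimal Python sketch. This is not the NIST implementation: the abstract does not say how the passes are scheduled or how human decisions are recorded, so the loop structure, the `max_rounds` limit, and every name here (`Item`, `classify_with_verification`, `recognize`, `check`, `correct`) are assumptions made for illustration. The human operator is modeled as two callbacks.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Item:
    """One segmented character image and its current classification state."""
    image_id: str
    label: Optional[str] = None   # current hypothesized class, None if unknown
    status: str = "unverified"    # "unverified", "accepted", or "rejected"


def classify_with_verification(
    image_ids: Iterable[str],
    recognize: Callable[[str], Optional[str]],   # machine recognizer (hypothetical)
    check: Callable[[str, str], bool],           # human: is this image in this class?
    correct: Callable[[str], Optional[str]],     # human: new class, or None if unsure
    max_rounds: int = 3,                         # assumed limit, not from the source
) -> List[Item]:
    # Recognition: the machine proposes a class for every image.
    items = [Item(i, recognize(i)) for i in image_ids]

    for _ in range(max_rounds):
        pending = [it for it in items if it.status == "unverified"]
        if not pending:
            break
        # Checking pass: a human verifies each image against its proposed class.
        for it in pending:
            if it.label is not None and check(it.image_id, it.label):
                it.status = "accepted"
        # Correcting pass: a human may change the class of images that failed.
        for it in pending:
            if it.status == "unverified":
                it.label = correct(it.image_id)

    # Clean-up pass: force a final decision so that every image ends up
    # either accepted with a verified class or rejected.
    for it in items:
        if it.status == "unverified":
            if it.label is not None and check(it.image_id, it.label):
                it.status = "accepted"
            else:
                it.status = "rejected"
    return items
```

In this sketch, an image that the human cannot assign to any class (for example, an ambiguous or over-segmented character) keeps a `None` label and is rejected in the clean-up pass, which is one way to read the abstract's requirement that the system stabilize.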
© (1993) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
R. Allen Wilkinson, Michael D. Garris, and Jon C. Geist "Machine-assisted human classification of segmented characters for OCR testing and training", Proc. SPIE 1906, Character Recognition Technologies, (14 April 1993); https://doi.org/10.1117/12.143622
KEYWORDS
Optical character recognition
Image segmentation
Databases
Image classification
Classification systems
Binary data
Image processing