1 January 1996 Detection and location of multicharacter sequences in lines of imaged text
Abstract
A system for detecting and locating user-specified search strings, or phrases, in lines of imaged text is described. The phrases may be single words or multiple words, and may contain a partially specified word. The imaged text can be composed of a number of different fonts and graphics. Textlines in a deskewed image are hypothesized using multiresolution morphology. For each textline, the baseline, topline, and x-height are identified by simple statistical methods and then used to normalize each textline bounding box. Columns of pixels in the resulting bounding box serve as feature vectors. One hidden Markov model is created for each user-specified phrase, and another represents all text and graphics other than the user-specified phrases. Phrases are identified using Viterbi decoding on a spotting network created from the models. The operating point of the system can be varied to trade off the percentage of words correctly spotted against the percentage of false alarms. Results are given using a subset of the UW English Document Image Database I.
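The spotting network described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes observations are characters rather than pixel-column feature vectors, uses a single uniform "filler" state in place of the model for non-phrase text and graphics, and gives each phrase character exactly one state. The names `viterbi`, `spot`, `p_enter`, and `p_match` are hypothetical, and treating `p_enter` as the operating-point control is an assumption made for illustration.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Log-domain Viterbi: return the most likely state sequence for obs."""
    score = [{s: log_start.get(s, NEG_INF) + log_emit(s, obs[0]) for s in states}]
    back = []
    for o in obs[1:]:
        prev = score[-1]
        cur, ptr = {}, {}
        for s in states:
            # Best predecessor of state s at this step.
            best = max(states, key=lambda p: prev[p] + log_trans.get((p, s), NEG_INF))
            cur[s] = prev[best] + log_trans.get((best, s), NEG_INF) + log_emit(s, o)
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def spot(line, phrase, p_enter=0.1, p_match=0.9):
    """Return start indices where the decoded path passes through the phrase model.

    Toy assumptions: "F" is a filler state emitting every symbol uniformly;
    K0..Kn-1 form a left-to-right chain, one state per phrase character.
    """
    if not line or not phrase:
        return []
    alphabet = sorted(set(line) | set(phrase))
    n = len(alphabet)
    kw = [f"K{i}" for i in range(len(phrase))]
    states = ["F"] + kw

    start = {"F": _log(1 - p_enter), "K0": _log(p_enter)}
    trans = {}
    for src in ("F", kw[-1]):  # leaving filler or finishing the phrase
        trans[(src, "F")] = _log(1 - p_enter)
        trans[(src, "K0")] = _log(p_enter)
    for a, b in zip(kw, kw[1:]):  # forced advance inside the phrase
        trans[(a, b)] = 0.0

    def emit(state, ch):
        if state == "F":
            return _log(1.0 / n)
        if ch == phrase[int(state[1:])]:
            return _log(p_match)
        return _log((1 - p_match) / (n - 1))

    path = viterbi(line, states, start, trans, emit)
    # Report only occurrences that fit completely inside the line.
    return [i for i, s in enumerate(path)
            if s == "K0" and i + len(phrase) <= len(line)]


if __name__ == "__main__":
    print(spot("the cat sat", "cat"))               # conservative operating point
    print(spot("the cat sat", "cat", p_enter=0.5))  # more permissive operating point
```

In this toy setup, raising `p_enter` makes the decoder more willing to enter the phrase model, so near matches such as "sat" begin to be reported along with the true occurrence. That mirrors, in a simplified way, the trade-off between correct spots and false alarms mentioned in the abstract.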
Francine R. Chen, Dan S. Bloomberg, and Lynn D. Wilcox "Detection and location of multicharacter sequences in lines of imaged text," Journal of Electronic Imaging 5(1), (1 January 1996). https://doi.org/10.1117/12.228768
Published: 1 January 1996
CITATIONS
Cited by 7 scholarly publications and 3 patents.
KEYWORDS
Data modeling
Image segmentation
Databases
Optical character recognition
Visualization
Statistical modeling
Binary data
