Binary document image compression using a three-symbol grouped code dictionary
Hermilo Sanchez-Cruz, Mario A. Rodríguez-Díaz
Abstract
A novel lossy compression method for images of text documents is proposed. The method is based on classifying the objects (characters and pictures) that appear in the images. We used the Tanimoto distance to group the objects into classes and build an object dictionary; we then encoded the instances of each class with a three-symbol code, the three orthogonal symbol chain code (3OT). We applied an entropy coder that groups the 3OT symbols of the resulting chain, and finally compressed its output with the Paq8l archiver, which is based on a context-mixing algorithm consisting of a predictor and an arithmetic coder. The method achieved high storage efficiency: on average, its compression ratios were 2.17 times higher than those of the lossy mode of the international standard JBIG2 (Joint Bi-level Image Experts Group).
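Two of the steps in the abstract can be illustrated with a short sketch. The Python block below is not the authors' implementation; it is a minimal illustration under stated assumptions. Objects are sets of aligned (row, col) black-pixel coordinates; classes are formed greedily, with the first member of each class serving as its prototype and a hypothetical distance threshold of 0.2. For 3OT, the sketch follows the usual description of the code: 0 means no change of direction, 1 means a turn back to the reference direction (the direction held before the previous turn), and 2 means a turn to the direction opposite the reference. The reference used for the first turn is a hypothetical convention. The names `tanimoto_distance`, `build_dictionary`, and `to_3ot` are illustrative, and the symbol-grouping entropy coder and the Paq8l stage are not modeled.

```python
"""Sketch of two building blocks: Tanimoto-distance grouping of binary
objects and 3OT chain coding of a boundary. All names, the threshold,
and the initial-reference convention are hypothetical."""

# Unit steps along pixel edges are coded as 0=E, 1=N, 2=W, 3=S.


def tanimoto_distance(a, b):
    """Return 1 - |intersection| / |union| for two sets of black pixels.

    Assumes both objects are already aligned on a common origin.
    """
    if not a and not b:
        return 0.0
    inter = len(a & b)
    union = len(a) + len(b) - inter
    return 1.0 - inter / union


def build_dictionary(objects, threshold=0.2):
    """Greedy first-fit grouping (assumption): an object joins the first
    class whose prototype is within `threshold`, else starts a new class."""
    prototypes, labels = [], []
    for obj in objects:
        for k, proto in enumerate(prototypes):
            if tanimoto_distance(obj, proto) <= threshold:
                labels.append(k)
                break
        else:  # no existing class was close enough
            prototypes.append(obj)
            labels.append(len(prototypes) - 1)
    return prototypes, labels


def to_3ot(dirs):
    """Convert absolute 4-direction boundary steps into 3OT symbols.

    0: same direction as the previous step.
    1: a turn whose new direction equals the reference direction.
    2: a turn whose new direction is opposite to the reference direction.
    """
    if not dirs:
        return []
    prev = dirs[0]
    ref = (prev + 1) % 4  # ASSUMPTION: hypothetical reference for the first turn
    symbols = []
    for d in dirs[1:]:
        if d == prev:
            symbols.append(0)
        elif (d - prev) % 2 == 0:
            raise ValueError("a pixel-edge boundary cannot reverse direction")
        else:
            symbols.append(1 if d == ref else 2)
            ref = prev  # the direction before this turn becomes the new reference
        prev = d
    return symbols


if __name__ == "__main__":
    # A staircase of alternating E/N steps repeats symbol 1.
    print(to_3ot([0, 1, 0, 1]))      # [1, 1, 1]
    # Tracing one pixel counterclockwise: E, N, W, S.
    print(to_3ot([0, 1, 2, 3]))      # [1, 2, 2]
    a = {(0, 0), (0, 1)}
    b = {(0, 0), (1, 0)}
    print(tanimoto_distance(a, b))   # 0.666...
    protos, labels = build_dictionary([a, a, {(5, 5)}])
    print(len(protos), labels)       # 2 [0, 0, 1]
```

In this sketch, staircase-like boundaries produce long runs of a single symbol. That kind of redundancy is what a later entropy-coding stage can exploit.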
© 2012 SPIE and IS&T 0091-3286/2012/$25.00
Hermilo Sanchez-Cruz and Mario A. Rodríguez-Díaz "Binary document image compression using a three-symbol grouped code dictionary," Journal of Electronic Imaging 21(2), 023013 (21 May 2012). https://doi.org/10.1117/1.JEI.21.2.023013
Published: 21 May 2012
CITATIONS
Cited by 4 scholarly publications.
KEYWORDS
Image compression

Binary data

Associative arrays

Image processing

Image restoration

Image segmentation

Image quality standards
