Regular Articles

Image extraction in digital documents

Chee Sun Won

Dongguk University, Department of Electronics Engineering, Seoul, 100-715, Korea

J. Electron. Imaging. 17(3), 033016 (August 27, 2008). doi:10.1117/1.2970151
History: Received May 16, 2007; Revised May 11, 2008; Accepted July 08, 2008; Published August 27, 2008

Images included in documents usually provide information that may not be readily expressible in words. For example, academic articles with similar pictures may be of interest to researchers. We deal with the problem of extracting images from digital documents. Given a digital document, the optimal block size is first determined by finding the best fit of the horizontally projected gray-level pattern to a set of orthogonal basis vectors. Because a block of the optimal size is supposed to contain sufficient information to identify text regions, the proposed method is independent of the font size used in the text lines. Each block of the optimal size is classified as an image, text, or background block. This block classification result, in turn, serves as the initial configuration for blockwise document segmentation. The blockwise segmentation method is based on the maximum a posteriori (MAP) framework with a deterministic relaxation algorithm. After the blockwise segmentation, each boundary block in the image region is further divided into four subblocks, and the class labels of these subblocks are updated. The subdivision and label updating are executed recursively until a pixel-level segmentation is obtained. Experimental results show that the proposed image extraction method yields an error rate of 2.9% for 232 documents in the Oulu database.
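The recursive subdivision step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names (`refine_labels`, `is_boundary`, `toy_classify`), the label codes, and the classification rule are all hypothetical. In particular, `toy_classify` is a stand-in that marks a block as image if it contains any non-white pixel; the optimal block size search, the text class, and the MAP segmentation with deterministic relaxation are omitted. The sketch also assumes a power-of-two block size that divides the page dimensions, and it treats the page edge as non-image.

```python
import numpy as np

BACKGROUND, IMAGE = 0, 1  # hypothetical label codes; the text class is omitted


def toy_classify(pixels):
    """Stand-in rule, NOT the paper's classifier: any non-white pixel -> IMAGE."""
    return IMAGE if (pixels < 255).any() else BACKGROUND


def is_boundary(labels, y, x, size):
    """True if the image block at (y, x) touches a non-image block or the page edge."""
    if labels[y, x] != IMAGE:
        return False
    h, w = labels.shape
    for ny, nx in ((y - size, x), (y + size, x), (y, x - size), (y, x + size)):
        outside = ny < 0 or nx < 0 or ny >= h or nx >= w
        if outside or labels[ny, nx] != IMAGE:
            return True
    return False


def refine_labels(gray, block, classify=toy_classify):
    """Blockwise labelling followed by recursive 4-way splitting of boundary blocks."""
    h, w = gray.shape
    if block < 1 or block & (block - 1) or h % block or w % block:
        raise ValueError("block must be a power of two dividing both page dimensions")

    # Initial blockwise labelling, stored per pixel for easy slicing.
    labels = np.empty((h, w), dtype=np.uint8)
    for y in range(0, h, block):
        for x in range(0, w, block):
            labels[y:y + block, x:x + block] = classify(gray[y:y + block, x:x + block])

    size = block
    while size > 1:
        half = size // 2
        prev = labels.copy()  # boundary tests use the labels of the coarser level
        for y in range(0, h, size):
            for x in range(0, w, size):
                if not is_boundary(prev, y, x, size):
                    continue
                # Split the boundary block into four subblocks and relabel each one.
                for sy in (y, y + half):
                    for sx in (x, x + half):
                        sub = gray[sy:sy + half, sx:sx + half]
                        labels[sy:sy + half, sx:sx + half] = classify(sub)
        size = half
    return labels


# Synthetic page: white background with one gray rectangle standing in for a picture.
page = np.full((32, 32), 255, dtype=np.uint8)
page[10:21, 12:27] = 100
mask = refine_labels(page, block=8) == IMAGE
print(mask.sum(), "pixels labelled as image")
```

Only blocks on the border of the image region are revisited at each level, so interior blocks keep their coarse label and the work concentrates where the block grid and the true image outline disagree. With a real classifier the update rule for subblocks would replace `toy_classify`.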

© 2008 SPIE and IS&T

Citation

Chee Sun Won
"Image extraction in digital documents", J. Electron. Imaging. 17(3), 033016 (August 27, 2008); http://dx.doi.org/10.1117/1.2970151

