Semi-automated document image clustering and retrieval

Markus Diem; Florian Kleber; Stefan Fiel; Robert Sablatnig

doi:10.1117/12.2043010

24 March 2014

Proceedings Volume 9021, Document Recognition and Retrieval XXI; 90210M (2014) https://doi.org/10.1117/12.2043010
Event: IS&T/SPIE Electronic Imaging, 2014, San Francisco, California, United States

Abstract

In this paper, a semi-automated method for document image clustering and retrieval is presented that creates links between different documents based on their content. Ideally, the initial bundling of shuffled document images can be reproduced, which makes it possible to explore large document databases. Structural and textural features, which describe the visual similarity, are extracted and used by experts (e.g., registrars) to interactively cluster the documents with a manually defined feature subset (e.g., checked paper, handwritten). The presented methods allow for the analysis of heterogeneous documents that contain both printed and handwritten text, and they support hierarchical clustering with different feature subsets in different layers.
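
The abstract does not state which clustering algorithm or features the authors use, so the following is only a minimal sketch of the idea of layering user-chosen feature subsets. It assumes each document has already been reduced to numeric feature scores (the names `checked_paper` and `handwritten`, the document dictionaries, and the threshold values are hypothetical) and substitutes plain single-linkage clustering with a distance threshold. Each layer refines the clusters of the previous layer using its own feature subset.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def cluster_by_subset(docs, subset, threshold):
    """Single-linkage clustering on a user-chosen feature subset (stand-in method).

    Two documents share a cluster if a chain of documents connects them in which
    every consecutive pair is closer than `threshold` in the selected features.
    """
    vecs = [[d["features"][f] for f in subset] for d in docs]
    parent = list(range(len(docs)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if dist(vecs[i], vecs[j]) < threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}  # root index -> documents, in order of first appearance
    for i, d in enumerate(docs):
        groups.setdefault(find(i), []).append(d)
    return list(groups.values())


def hierarchical_clustering(docs, layers):
    """Apply one (feature subset, threshold) pair per layer, refining each cluster."""
    if not layers:
        return [d["id"] for d in docs]  # leaf: list of document ids
    subset, threshold = layers[0]
    return [hierarchical_clustering(c, layers[1:])
            for c in cluster_by_subset(docs, subset, threshold)]


# Hypothetical documents with precomputed feature scores in [0, 1].
docs = [
    {"id": "a", "features": {"checked_paper": 1.0, "handwritten": 0.9}},
    {"id": "b", "features": {"checked_paper": 1.0, "handwritten": 0.1}},
    {"id": "c", "features": {"checked_paper": 0.0, "handwritten": 0.8}},
    {"id": "d", "features": {"checked_paper": 0.0, "handwritten": 0.85}},
]
# Layer 1 splits by paper type, layer 2 splits each group by handwriting.
layers = [(["checked_paper"], 0.5), (["handwritten"], 0.3)]
result = hierarchical_clustering(docs, layers)
print(result)  # [[['a'], ['b']], [['c', 'd']]]
```

Single linkage was chosen here only because it needs no preset number of clusters; in an interactive setting, an expert would pick the feature subset for each layer and could adjust the threshold after inspecting the result.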

Citation

Markus Diem, Florian Kleber, Stefan Fiel, and Robert Sablatnig "Semi-automated document image clustering and retrieval", Proc. SPIE 9021, Document Recognition and Retrieval XXI, 90210M (24 March 2014); https://doi.org/10.1117/12.2043010