Figures inserted in documents convey information for which the visual modality is more appropriate than text. A complete
understanding of a figure often requires reading its caption, or relating the figure to the main text through a numbered
figure identifier that appears in both the caption and the main text. A figure and its caption are closely related: together
they form a single multimodal component (an FC-pair) that Document Image Analysis cannot extract with text/graphics
segmentation alone. We propose a method that goes beyond text/graphics segmentation to extract FC-pairs without
performing a full labelling of the page components. Horizontal and vertical text lines are detected in the pages. Graphics
are associated with selected text lines to initialise the FC-pair detector. The notions of spatial and visual disorder are
introduced to define a property-based layout model, which copes with most of the many possible spatial arrangements of
graphics and text lines. The FC-pair detector performs operations that eliminate the layout disorder and assigns a quality
value to each FC-pair. The processed documents were taken from medic@, the digital historical collection of the BIUM
(Bibliothèque InterUniversitaire Médicale). A first set of 98 pages forms the design set; a further 298 pages were
collected to evaluate the system. The reported performance is that of the full processing chain, from the binarisation of
the digital images to the detection of FC-pairs.
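
The abstract does not specify how graphics are associated with text lines or how the quality value is computed. Purely as an illustration, assuming graphics and text lines are already available as bounding boxes (with y increasing downward), one could pair each graphic with the text line directly above or below it that maximises a hypothetical quality score: horizontal overlap weighted by vertical proximity. `Box`, `pair_figures`, `max_gap` and the scoring formula are all hypothetical; the sketch ignores vertical caption lines and does not model the layout-disorder operations described above.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Box:
    # Hypothetical bounding box; y grows downward, as in image coordinates.
    x0: float
    y0: float
    x1: float
    y1: float


def horizontal_overlap(a: Box, b: Box) -> float:
    """Fraction of the narrower box's width that both boxes share (0 if disjoint)."""
    inter = min(a.x1, b.x1) - max(a.x0, b.x0)
    narrower = min(a.x1 - a.x0, b.x1 - b.x0)
    return max(0.0, inter) / narrower if narrower > 0 else 0.0


def vertical_gap(graphic: Box, line: Box) -> float:
    """Gap between a graphic and a line above or below it; inf if they overlap vertically."""
    if line.y0 >= graphic.y1:  # line lies below the graphic
        return line.y0 - graphic.y1
    if line.y1 <= graphic.y0:  # line lies above the graphic
        return graphic.y0 - line.y1
    return math.inf  # line beside or inside the graphic: not a caption candidate here


def pair_figures(
    graphics: List[Box], lines: List[Box], max_gap: float = 50.0
) -> List[Tuple[int, Optional[int], float]]:
    """Return (graphic index, best line index or None, hypothetical quality in [0, 1])."""
    pairs = []
    for gi, g in enumerate(graphics):
        best, best_q = None, 0.0
        for li, ln in enumerate(lines):
            gap = vertical_gap(g, ln)
            overlap = horizontal_overlap(g, ln)
            if gap > max_gap or overlap == 0.0:
                continue
            # Hypothetical score: full overlap and zero gap give 1.0.
            q = overlap * (1.0 - gap / max_gap)
            if q > best_q:
                best, best_q = li, q
        pairs.append((gi, best, best_q))
    return pairs
```

For example, a graphic spanning `Box(100, 100, 400, 300)` and a text line at `Box(120, 310, 380, 325)` are paired with a score of 0.8 under the default `max_gap`; a graphic with no text line within `max_gap` is returned with `None`.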

In this paper, we present NICK, a new sliding-window-based local thresholding technique, and compare it in detail with
several existing sliding-window-based thresholding algorithms. The proposed method aims to achieve better binarization
results, specifically for ancient document images. NICK is inspired by Niblack's binarization method and proves robust
and effective when evaluated on low-quality ancient document images.
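
Since NICK builds on Niblack's method, it may help to recall that baseline: for each pixel, Niblack computes the mean m and standard deviation s of the grey levels in a window centred on it and uses the threshold T = m + k·s, with k typically around -0.2 for dark text on a light background. The sketch below implements only this Niblack baseline, not NICK itself, whose modified threshold is not given in this abstract. The window size of 15 and k = -0.2 are assumed defaults; summed-area tables keep each window sum O(1).

```python
import math
from typing import List

Image = List[List[float]]


def integral(img: Image, square: bool = False) -> List[List[float]]:
    """Summed-area table of img (or of img**2), padded with a zero row and column."""
    h, w = len(img), len(img[0])
    s = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0.0
        for x in range(w):
            v = img[y][x]
            row += v * v if square else v
            s[y + 1][x + 1] = s[y][x + 1] + row
    return s


def box_sum(s: List[List[float]], y0: int, x0: int, y1: int, x1: int) -> float:
    """Sum over rows y0..y1-1 and columns x0..x1-1 using the summed-area table s."""
    return s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]


def niblack(img: Image, window: int = 15, k: float = -0.2) -> List[List[int]]:
    """Niblack binarization: 1 = ink (pixel below T = m + k*s), 0 = background."""
    h, w = len(img), len(img[0])
    s1 = integral(img)
    s2 = integral(img, square=True)
    r = window // 2
    out = []
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)  # window clipped at the image border
        row = []
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            n = (y1 - y0) * (x1 - x0)
            m = box_sum(s1, y0, x0, y1, x1) / n
            var = box_sum(s2, y0, x0, y1, x1) / n - m * m
            t = m + k * math.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors
            row.append(1 if img[y][x] < t else 0)
        out.append(row)
    return out
```

On a 5 x 5 image of grey level 200 with a single dark pixel (20) in the centre, `niblack(img, window=3)` marks only the centre pixel as ink. On real pages, Niblack's rule is known to produce noise in empty background regions, which is one of the weaknesses that variants such as NICK set out to reduce.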

A page of a document is a set of small components that a human reader groups into higher-level components, such as
lines and text blocks. Document image analysis aims to detect these components in document images. We propose to
encode local information by considering the properties that determine perceptual grouping. Each connected component is
labelled according to the location of its nearest neighbouring connected component. These labelled components form the
input to a rule-based incremental process. Vertical and horizontal text lines are detected without any prior assumption
about their direction. Touching characters belonging to different lines are detected early and discarded from the
grouping process to avoid merging lines. The tolerance for grouping components increases over the course of the process
until the final decision. After each step of the grouping process, conflict resolution rules are activated. This work was
motivated by the automatic detection of Figure&Caption pairs in the documents of the historical collection of the BIUM
digital library (Bibliothèque InterUniversitaire Médicale). The images used in this study belong to this collection.
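
The abstract does not give the labelling rules, so the following is only a minimal sketch of the idea. It assumes each connected component is reduced to its bounding box and labelled with the direction (`left`, `right`, `above`, `below`) of its nearest neighbour, measured by centroid distance. The grouping step is a hypothetical simplification: it merges each component with its nearest neighbour by union-find when their distance is within a single threshold `max_dist`, and infers each group's orientation by majority vote. The incremental tolerance schedule, touching-character handling and conflict resolution rules described above are not modelled.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x0, y0, x1, y1), y grows downward


def centroid(b: Box) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)


def label_by_nearest_neighbour(boxes: List[Box]) -> List[Tuple[int, str]]:
    """For each component, return (index of nearest neighbour, direction label)."""
    cents = [centroid(b) for b in boxes]
    result = []
    for i, (cx, cy) in enumerate(cents):
        best_j, best_d = -1, math.inf
        for j, (ox, oy) in enumerate(cents):
            if j != i:
                d = math.hypot(ox - cx, oy - cy)
                if d < best_d:  # strict '<': ties keep the lower index
                    best_j, best_d = j, d
        if best_j < 0:
            result.append((-1, "none"))  # isolated component (single-component page)
            continue
        dx, dy = cents[best_j][0] - cx, cents[best_j][1] - cy
        if abs(dx) >= abs(dy):
            label = "right" if dx > 0 else "left"
        else:
            label = "below" if dy > 0 else "above"
        result.append((best_j, label))
    return result


def group_lines(boxes: List[Box], max_dist: float) -> List[Tuple[str, List[int]]]:
    """Hypothetical grouping: union each component with its nearest neighbour if close enough."""
    labels = label_by_nearest_neighbour(boxes)
    cents = [centroid(b) for b in boxes]
    parent = list(range(len(boxes)))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, (j, _) in enumerate(labels):
        if j >= 0 and math.dist(cents[i], cents[j]) <= max_dist:
            parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)

    result = []
    for members in groups.values():
        h = sum(labels[i][1] in ("left", "right") for i in members)
        v = sum(labels[i][1] in ("above", "below") for i in members)
        orientation = "h" if h > v else "v" if v > h else "?"
        result.append((orientation, members))
    return result
```

With three boxes side by side and three stacked boxes elsewhere on the page, `group_lines(boxes, max_dist=20)` returns one horizontal and one vertical group, without any prior assumption about line direction.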