12 March 2002 Interactive mining of schema for semistructured data
Yubao Liu, Yucai Feng
Abstract
Semistructured data such as HTML, SGML, and XML documents lack any fixed, rigid schema, but some implicit structure typically appears in the data. The central problem of schema mining is to discover the similar hidden structures in semistructured data. The huge number of on-line applications makes it important to mine schemas for such data. Because the minimum support describes the user's particular interests, the user may have to tune the minimum support of a schema dynamically in the course of mining; in this paper we therefore present the problem of interactive mining of schema for semistructured data. When the user changes the old minimum support during interactive mining, one way to discover the interesting schemas is to re-run the schema mining algorithm from scratch with the new minimum support. However, this approach is inefficient because it does not use the results already mined. Hence an incremental mining algorithm is presented. In addition, an improved algorithm for finding the maximal schema tree sets is given. The experimental results show that the incremental algorithm is more efficient than the non-incremental Apriori-like algorithm.
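The idea of reusing earlier results when the minimum support changes can be illustrated with a minimal sketch. It is not the algorithm of the paper, whose details the abstract does not give. As a simplifying assumption, each document is modelled as a flat set of label paths rather than a schema tree, and a generic Apriori-style miner caches the support counts it has computed. The class name `InteractiveMiner`, its methods, and the sample paths are all hypothetical.

```python
from itertools import combinations


class InteractiveMiner:
    """Apriori-style miner that caches support counts between runs (illustrative only)."""

    def __init__(self, documents):
        # Assumption: each document is a set of label paths such as "book/title".
        self.documents = [frozenset(d) for d in documents]
        self.counts = {}  # candidate pattern -> number of documents containing it
        self.scans = 0    # number of candidates counted against the data so far

    def _count(self, pattern):
        # Reuse a cached count when one exists; otherwise scan the documents once.
        if pattern not in self.counts:
            self.scans += 1
            self.counts[pattern] = sum(1 for d in self.documents if pattern <= d)
        return self.counts[pattern]

    def mine(self, min_support):
        """Return {pattern: support} for every pattern with support >= min_support."""
        n = len(self.documents)
        threshold = min_support * n
        items = sorted({p for d in self.documents for p in d})
        level = [frozenset([i]) for i in items
                 if self._count(frozenset([i])) >= threshold]
        frequent = {}
        while level:
            for pattern in level:
                frequent[pattern] = self.counts[pattern] / n
            current = set(level)
            candidates = set()
            for a, b in combinations(level, 2):
                union = a | b
                # Keep a candidate only if it grows by one item and
                # all of its one-smaller subsets are frequent.
                if len(union) == len(a) + 1 and all(
                    union - {x} in current for x in union
                ):
                    candidates.add(union)
            level = [c for c in candidates if self._count(c) >= threshold]
        return frequent


# Hypothetical sample data: four documents described by their label paths.
docs = [
    {"book/title", "book/author", "book/year"},
    {"book/title", "book/author"},
    {"book/title", "book/year"},
    {"book/title", "book/author", "book/price"},
]

miner = InteractiveMiner(docs)
first = miner.mine(0.75)          # initial run at a high minimum support
scans_after_first = miner.scans
second = miner.mine(0.5)          # user lowers the minimum support
scans_after_second = miner.scans  # only uncached candidates were counted
```

In this sketch, raising the minimum support never touches the data again, because every surviving pattern already has a cached count. Lowering it counts only the candidates that were not generated before, which is the kind of saving over re-running from scratch that the abstract describes.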
© (2002) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yubao Liu and Yucai Feng "Interactive mining of schema for semistructured data", Proc. SPIE 4730, Data Mining and Knowledge Discovery: Theory, Tools, and Technology IV, (12 March 2002); https://doi.org/10.1117/12.460250
CITATIONS
Cited by 2 patents.
KEYWORDS
Mining
Databases
Data modeling
Data mining
Internet
Calculus
Computer science