The DBSCAN algorithm can discover clusters of arbitrary shape, but appropriate clustering parameters are difficult to choose in advance. In this study, the data field is introduced into the number field space and a relative mass (RM) calculation method for the data field is proposed; the RM algorithm selects the N points with the largest mass in the dataset as the initial points of clustering. The optimized influence factor sigma is then used to calculate the radius of the force range, which optimizes the field radius parameter and thereby yields appropriate clustering parameters. In addition, the improved algorithm is implemented for parallel computation on a distributed cluster to improve efficiency on large datasets. Finally, the effectiveness of the improved algorithm is verified on three publicly available datasets, and the efficiency of parallel computation is verified on three large datasets. The results show that (1) the improved DBSCAN algorithm effectively addresses the difficulty of selecting clustering parameters, and (2) the maximum speedup ratio of parallel computation reaches 2.12 when the dataset size is increased from 30,000 to 150,000 and the number of computing nodes is increased from one to five, and the average operating efficiency of the improved algorithm is 32.45% higher than that of the original algorithm.
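As a rough illustration of the seed-selection and radius steps described above, the following sketch may help. It is not the RM method itself, whose formula is not given here. It assumes a Gaussian-like data-field potential with unit masses, and it assumes the force range is taken as 3σ/√2, which is the three-standard-deviation range of the kernel exp(-(d/σ)²). The function names `field_potential`, `top_n_seeds`, and `influence_radius` are hypothetical.

```python
import numpy as np


def field_potential(points, sigma, masses=None):
    """Potential at each point under an assumed Gaussian-like data field.

    phi(x_i) = sum_j m_j * exp(-(||x_i - x_j|| / sigma)^2)
    Unit masses are an assumption; the RM method would supply its own.
    """
    points = np.asarray(points, dtype=float)
    if masses is None:
        masses = np.ones(len(points))
    # Pairwise squared Euclidean distances via broadcasting.
    diff = points[:, None, :] - points[None, :, :]
    dist_sq = np.sum(diff ** 2, axis=-1)
    return (masses[None, :] * np.exp(-dist_sq / sigma ** 2)).sum(axis=1)


def top_n_seeds(points, sigma, n_seeds):
    """Indices of the n_seeds points with the highest potential."""
    potential = field_potential(points, sigma)
    order = np.argsort(-potential, kind="stable")
    return order[:n_seeds]


def influence_radius(sigma):
    """Assumed force range: exp(-(d/sigma)^2) is a Gaussian with
    standard deviation sigma/sqrt(2), so three deviations give 3*sigma/sqrt(2)."""
    return 3.0 * sigma / np.sqrt(2.0)


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
    print(top_n_seeds(data, sigma=1.0, n_seeds=2))
    print(influence_radius(1.0))
```

Under these assumptions, the returned radius could be passed as the neighborhood radius of a standard DBSCAN implementation, with the high-potential points serving as starting points for cluster expansion.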