Detecting nonsense for Chinese comments based on logistic regression

Ren Zhuolin; Chen Guang; Chen Shu

doi:10.1117/12.2242283

11 July 2016

Proceedings Volume 10011, First International Workshop on Pattern Recognition; 100111J (2016) https://doi.org/10.1117/12.2242283
Event: First International Workshop on Pattern Recognition, 2016, Tokyo, Japan

Abstract

To understand cyber citizens’ opinions accurately from Chinese news comments, a clear definition of nonsense is presented, and a detection model based on logistic regression (LR) is proposed. The detection of nonsense can be treated as a binary classification problem. Besides traditional lexical features, we propose three kinds of features covering emotion, structure and relevance. Using these features, we train an LR model and demonstrate its effect in understanding Chinese news comments. We find that each of the proposed features significantly improves the result. In our experiments, we achieve a prediction accuracy of 84.3%, which improves on the 77.3% baseline by 7 percentage points.
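The binary-classification setup described in the abstract can be illustrated with a minimal sketch. It assumes each comment has already been reduced to four numeric scores (lexical, emotion, structure, relevance); the feature values, the labels, and the helper names `train_lr` and `predict` are hypothetical and are not taken from the paper, which does not describe its feature extraction or training procedure here. The sketch fits a plain logistic regression by batch gradient descent using only the Python standard library.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_lr(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one comment: p(nonsense) minus the label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (nonsense) if the estimated probability is at least 0.5."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


# Made-up toy feature vectors: [lexical, emotion, structure, relevance].
# These values are illustrative assumptions, not data from the paper.
X = [
    [0.9, 0.8, 0.7, 0.9],
    [0.8, 0.6, 0.9, 0.8],
    [0.7, 0.7, 0.8, 0.7],
    [0.2, 0.1, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.2],
    [0.3, 0.2, 0.1, 0.3],
]
y = [0, 0, 0, 1, 1, 1]  # 1 = nonsense, 0 = meaningful (hypothetical labels)

w, b = train_lr(X, y)
predictions = [predict(w, b, x) for x in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(predictions, accuracy)
```

Accuracy is measured here on the same toy data used for training, so it only shows the mechanics; a held-out test set would be needed to obtain figures comparable to those reported in the abstract.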

Citation

Ren Zhuolin, Chen Guang, and Chen Shu "Detecting nonsense for Chinese comments based on logistic regression", Proc. SPIE 10011, First International Workshop on Pattern Recognition, 100111J (11 July 2016); https://doi.org/10.1117/12.2242283
