Salary prediction using random forest with fundamental features
7 March 2022
Jingyi Chen, Shuming Mao, Qixuan Yuan
Proceedings Volume 12167, Third International Conference on Electronics and Communication; Network and Computer Technology (ECNCT 2021); 1216720 (2022) https://doi.org/10.1117/12.2628520
Event: 2021 Third International Conference on Electronics and Communication, Network and Computer Technology, 2021, Harbin, China
Abstract
Salary prediction estimates a person's income range over a given period and is widely used in credit decisions, career choices, and HR strategy. The decision tree is a common method for salary prediction because it summarizes the patterns contained in the training data. However, high dimensionality and the high variance introduced by splitting the data remain challenges for decision trees in salary prediction. To address these challenges, in this paper we use Random Forest (RF) to predict salary and improve accuracy. First, we preprocess the dataset by handling missing values and categorical variables and by removing uninformative data. Then, we construct an RF that reduces the variance of a single decision tree by repeatedly grouping and splitting the data. To decrease the variance produced by multiple variables, we randomly choose subsets of the given variables to reduce the number of factors considered. In addition, repeating the tree construction many times in RF mitigates the conflicts caused by splitting the data at each tree level. To verify the effectiveness of the proposed method, we compare it with commonly used baselines, including decision tree, logistic regression, naïve Bayes, and k-nearest neighbors, on the Adult dataset. The experimental results demonstrate that the proposed method outperforms these benchmarks in predicting salary.
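The two RF mechanisms described above, bootstrap resampling of the rows and random subsets of the variables, can be sketched in a few dozen lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy rows (age, education_num, hours_per_week) and their label rule are hypothetical, the preprocessing step and the Adult dataset are not reproduced, and the tree depth, tree count, and subset size are arbitrary choices. Following Breiman's formulation, the sketch draws a fresh random feature subset at every split rather than once per tree.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return (score, feature, threshold) minimizing weighted Gini, searching only feature_ids."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, max_depth, n_sub, rng):
    """Grow a CART-style tree; every split considers a random subset of n_sub features."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority  # leaf: predict the majority label
    feature_ids = rng.sample(range(len(X[0])), n_sub)
    split = best_split(X, y, feature_ids)
    if split is None:
        return majority
    _, f, t = split
    go_left = [row[f] <= t for row in X]
    left = build_tree([r for r, g in zip(X, go_left) if g],
                      [lab for lab, g in zip(y, go_left) if g], max_depth - 1, n_sub, rng)
    right = build_tree([r for r, g in zip(X, go_left) if not g],
                       [lab for lab, g in zip(y, go_left) if not g], max_depth - 1, n_sub, rng)
    return (f, t, left, right)


def predict_tree(node, row):
    """Walk to a leaf; internal nodes are (feature, threshold, left, right) tuples."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=15, max_depth=4, n_sub=None, seed=0):
    """Bagging: each tree is grown on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n_sub = n_sub or max(1, int(len(X[0]) ** 0.5))  # common sqrt(#features) default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth, n_sub, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the individual trees' predictions."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Hypothetical toy data: [age, education_num, hours_per_week]; the label rule is invented.
data_rng = random.Random(42)
X = [[data_rng.randint(20, 60), data_rng.randint(6, 16), data_rng.randint(20, 60)]
     for _ in range(150)]
y = [">50K" if row[1] >= 13 else "<=50K" for row in X]

forest = fit_forest(X, y, n_sub=2)
accuracy = sum(predict_forest(forest, r) == lab for r, lab in zip(X, y)) / len(X)
print(f"training accuracy: {accuracy:.2f}")
print(predict_forest(forest, [35, 16, 45]), predict_forest(forest, [35, 8, 45]))
```

With `n_sub` equal to the total number of features, every split sees all variables and the procedure reduces to plain bagging of decision trees; a smaller `n_sub` decorrelates the trees, which is what lets the majority vote reduce variance further.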
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jingyi Chen, Shuming Mao, and Qixuan Yuan "Salary prediction using random forest with fundamental features", Proc. SPIE 12167, Third International Conference on Electronics and Communication; Network and Computer Technology (ECNCT 2021), 1216720 (7 March 2022); https://doi.org/10.1117/12.2628520
KEYWORDS
Data modeling
Performance modeling
Computer programming
Machine learning
Mathematical modeling
Binary data
Data analysis
