Volume 38 Issue 1
Feb.  2024
Turn off MathJax
Article Contents
LIU Wenjie, WANG Guoqiang. Self-training credit evaluation integrated classification model based on data editing[J]. Journal of Shanghai University of Engineering Science, 2024, 38(1): 83-89. doi: 10.12299/jsues.23-0054
Citation: LIU Wenjie, WANG Guoqiang. Self-training credit evaluation integrated classification model based on data editing[J]. Journal of Shanghai University of Engineering Science, 2024, 38(1): 83-89. doi: 10.12299/jsues.23-0054

Self-training credit evaluation integrated classification model based on data editing

doi: 10.12299/jsues.23-0054
  • Received Date: 2023-03-06
  • Publish Date: 2024-03-30
  • Aiming at the problems of unbalance of credit data and difficult acquisition of label data, a self-training credit evaluation integrated classification model based on data editing was proposed. Firstly, synthetic minority over-sampling technique (SMOTE) was used to sample labeled samples to alleviate data imbalance. Secondly, a Stacking integration model was constructed on a few labeled sample datasets and unlabeled samples were "falsified" to obtain label-like data. Finally, an improved semi-supervised double-weighted K-nearest neighbor algorithm was proposed, which was used to clip the pseudo-label data and expand the training set until the model converged. Simulation experiments of UCI and Kaggle credit evaluation dataset show that the model has better predictive performance and can identify a few types of samples more effectively.
  • loading
  • [1]
    高俊光, 刘旭, 朱辰辰. 小微企业信用评估的数据挖掘方法综述[J] . 金融理论与实践,2015(10):98 − 101. doi: 10.3969/j.issn.1003-4625.2015.10.019
    [2]
    周永圣, 崔佳丽, 周琳云, 等. 基于改进的随机森林模型的个人信用风险评估研究[J] . 征信,2020,38(1):28 − 32. doi: 10.3969/j.issn.1674-747X.2020.01.006
    [3]
    张田华, 罗康洋. 基于集成学习的上市公司高送转预测实证研究[J] . 计算机工程与应用,2022,58(10):255 − 262. doi: 10.3778/j.issn.1002-8331.2011-0224
    [4]
    罗康洋, 王国强. 基于改进的MRMR算法和代价敏感分类的财务预警研究[J] . 统计与信息论坛,2020,35(3):77 − 85. doi: 10.3969/j.issn.1007-3116.2020.03.011
    [5]
    张涛, 汪御寒, 李凯, 等. 基于样本依赖代价矩阵的小微企业信用评估方法[J] . 同济大学学报(自然科学版),2020,48(1):149 − 158. doi: 10.11908/j.issn.0253-374x.19017
    [6]
    YAROWSKY D. Unsupervised word sense disambiguation rivaling supervised methods[C]// Proceedings of the 33rd annual meeting on Association for Computational Linguistics. Cambridge: Association for Computational Linguistics, 1995: 189−196.
    [7]
    ZHOU Z H. When semi-supervised learning meets ensemble learning[C]// Proceedings of the 8th International Workshop on Multiple Classifier Systems. Reykjavik: Springer, 2009: 529−538.
    [8]
    HADY M F A, SCHWENKER F. Co-training by committee: A generalized framework for semi-supervised learning with committees[J] . International Journal of Software and Informatics,2008,2(2):95 − 124.
    [9]
    黎春, 周振宇. 信用评分模型中拒绝推断问题研究: 基于半监督协同训练法的改进[J] . 统计研究,2019,36(9):82 − 92.
    [10]
    WANG G. D-self-smote: New method for customer credit risk prediction based on self-training and smote[J] . Icic Express Letters Part B Applications An International Journal of Research & Surveys,2018,9(3):241 − 246.
    [11]
    肖进, 李思涵, 贺小舟, 等. 代价敏感的客户流失预测半监督集成模型研究[J] . 系统工程理论与实践,2021,41(1):188 − 199. doi: 10.12011/SETP2019-2879
    [12]
    张天翼, 丁立新. 一种基于SMOTE的不平衡数据集重采样方法[J] . 计算机应用与软件,2021,38(9):273 − 279. doi: 10.3969/j.issn.1000-386x.2021.09.043
    [13]
    CHAWLA N V, BOWYER K W, HALL L O, et al. SMOTE: Synthetic minority over-sampling technique[J] . Journal of Artificial Intelligence Research,2002,16:321 − 357. doi: 10.1613/jair.953
    [14]
    岳鹏, 侯凌燕, 杨大利, 等. 基于XGBoost特征选择的疾病诊断XLC-Stacking方法[J] . 计算机工程与应用,2020,56(17):136 − 141.
    [15]
    WOLPERT D H. Stacked generalization[J] . Neural Networks,1992,5(2):241 − 259. doi: 10.1016/S0893-6080(05)80023-1
    [16]
    陆万荣, 许江淳, 李玉惠. 面向Stacking集成的改进分类算法及其应用[J] . 计算机应用与软件,2022,39(2):281 − 286. doi: 10.3969/j.issn.1000-386x.2022.02.045
    [17]
    韩嵩, 韩秋弘. 半监督学习研究的述评[J] . 计算机工程与应用,2020,56(6):19 − 27. doi: 10.3778/j.issn.1002-8331.1911-0083
    [18]
    龚旭. 半监督协同训练算法中样本去噪的研究[D]. 重庆: 重庆师范大学, 2021.
    [19]
    潘用科, 贺紫平, 夏克文, 等. 改进的协同训练半监督SVM在油层识别中的应用[J] . 郑州大学学报(工学版),2022,43(1):14 − 19. doi: 10.13705/j.issn.1671-6833.2022.01.001
    [20]
    陈日新, 朱明旱. 半监督k近邻分类方法[J] . 中国图象图形学报,2013,18(2):195 − 200. doi: 10.11834/jig.20130210
    [21]
    陈振洲, 李磊, 姚正安. 基于SVM的特征加权KNN算法[J] . 中山大学学报(自然科学版),2005(1):17 − 20. doi: 10.3321/j.issn:0529-6579.2005.01.005
    [22]
    UCI Machine Learning Repository. Taiwanese Bankruptcy Prediction[EB/OL]. (2020-06-27)[2022-12-23]. https://archive.ics.uci.edu/ml/datasets/Taiwanese+Bankruptcy+Prediction.html.
    [23]
    Kaggle. Financial distress Preduction[EB/OL]. (2017-12-15)[2022-12-23]. https://www.kaggle.com/datasets/shebrahimi/financial-distress.html.
  • 加载中

Catalog

    通讯作者: 陈斌, bchen63@163.com
    • 1. 

      沈阳化工大学材料科学与工程学院 沈阳 110142

    1. 本站搜索
    2. 百度学术搜索
    3. 万方数据库搜索
    4. CNKI搜索

    Figures(1)  / Tables(5)

    Article Metrics

    Article views (147) PDF downloads(38) Cited by()
    Proportional views
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return