Subject. The research addresses the discretization of default factors for loan repayment claims, as an aspect of statistical modeling in banking.
Objectives. The research identifies several valid discretization algorithms for credit scoring and chooses the most appropriate one. It also demonstrates that discretization is necessary for building a predictive model when logistic regression is applied.
Methods. For purposes of the research, I conducted statistical analysis and content analysis of sources.
Results. The statistical analysis showed that the proposed algorithm (TreeR) proved to be the most appropriate among the algorithms that comply with Basel II requirements and existing criteria. TreeR splits a continuous variable by growing decision trees for a binary dependent variable. The algorithm is a new solution to the discretization of continuous variables. What distinguishes TreeR is that it is built on open-access software and relies on publicly available libraries.
Conclusions and Relevance. The findings can be used for credit scoring as well as for statistical modeling based on logistic regression.
Keywords: credit scoring, logistic regression, discretization, data preprocessing, continuous variable
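The idea of splitting a continuous variable by growing a decision tree on a binary dependent variable can be illustrated with a minimal sketch. This is not the TreeR implementation, whose details the abstract does not give. It is a generic, standard-library-only illustration that assumes Gini impurity as the split criterion, and the function names (`tree_cut_points`, `assign_bin`), the parameters (`max_depth`, `min_leaf`) and the sample data are all hypothetical.

```python
def gini(pos, n):
    """Gini impurity of a node with n observations, pos of which are defaults."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def best_split(pairs, min_leaf):
    """Find the threshold that most reduces weighted Gini impurity.

    pairs is a list of (value, label) tuples sorted by value.
    Returns (threshold, score), or None if no split improves on the parent.
    """
    n = len(pairs)
    total_pos = sum(label for _, label in pairs)
    parent = gini(total_pos, n)
    best = None
    left_pos = 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal values
        if i < min_leaf or n - i < min_leaf:
            continue  # keep every interval reasonably populated
        score = (i * gini(left_pos, i)
                 + (n - i) * gini(total_pos - left_pos, n - i)) / n
        if score < parent and (best is None or score < best[1]):
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2.0, score)
    return best


def tree_cut_points(values, labels, max_depth=2, min_leaf=2):
    """Grow a small tree on one variable and return its sorted thresholds."""
    cuts = []

    def grow(node, depth):
        if depth == max_depth:
            return
        found = best_split(node, min_leaf)
        if found is None:
            return
        threshold = found[0]
        cuts.append(threshold)
        grow([p for p in node if p[0] <= threshold], depth + 1)
        grow([p for p in node if p[0] > threshold], depth + 1)

    grow(sorted(zip(values, labels)), 0)
    return sorted(cuts)


def assign_bin(value, cuts):
    """Index of the interval (0-based) that value falls into."""
    return sum(value > c for c in cuts)


# Hypothetical factor values and default flags (1 = default).
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
labels = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
cuts = tree_cut_points(values, labels)
print(cuts)                  # [3.5, 7.5]
print(assign_bin(5, cuts))   # 1
```

The resulting interval indices could then replace the raw variable as a categorical input to a logistic regression. The depth and minimum leaf size cap the number of intervals and keep each one populated. Both settings are illustrative choices, not values taken from the research.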