EVALUATION OF BLITZ CHESS OUTCOME CLASSIFICATION ON A LARGE-SCALE IMBALANCED DATASET USING COST-SENSITIVE LEARNING
DOI: https://doi.org/10.30656/xv0d9m54
Abstract
The rapid growth of online chess platforms has generated large-scale structured game data that can be used for data-driven analysis. In blitz games, match outcomes are categorized as win, loss, or draw; however, the distribution of these outcomes is inherently imbalanced, with draws representing a small minority of the dataset. This study evaluates the effectiveness of cost-sensitive learning through balanced class weighting in improving classification performance on an imbalanced large-scale blitz chess dataset. A total of 100,000 rated blitz games were extracted from the Lichess open database and processed through preprocessing, feature extraction, and stratified data splitting. Three supervised learning algorithms (Support Vector Machine (SVM), Decision Tree, and Random Forest) were implemented. Model performance was evaluated with 5-fold stratified cross-validation, using Macro F1-score as the primary metric alongside accuracy. The results indicate that without cost-sensitive learning, recall for the minority class (draw) approaches zero even though overall accuracy is higher (0.54). In contrast, applying balanced class weighting substantially improves minority class detection, raising recall for draw to as much as 0.73 with a Macro F1-score of approximately 0.40, although overall accuracy decreases to 0.45. This demonstrates the trade-off between global performance and minority class sensitivity. Feature importance analysis further reveals that move count is the most influential predictor of match outcomes. These findings confirm that imbalance-aware learning plays a critical role in large-scale chess outcome classification and highlight the importance of choosing appropriate evaluation metrics for imbalanced datasets.
Keywords: Chess outcome classification, Imbalanced classification, Cost-sensitive learning, Support Vector Machine, Macro F1-score, Large-scale chess dataset
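The two ideas the abstract relies on, balanced class weighting and Macro F1-score, can be illustrated with a minimal pure-Python sketch. It assumes the common "balanced" heuristic, in which each class weight is n_samples / (n_classes * class_count); the abstract does not state which library or exact formula the study used. The label counts (48 win, 47 loss, 5 draw) and the always-predict-win baseline are invented for illustration and are not the study's data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: this is the usual "balanced" heuristic, not necessarily
    the exact weighting used in the study.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never correct gets F1 = 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, invented label distribution (not the Lichess data).
y_true = ["win"] * 48 + ["loss"] * 47 + ["draw"] * 5

# A degenerate baseline that ignores the minority class entirely.
y_pred = ["win"] * len(y_true)

weights = balanced_class_weights(y_true)
acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)

print("class weights:", weights)
print("accuracy:", acc)
print("macro F1:", mf1)
```

In this toy setting the rare draw class receives a weight roughly ten times larger than the other two classes, and the baseline that never predicts a draw still reaches moderate accuracy while its Macro F1-score stays low. That gap is the reason the abstract treats Macro F1-score, rather than accuracy, as the primary metric.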
License
Copyright (c) 2026 Khairuddin

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Author(s)' Warranties
The author warrants that the article is original, written by stated author(s), has not been published before, contains no unlawful statements, does not infringe the rights of others, is subject to copyright that is vested exclusively in the author and free of any third party rights, and that any necessary written permissions to quote from other sources have been obtained by the author(s).
- Information
- Notice about change in the copyright policy of the journal 'Jurnal Sistem Informasi (JSiI)' : "From Vol 1, onwards the copyright of the article published in the journal 'Jurnal Sistem Informasi' will be retained by the author"


















