A COMPARISON OF SMOTE-BASED MACHINE LEARNING MODELS FOR IMPROVING THE IDENTIFICATION OF AT-RISK STUDENTS IN JUNIOR HIGH SCHOOLS
DOI: https://doi.org/10.30656/jsii.v12i1.10382
Abstract
This study evaluates how effectively the Synthetic Minority Over-sampling Technique (SMOTE) addresses class imbalance in educational datasets. It focuses on improving predictive models that identify at-risk students in Indonesian junior high schools. In practice, not every dataset collected in the field is class-imbalanced. Decision Tree, Random Forest, and SVM models were assessed with ROC curves and the AUC metric before and after applying SMOTE. Random Forest showed the largest AUC gain (0.95→0.99), attributed to the ensemble generalizing better on balanced data. Decision Tree improved only marginally (0.94→0.95), and SVM gained slightly (0.93→0.94), a limited improvement attributed to its sensitivity to synthetic noise. All models far outperformed random guessing (AUC = 0.5), with every AUC at or above 0.93, confirming that SMOTE improves minority-class detection in applications such as identifying at-risk students. The study advances a practical framework for applying class-imbalance learning techniques to educational datasets, emphasizing actionable insights and better learning outcomes through data-driven decision-making. By applying techniques such as SMOTE, schools can identify at-risk students more accurately, enabling effective early intervention and more efficient resource allocation. The study also supports the development of inclusive and equitable education policies that reduce systemic bias and improve overall learning outcomes.
Keywords: Data Mining, Decision Tree C4.5, academic performance, educational classification, determinant factors
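The abstract credits SMOTE with rebalancing the at-risk (minority) class before training. As a minimal sketch, assuming purely numeric features, the core SMOTE rule can be written from scratch: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbors. The abstract does not state which implementation or parameters the authors used; the function names `smote` and `balance_with_smote`, the default `k=5`, and the toy data are illustrative assumptions, not the study's code.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Generate n_synthetic points with the SMOTE interpolation rule.

    Each new point lies between a randomly chosen minority sample x and
    one of its k nearest minority neighbors nn:
        x_new = x + lam * (nn - x),  lam ~ Uniform[0, 1)
    Assumes numeric feature tuples (hypothetical helper, not the paper's code).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbors than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbors of x (excluding x itself), Euclidean distance
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbors)
        lam = rng.random()
        synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, nn)))
    return synthetic


def balance_with_smote(X, y, minority_label, k=5, seed=None):
    """Binary case: oversample the minority class up to the majority count."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_points = smote(minority, n_new, k=k, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * n_new


if __name__ == "__main__":
    # Toy data: label 1 = "at risk" (3 samples), label 0 = "not at risk" (5 samples)
    X = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
         (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1), (4.8, 4.9)]
    y = [1, 1, 1, 0, 0, 0, 0, 0]
    X_bal, y_bal = balance_with_smote(X, y, minority_label=1, k=2, seed=0)
    print(y_bal.count(0), y_bal.count(1))  # → 5 5
```

Because the synthetic points are derived from minority samples, oversampling is normally applied to the training split only, so the test set used for ROC/AUC evaluation contains real records exclusively. In practice a maintained library (for example, the `SMOTE` class in imbalanced-learn) would usually replace this sketch.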
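The abstract compares the models by ROC AUC, with random guessing as the baseline. A minimal sketch of how AUC can be computed from a model's predicted scores uses the rank (Mann-Whitney) formulation: AUC is the probability that a randomly chosen positive (at-risk) student receives a higher score than a randomly chosen negative one, with ties counted as half. This is equivalent to the area under the ROC curve. The function name `roc_auc` and the toy scores are illustrative assumptions; the study itself likely relied on a library routine such as scikit-learn's `roc_auc_score`.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    labels: 1 = positive (e.g. at risk), 0 = negative; ties count as 0.5.
    Pairwise O(P * N) version, adequate for small evaluation sets.
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    print(roc_auc(labels, [0.9, 0.8, 0.3, 0.1]))  # → 1.0 (perfect ranking)
    print(roc_auc(labels, [0.5, 0.5, 0.5, 0.5]))  # → 0.5 (uninformative, like random guessing)
    print(roc_auc(labels, [0.9, 0.2, 0.3, 0.1]))  # → 0.75 (one positive ranked below a negative)
```

The pairwise loop makes the "random guess = 0.5" baseline easy to see; for large test sets a sort-based computation in O(n log n) gives the same value.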
License
Copyright (c) 2025 Hevi Alvina Damayanti, Ucta Pradema Sanjaya

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Author(s)' Warranties
The author warrants that the article is original, written by stated author(s), has not been published before, contains no unlawful statements, does not infringe the rights of others, is subject to copyright that is vested exclusively in the author and free of any third party rights, and that any necessary written permissions to quote from other sources have been obtained by the author(s).
Notice about a change in the copyright policy of the journal 'Jurnal Sistem Informasi (JSiI)': "From Vol 1 onwards, the copyright of the article published in the journal 'Jurnal Sistem Informasi' will be retained by the author"