Investigating the Performance of the Combined Dagging Method with the Hoeffding Tree Base Algorithm in the Qualitative Classification of Drinking Water

Document Type: Research Paper

Authors

1 Associate Professor, Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran.

2 M.Sc. Student, Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz, Iran.

Abstract

Effective quality management of drinking water requires estimating the level of water pollution. In this research, the drinking water quality index was calculated from the chemical parameters Total Hardness (TH), Alkalinity, Electrical Conductivity (EC), Total Dissolved Solids (TDS), Calcium, Sodium, Magnesium, Potassium, Chloride, Carbonate, Bicarbonate, and Sulfate, measured at the Bagh Kelayeh hydrometric station in Qazvin province over a 23-year period (1998-2020). Based on the calculated index values and existing standards, water quality was classified into two classes, good and excellent. To predict the quality class of drinking water from the chemical parameters, different combinations of parameters were considered as several scenarios; correlation and Relief algorithms were used to select these scenarios. The Hoeffding tree was used as the base model for classifying water quality under each combination of parameters, and the combined (ensemble) Dagging approach was evaluated for its ability to improve the results. The results showed that Dagging improves the water quality classification. Scenario 6, Dagging with the Hoeffding tree base algorithm using the HCO3, Ca, SO4, TDS, EC, and TH parameters, achieved Kappa = 1, meaning that all test samples were classified correctly, and was identified as the best method.
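The two ideas the abstract relies on, Dagging and the Kappa statistic, can be illustrated with a minimal sketch. It is not the authors' implementation. Dagging is shown as training one base learner on each of several disjoint, stratified folds and combining them by majority vote, and Cohen's kappa is computed as (p_o - p_e) / (1 - p_e). The base learner here is a simple nearest-centroid classifier standing in for the Hoeffding tree. The data are made-up two-feature points, not measurements from the Bagh Kelayeh station, and all function and class names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Assumption: only one class occurs in both lists, so agreement is
        # perfect; kappa is formally undefined and 1.0 is returned by choice.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


class NearestCentroid:
    """Stand-in base learner (NOT a Hoeffding tree): predicts the closest class mean."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, row):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(row, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


def stratified_disjoint_folds(y, n_folds, seed=0):
    """Split sample indices into disjoint folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for k, i in enumerate(indices):
            folds[k % n_folds].append(i)
    return folds


class Dagging:
    """One base model per disjoint fold; predictions combined by majority vote."""

    def __init__(self, make_base, n_folds=3, seed=0):
        self.make_base, self.n_folds, self.seed = make_base, n_folds, seed

    def fit(self, X, y):
        self.models = []
        for fold in stratified_disjoint_folds(y, self.n_folds, self.seed):
            model = self.make_base().fit([X[i] for i in fold], [y[i] for i in fold])
            self.models.append(model)
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            votes = Counter(m.predict_one(row) for m in self.models)
            predictions.append(votes.most_common(1)[0][0])
        return predictions


# Synthetic, clearly separable toy data (illustrative values only).
X_train = [[float(i % 3), float(i % 2)] for i in range(30)] + \
          [[10.0 + i % 3, 10.0 + i % 2] for i in range(30)]
y_train = ["excellent"] * 30 + ["good"] * 30
X_test = [[1.0, 0.5], [0.0, 1.0], [11.0, 10.5], [12.0, 10.0]]
y_test = ["excellent", "excellent", "good", "good"]

y_pred = Dagging(NearestCentroid, n_folds=3).fit(X_train, y_train).predict(X_test)
print(cohen_kappa(y_test, y_pred))
```

Because each base model sees only its own fold, the folds must not overlap. This is what distinguishes Dagging from Bagging, which draws bootstrap samples with replacement. A Kappa of 1 corresponds to every test sample receiving its true class.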
