Scientific Journal

Applied Aspects of Information Technology

FRAMEWORK FOR SYSTEMATIZATION OF DATA SCIENCE METHODS
Abstract:
The rapid development of data science has led to the accumulation of many models, methods, and techniques that have been successfully applied. An analysis of publications shows that systematizing data science methods and techniques is an urgent task; however, in most cases the existing results are relevant only to applications in a particular problem domain. This paper develops a framework for the systematization of data science methods that is neither domain-oriented nor task-oriented. A metamodel-method-technique hierarchy organizes the relationships between existing methods and techniques and makes them easier to understand. The first level of the hierarchy consists of the metamodels of data preprocessing, data modeling, and data visualization. The second level comprises the methods corresponding to each metamodel. The third level collects the main techniques, grouped by method. The authors describe guiding principles for using the framework. The framework makes it possible to define a typical process of problem-solving with data science methods. A case study is used to verify the framework’s appropriateness: four cases of applying data science methods to practical problems, as described in publications, are examined. The described solutions are shown to be fully consistent with the proposed framework. Recommended directions for applying the framework are defined. The framework is limited to the analysis of structured or semi-structured data. Finally, directions for further research are outlined.
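The three-level hierarchy described in the abstract can be pictured as a small nested data structure. The sketch below is only an illustration under stated assumptions: the three metamodel names come from the abstract, while the method and technique entries, the class names, and the `find_path` helper are hypothetical placeholders and not the paper's actual catalogue.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Method:
    """Level 2: a method with the techniques (level 3) grouped under it."""
    name: str
    techniques: List[str] = field(default_factory=list)


@dataclass
class Metamodel:
    """Level 1: a metamodel with the methods that correspond to it."""
    name: str
    methods: List[Method] = field(default_factory=list)


# Metamodel names are taken from the abstract. The methods and techniques
# are illustrative assumptions, NOT the contents of the published framework.
FRAMEWORK = [
    Metamodel("data preprocessing",
              [Method("data cleaning", ["missing-value imputation"])]),
    Metamodel("data modeling",
              [Method("clustering", ["k-means"])]),
    Metamodel("data visualization",
              [Method("distribution plots", ["histogram"])]),
]


def find_path(technique: str) -> Optional[Tuple[str, str, str]]:
    """Return (metamodel, method, technique) for a technique, or None."""
    for metamodel in FRAMEWORK:
        for method in metamodel.methods:
            if technique in method.techniques:
                return (metamodel.name, method.name, technique)
    return None
```

Looking up a technique with `find_path("k-means")` walks the hierarchy top-down and reports which method and metamodel it belongs to, which mirrors how the hierarchy is meant to relate techniques to methods and metamodels.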
DOI
10.15276/aait.01.2021.7
[ © Odessa National Polytechnic University, 2018.]