Abstract

    Open Access Research Article Article ID: ARA-4-102

    Enhancing Imbalanced Dataset by Utilizing (K-NN Based SMOTE_3D Algorithm)

    Khaldoon Alshouiliy*, Sujan Ray, Ali AlGhamdi and Dharma P Agrawal

    Big data is a large industry that continues to grow significantly every year. Machine learning and deep learning algorithms are used to study, analyze, and parse big data and then derive useful results from it. However, most real datasets are collected through different organizations and social media and mainly fall under the category of Big Data applications. One of the biggest drawbacks of such datasets is the imbalanced representation of samples from different categories. In such cases, classifiers and deep learning techniques are not capable of handling the imbalance, and a majority of existing works tend to overlook the issue. Typical data balancing methods in the literature resort to data resampling, either undersampling the majority class or oversampling the minority class. In this work, we focus on the minority samples and ignore the majority ones. Many researchers have addressed this problem, but most existing approaches suffer from oversampling or from the noise generated in the dataset. Additionally, existing methods are suited to either big data or small data, but not both. Moreover, some approaches suffer from long processing times because they use complicated algorithms with many steps to fix the imbalance problem. Therefore, we introduce a new algorithm that deals with all these issues. We present a short example that briefly explains how SMOTE works and why it needs to be enhanced, using a well-known imbalanced dataset downloaded from the Kaggle website. We collect the results using the Azure machine learning platform. We then compare the results, which show that the model performs well with SMOTE and considerably better than without it.
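
    The oversampling idea mentioned above can be illustrated with a minimal sketch of the basic, widely known SMOTE procedure: pick a minority sample, find its k nearest minority neighbours, and create a synthetic point on the line segment between the sample and one of those neighbours. This is not the authors' K-NN based SMOTE_3D algorithm, whose details are not given in this abstract; the function name `smote_oversample` and its parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from minority samples (basic SMOTE sketch).

    Hypothetical helper for illustration; not the SMOTE_3D method of the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: the new point lies on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote_oversample(minority, n_new=3, k=2):
        print(point)
```

    Because every synthetic point is an interpolation between two existing minority samples, the new points stay inside the region the minority class already occupies; this is also why noisy or outlying minority samples can propagate noise into the generated data.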

    Published on: Apr 25, 2020 Pages: 1-6

    DOI: 10.17352/ara.000002