TY - GEN
T1 - Combining uniform manifold approximation with localized affine shadowsampling improves classification of imbalanced datasets
AU - Bej, Saptarshi
AU - Srivastava, Prashant
AU - Wolfien, Markus
AU - Wolkenhauer, Olaf
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/7/18
Y1 - 2021/7/18
N2 - Oversampling approaches are a popular choice to improve classification on imbalanced datasets. The SMOTE algorithm pioneered this field, and many algorithms have been built as extensions of SMOTE to address its problem of over-generalization of the minority class. Some extensions learn the minority class data distribution through clustering and manifold learning techniques. The Localised Random Affine Shadowsampling (LoRAS) algorithm models the convex space of the minority class, controlling the local variance of each synthetic sample by constructing it as a convex combination of multiple shadow samples, which are generated by adding Gaussian noise to the original minority samples. LoRAS also uses t-SNE as a manifold learning step to identify minority class data neighbourhoods. The algorithm is known to outperform some early SMOTE extensions, improving F1-Score and Balanced accuracy on highly imbalanced classification problems. However, the state-of-the-art manifold learning algorithm UMAP is known to preserve the local and global structure of the latent data manifold better than t-SNE, and is considerably faster. We have integrated UMAP for manifold learning with localized affine shadowsampling to build the LoRAS-UMAP algorithm. We have benchmarked LoRAS-UMAP against several state-of-the-art oversampling algorithms on 14 publicly available datasets characterized by high imbalance, high dimensionality, and high absolute imbalance. In summary, incorporating UMAP in the manifold learning step yields better F1-Score, Balanced accuracy, and runtime for the LoRAS algorithm than t-SNE, particularly for high-dimensional datasets.
UR - https://www.scopus.com/pages/publications/85116416109
U2 - 10.1109/IJCNN52387.2021.9534072
DO - 10.1109/IJCNN52387.2021.9534072
M3 - Conference contribution
AN - SCOPUS:85116416109
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - IJCNN 2021 - International Joint Conference on Neural Networks, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 International Joint Conference on Neural Networks, IJCNN 2021
Y2 - 18 July 2021 through 22 July 2021
ER -
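
A minimal sketch of the approach described in the abstract above follows. This is not the authors' published LoRAS-UMAP implementation: the function loras_umap_sketch and all of its parameter names and default values are illustrative assumptions. It uses numpy, scikit-learn, and the umap-learn package to show the core pipeline: UMAP embedding to find minority neighbourhoods, Gaussian-noise shadow samples, and convex combinations of shadow samples.

import numpy as np
import umap
from sklearn.neighbors import NearestNeighbors


def loras_umap_sketch(X_min, n_synthetic=100, k=5, n_shadow=40,
                      sigma=0.005, n_affine=3, random_state=42):
    """Oversample the minority class X_min (n_samples x n_features).

    Illustrative sketch only; parameter names and defaults are assumptions.
    Steps, following the abstract:
    1. Embed minority samples with UMAP to identify data neighbourhoods.
    2. For each synthetic point, pick a neighbourhood in embedding space.
    3. Generate shadow samples by adding Gaussian noise to the original
       (full-dimensional) neighbourhood members.
    4. Return a random convex combination of several shadow samples.
    """
    rng = np.random.default_rng(random_state)

    # 1. Manifold learning step: UMAP embedding of the minority class.
    embedding = umap.UMAP(n_neighbors=min(k + 1, len(X_min) - 1),
                          random_state=random_state).fit_transform(X_min)

    # 2. Neighbourhoods are computed in the low-dimensional embedding.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embedding)
    _, neigh_idx = nn.kneighbors(embedding)  # row i: i plus its k neighbours

    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for s in range(n_synthetic):
        # Pick a random minority sample; take its embedding neighbourhood,
        # but work with the original feature-space points.
        i = rng.integers(len(X_min))
        hood = X_min[neigh_idx[i]]

        # 3. Shadow samples: Gaussian noise around neighbourhood members.
        base = hood[rng.integers(len(hood), size=n_shadow)]
        shadows = base + rng.normal(scale=sigma, size=base.shape)

        # 4. Convex combination of n_affine shadow samples; Dirichlet
        # weights are non-negative and sum to one by construction.
        picks = shadows[rng.choice(n_shadow, size=n_affine, replace=False)]
        weights = rng.dirichlet(np.ones(n_affine))
        synthetic[s] = weights @ picks
    return synthetic


if __name__ == "__main__":
    # Toy example: 20 minority points in 10 dimensions.
    X_min = np.random.default_rng(0).normal(size=(20, 10))
    X_new = loras_umap_sketch(X_min, n_synthetic=50)
    print(X_new.shape)  # (50, 10)

Because the convex weights are non-negative and sum to one, each synthetic point lies inside the convex hull of its shadow samples, which is how this family of methods limits the over-generalization seen with SMOTE-style linear interpolation.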