Rethinking Semi-Supervised Imbalanced Node Classification from Bias-Variance Decomposition

ReVar: Rethinking Semi-Supervised Imbalanced Node Classification from Bias-Variance Decomposition

NeurIPS 2023

Fudan University

† Corresponding authors

Abstract

This paper introduces a new approach to class imbalance in graph neural networks (GNNs) for learning on graph-structured data. Our approach integrates imbalanced node classification with Bias-Variance Decomposition, establishing a theoretical framework that closely relates data imbalance to model variance. We also leverage graph augmentation techniques to estimate the variance, and design a regularization term to alleviate the impact of imbalance. Extensive experiments are conducted on multiple benchmarks, including naturally imbalanced datasets and public-split class-imbalanced datasets, demonstrating that our approach outperforms state-of-the-art methods in various imbalanced scenarios. This work provides a novel theoretical perspective on imbalanced node classification in GNNs.
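For orientation, the classical bias-variance decomposition for squared loss is reproduced below. It is a textbook identity, not the paper's own derivation, which may use a different loss and notation. The symbols are assumptions introduced here: $f$ is the true function, $\hat{f}_D$ the model trained on a random training set $D$, and $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat{f}_D(x)\bigr)^2\right]
  = \underbrace{\bigl(f(x) - \mathbb{E}_D[\hat{f}_D(x)]\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\bigl(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

The variance term measures how much predictions change across training sets, which is the quantity the abstract relates to data imbalance.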

Framework

Overall pipeline of ReVar. (a) Two views of the graph are obtained by applying graph augmentation transforms and are then fed into the GNN encoder. (b) Intra-class and inter-class representations are aggregated: for each labeled node, the positive samples are the nodes of the same class in both its own view and the other view. (c) Variance estimation: for each node, a label probability distribution is computed in each view from the node's similarity to each class center. The difference between the two distributions approximates the model's variance and is optimized as one term of the loss function.
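Step (c) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The following are assumptions made here for illustration: class centers are means of labeled-node embeddings, similarity is cosine similarity scaled by a hypothetical `temperature`, and the "difference" between the two distributions is a symmetric KL divergence. All function names are hypothetical, and every class is assumed to have at least one labeled node.

```python
import numpy as np

EPS = 1e-12  # numerical guard for norms and logarithms


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def class_centers(z, labels, labeled_idx, num_classes):
    """Mean embedding of the labeled nodes of each class (assumed definition)."""
    centers = np.zeros((num_classes, z.shape[1]))
    for c in range(num_classes):
        members = labeled_idx[labels[labeled_idx] == c]
        centers[c] = z[members].mean(axis=0)
    return centers


def center_probs(z, centers, temperature=0.5):
    """Per-node label distribution from cosine similarity to each class center."""
    zn = z / (np.linalg.norm(z, axis=1, keepdims=True) + EPS)
    cn = centers / (np.linalg.norm(centers, axis=1, keepdims=True) + EPS)
    return softmax(zn @ cn.T / temperature, axis=1)


def variance_proxy(z1, z2, labels, labeled_idx, num_classes, temperature=0.5):
    """Mean symmetric KL divergence between the two views' label distributions.

    z1 and z2 are node embeddings of the same nodes under two augmented views.
    """
    p1 = center_probs(z1, class_centers(z1, labels, labeled_idx, num_classes), temperature)
    p2 = center_probs(z2, class_centers(z2, labels, labeled_idx, num_classes), temperature)
    # 0.5 * (KL(p1 || p2) + KL(p2 || p1)) simplifies to the expression below.
    sym_kl = 0.5 * np.sum((p1 - p2) * (np.log(p1 + EPS) - np.log(p2 + EPS)), axis=1)
    return float(sym_kl.mean())
```

In a training loop, a term like `variance_proxy` would be added to the supervised loss with a weighting coefficient. It is zero when both views yield identical distributions and grows as the views disagree.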

Experiment

Comparison of ReVar with other baselines on three benchmark datasets. Results are reported as averaged balanced accuracy (bAcc., %) and F1-score (%) with standard errors over 5 repetitions on three GNN architectures. The best and second-best results are highlighted. ∆ is the margin by which ReVar leads the previous state-of-the-art method.

Comparison of ReVar with other baselines on CS-Random. The best and second-best results are highlighted. ∆ is the margin by which ReVar leads the previous state-of-the-art method.
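The metrics named in the captions above can be computed as in the following sketch. Balanced accuracy is the mean of per-class recall. The macro averaging used for the F1-score here is an assumption, since the captions only say "F1-score". The function names are hypothetical and the sketch is not the authors' evaluation code.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred, num_classes):
    """Mean per-class recall, skipping classes absent from y_true."""
    recalls = []
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            recalls.append((y_pred[mask] == c).mean())
    return float(np.mean(recalls))


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


def mean_and_stderr(values):
    """Mean and standard error of the mean over repeated runs."""
    values = np.asarray(values, dtype=float)
    return float(values.mean()), float(values.std(ddof=1) / np.sqrt(len(values)))
```

With five repetitions, `mean_and_stderr` would be applied to the five per-run scores of each metric. Unlike plain accuracy, both metrics weight every class equally, so a model that ignores minority classes scores poorly.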

BibTeX

@article{yan2023rethinking,
  title={Rethinking semi-supervised imbalanced node classification from bias-variance decomposition},
  author={Yan, Divin and Wei, Gengchen and Yang, Chen and Zhang, Shengzhong and others},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  pages={29174--29200},
  year={2023}
}