The rapid development of deep learning has brought substantial benefits to modern society. However, training models requires large-scale datasets and significant computational resources, which often leads to third parties becoming involved in the training process. A backdoor attack injects malicious data into the training dataset so that, at test time, the model produces attacker-chosen outputs. Since machine learning models depend heavily on the quality of their training data, effective defenses against backdoor attacks are essential. A representative line of work analyzes activations to identify and remove poisoned data from the training dataset; notable examples include Activation Clustering (AC) and Spectral Signature (SS). However, because AC employs the K-means algorithm, it implicitly assumes that the activations of clean and poisoned data form spherical, linearly separable clusters. SS, in turn, removes the same fraction of data from every class, which can discard clean data when only a small number of labels are attacked. In this work, we focus on a characteristic V-shaped structure that emerges when activations are projected into two dimensions via principal component analysis. By introducing a polar-coordinate representation, we propose a new defense method that exploits this geometric pattern. We further investigate the underlying causes of the V-shaped structure, building on discussions in prior studies. Experiments on Fashion-MNIST and CIFAR-10 demonstrate that the proposed method, which leverages the properties of the observed distributions, separates clean and poisoned data more effectively than existing methods.
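The core idea above — converting 2-D PCA projections of activations to polar coordinates so that the two arms of the V-shape become separable along the angular axis — can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's actual procedure: the point coordinates, arm angles, noise level, and the simple angular threshold are all assumptions made for the example, and the data stands in for activations that have already been projected onto the first two principal components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic stand-in for activations of one class after
# projection onto the first two principal components: clean and poisoned
# samples form the two arms of a V meeting near the origin.
n_clean, n_poison = 400, 100
r_clean = rng.uniform(1.0, 4.0, n_clean)
r_poison = rng.uniform(1.0, 4.0, n_poison)
a_clean, a_poison = np.pi / 6, -np.pi / 6  # assumed arm directions (+/-30 deg)
clean = np.c_[r_clean * np.cos(a_clean), r_clean * np.sin(a_clean)]
poison = np.c_[r_poison * np.cos(a_poison), r_poison * np.sin(a_poison)]
pts = np.vstack([clean, poison]) + rng.normal(0.0, 0.1, (n_clean + n_poison, 2))
labels = np.r_[np.zeros(n_clean), np.ones(n_poison)]  # 1 = poisoned

# Polar-coordinate representation: radius and angle of each projected point.
radius = np.hypot(pts[:, 0], pts[:, 1])
theta = np.arctan2(pts[:, 1], pts[:, 0])

# Because the two arms occupy distinct angular ranges, a single threshold
# on the angle (here theta = 0, midway between the arms) separates them.
pred = (theta < 0).astype(float)
accuracy = (pred == labels).mean()
```

In Cartesian coordinates the two arms are neither spherical nor well handled by a fixed per-class removal fraction, which is what makes the polar view attractive: the separation problem collapses to a one-dimensional split on the angle.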
