Machine learning plays an important role in a wide range of domains, including healthcare, finance, and education. However, training data often contain sensitive personal information, and privacy risks arise especially when the data are collected and analyzed by an untrusted analyst. Local Differential Privacy (LDP), in which each data owner perturbs their data locally before disclosure, has attracted attention as a practical way to mitigate such risks. SUPM has been proposed as a framework for machine learning under LDP. It consists of a dimension reduction phase, PPTraining, and PPTesting, and supports LDP-compliant classification even for high-dimensional data. In PPTraining, however, SUPM feeds perturbed observations to the learner as deterministic inputs, which makes learning unstable because the uncertainty induced by perturbation is not explicitly reflected. In addition, statistics that can be estimated from the data collected in the dimension reduction phase are not sufficiently exploited during training. In this work, we propose an uncertainty-aware data transformation method that computes posterior distributions over the true values given the observed values, based on the distributions estimated in the dimension reduction phase of SUPM and the known perturbation probabilities of the underlying mechanism. Specifically, we introduce softOHE, which replaces one-hot representations of categorical attributes with probability vectors; softTE, which corrects target encoding using posterior expectations; and softLabel, which estimates the true label distribution from the observed label and observed features. We evaluate the proposed method on the Adult and BR datasets under two settings, TTS (Trusted Test Server) and UTS (Untrusted Test Server), which differ in the trust assumption at inference time. Experimental results show that MacroF1 improves by up to 25.8% over SUPM in TTS and by up to 4.7% in UTS. Furthermore, posterior-based correction reduces the average feature reconstruction error by up to 56.4%, confirming that the proposed uncertainty-aware transformation yields representations closer to the true values.
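
The posterior computation behind softOHE and softTE can be illustrated with a minimal sketch. It assumes k-ary randomized response as the perturbation mechanism (the abstract does not name the mechanism), and the function names, prior, and encoding values below are hypothetical, chosen only for illustration. Given an estimated prior over categories and the known perturbation probabilities, Bayes' rule yields a probability vector that replaces the one-hot encoding of the observed value.

```python
import math


def krr_likelihood(observed, true, k, epsilon):
    """P(observed | true) under k-ary randomized response (assumed mechanism)."""
    e = math.exp(epsilon)
    return e / (e + k - 1) if observed == true else 1.0 / (e + k - 1)


def soft_one_hot(observed, prior, epsilon):
    """Posterior P(true = j | observed) for every category j, via Bayes' rule.

    `prior` stands in for the category distribution estimated beforehand.
    """
    k = len(prior)
    unnormalized = [
        krr_likelihood(observed, j, k, epsilon) * prior[j] for j in range(k)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def soft_target_encoding(posterior, te_values):
    """Posterior expectation of per-category target-encoding values."""
    return sum(p * v for p, v in zip(posterior, te_values))


# Hypothetical prior over three categories and hypothetical encoding values.
prior = [0.7, 0.2, 0.1]
te_values = [0.25, 0.50, 0.75]

posterior = soft_one_hot(observed=2, prior=prior, epsilon=1.0)
soft_te = soft_target_encoding(posterior, te_values)
```

With a small privacy budget and a skewed prior, the posterior can place more mass on a frequent category than on the observed one, which is the kind of uncertainty a deterministic one-hot input would hide. As epsilon grows, the posterior approaches the ordinary one-hot vector.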
