DP-FL Research

Automating the Optimal Privacy Budget Selection for Differential Privacy in Federated Learning Environments

An innovative epsilon-aware strategy that dynamically adapts the privacy-utility trade-off, eliminating the need for manual tuning.

The Privacy-Utility Dilemma

The core challenge in Differentially Private Federated Learning (DP-FL) is setting the privacy budget, epsilon (ε).

Low Epsilon (ε)

Provides strong privacy by adding more noise to each client's model updates. However, this high level of noise can significantly reduce the global model's accuracy, making it less useful.

High Epsilon (ε)

Leads to better model accuracy because less noise is added to the updates. The trade-off is a weaker privacy guarantee, increasing the risk of exposing sensitive information from client data.

Finding the perfect balance is critical, but manual tuning is inefficient and rarely optimal.

Our Solution: The Epsilon-Aware Strategy

We've developed an adaptive system that automates the selection of the optimal epsilon in each round of federated training. Here’s how it works:

System Workflow Diagram

How It Works: The Algorithm Explained

Our algorithm finds the best epsilon by treating the selection as a mini-optimization problem after each round of federated learning.

Step 1: Candidate Evaluation & Proxy Training

After aggregating client updates, the server creates several copies (clones) of the new global model. Each clone is paired with a different candidate epsilon (e.g., 1.0, 2.0, 5.0). The server then trains each clone for a few epochs on a small, public proxy dataset to quickly estimate its performance.
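The cloning step can be sketched as follows. This is a minimal illustration, not the actual implementation: `proxy_f1` stands in for the few-epoch PyTorch training loop plus F1 evaluation on the public proxy dataset, and all names here are hypothetical.

```python
import copy

def evaluate_candidates(global_model, candidate_epsilons, proxy_f1):
    """Clone the aggregated global model once per candidate epsilon and
    estimate each clone's utility on the public proxy dataset.

    proxy_f1(clone, eps) is a stand-in for the real proxy-training step:
    briefly train the clone under epsilon `eps` and return its F1-score.
    """
    results = []
    for eps in candidate_epsilons:
        clone = copy.deepcopy(global_model)  # independent copy per candidate
        f1 = proxy_f1(clone, eps)            # few-epoch proxy training + F1
        results.append({"epsilon": eps, "f1": f1})
    return results
```

In the real system each clone would be a deep copy of the model's parameters (e.g. its `state_dict` in PyTorch), so the proxy runs cannot interfere with one another.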

Step 2: Performance Measurement & Normalization

The performance (F1-score) of each trained clone is measured. To make a fair comparison between performance and privacy cost, both the F1-scores and the epsilon values are normalized to a common scale of [0, 1].
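Assuming the common min-max scheme is used to map both quantities onto [0, 1] (the source does not name the exact normalization), the step looks like this:

```python
def min_max_normalize(values):
    """Rescale a list of values onto [0, 1] via min-max normalization.
    A constant list is mapped to all zeros to avoid division by zero."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Applying this to both the candidate epsilons and their F1-scores puts privacy cost and utility on the same scale before scoring.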

Step 3: Weighted Scoring Calculation

A custom score is calculated for each candidate using a weighted formula. This formula is designed to reward high model performance while penalizing high epsilon values (weaker privacy).

score = (w1 * Norm_F1) - (w2 * Norm_Epsilon)

Here, `w1` and `w2` are weights that can be tuned to prioritize either model utility or privacy stringency.
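As a small sketch of the formula above (the default weights 0.7/0.3 are illustrative, not values from the paper):

```python
def weighted_score(norm_f1, norm_eps, w1=0.7, w2=0.3):
    """score = (w1 * Norm_F1) - (w2 * Norm_Epsilon):
    reward utility, penalize privacy cost."""
    return w1 * norm_f1 - w2 * norm_eps
```

Raising `w2` relative to `w1` pushes the selection toward smaller epsilons, i.e. stricter privacy at some cost in accuracy.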

Step 4: Selection and Distribution

The candidate epsilon that achieves the highest final score is selected as the optimal choice for the next round. This winning epsilon is then distributed to all clients along with the updated global model, ensuring the system adapts for the next iteration of training.
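Steps 2 through 4 can be tied together in one selection routine. This is a self-contained sketch under the same assumptions as above (min-max normalization, illustrative weights); the function and parameter names are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values onto [0, 1]; constant lists map to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def select_epsilon(candidates, w1=0.7, w2=0.3):
    """candidates: list of (epsilon, f1) pairs measured on the proxy dataset.
    Returns the epsilon with the highest weighted score, to be broadcast
    to all clients for the next round."""
    eps_vals = [e for e, _ in candidates]
    f1_vals = [f for _, f in candidates]
    norm_eps = min_max_normalize(eps_vals)
    norm_f1 = min_max_normalize(f1_vals)
    scores = [w1 * nf - w2 * ne for nf, ne in zip(norm_f1, norm_eps)]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return eps_vals[best]
```

For example, if a larger epsilon buys only a marginal F1 gain, the epsilon penalty dominates and a mid-range candidate wins, which is the adaptive behavior described above.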

Key Results

Our experiments demonstrate the effectiveness of the adaptive system.

Dynamic Selection

The optimal epsilon dynamically adapted each round, typically converging between 0.5 and 2.0.

Peak Performance

The system consistently found that ε = 2.0 offered the best balance of accuracy and privacy in early rounds.

Practical Efficiency

The selection process adds a consistent and manageable computational overhead, making it viable for real-world use.

Technology Stack

PyTorch

Flower

Opacus

Python

Citation & Resources

If you use this work in your research, please cite our paper.

The BibTeX entry will be added here once the paper is published.