Traditional ML is "move data to compute"—pull everything into a central data center, then train. Federated learning flips this to "move compute to data"—the same idea as the data locality you know from MapReduce: don't move the data, push the code down. The difference is that the reduce stage here aggregates not data but the model gradients each node computed locally; raw data never leaves the user's device.
Pain point: hospital records, phone keyboard logs, bank transactions—this data is both sensitive and scattered; law (GDPR) and commercial interest both forbid centralizing it. But without centralizing it you can't train a good model. The federated answer: the model goes to the data, not the data to the model.
The mechanism is FedAvg (Federated Averaging), proposed by McMahan et al. in 2016—a repeating loop: the server sends the current global model to a batch of clients → each client trains a few steps on its local data → it sends back only the weight update (not the data) → the server takes a data-size-weighted average to form the new global model.
The aggregation formula is plain: w ← Σ (n_k / n) · w_k. w_k is client k's trained local weights, n_k its sample count, n the total. Intuition: whoever has more data gets more say in the average—it's just a weighted reduce.
import numpy as np # FedAvg's core is really just this weighted-average fn (frameworks: Flower/FedML) def fedavg(client_weights, client_sizes): # client_weights: list of each client's locally trained weight vectors # client_sizes: each client's local sample count n_k total = sum(client_sizes) # Weight by sample count: more data → larger share global_w = sum( (n_k / total) * w_k for w_k, n_k in zip(client_weights, client_sizes) ) return global_w # the global model to push next round # Simulate: 3 clients, raw data never leaves them w = fedavg([np.array([1.,2]), np.array([3.,4]), np.array([2.,2])], [100, 50, 200]) # bank(200) has the most say print(w) # [1.857 2.286] — biased toward the data-rich client
Differential privacy is like a database query's "controlled redaction": you can ask "what's the average salary," but the system mixes in carefully calibrated noise so you can't reverse-engineer Zhang's salary by diffing "with Zhang vs. without Zhang." The key: privacy no longer rests on "I feel it's safe" but on a guarantee quantifiable by a number ε and accumulable as a budget—like the rate-limit quota (token bucket) you know: each query spends budget, and when it's gone you can't query more.
Pain point: traditional "anonymization" (drop names, drop IDs) keeps getting broken—a few side fields cross-referenced can re-identify a person. DP gives a harder definition: a randomized algorithm M satisfies ε-differential privacy iff for any two datasets D, D′ differing by one record and any output S:
Pr[M(D) ∈ S] ≤ eε · Pr[M(D′) ∈ S]
Intuition: whether or not your record is included, the output distribution barely changes (bounded by the factor eε). So an attacker seeing the result can't tell whether you were in the dataset—your presence is "drowned out." The smaller ε (the privacy budget), the more alike the two distributions, the stronger the privacy—but the more noise and lower the accuracy. This is an unavoidable dial between privacy and utility.
How to implement? The classic Laplace mechanism: add Laplace noise to the true answer, with magnitude = sensitivity / ε. Sensitivity = how much changing one record can shift the answer at most (a "count" query has sensitivity 1). Applied to deep learning, this is Abadi et al. 2016's DP-SGD: during training, first clip each sample's gradient (bounding single-sample influence = controlling sensitivity), then add Gaussian noise, and use the "moments accountant" to precisely tally the total ε spent across training.
import numpy as np # Laplace mechanism: a differentially private release of a "count" query def private_count(true_count, epsilon, sensitivity=1.0): # Count query: adding/removing one record shifts result by at most 1 scale = sensitivity / epsilon # noise magnitude ∝ 1/ε noise = np.random.laplace(loc=0, scale=scale) return true_count + noise # release the noised result true = 1000 # truly 1000 patients have some disease print(private_count(true, epsilon=0.1)) # strong privacy → big noise, maybe 987 or 1013 print(private_count(true, epsilon=2.0)) # weak privacy → small noise, ~1000.4 # For neural nets, the official lib Opacus (PyTorch) takes over in one line: # from opacus import PrivacyEngine # model, optim, loader = PrivacyEngine().make_private(...) # auto clip + noise
Ordinary encryption is like locking data in a safe—to use it you must unlock it, and at that instant the plaintext is exposed to the server. Homomorphic encryption (HE) is a "safe with gloves": you can operate inside the box directly, never opening it. In your world: the server can run SUM / dot product directly on encrypted database fields, returns ciphertext, and only you (holding the private key) can open it—while the server never saw a single plaintext.
Pain point: you want to use cloud compute or a third-party model on sensitive data (records, finances) but don't trust them. Uploading plaintext = leak; computing locally = no compute power. HE lets you "outsource the computation without handing over the data."
The math core is "preserving operation structure": the encryption function E satisfies E(a) ⊕ E(b) = E(a + b). That is, some operation ⊕ on ciphertext corresponds, after decryption, to addition on plaintext. A scheme supporting both ciphertext addition and multiplication, for arbitrary depth, is Fully Homomorphic Encryption (FHE)—first constructed by Craig Gentry in 2009 using ideal lattices, a decades-long cryptographic milestone.
Why was it impossible before? Each ciphertext operation accumulates noise, multiplication especially fast, and once noise exceeds a threshold it can no longer be decrypted. Gentry's breakthrough was bootstrapping: use homomorphic operations to "decrypt the ciphertext itself and re-encrypt," resetting the noise, enabling unlimited-depth computation. The cost is extreme slowness—the root reason HE isn't yet mainstream. In ML the more common scheme is CKKS (Cheon et al. 2017), supporting approximate real-number arithmetic, well suited to neural inference that tolerates small errors.
import tenseal as ts # OpenMined's HE library, wraps CKKS # 1) Build a context = generate keys + set CKKS parameters ctx = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192, coeff_mod_bit_sizes=[60, 40, 40, 60]) ctx.global_scale = 2**40 ctx.generate_galois_keys() # 2) Encrypt two vectors (think: your private features) a = ts.ckks_vector(ctx, [1.0, 2.0, 3.0]) b = ts.ckks_vector(ctx, [4.0, 5.0, 6.0]) # 3) Dot product directly on ciphertext—server-side, never sees plaintext enc_dot = a.dot(b) # entirely ciphertext arithmetic print(enc_dot.decrypt()) # ≈ [32.0] = 1*4+2*5+3*6, approximate but accurate enough
Secure multi-party computation (MPC) solves a seemingly paradoxical problem: several parties jointly compute a shared result, yet none reveals its own input. Analogy to the distributed quorum you know: data is split into "secret shares" held by each party, a single share is meaningless, and only combined per protocol does it yield a result—except MPC goes further: what's combined is the "computed result," not the raw data itself. The classic primer is Yao's "Millionaires' Problem": two millionaires want to know who is richer without revealing how much each has.
Pain point: several hospitals want to jointly study a drug's efficacy, but none may see the others' records; several banks want joint fraud detection but are limited by competition and compliance. They need "collaborative computation, zero-trust sharing."
The easiest mechanism is additive secret sharing: split a secret number x into a sum of random numbers x = x₁ + x₂ + x₃ (mod p), each party getting just one xᵢ. Any single share is a uniform random number leaking zero information; but since addition can be done by each party on its own share independently, summing the result shares recovers the true value of x + y—and no one ever saw the raw x, y. Multiplication is harder (needs extra protocols like Beaver triples), but the idea is the same.
In federated learning the most important application is Bonawitz et al. 2017's Secure Aggregation: clients use pairwise masks to "additively share" their gradients with the server, which can only recover the sum of all gradients—never any single client's gradient—and it tolerates clients dropping out midway. This exactly fills the hole in concept ① ("sharing only gradients still leaks").
import random P = 2**31 - 1 # a large prime; all arithmetic is mod P def share(secret, n=3): # split into n shares: first n-1 random, last makes the sum = secret parts = [random.randrange(P) for _ in range(n - 1)] parts.append((secret - sum(parts)) % P) return parts # any single share is uniform random, leaks zero info # Two parties each split their secret, distribute to 3 compute nodes sx, sy = share(5), share(3) # Alice=5, Bob=3, not told to each other # Each node adds only the shares it holds (local, no communication) node_sums = [(a + b) % P for a, b in zip(sx, sy)] # Publicly merge node results → recover x+y, but 5 and 3 never exposed print(sum(node_sums) % P) # 8 ✓
They aren't mutually exclusive substitutes but building blocks protecting different layers; production systems usually stack them:
| Technique | Protects | Main cost | Trust assumption |
|---|---|---|---|
| Federated Learning | data not centralized (location) | communication, Non-IID | semi-honest server |
| Differential Privacy | individual unidentifiable (result) | accuracy drop | noise-adder trusted |
| Homomorphic Encryption | invisible during compute (process) | orders of magnitude slower | no need to trust compute |
| Secure MPC | each party's input secret (process) | many communication rounds | majority don't collude |
Typical stack: Federated learning sets the frame (data stays put) → Secure Aggregation / MPC hides single gradients from the server → Differential privacy noises the model against individual re-inference → add HE for the most sensitive slivers if needed. In one line: FL guards location, MPC/HE guard process, DP guards result.