Privacy Engineering · April 17, 2026 · 12 min read

AI Privacy by Design: Building GDPR Article 25 Compliance into AI Systems

Privacy by Design is not a best practice — it is a legal requirement under GDPR Article 25. For AI systems, it means designing data minimisation, purpose limitation, and privacy defaults into the architecture before a single line of training code is written.

What GDPR Article 25 Actually Requires

GDPR Article 25 imposes two distinct obligations on data controllers:

Data Protection by Design (Art.25(1))

At the time of determining the means of processing and at the time of the processing itself, implement appropriate technical and organisational measures designed to implement data-protection principles in an effective manner.

For AI: privacy decisions must be made during system design — not patched in after deployment.

Data Protection by Default (Art.25(2))

Implement appropriate technical and organisational measures to ensure that, by default, only personal data which are necessary for each specific purpose of the processing are processed.

For AI: the most privacy-protective configuration must be the default — users should not have to opt out of data collection.

Enforcement reality

The Belgian DPA fined a company €50,000 for Art.25 violations in an AI-driven advertising system. The EDPB's Guidelines 4/2019 on Art.25, adopted in 2020, confirm that the obligation covers automated processing, including AI. Privacy by Design is not aspirational — regulators audit for it.

The Seven Foundational Principles — Applied to AI

Privacy by Design was articulated by Ann Cavoukian as seven foundational principles, which later informed GDPR Art.25. Here is what each means specifically for AI systems:

1. Proactive, not reactive

Anticipate and prevent privacy-invasive events before they occur. For AI: conduct a DPIA before model training begins, not after deployment.

Implementation action

Run DPIA at project kick-off, before any personal data collection or model training.

2. Privacy as the default

Personal data must be automatically protected — users should not have to take action. For AI: the strictest privacy setting must be the out-of-the-box configuration.

Implementation action

Opt-in to data sharing, not opt-out. Disable inference on sensitive attributes by default.
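To make this concrete, here is a minimal sketch of privacy-by-default configuration in Python. The class and field names are hypothetical, not from any specific framework — the point is that every default is the strictest option:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PrivacyConfig:
    """Hypothetical AI service settings: every default is the strictest option."""
    share_data_with_partners: bool = False    # Art.25(2): sharing is opt-in
    infer_sensitive_attributes: bool = False  # no inference on special categories
    log_raw_inputs: bool = False              # store only what the purpose needs
    retention_days: int = 30                  # shortest retention the service offers

# Anything less protective must be an explicit, recorded choice by the user:
config = PrivacyConfig(share_data_with_partners=True)
```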

3. Privacy embedded into design

Privacy is not a bolt-on — it is integrated into system architecture and business practices from the start. For AI: privacy constraints shape model architecture, not vice versa.

Implementation action

Include DPO or privacy counsel in ML sprint planning. Reject architectures that require unnecessary personal data.

4. Full functionality — no false trade-offs

Privacy and functionality are not zero-sum. For AI: a recommendation engine can work with anonymised interaction data; real names are not needed.

Implementation action

Challenge every personal data input. Ask: "Could this feature work with pseudonymised or synthetic data?"

5. End-to-end security

Full lifecycle protection of personal data. For AI: data security from collection through training, inference, output generation, and deletion.

Implementation action

Encrypt training data at rest. Delete raw personal data after feature extraction. Audit inference logs.
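As one illustration of at-rest encryption, here is a minimal sketch using the `cryptography` package's Fernet recipe. The file names are placeholders; in production the key would come from a KMS or secrets manager, never from code:

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in production: fetch from a KMS / secrets manager
fernet = Fernet(key)

# Encrypt the raw training data and store only the ciphertext.
with open("training_data.csv", "rb") as f:
    ciphertext = fernet.encrypt(f.read())

with open("training_data.csv.enc", "wb") as f:
    f.write(ciphertext)

# Decrypt only inside the training job; plaintext never sits in shared storage.
plaintext = fernet.decrypt(ciphertext)
```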

6. Visibility and transparency

Components and operations must be visible and verifiable. For AI: model cards, data lineage documentation, and explainability for high-risk decisions.

Implementation action

Document training data sources, preprocessing steps, and model limitations. Publish model card for customer-facing AI.
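A model card can be as simple as a structured document generated alongside each training run. Here is a minimal sketch with illustrative values (real cards typically follow the template proposed by Mitchell et al., 2019):

```python
import json

model_card = {
    "model": "churn-predictor-v3",  # illustrative name
    "intended_use": "subscription churn scoring for retention outreach",
    "training_data": {
        "sources": ["CRM events 2023-2025, pseudonymised"],
        "personal_data": "pseudonymous customer IDs, aggregated usage counts",
        "excluded": "names, emails, free text, special-category data",
    },
    "preprocessing": "direct identifiers removed, events aggregated weekly",
    "limitations": "not validated outside the EU billing region",
}

# Version the card with the model artefact so auditors can trace lineage.
with open("model_card.json", "w") as f:
    json.dump(model_card, f, indent=2)
```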

7. Respect for user privacy

Strong privacy defaults, clear notices, and user-centric options. For AI: meaningful disclosures when AI makes decisions affecting individuals.

Implementation action

Provide plain-English explanations of AI decisions. Honour access, rectification, and erasure requests in model outputs.

AI-Specific Privacy Measures by Category

Privacy by Design requires "state of the art" measures — what is technically feasible and economically proportionate. For AI systems in 2026, the following measures are standard practice:

Data Minimisation

  • Train on the minimum personal data needed — challenge every feature during data engineering
  • Use synthetic data for development and testing; reserve real personal data for production training only
  • Apply differential privacy techniques to add statistical noise while preserving model utility (see the sketch after this list)
  • Implement federated learning where the model trains on device — raw data never leaves the user
  • Remove direct identifiers (name, email, ID) from training datasets; use pseudonyms or tokens
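To make the differential-privacy item concrete: a counting query has sensitivity 1, so adding noise drawn from Laplace(1/ε) yields ε-differential privacy. A minimal sketch follows — production systems should use a vetted library such as OpenDP rather than hand-rolled noise:

```python
import random

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def dp_count(records, predicate, epsilon: float = 1.0) -> float:
    """Count matching records with epsilon-DP; a count has sensitivity 1."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon)

users = [{"churned": True}, {"churned": False}, {"churned": True}]
print(dp_count(users, lambda u: u["churned"], epsilon=0.5))
```

Smaller ε means stronger privacy and noisier answers; the right value is a policy decision documented in the DPIA, not a purely technical one.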

Purpose Limitation

  • Document the specific purpose for each personal data input in the AI system record
  • Technically prevent the model from being queried in ways incompatible with the documented purpose (a minimal sketch follows this list)
  • Separate models for different purposes — a churn-prediction model should not also power marketing targeting
  • Implement query-level logging to detect purpose-incompatible use of AI outputs
  • Review purpose compatibility whenever the AI system is retrained or extended
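Here is a minimal sketch of purpose enforcement and query logging at the API layer. The purpose registry, model name, and `run_inference` stub are hypothetical stand-ins for your own system record and serving stack:

```python
import logging

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("purpose-audit")

# Documented purposes per model, mirroring the AI system record (illustrative).
ALLOWED_PURPOSES = {"churn-predictor-v3": {"retention_analysis"}}

def run_inference(model_id: str, features: dict) -> float:
    return 0.42  # stand-in for the real model call

def query_model(model_id: str, purpose: str, features: dict) -> float:
    """Reject, and log, any call whose declared purpose is undocumented."""
    if purpose not in ALLOWED_PURPOSES.get(model_id, set()):
        audit.warning("blocked model=%s purpose=%s", model_id, purpose)
        raise PermissionError(f"{purpose!r} is not documented for {model_id}")
    audit.info("allowed model=%s purpose=%s", model_id, purpose)
    return run_inference(model_id, features)

query_model("churn-predictor-v3", "retention_analysis", {"weekly_logins": 2})
```

Requiring callers to declare a purpose also produces exactly the audit trail the query-level logging bullet asks for.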

Storage Limitation

  • Define and enforce retention periods for all personal data in the AI pipeline
  • Raw personal data used for training: delete after feature extraction or model training is complete
  • AI inference logs containing personal data: apply GDPR retention limits, not indefinite storage
  • Model weights may encode personal data — establish a process for model updates when data deletion is requested
  • Automated deletion pipelines: do not rely on manual processes for data lifecycle management (a sketch follows below)
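A minimal sketch of automated retention enforcement; the category names and periods are illustrative and would come from the DPIA:

```python
from datetime import datetime, timedelta, timezone

RETENTION = {
    "raw_training_data": timedelta(days=30),  # deleted after feature extraction
    "inference_logs": timedelta(days=90),     # illustrative, set per DPIA
}

def purge_expired(records, category: str):
    """Keep only records younger than the category's retention period."""
    cutoff = datetime.now(timezone.utc) - RETENTION[category]
    return [r for r in records if r["created_at"] >= cutoff]

logs = [
    {"created_at": datetime(2026, 4, 1, tzinfo=timezone.utc), "event": "query"},
    {"created_at": datetime(2025, 6, 1, tzinfo=timezone.utc), "event": "query"},
]
logs = purge_expired(logs, "inference_logs")  # run from a scheduler, not by hand
```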

Access Controls

  • Role-based access to training datasets — not everyone who builds the model needs full access
  • Audit logs for all access to personal data in the AI pipeline
  • Separate production AI inference systems from development/testing environments
  • API-level controls: the AI model API should authenticate callers and log queries
  • Prevent bulk extraction of training data through the model API (training-data extraction and membership-inference attacks); a throttling sketch follows this list
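Rate limiting alone does not defeat extraction or membership-inference attacks, but a per-caller query budget raises their cost sharply. A minimal in-memory sketch — a real deployment would back this with Redis or the API gateway's own throttling:

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 3600
MAX_QUERIES_PER_WINDOW = 500  # illustrative; tune to legitimate usage patterns

_history = defaultdict(list)  # api_key -> recent request timestamps

def check_query_budget(api_key: str) -> None:
    """Raise once an authenticated caller exceeds its hourly budget."""
    now = time.time()
    recent = [t for t in _history[api_key] if now - t < WINDOW_SECONDS]
    if len(recent) >= MAX_QUERIES_PER_WINDOW:
        raise PermissionError("query budget exceeded; request flagged for review")
    recent.append(now)
    _history[api_key] = recent
```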

The Right to Erasure Problem in AI

GDPR Article 17 gives individuals the right to have their personal data erased. For traditional databases, this means deleting a row. For AI systems, it is far more complex.

If a person's data was included in training, their information is encoded in the model weights — not in a deletable record. Privacy by Design addresses this upstream: if you design the system to avoid including personal data in training, erasure requests are manageable.

Avoid training on personal data (Ideal)

Use synthetic data, public datasets, or anonymised data for training. Erasure requests become straightforward — the person's data was never in the model.

Pseudonymisation + deletion log (Practical)

Train on pseudonymised data. Maintain a mapping table. On erasure request, delete the mapping — the pseudonym in the model is now meaningless.
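A minimal sketch of the mapping-table approach; the class and IDs are hypothetical. Note the caveat: this only works if the remaining features cannot re-identify the person on their own — quasi-identifiers such as postcode plus birth date defeat it:

```python
import secrets

class PseudonymVault:
    """Real-ID-to-pseudonym mapping, stored apart from the training data."""

    def __init__(self):
        self._forward = {}  # real ID -> random pseudonym

    def pseudonymise(self, real_id: str) -> str:
        if real_id not in self._forward:
            self._forward[real_id] = secrets.token_hex(16)
        return self._forward[real_id]

    def erase(self, real_id: str) -> None:
        """Art.17 request: delete the mapping, severing the link between
        the person and the pseudonym that remains in the model."""
        self._forward.pop(real_id, None)

vault = PseudonymVault()
token = vault.pseudonymise("user-4711")  # token goes into the training set
vault.erase("user-4711")                 # on erasure request
```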

Model retraining policy (Acceptable)

Document that the model will be retrained on a schedule (e.g., quarterly), and that retrained models exclude data for subjects who have exercised erasure rights.

Machine unlearning (Emerging)

A technique for surgically removing the influence of specific training data from model weights. It is available in research settings but not yet production-ready at scale.

GDPR Article 25 Compliance Checklist for AI Systems

  • DPIA completed before processing personal data for AI training or inference
  • Data minimisation documented: what personal data is used, why it is necessary, and what was excluded
  • Purpose documented in the AI system record and technically enforced
  • Privacy by default configured: the most restrictive settings are the defaults
  • Retention periods defined and automated deletion in place
  • Pseudonymisation applied wherever re-identification is not required for the AI function
  • Access controls implemented for all personal data in the AI pipeline
  • Transparency measures in place: users know when AI is making decisions affecting them
  • Right-to-erasure process exists for training data and model inference logs
  • Data processing agreements in place with all third-party AI vendors handling personal data

Privacy by Design Under the EU AI Act

The EU AI Act reinforces Privacy by Design for high-risk AI systems. Article 10 requires that training data be subject to appropriate data governance practices — which the EDPB interprets as including Art.25 compliance.

  • EU AI Act Art.10: data governance for training data — minimisation, quality, bias examination
  • EU AI Act Art.13: transparency for high-risk AI — what personal data is used and how
  • GDPR Art.25: Privacy by Design and by Default at the architecture stage
  • GDPR Art.35: DPIA required when Art.25 risks cannot be mitigated by design alone

Document Your AI Privacy Architecture

ComplianceIQ helps you document Privacy by Design decisions, run DPIAs, and track Art.25 compliance across all AI systems — with evidence collection for auditors.

Run a Free AI Risk Assessment