AI Data Governance: Building a Practical Framework
EU AI Act Article 10 imposes specific data governance requirements on high-risk AI systems. GDPR data quality, minimisation, and erasure obligations also apply. Here is how to build a data governance framework that satisfies both.
Why AI Needs Dedicated Data Governance
Standard data governance frameworks were not designed for AI. Three characteristics of AI systems create new governance challenges:
Scale and complexity
AI training datasets may contain billions of records, drawn from hundreds of sources, with complex lineage. Standard ROPA (records of processing activities) entries do not capture this level of detail.
Data as code
In AI, the training data is inseparable from the model output. Biases in data become biases in decisions. Data governance is therefore AI safety governance.
Erasure is not deletion
Deleting data from a training set does not remove it from a trained model's weights. The right to erasure takes on a new dimension that standard governance frameworks have not addressed.
The Five Pillars of AI Data Governance
Data inventory and classification
Know what data you have, where it comes from, and what it is used for in AI systems.
- AI system data register: for each AI system, document all data inputs and their sources
- Data classification: personal data / sensitive personal data / anonymised / synthetic / public
- Data lineage: where did training data come from? Is it appropriately licensed?
- Third-party data: what data from vendors, data brokers, or partners enters AI systems?
EU AI Act: Article 10(3): Training data must be documented, including data sources, characteristics, and limitations.
GDPR: Article 30 (Records of Processing): AI data processing should be reflected in your ROPA.
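The register above can be sketched as a small data structure. The class and field names here are illustrative assumptions, not anything prescribed by the Act; the `unlicensed_sources` check shows how a register can flag lineage gaps automatically.

```python
from dataclasses import dataclass, field

# Illustrative register entry; field names are assumptions, not mandated by the Act.
@dataclass
class DataSource:
    name: str
    licence: str          # e.g. "CC-BY-4.0", "vendor contract", or "unknown"
    classification: str   # personal / sensitive / anonymised / synthetic / public
    lineage: str          # where the data originated

@dataclass
class AISystemRecord:
    system_name: str
    intended_purpose: str
    sources: list = field(default_factory=list)

    def add_source(self, source: DataSource) -> None:
        self.sources.append(source)

    def unlicensed_sources(self) -> list:
        # Flag sources whose licensing has not been documented.
        return [s.name for s in self.sources if s.licence == "unknown"]

record = AISystemRecord("cv-screening", "rank job applications")
record.add_source(DataSource("hr-archive", "internal", "personal", "company HRIS export"))
record.add_source(DataSource("web-scrape", "unknown", "personal", "public web crawl"))
print(record.unlicensed_sources())  # ['web-scrape']
```

A register like this is easy to export into the ROPA and to diff between model versions.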
Data quality requirements
EU AI Act Article 10 requires specific data quality practices for high-risk AI systems.
- Relevance: training data must be relevant and representative for the intended purpose
- Sufficiency: training data must be sufficient in volume for the intended use
- Freedom from errors: appropriate data cleaning and validation before use
- Completeness: known data gaps must be identified and documented
- Bias examination: examination for possible biases that could lead to discrimination
EU AI Act: Article 10(2)–10(4): Mandatory for high-risk AI — data quality, bias examination, and documentation.
GDPR: Article 5(1)(d) accuracy principle — data must be accurate and kept up to date.
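Some of these quality requirements can be enforced as automated gates before training. A minimal sketch, with illustrative thresholds and field names (real pipelines would add error and bias checks per Article 10):

```python
# Minimal pre-training quality gate; thresholds (min_rows, 0.95) are illustrative.
def quality_report(rows, required_fields, min_rows=1000):
    missing = sum(
        1 for r in rows
        if any(r.get(f) in (None, "") for f in required_fields)
    )
    report = {
        "rows": len(rows),
        "sufficient_volume": len(rows) >= min_rows,   # sufficiency check
        "completeness": 1 - missing / len(rows) if rows else 0.0,
    }
    report["passes"] = report["sufficient_volume"] and report["completeness"] >= 0.95
    return report

rows = [{"age": 30, "outcome": "hire"}, {"age": None, "outcome": "reject"}]
print(quality_report(rows, ["age", "outcome"], min_rows=2))
```

Failing the gate should block the training run and create a documented exception, which doubles as the Article 10 evidence trail.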
Data minimisation for AI
AI systems tend to consume more data than they need. Governance requires active minimisation.
- Feature selection policy: document why each data feature is necessary for the AI model's purpose
- Training data pruning: remove data that is not required once the model is trained
- Synthetic data evaluation: where synthetic data can replace real personal data, prefer it
- Aggregation vs individual-level data: use aggregated data where individual-level is not necessary
EU AI Act: Article 10(6): Only personal data that is strictly necessary may be used for high-risk AI training (narrow exception).
GDPR: Article 5(1)(c) data minimisation principle.
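A feature selection policy can be enforced mechanically: block any training run that includes a column with no documented justification. A minimal sketch, where the feature names and justifications are hypothetical:

```python
# Hypothetical feature-justification register for one model.
feature_justifications = {
    "years_experience": "directly relevant to assessing seniority",
    "skills": "required for role matching",
}

def unjustified_features(training_columns):
    # Any column without a documented justification should block the run.
    return [c for c in training_columns if c not in feature_justifications]

columns = ["years_experience", "skills", "postcode"]
print(unjustified_features(columns))  # ['postcode']
```

The flagged column then either gets a documented justification or is dropped, which is data minimisation made auditable.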
Retention and deletion
AI creates new retention challenges: model weights encode patterns from training data, and deletion from training data does not remove it from model weights.
- Training data retention schedule: define how long training data is kept
- Model retraining schedule: when do you retrain to incorporate data deletions?
- Right to erasure policy for AI: what is your response when a data subject requests deletion of data used in AI training?
- Model versioning: keep records of which model version used which data vintage
EU AI Act: Article 10(5): providers of high-risk AI may process special categories of personal data for bias detection and correction only where strictly necessary and subject to appropriate safeguards (GDPR Article 9 data).
GDPR: Article 5(1)(e) storage limitation. Article 17 right to erasure creates challenges for AI training data.
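Model versioning and retrain scheduling can be sketched as a small registry linking each model version to its data vintage. The 180-day retrain window below is an illustrative assumption, not a regulatory requirement; it also bounds how long deleted data can persist in model weights.

```python
from datetime import date, timedelta

# Illustrative registry mapping model versions to data vintages.
model_registry = [
    {"version": "1.0", "data_vintage": date(2024, 1, 15), "trained": date(2024, 2, 1)},
    {"version": "1.1", "data_vintage": date(2024, 6, 1), "trained": date(2024, 6, 20)},
]

def retrain_due(registry, max_age_days=180, today=None):
    # A retrain is due when the newest model is older than the agreed window.
    today = today or date.today()
    latest = max(registry, key=lambda m: m["trained"])
    return today - latest["trained"] > timedelta(days=max_age_days)

print(retrain_due(model_registry, today=date(2025, 1, 10)))  # True
```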
Access controls and governance
Govern who can access AI training data, who can modify it, and who can approve its use for new purposes.
- Least-privilege access to AI training datasets
- Audit logs for data access to AI training environments
- Separation of duties: model training team vs data governance team
- Approval process for using new data sources in AI systems
EU AI Act: Article 10 and Article 9 quality management system.
GDPR: Articles 25, 32: privacy by design and security of processing.
The Right to Erasure Challenge in AI
The interaction between GDPR Article 17 (right to erasure) and AI model training is one of the most difficult unsolved compliance problems. Here are the two main issues and how organisations are addressing them:
Training data deletion
Problem: Individual requests erasure of their data. You can delete it from your dataset — but the model has already been trained on it.
Current approaches:
- Model retraining without the data (expensive, not always feasible)
- Machine unlearning techniques (emerging, not yet reliable at scale)
- Documenting that deletion from the training set occurred, and scheduling the next model retrain
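The delete-now, retrain-later approach can be sketched as an erasure log that records the gap between dataset deletion and model retraining. Field names are illustrative:

```python
from datetime import date

# Illustrative erasure log: the dataset copy is deleted immediately, but the
# model weights still reflect the data until the next scheduled retrain.
erasure_log = []

def handle_erasure(subject_id, dataset):
    remaining = [r for r in dataset if r["subject_id"] != subject_id]
    erasure_log.append({
        "subject_id": subject_id,
        "deleted_from_dataset": True,
        "pending_model_retrain": True,  # weights were trained before the deletion
        "requested": date.today().isoformat(),
    })
    return remaining

data = [{"subject_id": "u1"}, {"subject_id": "u2"}]
data = handle_erasure("u1", data)
print(len(data), erasure_log[0]["pending_model_retrain"])  # 1 True
```

Entries with `pending_model_retrain` set can then feed the retrain schedule, so each erasure request has a documented closure date.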
Inference memorisation
Problem: Large language models can memorise and reproduce training data verbatim. If a model has memorised someone's personal data, deleting it from the training set does not remove it from the model weights.
Current approaches:
- Differential privacy techniques during training to reduce memorisation
- Post-training testing for memorisation of sensitive data
- Privacy-preserving fine-tuning methods
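Post-training memorisation testing can be sketched as a prefix probe: feed the model the start of a sensitive record and check whether it completes the rest verbatim. The `generate` callable stands in for any text-generation API; the stub below, which simulates a model that memorised one record, is for illustration only.

```python
# Prefix-probe sketch for memorisation testing.
def memorisation_hits(generate, sensitive_records, prefix_len=20):
    hits = []
    for record in sensitive_records:
        prefix, rest = record[:prefix_len], record[prefix_len:]
        # A verbatim continuation of a sensitive record signals memorisation.
        if rest and generate(prefix).startswith(rest):
            hits.append(record)
    return hits

memorised = "Jane Doe, DOB 1990-01-01, NI QQ123456C"

def stub_generate(prefix):
    # Stand-in for a real model API that has memorised one record.
    return memorised[len(prefix):] if memorised.startswith(prefix) else ""

probe = [memorised, "John Smith, DOB 1985-05-05, NI AB654321D"]
print(memorisation_hits(stub_generate, probe))  # flags only the memorised record
```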
EU AI Act Article 10: Bias Examination Requirement
For high-risk AI systems, Article 10(2)(f) requires that training data be "examined in view of possible biases, in particular as regards persons or groups of persons on which the high-risk AI system is to be used." In practice this means identifying the groups the system will affect, measuring how those groups are represented and treated in the training, validation, and testing data, and documenting any mitigation taken.
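One such examination can be sketched as a representation check comparing group shares in the training data against the population the system will serve. Group labels, expected shares, and the 10% tolerance below are all illustrative assumptions.

```python
from collections import Counter

# Illustrative representation check; tolerance and group labels are assumptions.
def representation_gaps(training_labels, expected_shares, tolerance=0.10):
    counts = Counter(training_labels)
    total = len(training_labels)
    gaps = {}
    for group, expected in expected_shares.items():
        actual = counts.get(group, 0) / total
        if abs(actual - expected) > tolerance:
            gaps[group] = round(actual - expected, 3)
    return gaps

labels = ["A"] * 80 + ["B"] * 20
print(representation_gaps(labels, {"A": 0.5, "B": 0.5}))  # {'A': 0.3, 'B': -0.3}
```

Representation is only one axis; a full Article 10 examination would also compare outcomes and error rates across groups.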
Governance Structure: Who Owns What
- Data Governance Lead / DPO: overall framework, GDPR intersection, ROPA updates, erasure response process
- ML Engineering / Data Science: training data pipelines, quality checks, bias examination, feature selection documentation
- Legal / Compliance: data licensing, copyright compliance for training data, regulatory change monitoring
- IT / Security: access controls, audit logs, data encryption, infrastructure security
- Product Management: use case documentation, change requests for new data sources, feature retirement
Track Article 10 data governance compliance
ComplianceIQ tracks EU AI Act Article 10 requirements for each AI system in your inventory, including bias examination status, data source documentation, and quality check records.