prompt-pack-ai-system-data-governance-framework
name: prompt-pack-ai-system-data-governance-framework
description: Use when drafting a data governance framework specifically for an organization's AI and machine learning systems, addressing training data requirements, bias mitigation, data minimization, purpose limitation, transparency obligations, individual rights in automated decision-making, and compliance with emerging AI regulations. Covers privacy and data protection across MENA and global jurisdictions.
license: MIT
metadata:
id: prompt-pack.ai-system-data-governance-framework
category: prompt-pack
practice_area: privacy-data-protection
priority: P2
intent: [drafting, ai-system-data-governance-framework]
related: [prompt-pack-ai-governance-policy, kb-data-protection-mena, draft-privacy-policy, heuristic-always-state-jurisdiction-first, prompt-pack-aml-kyc-policy]
source: Louis — HAQQ Legal AI (github.com/sboghossian/mini-claude-for-legal)
version: "1.0"
AI System Data Governance Framework
When to use this
Use this skill when an organization needs a data governance framework specifically scoped to its AI/ML systems. This is a more technical and data-architecture-focused document than an AI Governance Policy — it addresses the data lifecycle for AI: how training data is sourced, curated, stored, used, and retired.
The distinction:
- AI Governance Policy ([[prompt-pack-ai-governance-policy]]): principles, oversight, accountability, incident response — the "what we believe and who is responsible"
- AI System Data Governance Framework (this skill): data sourcing, bias testing, data minimization, rights management, regulatory compliance — the "how we handle data throughout the AI lifecycle"
Triggers: data protection audit, AI deployment review, regulatory examination, investor due diligence on AI/ML pipeline.
Prompt template
Draft a data governance framework for [Company's] AI/ML systems addressing training data requirements, bias mitigation, data minimization, purpose limitation, transparency obligations, individual rights in automated decision-making, and compliance with emerging AI regulations.
Use [[conversation-clarifying-questions]] to elicit [bracketed] inputs before drafting.
Required inputs
| Input | Why it matters |
|---|---|
| Company name and type | Sector-specific requirements (financial services, healthcare, HR tech have heightened obligations) |
| AI/ML use cases | The framework must be calibrated to the actual data flows and decision types |
| Jurisdictions of operation | Determines applicable data protection and AI regulations |
| Existing data governance infrastructure | Framework should build on, not duplicate, existing policies |
Document structure
1. Framework scope and objectives
- Define which AI/ML systems are in scope (production models; experimental models; third-party AI tools used in production; models under development)
- Define "personal data processed by AI systems": training data, inference inputs, and outputs used to make decisions about individuals
- State compliance objectives: data protection law compliance; bias risk management; explainability; audit readiness
2. Training data governance
2.1 Sourcing requirements
- Approved data sources: first-party data (collected with appropriate consent or legitimate interest); licensed third-party datasets; synthetic data; open datasets with reviewed licensing
- Prohibited sources: unlicensed web scraping of personal data; data obtained without a legal basis; data whose use would breach country-specific transfer restrictions (UAE: data localization requirements in certain sectors; KSA: PDPL cross-border transfer rules; Egypt: PDL transfer restrictions)
- Source documentation: every training dataset must have a "data card" recording source, collection date, legal basis, known biases, and limitations
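A data card can be kept as a small structured record so that missing entries are caught before a dataset is approved for training. The sketch below is illustrative only: the `DataCard` class, its field names, and the `ALLOWED_LEGAL_BASES` set are assumptions made for this example, not a prescribed schema.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical set of legal bases; adapt it to the applicable law.
ALLOWED_LEGAL_BASES = {"consent", "legitimate_interest", "contract", "licence"}


@dataclass(frozen=True)
class DataCard:
    dataset_name: str
    source: str
    collection_date: date
    legal_basis: str
    known_biases: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable gaps; an empty list means the card is complete."""
        issues = []
        if not self.source.strip():
            issues.append("source is missing")
        if self.legal_basis not in ALLOWED_LEGAL_BASES:
            issues.append(f"unrecognized legal basis: {self.legal_basis!r}")
        if self.collection_date > date.today():
            issues.append("collection date is in the future")
        if not self.known_biases:
            issues.append("known biases not recorded")
        if not self.limitations:
            issues.append("limitations not recorded")
        return issues


card = DataCard(
    dataset_name="loan-applications-2023",
    source="first-party CRM export",
    collection_date=date(2023, 6, 30),
    legal_basis="consent",
    known_biases=["under-representation of applicants aged 60+"],
    limitations=["labels reflect historical approval decisions"],
)
print(card.problems())  # an empty list means no gaps were found
```

A review gate could refuse any dataset whose `problems()` list is non-empty.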
2.2 Data minimization
- Training data must be limited to what is necessary for the stated model purpose (data minimization and purpose limitation under GDPR Art. 5(1)(b)-(c), UAE PDPL, DIFC Data Protection Law)
- Sensitive categories require explicit consent or alternative statutory grounds: health data, biometric data, financial data, religious/political belief data (if relevant to model)
- Pseudonymization and anonymization: where possible, training data should be pseudonymized; pseudonymized data remains personal data, whereas truly anonymized data falls outside the scope of data protection law (note: re-identification risk must be assessed before relying on this)
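One common way to pseudonymize direct identifiers is a keyed hash, so that the same person maps to the same token but the token cannot be reversed without the key. The following is a minimal sketch using Python's standard `hmac` module; the function names, the field names, and the placeholder key are assumptions for illustration. Because whoever holds the key can re-link tokens to individuals, the output is pseudonymized, not anonymized.

```python
import hashlib
import hmac

# Hypothetical placeholder; in practice the key would come from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed HMAC-SHA256 token."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict, id_fields: set[str], secret_key: bytes) -> dict:
    """Tokenize only the listed identifier fields; leave other fields unchanged."""
    return {
        key: pseudonymize(str(value), secret_key) if key in id_fields else value
        for key, value in record.items()
    }


row = {"email": "Alice@example.com", "age_band": "30-39"}
safe_row = pseudonymize_record(row, {"email"}, SECRET_KEY)
print(safe_row["age_band"], len(safe_row["email"]))
```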
2.3 Data quality requirements
- Completeness: is the dataset representative of the population the model will serve? Under-representation creates bias
- Accuracy: what is the error rate in labels? What is the process for error correction?
- Timeliness: how old is the data? Is it still representative?
- Documentation: data quality assessments must be documented and retained
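The completeness question can be made measurable by comparing each group's share of the dataset with its share of the population the model will serve. This is a rough sketch under stated assumptions: the group labels, the reference shares, and the 5-percentage-point tolerance are invented for the example, and the function names are hypothetical.

```python
from collections import Counter


def representation_gaps(groups, reference_shares):
    """Dataset share minus reference share per group (negative = under-represented)."""
    if not groups:
        raise ValueError("dataset is empty")
    counts = Counter(groups)
    total = len(groups)
    return {g: counts[g] / total - share for g, share in reference_shares.items()}


def under_represented(groups, reference_shares, tolerance=0.05):
    """Groups whose dataset share falls short of the reference by more than tolerance."""
    gaps = representation_gaps(groups, reference_shares)
    return sorted(g for g, gap in gaps.items() if gap < -tolerance)


# Illustrative data: age band of each training record, and assumed population shares.
sample = ["18-39"] * 70 + ["40-59"] * 25 + ["60+"] * 5
reference = {"18-39": 0.45, "40-59": 0.35, "60+": 0.20}
print(under_represented(sample, reference))
```

The result of such a check would be recorded in the data quality assessment and, where gaps exist, in the dataset's list of known biases.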
3. Bias mitigation
3.1 Pre-training assessment
Before training any model that makes or informs consequential decisions about individuals:
- Identify sensitive attributes relevant to the use case (gender, nationality, age, disability status, religion — these are protected characteristics under most jurisdictions' anti-discrimination law)
- Assess whether training data contains proxy variables for sensitive attributes (postcode as a proxy for ethnicity, for example)
- Define fairness metrics appropriate to the use case (demographic parity, equalized odds, etc.)
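The two fairness metrics named above can be computed directly from predictions and group labels. The sketch below uses one common formulation (the largest between-group gap in selection rate for demographic parity, and the largest gap in true-positive or false-positive rate for equalized odds); other definitions exist, and the function names and toy data are assumptions for illustration.

```python
def selection_rates(y_pred, groups):
    """Share of positive predictions per group."""
    rates = {}
    for g in set(groups):
        preds = [p for p, grp in zip(y_pred, groups) if grp == g]
        rates[g] = sum(preds) / len(preds)
    return rates


def demographic_parity_difference(y_pred, groups):
    """Largest gap in selection rate between any two groups."""
    rates = selection_rates(y_pred, groups).values()
    return max(rates) - min(rates)


def equalized_odds_difference(y_true, y_pred, groups):
    """Largest between-group gap in true-positive or false-positive rate."""
    gaps = []
    for label in (1, 0):  # label 1 gives the TPR gap, label 0 the FPR gap
        idx = [i for i, t in enumerate(y_true) if t == label]
        # A group with no records for this label is simply absent from the rates.
        rates = selection_rates([y_pred[i] for i in idx], [groups[i] for i in idx])
        gaps.append(max(rates.values()) - min(rates.values()))
    return max(gaps)


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_difference(y_pred, groups))      # 0.5
print(equalized_odds_difference(y_true, y_pred, groups))  # 0.5
```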
3.2 Testing protocol
- Disaggregated performance testing: model performance must be measured separately for each demographic group, not only in aggregate
- Bias test threshold: define the maximum acceptable performance gap between groups before deployment is blocked
- Red-teaming: adversarial testing for harmful outputs, particularly for generative AI systems
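Disaggregated testing and the deployment threshold can be combined into a single gate. In this sketch the metric is plain accuracy and the 5-point maximum gap is an arbitrary example value; both, along with the function names, are assumptions that a real framework would set per use case.

```python
def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy measured separately for each demographic group."""
    correct, total = {}, {}
    for truth, pred, g in zip(y_true, y_pred, groups):
        total[g] = total.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def deployment_gate(y_true, y_pred, groups, max_gap):
    """Block deployment when the best-to-worst group accuracy gap exceeds max_gap."""
    per_group = accuracy_by_group(y_true, y_pred, groups)
    gap = max(per_group.values()) - min(per_group.values())
    return {"per_group": per_group, "gap": gap, "blocked": gap > max_gap}


y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
result = deployment_gate(y_true, y_pred, groups, max_gap=0.05)
print(result["per_group"], result["blocked"])
```

In this toy data the aggregate accuracy is 0.75, which hides the fact that group `b` is served at 0.5 while group `a` is served at 1.0; that is the reason for measuring each group separately.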
3.3 Post-deployment monitoring
- Ongoing output monitoring for drift and emerging bias
- Frequency: at minimum quarterly for high-risk models; at minimum annually for low-risk models
- Escalation: define what triggers an immediate review (e.g., a user complaint, a regulatory inquiry, a media report)
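The monitoring cadence and escalation triggers can be encoded so that the next review date is computed rather than remembered. This is a sketch only: the 91-day and 365-day intervals approximate "quarterly" and "annually", and the tier names, trigger names, and `next_review` function are hypothetical.

```python
from datetime import date, timedelta

# Assumed intervals approximating the quarterly and annual minimums.
REVIEW_INTERVAL_DAYS = {"high": 91, "low": 365}
# Assumed event labels mirroring the example triggers.
IMMEDIATE_TRIGGERS = {"user_complaint", "regulatory_inquiry", "media_report"}


def next_review(last_review: date, risk_tier: str, events=(), today=None) -> date:
    """Return the date by which the model must next be reviewed."""
    today = today or date.today()
    if IMMEDIATE_TRIGGERS.intersection(events):
        return today  # an escalation trigger overrides the regular cadence
    return last_review + timedelta(days=REVIEW_INTERVAL_DAYS[risk_tier])


print(next_review(date(2025, 1, 1), "high"))  # 2025-04-02
print(next_review(date(2025, 1, 1), "low"))   # 2026-01-01
```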
4. Purpose limitation and data minimization at inference
- Data collected during model inference (inputs) must be used only for the stated purpose
- Retention of inference inputs and outputs: define retention period per use case; default to minimum necessary
- Secondary use of inference data (e.g., using user queries to a chatbot to further train the model): requires legal basis analysis; user notification; in many cases, opt-in consent
- Profiling: automated profiling of individuals based on AI outputs requires a DPIA and, in many jurisdictions, a specific legal basis
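Retention and secondary-use rules for inference data are easier to audit when they are expressed as code. The sketch below assumes a per-use-case retention table, a conservative default, and an explicit opt-in flag on each record; all names and periods are invented for the example, and whether opt-in consent is actually required depends on the legal basis analysis.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per use case.
RETENTION = {"chatbot": timedelta(days=30), "credit_scoring": timedelta(days=365)}
DEFAULT_RETENTION = timedelta(days=30)  # assumed minimum-necessary default


def expired_ids(records, now):
    """IDs of inference records held longer than their use case allows."""
    return [
        r["id"]
        for r in records
        if now - r["created_at"] > RETENTION.get(r["use_case"], DEFAULT_RETENTION)
    ]


def eligible_for_retraining(record) -> bool:
    """Secondary use is allowed here only with a recorded opt-in (an assumption)."""
    return record.get("training_opt_in") is True


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "use_case": "chatbot", "created_at": datetime(2025, 5, 1, tzinfo=timezone.utc)},
    {"id": 2, "use_case": "credit_scoring", "created_at": datetime(2025, 5, 1, tzinfo=timezone.utc)},
    {"id": 3, "use_case": "unlisted", "created_at": datetime(2025, 4, 1, tzinfo=timezone.utc),
     "training_opt_in": True},
]
print(expired_ids(records, now))
```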
5. Transparency obligations
5.1 Internal transparency — model documentation
Every production model must have a "model card" recording:
- Purpose and intended use
- Training data sources and known limitations
- Performance metrics including disaggregated results
- Known failure modes and edge cases
- Human oversight requirements
- Responsible owner
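A release check can confirm that a model card covers every item listed above before a model goes to production. The key names below are assumptions chosen to mirror the list, and the `model_card_gaps` function is a hypothetical helper, not part of any standard model card tooling.

```python
# Assumed key names corresponding to the six required model card entries.
REQUIRED_FIELDS = (
    "purpose",
    "training_data_sources",
    "performance_metrics",
    "failure_modes",
    "human_oversight",
    "owner",
)


def model_card_gaps(card: dict) -> list[str]:
    """List missing or empty entries, plus a check for disaggregated metrics."""
    gaps = [f"missing: {name}" for name in REQUIRED_FIELDS if not card.get(name)]
    metrics = card.get("performance_metrics") or {}
    if metrics and "disaggregated" not in metrics:
        gaps.append("performance_metrics lacks disaggregated results")
    return gaps


model_card = {
    "purpose": "rank support tickets by urgency",
    "training_data_sources": ["support-tickets-2024 (see data card)"],
    "performance_metrics": {"accuracy": 0.91, "disaggregated": {"ar": 0.89, "en": 0.92}},
    "failure_modes": ["mixed-language tickets"],
    "human_oversight": "agent confirms every urgent classification",
    "owner": "support-ml-team",
}
print(model_card_gaps(model_card))
```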
5.2 External transparency — individual notification
Individuals whose data is used in AI training or who are subject to AI-assisted decisions must be informed:
- In the privacy notice: what AI systems exist; what decisions they inform; whether human review is available
- At point of automated decision: notification that an automated process was used and how to request human review
6. Individual rights in automated decision-making
Per applicable data protection law:
- Right to explanation: individuals are entitled to meaningful information about the logic behind any automated decision that significantly affects them (GDPR Art. 22, read with Arts. 13-15; UAE PDPL Art. 30; DIFC Data Protection Law s. 22)
- Right to human review: individuals may request human review of automated decisions
- Right to object: individuals may object to processing their data for automated profiling
Framework obligations:
- Implement a process to receive and respond to these rights requests within statutory timeframes (UAE PDPL: 5 business days; GDPR: one month)
- Maintain records of automated decisions sufficient to support explanation requests
- Train customer-facing staff on how to handle these requests
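Response deadlines can be calculated when a request is logged so that none lapses unnoticed. The sketch below takes the two periods stated above at face value and makes several simplifying assumptions that should be checked against current law: a Saturday-Sunday weekend, no public holidays, and a calendar-month rule that clamps to the last day of a shorter month. The regime labels and function names are hypothetical.

```python
import calendar
from datetime import date, timedelta


def add_one_month(d: date) -> date:
    """Same day next month, clamped to that month's last day."""
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def add_business_days(d: date, n: int, weekend=(5, 6)) -> date:
    """Advance n working days; weekend holds weekday() numbers (5 = Sat, 6 = Sun)."""
    current = d
    while n > 0:
        current += timedelta(days=1)
        if current.weekday() not in weekend:
            n -= 1
    return current


def response_deadline(received: date, regime: str) -> date:
    if regime == "GDPR":
        return add_one_month(received)
    if regime == "UAE_PDPL":
        return add_business_days(received, 5)
    raise ValueError(f"no deadline rule configured for {regime!r}")


received = date(2025, 1, 31)  # a Friday
print(response_deadline(received, "GDPR"))      # 2025-02-28
print(response_deadline(received, "UAE_PDPL"))  # 2025-02-07
```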
7. Regulatory compliance mapping
| Regulation | Key requirements for AI data governance |
|---|---|
| UAE Federal Decree-Law 45/2021 (PDPL) | Automated decision-making disclosure; cross-border transfer rules; data residency in some sectors |
| Saudi PDPL (2021) | Consent for automated processing affecting individuals; data localization |
| DIFC Data Protection Law (2020) | GDPR-aligned; automated decision-making rights; DPO appointment requirements |
| ADGM Data Protection Regulations (2021) | Similar to DIFC |
| EU AI Act (2024) | Risk classification; conformity assessment for high-risk AI; prohibited AI practices |
| GDPR | Art. 22 automated decision-making; Art. 25 data protection by design and by default; Art. 35 DPIA for high-risk processing |
8. Model retirement and data deletion
- When a model is retired, document the retirement decision and date
- Training data used exclusively for that model: delete per retention schedule
- Model weights: archive for the statutory limitation period to support any future claims or investigations, then delete
- Inference logs: delete per retention schedule documented in the model card
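The retirement steps above can be turned into a dated schedule at the moment a model is retired. In this sketch the limitation period, retention periods, and action names are placeholders chosen for illustration; the real values come from the applicable law and the retention schedule in the model card.

```python
from datetime import date, timedelta


def add_years(d: date, years: int) -> date:
    """Same calendar date N years later (29 February falls back to 28 February)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retirement_schedule(retired_on, *, training_data_exclusive,
                        training_data_retention_days,
                        limitation_period_years, log_retention_days):
    """Map each retirement action to its due date."""
    schedule = {
        "record_retirement_decision": retired_on,
        "delete_model_weights": add_years(retired_on, limitation_period_years),
        "delete_inference_logs": retired_on + timedelta(days=log_retention_days),
    }
    # Training data shared with other live models is not scheduled for deletion here.
    if training_data_exclusive:
        schedule["delete_training_data"] = retired_on + timedelta(
            days=training_data_retention_days
        )
    return schedule


plan = retirement_schedule(
    date(2025, 3, 1),
    training_data_exclusive=True,
    training_data_retention_days=180,  # placeholder value
    limitation_period_years=5,         # placeholder value
    log_retention_days=90,             # placeholder value
)
for action, due in plan.items():
    print(action, due)
```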
Jurisdictional notes on cross-border data transfers for AI training
Several MENA jurisdictions impose data localization or transfer restrictions that directly affect AI training data pipelines:
- UAE: certain sectors (financial services, healthcare, government) have data residency requirements; the federal PDPL generally requires "adequate protection" in the destination country for transfers
- KSA: for sensitive categories, the PDPL requires data about Saudi residents to be processed within the Kingdom; transfers outside it require consent or an adequate-protection finding
- Egypt: PDL (Law 151/2020) restricts cross-border transfer of personal data; licensing required for cross-border processing in some cases
Cloud-based AI training that transfers data to servers outside the originating jurisdiction must be assessed against these rules.
Common mistakes
- No distinction between training data governance and inference data governance — they have different risk profiles and different regulatory treatment
- Assuming anonymized training data has zero regulatory risk — re-identification risk means this assumption must be tested
- No model card or data card documentation — creates audit and explainability gaps
- Rights request process not operationalized — having a right in a policy that no one knows how to execute is non-compliance
- Framework not updated when new AI use cases are deployed — annual review and change-triggered reviews are essential
Related skills
- [[prompt-pack-ai-governance-policy]] — the companion governance policy (principles, oversight, accountability)
- [[kb-data-protection-mena]] — MENA data protection law reference
- [[draft-privacy-policy]] — privacy notice obligations that must reference AI processing
- [[heuristic-always-state-jurisdiction-first]] — jurisdiction-first drafting