prompt-pack-ai-system-data-governance-framework

name: prompt-pack-ai-system-data-governance-framework
description: Use when drafting a data governance framework specifically for an organization's AI and machine learning systems, addressing training data requirements, bias mitigation, data minimization, purpose limitation, transparency obligations, individual rights in automated decision-making, and compliance with emerging AI regulations. Covers privacy and data protection across MENA and global jurisdictions.
license: MIT
metadata:
  id: prompt-pack.ai-system-data-governance-framework
  category: prompt-pack
  practice_area: privacy-data-protection
  priority: P2
  intent: [drafting, ai-system-data-governance-framework]
  related: [prompt-pack-ai-governance-policy, kb-data-protection-mena, draft-privacy-policy, heuristic-always-state-jurisdiction-first, prompt-pack-aml-kyc-policy]
  source: Louis — HAQQ Legal AI (github.com/sboghossian/mini-claude-for-legal)
  version: "1.0"

AI System Data Governance Framework

When to use this

Use this skill when an organization needs a data governance framework specifically scoped to its AI/ML systems. This is a more technical and data-architecture-focused document than an AI Governance Policy — it addresses the data lifecycle for AI: how training data is sourced, curated, stored, used, and retired.

The distinction:

  • AI Governance Policy ([[prompt-pack-ai-governance-policy]]): principles, oversight, accountability, incident response — the "what we believe and who is responsible"
  • AI System Data Governance Framework (this skill): data sourcing, bias testing, data minimization, rights management, regulatory compliance — the "how we handle data throughout the AI lifecycle"

Triggers: data protection audit, AI deployment review, regulatory examination, investor due diligence on AI/ML pipeline.


Prompt template

Draft a data governance framework for [Company's] AI/ML systems addressing training data requirements, bias mitigation, data minimization, purpose limitation, transparency obligations, individual rights in automated decision-making, and compliance with emerging AI regulations.

Use [[conversation-clarifying-questions]] to elicit [bracketed] inputs before drafting.


Required inputs

| Input | Why it matters |
| --- | --- |
| Company name and type | Sector-specific requirements (financial services, healthcare, and HR tech have heightened obligations) |
| AI/ML use cases | The framework must be calibrated to the actual data flows and decision types |
| Jurisdictions of operation | Determines applicable data protection and AI regulations |
| Existing data governance infrastructure | Framework should build on, not duplicate, existing policies |

Document structure

1. Framework scope and objectives

  • Define which AI/ML systems are in scope (production models; experimental models; third-party AI tools used in production; models under development)
  • Define "personal data processed by AI systems": this includes training data, inference inputs, and any outputs used to make decisions about individuals
  • State compliance objectives: data protection law compliance; bias risk management; explainability; audit readiness

2. Training data governance

2.1 Sourcing requirements

  • Approved data sources: first-party data (collected with appropriate consent or legitimate interest); licensed third-party datasets; synthetic data; open datasets with reviewed licensing
  • Prohibited sources: unlicensed web scraping of personal data; data obtained without a legal basis; data whose use in the training pipeline would breach country-specific transfer restrictions (UAE: data localization requirements in certain sectors; KSA: PDPL cross-border transfer rules; EG: PDL transfer restrictions)
  • Source documentation: every training dataset must have a "data card" recording source, collection date, legal basis, known biases, and limitations
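
The data card requirement above could be captured as a small record type. This is a minimal sketch, not a prescribed schema: the class name `DataCard`, its field names, and the `missing_fields` helper are all hypothetical, chosen only to mirror the items listed in the bullet.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataCard:
    """Hypothetical data card; fields mirror the source documentation bullet."""
    dataset_name: str
    source: str
    collection_date: date
    legal_basis: str
    known_biases: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the required text fields that are blank."""
        required = {
            "dataset_name": self.dataset_name,
            "source": self.source,
            "legal_basis": self.legal_basis,
        }
        return [name for name, value in required.items() if not value.strip()]


# Illustrative values only, not real data.
card = DataCard(
    dataset_name="loan-applications-2023",
    source="first-party CRM export",
    collection_date=date(2023, 6, 30),
    legal_basis="",
    known_biases=["under-represents applicants under 25"],
)
incomplete = card.missing_fields()  # ['legal_basis']
```

A check like `missing_fields` could run before a dataset is admitted to a training pipeline, so that a card with no recorded legal basis is caught early.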

2.2 Data minimization

  • Training data must be limited to what is necessary for the stated model purpose (purpose limitation and data minimization under GDPR Art. 5(1)(b)-(c), UAE PDPL, DIFC Data Protection Law)
  • Sensitive categories require explicit consent or alternative statutory grounds: health data, biometric data, financial data, religious/political belief data (if relevant to model)
  • Pseudonymization and anonymization: where possible, training data should be pseudonymized; truly anonymized data falls outside data protection law scope (note: re-identification risk must be assessed)
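
One common way to pseudonymize direct identifiers before training is keyed hashing. The sketch below uses HMAC-SHA256 from the Python standard library; the function name `pseudonymize` and the example key are assumptions for illustration. Anyone holding the key can re-link tokens to identifiers, so the output is pseudonymized rather than anonymized data and stays within data protection law scope.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 digest (hex string)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Illustrative key only; a real key would be stored separately from the data.
key = b"example-key-not-for-production"
token_a = pseudonymize("user-1001", key)
token_b = pseudonymize("user-1001", key)  # same input and key, same token
token_c = pseudonymize("user-1002", key)  # different input, different token
```

Because the mapping is deterministic per key, records for the same individual can still be joined across tables without exposing the raw identifier.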

2.3 Data quality requirements

  • Completeness: is the dataset representative of the population the model will serve? Under-representation creates bias
  • Accuracy: what is the error rate in labels? What is the process for error correction?
  • Timeliness: how old is the data? Is it still representative?
  • Documentation: data quality assessments must be documented and retained
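
The completeness question above can be approximated by comparing group shares in the dataset with reference shares for the population the model will serve. This is a rough sketch: the function name, the `tolerance` default of 0.05, and the reference figures are all hypothetical and would need to be set per use case.

```python
from collections import Counter


def representation_gaps(records, attribute, reference_shares, tolerance=0.05):
    """Return groups whose dataset share falls short of the reference share
    by more than `tolerance`, mapped to the size of the shortfall."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        shortfall = expected - observed
        if shortfall > tolerance:
            gaps[group] = round(shortfall, 4)
    return gaps


# Illustrative data: group B is 20% of the dataset but 50% of the reference.
sample = [{"region": "A"}] * 8 + [{"region": "B"}] * 2
under_represented = representation_gaps(sample, "region", {"A": 0.5, "B": 0.5})
```

The result of such a check is the kind of finding that belongs in the documented data quality assessment.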

3. Bias mitigation

3.1 Pre-training assessment

Before training any model that makes or informs consequential decisions about individuals:

  • Identify sensitive attributes relevant to the use case (gender, nationality, age, disability status, religion; these are protected characteristics under anti-discrimination law in many jurisdictions, so confirm the list for each jurisdiction in scope)
  • Assess whether training data contains proxy variables for sensitive attributes (postcode as a proxy for ethnicity, for example)
  • Define fairness metrics appropriate to the use case (demographic parity, equalized odds, etc.)
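
The two fairness metrics named above can be computed directly from predictions and group labels. The sketch below is one plain-Python reading of them, assuming binary labels (0/1): demographic parity difference is the gap in positive-prediction rates between groups, and equalized odds difference is taken here as the larger of the true-positive-rate gap and the false-positive-rate gap. The function names are illustrative.

```python
def _positive_rate_by_group(y_pred, groups, mask):
    """Share of positive predictions per group, over records where mask is True."""
    totals, positives = {}, {}
    for pred, group, keep in zip(y_pred, groups, mask):
        if not keep:
            continue
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred == 1 else 0)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    rates = _positive_rate_by_group(y_pred, groups, [True] * len(y_pred))
    return max(rates.values()) - min(rates.values())


def equalized_odds_difference(y_true, y_pred, groups):
    """Larger of the TPR gap and the FPR gap across groups."""
    tpr = _positive_rate_by_group(y_pred, groups, [t == 1 for t in y_true])
    fpr = _positive_rate_by_group(y_pred, groups, [t == 0 for t in y_true])
    gaps = [max(r.values()) - min(r.values()) for r in (tpr, fpr) if r]
    return max(gaps) if gaps else 0.0


# Illustrative data: four records per group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
dp_gap = demographic_parity_difference(y_pred, groups)      # 0.75 vs 0.25
eo_gap = equalized_odds_difference(y_true, y_pred, groups)  # TPR 1.0 vs 0.0
```

The two metrics can disagree, which is why the framework asks for the metric to be chosen per use case rather than fixed globally.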

3.2 Testing protocol

  • Disaggregated performance testing: model performance must be measured separately for each demographic group, not only in aggregate
  • Bias test threshold: define the maximum acceptable performance gap between groups before deployment is blocked
  • Red-teaming: adversarial testing for harmful outputs, particularly for generative AI systems
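
The disaggregated testing and threshold bullets above can be combined into a simple pre-deployment gate. This is a sketch under stated assumptions: accuracy is used as the performance measure, and the `max_gap` value of 0.10 is a hypothetical threshold, not a recommended one.

```python
def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately for each demographic group."""
    totals, correct = {}, {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] = totals.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (1 if truth == pred else 0)
    return {g: correct[g] / totals[g] for g in totals}


def deployment_blocked(y_true, y_pred, groups, max_gap):
    """Return (blocked, per-group accuracy); blocked when the gap exceeds max_gap."""
    scores = accuracy_by_group(y_true, y_pred, groups)
    gap = max(scores.values()) - min(scores.values())
    return gap > max_gap, scores


# Illustrative data: the model is perfect on group a and 50% on group b.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
preds = [1, 0, 1, 0, 1, 1, 0, 0]
cohort = ["a", "a", "a", "a", "b", "b", "b", "b"]
blocked, scores = deployment_blocked(labels, preds, cohort, max_gap=0.10)
```

Aggregate accuracy here is 75%, which hides the fact that one group sees half the accuracy of the other; the per-group view is what the gate acts on.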

3.3 Post-deployment monitoring

  • Ongoing output monitoring for drift and emerging bias
  • Frequency: at minimum quarterly for high-risk models; annually for low-risk
  • Escalation: define the events that trigger an immediate review (e.g., a user complaint, a regulatory inquiry, a media report)
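
The monitoring cadence and escalation triggers above could be encoded as a small scheduling helper. This sketch assumes two risk tiers named `high` and `low` and three trigger labels; those names, and the function names, are hypothetical.

```python
import calendar
from datetime import date

REVIEW_INTERVAL_MONTHS = {"high": 3, "low": 12}  # quarterly vs annual
ESCALATION_TRIGGERS = {"user_complaint", "regulatory_inquiry", "media_report"}


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of shorter months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_review(last_review: date, risk_tier: str, events=(), today=None) -> date:
    """Return the next review date; any escalation trigger makes it due today."""
    if ESCALATION_TRIGGERS.intersection(events):
        return today or date.today()
    return add_months(last_review, REVIEW_INTERVAL_MONTHS[risk_tier])


quarterly_due = next_review(date(2024, 11, 30), "high")
annual_due = next_review(date(2024, 1, 15), "low")
escalated_due = next_review(
    date(2024, 1, 15), "low", events=["media_report"], today=date(2024, 3, 1)
)
```

The intervals are the framework's minimums, so a real schedule could shorten them but should not lengthen them.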

4. Purpose limitation and data minimization at inference

  • Data collected during model inference (inputs) must be used only for the stated purpose
  • Retention of inference inputs and outputs: define retention period per use case; default to minimum necessary
  • Secondary use of inference data (e.g., using user queries to a chatbot to further train the model): requires legal basis analysis; user notification; in many cases, opt-in consent
  • Profiling: automated profiling of individuals based on AI outputs requires DPIA and, in many jurisdictions, a specific legal basis
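
The per-use-case retention rule above can be enforced with a periodic sweep over inference logs. The sketch below is illustrative only: the use-case names, the retention periods, and the 30-day default are hypothetical values, and records are assumed to carry `id`, `use_case`, and `created_at` fields.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods in days, keyed by use case.
RETENTION_DAYS = {"chatbot_support": 30, "credit_scoring": 365}
DEFAULT_RETENTION_DAYS = 30  # fallback: default to the minimum necessary


def expired_record_ids(records, now):
    """Return ids of inference records older than their retention period."""
    expired = []
    for record in records:
        days = RETENTION_DAYS.get(record["use_case"], DEFAULT_RETENTION_DAYS)
        if now - record["created_at"] > timedelta(days=days):
            expired.append(record["id"])
    return expired


utc = timezone.utc
logs = [
    {"id": "r1", "use_case": "chatbot_support", "created_at": datetime(2025, 1, 1, tzinfo=utc)},
    {"id": "r2", "use_case": "credit_scoring", "created_at": datetime(2025, 1, 1, tzinfo=utc)},
    {"id": "r3", "use_case": "unlisted_case", "created_at": datetime(2025, 3, 1, tzinfo=utc)},
]
to_delete = expired_record_ids(logs, now=datetime(2025, 3, 15, tzinfo=utc))
```

Use cases missing from the table fall back to the shortest period, which matches the "default to minimum necessary" rule.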

5. Transparency obligations

5.1 Internal transparency — model documentation

Every production model must have a "model card" recording:

  • Purpose and intended use
  • Training data sources and known limitations
  • Performance metrics including disaggregated results
  • Known failure modes and edge cases
  • Human oversight requirements
  • Responsible owner
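
A model card with the six items above can be checked for completeness before a model is promoted to production. This is a minimal sketch; the key names are hypothetical stand-ins for the listed items, and the card is assumed to be a plain dictionary.

```python
# Hypothetical key names, one per item in the model card list.
REQUIRED_MODEL_CARD_FIELDS = (
    "purpose",
    "training_data_sources",
    "performance_metrics",
    "failure_modes",
    "human_oversight",
    "responsible_owner",
)


def missing_model_card_fields(card: dict) -> list[str]:
    """Return required fields that are absent or empty, in checklist order."""
    return [name for name in REQUIRED_MODEL_CARD_FIELDS if not card.get(name)]


# Illustrative card with one field absent and one left blank.
draft_card = {
    "purpose": "Rank support tickets by urgency",
    "training_data_sources": ["ticket archive 2022-2023"],
    "performance_metrics": {"accuracy": 0.91, "accuracy_by_group": {"a": 0.93, "b": 0.88}},
    "failure_modes": ["short tickets with no context"],
    "responsible_owner": "",
}
gaps_in_card = missing_model_card_fields(draft_card)
```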

5.2 External transparency — individual notification

Individuals whose data is used in AI training or who are subject to AI-assisted decisions must be informed:

  • In the privacy notice: what AI systems exist; what decisions they inform; whether human review is available
  • At point of automated decision: notification that an automated process was used and how to request human review

6. Individual rights in automated decision-making

Per applicable data protection law:

  • Right to explanation: individuals are entitled to meaningful information about the logic behind any automated decision that significantly affects them (GDPR Arts. 13-15 and 22; UAE PDPL Art. 30; DIFC Data Protection Law s. 22)
  • Right to human review: individuals may request human review of automated decisions
  • Right to object: individuals may object to the processing of their data for automated profiling

Framework obligations:

  • Implement a process to receive and respond to these rights requests within statutory timeframes (UAE PDPL: 5 business days; GDPR: one month, extendable by two further months for complex or numerous requests)
  • Maintain records of automated decisions sufficient to support explanation requests
  • Train customer-facing staff on how to handle these requests
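
Response deadlines for rights requests could be tracked with a helper like the one below. It is a sketch that takes the two timeframes from the bullet above as given and should be checked against current law before use. It also assumes a Monday-to-Friday working week and ignores public holidays; the regime labels `UAE_PDPL` and `GDPR` are hypothetical.

```python
import calendar
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Advance by working days, assuming Monday-Friday and no holidays."""
    current, remaining = start, days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def add_one_month(start: date) -> date:
    """Same day next month, clamped to the last day of a shorter month."""
    year = start.year + (1 if start.month == 12 else 0)
    month = start.month % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def response_deadline(received: date, regime: str) -> date:
    """Deadline for responding to a rights request under the given regime."""
    if regime == "UAE_PDPL":
        return add_business_days(received, 5)
    if regime == "GDPR":
        return add_one_month(received)
    raise ValueError(f"unknown regime: {regime}")


uae_due = response_deadline(date(2025, 1, 2), "UAE_PDPL")  # received on a Thursday
gdpr_due = response_deadline(date(2025, 1, 31), "GDPR")
```

Where a request falls under more than one regime, tracking against the earliest deadline is the safer choice.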

7. Regulatory compliance mapping

| Regulation | Key requirements for AI data governance |
| --- | --- |
| UAE Federal Decree-Law 45/2021 (PDPL) | Automated decision-making disclosure; cross-border transfer rules; data residency in some sectors |
| Saudi PDPL (2021) | Consent for automated processing affecting individuals; data localization |
| DIFC Data Protection Law (2020) | GDPR-aligned; automated decision-making rights; DPO appointment requirements |
| ADGM Data Protection Regulations (2021) | Similar to DIFC |
| EU AI Act (2024) | Risk classification; conformity assessment for high-risk AI; prohibited AI practices |
| GDPR | Art. 22 automated decision-making; Art. 25 privacy by design; DPIA for high-risk processing |

8. Model retirement and data deletion

  • When a model is retired, document the retirement decision and date
  • Training data used exclusively for that model: delete per retention schedule
  • Model weights: archive for the statutory limitation period to support any future claims or investigations, then delete
  • Inference logs: delete per retention schedule documented in the model card
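
The retirement steps above can be recorded in a small plan object that also computes when archived weights become eligible for deletion. This is a sketch: the class name, its fields, and the `limitation_years` value are hypothetical, and the actual limitation period depends on the jurisdiction and type of claim.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RetirementPlan:
    """Hypothetical record of a model retirement decision."""
    model_name: str
    retired_on: date
    limitation_years: int  # assumed statutory limitation period, in years

    def weights_delete_after(self) -> date:
        """Date after which archived model weights can be deleted."""
        target_year = self.retired_on.year + self.limitation_years
        try:
            return self.retired_on.replace(year=target_year)
        except ValueError:
            # 29 February does not exist in the target year; fall back to the 28th.
            return self.retired_on.replace(year=target_year, day=28)


plan = RetirementPlan("ticket-ranker-v2", date(2024, 2, 29), limitation_years=3)
weights_deletion_date = plan.weights_delete_after()
```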

Jurisdictional notes on cross-border data transfers for AI training

Several MENA jurisdictions impose data localization or transfer restrictions that directly affect AI training data pipelines:

  • UAE: certain sectors (financial services, healthcare, government) have data residency requirements; the PDPL generally requires "adequate protection" in the destination country for transfers
  • KSA: PDPL requires sensitive categories of data about Saudi residents to be processed within the Kingdom; transfers outside require consent or an adequate-protection finding
  • Egypt: PDL (Law 151/2020) restricts cross-border transfer of personal data; licensing required for cross-border processing in some cases

Cloud-based AI training that transfers data to servers in non-MENA jurisdictions must be assessed against these rules.


Common mistakes

  • No distinction between training data governance and inference data governance — they have different risk profiles and different regulatory treatment
  • Assuming anonymized training data has zero regulatory risk — re-identification risk means this assumption must be tested
  • No model card or data card documentation — creates audit and explainability gaps
  • Rights request process not operationalized — having a right in a policy that no one knows how to execute is non-compliance
  • Framework not updated when new AI use cases are deployed — annual review and change-triggered reviews are essential

Related skills

  • [[prompt-pack-ai-governance-policy]]: the companion governance policy (principles, oversight, accountability)
  • [[kb-data-protection-mena]]: MENA data protection law reference
  • [[draft-privacy-policy]]: privacy notice obligations that must reference AI processing
  • [[heuristic-always-state-jurisdiction-first]]: jurisdiction-first drafting