eval-dataset-nda-prompts-30

name: eval-dataset-nda-prompts-30
description: Use when running the NDA benchmark that tests drafting, review, intake, and edge-case handling across LB/KSA/UAE/DIFC/FR/UK. Contains 30 prompts covering mutual and unilateral NDAs, bilingual AR/EN side-by-side, multi-party structures, and adversarial edge cases. Primary benchmark for confidentiality-related AI capabilities.
license: MIT
metadata:
  id: eval.dataset.NDA-prompts-30
  category: eval
  priority: P0
  intent: [eval, nda, benchmark, dataset, mena, drafting]
  related: [eval-benchmark-runner, eval-dataset-employment-prompts-30, eval-regression-detector, eval-rubric-legal-soundness, eval-rubric-citation-quality, eval-rubric-jurisdiction-awareness, eval-rubric-completeness]
  source: Louis — HAQQ Legal AI (github.com/sboghossian/mini-claude-for-legal)
  version: "1.0"

Eval Dataset — NDA Prompts (30)

Scope

30 NDA-related prompts spanning drafting, review, intake clarification, and edge cases. NDA drafting is among the highest-volume requests made of legal AI tools and is often a user's first interaction with one, so quality on this dataset is likely to shape first impressions and retention.

Storage: eval/datasets/NDA-prompts-30.jsonl

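Format: one JSON object per line (the example below is pretty-printed for readability; in the file each record occupies a single line):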

{
  "id": "nda-001",
  "prompt": "...",
  "category": "standard_draft",
  "jurisdiction": "UAE",
  "expected_signals": ["mutual", "confidential_info_defined", "governing_law_uae", "dispute_resolution"]
}
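
A minimal loader sketch for this format, in Python. The field names come from the example record above; treating all five fields as required and rejecting duplicate ids are assumptions, and `parse_records` and `load_dataset` are hypothetical helper names, not part of the pack.

```python
import json

# Field names follow the example record; requiring all of them is an assumption.
REQUIRED_FIELDS = ("id", "prompt", "category", "jurisdiction", "expected_signals")


def parse_records(lines):
    """Parse JSONL lines into dicts; raise ValueError on a malformed record."""
    records = []
    seen_ids = set()
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)  # JSONDecodeError is a subclass of ValueError
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        if not isinstance(record["expected_signals"], list):
            raise ValueError(f"line {line_no}: expected_signals must be a list")
        if record["id"] in seen_ids:
            raise ValueError(f"line {line_no}: duplicate id {record['id']!r}")
        seen_ids.add(record["id"])
        records.append(record)
    return records


def load_dataset(path="eval/datasets/NDA-prompts-30.jsonl"):
    """Read the dataset file from the storage path given above."""
    with open(path, encoding="utf-8") as handle:
        return parse_records(handle)


# The example record from above, written on one line as it would appear in the file.
sample = (
    '{"id": "nda-001", "prompt": "...", "category": "standard_draft", '
    '"jurisdiction": "UAE", "expected_signals": ["mutual", '
    '"confidential_info_defined", "governing_law_uae", "dispute_resolution"]}'
)
records = parse_records([sample])
print(records[0]["id"], len(records[0]["expected_signals"]))
```

Reading with `encoding="utf-8"` matters here because several prompts are in Arabic.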

How to use this pack

  1. Run all 30 prompts against the deployed model.
  2. Score each output against [[eval-rubric-legal-soundness]] + [[eval-rubric-citation-quality]] + [[eval-rubric-jurisdiction-awareness]] + [[eval-rubric-completeness]].
  3. Aggregate scores; track week-over-week trend in [[eval-regression-detector]].
  4. Flag any output where expected_signals are missing — even if the rubric score is acceptable, missing a governing-law clause is a structural gap.
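
Step 4 can be sketched as a set comparison, independent of the rubric scores. The pack does not say how signals are detected in a model output, so the sketch assumes a hypothetical `detections` mapping (record id to the signals a grader or classifier found); `missing_signals` and `flag_outputs` are illustrative names.

```python
def missing_signals(record, detected_signals):
    """Return the expected signals not found in the output, in dataset order."""
    detected = set(detected_signals)
    return [s for s in record["expected_signals"] if s not in detected]


def flag_outputs(records, detections):
    """Map record id -> missing signals, only for records with at least one gap."""
    flags = {}
    for record in records:
        # An output with no detections at all is treated as missing everything.
        gaps = missing_signals(record, detections.get(record["id"], ()))
        if gaps:
            flags[record["id"]] = gaps
    return flags


record = {
    "id": "nda-001",
    "expected_signals": [
        "mutual",
        "confidential_info_defined",
        "governing_law_uae",
        "dispute_resolution",
    ],
}
# Hypothetical grader result: the draft has no governing-law clause.
detections = {"nda-001": ["mutual", "confidential_info_defined", "dispute_resolution"]}

flags = flag_outputs([record], detections)
print(flags)  # {'nda-001': ['governing_law_uae']}
```

Keeping this check separate from the rubric aggregation means a structurally incomplete draft is still flagged when its average score looks acceptable.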

Prompt categories

Category 1 — Standard draft (~6 prompts)

Drafting requests across jurisdictions for basic NDA types:

| # | Type | Jurisdiction | Key expected signals |
| --- | --- | --- | --- |
| 1 | Mutual NDA | UAE onshore | UAE Civil Transactions Law governing; Arabic available; 2-year term standard |
| 2 | Unilateral NDA | KSA | Saudi governing law; Shariah compliance note; Arabic version noted |
| 3 | Mutual NDA | DIFC | DIFC Contract Law; English language; common-law drafting style |
| 4 | Mutual NDA | Lebanon | Lebanese Code of Obligations; French or Arabic version option |
| 5 | Mutual NDA | France | Code civil; French governing law; RGPD data protection cross-reference |
| 6 | Mutual NDA | UK | English law; PECR note if digital communications involved |

Category 2 — Review (~5 prompts)

Paste a draft NDA; ask for redlines or risk identification:

| # | Scenario | Expected handling |
| --- | --- | --- |
| 7 | NDA with overly narrow definition of Confidential Information | Model should flag |
| 8 | NDA that lacks a governing law clause | Model should flag as critical gap |
| 9 | NDA with a 10-year term | Model should flag as potentially unenforceable in civil law jurisdictions |
| 10 | NDA with a compelled disclosure clause | Model should check that the clause correctly handles court orders |
| 11 | NDA missing a return/destroy clause | Model should flag |

Category 3 — Intake / clarification (~5 prompts)

Ambiguous requests where the model should ask clarifying questions rather than draft:

| # | Ambiguous input | Notes |
| --- | --- | --- |
| 12 | "I need an NDA." | No jurisdiction, no parties, no type specified |
| 13 | "Draft an NDA for a tech deal." | Insufficient: which jurisdiction? Mutual or one-way? |
| 14 | "NDA between my company and a Saudi partner." | Type unclear; should ask mutual vs unilateral |
| 15 | "I need an NDA urgently, can you just make a quick one?" | Prompt for minimum viable info |
| 16 | Arabic-language ambiguous request: "أريد NDA" | Respond in Arabic, ask clarifiers in Arabic |

Expected behavior: Ask for jurisdiction, party types, NDA type (mutual/unilateral), and confidential information scope before drafting.
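
One way to grade intake prompts, sketched under assumptions: a grader records which clarifier topics the response asked about and whether the model produced a draft anyway. The four topic labels mirror the expected behavior above but are hypothetical identifiers, as is `grade_intake`.

```python
# Hypothetical labels for the four clarifiers listed under "Expected behavior".
REQUIRED_CLARIFIERS = (
    "jurisdiction",
    "party_types",
    "nda_type",
    "confidential_scope",
)


def grade_intake(asked_topics, drafted):
    """Pass only if every clarifier was asked and no draft was produced."""
    asked = set(asked_topics)
    missing = [topic for topic in REQUIRED_CLARIFIERS if topic not in asked]
    return {
        "passed": not drafted and not missing,
        "missing_clarifiers": missing,
        "drafted_prematurely": drafted,
    }


# Example: the model asked about jurisdiction and NDA type only, and did not draft.
result = grade_intake({"jurisdiction", "nda_type"}, drafted=False)
print(result["passed"], result["missing_clarifiers"])
```

For prompt 16 the same check applies, with the additional requirement, graded separately, that the clarifying questions are in Arabic.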

Category 4 — Edge cases (~5 prompts)

| # | Edge case | Expected handling |
| --- | --- | --- |
| 17 | "Draft an NDA with confidential information defined as 'everything'" | Flag as overly broad; suggest standard scope with carve-outs |
| 18 | "Draft an NDA with a 99-year term" | Flag as potentially unenforceable; suggest 2–5 years with auto-renewal |
| 19 | "Draft an NDA with no governing law — I want it to be internationally neutral" | Explain why governing law is necessary; offer alternatives (ICC arbitration, DIFC as neutral) |
| 20 | "I want an NDA that says we own any ideas the other party shares with us" | Flag that an NDA is a confidentiality instrument, not an IP assignment; suggest adding a separate IP clause or using an NDA + IP assignment |
| 21 | "Make an NDA that's enforceable in both the UAE and the US simultaneously" | Multi-jurisdiction enforceability explanation; suggest appropriate governing law strategy |

Category 5 — Bilingual AR/EN (~4 prompts)

| # | Request |
| --- | --- |
| 22 | "Draft a mutual NDA in Arabic and English, side by side. Arabic controls." |
| 23 | Arabic-only prompt: "أعدّ اتفاقية سرية بالعربي والإنجليزي." |
| 24 | "Translate this English NDA clause into formal Arabic." |
| 25 | "Is the Arabic version of this NDA consistent with the English version? Identify discrepancies." |

Category 6 — Multi-party / consortium (~5 prompts)

| # | Scenario |
| --- | --- |
| 26 | Three-party mutual NDA (startup, investor, technology partner) under UAE law |
| 27 | Consortium NDA for a KSA government tender with 5 parties |
| 28 | "How should we structure an NDA for a joint venture where one party is a UAE company and one is a Saudi company?" |
| 29 | Multi-jurisdictional NDA with carve-out provisions per jurisdiction |
| 30 | NDA renewal and amendment: add a new party to an existing two-party NDA |

Scoring targets

| Category | Legal soundness target | Jurisdiction awareness target | Completeness target |
| --- | --- | --- | --- |
| Standard draft | ≥ 4.0 | ≥ 4.0 | ≥ 4.0 |
| Review | ≥ 3.5 | ≥ 3.5 | ≥ 3.5 |
| Intake | N/A (evaluate on asking clarifiers) | N/A | N/A |
| Edge cases | ≥ 3.5 | ≥ 3.5 | |
| Bilingual | ≥ 3.5 | ≥ 3.5 | ≥ 3.5 |
| Multi-party | ≥ 3.5 | ≥ 3.5 | ≥ 3.0 |
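
The targets can be checked mechanically once rubric scores are aggregated. The sketch below assumes the targets apply to the per-category mean of each rubric, which the pack does not state. Only `standard_draft` appears as a category value in the dataset example; the other category keys, the rubric keys, and `below_target` are hypothetical. Intake is omitted because it is not scored on these rubrics, and edge cases are omitted because the table lists only two thresholds without a full set of three.

```python
from statistics import mean

# Thresholds copied from the scoring targets table; key names are assumptions.
TARGETS = {
    "standard_draft": {"legal_soundness": 4.0, "jurisdiction_awareness": 4.0, "completeness": 4.0},
    "review": {"legal_soundness": 3.5, "jurisdiction_awareness": 3.5, "completeness": 3.5},
    "bilingual": {"legal_soundness": 3.5, "jurisdiction_awareness": 3.5, "completeness": 3.5},
    "multi_party": {"legal_soundness": 3.5, "jurisdiction_awareness": 3.5, "completeness": 3.0},
}


def below_target(scores_by_category):
    """Return (category, rubric, mean, target) for every mean under its target.

    scores_by_category maps category -> rubric -> list of per-prompt scores.
    """
    failures = []
    for category, rubric_targets in TARGETS.items():
        for rubric, target in rubric_targets.items():
            scores = scores_by_category.get(category, {}).get(rubric)
            if not scores:
                continue  # nothing scored for this cell yet
            average = mean(scores)
            if average < target:
                failures.append((category, rubric, round(average, 2), target))
    return failures


# Illustrative scores for two standard-draft prompts (not real results).
scores = {
    "standard_draft": {
        "legal_soundness": [4.5, 3.0],
        "completeness": [4.0, 4.5],
    }
}
failures = below_target(scores)
print(failures)  # [('standard_draft', 'legal_soundness', 3.75, 4.0)]
```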

Jurisdictional notes for graders

  • UAE onshore: UAE Civil Transactions Law (Federal Law No. 5 of 1985) and the Commercial Transactions Law (Federal Decree-Law No. 50 of 2022) govern. There is no statutory definition of "confidentiality agreement"; such agreements are governed by general contract principles.
  • DIFC: DIFC Contract Law (DIFC Law No. 6 of 2004) applies; common-law interpretation; English is the operative language.
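  • KSA: Saudi law is Shariah-based, with general contract rules codified in the Civil Transactions Law (Royal Decree No. M/191 of 2023); commercial confidentiality is enforced through general contract principles; Arabic is required for Saudi court proceedings.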
  • Lebanon: Code des Obligations et des Contrats (Code of Obligations and Contracts, 1932) governs; Arabic is the official language of the courts, and French is widely used in contract drafting.
  • France: Code civil (particularly obligations law post-2016 reform). RGPD applies to personal data provisions.

Caveats & currency

Review the dataset annually. DIFC legislation updates regularly (check DIFC Laws portal); UAE Commercial Transactions Law amendments should trigger a dataset review.

Related

  • [[eval-benchmark-runner]] — orchestrates this dataset in the full eval pipeline
  • [[eval-rubric-legal-soundness]] — primary scoring rubric
  • [[eval-rubric-jurisdiction-awareness]] — jurisdiction accuracy scoring
  • [[eval-rubric-completeness]] — structural completeness check
  • [[eval-regression-detector]] — week-over-week trend tracking