import-xlsx-processing-anthropic

Category: Coding Risk: High risk ★ 3.9 · Rating 3.9/5 (8) sboghossian/mini-claude-for-legal MIT

Rating is derived from the repo's GitHub stars and shown for reference.

network_accessfilesystem_accesscredential_accessautomation_control

name: import-xlsx-processing-anthropic
description: Use when ingesting legal data, clause libraries, matter lists, contract schedules, or bulk templates stored in Excel (.xlsx) format into the Louis/mini-claude-for-legal data model via the Anthropic Claude API. Handles column detection, header normalization, multi-sheet parsing, and mapping of spreadsheet rows to SKILL.md or structured JSON entries, with a mandatory dry-run preview before any write. Relevant any time a law firm or legal ops team wants to migrate Excel-based legal data into the open-source skill ecosystem.
license: MIT
metadata:
id: import.xlsx-processing-anthropic
category: import
jurisdictions: [multi]
priority: P3
intent: [import, xlsx, excel, spreadsheet, bulk-import, anthropic]
related: [import-vscode-extension-builder-lawvable, connector-document-upload, review-contract-batch]
source: Louis — HAQQ Legal AI (github.com/sboghossian/mini-claude-for-legal)
version: "1.0"

Import — XLSX Processing via Anthropic

What it does

This import connector reads .xlsx workbooks — a ubiquitous format in legal operations for matter lists, contract registers, clause libraries, fee schedules, and compliance matrices — and maps their contents into the Louis data model using the Anthropic Claude API for intelligent column interpretation and content normalization.

The connector goes beyond dumb CSV parsing: it uses Claude to infer column semantics (e.g., "Effective Date" vs "Commencement Date" → same field), detect jurisdiction references from free-text cells, and generate draft SKILL.md bodies from clause-text columns. Every run produces a dry-run preview before writing anything.


When to use this

  • A law firm exports its standard clause library from a spreadsheet tracker into Louis skills.
  • A legal ops team imports a contract register (columns: counterparty, jurisdiction, expiry, key obligations) to feed [[review-contract-batch]].
  • Bulk onboarding of fee schedules, compliance checklists, or regulatory thresholds maintained in Excel.
  • One-time migration of legacy matter data from Excel into the skills store.
  • Periodic sync (e.g., updated rate cards, new clauses added by the firm's know-how team).

Setup / auth

Requirement Detail
Anthropic API key Required; set as ANTHROPIC_API_KEY env var
Model claude-3-5-sonnet or claude-3-haiku (haiku for cost-sensitive bulk runs)
xlsx library Node.js xlsx / exceljs, or Python openpyxl / pandas depending on runtime
Target directory Writable path within skills/ hierarchy or a structured JSON store

Use a least-privilege Anthropic key scoped to read-only if the import pipeline is automated.


Capabilities

  • Multi-sheet workbook parsing (each sheet can map to a different target schema)
  • Intelligent column header normalization via Claude (handles synonyms, abbreviations, mixed languages including Arabic headers)
  • Row-to-skill mapping: each data row can become a skill, a matter entry, or a structured JSON record
  • Variable extraction: detects {{placeholder}} patterns in clause-text cells
  • Jurisdiction detection: infers jurisdiction from free-text (e.g., "DIFC", "KSA", "الإمارات")
  • Dry-run diff: shows new records, updates, conflicts, and unmapped columns before committing
  • PII detection: flags cells containing email addresses, phone numbers, or ID numbers

Usage patterns

Basic dry-run on a clause library

import source="xlsx" file="firm-clause-library-2024.xlsx" schema="skill" dry_run=true

Output: per-row mapping table, unmapped column warnings, proposed skill names and target paths.

Confirmed bulk import

import source="xlsx" file="clause-library.xlsx" schema="skill" dry_run=false overwrite=false

Contract register import (structured JSON, not SKILL.md)

import source="xlsx" file="contract-register.xlsx" schema="contract_record" target="data/contracts/" dry_run=true

Multi-sheet workbook

import source="xlsx" file="legal-ops-master.xlsx" sheet_map='{"Clauses": "skill", "Matters": "matter_record", "Rates": "fee_schedule"}' dry_run=true

Column mapping reference

The importer uses Claude to resolve column semantics. Common mappings:

Excel column (examples) Louis field Notes
Title, Name, Clause Name name Hyphenated on write
Jurisdiction, Country, Governing Law jurisdictions ISO-normalized
Practice Area, Department, Type practice_area Mapped to Louis taxonomy
Clause Text, Body, Content SKILL.md body Claude structures into sections
Tags, Keywords, Labels intent Comma-split; __import__ appended
Effective Date, Start Date effective_date ISO 8601 normalized
Expiry, Renewal Date expiry_date ISO 8601; flags expired
Notes, Comments, Drafting Notes ## Drafting standards Preserved verbatim

Unmapped columns are logged as warnings; the run proceeds unless strict_mode=true.


Claude API integration detail

The importer sends column headers + a sample of up to 5 rows to Claude with a structured prompt requesting:

  1. Column-to-field mapping JSON
  2. Confidence score per mapping (low-confidence columns flagged in dry-run)
  3. Detected language(s) in the data
  4. PII risk flags

Claude's response is parsed and applied to the full workbook locally — full row data is not sent to the API by default. Set send_full_data=true only for small workbooks (<500 rows) where per-row enrichment is desired.


Permissions & safety

  • Dry-run is mandatory before any write. The system blocks confirmed writes if dry-run has not completed in the same session.
  • PII scrubbing: cells flagged as containing personal data are masked in the imported output and logged separately. Do not import client-identifying matter data into public skill repositories.
  • API cost control: default model is claude-haiku for column-mapping inference. Override with model=claude-sonnet for more complex schema resolution.
  • No auto-overwrite: overwrite=false by default. Existing skills are skipped; set overwrite=true only after reviewing the dry-run diff.

Failure modes

Failure Cause Resolution
column_mapping_confidence_low Claude cannot reliably map headers Add manual column map in config; check header language
schema_conflict Row data doesn't fit target schema Review dry-run output; use schema=raw_json as fallback
pii_detected_blocked PII found and block_pii=true Sanitize source file; or set block_pii=false with explicit acknowledgment
anthropic_rate_limit Too many API calls in bulk run Add rate_limit_delay=2s to config; or batch in smaller chunks
xlsx_parse_error Corrupted or password-protected file Decrypt/repair source file first
empty_sheet Sheet has headers but no data rows Logged as warning; import continues for other sheets

  • [[import-vscode-extension-builder-lawvable]]
  • [[connector-document-upload]]
  • [[review-contract-batch]]
  • [[draft-clause-library-standard]]