feed-status-incident-watcher


name: feed-status-incident-watcher
description: Use when the platform needs to monitor and surface Louis platform status events — incidents, degradations, scheduled maintenance, and recovery confirmations — to users in a non-intrusive but timely way. Integrates with the status page and incident management system to convert operational events into user-facing feed items, push alerts, and in-app banners calibrated to incident severity and user activity state.
license: MIT
metadata:
  id: feed.status-incident-watcher
  category: feed
  jurisdictions: [multi]
  priority: P3
  intent: [feed, system-status, incident-management, uptime-transparency]
  related: [feed-changelog-watcher, feed-haqq-press-releases, ops-churn-risk-detector]
  source: Louis — HAQQ Legal AI (github.com/sboghossian/mini-claude-for-legal)
  version: "1.0"

Status Incident Watcher Feed Surface

Purpose

Legal practitioners depend on Louis for time-sensitive work — deadlines, court filings, deal closings. Platform unavailability or degradation during a critical moment erodes trust disproportionately. This feed surface provides transparent, real-time, proactive communication about system status events, ensuring users are never left wondering whether a problem is on their side or the platform's.

The status incident watcher is the negative-space companion to [[feed-changelog-watcher]]: where the changelog communicates improvements, this surface communicates problems and resolutions.

Event types and severity levels

Severity	Definition	Examples
P0 — Critical outage	Full service unavailability; no users can access Louis	API down, auth service failure, database corruption
P1 — Major degradation	Core feature broken for significant user subset	AI skill responses failing, document generation broken
P2 — Minor degradation	Non-core feature impaired; workaround exists	Feed not updating, push notifications delayed
P3 — Maintenance	Planned downtime or partial service impact	Scheduled database migration, API version upgrade
Resolved	Recovery from P0/P1/P2	Service restored; post-incident report pending

Monitoring sources

Status page (e.g., statuspage.io or self-hosted): primary source of structured incident data.
Internal alerting (PagerDuty / OpsGenie / equivalent): incident creation triggers feed item.
Synthetic monitoring: automated health checks (endpoint uptime, AI response latency, document generation pipeline health).
User-reported issues: if more than N users report the same error within a short window, the reports are automatically escalated to a P2 incident.
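
The user-report escalation rule can be sketched as a sliding-window counter. This is a minimal illustration, not the platform's implementation: the class name `ReportEscalator`, the `error_code` and `user_id` keys, and the example threshold and window values are all assumptions, since the text leaves N and the window length unspecified.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone


class ReportEscalator:
    """Counts distinct users reporting the same error inside a sliding window."""

    def __init__(self, threshold: int, window: timedelta) -> None:
        self.threshold = threshold  # the "N" from the text (value is an assumption)
        self.window = window        # the "short window" (value is an assumption)
        self._reports = defaultdict(deque)  # error_code -> deque of (time, user_id)

    def record(self, error_code: str, user_id: str, at: datetime) -> bool:
        """Record one report; return True when a P2 incident should be opened."""
        reports = self._reports[error_code]
        reports.append((at, user_id))
        # Drop reports that have aged out of the window.
        while reports and at - reports[0][0] > self.window:
            reports.popleft()
        # Repeat reports from the same user count once.
        distinct_users = {uid for _, uid in reports}
        return len(distinct_users) > self.threshold
```

Counting distinct users rather than raw reports keeps one user retrying the same action from triggering an incident on their own.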

Delivery logic by severity

P0 — Critical outage

Immediate in-app banner (red) surfaced to all active users.
Push notification to all users with notifications enabled.
Email to enterprise/eFirm account admins.
Status feed item: created within 5 minutes of incident declaration.

P1 — Major degradation

In-app banner (orange) surfaced to affected users.
Push notification to users actively using the affected feature.
Status feed item: created within 10 minutes.

P2 — Minor degradation

Status feed item only (no banner, no push unless user is actively experiencing the issue).
Created within 30 minutes.

P3 — Planned maintenance

Status feed item + in-app notice created ≥ 24 hours before the maintenance window.
Email to enterprise admins if downtime > 30 minutes.

Resolved

In-app banner cleared.
Resolution status feed item surfaced to users who saw the incident item.
Post-incident summary published ≤ 48 hours after P0/P1 resolution.
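
The delivery rules above can be captured as a lookup table. The sketch below is one possible encoding: `DeliveryPlan`, `DELIVERY_PLANS`, the audience labels, and the helper function names are hypothetical, while the colors, deadlines, and thresholds come from the rules in this section. The P2 entry reads the rule as "no banner; push only to users actively experiencing the issue", which is an interpretation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class DeliveryPlan:
    banner: Optional[str]           # banner color, or None for no banner
    push_audience: Optional[str]    # hypothetical audience label
    email_admins: bool              # email enterprise/eFirm account admins
    feed_item_deadline: timedelta   # max delay after incident declaration


DELIVERY_PLANS = {
    "P0": DeliveryPlan("red", "all_with_notifications_enabled", True, timedelta(minutes=5)),
    "P1": DeliveryPlan("orange", "active_on_affected_feature", False, timedelta(minutes=10)),
    "P2": DeliveryPlan(None, "actively_experiencing_issue", False, timedelta(minutes=30)),
}


def feed_item_due(severity: str, declared_at: datetime) -> datetime:
    """Latest time by which the status feed item should exist (P0 to P2)."""
    return declared_at + DELIVERY_PLANS[severity].feed_item_deadline


def maintenance_notice_due(window_start: datetime) -> datetime:
    """P3: the feed item and in-app notice go out at least 24 hours ahead."""
    return window_start - timedelta(hours=24)


def maintenance_email_required(expected_downtime: timedelta) -> bool:
    """P3: enterprise admins are emailed only if downtime exceeds 30 minutes."""
    return expected_downtime > timedelta(minutes=30)
```

Keeping the rules in data rather than branching logic makes it easier to review them against the severity definitions.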

Output spec

{
  "id": "status-item-uuid",
  "incident_id": "INC-2025-0512-001",
  "severity": "P1",
  "status": "investigating | identified | monitoring | resolved",
  "title": "AI Skill Responses Intermittently Failing",
  "message": "We are investigating reports of intermittent failures when invoking legal drafting and review skills. Some users may experience errors or delayed responses. Our team has identified the issue and is deploying a fix.",
  "started_at": "2025-05-12T14:23:00Z",
  "resolved_at": null,
  "affected_features": ["draft-skills", "review-skills"],
  "source_url": "https://status.louis.law/incidents/INC-2025-0512-001",
  "updates": [
    {
      "timestamp": "2025-05-12T14:23:00Z",
      "status": "investigating",
      "message": "Investigating reports of AI skill failures."
    },
    {
      "timestamp": "2025-05-12T14:45:00Z",
      "status": "identified",
      "message": "Root cause identified: upstream model API rate limit. Fix deploying."
    }
  ]
}
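
A consumer of this payload may want to check it before rendering. The following sketch assumes the field names shown in the output spec; the function names and the specific checks (known severity and status values, `resolved_at` present once resolved, updates in chronological order) are illustrative assumptions rather than rules stated by the spec.

```python
from datetime import datetime

SEVERITIES = {"P0", "P1", "P2", "P3"}
STATUSES = {"investigating", "identified", "monitoring", "resolved"}


def parse_ts(value: str) -> datetime:
    # fromisoformat accepts a trailing "Z" only on Python 3.11+, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def validate_status_item(item: dict) -> list:
    """Return a list of human-readable problems; an empty list means the item passed."""
    problems = []
    if item.get("severity") not in SEVERITIES:
        problems.append("unknown severity")
    if item.get("status") not in STATUSES:
        problems.append("unknown status")
    if item.get("status") == "resolved" and item.get("resolved_at") is None:
        problems.append("resolved item has no resolved_at")
    updates = item.get("updates", [])
    if any(u.get("status") not in STATUSES for u in updates):
        problems.append("update has unknown status")
    times = [parse_ts(u["timestamp"]) for u in updates]
    if times != sorted(times):
        problems.append("updates are not in chronological order")
    return problems
```

Returning a list of problems instead of raising on the first one lets an ops dashboard show every issue with a malformed item at once.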

Communication tone guidelines

Status communications must be:

Honest: acknowledge the impact accurately. Do not minimize a P1 as "minor."
Non-technical by default: "AI responses are taking longer than usual" is better than "increased p99 latency on inference endpoint."
Action-oriented where possible: if a workaround exists, state it. If the user should save their work before maintenance, say so.
Timely: a first update that arrives 2 hours after a P0 is declared is worse than no update at all; the target is under 15 minutes from declaration to first communication.
Closed-loop: every incident must have a "Resolved" status update. Incidents cannot be quietly forgotten.
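
The "Timely" and "Closed-loop" guidelines are the two that lend themselves to an automated check. The sketch below assumes incidents shaped like the output spec (an `incident_id` plus an `updates` list with a `status` on each entry); the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

FIRST_COMMUNICATION_TARGET = timedelta(minutes=15)


def first_update_on_time(declared_at: datetime, first_update_at: Optional[datetime]) -> bool:
    """True if the first user-facing update went out in under 15 minutes."""
    if first_update_at is None:
        return False
    return first_update_at - declared_at < FIRST_COMMUNICATION_TARGET


def unclosed_incidents(incidents: list) -> list:
    """Return the IDs of incidents whose update history has no 'resolved' entry."""
    return [
        inc["incident_id"]
        for inc in incidents
        if not any(u["status"] == "resolved" for u in inc["updates"])
    ]
```

Running `unclosed_incidents` on a schedule is one way to make sure no incident is quietly forgotten.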

Post-incident report format

For P0 and P1 incidents, publish a post-incident report:

Summary: what happened, what users experienced, duration.
Root cause: what caused the incident (non-technical summary).
Resolution: what was done to fix it.
Prevention: what changes will prevent recurrence.
Timeline: key timestamps in the incident lifecycle.
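
The report format can be modeled as a small record with a plain-text renderer. This is a sketch only: `PostIncidentReport`, `report_required`, and `report_due_by` are hypothetical names, and the 48-hour deadline is taken from the "Resolved" delivery rule earlier in this document.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class PostIncidentReport:
    summary: str       # what happened, what users experienced, duration
    root_cause: str    # non-technical summary of the cause
    resolution: str    # what was done to fix it
    prevention: str    # changes that will prevent recurrence
    timeline: list     # (timestamp, event) pairs in chronological order

    def render(self) -> str:
        """Render the five sections in the order the format lists them."""
        lines = [
            f"Summary: {self.summary}",
            f"Root cause: {self.root_cause}",
            f"Resolution: {self.resolution}",
            f"Prevention: {self.prevention}",
            "Timeline:",
        ]
        lines += [f"  {ts} {event}" for ts, event in self.timeline]
        return "\n".join(lines)


def report_required(severity: str) -> bool:
    """Post-incident reports are published for P0 and P1 incidents only."""
    return severity in {"P0", "P1"}


def report_due_by(resolved_at: datetime) -> datetime:
    """The summary is due no later than 48 hours after resolution."""
    return resolved_at + timedelta(hours=48)
```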

Failure modes

Status page itself unavailable: if the status page is unreachable, the in-app system falls back to a static banner: "We're experiencing technical difficulties. Our team is investigating. Check status.louis.law."
Overcommunication fatigue: P2 incidents that resolve within 15 minutes should be suppressed from the user feed (log in ops only) unless the user was actively using the affected feature.
False alarms: synthetic-monitoring false positives should be caught and dismissed before anything is surfaced to users. Target: zero false-alarm user notifications per month.
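
The first two failure modes can be sketched as follows. The function names and the `fetch_status` callable are hypothetical stand-ins for whatever client reads the status page; the fallback text and the 15-minute rule come from the list above, with "within 15 minutes" read as inclusive.

```python
from datetime import timedelta
from typing import Callable

FALLBACK_BANNER = (
    "We're experiencing technical difficulties. "
    "Our team is investigating. Check status.louis.law."
)


def banner_text(fetch_status: Callable[[], str]) -> str:
    """Return the live status message, or the static fallback if the fetch fails."""
    try:
        return fetch_status()
    except Exception:
        # Any failure to reach the status page falls back to the static banner.
        return FALLBACK_BANNER


def should_show_in_feed(severity: str, duration: timedelta, user_was_affected: bool) -> bool:
    """Suppress short-lived P2 incidents unless the user was on the affected feature."""
    short_p2 = severity == "P2" and duration <= timedelta(minutes=15)
    return user_was_affected or not short_p2
```

Suppressed incidents would still be logged on the ops side; only the user-facing feed entry is skipped.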

[[feed-changelog-watcher]]
[[feed-haqq-press-releases]]
[[ops-churn-risk-detector]]