feed-status-incident-watcher

name: feed-status-incident-watcher
description: Use when the platform needs to monitor and surface Louis platform status events — incidents, degradations, scheduled maintenance, and recovery confirmations — to users in a non-intrusive but timely way. Integrates with the status page and incident management system to convert operational events into user-facing feed items, push alerts, and in-app banners calibrated to incident severity and user activity state.
license: MIT
metadata:
  id: feed.status-incident-watcher
  category: feed
  jurisdictions: [multi]
  priority: P3
  intent: [feed, system-status, incident-management, uptime-transparency]
  related: [feed-changelog-watcher, feed-haqq-press-releases, ops-churn-risk-detector]
  source: Louis — HAQQ Legal AI (github.com/sboghossian/mini-claude-for-legal)
  version: "1.0"

Status Incident Watcher Feed Surface

Purpose

Legal practitioners depend on Louis for time-sensitive work — deadlines, court filings, deal closings. Platform unavailability or degradation during a critical moment erodes trust disproportionately. This feed surface provides transparent, real-time, proactive communication about system status events, ensuring users are never left wondering whether a problem is on their side or the platform's.

The status incident watcher is the negative-space companion to [[feed-changelog-watcher]]: where the changelog communicates improvements, this surface communicates problems and resolutions.

Event types and severity levels

| Severity | Definition | Examples |
| --- | --- | --- |
| P0 — Critical outage | Full service unavailability; no users can access Louis | API down, auth service failure, database corruption |
| P1 — Major degradation | Core feature broken for significant user subset | AI skill responses failing, document generation broken |
| P2 — Minor degradation | Non-core feature impaired; workaround exists | Feed not updating, push notifications delayed |
| P3 — Maintenance | Planned downtime or partial service impact | Scheduled database migration, API version upgrade |
| Resolved | Recovery from P0/P1/P2 | Service restored; post-incident report pending |

Monitoring sources

  • Status page (e.g., statuspage.io or self-hosted): primary source of structured incident data.
  • Internal alerting (PagerDuty / OpsGenie / equivalent): incident creation triggers feed item.
  • Synthetic monitoring: automated health checks (endpoint uptime, AI response latency, document generation pipeline health).
  • User-reported issues: if > N users report the same error within a short window, auto-escalate to P2.
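The user-report rule above could be sketched as a sliding-window counter. This is a minimal illustration, not the platform's implementation: the class name `ReportEscalator`, the `error_key` grouping, and both constructor parameters are hypothetical, and the source leaves the values of N and the "short window" unspecified, so they stay as caller-supplied arguments. The sketch also assumes reports arrive in timestamp order and that "N users" means distinct users.

```python
from collections import deque
from datetime import datetime, timedelta


class ReportEscalator:
    """Counts distinct users reporting the same error inside a sliding window."""

    def __init__(self, threshold: int, window: timedelta) -> None:
        self.threshold = threshold  # hypothetical stand-in for "N" in the rule
        self.window = window        # hypothetical stand-in for the "short window"
        self._reports: dict[str, deque[tuple[datetime, str]]] = {}

    def record(self, error_key: str, user_id: str, at: datetime) -> bool:
        """Record one report; return True when the error should escalate to P2."""
        reports = self._reports.setdefault(error_key, deque())
        reports.append((at, user_id))
        # Drop reports that have aged out of the window (assumes ordered input).
        while reports and at - reports[0][0] > self.window:
            reports.popleft()
        distinct_users = {uid for _, uid in reports}
        # The rule says "> N users", so the comparison is strict.
        return len(distinct_users) > self.threshold
```

Counting distinct users rather than raw reports keeps one user retrying the same action from triggering an escalation on their own.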

Delivery logic by severity

P0 — Critical outage

  • Immediate in-app banner (red) surfaced to all active users.
  • Push notification to all users with notifications enabled.
  • Email to enterprise/eFirm account admins.
  • Status feed item: created within 5 minutes of incident declaration.

P1 — Major degradation

  • In-app banner (orange) surfaced to affected users.
  • Push notification to users actively using the affected feature.
  • Status feed item: created within 10 minutes.

P2 — Minor degradation

  • Status feed item only (no banner, no push unless user is actively experiencing the issue).
  • Created within 30 minutes.

P3 — Planned maintenance

  • Status feed item + in-app notice created ≥ 24 hours before the maintenance window.
  • Email to enterprise admins if downtime > 30 minutes.

Resolved

  • In-app banner cleared.
  • Resolution status feed item surfaced to users who saw the incident item.
  • Post-incident summary published ≤ 48 hours after P0/P1 resolution.
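One way to read the per-severity rules above is as a function from severity and user state to a set of delivery channels. The sketch below is illustrative only: the function name, the channel labels, and the keyword flags are hypothetical, and it makes three assumptions the source does not state. It is evaluated for a currently active user, "affected users" for P1 is approximated by `uses_affected_feature`, and push is gated on `push_enabled` at every severity (the source says so only for P0).

```python
from datetime import timedelta

# Maximum delay between incident declaration and the status feed item.
FEED_ITEM_DEADLINE = {
    "P0": timedelta(minutes=5),
    "P1": timedelta(minutes=10),
    "P2": timedelta(minutes=30),
}


def channels_for_user(
    severity: str,
    *,
    uses_affected_feature: bool = False,
    push_enabled: bool = False,
    is_enterprise_admin: bool = False,
    planned_downtime: timedelta = timedelta(0),
) -> set[str]:
    """Return the channels one active user should receive for an incident."""
    channels = {"feed_item"}  # every severity produces a status feed item
    if severity == "P0":
        channels.add("banner_red")
        if push_enabled:
            channels.add("push")
        if is_enterprise_admin:
            channels.add("email")
    elif severity == "P1":
        if uses_affected_feature:
            channels.add("banner_orange")
            if push_enabled:
                channels.add("push")
    elif severity == "P2":
        # No banner; push only when the user is hitting the issue right now.
        if uses_affected_feature and push_enabled:
            channels.add("push")
    elif severity == "P3":
        channels.add("in_app_notice")
        if is_enterprise_admin and planned_downtime > timedelta(minutes=30):
            channels.add("email")
    else:
        raise ValueError(f"unknown severity: {severity!r}")
    return channels
```

The P3 lead time (at least 24 hours before the window) and the resolution steps are scheduling concerns and are left out of this per-user mapping.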

Output spec

{
  "id": "status-item-uuid",
  "incident_id": "INC-2025-0512-001",
  "severity": "P1",
  "status": "investigating | identified | monitoring | resolved",
  "title": "AI Skill Responses Intermittently Failing",
  "message": "We are investigating reports of intermittent failures when invoking legal drafting and review skills. Some users may experience errors or delayed responses. Our team has identified the issue and is deploying a fix.",
  "started_at": "2025-05-12T14:23:00Z",
  "resolved_at": null,
  "affected_features": ["draft-skills", "review-skills"],
  "source_url": "https://status.louis.law/incidents/INC-2025-0512-001",
  "updates": [
    {
      "timestamp": "2025-05-12T14:23:00Z",
      "status": "investigating",
      "message": "Investigating reports of AI skill failures."
    },
    {
      "timestamp": "2025-05-12T14:45:00Z",
      "status": "identified",
      "message": "Root cause identified: upstream model API rate limit. Fix deploying."
    }
  ]
}
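A consumer of this payload might want a basic consistency check before rendering it. The following is a minimal sketch under stated assumptions: the field names come from the sample above, but the function `validate_status_item` and the three rules it enforces (known severity and status, `resolved_at` set exactly when the status is `resolved`, updates in chronological order) are illustrative choices rather than part of the spec.

```python
from datetime import datetime

ALLOWED_STATUSES = ("investigating", "identified", "monitoring", "resolved")
ALLOWED_SEVERITIES = ("P0", "P1", "P2", "P3")  # assumed from the severity table


def parse_ts(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def validate_status_item(item: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the item passed."""
    problems = []
    if item.get("severity") not in ALLOWED_SEVERITIES:
        problems.append(f"unknown severity: {item.get('severity')!r}")
    if item.get("status") not in ALLOWED_STATUSES:
        problems.append(f"unknown status: {item.get('status')!r}")
    is_resolved = item.get("status") == "resolved"
    has_resolved_at = item.get("resolved_at") is not None
    if is_resolved != has_resolved_at:
        problems.append("resolved_at must be set exactly when status is 'resolved'")
    timestamps = [parse_ts(u["timestamp"]) for u in item.get("updates", [])]
    if timestamps != sorted(timestamps):
        problems.append("updates are not in chronological order")
    return problems
```

Returning a list of problems instead of raising on the first one lets an operator see every issue with a malformed item at once.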

Communication tone guidelines

Status communications must be:

  • Honest: acknowledge the impact accurately. Do not minimize a P1 as "minor."
  • Non-technical by default: "AI responses are taking longer than usual" is better than "increased p99 latency on inference endpoint."
  • Action-oriented where possible: if a workaround exists, state it. If the user should save their work before maintenance, say so.
  • Timely: an incident update 2 hours after a P0 is declared is worse than no update; the target is < 15 minutes to first communication.
  • Closed-loop: every incident must have a "Resolved" status update. Incidents cannot be quietly forgotten.
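The "Timely" and "Closed-loop" guidelines are measurable, so they could be audited against the feed payload. The sketch below is an assumption-laden illustration: it treats `started_at` as the moment the incident was declared and the first entry in `updates` as the first communication, and the function name `audit_incident` is hypothetical.

```python
from datetime import datetime, timedelta

FIRST_UPDATE_TARGET = timedelta(minutes=15)  # "< 15 minutes to first communication"


def _ts(value: str) -> datetime:
    # Normalise a trailing "Z" so fromisoformat() accepts it on older Pythons.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def audit_incident(item: dict) -> list[str]:
    """Flag an incident that was communicated late or never closed out."""
    findings = []
    updates = item.get("updates", [])
    if not updates:
        return ["no updates published"]
    delay = _ts(updates[0]["timestamp"]) - _ts(item["started_at"])
    if delay >= FIRST_UPDATE_TARGET:
        minutes = int(delay.total_seconds() // 60)
        findings.append(f"first update took {minutes} minutes")
    if not any(u["status"] == "resolved" for u in updates):
        findings.append("no 'resolved' update yet")
    return findings
```

For an incident that is still in progress, the second finding is expected; it becomes a violation only if the incident is later closed in the incident management system without a "Resolved" update.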

Post-incident report format

For P0 and P1 incidents, publish a post-incident report:

  1. Summary: what happened, what users experienced, duration.
  2. Root cause: what caused the incident (non-technical summary).
  3. Resolution: what was done to fix it.
  4. Prevention: what changes will prevent recurrence.
  5. Timeline: key timestamps in the incident lifecycle.
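The five-part structure above, together with the 48-hour publication target from the delivery rules, could be enforced with a small helper. This is a sketch only: `render_report`, `report_due_at`, and the plain-text layout are hypothetical, not a format the source prescribes.

```python
from datetime import datetime, timedelta

REPORT_SECTIONS = ("Summary", "Root cause", "Resolution", "Prevention", "Timeline")
REPORT_DEADLINE = timedelta(hours=48)  # publish within 48 hours of P0/P1 resolution


def report_due_at(resolved_at: datetime) -> datetime:
    """Latest publication time for the post-incident report."""
    return resolved_at + REPORT_DEADLINE


def render_report(title: str, sections: dict[str, str]) -> str:
    """Render the report as numbered plain text, refusing incomplete drafts."""
    missing = [name for name in REPORT_SECTIONS if not sections.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing report sections: {', '.join(missing)}")
    lines = [title, ""]
    for number, name in enumerate(REPORT_SECTIONS, start=1):
        lines.append(f"{number}. {name}: {sections[name].strip()}")
    return "\n".join(lines)
```

Rejecting a draft with an empty section keeps a report from being published without, for example, its prevention commitments.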

Failure modes

  • Status page itself unavailable: if the status page is unreachable, the in-app system falls back to a static banner: "We're experiencing technical difficulties. Our team is investigating. Check status.louis.law."
  • Overcommunication fatigue: P2 incidents that resolve within 15 minutes should be suppressed from the user feed (log in ops only) unless the user was actively using the affected feature.
  • False alarms: synthetic monitoring alerts should be verified before anything is surfaced to users, so that false positives are filtered out. Target: zero false-alarm user notifications per month.
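Two of the failure modes above reduce to simple guards: the static fallback banner and the suppression of short-lived P2 incidents. The sketch below is a hedged illustration. The function names are hypothetical, and treating "within 15 minutes" as inclusive of exactly 15 minutes is an assumption.

```python
from datetime import timedelta
from typing import Optional

SUPPRESSION_WINDOW = timedelta(minutes=15)
FALLBACK_BANNER = (
    "We're experiencing technical difficulties. Our team is investigating. "
    "Check status.louis.law."
)


def banner_text(status_page_reachable: bool, live_message: Optional[str]) -> str:
    """Use the live incident message when available, else the static fallback."""
    if status_page_reachable and live_message:
        return live_message
    return FALLBACK_BANNER


def show_in_user_feed(
    severity: str,
    duration: Optional[timedelta],  # None while the incident is still open
    user_was_using_feature: bool,
) -> bool:
    """Decide whether an incident appears in a given user's feed."""
    if severity != "P2" or user_was_using_feature:
        return True
    resolved_quickly = duration is not None and duration <= SUPPRESSION_WINDOW
    return not resolved_quickly
```

Because a P2 feed item only has to be created within 30 minutes, one option is to hold it for the first 15 minutes and call `show_in_user_feed` once the outcome is known.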

Related

  • [[feed-changelog-watcher]]
  • [[feed-haqq-press-releases]]
  • [[ops-churn-risk-detector]]