name: feed-status-incident-watcher
description: Use when the platform needs to monitor and surface Louis platform status events — incidents, degradations, scheduled maintenance, and recovery confirmations — to users in a non-intrusive but timely way. Integrates with the status page and incident management system to convert operational events into user-facing feed items, push alerts, and in-app banners calibrated to incident severity and user activity state.
license: MIT
metadata:
  id: feed.status-incident-watcher
  category: feed
  jurisdictions: [multi]
  priority: P3
  intent: [feed, system-status, incident-management, uptime-transparency]
  related: [feed-changelog-watcher, feed-haqq-press-releases, ops-churn-risk-detector]
  source: Louis — HAQQ Legal AI (github.com/sboghossian/mini-claude-for-legal)
  version: "1.0"
Status Incident Watcher Feed Surface
Purpose
Legal practitioners depend on Louis for time-sensitive work — deadlines, court filings, deal closings. Platform unavailability or degradation during a critical moment erodes trust disproportionately. This feed surface provides transparent, real-time, proactive communication about system status events, ensuring users are never left wondering whether a problem is on their side or the platform's.
The status incident watcher is the negative-space companion to [[feed-changelog-watcher]]: where the changelog communicates improvements, this surface communicates problems and resolutions.
Event types and severity levels
| Severity | Definition | Examples |
|---|---|---|
| P0 — Critical outage | Full service unavailability; no users can access Louis | API down, auth service failure, database corruption |
| P1 — Major degradation | Core feature broken for significant user subset | AI skill responses failing, document generation broken |
| P2 — Minor degradation | Non-core feature impaired; workaround exists | Feed not updating, push notifications delayed |
| P3 — Maintenance | Planned downtime or partial service impact | Scheduled database migration, API version upgrade |
| Resolved | Recovery from P0/P1/P2 | Service restored; post-incident report pending |
Monitoring sources
- Status page (e.g., statuspage.io or self-hosted): primary source of structured incident data.
- Internal alerting (PagerDuty / OpsGenie / equivalent): incident creation triggers feed item.
- Synthetic monitoring: automated health checks (endpoint uptime, AI response latency, document generation pipeline health).
- User-reported issues: if > N users report the same error within a short window, auto-escalate to P2.
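The user-report rule leaves N and the window length open. Below is a minimal Python sketch of one way to apply it. It assumes that reports arrive in timestamp order and are keyed by an error-signature string. `UserReportEscalator`, `threshold_n`, and the signature format are hypothetical names, not part of Louis. The sketch counts distinct users inside a sliding window, so one user retrying does not inflate the count, and it fires once per signature.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class UserReportEscalator:
    """Hypothetical sketch: escalate to P2 when more than N distinct users
    report the same error signature within a sliding time window."""

    def __init__(self, threshold_n: int, window: timedelta) -> None:
        self.threshold_n = threshold_n  # the "N" in the rule; placeholder value
        self.window = window            # the "short window"; placeholder value
        # error signature -> (timestamp, user_id) pairs, oldest first
        self._reports = defaultdict(deque)
        self._escalated = set()         # signatures that have already fired

    def record(self, signature: str, user_id: str, at: datetime) -> bool:
        """Record one report; return True only on the report that triggers escalation.

        Assumes calls arrive in non-decreasing timestamp order.
        """
        reports = self._reports[signature]
        reports.append((at, user_id))
        # Evict reports older than the window, measured from the newest report.
        # The newest report is never evicted, so the deque cannot run empty.
        while at - reports[0][0] > self.window:
            reports.popleft()
        distinct_users = {uid for _, uid in reports}
        if len(distinct_users) > self.threshold_n and signature not in self._escalated:
            self._escalated.add(signature)
            return True
        return False
```

For example, with `threshold_n=3`, the fourth distinct user to report `"DOCGEN_500"` within the window triggers escalation, and later reports of that signature do not trigger it again. This sketch does not clear a signature from `_escalated` when the resulting P2 is resolved. A real implementation would need that step so a recurrence can escalate again.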
Delivery logic by severity
P0 — Critical outage
- Immediate in-app banner (red) surfaced to all active users.
- Push notification to all users with notifications enabled.
- Email to enterprise/eFirm account admins.
- Status feed item: created within 5 minutes of incident declaration.
P1 — Major degradation
- In-app banner (orange) surfaced to affected users.
- Push notification to users actively using the affected feature.
- Status feed item: created within 10 minutes.
P2 — Minor degradation
- Status feed item only (no banner, and no push unless the user is actively experiencing the issue).
- Created within 30 minutes.
P3 — Planned maintenance
- Status feed item + in-app notice created ≥ 24 hours before the maintenance window.
- Email to enterprise admins if the planned downtime exceeds 30 minutes.
Resolved
- In-app banner cleared.
- Resolution status feed item surfaced to users who saw the incident item.
- Post-incident summary published ≤ 48 hours after P0/P1 resolution.
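The per-severity rules can be read as a lookup from severity and user state to delivery channels. The Python sketch below encodes them under stated assumptions:

- `Severity`, `UserState`, `DeliveryPlan`, and `plan_delivery` are hypothetical names.
- Channel labels such as `"in_app_notice"` are placeholders.
- "Affected users" is modeled as a single boolean flag.
- The Resolved transitions (clearing banners, the follow-up feed item) are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    P0 = "P0"  # critical outage
    P1 = "P1"  # major degradation
    P2 = "P2"  # minor degradation
    P3 = "P3"  # planned maintenance


@dataclass
class DeliveryPlan:
    channels: set[str] = field(default_factory=set)
    banner_color: str | None = None
    # Latest time to publish the status feed item, in minutes after declaration.
    feed_deadline_min: int | None = None
    # P3 only: minimum lead time of the notice before the maintenance window.
    notice_lead_hours: int | None = None


@dataclass
class UserState:
    push_enabled: bool = False
    affected: bool = False                # user is in the affected subset
    using_affected_feature: bool = False  # actively using the impaired feature now
    enterprise_admin: bool = False


def plan_delivery(severity: Severity, user: UserState,
                  maintenance_downtime_min: int = 0) -> DeliveryPlan:
    """Map severity plus user state to delivery channels per the rules above."""
    plan = DeliveryPlan(channels={"feed"})  # every severity gets a feed item
    if severity is Severity.P0:
        plan.feed_deadline_min = 5
        plan.channels.add("banner")  # in-app, so only active users see it
        plan.banner_color = "red"
        if user.push_enabled:
            plan.channels.add("push")
        if user.enterprise_admin:
            plan.channels.add("email")
    elif severity is Severity.P1:
        plan.feed_deadline_min = 10
        if user.affected:
            plan.channels.add("banner")
            plan.banner_color = "orange"
        if user.using_affected_feature and user.push_enabled:
            plan.channels.add("push")
    elif severity is Severity.P2:
        plan.feed_deadline_min = 30
        # No banner; push only when the user is experiencing the issue.
        if user.using_affected_feature and user.push_enabled:
            plan.channels.add("push")
    elif severity is Severity.P3:
        plan.channels.add("in_app_notice")
        plan.notice_lead_hours = 24
        if user.enterprise_admin and maintenance_downtime_min > 30:
            plan.channels.add("email")
    return plan
```

Keeping the rules in one function makes each severity's fan-out easy to audit against the list above. The trade-off is that per-tenant overrides would need an extra layer.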
Output spec
```json
{
  "id": "status-item-uuid",
  "incident_id": "INC-2025-0512-001",
  "severity": "P1",
  "status": "investigating | identified | monitoring | resolved",
  "title": "AI Skill Responses Intermittently Failing",
  "message": "We are investigating reports of intermittent failures when invoking legal drafting and review skills. Some users may experience errors or delayed responses. Our team has identified the issue and is deploying a fix.",
  "started_at": "2025-05-12T14:23:00Z",
  "resolved_at": null,
  "affected_features": ["draft-skills", "review-skills"],
  "source_url": "https://status.louis.law/incidents/INC-2025-0512-001",
  "updates": [
    {
      "timestamp": "2025-05-12T14:23:00Z",
      "status": "investigating",
      "message": "Investigating reports of AI skill failures."
    },
    {
      "timestamp": "2025-05-12T14:45:00Z",
      "status": "identified",
      "message": "Root cause identified: upstream model API rate limit. Fix deploying."
    }
  ]
}
```
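A validator for the output spec might check the enumerations and a few consistency rules. This is a sketch only. The following rules are assumptions inferred from the example, not stated requirements:

- `resolved_at` is set exactly when `status` is `"resolved"`.
- Update timestamps are non-decreasing.
- The top-level `status` mirrors the latest update.

The spec's `status` value lists the allowed values, so a concrete item must use a single value from that list.

```python
from datetime import datetime

ALLOWED_SEVERITIES = {"P0", "P1", "P2", "P3"}
ALLOWED_STATUSES = {"investigating", "identified", "monitoring", "resolved"}
REQUIRED_FIELDS = ("id", "incident_id", "severity", "status", "title",
                   "message", "started_at", "updates")


def _parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def validate_status_item(item: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    missing = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in item]
    if missing:
        return missing
    errors = []
    if item["severity"] not in ALLOWED_SEVERITIES:
        errors.append(f"unknown severity: {item['severity']}")
    if item["status"] not in ALLOWED_STATUSES:
        errors.append(f"unknown status: {item['status']}")
    # Assumption: resolved_at is populated exactly when the incident is resolved.
    if (item["status"] == "resolved") != (item.get("resolved_at") is not None):
        errors.append("resolved_at must be set if and only if status is 'resolved'")
    previous = _parse_ts(item["started_at"])
    for i, update in enumerate(item["updates"]):
        if update.get("status") not in ALLOWED_STATUSES:
            errors.append(f"updates[{i}]: unknown status")
        if "timestamp" not in update:
            errors.append(f"updates[{i}]: missing timestamp")
            continue
        current = _parse_ts(update["timestamp"])
        # Assumption: updates are chronological and none precede started_at.
        if current < previous:
            errors.append(f"updates[{i}]: timestamp earlier than the previous entry")
        previous = current
    # Assumption: the top-level status mirrors the most recent update.
    if item["updates"] and item["updates"][-1].get("status") != item["status"]:
        errors.append("top-level status does not match the latest update")
    return errors
```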
Communication tone guidelines
Status communications must be:
- Honest: acknowledge the impact accurately. Do not minimize a P1 as "minor."
- Non-technical by default: "AI responses are taking longer than usual" is better than "increased p99 latency on inference endpoint."
- Action-oriented where possible: if a workaround exists, state it. If the user should save their work before maintenance, say so.
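- Timely: an update that arrives 2 hours after a P0 is declared has already failed the users who needed it during the outage. The target is < 15 minutes to first communication, and the P0 and P1 feed-item targets above (5 and 10 minutes) are stricter still.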
- Closed-loop: every incident must have a "Resolved" status update. Incidents cannot be quietly forgotten.
Post-incident report format
For P0 and P1 incidents, publish a post-incident report:
- Summary: what happened, what users experienced, duration.
- Root cause: what caused the incident (non-technical summary).
- Resolution: what was done to fix it.
- Prevention: what changes will prevent recurrence.
- Timeline: key timestamps in the incident lifecycle.
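The report structure can be combined with the 48-hour deadline from the delivery rules to generate a skeleton. This is a sketch only. `render_report_skeleton` and `report_due_at` are hypothetical names. The Timeline section is filled from the item's `updates`, and the other sections are left as `TODO` placeholders for a human author.

```python
from __future__ import annotations

from datetime import datetime, timedelta

REPORT_SECTIONS = ("Summary", "Root cause", "Resolution", "Prevention", "Timeline")
REPORT_DEADLINE = timedelta(hours=48)  # published at most 48 hours after resolution


def report_due_at(severity: str, resolved_at: datetime) -> datetime | None:
    """Latest publication time for the report; None when no report is required."""
    if severity in ("P0", "P1"):
        return resolved_at + REPORT_DEADLINE
    return None


def render_report_skeleton(item: dict) -> str:
    """Plain-text skeleton: Timeline filled from `updates`, other sections left as TODO."""
    lines = [f"Post-incident report: {item['title']} ({item['incident_id']})"]
    for section in REPORT_SECTIONS:
        lines.append("")
        lines.append(f"{section}:")
        if section == "Timeline":
            for update in item["updates"]:
                lines.append(f"- {update['timestamp']} {update['status']}: {update['message']}")
        else:
            lines.append("- TODO")
    return "\n".join(lines)
```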
Failure modes
- Status page itself unavailable: if the status page is unreachable, the in-app system falls back to a static banner: "We're experiencing technical difficulties. Our team is investigating. Check status.louis.law."
- Overcommunication fatigue: P2 incidents that resolve within 15 minutes should be suppressed from the user feed (log in ops only) unless the user was actively using the affected feature.
- False alarms: synthetic monitoring alerts must be confirmed before they are surfaced to users, so that false positives never reach the feed. Target: zero false-alarm user notifications per month.
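Two of the failure modes translate directly into decision rules. The sketch below assumes that an unresolved P2 is held back for its first 15 minutes so the suppression rule can be applied. This hold still fits inside the 30-minute creation target. `p2_feed_decision`, `banner_text`, and the `"publish"`/`"hold"`/`"suppress"` outcomes are hypothetical names.

```python
from __future__ import annotations

from datetime import datetime, timedelta

FALLBACK_BANNER = ("We're experiencing technical difficulties. Our team is investigating. "
                   "Check status.louis.law.")
P2_SUPPRESSION_WINDOW = timedelta(minutes=15)


def p2_feed_decision(opened_at: datetime, resolved_at: datetime | None,
                     now: datetime, user_was_using_feature: bool) -> str:
    """Return "publish", "hold", or "suppress" (ops log only) for a P2 incident."""
    if user_was_using_feature:
        return "publish"  # users of the affected feature are always told
    if resolved_at is not None and resolved_at - opened_at <= P2_SUPPRESSION_WINDOW:
        return "suppress"  # short blip: log in ops only
    if resolved_at is None and now - opened_at <= P2_SUPPRESSION_WINDOW:
        return "hold"  # assumption: wait for the 15-minute mark before deciding
    return "publish"


def banner_text(status_page_reachable: bool, incident_banner: str | None) -> str | None:
    """Fall back to the static banner when the status page cannot be reached."""
    if not status_page_reachable:
        return FALLBACK_BANNER
    return incident_banner
```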
Related skills
- [[feed-changelog-watcher]]
- [[feed-haqq-press-releases]]
- [[ops-churn-risk-detector]]