eng-token-budget-by-tier

name: eng-token-budget-by-tier
description: Use when implementing or configuring the per-tier token budget system that limits LLM usage by subscription plan. Defines the monthly token allowances per tier, enforcement logic, overage handling, BYO-key bypass rules, and the Edge Function middleware that gates requests before they reach the LLM.
license: MIT
metadata:
  id: eng.token-budget-by-tier
  category: eng
  jurisdictions: [multi]
  priority: P2
  intent: [eng, billing, token-budget, rate-limiting, tier]
  related: [eng-supabase-edge-functions-patterns, eng-streaming-response-rules-mobile, eng-posthog-event-naming-convention]
  source: Louis — HAQQ Legal AI (github.com/sboghossian/mini-claude-for-legal)
  version: "1.0"

Token Budget by Tier

What it does

Every LLM call costs tokens; token costs must be controlled per subscription tier. The token budget system tracks monthly token consumption per user/workspace, enforces hard limits before requests reach the LLM API, and provides the data the frontend needs to display usage warnings. It also handles the BYO-key (Bring Your Own Key) path, where the user's own Anthropic API key is used and platform token limits do not apply.

Setup / auth

Supabase schema:

CREATE TABLE token_usage (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES auth.users(id),
  workspace_id UUID NOT NULL REFERENCES workspaces(id),
  period TEXT NOT NULL,        -- "2026-05" (YYYY-MM)
  tokens_input INT NOT NULL DEFAULT 0,
  tokens_output INT NOT NULL DEFAULT 0,
  tokens_total INT GENERATED ALWAYS AS (tokens_input + tokens_output) STORED,
  last_updated TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  UNIQUE(user_id, workspace_id, period)
);

-- RLS: users see only their own usage
ALTER TABLE token_usage ENABLE ROW LEVEL SECURITY;
CREATE POLICY "own usage" ON token_usage FOR SELECT
  USING (user_id = auth.uid());

Capabilities

Tier definitions

| Tier | Monthly token budget | Model access | Notes |
|---|---|---|---|
| free | 20,000 tokens | claude-haiku only | Resets on the 1st of the month |
| pro | 500,000 tokens | claude-sonnet | Resets on the billing anchor date |
| business | 3,000,000 tokens | claude-sonnet; opus on request | Workspace-level pool |
| byo | Unlimited (user's own key) | Any model | Usage recorded for analytics only; no platform limit |
| internal | Unlimited | Any model | Dev/test accounts |
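The tier table can be kept as a single configuration object so the budget check and model gating read from the same source. A minimal sketch: `TIER_CONFIG`, `isModelAllowed`, and the `pool` field are hypothetical names, model families stand in for real Anthropic model IDs, and "opus on request" is modelled as an explicit approval flag (an assumption about how the request is granted).

```typescript
// Hypothetical configuration mirroring the tier table. Model names are
// family placeholders, not exact Anthropic model IDs.
type Tier = "free" | "pro" | "business" | "byo" | "internal";
type ModelFamily = "haiku" | "sonnet" | "opus";

interface TierConfig {
  monthlyBudget: number | null;       // null: no platform-enforced limit
  models: ModelFamily[] | "any";      // families the tier may call
  pool: "user" | "workspace" | null;  // who shares the allowance (assumption: per user unless noted)
}

const TIER_CONFIG: Record<Tier, TierConfig> = {
  free:     { monthlyBudget: 20_000,    models: ["haiku"],  pool: "user" },
  pro:      { monthlyBudget: 500_000,   models: ["sonnet"], pool: "user" },
  business: { monthlyBudget: 3_000_000, models: ["sonnet"], pool: "workspace" }, // opus via approval flag
  byo:      { monthlyBudget: null,      models: "any",      pool: null },
  internal: { monthlyBudget: null,      models: "any",      pool: null },
};

// Business gets opus only "on request", modelled here as an explicit approval flag.
function isModelAllowed(tier: Tier, model: ModelFamily, opusApproved = false): boolean {
  const { models } = TIER_CONFIG[tier];
  if (models === "any") return true;
  if (tier === "business" && model === "opus") return opusApproved;
  return models.includes(model);
}
```

Keeping `null` rather than `Infinity` for unlimited tiers means the config serializes cleanly to JSON if the frontend ever needs it.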

Token counts are input + output combined (the total tokens billed). Cache-read input tokens count at 10% weight, mirroring Anthropic's prompt-caching discount (cache hits are billed at 10% of the base input price).
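Applying the 10% cache-read weight means converting the provider's usage object before it is recorded. A minimal sketch, assuming the Anthropic Messages API `usage` fields (`input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`); `billableTokens` is a hypothetical helper, and counting cache writes at full weight is an assumption because the rule above does not cover them.

```typescript
// Subset of the `usage` object on an Anthropic Messages API response.
interface AnthropicUsage {
  input_tokens: number;                         // input tokens not read from or written to the cache
  output_tokens: number;
  cache_creation_input_tokens?: number | null;  // tokens written to the cache
  cache_read_input_tokens?: number | null;      // tokens served from the cache
}

// Converts raw provider usage into the integer input/output counts stored in token_usage.
function billableTokens(usage: AnthropicUsage): { input: number; output: number } {
  const cacheRead = usage.cache_read_input_tokens ?? 0;
  const cacheWrite = usage.cache_creation_input_tokens ?? 0;
  // Cache reads count at 10% weight; rounding up keeps the count conservative.
  const weightedCacheRead = Math.ceil(cacheRead / 10);
  // Assumption: cache writes count at full weight.
  return { input: usage.input_tokens + cacheWrite + weightedCacheRead, output: usage.output_tokens };
}
```

Dividing by 10 instead of multiplying by 0.1 keeps the arithmetic exact for integer inputs, and `Math.ceil` keeps the result compatible with the `INT` columns.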

Budget check middleware

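import { createClient } from "npm:@supabase/supabase-js@2";

type Tier = "free" | "pro" | "business" | "byo" | "internal";

interface BudgetResult {
  allowed: boolean;
  remaining: number;
  reason?: "budget_exhausted" | "request_too_large";
}

// Monthly allowances from the tier table above.
const TIER_BUDGETS = { free: 20_000, pro: 500_000, business: 3_000_000 } as const;

// Service-role client: this check runs server-side in the Edge Function only.
const supabase = createClient(Deno.env.get("SUPABASE_URL")!, Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!);

// Period key in UTC, e.g. "2026-05".
const getCurrentPeriod = (): string => new Date().toISOString().slice(0, 7);

async function checkTokenBudget(userId: string, workspaceId: string, tier: Tier, estimatedTokens: number): Promise<BudgetResult> {
  if (tier === "byo" || tier === "internal") return { allowed: true, remaining: Infinity };

  const period = getCurrentPeriod();  // "2026-05"
  const budget = TIER_BUDGETS[tier];

  // Rows are unique per (user, workspace, period), so a user-only filter with .single() errors
  // for users in several workspaces. Business is a workspace-level pool: sum every member's row.
  let query = supabase.from("token_usage").select("tokens_total").eq("workspace_id", workspaceId).eq("period", period);
  if (tier !== "business") query = query.eq("user_id", userId);
  const { data, error } = await query;
  if (error) throw error;  // fail closed: a failed lookup must never read as zero usage

  const used = (data ?? []).reduce((sum, row) => sum + row.tokens_total, 0);
  const remaining = budget - used;

  if (remaining <= 0) return { allowed: false, remaining: 0, reason: "budget_exhausted" };
  if (estimatedTokens > remaining) return { allowed: false, remaining, reason: "request_too_large" };

  return { allowed: true, remaining };
}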
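The `estimatedTokens` argument has to be produced before the call. A minimal sketch of the conservative "2× average" estimate from the failure-modes table, assuming roughly 4 characters per token and a caller-supplied average response size; `estimateRequestTokens` and both constants are hypothetical, and the character heuristic is not a tokenizer. Anthropic also offers a token-counting endpoint for exact input counts, at the cost of an extra request.

```typescript
// Heuristic assumptions: ~4 characters per token, and a caller-supplied
// average response size for this feature (hypothetical parameter).
const CHARS_PER_TOKEN = 4;
const SAFETY_FACTOR = 2; // "estimate conservatively (2× average)"

function estimateRequestTokens(promptText: string, averageOutputTokens: number): number {
  const inputEstimate = Math.ceil(promptText.length / CHARS_PER_TOKEN);
  return SAFETY_FACTOR * (inputEstimate + averageOutputTokens);
}
```

Overestimating only affects the gate; the actual usage from the response is what gets recorded.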

Token recording (post-call)

After every successful LLM response, upsert the usage record:

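// Uses the same service-role `supabase` client and getCurrentPeriod() as the budget check.
async function recordTokenUsage(userId: string, workspaceId: string, usage: { input: number; output: number }): Promise<void> {
  const period = getCurrentPeriod();
  const { error } = await supabase.rpc("increment_token_usage", {
    p_user_id: userId,
    p_workspace_id: workspaceId,
    p_period: period,
    p_input: usage.input,
    p_output: usage.output,
  });
  // supabase-js reports failures in `error` instead of throwing; surface them so the caller can log and retry.
  if (error) throw error;
}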

SQL function (atomic upsert):

-- SECURITY INVOKER (the default): end users only have a SELECT policy, so writes go through
-- the Edge Function's service-role key, which bypasses RLS.
CREATE OR REPLACE FUNCTION increment_token_usage(
  p_user_id UUID, p_workspace_id UUID, p_period TEXT,
  p_input INT, p_output INT
) RETURNS VOID LANGUAGE SQL AS $$
  INSERT INTO token_usage(user_id, workspace_id, period, tokens_input, tokens_output)
  VALUES (p_user_id, p_workspace_id, p_period, p_input, p_output)
  ON CONFLICT (user_id, workspace_id, period)
  DO UPDATE SET
    tokens_input = token_usage.tokens_input + EXCLUDED.tokens_input,
    tokens_output = token_usage.tokens_output + EXCLUDED.tokens_output,
    last_updated = NOW();
$$;
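The failure-modes table calls for retrying failed recordings. A minimal sketch of a generic retry with exponential backoff, assuming `recordTokenUsage` throws when the RPC reports an error (supabase-js returns errors in `error` rather than throwing, so that check must be in place); `withRetry` and its defaults are hypothetical.

```typescript
// Retries `fn` up to `attempts` times, doubling the delay after each failure.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseDelayMs = 200): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Exponential backoff: baseDelayMs, 2 × baseDelayMs, 4 × baseDelayMs, ...
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError; // all attempts failed: log it and fall back as the failure table describes
}
```

Usage: `await withRetry(() => recordTokenUsage(userId, workspaceId, usage));`. The increment is not idempotent, so a retry after a response lost in transit can double-count; that errs toward over-counting, which is the conservative direction for budget enforcement.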

BYO-key path

When a user has configured a BYO Anthropic key:

  1. The Edge Function decrypts the stored key from Supabase Vault.
  2. The key is used in the Anthropic API call instead of the platform key.
  3. checkTokenBudget returns { allowed: true, remaining: Infinity } immediately.
  4. Token usage is still recorded locally (for analytics) but not enforced as a limit.
  5. The PostHog event chat_response_completed includes tier: "byo".
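The BYO steps reduce to one decision made before the Anthropic call: which key to use and whether the budget gate applies. A minimal sketch, assuming the user's key has already been decrypted from Supabase Vault (that lookup is out of scope here); `resolveCallContext` and `CallContext` are hypothetical names.

```typescript
type Tier = "free" | "pro" | "business" | "byo" | "internal";

interface CallContext {
  apiKey: string;          // key passed to the Anthropic client
  enforceBudget: boolean;  // whether checkTokenBudget acts as a hard gate
}

// byoKey: the user's key, already decrypted from Vault, or null if none is stored.
function resolveCallContext(tier: Tier, byoKey: string | null, platformKey: string): CallContext {
  if (tier === "byo") {
    // Fail closed: falling back to the platform key would spend platform tokens with no budget.
    if (!byoKey) throw new Error("byo tier without a stored key");
    return { apiKey: byoKey, enforceBudget: false };
  }
  // Usage is recorded for every tier; only byo and internal skip enforcement.
  return { apiKey: platformKey, enforceBudget: tier !== "internal" };
}
```

Throwing on a missing BYO key is a design choice in this sketch: it surfaces a misconfigured account instead of silently billing the platform.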

Usage API endpoint

GET /api/usage
→ {
    tier: "pro",
    period: "2026-05",
    tokensUsed: 123456,
    tokensBudget: 500000,
    tokensRemaining: 376544,
    percentUsed: 24.7,
    resetDate: "2026-06-01"
  }

This endpoint feeds the frontend usage display and the token pill described in [[eng-streaming-response-rules-mobile]].
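The response fields are simple derivations from the tier budget and the period's recorded total. A minimal sketch, assuming a calendar-month reset (pro's billing-anchor reset would need the subscription's anchor date instead) and returning `null` for unlimited tiers because `JSON.stringify` turns `Infinity` into `null` anyway; `buildUsageResponse` and `nextPeriodStart` are hypothetical names.

```typescript
type Tier = "free" | "pro" | "business" | "byo" | "internal";

const TIER_BUDGETS: Partial<Record<Tier, number>> = { free: 20_000, pro: 500_000, business: 3_000_000 };

interface UsageResponse {
  tier: Tier;
  period: string;
  tokensUsed: number;
  tokensBudget: number | null;     // null for unlimited tiers
  tokensRemaining: number | null;
  percentUsed: number | null;      // one decimal place
  resetDate: string;               // YYYY-MM-DD
}

// "2026-05" -> "2026-06-01"; Date.UTC rolls month 12 over into January of the next year.
function nextPeriodStart(period: string): string {
  const [year, month] = period.split("-").map(Number);
  return new Date(Date.UTC(year, month, 1)).toISOString().slice(0, 10);
}

function buildUsageResponse(tier: Tier, period: string, tokensUsed: number): UsageResponse {
  const budget = TIER_BUDGETS[tier] ?? null;
  return {
    tier,
    period,
    tokensUsed,
    tokensBudget: budget,
    tokensRemaining: budget === null ? null : Math.max(0, budget - tokensUsed),
    percentUsed: budget === null ? null : Math.round((tokensUsed / budget) * 1000) / 10,
    resetDate: nextPeriodStart(period),
  };
}
```

For the example above, `buildUsageResponse("pro", "2026-05", 123456)` yields 376,544 remaining, 24.7 percent used, and a reset date of 2026-06-01.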

Permissions & safety

  • Budget enforcement must happen server-side in the Edge Function, before any LLM call. Client-side budget checks are for UX only, never for enforcement.
  • tokens_total is a generated column: it always equals tokens_input + tokens_output and cannot be written directly by application code.
  • Do not expose the raw token_usage table via the public REST API. Use a dedicated /api/usage endpoint that returns only the user's own data.
  • Overage: if checkTokenBudget returns allowed: false, return HTTP 402 with { error: "token_budget_exhausted", upgradeUrl: "/billing/upgrade" }.
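The overage rule maps a denied budget check to a fixed 402 response. A minimal sketch using the standard Fetch `Response` available in Edge Functions; `overageResponse` is a hypothetical helper, and the body is exactly the one specified above.

```typescript
interface BudgetResult {
  allowed: boolean;
  remaining: number;
  reason?: "budget_exhausted" | "request_too_large";
}

// Returns the 402 response for a denied request, or null when the LLM call may proceed.
function overageResponse(result: BudgetResult): Response | null {
  if (result.allowed) return null;
  return new Response(
    JSON.stringify({ error: "token_budget_exhausted", upgradeUrl: "/billing/upgrade" }),
    { status: 402, headers: { "Content-Type": "application/json" } },
  );
}
```

In a handler: `const denied = overageResponse(await checkTokenBudget(...)); if (denied) return denied;`, placed before any call to the Anthropic API so enforcement stays server-side.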

Failure modes

| Failure | Impact | Mitigation |
|---|---|---|
| Token recording fails (DB error) | Usage is under-counted, so the budget is not enforced | Log the failure; retry asynchronously; as a conservative fallback, set usage = budget |
| Budget check skipped for BYO tier | Fine by design, but only if the tier is resolved correctly | Always resolve the tier from the DB, not from a JWT claim, which could be stale |
| Period boundary race condition | User gets double budget on the month boundary | Upsert with ON CONFLICT handles this; the period key is YYYY-MM |
| Estimated tokens wildly wrong | Request is allowed but exceeds the budget | Estimate conservatively (2× average); record actual usage after the call |

Related
  • [[eng-supabase-edge-functions-patterns]] — the Edge Function that runs the budget check
  • [[eng-streaming-response-rules-mobile]] — displays remaining budget in the chat UI
  • [[eng-posthog-event-naming-convention]] — token_budget_exhausted event definition