# Risk Register (Operations, Scalability, Integrations)

Scope: derived from PRD §15 and the external integration surfaces (OpenAI, majidapi, Zarinpal, Telegram).

## Top Risks and Mitigations

- OpenAI latency spikes or outages
  - Impact: AI latency > 2.5s, degraded UX.
  - Mitigations: 20s timeout + single retry; circuit breaker; queue long tasks; feature flag per module; cached answers for common queries (short TTL) (PRD §15).

- majidapi instability, rate limits, or schema drift
  - Impact: empty/late opportunities; scoring unreliable.
  - Mitigations: client with backoff and retry; schema guards + tolerant parser; alert on drops in returned ads; disable scraper profiles via admin; local cache of last N ads; fallback rules (PRD §12/§15).

- Payment gateway (Zarinpal) verify timeouts or callback failures
  - Impact: user is charged but the subscription is not activated.
  - Mitigations: idempotent verify; reconcile job; persistent payment logs; user-facing retry link with guidance (PRD §7.14, §15).

- Telegram webhook delivery failures or token leak
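  - Impact: bot unresponsive; abuse of the bot if the token leaks.
  - Mitigations: webhook secret; rotate tokens; return HTTP 200 to Telegram even on handler errors and log the error internally (avoids repeated redelivery); alerting; restricted admin commands.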

- Vector store/file processing delays (OpenAI)
  - Impact: RAG degraded; modules answer poorly.
  - Mitigations: show file status in admin; pre-warm/attach critical docs; fall back to core prompts; disable RAG per module if file_search unstable.

- Affordable-ads false positives/negatives due to thin data
  - Impact: user trust erosion; low CTR.
  - Mitigations: enforce a minimum sample size (min-N) before using the median; fallback −10% then rule-based; human review for pushes; show concise reasons; archive opportunities older than 72h (PRD §12).

- Data growth (chats, ads) impacts query latency
  - Impact: slow panels and user queries.
  - Mitigations: add the indexes defined in the ERD; partition or TTL-archive old chats/opportunities; paginate aggressively.

- Cost overrun (AI tokens)
  - Impact: burn rate too high.
  - Mitigations: quotas per module; rate limits; prefer small model defaults; cap context length; cache frequent answers (PRD §15).

- Privacy/compliance gaps (out of scope)
  - Impact: risk if mishandled later.
  - Mitigations: flag as out-of-scope for v1; avoid PII persistence beyond Telegram IDs; document retention policy later (not in PRD).
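
For the OpenAI latency risk at the top of this list, the "single retry; circuit breaker" mitigation could look roughly like the sketch below. It is only an illustration: `CircuitBreaker`, `call_with_retry`, the failure threshold, and the cool-down are hypothetical names and values, not taken from the PRD. The 20s timeout is assumed to be enforced inside the wrapped call (for example by the HTTP client's own timeout setting).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the upstream call is skipped."""


class CircuitBreaker:
    # failure_threshold and reset_after_s are illustrative defaults (assumptions).
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """True if calls may proceed (closed, or cool-down elapsed)."""
        if self._opened_at is None:
            return True
        # After the cool-down, trial calls are let through (half-open state).
        return self._clock() - self._opened_at >= self.reset_after_s

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            # (Re)open the breaker and restart the cool-down.
            self._opened_at = self._clock()


def call_with_retry(fn: Callable[[], T], breaker: CircuitBreaker,
                    retries: int = 1) -> T:
    """Call fn with at most `retries` extra attempts, guarded by the breaker."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    if not breaker.allow():
        raise CircuitOpenError("upstream marked unhealthy; use fallback")
    last_exc: Optional[BaseException] = None
    for _ in range(retries + 1):
        try:
            result = fn()  # fn is expected to enforce its own timeout
        except Exception as exc:
            last_exc = exc
            breaker.record_failure()
            continue
        breaker.record_success()
        return result
    assert last_exc is not None
    raise last_exc
```

A caller would catch `CircuitOpenError` and serve the cached answer or core-prompt fallback instead of waiting on the upstream. Keeping the breaker per module fits the "feature flag per module" mitigation, since one unhealthy module would not block the others.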

## Monitoring & Alerts

- Daily report: latency p50/p80/p95, error rate (%), token usage, CTR, module usage (PRD §2, §15).
- Threshold alerts: AI latency p80 > 2.5s (3h), payment success < 98% (24h), majidapi error rate > 10% (1h).
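
The threshold alerts above can be expressed as data plus a small evaluator. The sketch below is one possible shape, not a prescribed implementation: the metric keys (`ai_latency_p80_s` and so on) are hypothetical, and the parenthesised durations are assumed to be the aggregation window for each metric. `percentile` uses the nearest-rank method, which is one of several common definitions.

```python
import math
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile (0 < p <= 100) of a non-empty sample."""
    if not values:
        raise ValueError("percentile of empty sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


@dataclass(frozen=True)
class AlertRule:
    name: str
    metric: str                               # hypothetical metric key
    breached: Callable[[float, float], bool]  # operator.gt or operator.lt
    threshold: float
    window_hours: int                         # assumed aggregation window


# Thresholds mirror the list above; metric keys are assumptions.
RULES = [
    AlertRule("AI latency p80", "ai_latency_p80_s", operator.gt, 2.5, 3),
    AlertRule("Payment success", "payment_success_ratio", operator.lt, 0.98, 24),
    AlertRule("majidapi error rate", "majidapi_error_ratio", operator.gt, 0.10, 1),
]


def firing_alerts(metrics: Dict[str, float],
                  rules: Sequence[AlertRule] = RULES) -> List[str]:
    """Return names of rules whose threshold is breached; skip missing metrics."""
    return [r.name for r in rules
            if r.metric in metrics and r.breached(metrics[r.metric], r.threshold)]
```

For example, with `percentile(latencies, 80)` computed over the last 3 hours of AI calls and stored under `ai_latency_p80_s`, `firing_alerts(metrics)` returns the names of the rules to notify on.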

## Open Questions

- SLAs for majidapi are not defined; confirm acceptable error rates.
- The incident communication path (to users) is not defined in the PRD.

