How We Built Aanshbot: A Practical Technical Deep Dive

February 22, 2026 · 15 min read · by Briefcase AI Team
Technical Implementation · RAG · AI Safety · Retrieval · Evaluation · Python



This is the technical companion to our executive launch post. For the strategic narrative, read Aanshbot: Turning Discovery Conversations Into Decision Intelligence.


Product Requirements That Drove the Architecture

Aanshbot was built around one product goal: improve the next discovery question using corpus evidence while enforcing role-based privacy by default.

Five requirements shaped the implementation:

  1. Responses must follow a strict coaching contract, not generic Q&A.
  2. Internal and external users should use one app with different evidence visibility.
  3. Retrieval must be grounded in curated corpus evidence.
  4. Safety controls must handle redaction and leakage automatically.
  5. The system must ship quickly on managed infrastructure with clear boundaries.

The resulting design is retrieval-first, schema-constrained, and role-aware.

Stack and Service Boundaries

Current stack:

  • Next.js App Router + TypeScript (UI and API routes)
  • Supabase (Auth, Postgres, pgvector, RLS)
  • OpenAI (generation + embeddings)
  • Vercel-ready deployment model

In practice, this is one Next.js app with route groups for intake, chat, analysis, planning, evidence exploration, playbooks, feedback, ingestion, and version diffs.


FIGURE 1: Layered system architecture for Aanshbot across interface, orchestration, retrieval, and safety layers.

Data Model and Access Posture

The data model is centered on session context, evidence lineage, and traceable outputs.

Core entities include users, sessions, documents, chunks, evidence refs, messages, feedback, playbooks, and document diffs.

Two implementation choices are critical:

  1. Generated responses are traceable to chunk-level evidence.
  2. Evidence has role-specific display tags for internal vs external rendering.

RLS is enabled broadly. User-owned session/message/feedback/playbook flows remain accessible, while corpus and quota-sensitive tables are protected behind server-side access paths.
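The split between user-owned RLS paths and server-only corpus access can be sketched as a routing rule. The table names and the two classification sets below are illustrative, not the actual Aanshbot schema:

```python
# Hypothetical sketch of the role-aware access split described above.
# Table classifications are assumptions for illustration only.

USER_OWNED = {"sessions", "messages", "feedback", "playbooks"}
PROTECTED = {"documents", "chunks", "evidence_refs", "quotas"}


def access_path(table: str) -> str:
    """Route reads: user-owned rows go through RLS-scoped client queries;
    corpus and quota tables are reached only via server-side code."""
    if table in USER_OWNED:
        return "client_rls"    # RLS policy scopes rows to the authenticated user
    if table in PROTECTED:
        return "server_only"   # never queried directly from the browser
    raise ValueError(f"unclassified table: {table}")
```

The point of the rule is that a misclassified table fails loudly rather than silently exposing corpus rows to a client session.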


FIGURE 2: Role-safe access posture showing user-owned paths versus protected corpus and control tables.

Ingestion Pipeline and Corpus Versioning

Ingestion accepts multiple source formats and normalizes them into retrieval-ready chunks.

Pipeline:

  1. Parse file content into normalized text.
  2. Build structured blocks (interview sections when available).
  3. Chunk into semantic segments with metadata.
  4. Generate embeddings (text-embedding-3-small default).
  5. Insert chunks and evidence references.
  6. Switch active corpus version.
  7. Compute and store version-diff summary when prior version exists.

Current implementation note: active-version switching uses a two-step flow, not a single transactional swap.
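The seven-step flow, including the non-atomic activation swap, can be sketched in miniature. Everything here is illustrative: the `Corpus` container, paragraph chunking as a stand-in for semantic segmentation, and the elided embedding step are assumptions, not the production implementation:

```python
from dataclasses import dataclass, field


@dataclass
class Corpus:
    active_version: int = 0
    chunks: dict = field(default_factory=dict)   # version -> list of chunk texts
    diffs: dict = field(default_factory=dict)    # version -> diff summary


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Step 3 stand-in: split normalized text into paragraph-ish segments."""
    parts = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [p[:max_chars] for p in parts]


def ingest(corpus: Corpus, raw_text: str) -> int:
    """Steps 1-7 in miniature. Embedding (step 4) is elided; production
    would call text-embedding-3-small per chunk before insert."""
    chunks = chunk_text(raw_text)                # steps 1-3: parse, block, chunk
    new_version = corpus.active_version + 1
    corpus.chunks[new_version] = chunks          # step 5: insert chunks + refs
    prior = corpus.active_version
    corpus.active_version = new_version          # step 6: two-step, not atomic
    if prior:                                    # step 7: diff vs prior version
        added = len(chunks) - len(corpus.chunks[prior])
        corpus.diffs[new_version] = f"{added:+d} chunks vs v{prior}"
    return new_version
```

Because step 6 is a separate write from step 5, a crash between them can leave a fully inserted but inactive version, which is the safe failure direction.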


FIGURE 3: Ingestion-to-activation flow, including chunk generation, evidence refs, and version diff creation.

Hybrid Retrieval and App-Layer Reranking

Retrieval combines SQL-level hybrid scoring with app-level reranking under latency budget controls.

Core blend formula:

hybrid_score = semantic_score * 0.82 + keyword_score * 0.18

SQL phase handles semantic + lexical matching over active corpus only. App phase improves continuity and ranking quality for multi-turn interview coaching.
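The fixed blend above can be made concrete. The weights come from the formula; the candidate scores here are made up for illustration, and both inputs are assumed pre-normalized to [0, 1]:

```python
# Blend weights from the post; score values below are illustrative only.
SEMANTIC_W, KEYWORD_W = 0.82, 0.18


def hybrid_score(semantic: float, keyword: float) -> float:
    """In the SQL phase this blend runs inside the query,
    scoped to the active corpus version."""
    return semantic * SEMANTIC_W + keyword * KEYWORD_W


candidates = [
    {"chunk": "pricing objections", "semantic": 0.91, "keyword": 0.40},
    {"chunk": "rollout timeline",   "semantic": 0.74, "keyword": 0.95},
]
ranked = sorted(
    candidates,
    key=lambda c: hybrid_score(c["semantic"], c["keyword"]),
    reverse=True,
)
```

Note that a strong keyword match alone cannot outrank a strong semantic match under these weights, which is the intended bias for paraphrase-heavy discovery queries.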

PYTHON
def retrieve_ranked_evidence(query, conversation_context, budget_ms):
    retrieval_query = build_query(query, conversation_context)

    primary = sql_hybrid_search(retrieval_query, top_k=24)
    candidates = primary

    if weak_or_sparse(primary) and time_remaining(budget_ms):
        rescue = keyword_rescue_search(retrieval_query, top_k=12)
        candidates = merge_unique(primary, rescue)

    ranked = rerank_with_features(
        candidates,
        lexical_overlap=True,
        recency=True,
        diversity_penalty="mmr_style",
    )

    return ranked[:8]

FIGURE 4: Retrieval sequence from hybrid search through rescue and reranking for final evidence selection.

Generation Contract and Schema Enforcement

Generation is prompt-constrained, JSON-only, and schema-validated.

Model routing:

  • Primary: gpt-4.1-mini
  • Fallback: gpt-4o-mini
  • Verifier default: gpt-4o-mini

Output contract requires exactly three buckets (problem, workflow, risk) and required fields per question.

PYTHON
from pydantic import BaseModel


class Question(BaseModel):
    main: str
    rephrased: str
    why_it_matters: str
    what_to_listen_for: str
    confidence: float


class ResponseContract(BaseModel):
    synthesis: str
    problem: list[Question]
    workflow: list[Question]
    risk: list[Question]
    follow_up_paths: list[str]


def normalize_contract(payload):
    parsed = ResponseContract.model_validate(payload)
    ensure_exact_three_buckets(parsed)
    return fill_safe_defaults(parsed)

Verifier Gate and Repair Strategy

A verifier stage runs when quality, grounding, or confidence signals cross a risk threshold, and the entire stage operates inside a bounded latency budget.

Repair strategy has two levels:

  1. Deterministic bucket repairs (fast path)
  2. Optional model rewrite limited to the failing questions (time-budget permitting)
PYTHON
def maybe_repair_output(contract, quality, grounding, confidence, budget_ms):
    if not should_verify(quality, grounding, confidence):
        return contract

    repaired = deterministic_repair(contract)

    if still_failing(repaired) and time_remaining(budget_ms):
        repaired = targeted_model_rewrite(repaired, failing_buckets_only=True)

    return repaired


FIGURE 5: Verifier-gated repair flow balancing quality recovery with strict response-time budgets.

Safety Modes and Redaction Loop

Safety is role-aware and enforced in layered stages:

  1. Role policy in generation prompts
  2. External-mode redaction transforms
  3. Leakage detection on generated output
  4. Stricter regeneration when required
  5. Safe fallback coaching when leakage persists

Implemented safety states:

  • internal_named
  • external_redacted
  • external_regenerated
  • external_fallback
PYTHON
def safe_external_response(draft, sensitive_terms):
    redacted = redact_external(draft, sensitive_terms)
    if not leaks(redacted):
        return redacted, "external_redacted"

    regenerated = regenerate_strict(redacted)
    if not leaks(regenerated):
        return regenerated, "external_regenerated"

    return fallback_coaching(), "external_fallback"

FIGURE 6: Safety state transitions for redaction, regeneration, and fallback protection.

Shared Feature Modules (Analyze, Plan, Playbooks, Explorer)

Aanshbot keeps advanced features on the same retrieval and policy stack:

  • Answer Analyzer
  • Interview Plan mode
  • Question quality scoring (specificity, leading strength, decision relevance)
  • Contradiction detection by theme
  • Confidence meter from retrieval strength, diversity, and recency
  • Saved playbooks
  • Version diffs
  • Evidence explorer filters

This avoids fragmented prompt islands and keeps behavior consistent across workflows.
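Of the modules above, the question-quality scorer is the most amenable to a small sketch. The three dimensions match the post; the weights, marker phrases, and heuristics below are made up for illustration:

```python
# Hypothetical scorer for the three quality dimensions named above;
# the heuristics and marker list are assumptions, not production logic.

LEADING_MARKERS = ("don't you", "wouldn't you", "isn't it", "right?")


def question_quality(question: str, decision_terms: set[str]) -> dict:
    q = question.lower()
    # Longer, more concrete asks score higher, capped at 1.0.
    specificity = min(len(q.split()) / 20, 1.0)
    # Presence of a leading phrase flags the question (lower is better).
    leading = 1.0 if any(m in q for m in LEADING_MARKERS) else 0.0
    # Overlap with the deal's decision vocabulary.
    words = {w.strip("?") for w in q.split()}
    relevance = len(words & decision_terms) / max(len(decision_terms), 1)
    return {
        "specificity": round(specificity, 2),
        "leading_strength": leading,
        "decision_relevance": round(relevance, 2),
    }
```

Keeping this scorer on the same stack as generation means a flagged leading question can be rewritten by the same contract-constrained pipeline rather than a separate prompt island.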

Auth Reliability and Quota Controls

Auth reliability was hardened around three areas:

  1. Redirect base resolution across configured base URL, forwarded origin, and host fallback.
  2. Session callback handling for both token-hash and code exchange paths.
  3. Quota/rate pathways that convert provider limits into clear user-facing retry guidance.
PYTHON
def resolve_auth_redirect_base(configured_base, request):
    if configured_base:
        return configured_base
    if request.forwarded_origin:
        return request.forwarded_origin
    return request.host_origin

Exact threshold values and anti-abuse tuning are intentionally omitted from this public write-up.

Observability and Learning Loop

Two primary outcome signals are persisted:

  • usefulness_score (1-5)
  • used_question (boolean)

Structured assistant payloads are also stored to support later analysis by mode and context.

Confidence meter skeleton:

confidence = w_r * retrieval + w_d * diversity + w_c * recency

Production weights are internal and tuned over time.
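The skeleton above can be sketched with placeholder weights. Since the production values are internal, `W_R`, `W_D`, and `W_C` here are arbitrary stand-ins that sum to 1:

```python
# Placeholder weights; production values are internal and tuned over time.
W_R, W_D, W_C = 0.5, 0.3, 0.2  # retrieval strength, diversity, recency


def confidence(retrieval: float, diversity: float, recency: float) -> float:
    """All three inputs are assumed normalized to [0, 1]."""
    score = W_R * retrieval + W_D * diversity + W_C * recency
    return max(0.0, min(1.0, score))
```

Clamping to [0, 1] keeps the meter stable even if an upstream signal is miscalibrated.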


FIGURE 7: Learning loop from conversation usage signals to retrieval/prompt/playbook refinements (illustrative).

Tradeoffs and Current Boundaries

Current scope is intentionally narrow:

  • Curated manual ingestion (no auto transcript sync in v1)
  • Single active corpus retrieval path
  • Strict output contract over open-ended generative flexibility
  • Managed infrastructure speed over custom infra complexity

That constraint set kept the system deployable for real discovery workflows while preserving a clean foundation for deeper evaluation and analytics.

To see the product experience that this architecture supports, visit askaansh.briefcaseai.org.
