AI OS Finance

2025 AI ENGINEERING · FINTECH OPENAI · FASTAPI · N8N · PYDANTIC

Deterministic DCF engine with AI-augmented research — LLM never touches the math, only the narrative

A finance-grade system where the separation between AI and computation is an architectural constraint enforced by agent boundaries — not a guardrail bolted on afterwards. The LLM has no access to the computation layer, only to its validated output.

specialised agents with typed boundaries

execution modes — MOCK and PROD

LLM calls in the valuation computation path

100%

agent responses as typed Pydantic envelopes

Replace the spreadsheet without introducing the opposite failure mode: numbers that look plausible but can't be audited.

Financial analysts run valuation models — DCF, comparables, sensitivity tables — manually in spreadsheets. The workflow is slow to iterate on, impossible to version, and entirely disconnected from the research context that informs the assumptions. Replacing it with an LLM introduces the opposite failure mode: models that produce numbers that look plausible but aren't deterministically reproducible and can't be audited back to their inputs.

A finance-grade system needs both: AI-assisted research and interpretation, with deterministic computation underneath it that an analyst can verify line by line. The hard part is enforcing that separation architecturally — not as a comment in the code, but as a boundary the LLM physically cannot cross.

Four specialised agents, typed Pydantic boundaries, zero LLM involvement in the computation path.

The pipeline runs four agents behind FastAPI endpoints: a Research Agent (interprets intent, can call web search), a Data Engineering Agent (normalises incoming dataset context), and a two-stage finance pair — finance_v1 builds the valuation model scaffold and finance_v2 interprets it. Every agent returns a typed Pydantic AgentResponse envelope (status · agent · data · errors · metadata), and every endpoint declares a Pydantic request model, so malformed requests are rejected at the API boundary before an agent runs.
The deterministic computation layer — calculate_dcf and the scenario analyzer — runs pure Python with no LLM involvement at any point. It enforces its own explicit preconditions rather than trusting upstream data: it refuses to run if WACC ≤ terminal growth, requires at least one of shares-outstanding or net-debt, and normalises a scalar growth rate into a validated per-year vector. Outputs (projection table, valuation summary, named-scenario comparison) are exported as CSV for Excel/BI compatibility.
The finance_v2 interpreter then receives the model scaffold and generates the written investment narrative — its system prompt hard-constrains it to the provided scaffold: it must not invent numbers, must not modify inputs, and must state missing values explicitly. It reasons over the numbers, not around them.
n8n acts as the external control plane, orchestrating the agent sequence, triggering data fetches, handling retry logic and error routing, and assembling the final report. FastAPI exposes the computation layer as a service that n8n calls — orchestration and computation stay decoupled.

Why these choices — each enforces the AI/computation separation as a hard constraint, not a soft guideline.

LLMs for narrative only — deterministic Python for all math. Financial calculations (DCF and scenario analysis) are deterministic Python. LLMs are invoked only to interpret results and generate written analysis — never to produce numbers. This isn't a guardrail; it's an architectural constraint: the math lives in calculate_dcf, which no agent's LLM call can reach — the interpreter only ever sees the computed scaffold.
n8n for workflow orchestration over custom DAG code. The data fetch, model trigger, and report assembly steps change frequently as the analytical workflows evolve. A visual DAG in n8n is faster to iterate on than Python scheduler code, makes the workflow auditable without reading source, and handles retry logic and error routing without custom implementation.
Typed boundaries via Pydantic. Every agent returns a Pydantic AgentResponse envelope, and every FastAPI endpoint declares a Pydantic request model — so a malformed request is rejected at the API boundary before any agent runs. The deterministic layer then adds its own explicit numeric guards on top (WACC > terminal growth; a required equity input), because a well-typed request can still carry economically invalid numbers.
MOCK mode as a first-class execution mode. All LLM calls route through a single call_llm entrypoint; in MOCK mode every one of them is swapped for a deterministic response, making the pipeline fully testable without API keys or spend. The pure-Python computation path is identical in both modes. CI runs against MOCK exclusively.

Full Docker Compose stack · n8n + FastAPI + OpenAI SDK · MOCK and PROD modes.

specialised agents with typed I/O

LLM calls in computation path

execution modes (MOCK / PROD)

100%

agent responses typed (Pydantic envelope)

What broke during build — and what I changed to fix it.

The interpreter drifted from the model output when its prompt was too open-ended. An early prompt asked the interpreter to "summarise the DCF analysis" and it paraphrased — occasionally contradicting — the actual numbers. The fix hardened the finance_v2 system prompt into a contract: use only the provided scaffold as the source of truth, never invent numbers or assumptions, never modify inputs, and state anything missing explicitly. The LLM interprets; it cannot invent.
Deterministic math still needs explicit failure modes — silence is the dangerous default. An early DCF run with WACC below the terminal growth rate produced a plausible-looking but mathematically nonsensical enterprise value (the Gordon-growth denominator goes negative). The fix: calculate_dcf raises on WACC ≤ terminal growth, on a missing revenue input, and when neither shares-outstanding nor net-debt is supplied — the computation fails loudly at the input boundary rather than emitting a confident wrong number.
An agent could loop on tool calls with no ceiling. The router originally let an agent request a tool, get a result, and request again — an unbounded loop that could run up latency and spend. The fix: the orchestrator enforces a single tool-call maximum — one tool request, one follow-up call with the result, then a final answer. No unbounded loops.
The MOCK response matcher was order-sensitive in a way that silently broke tests. MOCK responses are selected by substring-matching the prompt, and the finance_v2 "interpret" branch had to be checked before the finance_v1 branch — the strings "dcf"/"valuation" in a v1 request would otherwise swallow a v2 interpret request and return the wrong shape. The fix pinned the interpreter branch first (marked "MUST BE FIRST" in the code) so MOCK mode stays faithful to what PROD would return.

LLMs for narrative.
Python for the math.

Deterministic DCF engine with AI-augmented research — LLM never touches the math, only the narrative

Replace the spreadsheet without introducing the opposite failure mode: numbers that look plausible but can't be audited.

Four specialised agents, typed Pydantic boundaries, zero LLM involvement in the computation path.

Why these choices — each enforces the AI/computation separation as a hard constraint, not a soft guideline.

Full Docker Compose stack · n8n + FastAPI + OpenAI SDK · MOCK and PROD modes.

What broke during build — and what I changed to fix it.

How a DCF analysis
actually runs.

Run a DCF analysis.
Watch the agents.

ai-os-finance / data → valuation → narrative

AI OS Finance

LLMs for narrative.Python for the math.

Deterministic DCF engine with AI-augmented research — LLM never touches the math, only the narrative

Replace the spreadsheet without introducing the opposite failure mode: numbers that look plausible but can't be audited.

Four specialised agents, typed Pydantic boundaries, zero LLM involvement in the computation path.

Why these choices — each enforces the AI/computation separation as a hard constraint, not a soft guideline.

Full Docker Compose stack · n8n + FastAPI + OpenAI SDK · MOCK and PROD modes.

What broke during build — and what I changed to fix it.

How a DCF analysisactually runs.

Run a DCF analysis.Watch the agents.

ai-os-finance / data → valuation → narrative

LLMs for narrative.
Python for the math.

How a DCF analysis
actually runs.

Run a DCF analysis.
Watch the agents.