Conversational AI is now genuinely good at the front of a clinical interaction — it listens to a person describe their symptoms, asks follow-ups, and sounds calm and human while doing it. That competence is seductive, and it invites a dangerous design: let the conversational model also make the triage call.
We do not believe that is safe. This thread is about an architectural answer — separate the conversation from the decision — and the auditability you buy by refusing to put both in the same model.

A model optimized to sound reassuring is being pulled by a force that has nothing to do with correct triage. When that same model also decides who is seen first, the pressure to be fluent and the obligation to be right are running in one set of weights — and you cannot tell them apart after the fact.
So the research question is architectural, not conversational: in a safety-critical decision system, where should the decision actually live, and what do you gain by refusing to put it in the same model that runs the conversation? Put plainly, it is a question of knowing which knob to turn. Fluency and correctness are different knobs; conflate them and you can no longer tune either one without moving the other.
Split the system into two cooperating parts with sharply different jobs. One owns the conversation. The other owns the decision and the state behind it. They are deliberately not the same system.
The voice frontend owns the human interaction. It listens and speaks in a natural voice, manages turn-taking, and keeps the person comfortable; it is optimized for fluency and latency. Crucially, it owns neither the decision nor the state: it feeds observations upstream.
The decision supervisor owns the triage decision and all of the state behind it. It is grounded in a recognized clinical triage protocol, so every decision is a decision within that protocol rather than a free-form generation. It decides what to ask next and what the disposition is, and it persists the record.
The frontend can be a strong realtime conversational model; the supervisor can be a different model entirely, chosen because it reasons well over structured protocol state. Provider and model are an implementation detail behind the split — the architecture is what carries the safety property, not any one model.
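A minimal Python sketch may make the division of labor concrete. Everything in it is an illustrative assumption: `Observation`, `TriageSupervisor`, and `VoiceFrontend` are hypothetical names, the two-step `PROTOCOL` is a toy placeholder rather than any real clinical triage protocol, and `to_observation` stands in for whatever speech understanding a real frontend would do. The point is only where things live: the frontend phrases questions and turns replies into structured observations, while the supervisor alone holds the protocol position, the answers, and the disposition.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy protocol, NOT clinical guidance: each step asks one yes/no
# question and maps the answer to the next step or to a final disposition.
PROTOCOL = {
    "step_1": {"question": "Is there a red-flag symptom?",
               "yes": "DISPO:urgent", "no": "step_2"},
    "step_2": {"question": "Has it lasted more than a week?",
               "yes": "DISPO:routine_soon", "no": "DISPO:self_care"},
}


@dataclass(frozen=True)
class Observation:
    """What the frontend sends upstream: a structured answer, never a decision."""
    step_id: str
    answer: bool


@dataclass
class TriageSupervisor:
    """Owns the protocol position, the answers so far, and the disposition."""
    current_step: str = "step_1"
    answers: dict = field(default_factory=dict)
    disposition: str | None = None

    def next_question(self) -> str | None:
        if self.disposition is not None:
            return None  # protocol finished; nothing left to ask
        return PROTOCOL[self.current_step]["question"]

    def observe(self, obs: Observation) -> None:
        if obs.step_id != self.current_step:
            raise ValueError("observation does not match the current protocol step")
        self.answers[obs.step_id] = obs.answer
        target = PROTOCOL[obs.step_id]["yes" if obs.answer else "no"]
        if target.startswith("DISPO:"):
            self.disposition = target.removeprefix("DISPO:")
        else:
            self.current_step = target


class VoiceFrontend:
    """Phrases questions and turns replies into observations; holds no decision state."""

    def say(self, question: str) -> str:
        return f"Thanks for bearing with me. {question}"  # fluency lives here

    def to_observation(self, step_id: str, utterance: str) -> Observation:
        # Stand-in for real speech understanding (assumption for illustration).
        return Observation(step_id, utterance.strip().lower().startswith("yes"))


sup = TriageSupervisor()
fe = VoiceFrontend()
for reply in ["no", "yes, about ten days"]:
    print(fe.say(sup.next_question()))
    sup.observe(fe.to_observation(sup.current_step, reply))
print(sup.disposition)  # routine_soon
```

Nothing in `TriageSupervisor` ever sees the frontend's phrasing; it receives only `Observation` values, which is one simple way to keep a "sound reassuring" objective out of the decision path.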
This is the heart of the thread, and the argument generalizes well beyond healthcare. Four properties fall out of the split — each one a thing you cannot get when the same model both talks and decides.
Because the supervisor decides within a protocol and owns the state, every disposition traces back to a protocol step. You can ask "why was this person routed here?" and get a grounded answer — not a post-hoc rationalization of a chat transcript.
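To illustrate what "traces back to a protocol step" could mean in practice, here is a hedged sketch in which the supervisor records every protocol transition it takes and builds its explanation from that record rather than from the transcript. `PROTOCOL`, `TraceEntry`, `run_protocol`, and `explain` are hypothetical names, and the protocol content is a toy placeholder, not clinical guidance.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical toy protocol: step -> (question, outcome-if-yes, outcome-if-no).
# Any outcome that is not itself a step is treated as a final disposition.
PROTOCOL = {
    "S1": ("Red-flag symptom present?", "urgent", "S2"),
    "S2": ("Symptom lasting over a week?", "routine_soon", "self_care"),
}


@dataclass(frozen=True)
class TraceEntry:
    step_id: str
    question: str
    answer: bool
    outcome: str  # next step id or final disposition


def run_protocol(answers: dict[str, bool]) -> tuple[str, list[TraceEntry]]:
    """Walk the protocol from S1, recording each step; return (disposition, trace)."""
    trace: list[TraceEntry] = []
    step = "S1"
    while step in PROTOCOL:
        question, if_yes, if_no = PROTOCOL[step]
        answer = answers[step]  # KeyError if a required answer is missing
        outcome = if_yes if answer else if_no
        trace.append(TraceEntry(step, question, answer, outcome))
        step = outcome
    return step, trace


def explain(disposition: str, trace: list[TraceEntry]) -> str:
    """Render the recorded steps; the explanation is the trace, not a re-generation."""
    lines = [f"{e.step_id}: {e.question} -> {'yes' if e.answer else 'no'} -> {e.outcome}"
             for e in trace]
    return "\n".join(lines + [f"Disposition: {disposition}"])


dispo, trace = run_protocol({"S1": False, "S2": True})
print(explain(dispo, trace))
# S1: Red-flag symptom present? -> no -> S2
# S2: Symptom lasting over a week? -> yes -> routine_soon
# Disposition: routine_soon
```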
A model optimizing to sound reassuring is under a subtle pressure unrelated to correct triage. Keeping the decision in a separate model that never sees "be reassuring" as an objective removes that failure mode by construction, not by hope.
When the conversational layer is stateless with respect to the decision, there is no ambiguity about what the system currently believes. The supervisor is the single source of truth — exactly the property you want when the cost of an inconsistent state is a missed urgent case.
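One way to make "single source of truth" hold structurally is to let only the supervisor write state and hand the frontend an immutable snapshot each turn. The sketch below assumes hypothetical names (`Snapshot`, `Supervisor`, `render_status`) and a toy one-line rule that is not clinical logic; it uses a frozen dataclass so an attempted write from the frontend side fails loudly.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Snapshot:
    """Read-only view of what the system currently believes."""
    findings: tuple[tuple[str, bool], ...]
    disposition: str | None


class Supervisor:
    """The only component that writes triage state."""

    def __init__(self) -> None:
        self._findings: dict[str, bool] = {}
        self._disposition: str | None = None

    def record(self, finding: str, value: bool) -> None:
        self._findings[finding] = value
        # Toy rule for illustration only, not a clinical one.
        if finding == "red_flag" and value:
            self._disposition = "urgent"

    def snapshot(self) -> Snapshot:
        # Hand out an immutable copy; callers cannot write back through it.
        return Snapshot(tuple(sorted(self._findings.items())), self._disposition)


def render_status(snap: Snapshot) -> str:
    """Frontend side: a pure function of the latest snapshot, keeping no copy."""
    if snap.disposition is not None:
        return "I'm going to connect you with someone now."
    return "Thanks, let's keep going."


sup = Supervisor()
sup.record("red_flag", True)
snap = sup.snapshot()
print(render_status(snap))
try:
    snap.disposition = None  # a frontend trying to "fix" the state
except FrozenInstanceError:
    print("snapshot is read-only; only the supervisor writes state")
```

Because the frontend rebuilds its view from `snapshot()` every turn, a reconnect or a swapped-in frontend cannot drift out of sync with what the supervisor believes.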
The voice frontend and the decision supervisor can be upgraded, swapped, or re-grounded on a new protocol without entangling the other. Conversation quality and decision quality are different engineering problems and should be solved separately.
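Independent evolution mostly comes down to a narrow interface between the two parts. As a sketch under stated assumptions, the two roles below are expressed as `typing.Protocol` interfaces; `Frontend`, `DecisionSupervisor`, `TableSupervisor`, `ScriptedFrontend`, `run_session`, and `TOY_TABLE` are all hypothetical names, and the table is a toy placeholder. Swapping the frontend means passing a different object with an `ask` method; re-grounding the supervisor means passing a different table.

```python
from __future__ import annotations

from typing import Protocol


class Frontend(Protocol):
    def ask(self, question: str) -> bool:
        """Put the question to the person; return their answer as a structured yes/no."""
        ...


class DecisionSupervisor(Protocol):
    def next_question(self) -> str | None: ...
    def observe(self, answer: bool) -> None: ...
    def disposition(self) -> str | None: ...


class TableSupervisor:
    """Supervisor driven entirely by a table: step -> (question, if_yes, if_no)."""

    def __init__(self, table: dict[str, tuple[str, str, str]], start: str) -> None:
        self._table = table
        self._step = start
        self._dispo: str | None = None

    def next_question(self) -> str | None:
        return None if self._dispo is not None else self._table[self._step][0]

    def observe(self, answer: bool) -> None:
        _, if_yes, if_no = self._table[self._step]
        nxt = if_yes if answer else if_no
        if nxt in self._table:
            self._step = nxt
        else:
            self._dispo = nxt  # anything that is not a step is a disposition

    def disposition(self) -> str | None:
        return self._dispo


class ScriptedFrontend:
    """Test stand-in for a voice frontend: replays canned yes/no answers."""

    def __init__(self, answers: list[bool]) -> None:
        self._answers = iter(answers)

    def ask(self, question: str) -> bool:
        return next(self._answers)


def run_session(frontend: Frontend, supervisor: DecisionSupervisor) -> str | None:
    """Couple the two parts only through the interfaces above."""
    while (q := supervisor.next_question()) is not None:
        supervisor.observe(frontend.ask(q))
    return supervisor.disposition()


# Hypothetical toy table, not a clinical protocol.
TOY_TABLE = {
    "S1": ("Any red-flag symptom?", "urgent", "S2"),
    "S2": ("Lasting more than a week?", "routine_soon", "self_care"),
}
print(run_session(ScriptedFrontend([False, False]), TableSupervisor(TOY_TABLE, "S1")))
```

The scripted frontend is there only to show the seam: a real voice frontend would implement the same `ask` method, and neither side would need to change when the other is replaced.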
The general principle is worth stating plainly: in a safety-critical system, the component that talks to the human and the component that makes the decision should be different components — and the decision component should be grounded in an explicit, auditable protocol rather than in open-ended generation.
Conflating them trades away the very auditability that makes the system deployable at all. The split is not extra machinery; it is the thing that lets you ship.
A model that is good at talking should not be the model that decides who is seen first. Separate the two, and a safety-critical system becomes one you can actually trust.
Two directions are open. First, generalizing the supervisor pattern beyond triage to other domains where a pleasant conversation must sit in front of a decision that has to be right and explainable. Second, tightening the protocol-grounding so the supervisor's reasoning is not just auditable after the fact but constrained to the protocol by design.
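One simple step from "auditable after the fact" toward "constrained by design" is to never let the supervisor act on an outcome the protocol does not permit at the current step. The sketch below assumes, hypothetically, that the supervisor model emits a score per candidate outcome; `ALLOWED` and `choose_outcome` are illustrative names, and routing anything unmatched to `"HUMAN_REVIEW"` is our assumption about how a fallback to the human might look. This masks outcomes after scoring; constraining the model's decoding itself would be a stronger variant and is not shown.

```python
from __future__ import annotations

# Hypothetical: for each protocol step, the outcomes the protocol permits there.
ALLOWED: dict[str, set[str]] = {
    "S1": {"urgent", "S2"},
    "S2": {"routine_soon", "self_care"},
}


def choose_outcome(step_id: str, scores: dict[str, float]) -> str:
    """Pick the highest-scoring outcome among those the protocol permits at this step.

    Off-protocol outcomes are never candidates, however high they score.
    If no permitted outcome was scored (or the step is unknown), defer to a human.
    """
    permitted = {o: s for o, s in scores.items() if o in ALLOWED.get(step_id, set())}
    if not permitted:
        return "HUMAN_REVIEW"
    return max(permitted, key=permitted.get)


# "urgent" scores highest but is not a permitted outcome at S2, so it is ignored.
print(choose_outcome("S2", {"urgent": 0.9, "self_care": 0.6, "routine_soon": 0.3}))
```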
The further we push the second, the closer the system gets to a property we care about across all of our work: the human keeps the gates and the judgment calls, and the machine provably stays inside its lane.

The supervisor pattern is the same instinct behind our voice-intake and conversational-forms products — a fluent front end wired to a system that owns the truth. The protocol-grounding work connects to our research on retrieval over regulatory corpora, where decisions must trace back to an explicit source. And the human-at-the-gates boundary is the through-line of our case work, from credit intelligence to clinical decisions.