DEV.co
Custom ChatGPT / LLM Apps

A branded AI assistant that knows your business.

Not a thin wrapper around an API key — a production LLM application grounded in your data, scoped to your use case, and wrapped in the guardrails, evals, and cost controls that keep it reliable.

OpenAI · Anthropic · open-weight models · streaming · tool calling · eval-first delivery

The demo took a weekend. Production takes engineering.

Anyone can wire a chat box to an LLM API. The gap between that demo and an assistant your customers or employees rely on is where most projects stall.

Production LLM apps need grounding so they don't make things up, guardrails so they stay on-topic and safe, streaming and latency work so they feel fast, cost controls so a viral day doesn't bankrupt you, and an eval harness so you can change the prompt without silently breaking quality. That's the work we do.

What a production LLM app actually includes.

01

Grounding

Retrieval over your docs, database, or APIs so answers are based on your data — not the model's training set.

02

Tool calling

The assistant can look things up and take actions through functions you control, with permissions.

03

Guardrails

Input/output filtering, topic boundaries, PII handling, and refusal behavior tuned to your risk tolerance.

04

Streaming UX

Token streaming, stop/regenerate, citations, and a chat interface that feels native to your product.

05

Evals

A golden dataset and automated scoring so prompt and model changes are measured, not guessed.

06

Cost + observability

Per-conversation cost tracking, caching, rate limits, and full request logging.
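As an illustration of per-conversation cost tracking with a spending cap, here is a minimal sketch. The `ConversationCostTracker` class, the price fields, and the budget policy are all assumptions made for the example, not a specific vendor's API; real per-token prices vary by provider and model.

```typescript
// Hypothetical per-1K-token prices in USD; real prices depend on vendor and model.
interface ModelPrice {
  inputPer1K: number;
  outputPer1K: number;
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

class ConversationCostTracker {
  private spentUsd = new Map<string, number>();
  private price: ModelPrice;
  private budgetUsd: number; // per-conversation cap (an assumed policy)

  constructor(price: ModelPrice, budgetUsd: number) {
    this.price = price;
    this.budgetUsd = budgetUsd;
  }

  // Record one model call and return the conversation's running total in USD.
  record(conversationId: string, usage: Usage): number {
    const cost =
      (usage.inputTokens / 1000) * this.price.inputPer1K +
      (usage.outputTokens / 1000) * this.price.outputPer1K;
    const total = (this.spentUsd.get(conversationId) ?? 0) + cost;
    this.spentUsd.set(conversationId, total);
    return total;
  }

  // True while the conversation has spent less than its budget.
  withinBudget(conversationId: string): boolean {
    return (this.spentUsd.get(conversationId) ?? 0) < this.budgetUsd;
  }
}
```

In a setup like this, the application would check `withinBudget` before each model call and log the value returned by `record` alongside the request, so cost shows up next to every conversation in the logs.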

Weekend wrapper vs. production app.

Concern | API wrapper | What we ship
Accuracy | Hallucinates on your data | Retrieval-grounded with citations
Safety | Whatever the base model does | Tuned guardrails + refusals
Cost | Unbounded | Caching, limits, per-user budgets
Quality over time | Drifts silently | Eval harness blocks regressions
Actions | Chat only | Permissioned tool calling

Ways to engage.

Prototype
2–3 weeks
from $18,000
  • Working grounded assistant
  • One data source
  • Deployed to a staging URL
Start a Prototype

Production App
6–10 weeks
from $60,000
  • Grounding + tool calling
  • Guardrails + eval harness
  • Cost controls + observability
  • 30-day support
Start a Build

Ongoing
monthly
from $9,000/mo
  • Eval review + tuning
  • Model migrations
  • New capabilities
Discuss Operations

Show, don't tell

Streaming, tools, and guardrails — wired in.

Not a chat box around an API key: permissioned tool calls, topic and PII guardrails, and grounded citations.

assistant.ts (TypeScript)
const stream = await assistant.run({
  messages,
  tools: [searchDocs, createTicket],          // permissioned actions
  guardrails: { topics: ["billing", "product"], pii: "redact" },
})
for await (const token of stream) {
  send(token)                                 // stream to the client
}
// every answer carries grounded, verifiable citations
Streamed reply
Your plan renews on Apr 12. Want me to switch
you to annual billing? [pricing.md]
tool_call: createTicket — awaiting approval

The model is treated as untrusted: tools are permissioned, output is validated, and anything irreversible waits for a human.
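One way to picture "permissioned, validated, and waiting for a human" is a dispatcher that sits between the model and your functions. The sketch below is a simplified assumption of how such a gate could look. `dispatchToolCall`, the `Tool` shape, and the scope strings are hypothetical names for illustration, not the interface of the `assistant` object shown above.

```typescript
type ToolResult =
  | { status: "executed"; output: string }
  | { status: "awaiting_approval" }
  | { status: "denied"; reason: string };

interface Tool {
  name: string;
  requiredScope: string;                 // permission the caller must hold
  irreversible: boolean;                 // irreversible actions wait for a human
  validate: (args: unknown) => boolean;  // reject malformed model output
  run: (args: unknown) => string;
}

interface Caller {
  scopes: string[];
}

// Decide what happens to a tool call proposed by the (untrusted) model.
function dispatchToolCall(
  tools: Map<string, Tool>,
  caller: Caller,
  call: { name: string; args: unknown },
  approvedByHuman: boolean,
): ToolResult {
  const tool = tools.get(call.name);
  if (!tool) {
    return { status: "denied", reason: "unknown tool" };
  }
  if (!caller.scopes.includes(tool.requiredScope)) {
    return { status: "denied", reason: "missing scope" };
  }
  if (!tool.validate(call.args)) {
    return { status: "denied", reason: "invalid arguments" };
  }
  if (tool.irreversible && !approvedByHuman) {
    return { status: "awaiting_approval" };
  }
  return { status: "executed", output: tool.run(call.args) };
}
```

The order of checks is the design choice worth noting: permission and argument validation run before the approval step, so a human is only ever asked to approve a call that is already allowed and well-formed.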

Quality that holds

An eval harness so you can change things safely.

The reason most LLM apps quietly degrade is that no one can measure quality. We ship a golden dataset and automated scoring from day one.

Swap the model, tune the prompt, change the retrieval — and know within minutes whether quality went up or down, before it reaches a customer.
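To make the idea concrete, here is a minimal sketch of a golden dataset and a regression gate. It assumes a synchronous `Assistant` function and uses simple phrase matching as the score; both are stand-ins chosen for brevity, and a real harness would likely call the model asynchronously and use richer scoring. The names `GoldenCase`, `score`, and `passesGate` are hypothetical.

```typescript
interface GoldenCase {
  question: string;
  mustInclude: string[]; // phrases a correct answer has to mention
}

// Simplified: a real assistant call would be asynchronous.
type Assistant = (question: string) => string;

// Fraction of golden cases whose answer contains every required phrase.
function score(assistant: Assistant, dataset: GoldenCase[]): number {
  if (dataset.length === 0) return 0;
  const passed = dataset.filter((c) => {
    const answer = assistant(c.question).toLowerCase();
    return c.mustInclude.every((phrase) => answer.includes(phrase.toLowerCase()));
  }).length;
  return passed / dataset.length;
}

// Allow a change only if the candidate scores no more than `tolerance` below the baseline.
function passesGate(
  baseline: Assistant,
  candidate: Assistant,
  dataset: GoldenCase[],
  tolerance: number = 0,
): boolean {
  return score(candidate, dataset) >= score(baseline, dataset) - tolerance;
}
```

Run in CI, a gate like this turns "did the new prompt make things worse?" into a pass/fail check on every prompt, model, or retrieval change.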

Start an LLM app

Common questions.

Which model should we use?
Whichever wins on your evals and budget. We benchmark frontier and open-weight models on your actual task rather than defaulting to one vendor.
Can we self-host the model?
Yes — for data-sensitivity or cost reasons we deploy open-weight models in your environment. See our private LLM service.
How do you stop it hallucinating?
Retrieval grounding, strict 'answer from context' prompting, citations, and an eval that catches regressions before they ship.
What about prompt injection?
Input sanitization, tool-permission scoping, and output validation. We treat the LLM as untrusted and design around it.
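For the hallucination answer above, "answer from context" prompting plus a citation check might look like the following sketch. The prompt wording, the `Passage` shape, and the bracketed `[source]` citation format are assumptions for illustration (the format mirrors the `[pricing.md]` citation in the streamed reply shown earlier). This is one possible shape, not a fixed recipe.

```typescript
interface Passage {
  source: string; // e.g. a file name such as "pricing.md"
  text: string;
}

// Build a prompt that restricts the model to the retrieved passages.
function buildGroundedPrompt(question: string, passages: Passage[]): string {
  const context = passages.map((p) => `[${p.source}]\n${p.text}`).join("\n\n");
  return [
    "Answer using only the context below.",
    "Cite the source of each claim in square brackets.",
    "If the context does not contain the answer, say you do not know.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Return cited sources that were never retrieved. A non-empty result means
// the answer cites something it was not given and should not be shown as-is.
function unknownCitations(answer: string, passages: Passage[]): string[] {
  const known = new Set(passages.map((p) => p.source));
  const unknown: string[] = [];
  const pattern = /\[([^\]]+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const source = match[1];
    if (source !== undefined && !known.has(source)) {
      unknown.push(source);
    }
  }
  return unknown;
}
```

Checking citations after generation is a form of output validation: the prompt asks the model to stay within the context, and the check confirms that what it cited was actually retrieved.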

Tell us what your assistant should do.

A 30-minute call on the use case, the data it needs, and the quality bar. We'll tell you honestly what it takes to ship.