LiteLLM — Gateway & Router
Unified endpoint for all LLM providers and local models. Usage tracking, rate limiting, cost control.
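As a rough illustration of what "one endpoint with cost control" means in practice, the sketch below routes a model alias to a backend and rejects a request once a team's budget is exhausted. It is a toy model only, not LiteLLM's API: `TeamBudget`, `Gateway`, the route names, and the dollar figures are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TeamBudget:
    """Hypothetical per-team spend record (not a LiteLLM class)."""
    limit_usd: float
    spent_usd: float = 0.0

    def charge(self, cost_usd: float) -> bool:
        """Record the cost and return True only if it fits the remaining budget."""
        if self.spent_usd + cost_usd > self.limit_usd:
            return False
        self.spent_usd += cost_usd
        return True


@dataclass
class Gateway:
    """Toy router: maps a model alias to a backend and enforces team budgets."""
    routes: dict[str, str]
    budgets: dict[str, TeamBudget] = field(default_factory=dict)

    def dispatch(self, team: str, model: str, est_cost_usd: float) -> str:
        """Return the backend name for the request, or raise if it is not allowed."""
        if model not in self.routes:
            raise KeyError(f"no route for model {model!r}")
        budget = self.budgets.get(team)
        if budget is None or not budget.charge(est_cost_usd):
            raise PermissionError(f"team {team!r} is unknown or over budget")
        return self.routes[model]


# Assumed example data: two local backends and one team with a 1.00 USD budget.
gateway = Gateway(
    routes={"chat-default": "local-vllm", "code": "local-sglang"},
    budgets={"support": TeamBudget(limit_usd=1.0)},
)
first_backend = gateway.dispatch("support", "chat-default", est_cost_usd=0.75)
```

A second request costing 0.50 USD from the same team would raise `PermissionError`, because it would push spend past the limit.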
One appliance. Twelve layers of certified core. A signed manifest enforces what runs where — and Pure Mode keeps anything custom from blocking your SLA.
External entities at the top. The appliance below. The support boundary line cleanly separates certified core from anything you build in T3.
Developers · analysts · support · legal · operations.
Browser · IDE · Slack/Teams · Email · CLI.
Your existing IdP — we federate, never replace.
e.g., Okta · Azure AD · Google · Ping.
Whatever you already use — chat, source control, ticketing, docs, CRM, mail, storage.
TLS termination · reverse proxy · routing · rate limiting
Traefik · Kong · NGINX
Federated to your IdP via OIDC / SAML — never replaces it · SCIM user provisioning · role mapping (Admin / User / Auditor / Read-Only)
Keycloak · Authentik · Zitadel
OpenAI & Anthropic-compatible API · model routing · per-team budgets · audit logging
LiteLLM
High-throughput model serving · chat · code · embeddings · client fine-tunes · loaded from on-box signed registry
SGLang · vLLM
Vetted MCP catalog (T1) + verified partner connectors (T2). All credentials stay in the on-box vault and never leave the appliance.
MCP servers · chat · source control · ticketing · CRM · docs · …
Agent runtimes for multi-step tasks · default catalog of agents we configure · client-extensible in T3
OpenClaw · NemoClaw
Citizen-developer automation + scheduled background workflows
n8n
Vector + RAG store inside knowledge workspace · object storage · cache · optional dedicated DB by agreement
AnythingLLM-managed vectors · MinIO · Redis · (Postgres + pgvector)
LLM tracing · metrics · logs — fully on-prem. No telemetry leaves the box.
Langfuse · Grafana · Loki · Prometheus
Container orchestration · VM management · OS · out-of-band management · signed-update + license daemons
Kubernetes · Portainer · Proxmox · Linux · BMC
Compute · memory · storage · network · power · physical security
Supermicro GPU(s) · CPU · NVMe · 25 / 100 GbE NIC · redundant PSU · TPM · tamper sensors
Custom apps · custom connectors · custom workflows · client-trained models
No host privileges · egress allowlist · isolated secrets · outage here never blocks T1
Defined by you, on your clock — outside our SLA
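To make the T3 egress allowlist concrete, here is a minimal sketch of the check such a policy implies: an outbound URL is permitted only if it uses HTTPS and its exact hostname is listed. The hostnames and the `egress_allowed` helper are hypothetical, and a real appliance would presumably enforce this at the network layer rather than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for one T3 component; the hostnames are placeholders.
EGRESS_ALLOWLIST = frozenset({"api.example.com", "files.example.com"})


def egress_allowed(url: str, allowlist: frozenset[str] = EGRESS_ALLOWLIST) -> bool:
    """Return True only for HTTPS URLs whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in allowlist
```

Matching on the exact hostname, rather than on a substring or suffix, keeps look-alike domains such as `api.example.com.evil.net` from slipping through.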
Every component is signed and labelled. T1 runs with host privileges. T2 in restricted containers. T3 sandboxed with no host access. The admin UI shows tier badges next to every installed component — never ambiguous, never argued.
One-click admin action that disables every T2/T3 component. Use it for security incidents, support diagnosis ("if it reproduces in Pure Mode it's our ticket"), or to keep an audit clean.
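The tier model and Pure Mode can be pictured with a small sketch: each component carries a tier label, each tier maps to an isolation level, and Pure Mode keeps only T1 running. The class names, component names, and data layout below are assumptions for illustration and do not reflect the actual manifest format.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    T1 = "certified core"
    T2 = "verified partner"
    T3 = "client-built"


# Isolation level per tier, as described in the text above.
ISOLATION = {
    Tier.T1: "host privileges",
    Tier.T2: "restricted container",
    Tier.T3: "sandbox, no host access",
}


@dataclass(frozen=True)
class Component:
    """Hypothetical manifest entry: a component name plus its tier badge."""
    name: str
    tier: Tier


def enabled_components(components: list[Component], pure_mode: bool) -> list[Component]:
    """With Pure Mode on, every T2/T3 component is disabled and only T1 remains."""
    if not pure_mode:
        return list(components)
    return [c for c in components if c.tier is Tier.T1]


# Assumed example install: one core service, one partner connector, one custom app.
installed = [
    Component("LiteLLM", Tier.T1),
    Component("partner-crm-connector", Tier.T2),
    Component("custom-report-app", Tier.T3),
]
pure = enabled_components(installed, pure_mode=True)
```

If an issue still reproduces with only the `pure` set running, it sits inside the certified core, which is the diagnostic use described above.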
Chat, source control, ticketing, docs, CRM, mail, storage — all wired through curated MCP servers. Every credential lives in your on-box vault. Nothing leaves the appliance.
A polished, ChatGPT-like interface for all end users. No training required.
Document ingestion, vector search, and retrieval-augmented generation for enterprise knowledge bases.
AI-powered research and knowledge synthesis. Deep-dive reports generated automatically.
Autonomous agents for complex, multi-step enterprise workflows.
Automatic detection and redaction of sensitive data before it ever reaches a model.
A high-performance engine for running open-weight models locally — pure OSS, no NVIDIA AI Enterprise tax.
The connective tissue that turns these projects into a single, deployable, production-ready appliance. The signed manifest, the tier model, the support boundary, the runbook.
See how the technology lands inside your environment: explore onboarding and pricing, or just talk to us.