Technology — LLM Machines

Can existing OpenAI API clients use LLM Machines?

Yes. The gateway exposes OpenAI-compatible endpoints so applications can point to the appliance instead of a public cloud API, while keeping authentication, logging and routing local.

Can the appliance run air-gapped?

Yes. Security-sensitive deployments can use offline license activation and local model registries so core inference, RAG and application surfaces work without public internet access.

Which models can run locally?

The architecture is designed for open-weight model families such as Llama, Mistral and Qwen, with model choice sized to your hardware, latency and quality requirements.

Where are connector credentials stored?

Connector credentials live in the on-box vault inside your environment. MCP servers and integration services use those credentials locally instead of sending them to our infrastructure.

What does Pure Mode do?

Pure Mode disables T2/T3 custom components and keeps the certified T1 core running. It is useful for incident response, support diagnosis and audit preparation.

What is inside the support boundary?

The signed certified core, tier model, manifest, gateway, inference services and documented T1/T2 components are supported. Client-built T3 extensions remain isolated from the SLA.

On-Prem AI Appliance Architecture

Reference architecture.

Edge / Gateway

Identity & SSO

App Surfaces · user-facing

Inference Gateway

Inference Servers

Tool / Integration Layer

Agentic Layer

Workflow & Orchestration

Data

Observability & Audit

Platform

Hardware · enterprise / industry-grade

Client BYO Sandbox

T1 / T2 / T3 with manifest enforcement.

Kill everything custom. Keep certified core running.

Vetted connectors out of the box.

Build against local AI like a standard API.

Swap the base URL.

Works with developer workflows.

Observable by default.

LiteLLM — Gateway & Router

LibreChat — User Interface

Knowledge RAG Layer — Retrieval Engine

Open Notebook — Research Agent

NemoClaw / OpenClaw — Agentic Framework

Microsoft Presidio — PII Anonymisation

SGLang — Inference Engine

LLM Machines — Integration Layer

Architecture questions.