How an AI system is assembled, served, and bounded — from the appliance level down to individual protocols.
- On-prem / on-premise
- Software or hardware that runs inside an organisation's own data centre or private cloud, rather than in a third-party public cloud. The deployment opposite of cloud-hosted SaaS.
- Appliance
- A pre-integrated hardware-plus-software unit shipped as a single product. Network firewalls, storage arrays, and our AI platform are all appliances — the customer doesn't assemble the components.
- API gateway
- A service sitting between clients and backend AI models, handling authentication, rate limiting, routing, logging and observability. In our stack: LiteLLM exposing OpenAI-compatible endpoints.
- Endpoint
- A specific URL at which an API can be called (e.g. `/v1/chat/completions`). "OpenAI-compatible endpoints" means the URLs and request format match what OpenAI's API expects, so existing clients work unchanged.
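A minimal sketch of what "OpenAI-compatible" means in practice: the request shape below is the standard OpenAI chat completion format, so any gateway that speaks it (such as LiteLLM) accepts it unchanged. The base URL and model name here are placeholders, not real deployment values.

```python
import json

def build_chat_request(base_url: str, model: str, user_text: str):
    """Build an OpenAI-compatible chat completion request.

    base_url and model are illustrative placeholders; any gateway
    exposing the OpenAI wire format accepts this same payload shape.
    """
    url = base_url.rstrip("/") + "/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return url, json.dumps(payload)

url, body = build_chat_request("http://gateway.local", "llama-3-70b", "Hello")
print(url)  # http://gateway.local/v1/chat/completions
```

Because the path and payload match OpenAI's API, existing client libraries only need their base URL pointed at the gateway.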
- RAG
- Retrieval-Augmented Generation. An architecture that retrieves relevant documents from a knowledge base and injects them into the model's prompt at query time, grounding responses in trusted data rather than the model's pretrained knowledge alone.
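The retrieve-then-inject pattern can be sketched in a few lines. This toy version scores documents by word overlap instead of vector similarity (a real stack would use the vector store and an embedding model), but the shape of the prompt assembly is the same.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Toy retrieval: rank documents by word overlap with the query.
    # A production RAG pipeline uses embeddings and a vector store instead.
    q = set(query.lower().split())
    return sorted(documents, key=lambda d: -len(q & set(d.lower().split())))[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    # Inject the retrieved documents into the prompt at query time,
    # grounding the model's answer in trusted data.
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "The appliance ships with vLLM as its inference engine.",
    "Pure Mode disables all T2 and T3 components.",
    "LiteLLM is the API gateway.",
]
print(build_prompt("Which inference engine does the appliance use?", docs))
```

The key property: the model answers from the injected context, not solely from its pretrained weights.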
- MCP
- Model Context Protocol. An open standard introduced by Anthropic that lets AI applications connect to external tools, data sources and services through a uniform interface — replacing ad-hoc integrations with one wire protocol.
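On the wire, MCP frames requests as JSON-RPC 2.0 messages; invoking a server-side tool uses the `tools/call` method with a tool name and structured arguments. A sketch of one such message (the tool name `search_kb` and its arguments are hypothetical, not part of the protocol):

```python
import json

def mcp_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    # MCP requests are JSON-RPC 2.0; tools/call invokes a named tool
    # on the connected server with structured arguments.
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

msg = mcp_tool_call(1, "search_kb", {"query": "quarterly revenue"})
```

Because every server speaks this one envelope, an AI application integrates once with the protocol rather than once per tool.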
- Tier model T1 / T2 / T3
- The LLM Machines component-tier system. T1 = certified core, host-privileged; T2 = restricted containers; T3 = sandboxed with no host access. Each component carries a signed tier badge enforced by the manifest.
- Pure Mode
- A one-click admin action on the appliance that disables every T2 and T3 component, leaving only the certified core running. Used during security incidents, audits or support diagnosis.
- Manifest enforcement
- A signed declaration of what each component is, which tier it belongs to, and what privileges it may request. The appliance refuses to load anything not on the manifest.
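The enforcement logic reduces to two checks: is the component on the manifest at all, and does it request only the privileges its entry declares. The manifest structure and component names below are hypothetical illustrations, not the appliance's real signed format.

```python
# Hypothetical manifest entries -- the real signed format is internal.
MANIFEST = {
    "litellm":    {"tier": "T1", "privileges": {"host"}},
    "webscraper": {"tier": "T3", "privileges": set()},
}

def may_load(component: str, requested_privileges: set) -> bool:
    entry = MANIFEST.get(component)
    if entry is None:
        return False  # not on the manifest: refuse to load
    # Grant only privileges the signed declaration allows.
    return requested_privileges <= entry["privileges"]

assert may_load("litellm", {"host"})          # T1 core, host-privileged
assert not may_load("webscraper", {"host"})   # T3 may not escalate
assert not may_load("unknown-plugin", set())  # unlisted: rejected outright
```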
- Certified core
- The set of components that ship with the appliance, are signed by LLM Machines, and run at tier T1. Includes the gateway, inference layer, vector store and orchestrator.
- Agent / agentic workflow
- An AI system that plans, decides and calls tools to accomplish a multi-step task — as opposed to a single prompt-response interaction. Agentic systems use tool calling and often run multiple LLM steps per user request.
- Tool calling / function calling
- An LLM capability where the model emits structured calls to external functions or APIs (e.g. search the database, send the email), enabling automation beyond pure text generation.
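The application's side of tool calling is a dispatcher: the model emits a structured call, and the host parses it and executes the matching local function. A minimal sketch, with a hypothetical `send_email` tool (the call payload here is hand-written where a real one would come from the model's response):

```python
import json

def send_email(to: str, subject: str) -> str:
    # Hypothetical tool implementation; a real one would hit a mail API.
    return f"queued email to {to}: {subject}"

TOOLS = {"send_email": send_email}

def dispatch(tool_call_json: str) -> str:
    # Parse the model's structured call and run the named function.
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](**call["arguments"])

result = dispatch(
    '{"name": "send_email", "arguments": {"to": "ops@example.com", "subject": "Alert"}}'
)
print(result)  # queued email to ops@example.com: Alert
```

An agentic workflow wraps this in a loop: the tool's result is fed back to the model, which decides the next step.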
- Workflow orchestration
- Coordinating multiple AI calls, tool executions and human approvals in a defined sequence. n8n and similar engines provide visual workflow building inside our appliance.
- Sandboxing
- Running code in an isolated environment with restricted system access. T3 components in our stack are sandboxed so they can't see other components' data or escape to the host.
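A deliberately simplified sketch of the idea: run untrusted code in a fresh process with an empty environment and a hard timeout. Real T3 sandboxing adds filesystem, network, and namespace isolation on top; this only illustrates the restricted-access principle.

```python
import subprocess
import sys

def run_sandboxed(code: str, timeout: int = 5) -> str:
    # Fresh process, empty environment, hard timeout. Python's -I flag
    # runs in isolated mode (no user site-packages, no env overrides).
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],
        env={}, capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout.strip()

print(run_sandboxed("print(2 + 2)"))  # 4
```

The guest process sees none of the host's environment variables and is killed if it exceeds the timeout; it cannot reach other components' data through inherited state.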
- High availability (HA)
- System design that tolerates component failure without service interruption, usually via redundancy and automatic failover. Required for enterprise SLAs.
- LiteLLM · AnythingLLM · vLLM · SGLang · open-webui · Presidio
- Stack components we ship at T1. LiteLLM is the API gateway; AnythingLLM is the RAG and chat surface; vLLM and SGLang are inference engines; open-webui is a chat UI; Microsoft Presidio handles PII detection and redaction.