How an AI system is assembled, served, and bounded — from the appliance level down to individual protocols.
- On-prem / on-premise
- Software or hardware that runs inside an organisation's own data centre or private cloud, rather than in a third-party public cloud. The deployment opposite of cloud-hosted SaaS.
- Appliance
- A pre-integrated hardware-plus-software unit shipped as a single product. Network firewalls, storage arrays, and our AI platform are all appliances — the customer doesn't assemble the components.
- API gateway
- A service sitting between clients and backend AI models, handling authentication, rate limiting, routing, logging and observability. In our stack: LiteLLM exposing OpenAI-compatible endpoints.
- Endpoint
- A specific URL at which an API can be called (e.g. `/v1/chat/completions`). "OpenAI-compatible endpoints" means the URLs and request format match what OpenAI's API expects, so existing clients work unchanged.
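A minimal sketch of what "OpenAI-compatible" means in practice: the request shape below is the standard OpenAI chat completion format, so any gateway that speaks it (such as LiteLLM) accepts it unchanged. The base URL and model name here are placeholders, not real deployment values.

```python
import json

def build_chat_request(base_url: str, model: str, user_text: str):
    """Build an OpenAI-compatible chat completion request.

    base_url and model are illustrative placeholders; any gateway
    exposing the OpenAI wire format accepts this same payload shape.
    """
    url = base_url.rstrip("/") + "/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return url, json.dumps(payload)

url, body = build_chat_request("http://gateway.local", "llama-3-70b", "Hello")
print(url)  # http://gateway.local/v1/chat/completions
```

Because the path and payload match OpenAI's API, existing client libraries only need their base URL pointed at the gateway.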
- RAG
- Retrieval-Augmented Generation. An architecture that retrieves relevant documents from a knowledge base and injects them into the model's prompt at query time, grounding responses in trusted data rather than the model's pretrained knowledge alone.
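The retrieve-then-inject pattern can be sketched in a few lines. This toy version scores documents by word overlap instead of vector similarity (a real stack would use the vector store and an embedding model), but the shape of the prompt assembly is the same.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Toy retrieval: rank documents by word overlap with the query.
    # A production RAG pipeline uses embeddings and a vector store instead.
    q = set(query.lower().split())
    return sorted(documents, key=lambda d: -len(q & set(d.lower().split())))[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    # Inject the retrieved documents into the prompt at query time,
    # grounding the model's answer in trusted data.
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "The appliance ships with vLLM as its inference engine.",
    "Pure Mode disables all T2 and T3 components.",
    "LiteLLM is the API gateway.",
]
print(build_prompt("Which inference engine does the appliance use?", docs))
```

The key property: the model answers from the injected context, not solely from its pretrained weights.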
- MCP
- Model Context Protocol. An open standard introduced by Anthropic that lets AI applications connect to external tools, data sources and services through a uniform interface — replacing ad-hoc integrations with one wire protocol.
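On the wire, MCP frames requests as JSON-RPC 2.0 messages; invoking a server-side tool uses the `tools/call` method with a tool name and structured arguments. A sketch of one such message (the tool name `search_kb` and its arguments are hypothetical, not part of the protocol):

```python
import json

def mcp_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    # MCP requests are JSON-RPC 2.0; tools/call invokes a named tool
    # on the connected server with structured arguments.
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

msg = mcp_tool_call(1, "search_kb", {"query": "quarterly revenue"})
```

Because every server speaks this one envelope, an AI application integrates once with the protocol rather than once per tool.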
- Tier model T1 / T2 / T3
- The LLM Machines component-tier system. T1 = certified core, host-privileged; T2 = restricted containers; T3 = sandboxed with no host access. Each component carries a signed tier badge enforced by the manifest.
- Pure Mode
- A one-click admin action on the appliance that disables every T2 and T3 component, leaving only the certified core running. Used during security incidents, audits or support diagnosis.
- Manifest enforcement
- A signed declaration of what each component is, which tier it belongs to, and what privileges it may request. The appliance refuses to load anything not on the manifest.
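The enforcement logic reduces to two checks: is the component on the manifest at all, and does it request only the privileges its entry declares. The manifest structure and component names below are hypothetical illustrations, not the appliance's real signed format.

```python
# Hypothetical manifest entries -- the real signed format is internal.
MANIFEST = {
    "litellm":    {"tier": "T1", "privileges": {"host"}},
    "webscraper": {"tier": "T3", "privileges": set()},
}

def may_load(component: str, requested_privileges: set) -> bool:
    entry = MANIFEST.get(component)
    if entry is None:
        return False  # not on the manifest: refuse to load
    # Grant only privileges the signed declaration allows.
    return requested_privileges <= entry["privileges"]

assert may_load("litellm", {"host"})          # T1 core, host-privileged
assert not may_load("webscraper", {"host"})   # T3 may not escalate
assert not may_load("unknown-plugin", set())  # unlisted: rejected outright
```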
- Certified core
- The set of components that ship with the appliance, are signed by LLM Machines, and run at tier T1. Includes the gateway, inference layer, vector store and orchestrator.
- Agent / agentic workflow
- An AI system that plans, decides and calls tools to accomplish a multi-step task — as opposed to a single prompt-response interaction. Agentic systems use tool calling and often run multiple LLM steps per user request.
- Tool calling / function calling
- An LLM capability where the model emits structured calls to external functions or APIs (e.g. search the database, send the email), enabling automation beyond pure text generation.
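The application's side of tool calling is a dispatcher: the model emits a structured call, and the host parses it and executes the matching local function. A minimal sketch, with a hypothetical `send_email` tool (the call payload here is hand-written where a real one would come from the model's response):

```python
import json

def send_email(to: str, subject: str) -> str:
    # Hypothetical tool implementation; a real one would hit a mail API.
    return f"queued email to {to}: {subject}"

TOOLS = {"send_email": send_email}

def dispatch(tool_call_json: str) -> str:
    # Parse the model's structured call and run the named function.
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](**call["arguments"])

result = dispatch(
    '{"name": "send_email", "arguments": {"to": "ops@example.com", "subject": "Alert"}}'
)
print(result)  # queued email to ops@example.com: Alert
```

An agentic workflow wraps this in a loop: the tool's result is fed back to the model, which decides the next step.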
- Workflow orchestration
- Coordinating multiple AI calls, tool executions and human approvals in a defined sequence. n8n and similar engines provide visual workflow building inside our appliance.
- Sandboxing
- Running code in an isolated environment with restricted system access. T3 components in our stack are sandboxed so they can't see other components' data or escape to the host.
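A deliberately simplified sketch of the idea: run untrusted code in a fresh process with an empty environment and a hard timeout. Real T3 sandboxing adds filesystem, network, and namespace isolation on top; this only illustrates the restricted-access principle.

```python
import subprocess
import sys

def run_sandboxed(code: str, timeout: int = 5) -> str:
    # Fresh process, empty environment, hard timeout. Python's -I flag
    # runs in isolated mode (no user site-packages, no env overrides).
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],
        env={}, capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout.strip()

print(run_sandboxed("print(2 + 2)"))  # 4
```

The guest process sees none of the host's environment variables and is killed if it exceeds the timeout; it cannot reach other components' data through inherited state.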
- High availability (HA)
- System design that tolerates component failure without service interruption, usually via redundancy and automatic failover. Required for enterprise SLAs.
- LiteLLM · AnythingLLM · vLLM · SGLang · open-webui · Presidio
- Stack components we ship at T1. LiteLLM is the API gateway; AnythingLLM is the RAG and chat surface; vLLM and SGLang are inference engines; open-webui is a chat UI; Microsoft Presidio handles PII detection and redaction.