Discovery and sizing.
A tailored deployment, not a SaaS sign-up. We size the appliance, install the stack, federate identity, wire connectors, validate workloads, train admins and hand off a live sovereign AI system.
The same deployment spine applies whether the appliance lands on-premises, in your private cloud, or in our Croatian data centre.
Map users, workloads, model needs, data sources, compliance constraints and expected capacity before hardware is finalised.
Rack and power the appliance, activate the license, provision storage, validate GPUs, test network throughput and run local inference smoke tests.
Federate to your identity provider, map admin/user/auditor roles and confirm access boundaries before broader rollout.
Wire approved sources such as docs, ticketing, repositories, chat and storage through the vetted connector catalog.
Run canonical tests across chat, API, RAG, agents, observability, audit logs and Pure Mode with real users.
Train admins and champions, document support boundaries, hand over runbooks and calendar the 30-day check-in.
Every component on the box belongs to one of three tiers, encoded in a signed manifest at install time and surfaced in the admin UI, telemetry, and support tickets.
T1, certified core: Firmware, OS, Kubernetes, inference gateway, inference servers, identity, bundled apps, curated connector catalog, blessed models. Runs with host privileges. We own the issues.
T2, partner tier: Connectors, models, and apps from our partner registry that have passed review. Runs in restricted containers. We own the integration surface; the partner owns its internal logic.
T3, client sandbox: Custom connectors, custom apps deployed to Kubernetes, custom models, custom workflows. Sandboxed, with no host access and no inbound network access unless explicitly granted. You build it, you own it.
Pure Mode: a one-click admin action that disables all T2 and T3 components. If an issue reproduces in Pure Mode, it's our ticket; if not, it routes to the client (or partner) with a clear "components involved" report. This is the contract that lets us scale support.
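The Pure Mode contract is essentially a routing rule over the tier manifest. The sketch below illustrates that rule in Python under stated assumptions: the `Tier`, `Component`, `pure_mode` and `route_ticket` names, the `owner` field and the example component names are all hypothetical, and the real signed manifest carries far more than this. It shows only the decision itself: reproduce with T2/T3 disabled, then assign ownership and list the components involved.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    T1 = 1  # certified core: vendor-owned, host privileges
    T2 = 2  # partner tier: restricted containers
    T3 = 3  # client sandbox: no host access


@dataclass(frozen=True)
class Component:
    name: str
    tier: Tier
    owner: str  # hypothetical field: "vendor", a partner name, or "client"


def pure_mode(components: list[Component]) -> list[Component]:
    """Return the components still running once Pure Mode disables T2 and T3."""
    return [c for c in components if c.tier == Tier.T1]


def route_ticket(
    components: list[Component], reproduces_in_pure_mode: bool
) -> tuple[list[str], list[str]]:
    """Return (owners, components_involved) for a reported issue."""
    if reproduces_in_pure_mode:
        # Only T1 was running, so the certified core (and the vendor) owns it.
        return ["vendor"], [c.name for c in pure_mode(components)]
    # The issue vanished in Pure Mode, so it depends on a disabled T2/T3 component.
    disabled = [c for c in components if c.tier != Tier.T1]
    owners = sorted({c.owner for c in disabled})
    return owners, [c.name for c in disabled]


stack = [
    Component("inference-gateway", Tier.T1, "vendor"),
    Component("partner-crm-connector", Tier.T2, "example-partner"),
    Component("custom-hr-app", Tier.T3, "client"),
]
print(route_ticket(stack, reproduces_in_pure_mode=False))
# (['client', 'example-partner'], ['partner-crm-connector', 'custom-hr-app'])
```

In practice the "components involved" report would be narrowed further by re-enabling components one at a time; this sketch stops at the first split between the vendor's tier and everyone else's.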
Every phase has explicit owners, exit criteria, and a "Tailored to: …" thread that names what gets customised for your stack. Nothing about the install is a surprise.
Discovery questionnaire: your IdP, network topology, target integrations, user count, model preferences and compliance constraints (HIPAA, SOC 2, FedRAMP, air-gap). Network pre-flight. Power and rack spec. License key bound to your hardware. A pre-flight call walks the questionnaire line by line. Sign-off in writing.
Tailored to: your IdP type, compliance regime, target integrations
Receiving inspection: serial matches manifest, no physical damage, tamper seals intact. Rack mount, redundant power, network uplinks (mgmt + data VLANs). BMC / out-of-band management configured. Field engineer verifies remote console access. Boot to the firmware check screen and confirm against the certified manifest.
Tailored to: your data-centre layout, network VLANs
First-boot wizard: hostname, time zone, NTP, initial admin credentials (rotated after IdP federation). License activation: online (signed token returned) or offline (signed bundle uploaded; the default for security-sensitive clients). Storage volumes provisioned. Self-test: GPU enumeration, NVMe health, network throughput, inference smoke test.
Tailored to: your security posture (online vs air-gapped)
TLS certificates installed (from your own CA, or issued over ACME by Let's Encrypt or an internal ACME server). DNS records for the chosen subdomain pointed at the box. Reverse proxy configured with routes for each app surface. Keycloak federated to your IdP via OIDC or SAML. SCIM provisioning verified with a real user/group sync. Role mapping confirmed: Admin, User, Auditor and Read-Only land in the right places.
Highest-value step: adoption stalls if this is wrong
Models loaded per the questionnaire (chat, code, embeddings), pulled from the on-box signed registry (no internet required for T1 models). SGLang / vLLM inference servers registered with the gateway. LiteLLM configured: model aliases, per-team rate limits, per-team budgets, audit logging on. Bundled apps online: LibreChat, the local RAG workspace, openclaw / nemoclaw. Optional add-ons (Continue, Tabby, Langfuse and workflow tooling) per scope.
Tailored to: your model preferences, per-team budgets
Connectors: Slack / Teams, email, source control (GitHub / GitLab / Bitbucket), issue tracking (Jira / Linear / Asana), docs (Notion / Confluence / SharePoint), CRM, storage. Each one is authenticated, scoped (read-only by default for the first 30 days) and smoke-tested with a canned prompt before sign-off. Anything not in the curated catalog is logged as a T3 candidate.
Tailored to: your existing tool stack
End-to-end tests across every surface: chat, IDE, inbound integration, outbound integration, governed workflow, agentic, observability, audit, Pure Mode toggle. Each test is driven by a real user from your team, not by our engineer. The test set is canonical, kept identical across every deployment so results are reproducible.
Real users, not engineers: the test must be reproducible
Admin training (60 min): user/group management, model registry, connector lifecycle, Pure Mode, audit export, update channels, backup/restore, the tier model. End-user kickoff (30 min, recorded). Client-specific runbook delivered with your values filled in. Support channel established. On-call escalation paths confirmed on both sides. Update strategy chosen.
Tailored to: your team's roles, your support workflow
30-day check-in, calendar-locked at handoff. Usage review: active users, prompts per day per surface, top use cases, idle surfaces. Connector review: anything failing, underused or missing. Tier-boundary review: has anyone built T3 components yet, and if so, are the support implications understood? Roadmap conversation. Outcome: adoption numbers are healthy, or there is a written remediation plan.
Continuous partnership, not a transactional handoff
Explicit ownership for every onboarding activity. R = Responsible (does the work) · A = Accountable (signs off) · C = Consulted · I = Informed.
| Activity | Us | Client IT | Tool Owner | End User |
|---|---|---|---|---|
| Pre-flight & questionnaire | A / R | R | C | I |
| Hardware rack & power | C | R / A | I | I |
| First boot & licensing | R / A | C | I | I |
| Network, TLS, DNS | C | R / A | I | I |
| IdP federation | R | A / R | I | I |
| Inference & app stack | R / A | C | I | I |
| Connector auth & scope | R | C | A / R | I |
| Validation tests | R | C | C | A |
| Admin training | R / A | A | I | I |
| End-user kickoff | R | I | I | A |
| Day-2 ops & T3 additions | C (T1/T2 only) | A / R | A / R | I |
What security, IT and business teams usually need to know before scheduling an on-prem AI rollout.
You need an agreed deployment mode, network details, identity provider access, approved data sources, admin contacts and pilot workloads for validation.
Yes. Security-sensitive deployments can use offline license activation, local model bundles and controlled connector setup without public internet dependency.
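Offline activation relies on a signed license bundle bound to the appliance's hardware, as described in the pre-flight and first-boot phases. A minimal sketch of that check follows, assuming hypothetical bundle fields (`customer`, `hardware`) and a fingerprint built from the chassis serial and GPU UUIDs. It uses HMAC-SHA256 from the Python standard library purely as a stand-in; a production scheme would more likely use an asymmetric signature so the appliance only ever holds a public key. The point it illustrates is that verification needs no network access.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret. HMAC is a stand-in chosen because it ships with
# Python; a real license scheme would likely use an asymmetric signature.
VENDOR_KEY = b"demo-key-not-for-production"


def _canonical(payload: dict) -> bytes:
    # Deterministic JSON so signer and verifier hash identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def hardware_fingerprint(serial: str, gpu_uuids: list[str]) -> str:
    """Hash of the identifiers the license is bound to (a hypothetical choice)."""
    material = serial + "|" + "|".join(sorted(gpu_uuids))
    return hashlib.sha256(material.encode()).hexdigest()


def sign_bundle(payload: dict, key: bytes = VENDOR_KEY) -> dict:
    """Vendor side: attach a signature over the canonical payload."""
    sig = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": sig}


def verify_bundle(bundle: dict, fingerprint: str, key: bytes = VENDOR_KEY) -> bool:
    """Appliance side: check the signature, then the hardware binding. No network call."""
    expected = hmac.new(key, _canonical(bundle["payload"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, bundle["signature"]):
        return False  # tampered, or signed with a different key
    return bundle["payload"].get("hardware") == fingerprint  # bound to this box?


fp = hardware_fingerprint("SN-0001", ["GPU-b", "GPU-a"])
bundle = sign_bundle({"customer": "example", "hardware": fp})
print(verify_bundle(bundle, fp))  # True
print(verify_bundle(bundle, hardware_fingerprint("SN-0002", ["GPU-a"])))  # False
```

Sorting the GPU UUIDs keeps the fingerprint stable regardless of enumeration order; swapping a GPU or moving the bundle to another chassis invalidates it.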
The stack is designed for standard enterprise identity federation through OIDC or SAML, with role mapping for admins, users, auditors and read-only access.
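Role mapping generally comes down to translating an IdP group claim into the appliance's four roles, with unmapped users receiving nothing. A minimal sketch, assuming the IdP emits a `groups` claim in an already-validated token and using hypothetical group names; in a real deployment this mapping is configured in Keycloak rather than written as application code.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "Admin"
    USER = "User"
    AUDITOR = "Auditor"
    READ_ONLY = "Read-Only"


# Hypothetical IdP group names, agreed during discovery.
GROUP_TO_ROLE = {
    "ai-platform-admins": Role.ADMIN,
    "ai-users": Role.USER,
    "compliance-auditors": Role.AUDITOR,
    "ai-viewers": Role.READ_ONLY,
}


def roles_for(groups: list[str]) -> set[Role]:
    """Map IdP groups to appliance roles; unknown groups grant nothing (deny by default)."""
    return {GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE}


def roles_from_claims(claims: dict) -> set[Role]:
    """Read the assumed `groups` claim; a missing claim means no roles."""
    return roles_for(claims.get("groups", []))


print(roles_from_claims({"sub": "alice", "groups": ["ai-users", "finance"]}))
# {<Role.USER: 'User'>}
```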
Connectors are chosen during discovery based on business value, data sensitivity, credential model and whether they belong in the certified core, partner tier or client sandbox.
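The tier part of that decision follows the rules stated earlier: curated-catalog connectors are T1, reviewed partner-registry connectors are T2, and anything else is logged as a T3 candidate. Connectors wired during onboarding also start read-only for their first 30 days. A minimal sketch of those two rules; the catalog contents, the function names and the `"as-approved"` label for scope after the window are hypothetical, and the business-value, sensitivity and credential-model judgements from discovery are not modelled.

```python
from datetime import date, timedelta

# Hypothetical catalog contents, for illustration only.
CURATED_CATALOG = {"slack", "jira", "confluence", "github"}
PARTNER_REGISTRY = {"example-crm"}  # partner connectors that passed review
READ_ONLY_WINDOW = timedelta(days=30)


def classify(connector: str) -> str:
    """Place a requested connector into a support tier."""
    if connector in CURATED_CATALOG:
        return "T1"  # certified core: vendor-supported
    if connector in PARTNER_REGISTRY:
        return "T2"  # partner tier: restricted container
    return "T3"  # not catalogued: logged as a client-owned T3 candidate


def default_scope(connected_on: date, today: date) -> str:
    """Read-only for the first 30 days, then whatever scope was approved."""
    if today - connected_on < READ_ONLY_WINDOW:
        return "read-only"
    return "as-approved"  # hypothetical label for the agreed scope


for name in ("jira", "example-crm", "in-house-erp"):
    print(name, classify(name))
print(default_scope(date(2024, 1, 1), date(2024, 1, 15)))  # read-only
```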
A good pilot has real users, clear success criteria, representative internal data and enough risk to test governance without blocking production operations.
Discovery call, sized appliance spec, and a 4–6 week path to live. Same team end-to-end.