local sovereign ai

Run your own LLMs.
Privately/cheaply/forever.

Enclave is a self-hosted inference stack for the boxes you already own. OpenAI-compatible API, GGUF backend, multi-agent workflows, retrieval pipeline. No telemetry. No cloud. No subscription.

7B inference
40–50 t/s
Models catalogued
18+
API surface
16 routers
Bytes leaving the host
0
v v1.1.0

What's new this release.

RELEASE NOTES
A

Architecture-aware dispatch

Detection
Per-host detection of memory topology (unified / single-GPU / multi-GPU) and deployment shape (DMG / Container / HostNative). Pluggable Architecture + Deployment protocols.
Dispatch
Tick-based parallel DAG dispatcher: each tick asks the detected arch which steps to run concurrently vs defer. Falls back to single-step-per-tick on workflows without depends_on.
B

Per-step keep_alive + telemetry

Resolution
Four-tier resolver: step config > workflow default > arch-detected default > OLLAMA_KEEP_ALIVE. Defaults are 30m on CPU (reload dominates), 0 on single-GPU NVIDIA (VRAM is scarce).
Observability
Ollama load / prompt-eval / eval durations captured per workflow step. Live snapshot at /api/system/architecture.
C

Installable Python package

Surface
Wheel + sdist attached to every release. Console scripts enclave, enclave-api, enclave-chat, enclave-workflow.
Install
pip install <wheel-url> — no PyPI account required.
D

GHCR mirror, tarball, n8n

Container
Every Docker Hub push mirrored to ghcr.io/hankthebldr/enclave. Linux source tarball with SHA-256 + SHA-512 sidecars on every stable release.
Automation
Importable n8n workflow under workflows/n8n/ drives release prep through a local Ollama instance — no cloud LLM calls.

Read the full CHANGELOG entry · explore the Wiki · jump to Download.

man.7 enclave

What it actually does, with no marketing fog.

SECTIONS 01—06
01

OpenAI-compatible

Synopsis
POST /v1/chat/completions
Description
Drop-in replacement. Point existing SDKs at localhost:8000; streaming, function-calling, and embeddings all wire through the same surface.
02

CPU-first inference

Backend
Ollama + GGUF (Q4_K_M default, Q5_K_M for quality, Q3_K_M for 70B+).
Throughput
7B ≈40-50 t/s · 13B ≈25-30 t/s · 34B ≈10-15 t/s on commodity x86.
03

Zero telemetry

Network
No outbound calls except model pulls you initiate. No analytics, no crash reports, no phone-home.
Audit
Source-available. Read the egress paths yourself.
04

Multi-agent workflows

Format
Declarative .yaml DAG with role-based model selection.
Engine
Jinja2 prompts, output parsers, quality gates, checkpoint/resume, six-phase hook lifecycle.
05

Retrieval pipeline

Stack
Chroma vector store, semantic chunker, document service.
Storage
Local. Embeddings never leave the box.
06

Native macOS app

Build
py2app + pywebview, DMG pipeline with CI boot-probe gate.
First run
Setup wizard installs Ollama and pulls a starter model. Linux supported via direct install + Docker stack.
07

Architecture-aware (1.3.0+)

Detection
Per-host memory topology (unified / single-GPU / multi-GPU) and deployment shape (DMG / Container / HostNative).
Dispatch
Tick-based parallel DAG dispatcher with feasibility validation. Ollama timings captured per step.
download

Grab the build. Run it locally.

v v1.1.0

macOS app

DMG · no fee
  • Native pywebview window
  • First-run setup wizard installs Ollama
  • Pulls a starter model on first launch
  • Local-only — no account required

Docker

Container · any platform
  • Two-container stack: ollama + api
  • Mirrored to Docker Hub and GHCR — no Hub account required
  • Trivy-scanned on every push
  • One-command boot via ./run.sh
Docker Hub

docker pull ghcr.io/hankthebldr/enclave:v1.1.0 · deployment guide

Python (pip)

Wheel · embed it
  • Installable wheel attached to every release
  • Console scripts: enclave, enclave-api
  • Use the engine inside your own service
  • No PyPI account required
Get the wheel

pip install <url>.whl · changelog

Source

Clone · build it yourself
  • Full source on GitHub
  • Python 3.12 / 3.13 supported
  • Linux + macOS install scripts
  • Source tarball + SHA-256 / SHA-512

First launch — gatekeeper note

The DMG ships unsigned today. After dragging to /Applications, run once to clear the quarantine flag:

xattr -dr com.apple.quarantine /Applications/Enclave.app

Then launch Enclave — the native window opens the first-run wizard, which installs Ollama and pulls a starter model.