QIServer Private AI Node
Purpose
QIServer is the current machine-local QiOS compute node for private AI access, local model execution, and future ingestion/runtime work. It exists to offload AI and document-processing workload from the primary workstation and to serve as the first live QiOS local runtime host.
Why This Node Exists
This node is the practical execution host for the local-runtime side of QiOS:
- private AI chat access
- local model hosting
- future ingestion and extraction
- future chunking and embeddings
- future local control-plane endpoints
- future graph projection support
It is not the canonical source of truth for records. Canonical truth remains in the cloud-side data model. Derived layers remain downstream.
Current Role
Active now
- Tailscale-connected private server
- Ollama running on host
- Open WebUI running in Docker
- Neo4j running in Docker
- Tailscale Serve publishing Open WebUI privately to the tailnet
Planned next
- local Python API
- inbox watcher
- extraction pipeline
- deterministic chunking
- embedding worker
- Conditional/Future: Supabase/pgvector upsert
- Neo4j projection
- queue/retry/status endpoints
Runtime Position in QiOS
QIServer implements the local runtime side of QiOS, not the canonical cloud data layer.
Local runtime responsibilities
- file watcher
- ingest pipeline
- OCR
- extraction
- chunking
- embeddings
- local API
- machine-local state
Cloud-side responsibilities
- canonical metadata
- retrieval index in pgvector
- app-facing APIs
- review surfaces
- Conditional/Future: tenant-aware data serving
Boundary rule
Local runtime writes registrations, metadata, and embeddings outward. Cloud runtime serves application surfaces. Graph, vector, and AI layers do not become canonical truth.
Current Access
Private access surface
- Open WebUI:
https://qiserver-1.cerberus-sirius.ts.net
Machine identity
- OS hostname:
qiserver - Tailscale machine:
qiserver-1 - Tailscale IPv4:
100.121.111.106
Current Local Services
Host service
- Ollama API:
http://127.0.0.1:11434
Docker services
- Open WebUI:
http://127.0.0.1:3000 - Neo4j Browser:
http://127.0.0.1:7474 - Neo4j Bolt:
bolt://127.0.0.1:7687
Current Service Topology
Host
- Ubuntu server
- Tailscale
- Ollama
Containers
- Open WebUI
- Neo4j
Network pattern
- services bind locally
- Tailscale Serve proxies Open WebUI privately to the tailnet
- raw backend services are not publicly exposed
Current Paths
Server paths
- QiOS root:
/srv/qios - compose:
/srv/qios/compose - server runbook:
/srv/qios/docs/000_RUN_ME_FIRST.md
Data paths
- data root:
/srv/qidata - inbox:
/srv/qidata/inbox - processing:
/srv/qidata/processing - reviewed:
/srv/qidata/reviewed - failed:
/srv/qidata/failed - manifests:
/srv/qidata/manifests - extracted text:
/srv/qidata/extracted_text - embeddings cache:
/srv/qidata/embeddings_cache - logs:
/srv/qidata/logs - model cache:
/srv/qidata/model_cache - exports:
/srv/qidata/exports
Current Models
Chat model
llama3.2:latest
Embedding model
embeddinggemma:latest
Model Rule
llama3.2 is the chat/default interaction model.
embeddinggemma is not a general chat default. It is reserved for embedding, retrieval, and vectorization work.
Architectural Constraints
- This node is compute infrastructure, not canonical record authority.
- Graph and vector outputs are derived.
- No ingestion flow should bypass canonical registration.
- No downstream layer should redefine identity.
- Runtime memory for operation lives both here and in the machine-local runbook.
Relationship to Existing Blueprint Sections
This node operationalizes the following already-defined blueprint concepts:
- local runtime
- local API
- embeddings as local subprocess
- Neo4j as derived graph
- Local Admin Control Plane
- Spine milestone: local inbox -> registration -> extraction -> embedding -> retrieval
This file does not replace those sections. It records the concrete live node now implementing them.
Operational Re-entry
If operator context is lost, begin with:
/srv/qios/docs/000_RUN_ME_FIRST.mdbash /usr/local/bin/qiserver-status
Immediate Next Build
- create local Python API structure
- implement
/status,/queue,/ingest,/retry - build inbox watcher against
/srv/qidata/inbox - extract text into canonical pipeline flow
- chunk deterministically
- call Ollama embeddings endpoint
- upsert to canonical retrieval layer
- project derived graph records into Neo4j
Change Log
2026-04-19
- Ubuntu server brought online
- Tailscale configured
- Docker installed and running
- Ollama installed and serving locally
llama3.2pulledembeddinggemmapulled- Open WebUI deployed in Docker
- Neo4j deployed in Docker
- Tailscale Serve configured for private Open WebUI access
- server-local operator runbook created
- status command created
Active Runtime
- qiserver is the current active runtime.
Path Doctrine
/srv/qios/repos: For cloned Git repos and coding work./srv/qios/stacks: For Docker Compose runtime stacks. Do not create nested Git repos inside/srv/qios/stacks./srv/qios/data: For persistent app data.
Service / Runtime Facts
- NocoDB: Runs locally at
127.0.0.1:8088. - Open WebUI: Runs locally at
127.0.0.1:3000. - Private Server Launcher (gethomepage): Runs locally at
127.0.0.1:3001. Warning: This is for local/tailnet use only and is separate from the publicaccess.qially.comportal. - Portainer: Runs locally at
127.0.0.1:9443and is an admin service. - Ollama: Installed on qiserver and locked to
127.0.0.1:11434.