
Data Flow

Canonical Ingestion Flow

SOURCE FILE
    │
    ▼
DEVICE DROP ZONE / ASSIGNED WATCH PATH
    │
    ▼
LOCAL AGENT DETECT
    │
    ▼
RESOLVE DOMAIN (namespace + prefix lookup)
    │
    ▼
REGISTER IN QiARCHIVE
  - assign canonical ID (UUID/ULID)
  - assign short visible code (Q + 6 hex)
  - normalize filename: {domain}_{name}_{QXXXXXX}.ext
  - calculate checksum
    │
    ▼
EXTRACT / INSPECT
  - detect MIME type
  - choose parser
  - extract text
  - OCR if needed
  - store extraction method + raw text
    │
    ▼
ENRICH METADATA
  - infer document type
  - extract entities
  - tag confidence values
  - assign semantic metadata
    │
    ▼
CHUNK
  - split text deterministically
  - assign chunk indices
  - link chunks to archive_id
    │
    ▼
EMBED (local)
  - generate embedding vectors
  - push to pgvector (qiarchive.archive_chunks)
    │
    ▼
INDEX
  - update search index
  - update qigraph.master_index if applicable
    │
    ▼
ROUTE / REVIEW / ACT
  - suggest route based on doc type + confidence
  - human review or auto-confirm
  - finalize placement
  - update archive record
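
As an illustration of the REGISTER and CHUNK stages above, here is a minimal Python sketch. It is not the project's implementation. The names ArchiveRecord, slugify, register, and chunk_text are hypothetical, and several details are assumptions: the short code is derived from the first six hex digits of a UUID, the checksum is SHA-256, the name is slugified with hyphens, and chunking uses fixed-width character windows.

```python
import hashlib
import re
import uuid
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ArchiveRecord:
    archive_id: str   # canonical ID (a UUID here; a ULID would also fit the diagram)
    short_code: str   # "Q" followed by 6 hex characters
    filename: str     # {domain}_{name}_{QXXXXXX}.ext
    checksum: str     # SHA-256 hex digest (the algorithm is an assumption)


def slugify(name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs to '-' (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def register(domain: str, original_name: str, content: bytes) -> ArchiveRecord:
    """Assign IDs, normalize the filename, and checksum the content."""
    archive_id = uuid.uuid4()
    # Assumption: short code = first 6 hex digits of the UUID. Six hex digits
    # can collide, so a real registry would need a uniqueness check.
    short_code = "Q" + archive_id.hex[:6].upper()
    path = Path(original_name)
    filename = f"{domain}_{slugify(path.stem)}_{short_code}{path.suffix.lower()}"
    checksum = hashlib.sha256(content).hexdigest()
    return ArchiveRecord(str(archive_id), short_code, filename, checksum)


def chunk_text(archive_id: str, text: str, size: int = 200, overlap: int = 20) -> list:
    """Split text into fixed-width windows; the same input always gives the same chunks."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [
        {"archive_id": archive_id, "chunk_index": i, "text": text[start:start + size]}
        for i, start in enumerate(range(0, len(text), step))
    ]
```

Calling register("finance", "Invoice March 2024.PDF", data) would produce a filename of the form finance_invoice-march-2024_QXXXXXX.pdf, and every chunk returned by chunk_text carries the archive_id it was derived from.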

RAG / AI Query Flow

USER QUERY
    │
    ▼
Metadata filter in Postgres (Supabase)
    │
    ▼
Semantic retrieval in pgvector
    │
    ▼ (optional)
Graph expansion in Neo4j / qigraph
    │
    ▼
Assemble evidence with provenance references
    │
    ▼
Generate answer citing archive_id / chunk_id / entity_id
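
The query flow above can be sketched in memory, without a database. This is an illustration only: the Chunk type and the retrieve and assemble_evidence functions are hypothetical, the metadata filter stands in for a Postgres WHERE clause, and the cosine ranking stands in for a pgvector similarity query. The optional graph expansion step is omitted.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    archive_id: str
    chunk_id: str
    doc_type: str
    text: str
    embedding: tuple


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(chunks, query_embedding, doc_type, top_k=2):
    # Step 1: metadata filter (a WHERE clause in the real system).
    candidates = [c for c in chunks if c.doc_type == doc_type]
    # Step 2: semantic ranking (a vector similarity query in the real system).
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


def assemble_evidence(hits) -> str:
    # Step 3: each passage keeps its provenance so the answer can cite it.
    return "\n".join(f"[{c.archive_id}/{c.chunk_id}] {c.text}" for c in hits)
```

Filtering on metadata before ranking keeps the similarity search confined to records the query is allowed or expected to touch, and the bracketed prefix gives the generation step a citation handle for every passage.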

Infrastructure Edge Ingress (Local Node Hosting)

PUBLIC REQUEST (Internet)
    │
    ▼
EDGE PROVIDER (DNS / Policy / Tunnel Endpoint)
    │
    ▼
OUTBOUND TUNNEL CONNECTOR (Local Node)
    │
    ▼
LOCAL REVERSE PROXY (Routing / Auth Check)
    │
    ▼
TARGET SERVICE (App Container / Webhook)
    │
    ▼
INTERNAL DATA SERVICE (Postgres / Vector DB)

Flow Invariants

  • Every record entering the system carries a canonical ID before extraction begins
  • Derived layers (graph, vector, AI) only receive data after the archive ID is assigned
  • Failure at any stage is visible, stateful, and retryable; data is never silently dropped
  • Every ingest carries provenance (source_device_id, source_agent_id, source_path, and ingest_mode), because ingestion happens across nodes
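
Two of the invariants above, required provenance and visible, retryable failure, can be enforced in code roughly as follows. This is a sketch under assumptions: IngestJob, run_stage, the stage_status dictionary, and the retry limit of three are all hypothetical and do not describe the actual schema.

```python
from dataclasses import dataclass, field

# Field names come from the provenance invariant above.
REQUIRED_PROVENANCE = (
    "source_device_id",
    "source_agent_id",
    "source_path",
    "ingest_mode",
)


@dataclass
class IngestJob:
    archive_id: str
    provenance: dict
    # Hypothetical state store: stage name -> "ok" or a failure description.
    stage_status: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject the ingest up front if any provenance field is missing or empty.
        missing = [k for k in REQUIRED_PROVENANCE if not self.provenance.get(k)]
        if missing:
            raise ValueError(f"missing provenance fields: {missing}")


def run_stage(job: IngestJob, name: str, fn, max_attempts: int = 3):
    """Run one pipeline stage, recording every outcome on the job."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = fn(job)
        except Exception as exc:
            # The failure stays visible in the job state and the loop retries.
            job.stage_status[name] = f"failed (attempt {attempt}): {exc}"
        else:
            job.stage_status[name] = "ok"
            return result
    return None  # still failed; the recorded status says why
```

Because the outcome is written to the job on every attempt, a stage that exhausts its retries leaves a record that can be inspected and re-run later instead of disappearing.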