Pipelines

Canonical Ingestion Flow

Every file that enters QiOS must pass through this sequence:

source
→ detect
→ resolve domain / namespace
→ register in QiArchive (assign canonical identity)
→ assign short visible code (Q + 6 hex)
→ normalize filename
→ extract / inspect
→ enrich metadata
→ chunk
→ embed (local)
→ index (pgvector in qiarchive)
→ route / review / act
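
As a minimal sketch of the flow above, the stage order can be held in a single list so that nothing skips a step. The stage identifiers, the `next_stage` helper, and the way `short_code` derives the visible code (SHA-256 of the canonical ID, first 6 hex digits, uppercased) are assumptions for illustration; the source only specifies the order and the "Q + 6 hex" shape.

```python
import hashlib
from typing import Optional

# Stage names mirror the canonical flow; list order is the required order.
# The identifiers themselves are illustrative, not taken from QiOS code.
INGESTION_STAGES = [
    "detect",
    "resolve_domain",
    "register",
    "assign_short_code",
    "normalize_filename",
    "extract",
    "enrich",
    "chunk",
    "embed",
    "index",
    "route",
]


def short_code(canonical_id: str) -> str:
    """Return 'Q' plus 6 hex digits.

    Assumption: the code is derived from a SHA-256 hash of the canonical ID.
    The real derivation (and collision handling) is not stated in the source.
    """
    digest = hashlib.sha256(canonical_id.encode("utf-8")).hexdigest()
    return "Q" + digest[:6].upper()


def next_stage(current: str) -> Optional[str]:
    """Return the stage that follows `current`, or None after the last one."""
    i = INGESTION_STAGES.index(current)  # raises ValueError on unknown stage
    return INGESTION_STAGES[i + 1] if i + 1 < len(INGESTION_STAGES) else None
```

Six hex digits give about 16.7 million distinct codes, so any real scheme would need a collision check at registration time.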

Pipeline States

Every archive record must carry exactly one of the following states:

State           Meaning
detected        File seen by watcher
registered      Archive record created, canonical ID assigned
normalized      Filename normalized per naming contract
extracted       Text extracted from file
enriched        Metadata populated
chunked         Text chunked deterministically
embedded        Vectors generated
indexed         Pushed to pgvector
review_pending  Awaiting human review
routed          Placement confirmed
finalized       Full lifecycle complete
failed          Error state (retryable)
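
One way to enforce the state list is a closed enumeration that rejects anything outside the table. This is a sketch only; the class name `PipelineState` and the helper `parse_state` are hypothetical, while the state values are the ones listed above.

```python
from enum import Enum


class PipelineState(str, Enum):
    """Closed set of pipeline states; values match the table verbatim."""

    DETECTED = "detected"
    REGISTERED = "registered"
    NORMALIZED = "normalized"
    EXTRACTED = "extracted"
    ENRICHED = "enriched"
    CHUNKED = "chunked"
    EMBEDDED = "embedded"
    INDEXED = "indexed"
    REVIEW_PENDING = "review_pending"
    ROUTED = "routed"
    FINALIZED = "finalized"
    FAILED = "failed"


def parse_state(value: str) -> PipelineState:
    """Convert a stored string into a PipelineState, rejecting unknown values."""
    try:
        return PipelineState(value)
    except ValueError:
        raise ValueError(f"unknown pipeline state: {value!r}") from None
```

Parsing at the boundary (for example, when reading a record back from storage) keeps misspelled or ad hoc states from entering the archive.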

Failure Philosophy

Failures must be visible, stateful, retryable, and tied to canonical IDs.

  • Do not silently drop files.
  • Do not overwrite state.
  • Do not advance objects with weak provenance.
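
The rules above could be modelled with an append-only state history, so a failure is added as a new event rather than replacing the previous state. This is a sketch under stated assumptions: `StateEvent`, `ArchiveRecord`, and `retry_from` are hypothetical names, and reading "do not overwrite state" as "keep an append-only history" is an interpretation, not something the source spells out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class StateEvent:
    """One immutable entry in a record's history."""

    canonical_id: str
    state: str
    at: datetime
    error: Optional[str] = None


@dataclass
class ArchiveRecord:
    canonical_id: str
    history: List[StateEvent] = field(default_factory=list)

    def record(self, state: str, error: Optional[str] = None) -> None:
        """Append a new state event; earlier events are never modified."""
        if not self.canonical_id:
            # Weak provenance: refuse to advance an object without an ID.
            raise ValueError("refusing to record state without a canonical ID")
        now = datetime.now(timezone.utc)
        self.history.append(StateEvent(self.canonical_id, state, now, error))

    @property
    def state(self) -> Optional[str]:
        """Current state is the most recent event, if any."""
        return self.history[-1].state if self.history else None

    def retry_from(self) -> Optional[str]:
        """Last non-failed state, i.e. where a retry would resume."""
        for event in reversed(self.history):
            if event.state != "failed":
                return event.state
        return None
```

For example, a record that reached `registered` and then failed still shows both events, and `retry_from()` reports `registered` as the resume point.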

Supported Input Paths

  • Watched local inbox (C:/QiData/inbox/)
  • Manual import
  • Synced storage
  • App upload
  • Future: email/connector intake
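
For the watched local inbox, detection could be as simple as a polling scan that reports files not seen before. The sketch below is an assumption about how such a scan might look, not the actual QiOS watcher; `scan_inbox` and the in-memory `seen` set are hypothetical, and a real watcher would also need to persist what it has seen and skip files that are still being written.

```python
from pathlib import Path
from typing import List, Set


def scan_inbox(inbox: Path, seen: Set[Path]) -> List[Path]:
    """Return files under `inbox` that are not yet in `seen`, and mark them seen.

    Results are sorted so repeated scans of the same tree are deterministic.
    """
    new_files = sorted(
        p for p in inbox.rglob("*") if p.is_file() and p not in seen
    )
    seen.update(new_files)
    return new_files
```

A caller would run this on a timer against the inbox path (for example `Path("C:/QiData/inbox/")`) and hand each returned file to the `detect` step.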

Subprocess Categories

Subprocess               Responsibility
OCR subprocess           Extract text from scanned docs
Extraction subprocess    Parse text from structured files
Embedding subprocess     Generate local vector embeddings
File scanner             Detect new files in watched paths
Sync subprocess          Reconcile local ↔ cloud state
Graph projection job     Push canonical records into qigraph
Retrieval orchestration  Coordinate pgvector + Neo4j recall