Pipelines

Canonical Ingestion Flow

Every file that enters QiOS must pass through this sequence:

source
→ detect
→ resolve domain / namespace
→ register in QiArchive (assign canonical identity)
→ assign short visible code (Q + 6 hex digits)
→ normalize filename
→ extract / inspect
→ enrich metadata
→ chunk
→ embed (local)
→ index (pgvector in qiarchive)
→ route / review / act
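The flow keeps the canonical identity (assigned at registration) separate from the short visible code. A minimal sketch of code assignment, assuming the code is `Q` followed by 6 random uppercase hex digits and that uniqueness is checked against codes already issued. Here an in-memory set stands in for what would presumably be a uniqueness constraint in QiArchive; `assign_short_code` is a hypothetical helper, not part of QiOS.

```python
import secrets
from typing import Set


def assign_short_code(existing: Set[str], max_attempts: int = 100) -> str:
    """Return an unused visible code such as 'Q3FA9C1' and reserve it.

    Assumption: codes are 'Q' + 6 uppercase hex digits drawn at random.
    `existing` holds every code already issued; the new code is added to it.
    """
    for _ in range(max_attempts):
        # token_hex(3) yields 3 random bytes as 6 lowercase hex characters.
        code = "Q" + secrets.token_hex(3).upper()
        if code not in existing:
            existing.add(code)
            return code
    raise RuntimeError(f"no free short code after {max_attempts} attempts")
```

Six hex digits allow 16^6 = 16,777,216 codes, so collisions are rare but not impossible as the archive grows; the collision check is why the canonical ID, not the short code, is the identity of record.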

Pipeline States

Every archive record must carry exactly one of the following states:

State	Meaning
`detected`	File seen by watcher
`registered`	Archive record created, canonical ID assigned
`normalized`	Filename normalized per naming contract
`extracted`	Text extracted from file
`enriched`	Metadata populated
`chunked`	Text chunked deterministically
`embedded`	Vectors generated
`indexed`	Pushed to pgvector
`review_pending`	Awaiting human review
`routed`	Placement confirmed
`finalized`	Full lifecycle complete
`failed`	Error state; the record can be retried
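The table names the states but not which transitions between them are legal. A minimal sketch of a transition check, assuming the happy path follows the table from top to bottom, that an `indexed` record may skip `review_pending` and go straight to `routed`, and that a retry out of `failed` resumes at the step after the record's last good state. `PipelineState`, `FORWARD`, and `can_transition` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Optional, Set


class PipelineState(str, Enum):
    DETECTED = "detected"
    REGISTERED = "registered"
    NORMALIZED = "normalized"
    EXTRACTED = "extracted"
    ENRICHED = "enriched"
    CHUNKED = "chunked"
    EMBEDDED = "embedded"
    INDEXED = "indexed"
    REVIEW_PENDING = "review_pending"
    ROUTED = "routed"
    FINALIZED = "finalized"
    FAILED = "failed"


S = PipelineState

# Assumed happy path: table order, except that an indexed record may
# bypass human review and go directly to routed.
FORWARD: Dict[PipelineState, Set[PipelineState]] = {
    S.DETECTED: {S.REGISTERED},
    S.REGISTERED: {S.NORMALIZED},
    S.NORMALIZED: {S.EXTRACTED},
    S.EXTRACTED: {S.ENRICHED},
    S.ENRICHED: {S.CHUNKED},
    S.CHUNKED: {S.EMBEDDED},
    S.EMBEDDED: {S.INDEXED},
    S.INDEXED: {S.REVIEW_PENDING, S.ROUTED},
    S.REVIEW_PENDING: {S.ROUTED},
    S.ROUTED: {S.FINALIZED},
    S.FINALIZED: set(),
}


def can_transition(
    current: PipelineState,
    target: PipelineState,
    last_good: Optional[PipelineState] = None,
) -> bool:
    """Return True if a record may move from `current` to `target`."""
    if current is S.FAILED:
        # Assumed retry rule: resume at the step after the last good state.
        return last_good is not None and target in FORWARD.get(last_good, set())
    if target is S.FAILED:
        # Any in-progress state may fail; finalized records are closed.
        return current is not S.FINALIZED
    return target in FORWARD[current]
```

Rejecting skipped steps (for example `detected` straight to `extracted`) keeps a record from advancing past a stage that never ran.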

Failure Philosophy

Failures must be visible, stateful, retryable, and tied to canonical IDs.

Do not silently drop files.
Do not overwrite state.
Do not advance objects with weak provenance.
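One way to satisfy the first two rules is to store each record's state as an append-only history and to wrap every stage so an exception becomes a recorded `failed` event rather than a lost file. A minimal sketch, assuming a stage is a callable that receives the canonical ID and raises on error; `StateEvent`, `ArchiveRecord`, and `run_stage` are hypothetical, and in QiArchive the history would presumably be a database table rather than a Python list. The provenance rule is not modeled here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass(frozen=True)
class StateEvent:
    state: str
    at: datetime
    error: Optional[str] = None


@dataclass
class ArchiveRecord:
    canonical_id: str
    history: List[StateEvent] = field(default_factory=list)

    @property
    def state(self) -> Optional[str]:
        return self.history[-1].state if self.history else None

    def last_good_state(self) -> Optional[str]:
        """Most recent non-failed state, i.e. where a retry should resume."""
        for event in reversed(self.history):
            if event.state != "failed":
                return event.state
        return None

    def append_event(self, state: str, error: Optional[str] = None) -> None:
        # Append-only: earlier events are never edited or removed.
        self.history.append(StateEvent(state, datetime.now(timezone.utc), error))


def run_stage(
    record: ArchiveRecord, next_state: str, stage: Callable[[str], None]
) -> bool:
    """Run one stage; advance on success, record a visible failure on error."""
    try:
        stage(record.canonical_id)
    except Exception as exc:
        # The file is not dropped: the error is stored against its canonical ID.
        record.append_event("failed", error=f"{type(exc).__name__}: {exc}")
        return False
    record.append_event(next_state)
    return True
```

Because a `failed` event is appended on top of the earlier state instead of replacing it, `last_good_state()` tells a retry exactly where to resume.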

Supported Input Paths

Watched local inbox (C:/QiData/inbox/)
Manual import
Synced storage
App upload
Future: email/connector intake
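For the watched local inbox, the simplest detector is a polling scan. A minimal sketch, assuming files are identified by resolved path and that the caller keeps the `seen` set between scans; `scan_inbox` is hypothetical, and a production watcher would likely also wait until a file stops growing before reporting it as `detected`.

```python
from pathlib import Path
from typing import List, Set


def scan_inbox(inbox: Path, seen: Set[Path]) -> List[Path]:
    """Return regular files in `inbox` that no earlier scan reported.

    Subdirectories are ignored. Newly reported paths are added to `seen`
    so the next scan skips them.
    """
    new_files: List[Path] = []
    for path in sorted(inbox.iterdir()):
        if not path.is_file():
            continue
        key = path.resolve()
        if key not in seen:
            seen.add(key)
            new_files.append(path)
    return new_files
```

In use, something like `scan_inbox(Path("C:/QiData/inbox"), seen)` would run on a timer, with each returned path handed to the register step.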

Subprocess Categories

Subprocess	Responsibility
OCR subprocess	Extract text from scanned docs
Extraction subprocess	Parse text from structured files
Embedding subprocess	Generate local vector embeddings
File scanner	Detect new files in watched paths
Sync subprocess	Reconcile local ↔ cloud state
Graph projection job	Push canonical records into `qigraph`
Retrieval orchestration	Coordinate pgvector + Neo4j recall
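Retrieval orchestration has to merge two ranked result lists, one from pgvector and one from Neo4j, but the document does not say how. One common option is reciprocal rank fusion (RRF), shown here as a swapped-in technique rather than the documented QiOS method. The sketch assumes both backends return canonical IDs in rank order; `reciprocal_rank_fusion` is a hypothetical helper, and `k = 60` is the commonly used default constant.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse ranked lists of canonical IDs into one ranking.

    Each ID scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. Ties are broken by ID for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, canonical_id in enumerate(ranking, start=1):
            scores[canonical_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

RRF uses only rank positions, so it avoids having to calibrate pgvector distances against graph relevance scores; a record found by both backends rises above records found by only one.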