
How pixl-os works

What ships today — the surfaces, the agent, the ingest pipeline, the APIs. Sourced from the running code, not the brochure.

§1 · overview

What is pixl-os

A workspace-native knowledge OS. It ingests the documents, code, and decisions of a team, extracts an entity graph, and answers natural-language questions against that corpus with inline citations. Three truths: one sqlite file per workspace, one OpenRouter key per tenant, one HTTP API for every surface.

sqlite per workspace

Each tenant is a single file. Portable, diff-able, cheap to back up. No vector DB service.

OpenRouter-strict gateway

One key, one model whitelist, one budget cap per workspace. Every call logged.

Entity-graph-aware retrieval

Hybrid BM25 + vector with entity co-occurrence as a re-ranking signal.

§2 · pipeline

Architecture

A five-step pipeline from raw source to cited answer. Every stage is deterministic, swappable, and observable via /audit.

1. Ingest
code · PDFs · URLs · chat
2. Extract
13 entity types + relations
3. Store
sqlite + FTS5 + sqlite-vec
4. Retrieve
BM25 + vector + RRF
5. Answer
OpenRouter + citations

Ingest routes each source to the right parser: AST chunker for code, Trafilatura / Playwright / Firecrawl for URLs, Mistral OCR for PDFs, semantic chunker for transcripts.

Extract runs an LLM pass that emits the 13 entity types plus edge relations, deduped case-insensitively.
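
The case-insensitive dedupe can be sketched in a few lines. This is a minimal sketch assuming entities arrive as name/type dicts; `dedupe_entities` is a hypothetical name, not the shipped function:

```python
def dedupe_entities(entities):
    """Keep the first occurrence of each entity name, compared case-insensitively."""
    seen, kept = set(), []
    for ent in entities:
        key = ent["name"].casefold()  # casefold handles non-ASCII better than lower()
        if key not in seen:
            seen.add(key)
            kept.append(ent)
    return kept

raw = [{"name": "SQLite", "type": "technology"},
       {"name": "sqlite", "type": "technology"},
       {"name": "OpenRouter", "type": "technology"}]
deduped = dedupe_entities(raw)  # "sqlite" collapses into "SQLite": 2 entities survive
```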

Retrieve fuses BM25 keyword results with sqlite-vec cosine hits via reciprocal rank fusion; optional cross-encoder rerank on the top-k.
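
The fusion step can be sketched in a few lines (`rrf_fuse` is a hypothetical name; the real retriever fuses chunk ids from both arms):

```python
def rrf_fuse(bm25_ids, vector_ids, k=60):
    """Reciprocal rank fusion: score(d) = sum over arms of 1 / (k + rank)."""
    scores = {}
    for ranked in (bm25_ids, vector_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; a chunk ranked well by both arms wins.
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse(["c1", "c2", "c3"], ["c2", "c4", "c1"])
# c2 ranks near the top of both arms, so it beats c1 → ["c2", "c1", "c4", "c3"]
```

With k=60 a single top rank in one arm cannot swamp consistent mid-ranks in both, which is why the pure-keyword and pure-vector failure modes cancel out.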

§3 · /ask

Chat & agent

The /ask page is an AI SDK v4 chat built on AI Elements. Pick a persona, pick a model, send a question — the agent loop streams tool calls, renders inline citations with hover-previewed source snippets, and logs every step to /audit.

Persona pill

5 personas: code · research · ops · support · general. Explicit picks pin to the conversation (N4), so follow-up turns resume the same persona without a classifier roundtrip.

Model picker

5 providers (OpenAI · Anthropic · Google · DeepSeek · Qwen), 6 whitelisted models, persisted per-browser.

Citations

[N] refs linkify to the retrieval set with HoverCard previews; #entity: and #subject: chips bind a question to an exact node.

Voice input

Push-to-talk mic transcribes locally via the /api/transcribe endpoint then injects the result into the composer.

Image input

Drop or paste images into the composer; multimodal models see them alongside the prompt.

Reasoning drawer

Per-round plan + tool-call cards + TTFT and tok/s columns (N5). The first sentence of each round streams as an agent-plan annotation (N10) separate from the answer delta.

Tool registry · 10 tools

retrieve_kb · default
Hybrid RAG over the workspace KB — BM25 + vector, reciprocal rank fusion.
search_graph · default
Walk the entity graph from a seed entity. Returns neighbour entities + relation labels.
fetch_doc · default
Fetch the full markdown of a KB document by id (preferred) or fuzzy title match.
show_entity · default
Full detail for one entity: type, description, relations both directions, top 5 source documents.
list_subjects · default
List topic clusters (subjects) for the workspace, derived from entity co-occurrence.
search_symbols · default
Search AST code symbols (functions, classes, methods) by name or signature.
web_search · persona
Search the public web for recent or external information the workspace KB doesn't cover.
web_fetch · persona
Fetch a web page and return its main content (Trafilatura-cleaned, no nav/ads).
memory_read · persona
Read a short conversation-scoped note previously persisted via memory_write.
memory_write · persona
Persist a short note so later turns can recall user preferences or session facts.

The default tools are active when no persona is selected. The persona tools (web_*, memory_*) are opt-in per persona and not in the default allowlist.
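
A sketch of how the allowlist might resolve, using the tool names from the registry above (`resolve_allowlist` is a hypothetical helper, not the shipped code):

```python
DEFAULT_TOOLS = {"retrieve_kb", "search_graph", "fetch_doc",
                 "show_entity", "list_subjects", "search_symbols"}
PERSONA_TOOLS = {"web_search", "web_fetch", "memory_read", "memory_write"}

def resolve_allowlist(persona_opt_ins=None):
    """Default tools are always on; persona-tier tools only when opted in."""
    opt_ins = set(persona_opt_ins or [])
    # Only persona-tier names can be opted in; anything else is ignored.
    return DEFAULT_TOOLS | (opt_ins & PERSONA_TOOLS)

assert "web_search" not in resolve_allowlist()            # no persona: defaults only
assert "web_search" in resolve_allowlist(["web_search"])  # opt-in per persona
```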

§4 · /ingest

Ingestion

Drop a file, paste a URL, or type prose. The dispatcher sniffs the source type and routes it to the right parser, then streams back every stage over SSE so you can see the pipeline move.

Source types

file · .md .txt · Chonkie semantic
Paragraph-aware semantic chunker for plain text and Markdown.
file · .pdf · Mistral OCR / pypdf
OCR fallback for scanned PDFs; direct parse when text layer is present.
file · .csv .json · pandas + json parser
Structured tabular / object data pushed as typed chunks.
url · http(s) · Trafilatura
HTTP GET + readability-cleaned main content extraction.
url · SPA · Playwright / Firecrawl
JS-heavy URLs routed through a headless browser fallback when Trafilatura returns empty.
code · repos · tree-sitter AST
Typed symbols per language (.py / .ts / .tsx / .js / .rs) — feeds search_symbols.
text · pasted · plain-text passthrough
Direct paste box on /ingest — useful for meeting notes and snippets.
chats · whatsapp / slack / telegram / imessage / discord / csv
Multi-format parser wired through the MCP ingest_chat tool.

Example SSE event stream
POST /api/ingest/stream
event: received
data: {"size":18432,"type":"file","name":"decisions.md"}

event: stage
data: {"stage":"parsing","progress":0.22,"note":"4,210 chars"}

event: stage
data: {"stage":"chunking","progress":0.25,"note":"Chonkie semantic · 4,210 chars"}

event: stage
data: {"stage":"embedding","progress":0.5,"note":"384-dim · batched"}

event: stage
data: {"stage":"extracting_entities","progress":0.75,"note":"LLM pass · 12 entities"}

event: stage
data: {"stage":"persisting","progress":0.9}

event: done
data: {"doc_id":"doc_5fc3","chunks":14,"entities":12,"cost_usd":0.0008}

Stages: received → fetching (URL only) → parsing → chunking → embedding → extracting_entities → persisting → done. Each stage is cost-tracked under ingest:<strategy>.
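
The framing above is plain `event:` / `data:` lines separated by blank lines. A minimal client-side parser, a sketch rather than the shipped client, could look like:

```python
import json

def parse_sse(raw):
    """Yield (event, payload) pairs from an event:/data: framed SSE body."""
    event, data_lines = None, []
    for line in raw.splitlines() + [""]:          # trailing "" flushes the last event
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and event is not None:    # blank line dispatches the event
            yield event, json.loads("\n".join(data_lines))
            event, data_lines = None, []

stream = 'event: stage\ndata: {"stage":"embedding","progress":0.5}\n\nevent: done\ndata: {"doc_id":"doc_5fc3","chunks":14}\n'
events = list(parse_sse(stream))
# → [("stage", {...progress 0.5...}), ("done", {...chunks 14...})]
```

In practice you would iterate the response body of POST /api/ingest/stream line by line instead of buffering it whole.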

§5 · /graph

Knowledge graph

/graph is the workspace overview — stats, type tiles, top mentions, and a command-palette search across all entities. Open any entity at /entity/[id] for its 1-hop neighborhood, source documents, and typed relations. Subject clusters (entity co-occurrence groupings) live at /subject/[id].

    [ person ]
        | owns
    [ project ]
       /      \
[ decision ]  [ technology ]

1-hop · typed · weighted

13 entity types

  • person
  • project
  • technology
  • concept
  • process
  • metric
  • organization
  • decision
  • law
  • agency
  • regulation
  • benefit_rule
  • actor_role

Surfaces

  • /graph — workspace overview + full-graph canvas toggle.
  • /entity/[id] — a single entity with its 1-hop neighborhood.
  • /docs — source documents that mention each entity.

§6 · hybrid RAG

Retrieval stack

Hybrid by construction: every query hits keyword + vector in parallel and fuses the results. Pure vector loses on unusual terminology; pure BM25 loses on paraphrases. Fusion covers both.

  1. BM25 + vector in parallel

     sqlite FTS5 for keyword, sqlite-vec for cosine over 384-dim embeddings. Both run against the same chunks table in the same sqlite file.

  2. Reciprocal rank fusion

     Results are fused via RRF (k=60) so top hits from either arm rise naturally. The /api/ask retrieval_plan records which arm produced each citation.

  3. Process-local LRU cache

     Identical (workspace, question, top_k) tuples hit an in-memory LRU for instant repeat answers — useful for /audit playback and tests.

  4. Optional cross-encoder rerank

     bge-reranker-v2-m3 can rerank the top-k before synthesis (kill-switch via env: pixl_os_rerank_off).
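
A process-local cache keyed on that tuple can be sketched with the standard library. The real cache stores the retrieval set; here a string stands in, and the counter exists only to make the cache hit observable:

```python
from functools import lru_cache

CALLS = {"count": 0}  # instrumentation: how many times the "retriever" actually ran

@lru_cache(maxsize=256)
def cached_retrieve(workspace: str, question: str, top_k: int):
    """Only distinct (workspace, question, top_k) tuples reach the retriever."""
    CALLS["count"] += 1
    return f"results for {question!r} in {workspace} (top {top_k})"

cached_retrieve("feen", "what is RRF?", 8)
cached_retrieve("feen", "what is RRF?", 8)  # identical tuple: served from the LRU
assert CALLS["count"] == 1
```

Because the cache is process-local, a restart clears it; that is acceptable here since it exists for repeat answers within a session, not as a durability layer.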

§7 · multi-provider

Models & providers

Every LLM call goes through one OpenRouter key — which fans out to five providers behind a single billing surface. The per-workspace whitelist decides which models the gateway actually accepts via the X-Model-Override header.
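
The gateway check described above can be sketched as follows. `validate_model_override` is a hypothetical name; the default model mirrors the whitelist shown below:

```python
def validate_model_override(header_value, whitelist,
                            default="anthropic/claude-haiku-4.5"):
    """Accept X-Model-Override only if the workspace whitelist contains it."""
    if header_value is None:
        return default                      # no header: route to the default model
    if header_value not in whitelist:
        raise ValueError(f"model {header_value!r} not in workspace whitelist")
    return header_value

WHITELIST = {"anthropic/claude-haiku-4.5", "openai/gpt-4o-mini"}
assert validate_model_override(None, WHITELIST) == "anthropic/claude-haiku-4.5"
assert validate_model_override("openai/gpt-4o-mini", WHITELIST) == "openai/gpt-4o-mini"
```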

Providers on the feen whitelist

Anthropic · OpenAI · Google · DeepSeek · Qwen

Whitelisted models

anthropic/claude-haiku-4.5
Default — cheap, fast.
anthropic/claude-opus-4-7
Deep reasoning fallback.
openai/gpt-4o-mini
Secondary general-purpose.
google/gemini-2.5-flash
Long-context backup.
deepseek/deepseek-v3
Open-weight cost anchor.
qwen/qwen-2.5-72b-instruct
Multilingual fallback.

§8 · /audit

Audit & cost

Every LLM call is logged to stage_cost_telemetry — no sampling, no estimation. /audit reads the table with filters (workspace · model · stage) and shows what was actually spent.

One audit row

field · captured value
ts
2026-04-18T14:22:07Z
session_id
conv_a41f · turn 3
stage
agent:synth
model
anthropic/claude-haiku-4.5
prompt_tokens
4,182
completion_tokens
316
cached_tokens
3,940
cost_usd
0.000412
latency_ms
842
finish_reason
stop
prompt_text
captured (truncated in UI)
response_text
captured (full)

Prompts and response bodies have been persisted since Sprint T5 — legacy rows show null for those fields in the drawer. Pagination is offset/limit, with totals computed over the filtered-but-unpaginated set.
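
An /audit-style aggregation over that table can be run directly in sqlite. In this sketch the columns are a subset of the real schema and the sample rows are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE stage_cost_telemetry
               (ts TEXT, stage TEXT, model TEXT, cost_usd REAL, latency_ms INTEGER)""")
con.executemany(
    "INSERT INTO stage_cost_telemetry VALUES (?,?,?,?,?)",
    [("2026-04-18T14:22:07Z", "agent:synth",
      "anthropic/claude-haiku-4.5", 0.000412, 842),
     ("2026-04-18T14:22:09Z", "ingest:semantic",
      "anthropic/claude-haiku-4.5", 0.0008, 1203)])

# Spend per model over the (here unfiltered) set: the same shape /audit renders.
rows = con.execute("""SELECT model, ROUND(SUM(cost_usd), 6) AS spend, COUNT(*)
                      FROM stage_cost_telemetry
                      GROUP BY model""").fetchall()
```

Because every call is logged rather than sampled, SUM(cost_usd) is the actual spend, not an estimate.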

§9 · YAML

Workspace config

One YAML file per tenant is the single source of truth — repos, connectors, LLM policy, budget, model whitelist. Secrets stay as ${VAR} placeholders and are scrubbed from every API read.

configs/workspaces/feen.yaml (truncated)
ref: feen
name: FeeN
data_dir: /tmp/pixl-kb-demo

llm_policy:
  tone: "Direct, no hedging, no emoji."
  monthly_budget_usd: null        # null = no cap
  model_whitelist:
    - anthropic/claude-haiku-4.5  # default
    - anthropic/claude-opus-4-7
    - openai/gpt-4o-mini
    - google/gemini-2.5-flash
    - deepseek/deepseek-v3
    - qwen/qwen-2.5-72b-instruct

repos:
  - slug: feen-api
    role: backend
    local_path: /Users/hamzamounir/code/nuva/feen-api

env:
  openrouter_api_key: ${OPENROUTER_API_KEY}
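
A sketch of the placeholder convention above. `expand_env` and `scrub` are hypothetical names, and the redaction rule (anything no longer a bare ${VAR} gets masked) is an assumption based on the behaviour described:

```python
import os
import re

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def expand_env(cfg: dict) -> dict:
    """Server-side only: resolve ${VAR} placeholders from the process environment."""
    return {k: PLACEHOLDER.sub(lambda m: os.environ.get(m.group(1), m.group(0)), v)
            for k, v in cfg.items()}

def scrub(cfg: dict) -> dict:
    """API reads: any value that is not a bare ${VAR} placeholder is redacted."""
    return {k: (v if PLACEHOLDER.fullmatch(v) else "***") for k, v in cfg.items()}

os.environ["OPENROUTER_API_KEY"] = "sk-or-demo"
env_block = {"openrouter_api_key": "${OPENROUTER_API_KEY}"}

expanded = expand_env(env_block)
assert expanded["openrouter_api_key"] == "sk-or-demo"  # lives only inside the server
assert scrub(expanded)["openrouter_api_key"] == "***"  # what a config read returns
```
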

§10 · HTTP surface

HTTP API

Everything the UI uses is available over HTTP — same shapes, same auth. Routes are auto-mounted in app.py; the canonical Swagger lives at /docs.

Method · Path

chat
POST   /api/chat/stream
POST   /api/chat/conversations
GET    /api/chat/conversations
GET    /api/chat/conversations/{id}
DELETE /api/chat/conversations/{id}
POST   /api/chat/conversations/{id}/fork
POST   /api/chat/conversations/{id}/share

ask
POST   /api/ask
POST   /api/ask/followups

personas
GET    /api/personas
GET    /api/personas/{id}

models
GET    /api/models

knowledge
GET    /api/docs/list
GET    /api/docs/{doc_id}
GET    /api/entities
GET    /api/entities/{id}
GET    /api/subjects
GET    /api/references

ingest
POST   /api/ingest/stream
POST   /api/transcribe

audit
GET    /api/audit
GET    /api/analytics
GET    /api/status

workspaces
GET    /api/workspaces
GET    /api/workspaces/{ref}
GET    /api/workspaces/{ref}/config

connectors
POST   /api/connectors/sync

mcp
GET    /api/mcp/tools
GET    /api/mcp/activity

§11 · external agents

MCP server

A minimal stdio MCP server (plus two HTTP bridges) exposes five tools — pixl-os becomes a local tool that any MCP-speaking client can call.

Tools

ask
Heavy — runs the full classify + retrieve + rerank + synthesise pipeline against the KB.
ingest_chat
Ingest a chat export (whatsapp / slack / telegram / imessage / discord / csv) with PII redaction.
brief
Multi-repo brief for a Linear ticket — retrieval + analysis across every wired-up repo.
entities
HTTP bridge to /api/entities — graph nodes + edges + subjects.
subjects
HTTP bridge to /api/subjects — topic clusters over the workspace.

Claude Desktop / Cursor integration

~/.config/claude-desktop/mcp.json
{
  "mcpServers": {
    "pixl-os": {
      "command": "python",
      "args": ["-m", "pixl_os.mcp_server"],
      "env": { "OPENROUTER_API_KEY": "sk-or-..." }
    }
  }
}

Install: pip install pixl-os. The server advertises its tool list to the client on connect — no registration step.

§12 · q&a

FAQ

Questions that come up often enough to write down. Short answers, no hedging.

What does pixl-os actually do?

It reads your documents, code, and chats, extracts an entity graph, and answers questions about them with inline citations. One sqlite file per workspace; one OpenRouter key for every LLM call.

What are personas for?

A persona is a saved bundle: default model + tool allowlist + system prompt + starter prompts. The /ask composer has five — code / research / ops / support / general — each narrowing the tool surface to what actually helps that job.

Which tools does the agent have by default?

Six read-only tools: retrieve_kb, search_graph, fetch_doc, show_entity, list_subjects, search_symbols. Four more — web_search, web_fetch, memory_read, memory_write — are opt-in per persona and currently shipped on the `general` persona only when explicitly enabled.

Can I switch models mid-conversation?

Yes. The composer's model picker lives next to the Send button and persists in localStorage. On send it becomes an X-Model-Override header — the gateway validates it against the workspace's model_whitelist before routing.

Where does cost come from?

Every LLM call is logged to stage_cost_telemetry with prompt tokens, completion tokens, and the OpenRouter-returned cost. /audit reads that table directly — nothing is estimated.

Can Claude Desktop or Cursor talk to a workspace?

Yes — via the MCP server. Register pixl-os in your client's mcp.json and five tools (ask / ingest_chat / brief / entities / subjects) become callable from the agent.

Is this self-hostable?

Yes. One Python binary, one sqlite file per workspace, one YAML config. No vector DB service, no external deps beyond OpenRouter (which is itself swappable — the gateway is a thin adapter).

What's in the entity graph exactly?

13 entity types — person, project, technology, concept, process, metric, organization, decision, law, agency, regulation, benefit_rule, actor_role — plus typed edges with weights and direction. Explore at /graph (workspace overview + canvas) or open a single entity at /entity/[id] for its 1-hop neighborhood, source docs, and relations.

Didn't find what you need?

Ask the workspace directly — it knows more about its own code than this page does.
