How pixl-os works
What ships today — the surfaces, the agent, the ingest pipeline, the APIs. Sourced from the running code, not the brochure.
What is pixl-os
A workspace-native knowledge OS. It ingests the documents, code, and decisions of a team, extracts an entity graph, and answers natural-language questions against that corpus with inline citations. Three truths: one sqlite file per workspace, one OpenRouter key per tenant, one HTTP API for every surface.
sqlite per workspace
Each tenant is a single file. Portable, diff-able, cheap to back up. No vector DB service.
OpenRouter-strict gateway
One key, one model whitelist, one budget cap per workspace. Every call logged.
Entity-graph-aware retrieval
Hybrid BM25 + vector with entity co-occurrence as a re-ranking signal.
Architecture
A five-step pipeline from raw source to cited answer. Every stage is deterministic, swappable, and observable via /audit.
Ingest routes each source to the right parser: AST chunker for code, Trafilatura / Playwright / Firecrawl for URLs, Mistral OCR for PDFs, semantic chunker for transcripts.
Extract calls an LLM pass that emits 13 entity types + edge relations, deduped case-insensitively.
Retrieve fuses BM25 keyword results with sqlite-vec cosine hits via reciprocal rank fusion; optional cross-encoder rerank on the top-k.
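The fusion in the Retrieve step can be sketched as a plain RRF merge. A minimal sketch, not the shipped implementation — only the k=60 constant and the two-arm shape are taken from this page:

```python
def rrf_fuse(bm25_ids, vector_ids, k=60):
    """Reciprocal rank fusion: score(d) = sum over arms of 1 / (k + rank)."""
    scores = {}
    for ranking in (bm25_ids, vector_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# A chunk ranked well by both arms beats a chunk that tops only one arm.
fused = rrf_fuse(["c1", "c2", "c3"], ["c2", "c4", "c1"])
```

This is why fusion covers both failure modes: a paraphrase match from the vector arm and a rare-term match from BM25 each contribute to the same score.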
Chat & agent
The /ask page is an AI SDK v4 chat built on AI Elements. Pick a persona, pick a model, send a question — the agent loop streams tool calls, renders inline citations with hover-previewed source snippets, and logs every step to /audit.
Persona pill
5 personas: code · research · ops · support · general. Explicit picks pin to the conversation (N4), so follow-up turns resume the same persona without a classifier roundtrip.
Model picker
5 providers (OpenAI · Anthropic · Google · DeepSeek · Qwen), 6 whitelisted models, persisted per-browser.
Citations
[N] refs linkify to the retrieval set with HoverCard previews; #entity: and #subject: chips bind a question to an exact node.
Voice input
Push-to-talk mic transcribes locally via the /api/transcribe endpoint then injects the result into the composer.
Image input
Drop or paste images into the composer; multimodal models see them alongside the prompt.
Reasoning drawer
Per-round plan + tool-call cards + TTFT and tok/s columns (N5). The first sentence of each round streams as an agent-plan annotation (N10) separate from the answer delta.
Tool registry · 10 tools
- retrieve_kb (default) — Hybrid RAG over the workspace KB: BM25 + vector, reciprocal rank fusion.
- search_graph (default) — Walk the entity graph from a seed entity. Returns neighbour entities + relation labels.
- fetch_doc (default) — Fetch the full markdown of a KB document by id (preferred) or fuzzy title match.
- show_entity (default) — Full detail for one entity: type, description, relations in both directions, top 5 source documents.
- list_subjects (default) — List topic clusters (subjects) for the workspace, derived from entity co-occurrence.
- search_symbols (default) — Search AST code symbols (functions, classes, methods) by name or signature.
- web_search (persona) — Search the public web for recent or external information the workspace KB doesn't cover.
- web_fetch (persona) — Fetch a web page and return its main content (Trafilatura-cleaned, no nav/ads).
- memory_read (persona) — Read a short conversation-scoped note previously persisted via memory_write.
- memory_write (persona) — Persist a short note so later turns can recall user preferences or session facts.
The default tools are active when no persona is selected. The persona tools (web_*, memory_*) are opt-in per persona and not in the default allowlist.
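The allowlist logic above might be resolved roughly like this. A hypothetical sketch — the function and constant names are illustrative, not from the codebase; only the tool names and the default/persona split come from this page:

```python
DEFAULT_TOOLS = {"retrieve_kb", "search_graph", "fetch_doc",
                 "show_entity", "list_subjects", "search_symbols"}
PERSONA_EXTRAS = {"web_search", "web_fetch", "memory_read", "memory_write"}

def resolve_allowlist(persona_tools=None):
    """No persona: defaults only. A persona may opt into the extra tools,
    but nothing outside the registry of ten ever enters the allowlist."""
    if persona_tools is None:
        return set(DEFAULT_TOOLS)
    return DEFAULT_TOOLS | (set(persona_tools) & PERSONA_EXTRAS)
```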
Ingestion
Drop a file, paste a URL, or type prose. The dispatcher sniffs the source type and routes it to the right parser, then streams back every stage over SSE so you can see the pipeline move.
Source types
- file · .md .txt — Chonkie semantic: paragraph-aware semantic chunker for plain text and Markdown.
- file · .pdf — Mistral OCR / pypdf: OCR fallback for scanned PDFs; direct parse when a text layer is present.
- file · .csv .json — pandas + json parser: structured tabular / object data pushed as typed chunks.
- url · http(s) — Trafilatura: HTTP GET + readability-cleaned main-content extraction.
- url · SPA — Playwright / Firecrawl: JS-heavy URLs routed through a headless-browser fallback when Trafilatura returns empty.
- code · repos — tree-sitter AST: typed symbols per language (.py / .ts / .tsx / .js / .rs); feeds search_symbols.
- text · pasted — plain-text passthrough: direct paste box on /ingest; useful for meeting notes and snippets.
- chats — whatsapp / slack / telegram / imessage / discord / csv: multi-format parser wired through the MCP ingest_chat tool.
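The dispatcher's sniffing can be sketched as a routing table over the source-type list above. A sketch under assumptions — the parser labels are shorthand, not real module paths, and the actual dispatcher may inspect content as well as extension:

```python
from pathlib import Path

def route_source(source: str) -> str:
    """Pick a parser label for a file path, URL, or pasted text."""
    if source.startswith(("http://", "https://")):
        return "trafilatura"        # the SPA/Playwright fallback kicks in later
    suffix = Path(source).suffix.lower()
    return {
        ".md": "chonkie", ".txt": "chonkie",
        ".pdf": "mistral_ocr",
        ".csv": "pandas", ".json": "pandas",
        ".py": "tree_sitter", ".ts": "tree_sitter", ".tsx": "tree_sitter",
        ".js": "tree_sitter", ".rs": "tree_sitter",
    }.get(suffix, "plain_text")     # pasted prose falls through to passthrough
```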
Example SSE event stream
event: received
data: {"size":18432,"type":"file","name":"decisions.md"}
event: stage
data: {"stage":"parsing","progress":0.22,"note":"4,210 chars"}
event: stage
data: {"stage":"chunking","progress":0.25,"note":"Chonkie semantic · 4,210 chars"}
event: stage
data: {"stage":"embedding","progress":0.5,"note":"384-dim · batched"}
event: stage
data: {"stage":"extracting_entities","progress":0.75,"note":"LLM pass · 12 entities"}
event: stage
data: {"stage":"persisting","progress":0.9}
event: done
data: {"doc_id":"doc_5fc3","chunks":14,"entities":12,"cost_usd":0.0008}

Stages: received → fetching (URL only) → parsing → chunking → embedding → extracting_entities → persisting → done. Each stage is cost-tracked under ingest:<strategy>.
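A minimal client for a stream like the one above: this sketch parses the `event:`/`data:` line pairs by hand rather than pulling in an SSE library.

```python
import json

def parse_sse(raw: str):
    """Yield (event, payload) pairs from an SSE body."""
    event = None
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event:
            yield event, json.loads(line[len("data:"):].strip())
            event = None  # each data line closes out its event

stream = (
    'event: stage\n'
    'data: {"stage":"embedding","progress":0.5}\n'
    'event: done\n'
    'data: {"doc_id":"doc_5fc3","chunks":14}\n'
)
events = list(parse_sse(stream))
```

Against the live endpoint you would read the response body incrementally instead of from a string, but the framing is the same.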
Knowledge graph
/graph is the workspace overview — stats, type tiles, top mentions, and a command-palette search across all entities. Open any entity at /entity/[id] for its 1-hop neighborhood, source documents, and typed relations. Subject clusters (entity co-occurrence groupings) live at /subject/[id].
13 entity types
- person
- project
- technology
- concept
- process
- metric
- organization
- decision
- law
- agency
- regulation
- benefit_rule
- actor_role
Surfaces
- /graph — workspace overview + full-graph canvas toggle.
- /entity/[id] — a single entity with its 1-hop neighborhood.
- /docs — source documents that mention each entity.
Retrieval stack
Hybrid by construction: every query hits keyword + vector in parallel and fuses the results. Pure vector loses on unusual terminology; pure BM25 loses on paraphrases. Fusion covers both.
1. BM25 + vector in parallel — sqlite FTS5 for keyword, sqlite-vec for cosine over 384-dim embeddings. Both run against the same chunks table in the same sqlite file.
2. Reciprocal rank fusion — results are fused via RRF (k=60) so top hits from either arm rise naturally. The /api/ask retrieval_plan records which arm produced each citation.
3. Process-local LRU cache — identical (workspace, question, top_k) tuples hit an in-memory LRU for instant repeat answers; useful for /audit playback and tests.
4. Optional cross-encoder rerank — bge-reranker-v2-m3 can rerank the top-k before synthesis (kill-switch via env: pixl_os_rerank_off).
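The process-local cache in step 3 is a stdlib one-liner in spirit. A sketch — `cached_retrieve` is a stand-in name, and the CALLS list exists only to make cache hits observable:

```python
from functools import lru_cache

CALLS = []  # records actual retrieval work, so cache hits are visible

@lru_cache(maxsize=512)
def cached_retrieve(workspace: str, question: str, top_k: int):
    CALLS.append((workspace, question, top_k))
    # Stand-in for the real hybrid retrieval over the workspace sqlite file.
    return tuple(f"chunk-{i}" for i in range(top_k))

cached_retrieve("feen", "what is RRF?", 3)
cached_retrieve("feen", "what is RRF?", 3)   # identical tuple: served from the LRU
```

Because the key is the full (workspace, question, top_k) tuple, two workspaces asking the same question never share entries.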
Models & providers
Every LLM call goes through one OpenRouter key — which fans out to five providers behind a single billing surface. The per-workspace whitelist decides which models the gateway actually accepts via the X-Model-Override header.
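The gateway-side check might look like this. A sketch under assumptions — the page confirms the X-Model-Override header and the whitelist, but the fallback-to-default behaviour for unlisted models is an assumption, and the gateway may reject instead:

```python
WHITELIST = {
    "anthropic/claude-haiku-4.5",   # default
    "openai/gpt-4o-mini",
}

def resolve_model(headers: dict, default="anthropic/claude-haiku-4.5"):
    """Accept an X-Model-Override only if the workspace whitelist allows it."""
    override = headers.get("X-Model-Override")
    if override and override in WHITELIST:
        return override
    return default  # assumption: unlisted overrides fall back rather than error
```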
Whitelisted models (feen workspace)
- anthropic/claude-haiku-4.5 — default; cheap, fast.
- anthropic/claude-opus-4-7 — deep-reasoning fallback.
- openai/gpt-4o-mini — secondary general-purpose.
- google/gemini-2.5-flash — long-context backup.
- deepseek/deepseek-v3 — open-weight cost anchor.
- qwen/qwen-2.5-72b-instruct — multilingual fallback.
Audit & cost
Every LLM call is logged to stage_cost_telemetry — no sampling, no estimation. /audit reads the table with filters (workspace · model · stage) and shows what was actually spent.
One audit row
- ts
- 2026-04-18T14:22:07Z
- session_id
- conv_a41f · turn 3
- stage
- agent:synth
- model
- anthropic/claude-haiku-4.5
- prompt_tokens
- 4,182
- completion_tokens
- 316
- cached_tokens
- 3,940
- cost_usd
- 0.000412
- latency_ms
- 842
- finish_reason
- stop
- prompt_text
- captured (truncated in UI)
- response_text
- captured (full)
Response bodies and prompts started being persisted in Sprint T5 — legacy rows show null for those fields in the drawer. Pagination is offset/limit, with totals computed over the filtered-but-unpaginated set.
Workspace config
One YAML file per tenant is the single source of truth — repos, connectors, LLM policy, budget, model whitelist. Secrets stay as ${VAR} placeholders and are scrubbed from every API read.
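The placeholder handling splits into two moves: expand at load time, scrub on API reads. A sketch — the function names and the dict-shaped config are illustrative, not from the codebase:

```python
import os
import re

PLACEHOLDER = re.compile(r"\$\{([A-Z0-9_]+)\}")

def expand(value: str) -> str:
    """Load time: resolve a ${VAR} placeholder from the process environment."""
    return PLACEHOLDER.sub(lambda m: os.environ.get(m.group(1), ""), value)

def scrub(config: dict) -> dict:
    """API read time: null every value that is a ${VAR} placeholder,
    so resolved secrets never leave the process."""
    return {k: None if isinstance(v, str) and PLACEHOLDER.fullmatch(v) else v
            for k, v in config.items()}
```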
ref: feen
name: FeeN
data_dir: /tmp/pixl-kb-demo
llm_policy:
  tone: "Direct, no hedging, no emoji."
monthly_budget_usd: null  # null = no cap
model_whitelist:
  - anthropic/claude-haiku-4.5  # default
  - anthropic/claude-opus-4-7
  - openai/gpt-4o-mini
  - google/gemini-2.5-flash
  - deepseek/deepseek-v3
  - qwen/qwen-2.5-72b-instruct
repos:
  - slug: feen-api
    role: backend
    local_path: /Users/hamzamounir/code/nuva/feen-api
env:
  openrouter_api_key: ${OPENROUTER_API_KEY}

HTTP API
Everything the UI uses is available over HTTP — same shapes, same auth. Routes are auto-mounted in app.py; the canonical Swagger lives at /docs.
| Method | Path | Description |
|---|---|---|
| chat | | |
| POST | /api/chat/stream | Streaming agent loop (AI SDK v4 data-stream protocol). |
| POST | /api/chat/conversations | Create a new persisted conversation. |
| GET | /api/chat/conversations | List conversations for a workspace. |
| GET | /api/chat/conversations/{id} | Fetch one conversation with messages. |
| DELETE | /api/chat/conversations/{id} | Delete a conversation and its messages. |
| POST | /api/chat/conversations/{id}/fork | Fork at a message for side-by-side comparison. |
| POST | /api/chat/conversations/{id}/share | Mint a public read-only share token. |
| ask | | |
| POST | /api/ask | One-shot synthesis — hybrid retrieval + answer + citations. |
| POST | /api/ask/followups | Suggest follow-up questions for a given answer. |
| personas | | |
| GET | /api/personas | List personas (code / research / ops / support / general). |
| GET | /api/personas/{id} | Return one persona spec (allowlist, prompts, default model). |
| models | | |
| GET | /api/models | Workspace model whitelist + cost + latency + default flag. |
| knowledge | | |
| GET | /api/docs/list | Paginated document listing. |
| GET | /api/docs/{doc_id} | Full markdown + metadata for one document. |
| GET | /api/entities | Graph nodes + edges (+ optional subjects). |
| GET | /api/entities/{id} | Full detail for one entity. |
| GET | /api/subjects | Topic clusters derived from entity co-occurrence. |
| GET | /api/references | Attach-a-reference picker backing the /ask composer. |
| ingest | | |
| POST | /api/ingest/stream | Universal ingest via SSE — file / url / text, all pipeline stages. |
| POST | /api/transcribe | Audio / video transcription + chunker. |
| audit | | |
| GET | /api/audit | Paginated LLM cost telemetry — filters by workspace / model / stage. |
| GET | /api/analytics | Workspace analytics — symbols, dedup, coverage, timeline. |
| GET | /api/status | Workspace health + counts. |
| workspaces | | |
| GET | /api/workspaces | List all workspaces known to the OS. |
| GET | /api/workspaces/{ref} | Workspace spec + repo list. |
| GET | /api/workspaces/{ref}/config | Scrubbed workspace YAML (secrets nulled). |
| connectors | | |
| POST | /api/connectors/sync | Trigger a connector sync (Linear / Slack / Notion / Gmail). |
| mcp | | |
| GET | /api/mcp/tools | MCP tool registry with JSON-schema + example payloads. |
| GET | /api/mcp/activity | Recent MCP tool calls + connected-agents estimate. |
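Calling the one-shot endpoint from the table is a plain JSON POST. A sketch under assumptions — the JSON field names (`workspace`, `question`) are guesses from the route table, not a documented schema; the canonical shapes live at /docs:

```python
import json
import urllib.request

def build_ask_request(base_url, workspace, question, model=None):
    """Build a POST /api/ask request; pass model to set X-Model-Override."""
    headers = {"Content-Type": "application/json"}
    if model:
        headers["X-Model-Override"] = model
    return urllib.request.Request(
        f"{base_url}/api/ask",
        data=json.dumps({"workspace": workspace, "question": question}).encode(),
        headers=headers,
        method="POST",
    )

req = build_ask_request("http://localhost:8000", "feen",
                        "Which parser handles PDFs?",
                        model="openai/gpt-4o-mini")
# urllib.request.urlopen(req) would return the answer + citations JSON.
```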
MCP server
A minimal stdio MCP server (plus two HTTP bridges) exposes five tools — pixl-os becomes a local tool that any MCP-speaking client can call.
Tools
- ask — heavy: runs the full classify + retrieve + rerank + synthesise pipeline against the KB.
- ingest_chat — ingest a chat export (whatsapp / slack / telegram / imessage / discord / csv) with PII redaction.
- brief — multi-repo brief for a Linear ticket: retrieval + analysis across every wired-up repo.
- entities — HTTP bridge to /api/entities: graph nodes + edges + subjects.
- subjects — HTTP bridge to /api/subjects: topic clusters over the workspace.
Claude Desktop / Cursor integration
{
"mcpServers": {
"pixl-os": {
"command": "python",
"args": ["-m", "pixl_os.mcp_server"],
"env": { "OPENROUTER_API_KEY": "sk-or-..." }
}
}
}

Install: pip install pixl-os. The server advertises its tool list to the client on connect — no registration step.
FAQ
Questions that come up often enough to write down. Short answers, no hedging.
What does pixl-os actually do?
It reads your documents, code, and chats, extracts an entity graph, and answers questions about them with inline citations. One sqlite file per workspace; one OpenRouter key for every LLM call.
What are personas for?
A persona is a saved bundle: default model + tool allowlist + system prompt + starter prompts. The /ask composer has five — code / research / ops / support / general — each narrowing the tool surface to what actually helps that job.
Which tools does the agent have by default?
Six read-only tools: retrieve_kb, search_graph, fetch_doc, show_entity, list_subjects, search_symbols. Four more — web_search, web_fetch, memory_read, memory_write — are opt-in per persona and currently shipped on the `general` persona only when explicitly enabled.
Can I switch models mid-conversation?
Yes. The composer's model picker lives next to the Send button and persists in localStorage. On send it becomes an X-Model-Override header — the gateway validates it against the workspace's model_whitelist before routing.
Where does cost come from?
Every LLM call is logged to stage_cost_telemetry with prompt tokens, completion tokens, and the OpenRouter-returned cost. /audit reads that table directly — nothing is estimated.
Can Claude Desktop or Cursor talk to a workspace?
Yes — via the MCP server. Register pixl-os in your client's mcp.json and five tools (ask / ingest_chat / brief / entities / subjects) become callable from the agent.
Is this self-hostable?
Yes. One Python binary, one sqlite file per workspace, one YAML config. No vector DB service, no external deps beyond OpenRouter (which is itself swappable — the gateway is a thin adapter).
What's in the entity graph exactly?
13 entity types — person, project, technology, concept, process, metric, organization, decision, law, agency, regulation, benefit_rule, actor_role — plus typed edges with weights and direction. Explore at /graph (workspace overview + canvas) or open a single entity at /entity/[id] for its 1-hop neighborhood, source docs, and relations.
Didn't find what you need?
Ask the workspace directly — it knows more about its own code than this page does.