
How pixl-os works

What ships today — the surfaces, the agent, the ingest pipeline, the APIs. Sourced from the running code, not the brochure.

§1 · overview

What is pixl-os

A workspace-native knowledge OS. It ingests the documents, code, and decisions of a team, extracts an entity graph, and answers natural-language questions against that corpus with inline citations. Three truths: one sqlite file per workspace, one OpenRouter key per tenant, one HTTP API for every surface.

sqlite per workspace

Each tenant is a single file. Portable, diff-able, cheap to back up. No vector DB service.

OpenRouter-strict gateway

One key, one model whitelist, one budget cap per workspace. Every call logged.

Entity-graph-aware retrieval

Hybrid BM25 + vector with entity co-occurrence as a re-ranking signal.

§2 · pipeline

Architecture

A five-step pipeline from raw source to cited answer. Every stage is deterministic, swappable, and observable via /audit.

1. Ingest
code · PDFs · URLs · chat
2. Extract
13 entity types + relations
3. Store
sqlite + FTS5 + sqlite-vec
4. Retrieve
BM25 + vector + RRF
5. Answer
OpenRouter + citations

Ingest routes each source to the right parser: AST chunker for code, Trafilatura / Playwright / Firecrawl for URLs, Mistral OCR for PDFs, semantic chunker for transcripts.

Extract runs an LLM pass that emits the 13 entity types plus edge relations, deduped case-insensitively.
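
The case-insensitive dedupe can be sketched in a few lines. This is a minimal sketch assuming entities arrive as name/type dicts; `dedupe_entities` is a hypothetical name, not the shipped function:

```python
def dedupe_entities(entities):
    """Keep the first occurrence of each entity name, compared case-insensitively."""
    seen, kept = set(), []
    for ent in entities:
        key = ent["name"].casefold()  # casefold handles non-ASCII better than lower()
        if key not in seen:
            seen.add(key)
            kept.append(ent)
    return kept

raw = [{"name": "SQLite", "type": "technology"},
       {"name": "sqlite", "type": "technology"},
       {"name": "OpenRouter", "type": "technology"}]
deduped = dedupe_entities(raw)  # "sqlite" collapses into "SQLite": 2 entities survive
```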

Retrieve fuses BM25 keyword results with sqlite-vec cosine hits via reciprocal rank fusion; optional cross-encoder rerank on the top-k.
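
The fusion step can be sketched in a few lines (`rrf_fuse` is a hypothetical name; the real retriever fuses chunk ids from both arms):

```python
def rrf_fuse(bm25_ids, vector_ids, k=60):
    """Reciprocal rank fusion: score(d) = sum over arms of 1 / (k + rank)."""
    scores = {}
    for ranked in (bm25_ids, vector_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; a chunk ranked well by both arms wins.
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse(["c1", "c2", "c3"], ["c2", "c4", "c1"])
# c2 ranks near the top of both arms, so it beats c1 → ["c2", "c1", "c4", "c3"]
```

With k=60 a single top rank in one arm cannot swamp consistent mid-ranks in both, which is why the pure-keyword and pure-vector failure modes cancel out.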

§3 · /ask

Chat & agent

The /ask page is an AI SDK v4 chat built on AI Elements. Pick a persona, pick a model, send a question — the agent loop streams tool calls, renders inline citations with hover-previewed source snippets, and logs every step to /audit.

Persona pill

5 personas: code · research · ops · support · general. Explicit picks pin to the conversation (N4), so follow-up turns resume the same persona without a classifier roundtrip.

Model picker

5 providers (OpenAI · Anthropic · Google · DeepSeek · Qwen), 6 whitelisted models, persisted per-browser.

Citations

[N] refs linkify to the retrieval set with HoverCard previews; #entity: and #subject: chips bind a question to an exact node.

Voice input

Push-to-talk mic transcribes locally via the /api/transcribe endpoint then injects the result into the composer.

Image input

Drop or paste images into the composer; multimodal models see them alongside the prompt.

Reasoning drawer

Per-round plan + tool-call cards + TTFT and tok/s columns (N5). The first sentence of each round streams as an agent-plan annotation (N10) separate from the answer delta.

Tool registry · 10 tools

retrieve_kb · default
Hybrid RAG over the workspace KB — BM25 + vector, reciprocal rank fusion.
search_graph · default
Walk the entity graph from a seed entity. Returns neighbour entities + relation labels.
fetch_doc · default
Fetch the full markdown of a KB document by id (preferred) or fuzzy title match.
show_entity · default
Full detail for one entity: type, description, relations both directions, top 5 source documents.
list_subjects · default
List topic clusters (subjects) for the workspace, derived from entity co-occurrence.
search_symbols · default
Search AST code symbols (functions, classes, methods) by name or signature.
web_search · persona
Search the public web for recent or external information the workspace KB doesn't cover.
web_fetch · persona
Fetch a web page and return its main content (Trafilatura-cleaned, no nav/ads).
memory_read · persona
Read a short conversation-scoped note previously persisted via memory_write.
memory_write · persona
Persist a short note so later turns can recall user preferences or session facts.

The default tools are active when no persona is selected. The persona tools (web_*, memory_*) are opt-in per persona and not in the default allowlist.
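
A sketch of how the allowlist might resolve, using the tool names from the registry above (`resolve_allowlist` is a hypothetical helper, not the shipped code):

```python
DEFAULT_TOOLS = {"retrieve_kb", "search_graph", "fetch_doc",
                 "show_entity", "list_subjects", "search_symbols"}
PERSONA_TOOLS = {"web_search", "web_fetch", "memory_read", "memory_write"}

def resolve_allowlist(persona_opt_ins=None):
    """Default tools are always on; persona-tier tools only when opted in."""
    opt_ins = set(persona_opt_ins or [])
    # Only persona-tier names can be opted in; anything else is ignored.
    return DEFAULT_TOOLS | (opt_ins & PERSONA_TOOLS)

assert "web_search" not in resolve_allowlist()            # no persona: defaults only
assert "web_search" in resolve_allowlist(["web_search"])  # opt-in per persona
```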

§4 · /ingest

Ingestion

Drop a file, paste a URL, or type prose. The dispatcher sniffs the source type and routes it to the right parser, then streams back every stage over SSE so you can see the pipeline move.

Source types

file · .md .txt · Chonkie semantic
Paragraph-aware semantic chunker for plain text and Markdown.
file · .pdf · Mistral OCR / pypdf
OCR fallback for scanned PDFs; direct parse when text layer is present.
file · .csv .json · pandas + json parser
Structured tabular / object data pushed as typed chunks.
url · http(s) · Trafilatura
HTTP GET + readability-cleaned main content extraction.
url · SPA · Playwright / Firecrawl
JS-heavy URLs routed through a headless browser fallback when Trafilatura returns empty.
code · repos · tree-sitter AST
Typed symbols per language (.py / .ts / .tsx / .js / .rs) — feeds search_symbols.
text · pasted · plain-text passthrough
Direct paste box on /ingest — useful for meeting notes and snippets.
chats · whatsapp / slack / telegram / imessage / discord / csv
Multi-format parser wired through the MCP ingest_chat tool.

Example SSE event stream
POST /api/ingest/stream
event: received
data: {"size":18432,"type":"file","name":"decisions.md"}

event: stage
data: {"stage":"parsing","progress":0.22,"note":"4,210 chars"}

event: stage
data: {"stage":"chunking","progress":0.25,"note":"Chonkie semantic · 4,210 chars"}

event: stage
data: {"stage":"embedding","progress":0.5,"note":"384-dim · batched"}

event: stage
data: {"stage":"extracting_entities","progress":0.75,"note":"LLM pass · 12 entities"}

event: stage
data: {"stage":"persisting","progress":0.9}

event: done
data: {"doc_id":"doc_5fc3","chunks":14,"entities":12,"cost_usd":0.0008}

Stages: received → fetching (URL only) → parsing → chunking → embedding → extracting_entities → persisting → done. Each stage is cost-tracked under ingest:<strategy>.
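
The framing above is plain `event:` / `data:` lines separated by blank lines. A minimal client-side parser, a sketch rather than the shipped client, could look like:

```python
import json

def parse_sse(raw):
    """Yield (event, payload) pairs from an event:/data: framed SSE body."""
    event, data_lines = None, []
    for line in raw.splitlines() + [""]:          # trailing "" flushes the last event
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and event is not None:    # blank line dispatches the event
            yield event, json.loads("\n".join(data_lines))
            event, data_lines = None, []

stream = 'event: stage\ndata: {"stage":"embedding","progress":0.5}\n\nevent: done\ndata: {"doc_id":"doc_5fc3","chunks":14}\n'
events = list(parse_sse(stream))
# → [("stage", {...progress 0.5...}), ("done", {...chunks 14...})]
```

In practice you would iterate the response body of POST /api/ingest/stream line by line instead of buffering it whole.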

§5 · /graph

Knowledge graph

/graph is the workspace overview — stats, type tiles, top mentions, and a command-palette search across all entities. Open any entity at /entity/[id] for its 1-hop neighborhood, source documents, and typed relations. Subject clusters (entity co-occurrence groupings) live at /subject/[id].

    [ person ]
        | owns
    [ project ]
       /      \
[ decision ]  [ technology ]

1-hop · typed · weighted

13 entity types

  • person
  • project
  • technology
  • concept
  • process
  • metric
  • organization
  • decision
  • law
  • agency
  • regulation
  • benefit_rule
  • actor_role

Surfaces

  • /graph — workspace overview + full-graph canvas toggle.
  • /entity/[id] — a single entity with its 1-hop neighborhood.
  • /docs — source documents that mention each entity.

§6 · hybrid RAG

Retrieval stack

Hybrid by construction: every query hits keyword + vector in parallel and fuses the results. Pure vector loses on unusual terminology; pure BM25 loses on paraphrases. Fusion covers both.

  1. BM25 + vector in parallel

     sqlite FTS5 for keyword, sqlite-vec for cosine over 384-dim embeddings. Both run against the same chunks table in the same sqlite file.

  2. Reciprocal rank fusion

     Results are fused via RRF (k=60) so top hits from either arm rise naturally. The /api/ask retrieval_plan records which arm produced each citation.

  3. Process-local LRU cache

     Identical (workspace, question, top_k) tuples hit an in-memory LRU for instant repeat answers — useful for /audit playback and tests.

  4. Optional cross-encoder rerank

     bge-reranker-v2-m3 can rerank the top-k before synthesis (kill-switch via env: pixl_os_rerank_off).
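
A process-local cache keyed on that tuple can be sketched with the standard library. The real cache stores the retrieval set; here a string stands in, and the counter exists only to make the cache hit observable:

```python
from functools import lru_cache

CALLS = {"count": 0}  # instrumentation: how many times the "retriever" actually ran

@lru_cache(maxsize=256)
def cached_retrieve(workspace: str, question: str, top_k: int):
    """Only distinct (workspace, question, top_k) tuples reach the retriever."""
    CALLS["count"] += 1
    return f"results for {question!r} in {workspace} (top {top_k})"

cached_retrieve("feen", "what is RRF?", 8)
cached_retrieve("feen", "what is RRF?", 8)  # identical tuple: served from the LRU
assert CALLS["count"] == 1
```

Because the cache is process-local, a restart clears it; that is acceptable here since it exists for repeat answers within a session, not as a durability layer.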

§7 · multi-provider

Models & providers

Every LLM call goes through one OpenRouter key — which fans out to five providers behind a single billing surface. The per-workspace whitelist decides which models the gateway actually accepts via the X-Model-Override header.
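
The gateway check described above can be sketched as follows. `validate_model_override` is a hypothetical name; the default model mirrors the whitelist shown below:

```python
def validate_model_override(header_value, whitelist,
                            default="anthropic/claude-haiku-4.5"):
    """Accept X-Model-Override only if the workspace whitelist contains it."""
    if header_value is None:
        return default                      # no header: route to the default model
    if header_value not in whitelist:
        raise ValueError(f"model {header_value!r} not in workspace whitelist")
    return header_value

WHITELIST = {"anthropic/claude-haiku-4.5", "openai/gpt-4o-mini"}
assert validate_model_override(None, WHITELIST) == "anthropic/claude-haiku-4.5"
assert validate_model_override("openai/gpt-4o-mini", WHITELIST) == "openai/gpt-4o-mini"
```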

Providers on the feen whitelist

Anthropic · OpenAI · Google · DeepSeek · Qwen

Whitelisted models

anthropic/claude-haiku-4.5
Default — cheap, fast.
anthropic/claude-opus-4-7
Deep reasoning fallback.
openai/gpt-4o-mini
Secondary general-purpose.
google/gemini-2.5-flash
Long-context backup.
deepseek/deepseek-v3
Open-weight cost anchor.
qwen/qwen-2.5-72b-instruct
Multilingual fallback.

§8 · /audit

Audit & cost

Every LLM call is logged to stage_cost_telemetry — no sampling, no estimation. /audit reads the table with filters (workspace · model · stage) and shows what was actually spent.

One audit row

field · captured value
ts
2026-04-18T14:22:07Z
session_id
conv_a41f · turn 3
stage
agent:synth
model
anthropic/claude-haiku-4.5
prompt_tokens
4,182
completion_tokens
316
cached_tokens
3,940
cost_usd
0.000412
latency_ms
842
finish_reason
stop
prompt_text
captured (truncated in UI)
response_text
captured (full)

Prompts and response bodies have been persisted since Sprint T5 — legacy rows show null for those fields in the drawer. Pagination is offset/limit, with totals computed over the filtered-but-unpaginated set.
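
An /audit-style aggregation over that table can be run directly in sqlite. In this sketch the columns are a subset of the real schema and the sample rows are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE stage_cost_telemetry
               (ts TEXT, stage TEXT, model TEXT, cost_usd REAL, latency_ms INTEGER)""")
con.executemany(
    "INSERT INTO stage_cost_telemetry VALUES (?,?,?,?,?)",
    [("2026-04-18T14:22:07Z", "agent:synth",
      "anthropic/claude-haiku-4.5", 0.000412, 842),
     ("2026-04-18T14:22:09Z", "ingest:semantic",
      "anthropic/claude-haiku-4.5", 0.0008, 1203)])

# Spend per model over the (here unfiltered) set: the same shape /audit renders.
rows = con.execute("""SELECT model, ROUND(SUM(cost_usd), 6) AS spend, COUNT(*)
                      FROM stage_cost_telemetry
                      GROUP BY model""").fetchall()
```

Because every call is logged rather than sampled, SUM(cost_usd) is the actual spend, not an estimate.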

§9 · YAML

Workspace config

One YAML file per tenant is the single source of truth — repos, connectors, LLM policy, budget, model whitelist. Secrets stay as ${VAR} placeholders and are scrubbed from every API read.

configs/workspaces/feen.yaml (truncated)
ref: feen
name: FeeN
data_dir: /tmp/pixl-kb-demo

llm_policy:
  tone: "Direct, no hedging, no emoji."
  monthly_budget_usd: null        # null = no cap
  model_whitelist:
    - anthropic/claude-haiku-4.5  # default
    - anthropic/claude-opus-4-7
    - openai/gpt-4o-mini
    - google/gemini-2.5-flash
    - deepseek/deepseek-v3
    - qwen/qwen-2.5-72b-instruct

repos:
  - slug: feen-api
    role: backend
    local_path: /Users/hamzamounir/code/nuva/feen-api

env:
  openrouter_api_key: ${OPENROUTER_API_KEY}
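
A sketch of the placeholder convention above. `expand_env` and `scrub` are hypothetical names, and the redaction rule (anything no longer a bare ${VAR} gets masked) is an assumption based on the behaviour described:

```python
import os
import re

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def expand_env(cfg: dict) -> dict:
    """Server-side only: resolve ${VAR} placeholders from the process environment."""
    return {k: PLACEHOLDER.sub(lambda m: os.environ.get(m.group(1), m.group(0)), v)
            for k, v in cfg.items()}

def scrub(cfg: dict) -> dict:
    """API reads: any value that is not a bare ${VAR} placeholder is redacted."""
    return {k: (v if PLACEHOLDER.fullmatch(v) else "***") for k, v in cfg.items()}

os.environ["OPENROUTER_API_KEY"] = "sk-or-demo"
env_block = {"openrouter_api_key": "${OPENROUTER_API_KEY}"}

expanded = expand_env(env_block)
assert expanded["openrouter_api_key"] == "sk-or-demo"  # lives only inside the server
assert scrub(expanded)["openrouter_api_key"] == "***"  # what a config read returns
```
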

§10 · HTTP surface

HTTP API

Everything the UI uses is available over HTTP — same shapes, same auth. Routes are auto-mounted in app.py; the canonical Swagger lives at /docs.

Method · Path

chat
POST   /api/chat/stream
POST   /api/chat/conversations
GET    /api/chat/conversations
GET    /api/chat/conversations/{id}
DELETE /api/chat/conversations/{id}
POST   /api/chat/conversations/{id}/fork
POST   /api/chat/conversations/{id}/share

ask
POST   /api/ask
POST   /api/ask/followups

personas
GET    /api/personas
GET    /api/personas/{id}

models
GET    /api/models

knowledge
GET    /api/docs/list
GET    /api/docs/{doc_id}
GET    /api/entities
GET    /api/entities/{id}
GET    /api/subjects
GET    /api/references

ingest
POST   /api/ingest/stream
POST   /api/transcribe

audit
GET    /api/audit
GET    /api/analytics
GET    /api/status

workspaces
GET    /api/workspaces
GET    /api/workspaces/{ref}
GET    /api/workspaces/{ref}/config

connectors
POST   /api/connectors/sync

mcp
GET    /api/mcp/tools
GET    /api/mcp/activity

§11 · external agents

MCP server

A minimal stdio MCP server (plus two HTTP bridges) exposes five tools — pixl-os becomes a local tool that any MCP-speaking client can call.

Tools

ask
Heavy — runs the full classify + retrieve + rerank + synthesise pipeline against the KB.
ingest_chat
Ingest a chat export (whatsapp / slack / telegram / imessage / discord / csv) with PII redaction.
brief
Multi-repo brief for a Linear ticket — retrieval + analysis across every wired-up repo.
entities
HTTP bridge to /api/entities — graph nodes + edges + subjects.
subjects
HTTP bridge to /api/subjects — topic clusters over the workspace.

Claude Desktop / Cursor integration

~/.config/claude-desktop/mcp.json
{
  "mcpServers": {
    "pixl-os": {
      "command": "python",
      "args": ["-m", "pixl_os.mcp_server"],
      "env": { "OPENROUTER_API_KEY": "sk-or-..." }
    }
  }
}

Install: pip install pixl-os. The server advertises its tool list to the client on connect — no registration step.

§12 · q&a

FAQ

Questions that come up often enough to write down. Short answers, no hedging.

What does pixl-os actually do?

It reads your documents, code, and chats, extracts an entity graph, and answers questions about them with inline citations. One sqlite file per workspace; one OpenRouter key for every LLM call.

What are personas for?

A persona is a saved bundle: default model + tool allowlist + system prompt + starter prompts. The /ask composer has five — code / research / ops / support / general — each narrowing the tool surface to what actually helps that job.

Which tools does the agent have by default?

Six read-only tools: retrieve_kb, search_graph, fetch_doc, show_entity, list_subjects, search_symbols. Four more — web_search, web_fetch, memory_read, memory_write — are opt-in per persona and currently shipped on the `general` persona only when explicitly enabled.

Can I switch models mid-conversation?

Yes. The composer's model picker lives next to the Send button and persists in localStorage. On send it becomes an X-Model-Override header — the gateway validates it against the workspace's model_whitelist before routing.

Where does cost come from?

Every LLM call is logged to stage_cost_telemetry with prompt tokens, completion tokens, and the OpenRouter-returned cost. /audit reads that table directly — nothing is estimated.

Can Claude Desktop or Cursor talk to a workspace?

Yes — via the MCP server. Register pixl-os in your client's mcp.json and five tools (ask / ingest_chat / brief / entities / subjects) become callable from the agent.

Is this self-hostable?

Yes. One Python binary, one sqlite file per workspace, one YAML config. No vector DB service, no external deps beyond OpenRouter (which is itself swappable — the gateway is a thin adapter).

What's in the entity graph exactly?

13 entity types — person, project, technology, concept, process, metric, organization, decision, law, agency, regulation, benefit_rule, actor_role — plus typed edges with weights and direction. Explore at /graph (workspace overview + canvas) or open a single entity at /entity/[id] for its 1-hop neighborhood, source docs, and relations.

Didn't find what you need?

Ask the workspace directly — it knows more about its own code than this page does.
