Skip to content

Model configuration

Xiajiao supports any OpenAI-compatible API. This page covers major providers. If you have not run the app yet, start with Quick start. After models work, tune personas and tokens with the SOUL.md guide.

Global settings—theme, language, default LLM

Global settings—theme, language, and default LLM model

Agent card—per-agent model selector

Agent card—dropdown to assign a model per agent

Where to configure

After login:

Settings → Model management → Add configuration

Each provider needs:

FieldNotes
NameAny label (e.g. “Qwen”)
API base URLProvider endpoint
API keySecret from vendor
API typeopenai-completions or anthropic-messages
Default modelDefault model id for that provider

OpenAI

FieldValue
API base URLhttps://api.openai.com/v1
API typeopenai-completions
Keysplatform.openai.com/api-keys

Suggested models

ModelNotesPrice
gpt-4oFlagship, multimodal$5/M in, $15/M out
gpt-4o-miniLightweight, good value$0.15/M in, $0.6/M out
gpt-4-turboPrevious flagship$10/M in, $30/M out
o1Reasoning-focused$15/M in, $60/M out

Practices

  • Daily chat: gpt-4o-mini
  • Hard tasks: gpt-4o
  • Code: gpt-4o

Anthropic (Claude)

FieldValue
API base URLhttps://api.anthropic.com
API typeanthropic-messages
Keysconsole.anthropic.com

API type

Claude must use anthropic-messages, not openai-completions.

Suggested models

ModelNotesPrice
claude-sonnet-4-20250514Strong code and reasoning$3/M in, $15/M out
claude-3-5-haiku-20241022Fast, smaller$1/M in, $5/M out
claude-opus-4-20250514Maximum reasoning$15/M in, $75/M out

Practices

  • Code: Claude Sonnet
  • Long writing: Sonnet (large context)
  • Budget: Haiku

Qwen (Alibaba DashScope)

FieldValue
API base URLhttps://dashscope.aliyuncs.com/compatible-mode/v1
API typeopenai-completions
Keysdashscope.console.aliyun.com

Suggested models

ModelNotesPrice
qwen-maxFlagshipCN¥20/M in, CN¥60/M out
qwen-plusBalancedCN¥0.8/M in, CN¥2/M out
qwen-turboFast, cheapCN¥0.3/M in, CN¥0.6/M out
qwen-longLong contextCN¥0.5/M in, CN¥2/M out

Practices

  • Daily: qwen-turbo
  • Harder work: qwen-plus
  • Best quality: qwen-max

New accounts

Qwen often offers free credits for new signups—check the vendor site.

DeepSeek

FieldValue
API base URLhttps://api.deepseek.com
API typeopenai-completions
Keysplatform.deepseek.com

Suggested models

ModelNotesPrice
deepseek-chatGeneral chatCN¥1/M in, CN¥2/M out
deepseek-coderCodeCN¥1/M in, CN¥2/M out
deepseek-reasonerReasoningCN¥4/M in, CN¥16/M out

Practices

  • Strong price/performance vs flagship Western models
  • Code: deepseek-coder
  • Chat: deepseek-chat

Kimi (Moonshot)

FieldValue
API base URLhttps://api.moonshot.cn/v1
API typeopenai-completions
Keysplatform.moonshot.cn

Suggested models

ModelNotesPrice
moonshot-v1-8k8K contextCN¥12 / M tokens
moonshot-v1-32k32KCN¥24 / M tokens
moonshot-v1-128k128KCN¥60 / M tokens

Practices

  • Default: 8K
  • Long docs: 128K

GLM (Zhipu)

FieldValue
API base URLhttps://open.bigmodel.cn/api/paas/v4
API typeopenai-completions
Keysopen.bigmodel.cn

Suggested models

ModelNotesPrice
glm-4-plusFlagshipCN¥50 / M tokens
glm-4-flashFastFree
glm-4-longLong textCN¥1 / M tokens

Free tier

glm-4-flash is free—good for experiments and light use.

Ollama (local)

FieldValue
API base URLhttp://localhost:11434/v1
API typeopenai-completions
API keyOmit or use a placeholder like ollama

Install Ollama

bash
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows: installer from https://ollama.com/download

Pull models

bash
ollama pull llama3.1          # Llama 3.1 8B
ollama pull qwen2.5           # Qwen 2.5
ollama pull mistral           # Mistral 7B
ollama pull codellama         # Code-focused
ollama pull deepseek-coder-v2

Hardware

SizeMin VRAMComfortable VRAM
7B4GB8GB
13B8GB16GB
70B40GB48GB+

CPU-only works but is slow.

Practices

  • Free, private, offline-friendly
  • Privacy-sensitive workloads
  • ~8B models run well on consumer GPUs
  • Chinese: qwen2.5 is a solid default

OpenRouter

FieldValue
API base URLhttps://openrouter.ai/api/v1
API typeopenai-completions
Keysopenrouter.ai/keys

One key routes to many models—handy if you switch often.

Example model ids

openai/gpt-4o
anthropic/claude-3.5-sonnet
google/gemini-pro-1.5
meta-llama/llama-3.1-70b-instruct

Multiple providers

Xiajiao can keep several providers and assign different models per agent.

Example mix

AgentProviderModelRationale
🤖 Xiajiao stewardQwenqwen-turboSimple ops, low cost
✍️ NovelistClaudeclaude-sonnetQuality writing
📝 EditorDeepSeekdeepseek-chatCheap text work
🌐 TranslatorOpenAIgpt-4oStrong multilingual
💻 Coding assistantClaudeclaude-sonnetStrong code

Budget-first

AgentProviderModelRough monthly
AllQwenqwen-turbo< CN¥5

Quality-first

RoleProviderModelRough monthly
CreativeClaudeclaude-sonnet~$10
ToolsOpenAIgpt-4o~$10

Free stack

AgentProviderModel
AllOllamaqwen2.5 / llama3.1

Cost tips

Principles

  1. Match model to task—not everything needs GPT-4o or Claude Opus
  2. Cheap models for translation, summary, format tweaks; premium for creation, code, hard reasoning
  3. Tighter SOUL.md → fewer prompt tokens each call

Example cost for one “500-word tech post” prompt

ModelInput tokOutput tokRough cost
GPT-4o~800~600~$0.012
Claude Sonnet~800~600~$0.009
DeepSeek Chat~800~600~CN¥0.004
Qwen Turbo~800~600~CN¥0.003
Ollama~800~600CN¥0

Example routing

Steward (ops) → Qwen Turbo
Translator → DeepSeek Chat
Coding assistant → Claude Sonnet
Casual agent → Ollama qwen2.5

Reduce token usage

  1. Shorten SOUL.md—roughly ~1 token per word saved
  2. Limit memory injection: e.g. AUTO_MEMORY_TOP_K=3
  3. Disable unused tools—each tool adds ~100–200 tokens in definitions
  4. Start fresh threads when history grows too long

Troubleshooting

Invalid API key

Symptom: 401 Unauthorized

Fix: Re-copy the key (no spaces), confirm it is active.

Wrong base URL

Symptom: ECONNREFUSED or 404

Fix: Check trailing paths:

  • Wrong: https://api.openai.com (missing /v1)
  • Right: https://api.openai.com/v1
  • Wrong: http://localhost:11434 (Ollama needs /v1)
  • Right: http://localhost:11434/v1

Wrong model name

Symptom: model not found

Fix: Match vendor spelling exactly—case-sensitive.

Claude fails

Symptom: errors from Anthropic

Fix: Type must be anthropic-messages, not openai-completions.

Ollama connection refused

Symptom: ECONNREFUSED

Fix:

  1. ollama list — daemon running
  2. Default port 11434
  3. Remote Ollama: bind 0.0.0.0 if Xiajiao runs on another host

Quick verification checklist

✅ OpenAI-compatible URLs end with /v1 where required
✅ API key has no spaces or line breaks
✅ Model id matches provider docs
✅ API type correct (Anthropic → anthropic-messages, else openai-completions)
✅ Test agent sends one message successfully