LLM Providers
Supported remote LLM providers, model guidance, and known caveats for AI Now and agent workflows
Choose your setup in three layers:
- Plan or subscription: what account or bundle you already pay for
- Provider or endpoint: where requests actually go
- Model: the concrete tool-capable default you run day to day
Then configure it once in Settings (or nmem CLI/TUI).
Nowledge Mem recommends subscription-based defaults for the smoothest daily experience right now:
- OpenAI ChatGPT/Codex subscription
- Kimi Code subscription
Cost-aware default recommendation
For day-to-day AI Now usage, prioritize fast tool-capable models over SOTA by default.
Examples: gpt-5.1-codex-mini (Codex subscription) or Kimi Coding Plan models.
If your current setup is spending too many tokens (for example, always running gpt-5.3-codex), switch to a lighter tool-capable default first.
What To Choose
ChatGPT/Codex Subscription
Best default for coding-heavy AI Now workflows with strong tool use support.
Kimi Code Subscription
Great for coding workflows and tool-calling in AI Now.
Provider Guides
- Stable tool-calling in AI Now and agents
- Strong ecosystem compatibility
- Great fit for coding-heavy AI Now sessions
- Reliable tool workflows in day-to-day usage
- Strong planning and tool use quality
- Good for long multi-step sessions
- Use deepseek-chat in AI Now
- Solid lower-cost alternative for tool workflows
- Flexible multi-model routing
- Pick tool-capable models for AI Now / agent workflows
- Supported in AI Now and agent flows
- Smooth fit with Google AI Studio credentials
- Supported with tool workflows in AI Now
- Good when xAI is already part of your stack
- Supported in AI Now and extension workflows
- Works with current MiniMax chat models
- Supported for AI Now and agents
- Good regional/provider fit for existing usage
- No cloud dependency for model runtime
- Use tool-capable models for AI Now agents
- Supported in AI Now
- Low-friction if your team already uses Copilot
- Requires OpenAI-compatible chat completions
- Tool support depends on your gateway/model
DeepSeek model hint
Use deepseek-chat for AI Now and agent tasks.
Context Window
Every model has a maximum number of tokens it can process in a single request. Nowledge Mem auto-detects this limit from the model name — for example, gpt-4o defaults to 128k tokens, gemini-2.0-flash to 1M.
You can override the context window in Settings → Providers → Advanced (or via nmem config provider set --context-window <tokens>).
When to adjust:
- Small or fine-tuned models (8k–32k context) — set the actual limit so that AI Now compacts the conversation before it overflows.
- Extended-context models (500k–1M+) — increase the window so AI Now uses the full capacity instead of compacting early.
- Custom or self-hosted models — if the model name doesn't match a known pattern, the default is 128k. Set the real value to get accurate compaction behavior.
What happens during compaction
When a conversation approaches the context limit, AI Now automatically summarizes older messages and keeps the most recent exchanges. This lets long sessions continue without losing important context. Setting the right context window ensures compaction fires at the right time — not too early, not too late.
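A minimal sketch of that compaction strategy, assuming a crude characters-per-token estimate and a placeholder `summarize` function standing in for the actual LLM summarization step (none of these names are Nowledge Mem APIs):

```python
# Illustrative compaction: once estimated tokens pass a threshold of the
# context limit, fold older messages into one summary and keep the tail.

def estimate_tokens(messages: list[str]) -> int:
    # crude heuristic: roughly 4 characters per token
    return sum(len(m) for m in messages) // 4

def summarize(messages: list[str]) -> str:
    # stand-in for a real LLM summarization call
    return f"[summary of {len(messages)} earlier messages]"

def compact(messages: list[str], limit: int,
            keep_recent: int = 4, threshold: float = 0.8) -> list[str]:
    """Compact when the conversation exceeds threshold * limit tokens."""
    if estimate_tokens(messages) < threshold * limit or len(messages) <= keep_recent:
        return messages  # still comfortably under the limit
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent
```

With `limit` set too high, `compact` never fires and the request overflows; set too low, it fires early and discards useful detail, which is why the context-window setting above matters.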
Custom Provider Guidance
If you configure a custom OpenAI-compatible endpoint (openai_compatible) that points to DeepSeek (api.deepseek.com), use deepseek-chat as the model for AI Now and agent tasks.
Custom endpoints also support the newer Responses API (/v1/responses) alongside the legacy Chat Completions format. Select the API format when adding or editing the provider.
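For reference, a Chat Completions request against such a DeepSeek-backed custom endpoint looks like the sketch below. The payload shape follows the standard OpenAI-compatible format; `YOUR_API_KEY` is a placeholder, and the `tools` field is shown empty only to mark where tool definitions go.

```python
import json
import urllib.request

# Sketch of an OpenAI-compatible Chat Completions call to DeepSeek.
payload = {
    "model": "deepseek-chat",  # recommended model for AI Now and agent tasks
    "messages": [{"role": "user", "content": "Hello"}],
    "tools": [],               # tool definitions go here when needed
}
req = urllib.request.Request(
    "https://api.deepseek.com/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",
    },
)
# urllib.request.urlopen(req)  # send only once a real key is configured
```

When adding the provider in Settings, choosing the Chat Completions format matches this request shape; choosing the Responses API instead targets /v1/responses with its own payload format.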
For Linux headless deployment setup, see Linux Server Deployment.