LLM Providers
Supported remote LLM providers, model guidance, and known caveats for AI Now and agent workflows
Choose your setup in three layers:
- Plan or subscription: what account or bundle you already pay for
- Provider or endpoint: where requests actually go
- Model: the concrete tool-capable default you run day to day
Then configure it once in Settings (or nmem CLI/TUI).
Nowledge Mem recommends subscription-based defaults for the smoothest daily experience right now:
- OpenAI ChatGPT/Codex subscription
- Kimi Code subscription
Cost-aware default recommendation
For day-to-day AI Now usage, prioritize fast tool-capable models over SOTA by default.
Examples: gpt-5.1-codex-mini (Codex subscription) or Kimi Coding Plan models.
If your current setup is spending too many tokens (for example, always running gpt-5.3-codex), switch to a lighter tool-capable default first.
What To Choose
ChatGPT/Codex Subscription
Best default for coding-heavy AI Now workflows with strong tool use support.
Kimi Code Subscription
Great for coding workflows and tool-calling in AI Now.
Provider Guides
- Stable tool-calling in AI Now and agents
- Strong ecosystem compatibility
- Great fit for coding-heavy AI Now sessions
- Reliable tool workflows in day-to-day usage
- Strong planning and tool use quality
- Good for long multi-step sessions
- Use deepseek-chat in AI Now
- Solid lower-cost alternative for tool workflows
- Flexible multi-model routing
- Pick tool-capable models for AI Now / agent workflows
- Supported in AI Now and agent flows
- Smooth fit with Google AI Studio credentials
- Supported with tool workflows in AI Now
- Good when xAI is already part of your stack
- Supported in AI Now and extension workflows
- Works with current MiniMax chat models
- Supported for AI Now and agents
- Good regional/provider fit for existing usage
- No cloud dependency for model runtime
- Use tool-capable models for AI Now agents
- Supported in AI Now
- Low-friction if your team already uses Copilot
- Requires OpenAI-compatible chat completions
- Tool support depends on your gateway/model
DeepSeek model hint
Use deepseek-chat for AI Now and agent tasks.
Context Window
Every model has a maximum number of tokens it can process in a single request. Nowledge Mem auto-detects this limit from the model name — for example, gpt-4o defaults to 128k tokens, gemini-2.0-flash to 1M.
You can override the context window in Settings → Providers → Advanced (or via nmem config provider set --context-window <tokens>).
When to adjust:
- Small or fine-tuned models (8k–32k context) — set the actual limit so that AI Now compacts the conversation before it overflows.
- Extended-context models (500k–1M+) — increase the window so AI Now uses the full capacity instead of compacting early.
- Custom or self-hosted models — if the model name doesn't match a known pattern, the default is 128k. Set the real value to get accurate compaction behavior.
What happens during compaction
When a conversation approaches the context limit, AI Now automatically summarizes older messages and keeps the most recent exchanges. This lets long sessions continue without losing important context. Setting the right context window ensures compaction fires at the right time — not too early, not too late.
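A minimal sketch of that compaction strategy, assuming a crude characters-per-token estimate and a placeholder `summarize` function standing in for the actual LLM summarization step (none of these names are Nowledge Mem APIs):

```python
# Illustrative compaction: once estimated tokens pass a threshold of the
# context limit, fold older messages into one summary and keep the tail.

def estimate_tokens(messages: list[str]) -> int:
    # crude heuristic: roughly 4 characters per token
    return sum(len(m) for m in messages) // 4

def summarize(messages: list[str]) -> str:
    # stand-in for a real LLM summarization call
    return f"[summary of {len(messages)} earlier messages]"

def compact(messages: list[str], limit: int,
            keep_recent: int = 4, threshold: float = 0.8) -> list[str]:
    """Compact when the conversation exceeds threshold * limit tokens."""
    if estimate_tokens(messages) < threshold * limit or len(messages) <= keep_recent:
        return messages  # still comfortably under the limit
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent
```

With `limit` set too high, `compact` never fires and the request overflows; set too low, it fires early and discards useful detail, which is why the context-window setting above matters.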
Custom Provider Guidance
If you configure a custom OpenAI-compatible endpoint (openai_compatible) that points to DeepSeek (api.deepseek.com), use deepseek-chat as the model for AI Now and agent tasks.
Custom endpoints also support the newer Responses API (/v1/responses) alongside the legacy Chat Completions format. Select the API format when adding or editing the provider.
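For reference, a Chat Completions request against such a DeepSeek-backed custom endpoint looks like the sketch below. The payload shape follows the standard OpenAI-compatible format; `YOUR_API_KEY` is a placeholder, and the `tools` field is shown empty only to mark where tool definitions go.

```python
import json
import urllib.request

# Sketch of an OpenAI-compatible Chat Completions call to DeepSeek.
payload = {
    "model": "deepseek-chat",  # recommended model for AI Now and agent tasks
    "messages": [{"role": "user", "content": "Hello"}],
    "tools": [],               # tool definitions go here when needed
}
req = urllib.request.Request(
    "https://api.deepseek.com/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",
    },
)
# urllib.request.urlopen(req)  # send only once a real key is configured
```

When adding the provider in Settings, choosing the Chat Completions format matches this request shape; choosing the Responses API instead targets /v1/responses with its own payload format.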
For Linux headless deployment setup, see Linux Server Deployment.