Ivilson.AI.VllmChatClient 1.8.8

Installation:
- .NET CLI: dotnet add package Ivilson.AI.VllmChatClient --version 1.8.8
- Package Manager: NuGet\Install-Package Ivilson.AI.VllmChatClient -Version 1.8.8
- PackageReference: <PackageReference Include="Ivilson.AI.VllmChatClient" Version="1.8.8" />
- Central Package Management: <PackageVersion Include="Ivilson.AI.VllmChatClient" Version="1.8.8" /> with <PackageReference Include="Ivilson.AI.VllmChatClient" />
- Paket: paket add Ivilson.AI.VllmChatClient --version 1.8.8
- F# Interactive: #r "nuget: Ivilson.AI.VllmChatClient, 1.8.8"
- File-based apps: #:package Ivilson.AI.VllmChatClient@1.8.8
- Cake addin: #addin nuget:?package=Ivilson.AI.VllmChatClient&version=1.8.8
- Cake tool: #tool nuget:?package=Ivilson.AI.VllmChatClient&version=1.8.8
C# vLLM Chat Client
A comprehensive .NET 8 chat client library with advanced reasoning support for a wide range of LLM models: OpenAI GPT series, Claude 4.6 / 4.5, GPT-OSS-120B, Qwen3, Qwen3-Next, Qwen 3.5, QwQ-32B, Gemma3, DeepSeek-R1, DeepSeek-V3.2, Kimi K2 / Kimi 2.5, GLM-5 / GLM 4.6 / 4.7 / 4.7 Flash / 4.5, Gemini 3, and MiniMax-M2.5.
🚀 Features
✅ Multi-model Support: OpenAI GPT series, Claude 4.6 / 4.5, Qwen3, Qwen3-Next, Qwen 3.5 (multiple modelIds, including Qwen3-VL), QwQ, Gemma3, DeepSeek-R1, DeepSeek-V3.2, GLM-5 / GLM-4 / glm-4.6 / glm-4.7 / glm-4.7-flash / glm-4.5, GPT-OSS-120B/20B, Kimi K2 / Kimi 2.5, Gemini 3, MiniMax-M2.5
✅ Reasoning Chain Support: Built-in thinking/reasoning capabilities for supported models (GLM supports the official Zhipu thinking parameter via `GlmChatOptions.ThinkingEnabled`)
✅ Stream Function Calls: Real-time function calling with streaming responses
✅ Multiple Deployment Options: Local vLLM deployment and cloud API support
✅ Performance Optimized: Efficient streaming and memory management
✅ .NET 8 Ready: Full compatibility with the latest .NET platform
📦 Project Repository
GitHub: https://github.com/iwaitu/vllmchatclient
✨ What's New in This Release
🆕 Claude 4.6 / 4.5 Thinking Chain Support
- Added `VllmClaudeChatClient`, tailored to Claude models served through OpenRouter and similar platforms.
- Thinking parameter adaptation: supports the `reasoning: { effort: "high"|"medium"|"low" }` parameter introduced with Claude 4.6 (enabled via `VllmChatOptions.ThinkingEnabled = true`; defaults to `high`).
- Response parsing: extracts the thinking chain from either the `reasoning` string or the `reasoning_details` array returned by the model and wraps it uniformly in `ReasoningChatResponse`.
- Token safeguards: applies protective defaults for Claude's large token limits to avoid OpenRouter quota errors.
🆕 OpenAI GPT Series Support
- Added `VllmOpenAiGptClient`, tailored to GPT-series models from the official OpenAI API or OpenRouter (e.g. gpt-4o, gpt-5.2-codex).
- Reasoning segmentation: supports GPT-series models that emit a thinking chain, with reasoning depth controlled through `OpenAiGptChatOptions.ReasoningLevel`.
- Flexible configuration: the built-in `ExcludeReasoning` option controls whether the reasoning process appears in the output.
🆕 DeepSeek V3.2 Thinking Chain Support
`VllmDeepseekV3ChatClient` thinking chain fixes:
- Corrected request format: the DashScope API expects `enable_thinking: true` (a top-level boolean), not the Kimi-style `thinking: {type: "enabled"}`.
- The `reasoning_content` field returned by the model is now parsed and emitted correctly.
- Non-streaming responses expose the thinking chain via `ReasoningChatResponse.Reason`.
- Streaming responses use `ReasoningChatResponseUpdate.Thinking` to separate the thinking phase from the final answer.
- Enabled via `VllmChatOptions.ThinkingEnabled = true`.
- Compatible with the DashScope `deepseek-v3.2` model.
🐛 Bug Fixes
`VllmGptOssChatClient` streaming function-call fix:
- Fixed an issue where, during manual streaming function calls, the first stream ended right after the model returned `tool_calls`, so the final text reply could never be retrieved.
- Added a `GetStreamingResponseAsync` override that detects when the caller has appended tool results to `messages` and automatically issues a second streaming request, giving a seamless tool call → final reply flow.
- `StreamChatManualFunctionCallTest` now completes the entire tool-call flow in a single `await foreach` loop, with no hand-written "second turn" logic.
- Simplified the default system prompt by removing the hard "content must be empty when tool_calls are present" constraint.
🔄 VllmQwen3NextChatClient Refactor: Unified Multi-Model Adaptation
`VllmQwen3NextChatClient` now adapts to multiple model families; switch via the constructor `modelId` or `ChatOptions.ModelId`, with no separate client classes required:
- `qwen3.5-397b-a17b` (Qwen 3.5, latest)
- `qwen3-next-80b-a3b-thinking` / `qwen3-next-80b-a3b-instruct`
- `qwen3-vl-30b-a3b-thinking` / `qwen3-vl-30b-a3b-instruct` (multimodal, image input)
- `qwen3-vl-32b-thinking` / `qwen3-vl-32b-instruct` (multimodal)
- `qwen3-vl-235b-a22b-thinking` / `qwen3-vl-235b-a22b-instruct` (multimodal, manually verified)
Consolidated model classes have been removed (their functionality is now covered by `VllmQwen3NextChatClient` or the base class):
- `VllmQwen2507ChatClient` (qwen3-235b-a22b-instruct-2507): removed
- `VllmQwen2507ReasoningChatClient` (qwen3-235b-a22b-thinking-2507): removed
- The corresponding tests `Qwen2507ChatTests.cs`, `Qwen2507ReasoningChatTests.cs`, and `Qwen3coderNextTests.cs` were removed as well
- The `VllmChatClientNuget.Test` test project was removed (no longer needed)
🧩 Base Class Refactoring & Adapter Enhancements
- `VllmBaseChatClient` enhanced: common logic (request building, stream parsing, reasoning-content handling) was extracted into the base class, so subclasses only override model-specific behavior.
- `VllmDeepseekR1ChatClient` refactored: inherits `VllmBaseChatClient` and keeps only the DeepSeek R1-specific `ReasoningContent` streaming logic.
- `VllmGptOssChatClient` refactored: inherits `VllmBaseChatClient`, trimming a large amount of duplicated code and improving reasoning streaming.
🛠️ Local Skill Auto-Loading
- Added skill auto-loading to `VllmChatOptions`: local skills are read from `./skills/*.md` in the working directory by default and injected into the system prompt.
- Controlled via `EnableSkills` (default `true`) and `SkillDirectoryPath`.
- The built-in tools `ListSkillFiles` and `ReadSkillFile` let the model query and read skill files on demand during a conversation.
- Added the `SimpleSkillSmokeTests` test class to verify the skill feature.
📝 Other Updates
- Added Qwen 3.5 support (`qwen3.5-397b-a17b`) via `VllmQwen3NextChatClient`.
- Added MiniMax-M2.5 support; `VllmMiniMaxChatClient` is compatible with M2.5 / M2.1.
- Added GLM 4.7 Flash support.
- Added GLM 4.6/4.7/5 thinking-chain support: `VllmGlmChatClient` streams segmented reasoning output (thinking/answer) and supports function calls.
- Added `GlmChatOptions`: the `ThinkingEnabled` switch controls whether the request body sends the `thinking: { type: "enabled" }` parameter required by the official Zhipu platform (off by default).
- Added `KimiChatOptions`: the `ThinkingEnabled` switch controls the `thinking: { type: "enabled" | "disabled" }` parameter required by Moonshot/Kimi 2.5.
- Fixed and improved `VllmKimiK2ChatClient` thinking-chain parsing.
- Added tag-extraction examples (based on JSON parsing and regex matching).
- Added Gemini 3 support (`VllmGemini3ChatClient`); see the docs/Gemini3* document series.
🔥 Latest Updates
🆕 Claude 4.6 / 4.5 Thinking Chain Support
- `VllmClaudeChatClient` added: specifically designed for Claude models served via platforms like OpenRouter.
- Thinking Parameter Adaptation: supports the new `reasoning: { effort: "high" }` format introduced in Claude 4.6.
- Reasoning Extraction: extracts reasoning content from both the `reasoning` (string) and `reasoning_details` (array) response fields.
- Token Optimization: includes default `MaxTokens` limits to prevent credit-related errors on cloud providers.
🆕 OpenAI GPT Series Support
- `VllmOpenAiGptClient` added: specifically designed for official OpenAI or OpenRouter GPT models.
- Reasoning Level Control: fine-tune reasoning depth via `OpenAiGptChatOptions.ReasoningLevel`.
- Reasoning Toggle: use `ExcludeReasoning` to include or omit the thinking process from the output (see the sketch below).
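A minimal sketch of the toggle; the endpoint, API key, and model id are placeholders, and the client/option names are the ones documented above:

using Microsoft.Extensions.AI;
IChatClient gpt = new VllmOpenAiGptClient(
    "https://openrouter.ai/api/v1",
    "your-api-key",
    "openai/gpt-5.2-codex");
// Reason internally at high effort, but omit the thinking text from the output.
var options = new OpenAiGptChatOptions
{
    ReasoningLevel = OpenAiGptReasoningLevel.High,
    ExcludeReasoning = true
};
var messages = new List<ChatMessage>
{
    new(ChatRole.User, "Summarize the builder pattern in one paragraph.")
};
var response = await gpt.GetResponseAsync(messages, options);
Console.WriteLine(response.Text); // answer only, no reasoning segment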
🆕 DeepSeek V3.2 Thinking Chain Support
`VllmDeepseekV3ChatClient` thinking chain fixed:
- Corrected request format: the DashScope API uses `enable_thinking: true` (top-level boolean) instead of `thinking: {type: "enabled"}`.
- The `reasoning_content` field in model responses is now correctly parsed and output.
- Non-streaming: access thinking via `ReasoningChatResponse.Reason`.
- Streaming: use `ReasoningChatResponseUpdate.Thinking` to distinguish thinking from the final answer.
- Enable via `VllmChatOptions.ThinkingEnabled = true`.
- Compatible with the DashScope `deepseek-v3.2` model.
🐛 Bug Fixes
VllmGptOssChatClientStreaming Function Call Bug Fixed:- Fixed an issue where the stream ended after model returned
tool_calls, leaving the final text response empty. - Added
GetStreamingResponseAsyncoverride: automatically detects when the caller has appended tool results tomessagesand initiates a follow-up streaming request seamlessly. StreamChatManualFunctionCallTestnow works in a singleawait foreachloop without needing manual "Second turn" logic.- Simplified the default system prompt by removing the strict "content must be empty when tool_calls present" constraint.
- Fixed an issue where the stream ended after model returned
🆕 GLM 4.6 / 4.7 / 5 Thinking Model Support
- `VllmGlmChatClient` added with full reasoning (thinking) stream separation.
- Supports `glm-5`, `glm-4.7`, `glm-4.7-flash`, `glm-4.6`, `glm-4.5`.
- Compatible with the existing tool/function invocation pipeline.
- Supports the official Zhipu platform thinking parameter via `GlmChatOptions.ThinkingEnabled`.
🆕 New GPT-OSS-20B/120B Support
- `VllmGptOssChatClient`: support for OpenAI's GPT-OSS-120B model with full reasoning capabilities
- Advanced reasoning chain processing with `ReasoningChatResponseUpdate`
- Compatible with OpenRouter and other GPT-OSS providers
- Enhanced debugging and performance optimizations
🆕 GLM-4 Support
- `VllmGlmZ1ChatClient`: support for GLM-4 models with reasoning capabilities
- `VllmGlm4ChatClient`: standard GLM-4 chat functionality (see the sketch below)
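A minimal sketch, assuming these two clients follow the same (endpoint, apiKey, modelId) constructor pattern as the rest of the package; the endpoint and model ids are placeholders:

using Microsoft.Extensions.AI;
// Reasoning-capable GLM-4 variant
IChatClient glmZ1 = new VllmGlmZ1ChatClient("http://localhost:8000/{0}/{1}", null, "glm-z1");
// Standard GLM-4 chat
IChatClient glm4 = new VllmGlm4ChatClient("http://localhost:8000/{0}/{1}", null, "glm-4");
var messages = new List<ChatMessage> { new(ChatRole.User, "用一句话介绍 GLM-4。") };
var reply = await glm4.GetResponseAsync(messages);
Console.WriteLine(reply.Text);
// The Z1 variant streams reasoning via ReasoningChatResponseUpdate, like the other thinking clients.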
🔄 Base Class Refactoring & Model Consolidation
- `VllmBaseChatClient` enhanced: common logic (request building, streaming parsing, reasoning content handling) extracted to the base class; subclasses only override specific differences.
- `VllmDeepseekR1ChatClient` refactored: inherits `VllmBaseChatClient`, retains only DeepSeek R1-specific `ReasoningContent` streaming logic.
- `VllmGptOssChatClient` refactored: inherits `VllmBaseChatClient`, significantly reduced duplicate code, enhanced reasoning streaming.
- Removed `VllmQwen2507ChatClient` and `VllmQwen2507ReasoningChatClient` (consolidated into `VllmQwen3NextChatClient`).
- Removed the `VllmChatClientNuget.Test` project.
🛠️ Local Skill Auto-Loading
- `VllmChatOptions` now supports automatic skill loading from `./skills/*.md` files, injected into system prompts.
- Controlled via `EnableSkills` (default `true`) / `SkillDirectoryPath`.
- Built-in tools `ListSkillFiles` and `ReadSkillFile` allow models to query and read skill files during conversation (see the sketch below).
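A minimal configuration sketch; `EnableSkills` and `SkillDirectoryPath` are the options documented above, while the client, endpoint, and directory are placeholders:

using Microsoft.Extensions.AI;
IChatClient client = new VllmQwen3ChatClient("http://localhost:8000/{0}/{1}", null, "qwen3");
var options = new VllmChatOptions
{
    EnableSkills = true,            // default; set false to opt out
    SkillDirectoryPath = "./skills" // default location, shown explicitly
};
var messages = new List<ChatMessage> { new(ChatRole.User, "列出你当前可用的 skill。") };
// Skill summaries are injected into the system prompt; the model may call the
// built-in ListSkillFiles / ReadSkillFile tools to inspect skill content.
var response = await client.GetResponseAsync(messages, options);
Console.WriteLine(response.Text);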
🆕 Qwen3-Next / Qwen 3.5 Multi-Model Adaptation
- `VllmQwen3NextChatClient` now supports multiple model families via `modelId`:
  - `qwen3.5-397b-a17b` (Qwen 3.5, latest)
  - `qwen3-next-80b-a3b-thinking` / `qwen3-next-80b-a3b-instruct`
  - `qwen3-vl-30b-a3b-thinking` / `qwen3-vl-30b-a3b-instruct` (multimodal, image input)
  - `qwen3-vl-32b-thinking` / `qwen3-vl-32b-instruct` (multimodal)
  - `qwen3-vl-235b-a22b-thinking` / `qwen3-vl-235b-a22b-instruct` (multimodal, manually verified)
- Unified API: switch models by passing the desired modelId to the constructor, or per request via `ChatOptions.ModelId` (see the sketch below).
- Thinking models expose `ReasoningChatResponse` / streaming `ReasoningChatResponseUpdate`; instruct models output standard responses.
- New examples: serial/parallel tool calls, manual tool orchestration in streaming, JSON-only output formatting.
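A short sketch of a per-request model switch via the standard `ChatOptions.ModelId`, combined with image input, assuming the Qwen3-VL variants accept the standard Microsoft.Extensions.AI content types (`TextContent` / `DataContent`); endpoint, key, and file name are placeholders:

using Microsoft.Extensions.AI;
IChatClient qwen = new VllmQwen3NextChatClient(
    "https://dashscope.aliyuncs.com/compatible-mode/v1/{1}",
    "your-dashscope-api-key",
    "qwen3-next-80b-a3b-instruct");
// Build a user message that carries both text and an image.
var imageBytes = await File.ReadAllBytesAsync("photo.jpg");
var messages = new List<ChatMessage>
{
    new(ChatRole.User,
    [
        new TextContent("描述这张图片的内容。"),
        new DataContent(imageBytes, "image/jpeg")
    ])
};
// Override the constructor modelId for this call only.
var response = await qwen.GetResponseAsync(messages, new ChatOptions { ModelId = "qwen3-vl-32b-instruct" });
Console.WriteLine(response.Text);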
🆕 Kimi K2 Support
- `VllmKimiK2ChatClient` added.
- Supports Kimi models including `kimi-k2-thinking` and `kimi-k2.5`.
- Seamless reasoning streaming via `ReasoningChatResponseUpdate` (thinking vs final-answer segments).
- Full function invocation support (automatic or manual tool-call handling).
🆕 Kimi 2.5 Thinking Toggle (Moonshot)
- New `KimiChatOptions.ThinkingEnabled` controls the request payload:
  - `ThinkingEnabled = true` → `thinking: { "type": "enabled" }`
  - `ThinkingEnabled = false` → `thinking: { "type": "disabled" }`
- Kimi reasoning text is taken from `reasoningContent` / streaming `delta.reasoning_content` (not `</think>` markers). A minimal usage sketch follows.
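A minimal sketch, assuming `VllmKimiK2ChatClient` follows the same constructor pattern as the other clients; the endpoint and key are placeholders:

using Microsoft.Extensions.AI;
IChatClient kimi = new VllmKimiK2ChatClient(
    "https://dashscope.aliyuncs.com/compatible-mode/v1/{1}",
    "your-api-key",
    "kimi-k2.5");
// Sends thinking: { "type": "enabled" } in the request payload.
var options = new KimiChatOptions { ThinkingEnabled = true };
var messages = new List<ChatMessage> { new(ChatRole.User, "介绍一下你自己。") };
await foreach (var update in kimi.GetStreamingResponseAsync(messages, options))
{
    if (update is ReasoningChatResponseUpdate r && r.Thinking)
        Console.Write(r.Text);      // reasoning phase
    else
        Console.Write(update.Text); // final answer
}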
🆕 Gemini 3 Support & Tool Calling
- `VllmGemini3ChatClient` added (Google Gemini API).
- Features: text & streaming, ReasoningLevel (Normal/Low), full tool calling (single / parallel / automatic / streaming).
- Tests: all `Gemini3Test` cases pass (including multi-turn and parallel tool calls); `GeminiDebugTest` covers native-API thought signatures and multi-turn function-call debugging.
- Docs: see the docs/Gemini3* document set; a minimal usage sketch follows.
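A minimal sketch; the constructor pattern is assumed to match the other clients, and the endpoint here is a placeholder (see docs/Gemini3* for authoritative usage):

using Microsoft.Extensions.AI;
IChatClient gemini = new VllmGemini3ChatClient(
    "https://generativelanguage.googleapis.com", // placeholder endpoint
    "your-google-api-key",
    "gemini-3-pro-preview");
var messages = new List<ChatMessage>
{
    new(ChatRole.User, "Explain the difference between concurrency and parallelism.")
};
// Gemini 3 carries reasoning as an encrypted thought signature, so no readable
// thinking text is streamed; just consume the answer text.
await foreach (var update in gemini.GetStreamingResponseAsync(messages))
{
    Console.Write(update.Text);
}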
🆕 MiniMax-M2.5 Support
- `VllmMiniMaxChatClient` added for MiniMax-M2.5 / M2.1 model support.
- Full streaming chat and function calling (parallel tool calls supported).
- Compatible with the DashScope API endpoint.
- Tests: `MiniMaxTests` covers chat, streaming, function calls (serial/parallel/manual), and JSON output; a minimal usage sketch follows.
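A minimal sketch, assuming the usual (endpoint, apiKey, modelId) constructor; the endpoint, key, and exact model id string are placeholders:

using Microsoft.Extensions.AI;
IChatClient minimax = new VllmMiniMaxChatClient(
    "https://dashscope.aliyuncs.com/compatible-mode/v1/{1}", // DashScope-compatible endpoint
    "your-api-key",
    "minimax-m2.5"); // model id placeholder
var messages = new List<ChatMessage> { new(ChatRole.User, "你好,请介绍一下你自己。") };
var response = await minimax.GetResponseAsync(messages);
Console.WriteLine(response.Text);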
🆕 Qwen 3.5 Support
- `VllmQwen3NextChatClient` now supports Qwen 3.5 (`qwen3.5-397b-a17b`) via the DashScope API.
- Full reasoning chain and function calling support.
- Use the same `VllmQwen3NextChatClient` with `modelId = "qwen3.5-397b-a17b"`, as in the sketch below.
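A minimal sketch; endpoint and key are placeholders, and the pattern mirrors the Qwen3-Next examples later in this document:

using Microsoft.Extensions.AI;
IChatClient qwen35 = new VllmQwen3NextChatClient(
    "https://dashscope.aliyuncs.com/compatible-mode/v1/{1}",
    "your-dashscope-api-key",
    "qwen3.5-397b-a17b");
var messages = new List<ChatMessage> { new(ChatRole.User, "用三句话介绍量子纠缠。") };
await foreach (var update in qwen35.GetStreamingResponseAsync(messages))
{
    if (update is ReasoningChatResponseUpdate r && r.Thinking)
        Console.Write(r.Text);      // thinking chain
    else
        Console.Write(update.Text); // final answer
}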
🏗️ Supported Clients
| Client | Deployment | Model Support | Reasoning | Function Calls |
|---|---|---|---|---|
| VllmOpenAiGptClient | OpenRouter/Cloud | OpenAI GPT series | ✅ Full | ✅ Stream |
| VllmClaudeChatClient | OpenRouter/Cloud | Claude 4.6 / 4.5 | ✅ Full | ✅ Stream |
| VllmGptOssChatClient | OpenRouter/Cloud | GPT-OSS-120B/20B | ✅ Full | ✅ Stream |
| VllmQwen3ChatClient | Local vLLM | Qwen3-32B/235B | ✅ Toggle | ✅ Stream |
| VllmQwen3NextChatClient | Cloud API (DashScope compatible) | Multiple modelIds (e.g. qwen3-next-80b-a3b-thinking / qwen3-next-80b-a3b-instruct) | ✅ (thinking model) | ✅ Stream |
| VllmQwen3NextChatClient | Cloud API (DashScope compatible) | qwen3-vl-30b-a3b-thinking / qwen3-vl-30b-a3b-instruct | ✅ (thinking model) | ✅ Stream |
| VllmQwen3NextChatClient | Cloud API (DashScope compatible) | qwen3-vl-32b-thinking / qwen3-vl-32b-instruct | ✅ (thinking model) | ✅ Stream |
| VllmQwen3NextChatClient | Cloud API (DashScope compatible) | qwen3-vl-235b-a22b-thinking / qwen3-vl-235b-a22b-instruct (manually verified) | ✅ (thinking model) | ✅ Stream |
| VllmQwqChatClient | Local vLLM | QwQ-32B | ✅ Full | ✅ Stream |
| VllmGemmaChatClient | Local vLLM | Gemma3-27B | ❌ | ✅ Stream |
| VllmGemini3ChatClient | Cloud API (Google Gemini) | gemini-3-pro-preview | Signature (hidden) | ✅ Stream |
| VllmDeepseekR1ChatClient | Cloud API | DeepSeek-R1 | ✅ Full | ❌ |
| VllmDeepseekV3ChatClient | Cloud API (DashScope) | DeepSeek-V3.2 | ✅ (via VllmChatOptions) | ✅ Stream |
| VllmGlmChatClient | Cloud API (Zhipu official) / OpenAI compatible | glm-5 / glm-4.6 / glm-4.7 / glm-4.7-flash / glm-4.5 | ✅ Full (via GlmChatOptions) | ✅ Stream |
| VllmKimiK2ChatClient | Cloud API (DashScope) | kimi-k2-(thinking/instruct) / kimi-k2.5 | ✅ (thinking model) | ✅ Stream |
| VllmMiniMaxChatClient | Cloud API (DashScope) | MiniMax-M2.5 / M2.1 | ✅ | ✅ Stream |
| VllmQwen3NextChatClient | Cloud API (DashScope compatible) | qwen3.5-397b-a17b | ✅ (thinking model) | ✅ Stream |
Note: Gemini 3 reasoning uses an encrypted thought signature and does not emit readable reasoning text; in current tests, multi-turn function calls complete without explicitly passing the signature back.
🐳 Docker Deployment Examples
Qwen3 vLLM Deployment:
docker run -it --gpus all -p 8000:8000 \
-v /models/Qwen3-32B-FP8:/models/Qwen3-32B-FP8 \
--restart always \
-e VLLM_USE_V1=1 \
vllm/vllm-openai:v0.8.5 \
--model /models/Qwen3-32B-FP8 \
--enable-auto-tool-choice \
--tool-call-parser hermes \
--trust-remote-code \
--max-model-len 131072 \
--tensor-parallel-size 2 \
--gpu_memory_utilization 0.8 \
--served-model-name "qwen3"
QwQ vLLM Deployment:
docker run -it --gpus all -p 8000:8000 \
-v /models/QwQ-32B:/models/QwQ-32B \
--restart always \
-e VLLM_USE_V1=1 \
vllm/vllm-openai:v0.8.5 \
--model /models/QwQ-32B \
--enable-auto-tool-choice \
--tool-call-parser llama3_json \
--trust-remote-code \
--max-model-len 131072 \
--tensor-parallel-size 2 \
--gpu_memory_utilization 0.8 \
--served-model-name "qwq"
Gemma3 vLLM Deployment:
docker run -it --gpus all -p 8000:8000 \
-v /models/gemma-3-27b-it-FP8-Dynamic:/models/gemma-3-27b-it-FP8-Dynamic \
-v /home/lc/work/gemma3.jinja:/home/lc/work/gemma3.jinja \
-e TZ=Asia/Shanghai \
-e VLLM_USE_V1=1 \
--restart always \
vllm/vllm-openai:v0.8.2 \
--model /models/gemma-3-27b-it-FP8-Dynamic \
--enable-auto-tool-choice \
--tool-call-parser pythonic \
--chat-template /home/lc/work/gemma3.jinja \
--trust-remote-code \
--max-model-len 128000 \
--tensor-parallel-size 2 \
--gpu_memory_utilization 0.8 \
--served-model-name "gemma3"
💻 Usage Examples
🆕 GLM 4.6/4.7/4.7-Flash Thinking Example
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.VllmChatClient.Glm4;
IChatClient glm46 = new VllmGlmChatClient(
"http://localhost:8000/{0}/{1}", // or your OpenAI-compatible endpoint
null,
"glm-4.6");
// Enable Zhipu official platform thinking chain parameter:
// thinking: { "type": "enabled" }
var opts = new GlmChatOptions { ThinkingEnabled = true };
var messages = new List<ChatMessage>
{
new(ChatRole.System, "你是一个智能助手,名字叫菲菲"),
new(ChatRole.User, "解释一下快速排序的思想并举一个简单例子。")
};
string reasoning = string.Empty;
string answer = string.Empty;
await foreach (var update in glm46.GetStreamingResponseAsync(messages, opts))
{
if (update is ReasoningChatResponseUpdate r)
{
if (r.Thinking)
reasoning += r.Text; // reasoning phase
else
answer += r.Text; // final answer phase
}
else
{
answer += update.Text;
}
}
Console.WriteLine($"Reasoning: {reasoning}\nAnswer: {answer}");
🆕 Claude 4.6 / 4.5 with Reasoning (OpenRouter)
using Microsoft.Extensions.AI;
// Initialize Claude client (OpenRouter)
IChatClient claude = new VllmClaudeChatClient(
"https://openrouter.ai/api/v1",
"your-api-key",
"anthropic/claude-4.6-sonnet");
var messages = new List<ChatMessage>
{
new(ChatRole.System, "你是一个拥有强大逻辑推理能力的智能助手。"),
new(ChatRole.User, "解释一下为什么天空是蓝色的?请详细思考。")
};
// Enable high-effort reasoning
var options = new VllmChatOptions { ThinkingEnabled = true };
// Non-streaming example:
var response = await claude.GetResponseAsync(messages, options);
if (response is ReasoningChatResponse r)
{
Console.WriteLine($"🧠 Thinking:\n{r.Reason}");
Console.WriteLine($"💬 Answer:\n{r.Text}");
}
// Streaming example:
await foreach (var update in claude.GetStreamingResponseAsync(messages, options))
{
if (update is ReasoningChatResponseUpdate ru)
{
if (ru.Thinking)
Console.Write(ru.Text); // Reasoning phase
else
Console.Write(ru.Text); // Answer phase
}
}
🆕 OpenAI GPT Series with Reasoning (OpenRouter)
using Microsoft.Extensions.AI;
// Initialize OpenAI GPT client (OpenRouter)
IChatClient gptClient = new VllmOpenAiGptClient(
"https://openrouter.ai/api/v1",
"your-api-key",
"openai/gpt-5.2-codex");
var messages = new List<ChatMessage>
{
new(ChatRole.System, "You are a coding expert."),
new(ChatRole.User, "Write a complex regex for email validation and explain it.")
};
// Set reasoning level and other options
var options = new OpenAiGptChatOptions
{
ReasoningLevel = OpenAiGptReasoningLevel.High,
Temperature = 0.5f
};
// Streaming with reasoning
await foreach (var update in gptClient.GetStreamingResponseAsync(messages, options))
{
if (update is ReasoningChatResponseUpdate r)
{
if (r.Thinking)
Console.Write(r.Text); // Reasoning phase
else
Console.Write(r.Text); // Answer phase
}
}
🆕 GPT-OSS-120B with Reasoning (OpenRouter)
using System.ComponentModel;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.VllmChatClient.GptOss;
[Description("Gets weather information")]
static string GetWeather(string city) => $"Weather in {city}: Sunny, 25°C";
// Initialize GPT-OSS client
IChatClient gptOssClient = new VllmGptOssChatClient(
"https://openrouter.ai/api/v1",
"your-api-token",
"openai/gpt-oss-120b");
var messages = new List<ChatMessage>
{
new ChatMessage(ChatRole.System, "You are a helpful assistant with reasoning capabilities."),
new ChatMessage(ChatRole.User, "What's the weather like in Tokyo? Please think through this step by step.")
};
var chatOptions = new ChatOptions
{
Temperature = 0.7f,
ReasoningLevel = GptOssReasoningLevel.Medium, // Set reasoning level; controls depth of reasoning
Tools = [AIFunctionFactory.Create(GetWeather)]
};
// Stream response with reasoning
string reasoning = string.Empty;
string answer = string.Empty;
await foreach (var update in gptOssClient.GetStreamingResponseAsync(messages, chatOptions))
{
if (update is ReasoningChatResponseUpdate reasoningUpdate)
{
if (reasoningUpdate.Thinking)
{
// Capture the model's reasoning process
reasoning += reasoningUpdate.Reasoning;
Console.WriteLine($"🧠 Thinking: {reasoningUpdate.Reasoning}");
}
else
{
// Capture the final answer
answer += reasoningUpdate.Text;
Console.WriteLine($"💬 Response: {reasoningUpdate.Text}");
}
}
}
Console.WriteLine($"\n📝 Full Reasoning: {reasoning}");
Console.WriteLine($"✅ Final Answer: {answer}");
🆕 Qwen3-Next 80B (Thinking vs Instruct)
using Microsoft.Extensions.AI;
// Choose model: reasoning variant or instruct variant
var apiKey = "your-dashscope-api-key";
// Reasoning (with thinking chain)
IChatClient thinkingClient = new VllmQwen3NextChatClient(
"https://dashscope.aliyuncs.com/compatible-mode/v1/{1}",
apiKey,
"qwen3-next-80b-a3b-thinking");
// Instruct (no reasoning chain)
IChatClient instructClient = new VllmQwen3NextChatClient(
"https://dashscope.aliyuncs.com/compatible-mode/v1/{1}",
apiKey,
"qwen3-next-80b-a3b-instruct");
var messages = new List<ChatMessage>
{
new(ChatRole.System, "你是一个智能助手,名字叫菲菲"),
new(ChatRole.User, "简单介绍下量子计算。")
};
// Reasoning streaming example
await foreach (var update in thinkingClient.GetStreamingResponseAsync(messages))
{
if (update is ReasoningChatResponseUpdate r)
{
if (r.Thinking)
Console.Write(r.Text); // reasoning / thinking phase
else
Console.Write(r.Text); // final answer phase
}
else
{
Console.Write(update.Text);
}
}
// Instruct (single response)
var resp = await instructClient.GetResponseAsync(messages);
Console.WriteLine(resp.Text);
🆕 Qwen3-Next Advanced Function Calls (Serial / Parallel / Manual Streaming)
using System.ComponentModel;
using Microsoft.Extensions.AI;
[Description("获取南宁的天气情况")]
static string GetWeather() => "现在正在下雨。";
[Description("Search")]
static string Search([Description("需要搜索的问题")] string question) => "南宁市青秀区方圆广场北面站前路1号。";
IChatClient baseClient = new VllmQwen3NextChatClient(
"https://dashscope.aliyuncs.com/compatible-mode/v1/{1}",
Environment.GetEnvironmentVariable("VLLM_ALIYUN_API_KEY"),
"qwen3-next-80b-a3b-thinking");
IChatClient client = new ChatClientBuilder(baseClient)
.UseFunctionInvocation()
.Build();
var messages = new List<ChatMessage>
{
new(ChatRole.System, "你是一个智能助手,名字叫菲菲,调用工具时仅能输出工具调用内容,不能输出其他文本。"),
new(ChatRole.User, "南宁火车站在哪里?我出门需要带伞吗?")
};
ChatOptions opts = new()
{
Tools = [AIFunctionFactory.Create(GetWeather), AIFunctionFactory.Create(Search)]
};
// Parallel tool calls example (also supports serial depending on prompt)
await foreach (var update in client.GetStreamingResponseAsync(messages, opts))
{
if (update is ReasoningChatResponseUpdate r)
{
Console.Write(r.Text);
}
else
{
Console.Write(update.Text);
}
}
// Manual streaming tool orchestration
messages = new()
{
new(ChatRole.System, "你是一个智能助手,名字叫菲菲"),
new(ChatRole.User, "南宁火车站在哪里?我出门需要带伞吗?")
};
string answer = string.Empty;
await foreach (var update in client.GetStreamingResponseAsync(messages, opts))
{
if (update.FinishReason == ChatFinishReason.ToolCalls)
{
foreach (var fc in update.Contents.OfType<FunctionCallContent>())
{
messages.Add(new ChatMessage(ChatRole.Assistant, [fc]));
if (fc.Name == "GetWeather")
{
messages.Add(new ChatMessage(ChatRole.Tool, [new FunctionResultContent(fc.CallId, GetWeather())]));
}
else if (fc.Name == "Search")
{
messages.Add(new ChatMessage(ChatRole.Tool, [new FunctionResultContent(fc.CallId, Search("南宁火车站"))]));
}
}
}
else
{
answer += update.Text;
}
}
Console.WriteLine(answer);
🆕 JSON-only Output (No Code Block)
using Microsoft.Extensions.AI;
var messages = new List<ChatMessage>
{
new(ChatRole.System, "你是一个智能助手,名字叫菲菲"),
new(ChatRole.User, "请输出json格式的问候语,不要使用 codeblock。")
};
var options = new ChatOptions { MaxOutputTokens = 100 };
var resp = await baseClient.GetResponseAsync(messages, options);
var text = resp.Text; // Ensure no ``` code blocks and extract JSON via regex if needed
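If the model still wraps its reply in a fenced block, a small helper can strip the fence before parsing. This regex-based extraction is illustrative only, not part of the package API:

using System.Text.Json;
using System.Text.RegularExpressions;
static string ExtractJson(string raw)
{
    // Strip a ```json ... ``` fence if the model added one anyway.
    var fence = Regex.Match(raw, "```(?:json)?\\s*(.*?)\\s*```", RegexOptions.Singleline);
    return fence.Success ? fence.Groups[1].Value : raw.Trim();
}
var json = ExtractJson(text);
using var doc = JsonDocument.Parse(json); // throws if the payload is not valid JSON
Console.WriteLine(doc.RootElement);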
Qwen3 with Reasoning Toggle
using System.ComponentModel;
using Microsoft.Extensions.AI;
[Description("Gets the weather")]
static string GetWeather() => Random.Shared.NextDouble() > 0.1 ? "It's sunny" : "It's raining";
IChatClient vllmclient = new VllmQwen3ChatClient("http://localhost:8000/{0}/{1}", null, "qwen3");
IChatClient client2 = new ChatClientBuilder(vllmclient)
.UseFunctionInvocation()
.Build();
var messages2 = new List<ChatMessage>
{
new ChatMessage(ChatRole.System, "你是一个智能助手,名字叫菲菲"),
new ChatMessage(ChatRole.User, "今天天气如何?")
};
Qwen3ChatOptions chatOptions = new()
{
Tools = [AIFunctionFactory.Create(GetWeather)],
NoThinking = true // Toggle reasoning on/off
};
string res = string.Empty;
await foreach (var update in client2.GetStreamingResponseAsync(messages2, chatOptions))
{
res += update.Text;
}
QwQ with Full Reasoning Support
using System.ComponentModel;
using Microsoft.Extensions.AI;
[Description("Gets the weather")]
static string GetWeather() => Random.Shared.NextDouble() > 0.5 ? "It's sunny" : "It's raining";
IChatClient vllmclient2 = new VllmQwqChatClient("http://localhost:8000/{0}/{1}", null, "qwq");
var messages3 = new List<ChatMessage>
{
new ChatMessage(ChatRole.System, "你是一个智能助手,名字叫菲菲"),
new ChatMessage(ChatRole.User, "今天天气如何?")
};
ChatOptions chatOptions2 = new()
{
Tools = [AIFunctionFactory.Create(GetWeather)]
};
// Stream with reasoning separation (local function; no access modifier in top-level code)
async Task<(string answer, string reasoning)> StreamChatResponseAsync(
    List<ChatMessage> messages, ChatOptions chatOptions)
{
string answer = string.Empty;
string reasoning = string.Empty;
await foreach (var update in vllmclient2.GetStreamingResponseAsync(messages, chatOptions))
{
if (update is ReasoningChatResponseUpdate reasoningUpdate)
{
if (!reasoningUpdate.Thinking)
{
answer += reasoningUpdate.Text;
}
else
{
reasoning += reasoningUpdate.Text;
}
}
else
{
answer += update.Text;
}
}
return (answer, reasoning);
}
var (answer3, reasoning3) = await StreamChatResponseAsync(messages3, chatOptions2);
DeepSeek-R1 with Reasoning
using Microsoft.Extensions.AI;
IChatClient client3 = new VllmDeepseekR1ChatClient(
"https://dashscope.aliyuncs.com/compatible-mode/v1/{1}",
"your-api-key",
"deepseek-r1");
var messages4 = new List<ChatMessage>
{
new ChatMessage(ChatRole.System, "你是一个智能助手,名字叫菲菲"),
new ChatMessage(ChatRole.User, "你是谁?")
};
string res4 = string.Empty;
string think = string.Empty;
await foreach (ReasoningChatResponseUpdate update in client3.GetStreamingResponseAsync(messages4))
{
if (update.Thinking)
{
think += update.Text;
}
else
{
res4 += update.Text;
}
}
🆕 DeepSeek-V3.2 with Thinking Chain
using Microsoft.Extensions.AI;
// Initialize DeepSeek V3.2 client (DashScope API)
IChatClient dsV3 = new VllmDeepseekV3ChatClient(
"https://dashscope.aliyuncs.com/compatible-mode/v1/{1}",
"your-api-key",
"deepseek-v3.2");
var messages = new List<ChatMessage>
{
new(ChatRole.System, "你是一个智能助手,名字叫菲菲"),
new(ChatRole.User, "请解释一下相对论。")
};
// Enable thinking chain via VllmChatOptions
var options = new VllmChatOptions { ThinkingEnabled = true };
// Non-streaming: access reasoning via ReasoningChatResponse.Reason
var response = await dsV3.GetResponseAsync(messages, options);
if (response is ReasoningChatResponse reasoningResponse)
{
Console.WriteLine($"🧠 Thinking: {reasoningResponse.Reason}");
Console.WriteLine($"💬 Answer: {reasoningResponse.Text}");
}
// Streaming: distinguish thinking vs answer phases
string thinking = string.Empty;
string answer = string.Empty;
await foreach (var update in dsV3.GetStreamingResponseAsync(messages, options))
{
if (update is ReasoningChatResponseUpdate r)
{
if (r.Thinking)
thinking += r.Text; // reasoning phase
else
answer += r.Text; // final answer phase
}
else
{
answer += update.Text;
}
}
Console.WriteLine($"🧠 Thinking: {thinking}");
Console.WriteLine($"💬 Answer: {answer}");
🔧 Advanced Features
Reasoning Chain Processing
All reasoning-capable clients support the ReasoningChatResponseUpdate interface:
await foreach (var update in client.GetStreamingResponseAsync(messages, options))
{
if (update is ReasoningChatResponseUpdate reasoningUpdate)
{
if (reasoningUpdate.Thinking)
{
// Process thinking/reasoning content
Console.WriteLine($"🤔 Reasoning: {reasoningUpdate.Reasoning}");
}
else
{
// Process final response
Console.WriteLine($"💬 Answer: {reasoningUpdate.Text}");
}
}
}
Function Calling with Streaming
All clients support real-time function calling:
using System.ComponentModel;
using Microsoft.Extensions.AI;
[Description("Search for location information")]
static string Search([Description("Search query")] string query)
{
return "Location found: Beijing, China";
}
ChatOptions options2 = new()
{
Tools = [AIFunctionFactory.Create(Search)],
Temperature = 0.7f
};
await foreach (var update in client.GetStreamingResponseAsync(messages, options2))
{
// Handle function calls and responses in real-time
foreach (var content in update.Contents)
{
if (content is FunctionCallContent functionCall)
{
Console.WriteLine($"🔧 Calling: {functionCall.Name}");
}
}
}
🏆 Performance & Optimizations
- Stream Processing: Efficient real-time response handling
- Memory Management: Optimized for long conversations
- Error Handling: Robust error recovery and debugging support
- JSON Parsing: High-performance serialization with System.Text.Json
- Connection Pooling: Shared HttpClient for optimal resource usage
📋 Requirements
- .NET 8.0 or higher
- Microsoft.Extensions.AI framework
- Newtonsoft.Json for JSON processing
- System.Text.Json for high-performance scenarios
🤝 Contributing
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
📄 License
This project is licensed under the MIT License. See the LICENSE file for details.
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies (net8.0):
- Microsoft.Extensions.AI (>= 9.7.1)
- Newtonsoft.Json (>= 13.0.3)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 1.8.8 | 0 | 2/20/2026 |
| 1.8.7 | 34 | 2/19/2026 |
| 1.8.6 | 39 | 2/18/2026 |
| 1.8.5 | 85 | 2/17/2026 |
| 1.8.1 | 83 | 2/12/2026 |
| 1.8.0 | 77 | 2/11/2026 |
| 1.7.8 | 89 | 2/11/2026 |
| 1.7.6 | 82 | 2/11/2026 |
| 1.7.5 | 77 | 2/10/2026 |
| 1.7.4 | 83 | 2/10/2026 |
| 1.7.3 | 82 | 2/10/2026 |
| 1.7.2 | 87 | 2/10/2026 |
| 1.7.1 | 88 | 2/9/2026 |
| 1.7.0 | 89 | 2/9/2026 |
| 1.6.9 | 78 | 2/6/2026 |
| 1.6.8 | 151 | 1/19/2026 |
| 1.6.6 | 744 | 12/2/2025 |
| 1.6.5 | 664 | 12/2/2025 |
| 1.6.4 | 671 | 12/2/2025 |
| 1.6.3 | 147 | 11/28/2025 |