SharpAI 1.0.14
- .NET CLI: `dotnet add package SharpAI --version 1.0.14`
- Package Manager: `NuGet\Install-Package SharpAI -Version 1.0.14`
- PackageReference: `<PackageReference Include="SharpAI" Version="1.0.14" />`
- Central package management: `<PackageVersion Include="SharpAI" Version="1.0.14" />` in Directory.Packages.props, with `<PackageReference Include="SharpAI" />` in the project file
- Paket CLI: `paket add SharpAI --version 1.0.14`
- F# Interactive / Polyglot Notebooks: `#r "nuget: SharpAI, 1.0.14"`
- File-based apps: `#:package SharpAI@1.0.14`
- Cake addin: `#addin nuget:?package=SharpAI&version=1.0.14`
- Cake tool: `#tool nuget:?package=SharpAI&version=1.0.14`
<div align="center"> <img src="https://github.com/jchristn/sharpai/blob/main/assets/logo.png" width="256" height="256"> </div>
SharpAI
Transform your .NET applications into AI powerhouses - embed models directly or deploy as an Ollama-compatible and OpenAI-compatible API server. No cloud dependencies, no limits, just local embeddings and inference.
<p align="center"> <img src="https://img.shields.io/badge/.NET-5C2D91?style=for-the-badge&logo=.net&logoColor=white" /> <img src="https://img.shields.io/badge/C%23-239120?style=for-the-badge&logo=c-sharp&logoColor=white" /> <img src="https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge" /> </p>
<p align="center"> <a href="https://www.nuget.org/packages/SharpAI/"> <img src="https://img.shields.io/nuget/v/SharpAI.svg?style=flat" alt="NuGet Version"> </a> <a href="https://www.nuget.org/packages/SharpAI"> <img src="https://img.shields.io/nuget/dt/SharpAI.svg" alt="NuGet Downloads"> </a> </p>
<p align="center"> <strong>A .NET library for local AI model inference with Ollama-compatible and OpenAI-compatible REST APIs</strong> </p>
<p align="center"> Embeddings • Completions • Chat • Built on LlamaSharp • GGUF Models Only </p>
🚀 Features
- Ollama- and OpenAI-Compatible REST API Server - Provides endpoints compatible with the Ollama and OpenAI APIs
- Model Management - Download and manage GGUF models from HuggingFace using Ollama APIs
- Multiple Inference Types:
- Text embeddings generation
- Text completions
- Chat completions
- Prompt Engineering Tools - Built-in helpers for formatting prompts for different model types
- GPU Acceleration - Automatic CUDA detection when available
- Streaming Support - Real-time token streaming for completions
- SQLite Model Registry - Tracks model metadata and file information
📋 Table of Contents
- Installation
- Core Components
- Model Management
- Generating Embeddings
- Text Completions
- Chat Completions
- Prompt Formatting
- API Server
- Requirements
- Model Information
- Configuration
- Running in Docker
- Version History
- License
- Acknowledgments
📦 Installation
Install SharpAI via NuGet:
```bash
dotnet add package SharpAI
```
Or via Package Manager Console:
```powershell
Install-Package SharpAI
```
📖 Core Components
AIDriver
The main entry point that provides access to all functionality:
```csharp
using SharpAI;
using SyslogLogging;

// Initialize the AI driver
var ai = new AIDriver(
    logging: new LoggingModule(),
    databaseFilename: "./sharpai.db",
    huggingFaceApiKey: "hf_xxxxxxxxxxxx",
    modelDirectory: "./models/");

// Download a model from HuggingFace (GGUF format)
await ai.Models.Add(
    name: "microsoft/phi-2",
    quantizationPriority: null,
    progressCallback: (url, bytesDownloaded, percentComplete) =>
    {
        Console.WriteLine($"Progress: {percentComplete:P0}");
    });

// Generate a completion
string response = await ai.Completion.GenerateCompletion(
    model: "microsoft/phi-2",
    prompt: "Once upon a time",
    maxTokens: 512,
    temperature: 0.7f);
```
The AIDriver provides access to APIs via:
- `ai.Models` - Model management operations
- `ai.Embeddings` - Embedding generation
- `ai.Completion` - Text completion generation
- `ai.Chat` - Chat completion generation
ModelDriver
Manages model downloads and lifecycle:
```csharp
// List all downloaded models
List<ModelFile> models = ai.Models.All();

// Get a specific model
ModelFile model = ai.Models.GetByName("microsoft/phi-2");

// Download a new model from HuggingFace
ModelFile downloaded = await ai.Models.Add(
    name: "meta-llama/Llama-2-7b-chat-hf",
    quantizationPriority: null,
    progressCallback: null);

// Delete a model
ai.Models.Delete("microsoft/phi-2");

// Get the filesystem path for a model
string modelPath = ai.Models.GetFilename("microsoft/phi-2");
```
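A common pattern is to check the registry before downloading, so repeated startups don't re-pull a model. A minimal sketch using only the calls above, assuming `GetByName` returns `null` when the model is not present:

```csharp
// Minimal sketch: download a model only if it is not already registered.
// Assumes ai.Models.GetByName returns null for unknown models.
ModelFile phi2 = ai.Models.GetByName("microsoft/phi-2");
if (phi2 == null)
{
    phi2 = await ai.Models.Add(
        name: "microsoft/phi-2",
        quantizationPriority: null,
        progressCallback: null);
}

string path = ai.Models.GetFilename("microsoft/phi-2");
Console.WriteLine($"Model ready at: {path}");
```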
🗄️ Model Management
SharpAI automatically handles downloading GGUF files from HuggingFace; only GGUF-format models are supported. When a model is added, SharpAI:
- Queries the available GGUF files for the model
- Selects an appropriate quantization based on file naming conventions
- Downloads and stores the model with its metadata
- Tracks model information in the local SQLite model registry
Model metadata includes:
- Model name and GUID
- File size and hashes (MD5, SHA1, SHA256)
- Quantization type
- Source URL
- Creation timestamps
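If you need this metadata programmatically, it is carried on the `ModelFile` objects returned by `ai.Models`. The property names in this sketch are illustrative guesses, not confirmed members of `ModelFile`; check the class for the actual names:

```csharp
// Illustrative sketch only: the property names used below (Name,
// ContentLength, SHA256Hash, SourceUrl) are assumed, not confirmed API.
foreach (ModelFile m in ai.Models.All())
{
    Console.WriteLine($"{m.Name}: {m.ContentLength} bytes, SHA256 {m.SHA256Hash}, from {m.SourceUrl}");
}
```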
🔢 Generating Embeddings
Generate vector embeddings for text:
```csharp
// Single text embedding
float[] embedding = await ai.Embeddings.Generate(
    model: "microsoft/phi-2",
    input: "This is a sample text");

// Multiple text embeddings
string[] texts = { "First text", "Second text", "Third text" };
float[][] embeddings = await ai.Embeddings.Generate(
    model: "microsoft/phi-2",
    inputs: texts);
```
📝 Text Completions
Note: For best results, structure your prompt in a manner appropriate for the model you are using; see the Prompt Formatting section below.
Generate text continuations:
```csharp
// Non-streaming completion
string completion = await ai.Completion.GenerateCompletion(
    model: "microsoft/phi-2",
    prompt: "The meaning of life is",
    maxTokens: 512,
    temperature: 0.7f);

// Streaming completion
await foreach (string token in ai.Completion.GenerateCompletionStreaming(
    model: "microsoft/phi-2",
    prompt: "Write a poem about",
    maxTokens: 512,
    temperature: 0.8f))
{
    Console.Write(token);
}
```
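When streaming, you often still want the complete text once generation finishes. A minimal sketch that echoes tokens as they arrive while accumulating the full completion, using only the streaming API shown above:

```csharp
using System.Text;

// Echo tokens to the console while accumulating the full completion.
var sb = new StringBuilder();
await foreach (string token in ai.Completion.GenerateCompletionStreaming(
    model: "microsoft/phi-2",
    prompt: "Write a poem about",
    maxTokens: 512,
    temperature: 0.8f))
{
    Console.Write(token);
    sb.Append(token);
}
string fullCompletion = sb.ToString();
```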
💬 Chat Completions
Note: For best results, structure your prompt in a manner appropriate for the model you are using; see the Prompt Formatting section below.
Generate conversational responses:
```csharp
// Non-streaming chat
string response = await ai.Chat.GenerateCompletion(
    model: "microsoft/phi-2",
    prompt: chatFormattedPrompt, // prompt should be formatted for chat
    maxTokens: 512,
    temperature: 0.7f);

// Streaming chat
await foreach (string token in ai.Chat.GenerateCompletionStreaming(
    model: "microsoft/phi-2",
    prompt: chatFormattedPrompt,
    maxTokens: 512,
    temperature: 0.7f))
{
    Console.Write(token);
}
```
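Putting the pieces together: build a chat-formatted prompt with the prompt builders described in the next section, then pass it to the chat API. A minimal sketch using the Phi format, which the format list below pairs with Phi-2:

```csharp
using SharpAI.Prompts;

// Build a Phi-formatted chat prompt, then generate a response with it.
var messages = new List<ChatMessage>
{
    new ChatMessage { Role = "system", Content = "You are a helpful assistant." },
    new ChatMessage { Role = "user", Content = "What is the capital of France?" }
};

string chatFormattedPrompt = PromptBuilder.Build(ChatFormat.Phi, messages);

string response = await ai.Chat.GenerateCompletion(
    model: "microsoft/phi-2",
    prompt: chatFormattedPrompt,
    maxTokens: 512,
    temperature: 0.7f);
```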
🛠️ Prompt Formatting
SharpAI includes prompt builders to format conversations for different model types:
Chat Message Formatting
```csharp
using SharpAI.Prompts;

var messages = new List<ChatMessage>
{
    new ChatMessage { Role = "system", Content = "You are a helpful assistant." },
    new ChatMessage { Role = "user", Content = "What is the capital of France?" },
    new ChatMessage { Role = "assistant", Content = "The capital of France is Paris." },
    new ChatMessage { Role = "user", Content = "What is its population?" }
};

// Format for different model types
string chatMLPrompt = PromptBuilder.Build(ChatFormat.ChatML, messages);
/* Output:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>
<|im_start|>user
What is its population?<|im_end|>
<|im_start|>assistant
*/

string llama2Prompt = PromptBuilder.Build(ChatFormat.Llama2, messages);
/* Output:
<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>
What is the capital of France? [/INST] The capital of France is Paris. </s><s>[INST] What is its population? [/INST]
*/

string simplePrompt = PromptBuilder.Build(ChatFormat.Simple, messages);
/* Output:
system: You are a helpful assistant.
user: What is the capital of France?
assistant: The capital of France is Paris.
user: What is its population?
assistant:
*/
```
Supported chat formats:
- `Simple` - Basic role: content format (generic models, base models)
- `ChatML` - OpenAI ChatML format (GPT models and models fine-tuned with ChatML, including Qwen)
- `Llama2` - Llama 2 instruction format (Llama-2-Chat models)
- `Llama3` - Llama 3 format (Llama-3-Instruct models)
- `Alpaca` - Alpaca instruction format (Alpaca, Vicuna, WizardLM, and many Llama-based fine-tunes)
- `Mistral` - Mistral instruction format (Mistral-Instruct, Mixtral-Instruct models)
- `HumanAssistant` - Human/Assistant format (Anthropic Claude-style training, some chat models)
- `Zephyr` - Zephyr model format (Zephyr alpha/beta models)
- `Phi` - Microsoft Phi format (Phi-2, Phi-3 models)
- `DeepSeek` - DeepSeek format (DeepSeek-Coder, DeepSeek-LLM models)
If you are unsure which format your model supports, choose `Simple`.
Text Generation Formatting
````csharp
using SharpAI.Prompts;

// Simple instruction
string instructionPrompt = TextPromptBuilder.Build(
    TextGenerationFormat.Instruction,
    "Write a haiku about programming");
/* Output:
### Instruction:
Write a haiku about programming
### Response:
*/

// Code generation with context
var context = new Dictionary<string, string>
{
    ["language"] = "python",
    ["requirements"] = "Include error handling"
};

string codePrompt = TextPromptBuilder.Build(
    TextGenerationFormat.CodeGeneration,
    "Write a function to parse JSON",
    context);
/* Output:
Language: python
Task: Write a function to parse JSON
Requirements: Include error handling
```python
*/

// Question-answer format
string qaPrompt = TextPromptBuilder.Build(
    TextGenerationFormat.QuestionAnswer,
    "What causes rain?");
/* Output:
Question: What causes rain?
Answer:
*/

// Few-shot examples
var examples = new List<(string input, string output)>
{
    ("2+2", "4"),
    ("5*3", "15")
};

string fewShotPrompt = TextPromptBuilder.BuildWithExamples(
    TextGenerationFormat.QuestionAnswer,
    "7-3",
    examples);
/* Output:
Examples:
Question: 2+2
Answer:
4
---
Question: 5*3
Answer:
15
---
Now complete the following:
Question: 7-3
Answer:
*/
````
Supported text generation formats:
- `Raw` - No formatting
- `Completion` - Continuation format
- `Instruction` - Instruction/response format
- `QuestionAnswer` - Q&A format
- `CreativeWriting` - Story/creative format
- `CodeGeneration` - Code generation format
- `Academic` - Academic writing format
- `ListGeneration` - List creation format
- `TemplateFilling` - Template completion
- `Dialogue` - Dialogue generation
🌐 API Server
SharpAI includes a fully functional REST API server through the SharpAI.Server project, which provides Ollama-compatible and OpenAI-compatible endpoints. The server behaves like Ollama (with minor gaps), allowing you to use existing Ollama clients and integrations with SharpAI.
Ollama API endpoints include:
- `/api/generate` - Text generation
- `/api/chat` - Chat completions
- `/api/embed` - Generate embeddings
- `/api/tags` - List available models
- `/api/pull` - Download models from HuggingFace
OpenAI API endpoints include:
- `/v1/embeddings` - Generate embeddings
- `/v1/completions` - Text generation
- `/v1/chat/completions` - Chat completions
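As a quick smoke test of the OpenAI-compatible endpoints, the sketch below posts a chat completion request with `HttpClient`. It assumes the server is listening on `http://localhost:8000` (the Docker default described below) and that the named model has already been pulled; adjust both for your setup:

```csharp
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Minimal sketch: call the OpenAI-compatible chat completions endpoint.
// Assumes SharpAI.Server is running on localhost:8000 and the model below
// has already been downloaded.
using HttpClient client = new HttpClient { BaseAddress = new Uri("http://localhost:8000") };

var request = new
{
    model = "QuantFactory/Qwen2.5-3B-GGUF",
    messages = new[] { new { role = "user", content = "Why is the sky blue?" } },
    stream = false
};

using HttpResponseMessage response = await client.PostAsync(
    "/v1/chat/completions",
    new StringContent(JsonSerializer.Serialize(request), Encoding.UTF8, "application/json"));

Console.WriteLine(await response.Content.ReadAsStringAsync());
```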
⚙️ Requirements
- .NET 8.0 or higher
- Windows, Linux, or macOS
- HuggingFace API key (for downloading models)
- (Optional) GPU for acceleration (see GPU Support section)
Tested Platforms
SharpAI has been tested on:
- Windows 11
- macOS Sequoia
- Ubuntu 24.04
📊 Model Information
When models are downloaded, the following information is tracked:
- Model name and unique GUID
- File size
- MD5, SHA1, and SHA256 hashes
- Quantization type (e.g., Q4_K_M, Q5_K_S)
- Source URL from HuggingFace
- Download and creation timestamps
🔧 Configuration
Directory Structure
Models are stored in the specified `modelDirectory` with files named by their GUID. Model metadata is stored in the SQLite database specified by `databaseFilename`.
GPU Support
The library automatically detects CUDA availability and optimizes layer allocation; the `LlamaSharpEngine` determines the optimal number of GPU layers based on the available hardware.
SharpAI supports multiple GPU backends through LlamaSharp and llama.cpp:
- NVIDIA GPUs - via CUDA
- AMD GPUs - via ROCm (Linux) or Vulkan
- Apple Silicon - via Metal (M1, M2, M3, etc.)
- Intel GPUs - via SYCL or Vulkan
Note: The actual GPU support depends on the LlamaSharp build and backend availability on your system. CUDA support is most mature, while other backends may require specific LlamaSharp builds or additional setup.
🐳 Running in Docker
SharpAI.Server is available as a Docker image, providing an easy way to deploy the Ollama-compatible API server without local installation.
Quick Start
Using Docker Run
For Windows:

```bat
run.bat v1.0.0
```

For Linux/macOS:

```bash
./run.sh v1.0.0
```
Using Docker Compose
For Windows:

```bat
compose-up.bat
```

For Linux/macOS:

```bash
./compose-up.sh
```
Prerequisites
Before running the Docker container, ensure you have:
- Configuration file: Create a `sharpai.json` configuration file in your working directory
- Directory structure: The container expects the following directories to exist:
  - `./logs/` - For application logs
  - `./models/` - For storing downloaded GGUF models
Docker Image
The official Docker image is available at: jchristn/sharpai. Refer to the docker directory for assets useful for running in Docker and Docker Compose.
Volume Mappings
The container uses several volume mappings for persistence:
| Host Path | Container Path | Description |
|---|---|---|
| `./sharpai.json` | `/app/sharpai.json` | Configuration file |
| `./sharpai.db` | `/app/sharpai.db` | SQLite database for model registry |
| `./logs/` | `/app/logs/` | Application logs |
| `./models/` | `/app/models/` | Downloaded GGUF model files |
Configuration
Modify the `sharpai.json` file to supply your configuration.
Networking
The container exposes port 8000 by default.
You can access Ollama APIs at:
- `http://localhost:8000/api/tags` - List available models
- `http://localhost:8000/api/pull` - Pull a model
- `http://localhost:8000/api/generate` - Generate text
- `http://localhost:8000/api/chat` - Chat completions
- `http://localhost:8000/api/embed` - Generate embeddings
You can access OpenAI APIs at:
- `http://localhost:8000/v1/embeddings` - Generate embeddings
- `http://localhost:8000/v1/completions` - Generate text
- `http://localhost:8000/v1/chat/completions` - Chat completions
Example Usage
1. Create the required directory structure:

   ```bash
   mkdir logs models
   ```

2. Create your `sharpai.json` configuration file.

3. Run the container:

   ```bash
   # Windows
   run.bat v1.0.0

   # Linux/macOS
   ./run.sh v1.0.0
   ```

4. Download a model using the API (GGUF format required):

   ```bash
   curl http://localhost:8000/api/pull \
     -d '{"model":"QuantFactory/Qwen2.5-3B-GGUF"}'
   ```

5. Generate text:

   ```bash
   curl http://localhost:8000/api/generate \
     -d '{
       "model": "QuantFactory/Qwen2.5-3B-GGUF",
       "prompt": "Why is the sky blue?",
       "stream": false
     }'
   ```
Docker Compose
For production deployments, you can use Docker Compose. Create a compose.yaml file:
```yaml
version: '3.8'

services:
  sharpai:
    image: jchristn/sharpai:v1.0.0
    ports:
      - "8000:8000"
    volumes:
      - ./sharpai.json:/app/sharpai.json
      - ./sharpai.db:/app/sharpai.db
      - ./logs:/app/logs
      - ./models:/app/models
    environment:
      - TERM=xterm-256color
    restart: unless-stopped
```
Then run:
```bash
docker compose up -d
```
GPU Support in Docker
To enable GPU acceleration in Docker:
NVIDIA GPUs
Install the NVIDIA Container Toolkit and modify your run command:
```bash
docker run --gpus all \
  -p 8000:8000 \
  -v ./sharpai.json:/app/sharpai.json \
  -v ./sharpai.db:/app/sharpai.db \
  -v ./logs:/app/logs \
  -v ./models:/app/models \
  jchristn/sharpai:v1.0.0
```
For Docker Compose, add:
```yaml
services:
  sharpai:
    # ... other configuration ...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
Troubleshooting
- Container exits immediately: Check that `sharpai.json` exists and is valid JSON
- Models not persisting: Ensure the `./models/` directory has proper write permissions
- Cannot connect to API: Verify that port 8000 is not already in use and that firewall rules allow access
- Out of memory errors: Large models may require significant RAM; consider using quantized models or adjusting Docker memory limits
📚 Version History
Please see the CHANGELOG.md file for detailed version history and release notes.
Have a bug, feature request, or idea? Please file an issue on our GitHub repository. We welcome community input on our roadmap!
📄 License
This project is licensed under the MIT License.
🙏 Acknowledgments
- Built on LlamaSharp for GGUF model inference
- Model hosting by HuggingFace
- Inspired by (and forever grateful to) Ollama for API compatibility
- Special thanks to the community of developers that helped build, test, and refine SharpAI
| Product | Compatible and additional computed target frameworks |
|---|---|
| .NET | net8.0 is compatible. net9.0 and net10.0 were computed, along with the android, browser, ios, maccatalyst, macos, tvos, and windows targets for net8.0, net9.0, and net10.0. |
Dependencies (net8.0):

- Inputty (>= 1.0.13)
- LLamaSharp (>= 0.25.0)
- LLamaSharp.Backend.Cpu (>= 0.25.0)
- LLamaSharp.Backend.Cuda12 (>= 0.25.0)
- RestWrapper (>= 3.1.8)
- SyslogLogging (>= 2.0.11)
- System.Text.Json (>= 9.0.9)
- WatsonORM.Sqlite (>= 3.0.14)
NuGet packages (1)

Showing the top 1 NuGet packages that depend on SharpAI:

| Package | Description |
|---|---|
| SharpAI.Sdk | C# SDK for SharpAI - Local AI inference with Ollama and OpenAI compatible APIs |
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 1.0.14 | 229 | 10/10/2025 |
| 1.0.12 | 264 | 8/29/2025 |
| 1.0.11 | 235 | 8/28/2025 |
| 1.0.10 | 239 | 8/27/2025 |
| 1.0.9 | 179 | 8/20/2025 |
| 1.0.8 | 260 | 8/8/2025 |
| 1.0.7 | 106 | 8/1/2025 |
| 1.0.6 | 153 | 7/31/2025 |
| 1.0.5 | 161 | 7/31/2025 |
| 1.0.4 | 203 | 7/30/2025 |
| 1.0.3 | 197 | 7/27/2025 |
| 1.0.2 | 404 | 7/25/2025 |
| 1.0.1 | 500 | 7/25/2025 |
| 1.0.0 (initial release) | 114 | 7/12/2025 |