drittich.SemanticSlicer 1.4.2

.NET Standard 2.1

SemanticSlicer

SemanticSlicer is a C# library for slicing text data into smaller pieces while attempting to break the text on meaningful boundaries.

GitHub: https://github.com/drittich/SemanticSlicer

Overview
Installation
Sample Usage
Chunk Order
Additional Metadata
Adding Headers to Chunks
License
Contact

Overview

This library accepts text and breaks it into smaller chunks, which is typically useful when creating embeddings for LLM applications.

Installation

The package name is drittich.SemanticSlicer. You can install it from NuGet via the command line:

dotnet add package drittich.SemanticSlicer

or from the Package Manager Console:

NuGet\Install-Package drittich.SemanticSlicer

Sample Usage

Simple text document:

// The default options use text separators, a max chunk size of 1,000 tokens, and
// cl100k_base encoding to count tokens.
var slicer = new Slicer();
var text = File.ReadAllText("MyDocument.txt");
var documentChunks = slicer.GetDocumentChunks(text);

Markdown document:

// Let's use Markdown separators and reduce the chunk size
var options = new SlicerOptions { MaxChunkTokenCount = 600, Separators = Separators.Markdown };
var slicer = new Slicer(options);
var text = File.ReadAllText("MyDocument.md");
var documentChunks = slicer.GetDocumentChunks(text);

HTML document:

var options = new SlicerOptions { Separators = Separators.Html };
var slicer = new Slicer(options);
var text = File.ReadAllText("MyDocument.html");
var documentChunks = slicer.GetDocumentChunks(text);

Removing HTML tags:

For any content, you can choose to remove HTML tags from the chunks to reduce the number of tokens. The inner text is preserved, and if there is a <title> tag, the title is prepended to the result:

// Let's remove the HTML tags as they just consume a lot of tokens without adding much value
var options = new SlicerOptions { Separators = Separators.Html, StripHtml = true };
var slicer = new Slicer(options);
var text = File.ReadAllText("MyDocument.html");
var documentChunks = slicer.GetDocumentChunks(text);

Custom separators:

You can pass in your own list of separators, e.g., to add support for other document types.

Chunk Order

Chunks are returned in the order they were found in the document, and each chunk contains an Index property you can use to put them back in order if necessary.
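
If chunks end up out of order, for example after being processed in parallel or read back from a store, sorting by Index restores document order. The following is a minimal sketch of that idea only. The Chunk class and the ChunkOrdering helper are hypothetical stand-ins that keep the sample self-contained; they are not part of the library. With the library itself, the same OrderBy call should apply to the objects returned by GetDocumentChunks, assuming that result is enumerable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a document chunk. Only the Index property
// comes from the description above; Content is an assumed name.
public sealed class Chunk
{
    public Chunk(int index, string content)
    {
        Index = index;
        Content = content;
    }

    public int Index { get; }
    public string Content { get; }
}

// Hypothetical helper, not part of the library.
public static class ChunkOrdering
{
    // Returns a new list with the chunks sorted by ascending Index,
    // which restores the order in which they appeared in the document.
    public static List<Chunk> Restore(IEnumerable<Chunk> chunks) =>
        chunks.OrderBy(c => c.Index).ToList();
}
```

For instance, calling ChunkOrdering.Restore on chunks with indexes 2, 0, 1 yields them in the order 0, 1, 2. OrderBy is a stable sort, so chunks that share an Index keep their relative order.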

Additional Metadata

You can pass any additional metadata you wish in as a dictionary, and it will be returned with each document chunk, so it's easy to persist. You might use the metadata to store the document id, title or last modified date.

var slicer = new Slicer();
var text = File.ReadAllText("MyDocument.txt");
var metadata = new Dictionary<string, object?>();
metadata["Id"] = 123;
metadata["FileName"] = "MyDocument.txt";
var documentChunks = slicer.GetDocumentChunks(text, metadata);
// All chunks returned will have a Metadata property with the data you passed in.

Adding Headers to Chunks

You can optionally pass a header to be included at the top of each chunk. Example use cases include adding the document title or tags to the chunk content to help maintain context.

var slicer = new Slicer();
var fileName = "MyDocument.txt";
var text = File.ReadAllText(fileName);
var header = $"FileName: {fileName}";
var documentChunks = slicer.GetDocumentChunks(text, null, header);

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

If you have any questions or feedback, please open an issue on this repository.

Product	Compatible and additional computed target framework versions.
.NET	net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.
.NET Core	netcoreapp3.0 was computed. netcoreapp3.1 was computed.
.NET Standard	netstandard2.1 is compatible.
MonoAndroid	monoandroid was computed.
MonoMac	monomac was computed.
MonoTouch	monotouch was computed.
Tizen	tizen60 was computed.
Xamarin.iOS	xamarinios was computed.
Xamarin.Mac	xamarinmac was computed.
Xamarin.TVOS	xamarintvos was computed.
Xamarin.WatchOS	xamarinwatchos was computed.


.NETStandard 2.1
- HtmlAgilityPack (>= 1.12.1)
- Tiktoken (>= 2.2.0)

NuGet packages (1)

Showing the top 1 NuGet packages that depend on drittich.SemanticSlicer:

Package	Downloads
TiDB.Vector Core C# SDK for TiDB Vector Search with OpenAI integration built-in. Features fluent builder API, vector upsert/search, RAG capabilities, and optional chunking. Supports both OpenAI and OpenAI-compatible endpoints.	395

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
1.4.2	2,752	6/19/2025
1.4.0	5,252	11/9/2024
1.3.4	138	11/9/2024
1.2.0	10,827	12/3/2023
1.1.0	161	12/2/2023
1.0.0	199	11/13/2023

- Performance improvements