SimpleSIMD 3.3.0
Major bug in subtraction operator.
See the version list below for details.
dotnet add package SimpleSIMD --version 3.3.0
NuGet\Install-Package SimpleSIMD -Version 3.3.0
<PackageReference Include="SimpleSIMD" Version="3.3.0" />
paket add SimpleSIMD --version 3.3.0
#r "nuget: SimpleSIMD, 3.3.0"
// Install SimpleSIMD as a Cake Addin
#addin nuget:?package=SimpleSIMD&version=3.3.0
// Install SimpleSIMD as a Cake Tool
#tool nuget:?package=SimpleSIMD&version=3.3.0
SimpleSIMD
What is SIMD?
Single Instruction, Multiple Data (SIMD) units refer to hardware components that perform the same operation on multiple data operands concurrently.
This concurrency happens on a single thread, using the full width of the processor register to perform several operations at once.
This approach can be combined with standard multithreading for large performance boosts in numeric computations.
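To make the idea concrete, here is a minimal sketch of a SIMD sum written directly against System.Numerics.Vector<T>. It is not SimpleSIMD's implementation; the class name SimdSumSketch is made up for illustration, and the number of lanes per vector depends on the hardware.

```csharp
using System;
using System.Numerics;

// Illustrative only: a hand-rolled SIMD sum, not the library's code.
public static class SimdSumSketch
{
    public static int Sum(int[] data)
    {
        int width = Vector<int>.Count;        // lanes per register, hardware dependent
        Vector<int> acc = Vector<int>.Zero;   // one running sum per lane
        int i = 0;

        // Vectorized part: each addition processes `width` elements at once.
        for (; i <= data.Length - width; i += width)
        {
            acc += new Vector<int>(data, i);
        }

        // Combine the per-lane partial sums into a single value.
        int total = 0;
        for (int lane = 0; lane < width; lane++)
        {
            total += acc[lane];
        }

        // Scalar tail: elements that did not fill a whole vector.
        for (; i < data.Length; i++)
        {
            total += data[i];
        }

        return total;
    }
}
```

Everything here runs on one thread; the speedup comes only from handling several elements per instruction.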
Goals And Purpose
- Provide a single API that unifies SIMD for all supported types
- Gain a performance boost for mathematical computations through a simple API
- Simplify SIMD usage and make it easy to integrate into existing solutions
- Help generalize several mathematical functions across supported types
- Perform fewer allocations than standard LINQ implementations
Available Functions
Comparison:
- Equal
- Greater
- GreaterOrEqual
- Less
- LessOrEqual
Elementwise:
- Negate
- Abs
- Add
- Divide
- Multiply
- Subtract
- And
- Or
- Xor
- Not
- Select
- Ternary (Conditional Select)
- Concat
- Sqrt
Reduction:
- Aggregate
- Sum
- Average
- Max
- Min
- Dot
General Purpose:
- All
- Any
- Contains
- IndexOf
- Fill
- Foreach
Auto-Generated Functions
For each of the Elementwise functions, an overload is auto-generated that doesn't accept a Span<T> result, and instead creates a T[] internally and returns the result in that array.
For each function that uses the Value Delegate pattern, an overload is auto-generated that accepts regular delegates.
Note that using this overload results in a performance loss. See the Value Delegates - Benchmark section for more info.
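The relationship between a span-based method and its array-returning overload can be sketched as follows. This is a hypothetical illustration of the shape only: the class OverloadSketch and both Add signatures are assumptions, not the library's generated code, and the loop is scalar for brevity.

```csharp
using System;

// Hypothetical illustration; names and signatures are not taken from SimpleSIMD.
public static class OverloadSketch
{
    // Hand-written form: the caller supplies the destination span.
    public static void Add(ReadOnlySpan<int> left, ReadOnlySpan<int> right, Span<int> result)
    {
        if (left.Length != right.Length || result.Length < left.Length)
        {
            throw new ArgumentException("Span lengths do not match.");
        }

        for (int i = 0; i < left.Length; i++)
        {
            result[i] = left[i] + right[i];
        }
    }

    // Shape of an array-returning overload: allocate a T[] and forward to the span version.
    public static int[] Add(ReadOnlySpan<int> left, ReadOnlySpan<int> right)
    {
        int[] result = new int[left.Length];
        Add(left, right, result);
        return result;
    }
}
```

The span version lets callers reuse a buffer and avoid allocation, while the array version trades one allocation for convenience.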
Performance Benefits
A simple benchmark demonstrating the performance gains of using SIMD.
The benchmarked method was a Sum over an int[].
Method | Length | Mean | Error | StdDev | Ratio |
---|---|---|---|---|---|
LINQ | 10 | 58.428 ns | 1.1658 ns | 1.4743 ns | 9.65 |
Naive | 10 | 6.138 ns | 0.1226 ns | 0.1087 ns | 1.00 |
SIMD | 10 | 5.739 ns | 0.1397 ns | 0.1372 ns | 0.93 |
LINQ | 100 | 475.290 ns | 9.3530 ns | 17.7951 ns | 7.36 |
Naive | 100 | 65.447 ns | 0.8545 ns | 0.7575 ns | 1.00 |
SIMD | 100 | 12.879 ns | 0.2039 ns | 0.1592 ns | 0.20 |
LINQ | 1000 | 4,620.020 ns | 80.4166 ns | 71.2872 ns | 7.47 |
Naive | 1000 | 617.992 ns | 7.6832 ns | 7.1869 ns | 1.00 |
SIMD | 1000 | 78.865 ns | 0.7991 ns | 0.6673 ns | 0.13 |
LINQ | 10000 | 43,103.800 ns | 700.6532 ns | 655.3915 ns | 6.99 |
Naive | 10000 | 6,164.725 ns | 51.9217 ns | 48.5676 ns | 1.00 |
SIMD | 10000 | 738.459 ns | 14.7266 ns | 32.3252 ns | 0.13 |
LINQ | 100000 | 393,739.178 ns | 755.6571 ns | 631.0079 ns | 6.73 |
Naive | 100000 | 58,510.310 ns | 58.0928 ns | 54.3400 ns | 1.00 |
SIMD | 100000 | 8,897.370 ns | 102.2559 ns | 95.6502 ns | 0.15 |
Value Delegates
This library extensively uses the value delegate pattern. This pattern is used as a replacement for delegates.
Calling functions using this pattern may feel unusual, since it requires creating structs to pass as arguments instead of delegates, but it is very beneficial performance-wise.
The performance difference makes using this pattern worthwhile in performance critical places.
Since the focus of this library is pure performance, we use this pattern wherever possible.
Wrap extension methods are included (SimpleSimd.Wrapper.Wrap(delegate)) to wrap regular delegates as Value Delegates.
Note that wrapping a regular delegate results in a performance hit - prefer using Value Delegates directly as shown below.
Usage:
using System;
using System.Numerics;
using SimpleSimd;

namespace MyProgram
{
    class Program
    {
        static void Main()
        {
            // Creating the data
            // Can be int[], Span<int>, ReadOnlySpan<int>
            // GetData() is a placeholder for your own data source
            int[] Data = GetData();

            // We need to create 2 structs which will serve as a replacement for delegates
            SimdOps<int>.Sum(Data, new VecSelector(), new Selector());
        }
    }

    // A struct which is used as the Vector<int> selector
    // It implements IFunc as required by the Sum() signature
    struct VecSelector : IFunc<Vector<int>, Vector<int>>
    {
        public Vector<int> Invoke(Vector<int> param) => param * 2;
    }

    // A struct which is used as the int selector
    // It implements IFunc as required by the Sum() signature
    struct Selector : IFunc<int, int>
    {
        public int Invoke(int param) => param * 2;
    }
}
Benchmark:
Both benchmarked methods have exactly the same code and both are accelerated using SIMD;
the only difference is the argument types.
// Delegate, baseline
public static T Sum(Span<T> span, Func<Vector<T>, Vector<T>> vSelector, Func<T, T> selector)

// ValueDelegate
public static T Sum<F1, F2>(in Span<T> span, F1 vSelector, F2 selector)
    where F1 : struct, IFunc<Vector<T>, Vector<T>>
    where F2 : struct, IFunc<T, T>
Method | Length | Mean | Error | StdDev | Ratio |
---|---|---|---|---|---|
Delegate | 10 | 10.697 ns | 0.0155 ns | 0.0145 ns | 1.00 |
ValueDelegate | 10 | 5.069 ns | 0.0206 ns | 0.0182 ns | 0.47 |
Delegate | 100 | 40.812 ns | 0.0977 ns | 0.0913 ns | 1.00 |
ValueDelegate | 100 | 11.732 ns | 0.0149 ns | 0.0139 ns | 0.29 |
Delegate | 1000 | 302.164 ns | 3.1291 ns | 2.6130 ns | 1.00 |
ValueDelegate | 1000 | 66.808 ns | 0.2692 ns | 0.2518 ns | 0.22 |
Delegate | 10000 | 2,884.803 ns | 8.9309 ns | 7.4577 ns | 1.00 |
ValueDelegate | 10000 | 585.193 ns | 0.8926 ns | 0.6969 ns | 0.20 |
Delegate | 100000 | 28,920.414 ns | 267.4154 ns | 250.1406 ns | 1.00 |
ValueDelegate | 100000 | 8,519.340 ns | 41.2833 ns | 38.6164 ns | 0.29 |
Delegate | 1000000 | 304,228.749 ns | 1,995.9951 ns | 1,769.3976 ns | 1.00 |
ValueDelegate | 1000000 | 85,619.207 ns | 316.5366 ns | 280.6015 ns | 0.28 |
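A rough sketch of why the struct-constrained signature can be faster: for value-type generic arguments the .NET runtime compiles a specialized copy of the method, so Invoke is a direct call that the JIT may inline, whereas a delegate call goes through an indirection. The code below is an assumption-laden illustration, not the library's source: IFunc is redeclared locally as a stand-in, the method is fixed to int rather than generic over T, and the names DoubleVec, DoubleScalar and ValueDelegateSketch are invented.

```csharp
using System;
using System.Numerics;

// Local stand-in for the library's IFunc interface (assumed shape).
public interface IFunc<TArg, TResult>
{
    TResult Invoke(TArg arg);
}

// Value delegates: small structs carrying the operation to apply.
public struct DoubleVec : IFunc<Vector<int>, Vector<int>>
{
    public Vector<int> Invoke(Vector<int> v) => v * 2;
}

public struct DoubleScalar : IFunc<int, int>
{
    public int Invoke(int x) => x * 2;
}

public static class ValueDelegateSketch
{
    // The struct constraints allow a specialized method body per (F1, F2) pair.
    public static int Sum<F1, F2>(ReadOnlySpan<int> span, F1 vSelector, F2 selector)
        where F1 : struct, IFunc<Vector<int>, Vector<int>>
        where F2 : struct, IFunc<int, int>
    {
        int width = Vector<int>.Count;
        Vector<int> acc = Vector<int>.Zero;
        int i = 0;

        // Vector path: apply the vector selector to whole registers.
        for (; i <= span.Length - width; i += width)
        {
            acc += vSelector.Invoke(new Vector<int>(span.Slice(i, width)));
        }

        // Combine lanes, then handle the scalar tail with the scalar selector.
        int total = 0;
        for (int lane = 0; lane < width; lane++)
        {
            total += acc[lane];
        }
        for (; i < span.Length; i++)
        {
            total += selector.Invoke(span[i]);
        }

        return total;
    }
}
```

Two selectors are needed because the vector path and the scalar tail operate on different types; both must express the same operation for the result to be consistent.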
Limitations
- Methods are evaluated eagerly, not lazily as with IEnumerable
- Old hardware might not support SIMD
- Supported collection types:
  - T[] where T : unmanaged
  - Span<T> where T : unmanaged
  - ReadOnlySpan<T> where T : unmanaged
- Supports only primitive numeric types as array elements. Supported types are:
  - byte, sbyte
  - short, ushort
  - int, uint
  - long, ulong
  - float
  - double
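Whether the current machine accelerates Vector<T> can be checked at runtime through System.Numerics. The snippet below is a small sketch using only standard .NET APIs; the class name SimdSupportSketch is made up, and it says nothing about how SimpleSIMD itself behaves on unsupported hardware.

```csharp
using System;
using System.Numerics;

public static class SimdSupportSketch
{
    // Reports whether Vector<T> operations map to hardware SIMD instructions,
    // and how many elements fit in one vector for two element types.
    public static string Describe()
    {
        return Vector.IsHardwareAccelerated
            ? $"SIMD accelerated: {Vector<int>.Count} int lanes, {Vector<double>.Count} double lanes per vector"
            : "No hardware acceleration: Vector<T> operations run in software";
    }
}
```

Calling Console.WriteLine(SimdSupportSketch.Describe()) at startup is a quick way to confirm what a deployment target supports.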
Contributing
All ideas and suggestions are welcome. Feel free to open an issue if you have an idea or a suggestion that might improve this project. If you encounter a bug or have a feature request, please open a relevant issue.
License
This project is licensed under the MIT license. For more info, see the License File.
Product | Versions Compatible and additional computed target framework versions. |
---|---|
.NET | net5.0 is compatible. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. |
- net5.0: No dependencies.
NuGet packages (6)
Showing the top 5 NuGet packages that depend on SimpleSIMD:
Package | Downloads |
---|---|
FaceAiSharp
FaceAiSharp allows you to work with face-related computer vision tasks easily. It currently provides face detection, face recognition, facial landmarks detection, and eye state detection functionalities. FaceAiSharp leverages publicly available pretrained ONNX models to deliver accurate and efficient results and offers a convenient way to integrate them into your .NET applications. Whether you need to find faces, recognize individuals, detect facial landmarks, or determine eye states, FaceAiSharp simplifies the process with its simple API. ONNXRuntime is used for model inference, enabling hardware acceleration were possible. All processing is done locally, with no reliance on cloud services. This package contains just FaceAiSharp's managed code and does not include any ONNX models. Take a look at FaceAiSharp.Bundle for a batteries-included package with everything you need to get started. |
|
FaceAiSharp.Bundle
FaceAiSharp allows you to work with face-related computer vision tasks easily. It currently provides face detection, face recognition, facial landmarks detection, and eye state detection functionalities. FaceAiSharp leverages publicly available pretrained ONNX models to deliver accurate and efficient results and offers a convenient way to integrate them into your .NET applications. Whether you need to find faces, recognize individuals, detect facial landmarks, or determine eye states, FaceAiSharp simplifies the process with its simple API. ONNXRuntime is used for model inference, enabling hardware acceleration were possible. All processing is done locally, with no reliance on cloud services. This is a bundle package that installs FaceAiSharp's managed code and multiple AI models in the ONNX format. |
|
STensor
SIMD-accelerated generic tensor library |
|
PlatonAiPhoto
FaceAiSharp allows you to work with face-related computer vision tasks easily. It currently provides face detection, face recognition, facial landmarks detection, and eye state detection functionalities. FaceAiSharp leverages publicly available pretrained ONNX models to deliver accurate and efficient results and offers a convenient way to integrate them into your .NET applications. Whether you need to find faces, recognize individuals, detect facial landmarks, or determine eye states, FaceAiSharp simplifies the process with its simple API. ONNXRuntime is used for model inference, enabling hardware acceleration were possible. All processing is done locally, with no reliance on cloud services. This package contains just FaceAiSharp's managed code and does not include any ONNX models. Take a look at FaceAiSharp.Bundle for a batteries-included package with everything you need to get started. |
|
PlatonAiPhoto.Bundle
FaceAiSharp allows you to work with face-related computer vision tasks easily. It currently provides face detection, face recognition, facial landmarks detection, and eye state detection functionalities. FaceAiSharp leverages publicly available pretrained ONNX models to deliver accurate and efficient results and offers a convenient way to integrate them into your .NET applications. Whether you need to find faces, recognize individuals, detect facial landmarks, or determine eye states, FaceAiSharp simplifies the process with its simple API. ONNXRuntime is used for model inference, enabling hardware acceleration were possible. All processing is done locally, with no reliance on cloud services. This is a bundle package that installs FaceAiSharp's managed code and multiple AI models in the ONNX format. |
GitHub repositories
This package is not used by any popular GitHub repositories.
Version | Downloads | Last updated | |
---|---|---|---|
4.6.0 | 1,162 | 11/20/2022 | |
4.2.0-alpha | 177 | 8/17/2021 | |
3.3.1 | 222,662 | 6/6/2022 | |
3.3.0 | 804 | 8/31/2021 | |
3.1.0 | 402 | 4/20/2021 | |
2.5.1 | 356 | 1/25/2021 | |
2.4.3 | 454 | 10/6/2020 | |
2.4.2 | 396 | 10/6/2020 | |
2.4.1-beta | 289 | 10/6/2020 | |
2.4.0-beta | 279 | 10/6/2020 | |
2.3.1 | 433 | 10/4/2020 | |
2.3.0 | 400 | 9/28/2020 | |
2.2.0 | 402 | 9/24/2020 | |
2.1.1 | 418 | 9/21/2020 | |
2.0.1 | 416 | 9/18/2020 | |
2.0.0 | 465 | 9/17/2020 | |
1.9.0 | 385 | 9/17/2020 | |
1.8.0 | 452 | 9/16/2020 | |
1.7.0 | 719 | 9/13/2020 | |
1.6.5 | 560 | 9/12/2020 | |
1.6.3 | 894 | 9/8/2020 | |
1.6.2 | 584 | 9/7/2020 | |
1.5.0 | 706 | 9/5/2020 | |
1.2.0 | 593 | 9/5/2020 | |
1.1.0 | 587 | 9/3/2020 | |
1.0.0 | 616 | 9/1/2020 |
Added source generator to generate regular delegate overloads, for any method accepting value delegates.
Removed wrappers as obsolete now - use generated methods instead.
Added source generator to generate functions that return T[] as result for any method accepting a result Span.