ManySpeech.AliParaformerAsr 1.1.9

ManySpeech.AliParaformerAsr User Guide

I. Introduction

ManySpeech.AliParaformerAsr is a speech recognition library written in C#. Under the hood, it calls Microsoft.ML.OnnxRuntime to decode ONNX models. Its notable features are:

  • Multi-environment Support: It targets net461+, net6.0+, netcoreapp3.1, and netstandard2.0+, so it fits a range of development scenarios.
  • Cross-platform Compilation: It can be compiled and used on Windows, macOS, Linux, and Android.
  • AOT Compilation Support: It supports ahead-of-time (AOT) compilation and is simple to use, so developers can integrate it into their projects quickly.

II. Installation Methods

Installing via the NuGet package manager is recommended. Either of the following two approaches works:

(A) Using Package Manager Console

Execute the following command in the "Package Manager Console" of Visual Studio:

Install-Package ManySpeech.AliParaformerAsr

(B) Using .NET CLI

Enter the following command in the command line to install:

dotnet add package ManySpeech.AliParaformerAsr

III. Configuration Instructions (Refer to the asr.yaml File)

Most parameters in the asr.yaml configuration file used for decoding do not need to be modified. One parameter is worth adjusting:

  • use_itn: true: When using the SenseVoiceSmall model configuration, enabling this parameter turns on inverse text normalization (ITN). ITN converts spoken-form text into written form, for example "one hundred and twenty-three" into "123", so the recognized text matches normal reading habits.

IV. Code Calling Methods

(A) Offline (Non-streaming) Model Calling

  1. Adding Project References: Add the following using directives to the code:
using ManySpeech.AliParaformerAsr;
using ManySpeech.AliParaformerAsr.Model;
  2. Model Initialization and Configuration
    • Initialization Method for the paraformer Model:
string applicationBase = AppDomain.CurrentDomain.BaseDirectory;
string modelName = "speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx";
string modelFilePath = applicationBase + "./" + modelName + "/model_quant.onnx";
string configFilePath = applicationBase + "./" + modelName + "/asr.yaml";
string mvnFilePath = applicationBase + "./" + modelName + "/am.mvn";
string tokensFilePath = applicationBase + "./" + modelName + "/tokens.txt";
OfflineRecognizer offlineRecognizer = new OfflineRecognizer(modelFilePath, configFilePath, mvnFilePath, tokensFilePath);
    • Initialization Method for the SeACo-paraformer Model:
      • First, find the hotword.txt file in the model directory and add custom hotwords, one Chinese word per line, such as industry-specific terms or personal names.
      • Then, pass the additional parameters in the code. For example:
string applicationBase = AppDomain.CurrentDomain.BaseDirectory;
string modelName = "paraformer-seaco-large-zh-timestamp-onnx-offline";
string modelFilePath = applicationBase + "./" + modelName + "/model.int8.onnx";
string modelebFilePath = applicationBase + "./" + modelName + "/model_eb.int8.onnx";
string configFilePath = applicationBase + "./" + modelName + "/asr.yaml";
string mvnFilePath = applicationBase + "./" + modelName + "/am.mvn";
string hotwordFilePath = applicationBase + "./" + modelName + "/hotword.txt";
string tokensFilePath = applicationBase + "./" + modelName + "/tokens.txt";
OfflineRecognizer offlineRecognizer = new OfflineRecognizer(modelFilePath: modelFilePath, configFilePath: configFilePath, mvnFilePath, tokensFilePath: tokensFilePath, modelebFilePath: modelebFilePath, hotwordFilePath: hotwordFilePath);
  3. Calling Process
List<float[]> samples = new List<float[]>();
// The code for converting the wav file into samples is omitted here. For details, refer to the example code in ManySpeech.AliParaformerAsr.Examples.
List<OfflineStream> streams = new List<OfflineStream>();
foreach (var sample in samples)
{
    // Create one stream per audio clip and feed it the clip's samples.
    OfflineStream stream = offlineRecognizer.CreateOfflineStream();
    stream.AddSamples(sample);
    streams.Add(stream);
}
// Decode all streams in one batch.
List<OfflineRecognizerResultEntity> results = offlineRecognizer.GetResults(streams);
  4. Example of Output Results
Welcome everyone to experience the speech recognition model launched by DAMO Academy.

It's very convenient, but now it's different. The UK has left the EU, and the EU has an internal industrial chain with good dividends.

He must be home now for the light is on. (He must be at home because the light is on.) It's like there's a kind of reasoning or explanation for that feeling.

elapsed_milliseconds: 1502.8828125
total_duration: 40525.6875
rtf: 0.037084696280599808
end!
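
The calling process above leaves out the step that turns a wav file into float samples. The following is a minimal sketch of one way to do it, not the approach used by ManySpeech.AliParaformerAsr.Examples. WavSampleReader and ReadPcm16Mono are hypothetical names that are not part of the library. The sketch assumes 16-bit PCM mono input, and the scaling of samples to the range [-1, 1) is also an assumption, so check the Examples project for the scaling and sample rate the recognizer actually expects.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of ManySpeech.AliParaformerAsr.
public static class WavSampleReader
{
    // Reads a 16-bit PCM mono WAV stream and returns its samples scaled to [-1, 1).
    // The scaling convention is an assumption; verify it against the Examples project.
    public static float[] ReadPcm16Mono(Stream input, out int sampleRate)
    {
        using (var reader = new BinaryReader(input, Encoding.ASCII, leaveOpen: true))
        {
            if (ReadTag(reader) != "RIFF")
                throw new InvalidDataException("Missing RIFF header.");
            reader.ReadInt32(); // overall RIFF size, not needed here
            if (ReadTag(reader) != "WAVE")
                throw new InvalidDataException("Missing WAVE tag.");

            sampleRate = 0;
            bool formatSeen = false;

            // Walk the chunk list until the "data" chunk is found.
            while (true)
            {
                string chunkId = ReadTag(reader);
                int chunkSize = reader.ReadInt32();

                if (chunkId == "fmt ")
                {
                    short audioFormat = reader.ReadInt16(); // 1 = uncompressed PCM
                    short channels = reader.ReadInt16();
                    sampleRate = reader.ReadInt32();
                    reader.ReadInt32(); // byte rate
                    reader.ReadInt16(); // block align
                    short bitsPerSample = reader.ReadInt16();
                    if (audioFormat != 1 || channels != 1 || bitsPerSample != 16)
                        throw new NotSupportedException("This sketch handles 16-bit PCM mono only.");
                    if (chunkSize > 16)
                        reader.ReadBytes(chunkSize - 16); // skip any fmt extension
                    formatSeen = true;
                }
                else if (chunkId == "data")
                {
                    if (!formatSeen)
                        throw new InvalidDataException("The data chunk appeared before the fmt chunk.");
                    byte[] raw = reader.ReadBytes(chunkSize);
                    float[] samples = new float[raw.Length / 2];
                    for (int i = 0; i < samples.Length; i++)
                    {
                        // WAV stores samples as little-endian signed 16-bit integers.
                        short value = (short)(raw[2 * i] | (raw[2 * i + 1] << 8));
                        samples[i] = value / 32768f;
                    }
                    return samples;
                }
                else
                {
                    // Skip unrelated chunks; chunks are padded to an even size.
                    reader.ReadBytes(chunkSize + (chunkSize & 1));
                }
            }
        }
    }

    private static string ReadTag(BinaryReader reader)
    {
        byte[] tag = reader.ReadBytes(4);
        if (tag.Length < 4)
            throw new EndOfStreamException("Unexpected end of WAV data.");
        return Encoding.ASCII.GetString(tag);
    }
}
```

With such a helper, each file could be opened with File.OpenRead, passed to ReadPcm16Mono, and the returned array added to the samples list before the streams are created.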

(B) Real-time (Streaming) Model Calling

  1. Adding Project References: Add the same using directives as in the offline case:
using ManySpeech.AliParaformerAsr;
using ManySpeech.AliParaformerAsr.Model;
  2. Model Initialization and Configuration
string applicationBase = AppDomain.CurrentDomain.BaseDirectory;
// Example value only: set modelName to the directory of the streaming model you downloaded (see the model list in section VII).
string modelName = "paraformer-large-zh-en-onnx-online";
string encoderFilePath = applicationBase + "./" + modelName + "/encoder.int8.onnx";
string decoderFilePath = applicationBase + "./" + modelName + "/decoder.int8.onnx";
string configFilePath = applicationBase + "./" + modelName + "/asr.yaml";
string mvnFilePath = applicationBase + "./" + modelName + "/am.mvn";
string tokensFilePath = applicationBase + "./" + modelName + "/tokens.txt";
OnlineRecognizer onlineRecognizer = new OnlineRecognizer(encoderFilePath, decoderFilePath, configFilePath, mvnFilePath, tokensFilePath);
  3. Calling Process
List<float[]> samples = new List<float[]>();
// The code for converting the wav file into samples is omitted here. The following is the sample code for batch processing:
List<OnlineStream> streams = new List<OnlineStream>();
foreach (var sample in samples)
{
    OnlineStream stream = onlineRecognizer.CreateOnlineStream();
    stream.AddSamples(sample);
    streams.Add(stream);
}
List<OnlineRecognizerResultEntity> results = onlineRecognizer.GetResults(streams);
// Example of single processing. Only one stream needs to be constructed.
// samples[0] stands for whichever clip you want to decode on its own.
OnlineStream singleStream = onlineRecognizer.CreateOnlineStream();
singleStream.AddSamples(samples[0]);
OnlineRecognizerResultEntity result = onlineRecognizer.GetResult(singleStream);
// Refer to the example code in ManySpeech.AliParaformerAsr.Examples for details.
  4. Example of Output Results

It is precisely because of the existence of absolute justice that I accept the relative justice in reality, but don't deny the absolute justice just because of the relative justice in reality.

elapsed_milliseconds: 1389.3125
total_duration: 13052
rtf: 0.10644441464909593
Hello, World!
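
Both sample outputs above report an rtf (real-time factor) value. The numbers are consistent with rtf being elapsed_milliseconds divided by total_duration, with both values in milliseconds. That relationship is inferred from the sample output rather than stated by the library. The sketch below shows the arithmetic. RtfCalculator and its methods are hypothetical names, not library APIs.

```csharp
using System;

// Hypothetical helper, not part of ManySpeech.AliParaformerAsr.
public static class RtfCalculator
{
    // Audio duration in milliseconds for a given sample count and sample rate.
    public static double DurationMilliseconds(long sampleCount, int sampleRate)
    {
        if (sampleRate <= 0)
            throw new ArgumentOutOfRangeException(nameof(sampleRate));
        return sampleCount * 1000.0 / sampleRate;
    }

    // Real-time factor: processing time divided by audio duration.
    // Values below 1 mean decoding ran faster than real time.
    public static double RealTimeFactor(double elapsedMilliseconds, double totalDurationMilliseconds)
    {
        if (totalDurationMilliseconds <= 0)
            throw new ArgumentOutOfRangeException(nameof(totalDurationMilliseconds));
        return elapsedMilliseconds / totalDurationMilliseconds;
    }
}
```

For the streaming output above, RtfCalculator.RealTimeFactor(1389.3125, 13052) gives approximately 0.1064, which matches the reported rtf. In practice the elapsed time could be taken from a System.Diagnostics.Stopwatch wrapped around the GetResults call.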

V. Related Projects

  • Voice Activity Detection: To split long audio into reasonable segments, add the ManySpeech.AliFsmnVad library. Install it with the following command:
dotnet add package ManySpeech.AliFsmnVad
  • Text Punctuation Prediction: To restore the punctuation missing from recognition results, add the ManySpeech.AliCTTransformerPunc library. Install it with the following command:
dotnet add package ManySpeech.AliCTTransformerPunc

For calling examples, see the official documentation of the corresponding libraries or the ManySpeech.AliParaformerAsr.Examples project. That project is a console/desktop example that demonstrates the basic speech recognition functions, such as offline transcription and real-time recognition.

VI. Other Notes

  • Test Cases: ManySpeech.AliParaformerAsr.Examples serves as the test case.
  • Test CPU: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz (2.59 GHz).
  • Supported Platforms:
    • Windows: Windows 7 SP1 and higher versions.
    • macOS: macOS 10.13 (High Sierra) and higher versions; iOS and related platforms are also supported.
    • Linux: Supported on Linux distributions that meet the required dependencies (see the list of Linux distributions supported by .NET 6 for details).
    • Android: Android 5.0 (API 21) and higher versions.

VII. Model Download (Supported ONNX Models)

The table below lists the ONNX models supported by ManySpeech.AliParaformerAsr, with each model's type, supported languages, punctuation and timestamp support, and download sources, so that you can choose the model that fits your requirements:

| Model Name | Type | Supported Languages | Punctuation | Timestamp | Download Address |
| --- | --- | --- | --- | --- | --- |
| paraformer-large-zh-en-onnx-offline | Non-streaming | Chinese, English | No | No | huggingface, modelscope |
| paraformer-large-zh-en-timestamp-onnx-offline | Non-streaming | Chinese, English | No | Yes | modelscope |
| paraformer-large-en-onnx-offline | Non-streaming | English | No | No | modelscope |
| paraformer-large-zh-en-onnx-online | Streaming | Chinese, English | No | No | modelscope |
| paraformer-large-zh-yue-en-timestamp-onnx-offline-dengcunqin-20240805 | Non-streaming | Chinese, Cantonese, English | No | Yes | modelscope |
| paraformer-large-zh-yue-en-onnx-offline-dengcunqin-20240805 | Non-streaming | Chinese, Cantonese, English | No | No | modelscope |
| paraformer-large-zh-yue-en-onnx-online-dengcunqin-20240208 | Streaming | Chinese, Cantonese, English | No | No | modelscope |
| paraformer-seaco-large-zh-timestamp-onnx-offline | Non-streaming | Chinese, Hotwords | No | Yes | modelscope |
| SenseVoiceSmall | Non-streaming | Chinese, Cantonese, English, Japanese, Korean | Yes | No | modelscope, modelscope-split-embed |
| sensevoice-small-wenetspeech-yue-int8-onnx | Non-streaming | Cantonese, Chinese, English, Japanese, Korean | Yes | No | modelscope |

VIII. Model Introduction

(A) Model Usage

Paraformer is an efficient non-autoregressive end-to-end speech recognition framework proposed by the speech team of DAMO Academy. The Paraformer Chinese general-purpose speech recognition model in this project is trained on tens of thousands of hours of industrial-grade labeled audio, which gives it good general recognition performance. It can be applied in scenarios such as speech input methods, speech navigation, and intelligent meeting minutes, with relatively high recognition accuracy.

(B) Model Structure

The Paraformer model structure consists of five main parts: Encoder, Predictor, Sampler, Decoder, and Loss function. You can view its structural diagram here. Each part works as follows:

  • Encoder: It can adopt different network structures, such as self-attention, conformer, or SAN-M, and is responsible for extracting acoustic features from the audio.
  • Predictor: It is a two-layer FFN (Feed Forward Neural Network). It predicts the number of target words and extracts the acoustic vectors corresponding to those words.
  • Sampler: It is a module without learnable parameters. It generates semantic feature vectors from the input acoustic vectors and target vectors.
  • Decoder: Its structure is similar to that of an autoregressive model, but it models context bidirectionally, whereas an autoregressive model is unidirectional. The bidirectional structure lets it model context better and improves recognition accuracy.
  • Loss function: Besides the discriminative optimization objectives Cross Entropy (CE) and Minimum Word Error Rate (MWER), it includes Mean Absolute Error (MAE) as the optimization objective for the Predictor.

(C) Main Highlights

  • Predictor Module: Based on the Continuous integrate-and-fire (CIF) predictor, it extracts the acoustic feature vectors corresponding to the target words. In this way, it can predict the number of target words in the speech more accurately and improve the accuracy of speech recognition.
  • Sampler: Through the sampling operation, it transforms the acoustic feature vectors and target word vectors into semantic feature vectors. Then, in cooperation with the bidirectional Decoder, it can significantly enhance the model's ability to understand and model the context, making the recognition results more in line with semantic logic.
  • MWER Training Criterion Based on Negative Sample Sampling: This training criterion helps the model optimize parameters better during the training process, reduces recognition errors, and improves the overall recognition performance.
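
The Predictor highlight above relies on Continuous integrate-and-fire (CIF). As a rough illustration of the idea, the sketch below accumulates a per-frame weight and emits ("fires") one integrated acoustic vector each time the accumulated weight reaches a threshold. It is a simplified sketch of the general CIF mechanism, not the FunASR or ManySpeech implementation. CifSketch and Integrate are hypothetical names, the threshold of 1.0 is an assumption, and the sketch assumes every weight lies between 0 and the threshold so that a frame fires at most once.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of CIF; not taken from FunASR or ManySpeech.
public static class CifSketch
{
    // frames: one acoustic feature vector per encoder frame.
    // alphas: one weight per frame, assumed to lie in [0, threshold].
    // Returns one integrated vector per fired token.
    public static List<float[]> Integrate(float[][] frames, float[] alphas, float threshold = 1.0f)
    {
        if (frames.Length != alphas.Length)
            throw new ArgumentException("frames and alphas must have the same length.");

        var fired = new List<float[]>();
        if (frames.Length == 0)
            return fired;

        int dim = frames[0].Length;
        float accumulated = 0f;
        float[] integrated = new float[dim];

        for (int t = 0; t < frames.Length; t++)
        {
            float alpha = alphas[t];
            if (accumulated + alpha < threshold)
            {
                // Not enough weight yet: keep integrating this frame.
                accumulated += alpha;
                AddScaled(integrated, frames[t], alpha);
            }
            else
            {
                // Use just enough of this frame's weight to reach the threshold, then fire.
                float used = threshold - accumulated;
                AddScaled(integrated, frames[t], used);
                fired.Add(integrated);

                // The leftover weight starts the next token's accumulation.
                float remainder = alpha - used;
                integrated = new float[dim];
                AddScaled(integrated, frames[t], remainder);
                accumulated = remainder;
            }
        }

        // Any weight left below the threshold at the end is dropped in this sketch.
        return fired;
    }

    private static void AddScaled(float[] target, float[] source, float scale)
    {
        for (int i = 0; i < target.Length; i++)
            target[i] += source[i] * scale;
    }
}
```

In this scheme the number of fired vectors tracks the sum of the weights, which is how a CIF-style predictor can estimate the number of target words before decoding.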

(D) More Detailed Information

Reference [1] https://github.com/alibaba-damo-academy/FunASR

Version Downloads Last Updated
1.1.9 77 10/13/2025
1.1.8 187 9/23/2025
1.1.7 226 8/26/2025
1.1.6 94 8/23/2025
1.1.5 94 8/23/2025
1.1.4 162 8/19/2025
1.1.3 161 8/19/2025
1.1.2 175 8/18/2025
1.1.1 116 8/15/2025
1.1.0 118 8/15/2025
1.0.9 151 8/15/2025
1.0.8 162 8/11/2025
1.0.7 163 8/11/2025
1.0.6 239 8/6/2025
1.0.5 240 8/6/2025
1.0.4 326 6/10/2025