Mostlylucid.LucidRAG.Storage.Core 2.7.5

.NET 10.0

Mostlylucid.Storage.Core

NuGet: Mostlylucid.LucidRAG.Storage.Core

Unified vector storage library for LucidRAG with support for InMemory, DuckDB, and Qdrant backends.

Overview

This library provides a single, unified interface (IVectorStore) for embedding storage and retrieval across all LucidRAG pipelines:

DocumentPipeline - PDF, DOCX, Markdown text embeddings
ImagePipeline - Image OCR, CLIP visual embeddings
DataPipeline - Data profile embeddings for semantic search

Supported Backends

Backend	Persistence	Use Case	Default For
InMemory	❌ Ephemeral	Tool/MCP mode, one-shot analysis	MCP server, CLI tools
DuckDB	✅ File-based	Standalone mode, development	Standalone apps
Qdrant	✅ Server-based	Production, distributed systems	Production deployments

Quick Start

Installation

dotnet add package Mostlylucid.LucidRAG.Storage.Core

Tool/MCP Mode (No Persistence)

For one-shot analysis where you don't need to persist embeddings:

services.AddVectorStoreForToolMode();

Standalone Mode (DuckDB Persistence)

For standalone apps with embedded persistence:

services.AddVectorStoreForStandaloneMode(dataDirectory: "./data");

Production Mode (Qdrant)

For production deployments with a dedicated vector database:

services.AddVectorStoreForProductionMode(qdrantHost: "localhost", qdrantPort: 6334);

Custom Configuration

services.AddVectorStore(options =>
{
    options.Backend = VectorStoreBackend.DuckDB;
    options.PersistVectors = true;
    options.ReuseExistingEmbeddings = true;

    options.DuckDB.DatabasePath = "./vectors.duckdb";
    options.DuckDB.EnableVSS = true;
    options.DuckDB.VectorDimension = 384;
});

Usage

Initialize Collection

var vectorStore = serviceProvider.GetRequiredService<IVectorStore>();

var schema = new VectorStoreSchema
{
    VectorDimension = 384,
    DistanceMetric = VectorDistance.Cosine,
    StoreText = true  // false for privacy-preserving mode
};

await vectorStore.InitializeAsync("documents", schema);

Insert Documents

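var text = "Segment text to embed";  // the text this embedding was computed from

var documents = new[]
{
    new VectorDocument
    {
        Id = "doc1:segment0",
        Embedding = new float[384], // placeholder; use your real 384-dimensional embedding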
        ParentId = "doc1",
        ContentHash = ContentHasher.ComputeHash(text),
        Text = text,
        Metadata = new Dictionary<string, object>
        {
            ["type"] = "text",
            ["language"] = "en",
            ["confidence"] = 0.95
        }
    }
};

await vectorStore.UpsertDocumentsAsync("documents", documents);

Search

var query = new VectorSearchQuery
{
    QueryEmbedding = queryVector,
    TopK = 10,
    MinScore = 0.7,
    IncludeDocument = false,  // privacy-preserving - only return IDs
    Filters = new Dictionary<string, object>
    {
        ["language"] = "en"
    }
};

var results = await vectorStore.SearchAsync("documents", query);

foreach (var result in results)
{
    Console.WriteLine($"{result.Id}: {result.Score:F3}");
    Console.WriteLine($"  Metadata: {string.Join(", ", result.Metadata)}");
}
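
To make TopK and MinScore concrete, here is a minimal, self-contained sketch of a brute-force cosine search over an in-memory dictionary. It does not use this library; CosineSimilarity and BruteForceSearch are hypothetical helper names, and the real backends may score, filter, and rank differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two equal-length vectors; returns 0 when either has zero norm.
static double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same dimension.");

    double dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return normA == 0 || normB == 0 ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Brute-force search: score every vector, drop scores below minScore, keep the best topK.
static List<(string Id, double Score)> BruteForceSearch(
    IReadOnlyDictionary<string, float[]> vectors, float[] queryVector, int topK, double minScore)
{
    return vectors
        .Select(kv => (Id: kv.Key, Score: CosineSimilarity(kv.Value, queryVector)))
        .Where(hit => hit.Score >= minScore)
        .OrderByDescending(hit => hit.Score)
        .Take(topK)
        .ToList();
}

// Toy 2-dimensional corpus (illustrative data only).
var corpus = new Dictionary<string, float[]>
{
    ["doc1:segment0"] = new[] { 1f, 0f },
    ["doc1:segment1"] = new[] { 0.6f, 0.8f },
    ["doc2:segment0"] = new[] { 0f, 1f },
};

var hits = BruteForceSearch(corpus, new[] { 1f, 0f }, topK: 10, minScore: 0.5);
foreach (var (id, score) in hits)
    Console.WriteLine($"{id}: {score:F3}");
```

With this data, doc2:segment0 is orthogonal to the query and falls below the 0.5 threshold, so only two hits come back.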

Content Hash-Based Caching

// Get documents by content hash (for deduplication)
var hashes = new[] { "abc123", "def456" };
var cached = await vectorStore.GetDocumentsByHashAsync("documents", hashes);

if (cached.TryGetValue("abc123", out var doc))
{
    Console.WriteLine("Reusing existing embedding!");
}
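
The idea behind hash-based reuse can be sketched without the library: hash the segment text, and only compute an embedding when that hash has not been seen. This is a sketch under stated assumptions. The algorithm behind ContentHasher.ComputeHash is not described here, so the SHA-256-over-trimmed-text function below is a hypothetical stand-in, as are Embed and GetOrEmbed.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-in for ContentHasher.ComputeHash: SHA-256 over trimmed UTF-8 text.
static string ComputeHash(string text)
{
    var bytes = SHA256.HashData(Encoding.UTF8.GetBytes(text.Trim()));
    return Convert.ToHexString(bytes).ToLowerInvariant();
}

// Stand-in for a store keyed by content hash.
var embeddingCache = new Dictionary<string, float[]>();
var embedCalls = 0;

// Hypothetical embedder; a real one would call a model.
float[] Embed(string text)
{
    embedCalls++;
    return new float[384];
}

// Return the cached embedding when the content is unchanged, otherwise embed and cache it.
float[] GetOrEmbed(string text)
{
    var hash = ComputeHash(text);
    if (embeddingCache.TryGetValue(hash, out var cached))
        return cached;

    var embedding = Embed(text);
    embeddingCache[hash] = embedding;
    return embedding;
}

GetOrEmbed("The quick brown fox.");
GetOrEmbed("The quick brown fox.  "); // same content after trimming: cache hit
GetOrEmbed("A different segment.");

Console.WriteLine($"Embedding calls: {embedCalls}"); // Embedding calls: 2
```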

Incremental Updates

// Remove doc1's segments whose content hash is no longer in validHashes; keep the rest
var validHashes = new[] { "hash1", "hash2", "hash3" };
await vectorStore.RemoveStaleDocumentsAsync("documents", parentId: "doc1", validHashes);
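
The call above presumably behaves like a filtered delete scoped to one parent document. The sketch below illustrates that assumed behaviour on a plain list; RemoveStale is a hypothetical helper, and the actual backends may implement the removal differently (for example as a SQL DELETE or a Qdrant filter).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Each stored segment: Id, ParentId, ContentHash (mirrors the VectorDocument fields used earlier).
var stored = new List<(string Id, string ParentId, string ContentHash)>
{
    ("doc1:segment0", "doc1", "hash1"),
    ("doc1:segment1", "doc1", "old-hash"),
    ("doc2:segment0", "doc2", "old-hash"),
};

// Assumed semantics: drop segments of the given parent whose hash is no longer valid;
// segments that belong to other parents are left alone.
static int RemoveStale(
    List<(string Id, string ParentId, string ContentHash)> segments,
    string parentId,
    IEnumerable<string> validHashes)
{
    var valid = new HashSet<string>(validHashes);
    return segments.RemoveAll(s => s.ParentId == parentId && !valid.Contains(s.ContentHash));
}

var removed = RemoveStale(stored, "doc1", new[] { "hash1", "hash2", "hash3" });
Console.WriteLine($"Removed {removed}; remaining: {string.Join(", ", stored.Select(s => s.Id))}");
// Removed 1; remaining: doc1:segment0, doc2:segment0
```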

Architecture

IMultiVectorStore : IVectorStore (unified interface)
├── InMemoryVectorStore     (ephemeral, fastest)
├── DuckDBVectorStore       (persistent, embedded, HNSW indexes)
└── QdrantVectorStore       (persistent, server-based, production)

Used by:
├── DocSummarizer.Core      (text embeddings)
├── ImageSummarizer.Core    (OCR + CLIP embeddings)
├── AudioSummarizer.Core    (voice embeddings via EmbeddingStorageWave)
├── DataSummarizer.Core     (data profile embeddings)
└── LucidRAG                (all of the above)

Configuration Reference

appsettings.json

{
  "VectorStore": {
    "Backend": "DuckDB",
    "CollectionName": "documents",
    "PersistVectors": true,
    "ReuseExistingEmbeddings": true,

    "DuckDB": {
      "DatabasePath": "./data/vectors.duckdb",
      "EnableVSS": true,
      "EnablePersistence": true,
      "VectorDimension": 384,
      "HNSW": {
        "M": 16,
        "EfConstruction": 200,
        "EfSearch": 100
      }
    },

    "Qdrant": {
      "Host": "localhost",
      "Port": 6334,
      "ApiKey": null,
      "VectorSize": 384,
      "UseHttps": false
    },

    "InMemory": {
      "MaxDocuments": 0,
      "Verbose": false
    }
  }
}

Backend Comparison

InMemory

Pros:

Fastest (no disk I/O)
No external dependencies
Perfect for testing

Cons:

Data lost on restart
Limited by RAM
No HNSW acceleration

Best for: MCP tools, one-shot CLI analysis, unit tests

DuckDB

Pros:

File-based persistence
No external server needed
HNSW indexes (with VSS extension)
Graceful fallback if VSS unavailable

Cons:

Single-process (no distributed deployment)
HNSW persistence experimental

Best for: Standalone apps, development, embedded scenarios

Qdrant

Pros:

Production-grade
Distributed/multi-node
Advanced filtering
Multi-vector support

Cons:

Requires external server
More complex deployment

Best for: Production, multi-tenant, high-scale

Performance

DuckDB HNSW vs Brute-Force

Documents	HNSW (VSS)	Brute-Force	Speedup
1,000	2ms	15ms	7.5×
10,000	5ms	150ms	30×
100,000	10ms	1,500ms	150×

Startup Time (Reindex on Restart)

Backend	1,000 docs	10,000 docs	Persistence
InMemory	~5s	~50s	❌
DuckDB	~2s	~5s	✅
Qdrant	~1s	~2s	✅

Privacy-Preserving Mode

Set StoreText = false in the schema to avoid storing plaintext:

var schema = new VectorStoreSchema
{
    VectorDimension = 384,
    StoreText = false  // Only store embeddings, not text
};

Search results then return only IDs and scores, not text content.

Multi-Vector Support

IMultiVectorStore extends IVectorStore with named vector support for multi-modal embeddings. All three backends (InMemory, DuckDB, Qdrant) implement it.

public interface IMultiVectorStore : IVectorStore
{
    Task InitializeMultiVectorAsync(
        string collectionName,
        VectorStoreSchema primarySchema,
        IEnumerable<NamedVectorConfig> namedVectors,
        CancellationToken ct = default);

    Task UpsertMultiVectorDocumentsAsync(
        string collectionName,
        IEnumerable<MultiVectorDocument> documents,
        CancellationToken ct = default);

    Task<List<VectorSearchResult>> SearchByNamedVectorAsync(
        string collectionName,
        string vectorName,
        VectorSearchQuery query,
        CancellationToken ct = default);
}

Usage

var store = serviceProvider.GetRequiredService<IMultiVectorStore>();

// Initialize with primary + named vectors
await store.InitializeMultiVectorAsync("images", primarySchema, new[]
{
    new NamedVectorConfig { Name = "visual", Dimension = 512 },
    new NamedVectorConfig { Name = "color", Dimension = 128 },
});

// Upsert documents with named vectors
var doc = new MultiVectorDocument
{
    Id = "img1", Embedding = primaryEmbedding,
    NamedVectors = { ["visual"] = clipVector, ["color"] = colorVector }
};
await store.UpsertMultiVectorDocumentsAsync("images", [doc]);

// Search by named vector
var results = await store.SearchByNamedVectorAsync("images", "visual", query);

Backend Implementation Details

Backend	Named Vector Strategy
InMemory	Stored in MultiVectorDocument.NamedVectors dictionary, cosine search in-memory
DuckDB	Side table {collection}_named_vectors with (document_id, vector_name) PK, cosine search in-memory
Qdrant	Native VectorParamsMap with per-vector HNSW indexes, native search via vectorName parameter

Enables separate embeddings for:

Text (OCR from image)
Visual (CLIP embedding)
Color (color histogram)
Motion (optical flow)
Voice (ECAPA-TDNN speaker embedding)

Implementation Status

Phase 1: Foundation ✅ COMPLETE

Create project structure
Define IVectorStore interface
Define VectorDocument, VectorSearchQuery, VectorSearchResult models
Configuration classes with mode-specific factories
DI extension methods

Phase 2: DuckDB Implementation ✅ COMPLETE

Port DuckDBVectorStore from DataSummarizer
VSS extension integration with graceful fallback
In-memory cosine similarity fallback when VSS unavailable
HNSW index configuration and creation
Content hash-based caching for deduplication
Summary caching support
Schema migration for dimension changes
All IVectorStore methods implemented

Key Features:

✅ VSS extension detection and loading
✅ Experimental persistence: SET hnsw_enable_experimental_persistence = true
✅ Dynamic vector dimensions (384 default, configurable)
✅ Vector literal syntax: [val1,val2,...]::FLOAT[dim]
✅ HNSW index with configurable M and ef_construction
✅ Dual search modes: VSS array_distance() or in-memory cosine
✅ Metadata filtering support
✅ Privacy-preserving mode (StoreText = false)
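
The vector literal syntax listed above can be produced from a float array in a few lines. This is a minimal sketch; ToVectorLiteral is a hypothetical helper, not a method of this library. Formatting with the invariant culture matters, because a culture that uses a decimal comma would otherwise corrupt the literal.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Formats an embedding as a DuckDB fixed-size array literal: [v1,v2,...]::FLOAT[dim].
static string ToVectorLiteral(float[] embedding)
{
    // "R" round-trips the float; InvariantCulture forces '.' as the decimal separator.
    var values = embedding.Select(v => v.ToString("R", CultureInfo.InvariantCulture));
    var joined = string.Join(",", values);
    return $"[{joined}]::FLOAT[{embedding.Length}]";
}

Console.WriteLine(ToVectorLiteral(new[] { 0.25f, -1f, 3.5f }));
// [0.25,-1,3.5]::FLOAT[3]
```

The sketch does not guard against NaN or infinite components, which would need to be rejected before building SQL.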

Phase 3: Remaining Implementations ✅ COMPLETE

Port InMemoryVectorStore from DocSummarizer.Core
Port QdrantVectorStore from DocSummarizer.Core
Add IMultiVectorStore for multi-modal pipelines (image, audio)

InMemoryVectorStore Features:

✅ ConcurrentDictionary-based storage for thread-safety
✅ Cosine similarity search (brute-force, no index)
✅ LRU eviction with configurable MaxDocuments limit
✅ Content hash-based caching
✅ Summary caching
✅ Metadata filtering support
✅ Zero external dependencies
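
LRU eviction under a MaxDocuments limit can be sketched with a linked list for recency order plus a dictionary for lookup. This is an illustration only: the names below are hypothetical, the sketch is not thread-safe, and the actual InMemoryVectorStore (which is ConcurrentDictionary-based) may track recency differently.

```csharp
using System;
using System.Collections.Generic;

const int maxDocuments = 2; // illustrative limit for this sketch

// Most recently used ids sit at the front of the list; the dictionary gives O(1) lookup.
var recency = new LinkedList<string>();
var entries = new Dictionary<string, (LinkedListNode<string> Node, float[] Embedding)>();

void Upsert(string id, float[] embedding)
{
    if (entries.TryGetValue(id, out var existing))
    {
        // Updating an existing document: drop its old position in the recency list.
        recency.Remove(existing.Node);
    }
    else if (entries.Count >= maxDocuments)
    {
        // Evict the least recently used document (tail of the list).
        var oldestId = recency.Last!.Value;
        recency.RemoveLast();
        entries.Remove(oldestId);
    }
    entries[id] = (recency.AddFirst(id), embedding);
}

bool TryGet(string id, out float[] embedding)
{
    if (entries.TryGetValue(id, out var entry))
    {
        // A read counts as use: move the id to the front.
        recency.Remove(entry.Node);
        recency.AddFirst(entry.Node);
        embedding = entry.Embedding;
        return true;
    }
    embedding = Array.Empty<float>();
    return false;
}

Upsert("a", new float[384]);
Upsert("b", new float[384]);
TryGet("a", out _);          // "a" becomes most recent
Upsert("c", new float[384]); // evicts "b", the least recently used

Console.WriteLine(string.Join(", ", recency)); // c, a
```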

QdrantVectorStore Features:

✅ Production-grade persistent storage
✅ Qdrant gRPC client integration
✅ Batch upserts (100 points per batch)
✅ Filter-based search and deletion
✅ Content hash-based caching
✅ Summary caching with metadata
✅ Privacy-preserving mode support
✅ Configurable distance metrics (Cosine, Euclidean, Dot Product)
✅ HTTPS support
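
Batching upserts in groups of 100 can be illustrated with Enumerable.Chunk from the .NET base library. The sketch below only demonstrates the splitting; SendBatch is a hypothetical stand-in for one upsert request and does not call Qdrant.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int batchSize = 100; // batch size stated in the feature list above

// Hypothetical sink standing in for one upsert request; it records each batch size.
var requestSizes = new List<int>();
void SendBatch(IReadOnlyCollection<string> pointIds) => requestSizes.Add(pointIds.Count);

// 250 fake point ids, split into batches of at most 100.
var ids = Enumerable.Range(0, 250).Select(i => $"doc:segment{i}");
foreach (var batch in ids.Chunk(batchSize))
    SendBatch(batch);

Console.WriteLine(string.Join(", ", requestSizes)); // 100, 100, 50
```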

Phase 4: Migration

Migrate DocSummarizer.Core
Migrate DataSummarizer
Migrate Mostlylucid.RAG
Update LucidRAG web app

License

MIT - Part of the LucidRAG project

Product	Compatible and additional computed target framework versions.
.NET	net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.

net10.0
- DuckDB.NET.Data.Full (>= 1.5.0)
- Microsoft.Extensions.DependencyInjection.Abstractions (>= 10.0.5)
- Microsoft.Extensions.Logging.Abstractions (>= 10.0.5)
- Microsoft.Extensions.Options (>= 10.0.5)
- Mostlylucid.LucidRAG.Summarizer.Core (>= 2.7.5)
- Qdrant.Client (>= 1.17.0)

NuGet packages (2)

Showing the top 2 NuGet packages that depend on Mostlylucid.LucidRAG.Storage.Core:

Package	Downloads
Mostlylucid.LucidRAG.DocSummarizer: Local-first document summarization library using BERT embeddings, RAG, and optional LLM synthesis. Supports markdown, PDF, DOCX, and URLs. Every claim is grounded with citations. Runs entirely offline with ONNX models, or optionally uses Ollama/Docling for enhanced features.	9.8K
Mostlylucid.LucidRAG.AudioSummarizer: Forensic audio characterization library with speech-to-text and speaker analysis.	1.7K

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
2.7.5	595	3/30/2026
2.7.4	484	3/30/2026
2.7.3	585	3/30/2026
2.7.2	231	3/30/2026
2.7.1	551	3/29/2026
2.7.0	504	3/29/2026
2.6.0	516	3/29/2026
2.5.0-alpha0	628	2/10/2026
2.1.0	562	2/9/2026
2.1.0-preview2	550	2/9/2026
2.0.1-rc2	110	2/9/2026
2.0.1-rc0	601	2/9/2026
2.0.0-rc5	207	2/9/2026
1.1.1	132	2/4/2026
1.0.0	122	2/4/2026
0.0.0-alpha.0.266	60	2/9/2026
0.0.0-alpha.0.265	60	2/9/2026
0.0.0-alpha.0.263	56	2/9/2026
0.0.0-alpha.0.262	56	2/8/2026
0.0.0-alpha.0.258	57	2/8/2026
