Mostlylucid.LucidRAG.Storage.Core 2.7.5

.NET CLI:
dotnet add package Mostlylucid.LucidRAG.Storage.Core --version 2.7.5

Package Manager Console (Visual Studio; uses the NuGet module's version of Install-Package):
NuGet\Install-Package Mostlylucid.LucidRAG.Storage.Core -Version 2.7.5

PackageReference (copy this XML node into the project file):
<PackageReference Include="Mostlylucid.LucidRAG.Storage.Core" Version="2.7.5" />

Central Package Management (CPM): put the versioned node in the solution's Directory.Packages.props file and the unversioned reference in the project file.
Directory.Packages.props:
<PackageVersion Include="Mostlylucid.LucidRAG.Storage.Core" Version="2.7.5" />
Project file:
<PackageReference Include="Mostlylucid.LucidRAG.Storage.Core" />

Paket CLI:
paket add Mostlylucid.LucidRAG.Storage.Core --version 2.7.5

F# Interactive and Polyglot Notebooks (#r directive; copy into the interactive tool or the script source):
#r "nuget: Mostlylucid.LucidRAG.Storage.Core, 2.7.5"

C# file-based apps, .NET 10 preview 4 and later (#:package directive; place it before any lines of code in the .cs file):
#:package Mostlylucid.LucidRAG.Storage.Core@2.7.5

Cake Addin:
#addin nuget:?package=Mostlylucid.LucidRAG.Storage.Core&version=2.7.5

Cake Tool:
#tool nuget:?package=Mostlylucid.LucidRAG.Storage.Core&version=2.7.5

Mostlylucid.Storage.Core

NuGet: Mostlylucid.LucidRAG.Storage.Core

Unified vector storage library for LucidRAG with support for InMemory, DuckDB, and Qdrant backends.

Overview

This library provides a single, unified interface (IVectorStore) for embedding storage and retrieval across all LucidRAG pipelines:

  • DocumentPipeline - PDF, DOCX, Markdown text embeddings
  • ImagePipeline - Image OCR, CLIP visual embeddings
  • DataPipeline - Data profile embeddings for semantic search

Supported Backends

Backend  | Persistence     | Use Case                          | Default Mode
InMemory | ❌ Ephemeral    | Tool/MCP mode, one-shot analysis  | MCP server, CLI tools
DuckDB   | ✅ File-based   | Standalone mode, development      | Standalone apps
Qdrant   | ✅ Server-based | Production, distributed systems   | Production deployments

Quick Start

Installation

dotnet add package Mostlylucid.LucidRAG.Storage.Core

Tool/MCP Mode (No Persistence)

For one-shot analysis where you don't need to persist embeddings:

services.AddVectorStoreForToolMode();

Standalone Mode (DuckDB Persistence)

For standalone apps with embedded persistence:

services.AddVectorStoreForStandaloneMode(dataDirectory: "./data");

Production Mode (Qdrant)

For production deployments with dedicated vector database:

services.AddVectorStoreForProductionMode(qdrantHost: "localhost", qdrantPort: 6334);

Custom Configuration

services.AddVectorStore(options =>
{
    options.Backend = VectorStoreBackend.DuckDB;
    options.PersistVectors = true;
    options.ReuseExistingEmbeddings = true;

    options.DuckDB.DatabasePath = "./vectors.duckdb";
    options.DuckDB.EnableVSS = true;
    options.DuckDB.VectorDimension = 384;
});

Usage

Initialize Collection

var vectorStore = serviceProvider.GetRequiredService<IVectorStore>();

var schema = new VectorStoreSchema
{
    VectorDimension = 384,
    DistanceMetric = VectorDistance.Cosine,
    StoreText = true  // false for privacy-preserving mode
};

await vectorStore.InitializeAsync("documents", schema);
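The schema above selects VectorDistance.Cosine. As a reference for what the three metrics named in this README (cosine, Euclidean, dot product) compute, here is a standalone sketch that does not touch the library. Whether a given backend reports a similarity or a distance for each metric is an assumption to verify against that backend.

```csharp
using System;

// Dot product: sum of element-wise products (higher = more similar).
static double Dot(float[] a, float[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Dimension mismatch");
    double sum = 0;
    for (int i = 0; i < a.Length; i++) sum += (double)a[i] * b[i];
    return sum;
}

// Euclidean (L2) distance: square root of summed squared differences (lower = more similar).
static double Euclidean(float[] a, float[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Dimension mismatch");
    double sum = 0;
    for (int i = 0; i < a.Length; i++)
    {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return Math.Sqrt(sum);
}

// Cosine similarity: dot product divided by the product of the vector norms.
static double Cosine(float[] a, float[] b)
{
    double denom = Math.Sqrt(Dot(a, a)) * Math.Sqrt(Dot(b, b));
    return denom == 0 ? 0 : Dot(a, b) / denom;
}

float[] x = { 1f, 0f, 0f };
float[] y = { 1f, 1f, 0f };

Console.WriteLine(FormattableString.Invariant(
    $"dot={Dot(x, y):F3} euclidean={Euclidean(x, y):F3} cosine={Cosine(x, y):F3}"));
// dot=1.000 euclidean=1.000 cosine=0.707
```

If embeddings are L2-normalized (unit length), cosine similarity equals the dot product, so both metrics rank results identically.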

Insert Documents

var text = "Example segment text."; // the segment's source text

var documents = new[]
{
    new VectorDocument
    {
        Id = "doc1:segment0",
        Embedding = new float[384], // your embedding vector
        ParentId = "doc1",
        ContentHash = ContentHasher.ComputeHash(text),
        Text = text,
        Metadata = new Dictionary<string, object>
        {
            ["type"] = "text",
            ["language"] = "en",
            ["confidence"] = 0.95
        }
    }
};

await vectorStore.UpsertDocumentsAsync("documents", documents);

Search

var query = new VectorSearchQuery
{
    QueryEmbedding = queryVector,
    TopK = 10,
    MinScore = 0.7,
    IncludeDocument = false,  // privacy-preserving: results omit document text
    Filters = new Dictionary<string, object>
    {
        ["language"] = "en"
    }
};

var results = await vectorStore.SearchAsync("documents", query);

foreach (var result in results)
{
    Console.WriteLine($"{result.Id}: {result.Score:F3}");
    Console.WriteLine($"  Metadata: {string.Join(", ", result.Metadata)}");
}
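Conceptually, a brute-force search such as the InMemory backend's (described further down as cosine similarity with no index) scores every stored vector, applies the metadata filters, drops anything below MinScore, and keeps the best TopK. The sketch below assumes cosine scoring and exact-match filters; the tuple shape is a hypothetical stand-in for VectorDocument, and this is an illustration rather than the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static double Cosine(float[] a, float[] b)
{
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        na += (double)a[i] * a[i];
        nb += (double)b[i] * b[i];
    }
    return na == 0 || nb == 0 ? 0 : dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

// Score every document, keep only exact metadata matches above minScore, return the top K.
static List<(string Id, double Score)> Search(
    IEnumerable<(string Id, float[] Embedding, Dictionary<string, object> Metadata)> docs,
    float[] query, int topK, double minScore, Dictionary<string, object> filters) =>
    docs
        .Where(d => filters.All(f => d.Metadata.TryGetValue(f.Key, out var v) && object.Equals(v, f.Value)))
        .Select(d => (d.Id, Score: Cosine(query, d.Embedding)))
        .Where(r => r.Score >= minScore)
        .OrderByDescending(r => r.Score)
        .Take(topK)
        .ToList();

static Dictionary<string, object> Meta(string language) => new() { ["language"] = language };

var store = new List<(string Id, float[] Embedding, Dictionary<string, object> Metadata)>
{
    ("doc1:segment0", new[] { 1f, 0f }, Meta("en")),
    ("doc2:segment0", new[] { 0.8f, 0.6f }, Meta("en")),
    ("doc3:segment0", new[] { 1f, 0.1f }, Meta("de")),  // filtered out by language
    ("doc4:segment0", new[] { 0f, 1f }, Meta("en")),    // below MinScore
};

var results = Search(store, new[] { 1f, 0f }, topK: 10, minScore: 0.7,
    filters: new Dictionary<string, object> { ["language"] = "en" });

foreach (var (id, score) in results)
    Console.WriteLine($"{id}: {score:F3}");
```

Because every vector is scored, query time grows linearly with collection size, which is the behaviour the brute-force column of the performance table reflects.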

Content Hash-Based Caching

// Get documents by content hash (for deduplication)
var hashes = new[] { "abc123", "def456" };
var cached = await vectorStore.GetDocumentsByHashAsync("documents", hashes);

if (cached.TryGetValue("abc123", out var doc))
{
    Console.WriteLine("Reusing existing embedding!");
}
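Hash-based reuse only works if identical text always yields the same hash. The README does not say which algorithm ContentHasher.ComputeHash uses, so the sketch below assumes a SHA-256 hex digest over lightly normalized text and shows how a pipeline could split segments into "reuse the cached embedding" and "needs a new embedding". The ComputeHash function and its normalization rules here are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-in for ContentHasher.ComputeHash: SHA-256 over trimmed,
// newline-normalized UTF-8 text, returned as lowercase hex.
static string ComputeHash(string text)
{
    var normalized = text.Trim().Replace("\r\n", "\n");
    var bytes = SHA256.HashData(Encoding.UTF8.GetBytes(normalized));
    return Convert.ToHexString(bytes).ToLowerInvariant();
}

// Pretend these hashes came back from GetDocumentsByHashAsync.
var cachedHashes = new HashSet<string> { ComputeHash("Hello world") };

var segments = new[] { "Hello world", "  Hello world\r\n", "A new paragraph" };
var toEmbed = new List<string>();
int reused = 0;

foreach (var segment in segments)
{
    if (cachedHashes.Contains(ComputeHash(segment)))
        reused++;              // identical content: reuse the stored embedding
    else
        toEmbed.Add(segment);  // new or changed content: compute a fresh embedding
}

Console.WriteLine($"Reused {reused}, embedding {toEmbed.Count} new segment(s)");
// Reused 2, embedding 1 new segment(s)
```

Normalizing before hashing is a design choice: it lets whitespace-only edits hit the cache, at the cost of treating those edits as "unchanged".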

Incremental Updates

// Delete doc1's segments whose content hash is no longer in validHashes
var validHashes = new[] { "hash1", "hash2", "hash3" };
await vectorStore.RemoveStaleDocumentsAsync("documents", parentId: "doc1", validHashes);
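One plausible reading of RemoveStaleDocumentsAsync, sketched in memory: every segment of the given parent whose content hash is not in validHashes is deleted, while segments of other parents are untouched. This is an assumption about the method's behaviour, not its code; the tuple list is a hypothetical stand-in for the stored collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Remove segments belonging to parentId whose hash is no longer valid.
// Returns the number of segments removed.
static int RemoveStale(
    List<(string Id, string ParentId, string ContentHash)> collection,
    string parentId,
    IEnumerable<string> validHashes)
{
    var valid = new HashSet<string>(validHashes);
    return collection.RemoveAll(d => d.ParentId == parentId && !valid.Contains(d.ContentHash));
}

var collection = new List<(string Id, string ParentId, string ContentHash)>
{
    ("doc1:segment0", "doc1", "hash1"),
    ("doc1:segment1", "doc1", "hashOld"),  // edited paragraph: stale
    ("doc1:segment2", "doc1", "hash3"),
    ("doc2:segment0", "doc2", "hashOld"),  // different parent: kept
};

int removed = RemoveStale(collection, "doc1", new[] { "hash1", "hash2", "hash3" });
Console.WriteLine($"Removed {removed}; remaining: {string.Join(", ", collection.Select(d => d.Id))}");
// Removed 1; remaining: doc1:segment0, doc1:segment2, doc2:segment0
```

Note that "hash2" is valid but not yet stored: under this reading, removal never adds anything, so the new segment would still be upserted separately.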

Architecture

IMultiVectorStore : IVectorStore (unified interface)
├── InMemoryVectorStore     (ephemeral, fastest)
├── DuckDBVectorStore       (persistent, embedded, HNSW indexes)
└── QdrantVectorStore       (persistent, server-based, production)

Used by:
├── DocSummarizer.Core      (text embeddings)
├── ImageSummarizer.Core    (OCR + CLIP embeddings)
├── AudioSummarizer.Core    (voice embeddings via EmbeddingStorageWave)
├── DataSummarizer.Core     (data profile embeddings)
└── LucidRAG                (all of the above)

Configuration Reference

appsettings.json

{
  "VectorStore": {
    "Backend": "DuckDB",
    "CollectionName": "documents",
    "PersistVectors": true,
    "ReuseExistingEmbeddings": true,

    "DuckDB": {
      "DatabasePath": "./data/vectors.duckdb",
      "EnableVSS": true,
      "EnablePersistence": true,
      "VectorDimension": 384,
      "HNSW": {
        "M": 16,
        "EfConstruction": 200,
        "EfSearch": 100
      }
    },

    "Qdrant": {
      "Host": "localhost",
      "Port": 6334,
      "ApiKey": null,
      "VectorSize": 384,
      "UseHttps": false
    },

    "InMemory": {
      "MaxDocuments": 0,
      "Verbose": false
    }
  }
}
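The embedding size appears in several places (DuckDB.VectorDimension, Qdrant.VectorSize, and the VectorStoreSchema passed at initialization), and the active backend's value must match the embedding model's output size. A small startup check, sketched here with System.Text.Json against a trimmed copy of the configuration above, can catch a mismatch early. The check is a suggestion rather than documented library behaviour, and it conservatively compares both backends' values; the 384 model dimension comes from the README's examples.

```csharp
using System;
using System.Text.Json;

// Trimmed copy of the appsettings.json "VectorStore" section shown above.
const string json = """
{
  "VectorStore": {
    "Backend": "DuckDB",
    "DuckDB": { "VectorDimension": 384 },
    "Qdrant": { "VectorSize": 384 }
  }
}
""";

// Returns an error message, or null if the configured sizes match the model.
static string? CheckDimensions(string configJson, int modelDimension)
{
    using var doc = JsonDocument.Parse(configJson);
    var vs = doc.RootElement.GetProperty("VectorStore");
    int duck = vs.GetProperty("DuckDB").GetProperty("VectorDimension").GetInt32();
    int qdrant = vs.GetProperty("Qdrant").GetProperty("VectorSize").GetInt32();

    if (duck != modelDimension) return $"DuckDB.VectorDimension is {duck}, model outputs {modelDimension}";
    if (qdrant != modelDimension) return $"Qdrant.VectorSize is {qdrant}, model outputs {modelDimension}";
    return null;
}

Console.WriteLine(CheckDimensions(json, modelDimension: 384) ?? "Vector dimensions are consistent");
```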

Backend Comparison

InMemory

Pros:

  • Fastest (no disk I/O)
  • No external dependencies
  • Perfect for testing

Cons:

  • Data lost on restart
  • Limited by RAM
  • No HNSW acceleration

Best for: MCP tools, one-shot CLI analysis, unit tests

DuckDB

Pros:

  • File-based persistence
  • No external server needed
  • HNSW indexes (with VSS extension)
  • Graceful fallback if VSS unavailable

Cons:

  • Single-process (not distributed)
  • HNSW index persistence is experimental

Best for: Standalone apps, development, embedded scenarios

Qdrant

Pros:

  • Production-grade
  • Distributed/multi-node
  • Advanced filtering
  • Multi-vector support

Cons:

  • Requires external server
  • More complex deployment

Best for: Production, multi-tenant, high-scale

Performance

DuckDB HNSW vs Brute-Force

Documents | HNSW (VSS) | Brute-Force | Speedup
1,000     | 2 ms       | 15 ms       | 7.5×
10,000    | 5 ms       | 150 ms      | 30×
100,000   | 10 ms      | 1,500 ms    | 150×

Startup Time (Reindex on Restart)

Backend  | 1,000 docs | 10,000 docs | Persistence
InMemory | ~5 s       | ~50 s       | ❌
DuckDB   | ~2 s       | ~5 s        | ✅
Qdrant   | ~1 s       | ~2 s        | ✅

Privacy-Preserving Mode

Set StoreText = false in the schema to avoid storing plaintext:

var schema = new VectorStoreSchema
{
    VectorDimension = 384,
    StoreText = false  // Only store embeddings, not text
};

Search results then contain only IDs and scores, not text content.

Multi-Vector Support

IMultiVectorStore extends IVectorStore with named-vector support for multi-modal embeddings. All three backends (InMemory, DuckDB, Qdrant) implement it.

public interface IMultiVectorStore : IVectorStore
{
    Task InitializeMultiVectorAsync(
        string collectionName,
        VectorStoreSchema primarySchema,
        IEnumerable<NamedVectorConfig> namedVectors,
        CancellationToken ct = default);

    Task UpsertMultiVectorDocumentsAsync(
        string collectionName,
        IEnumerable<MultiVectorDocument> documents,
        CancellationToken ct = default);

    Task<List<VectorSearchResult>> SearchByNamedVectorAsync(
        string collectionName,
        string vectorName,
        VectorSearchQuery query,
        CancellationToken ct = default);
}

Usage

var store = serviceProvider.GetRequiredService<IMultiVectorStore>();

// Initialize with primary + named vectors
await store.InitializeMultiVectorAsync("images", primarySchema, new[]
{
    new NamedVectorConfig { Name = "visual", Dimension = 512 },
    new NamedVectorConfig { Name = "color", Dimension = 128 },
});

// Upsert documents with named vectors
var doc = new MultiVectorDocument
{
    Id = "img1", Embedding = primaryEmbedding,
    NamedVectors = { ["visual"] = clipVector, ["color"] = colorVector }
};
await store.UpsertMultiVectorDocumentsAsync("images", [doc]);

// Search by named vector
var results = await store.SearchByNamedVectorAsync("images", "visual", query);

Backend Implementation Details

Backend  | Named Vector Strategy
InMemory | Stored in the MultiVectorDocument.NamedVectors dictionary; cosine search in memory
DuckDB   | Side table {collection}_named_vectors with a (document_id, vector_name) primary key; cosine search in memory
Qdrant   | Native VectorParamsMap with per-vector HNSW indexes; native search via the vectorName parameter
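The DuckDB strategy stores each named vector in a side table keyed by (document_id, vector_name) and searches it with in-memory cosine similarity. A rough in-memory analogue, using a dictionary keyed by a (DocumentId, VectorName) tuple in place of the side table, is sketched below; all names are illustrative and this is not the library's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static double Cosine(float[] a, float[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Dimension mismatch");
    double dot = 0, na = 0, nb = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        na += (double)a[i] * a[i];
        nb += (double)b[i] * b[i];
    }
    return na == 0 || nb == 0 ? 0 : dot / (Math.Sqrt(na) * Math.Sqrt(nb));
}

// Stand-in for the {collection}_named_vectors side table: (document_id, vector_name) -> vector.
var namedVectors = new Dictionary<(string DocumentId, string VectorName), float[]>();

// Writing with the composite key overwrites an existing (document, name) pair, like an upsert.
namedVectors[("img1", "visual")] = new[] { 0.9f, 0.1f, 0f };
namedVectors[("img1", "color")]  = new[] { 0f, 1f };
namedVectors[("img2", "visual")] = new[] { 0.1f, 0.9f, 0f };
namedVectors[("img2", "color")]  = new[] { 1f, 0f };

// Compare only vectors with the requested name (so dimensions always agree), best first.
static List<(string DocumentId, double Score)> SearchByNamedVector(
    Dictionary<(string DocumentId, string VectorName), float[]> table,
    string vectorName, float[] query, int topK) =>
    table
        .Where(kv => kv.Key.VectorName == vectorName)
        .Select(kv => (kv.Key.DocumentId, Score: Cosine(query, kv.Value)))
        .OrderByDescending(r => r.Score)
        .Take(topK)
        .ToList();

var hits = SearchByNamedVector(namedVectors, "visual", new[] { 1f, 0f, 0f }, topK: 1);
Console.WriteLine($"Best visual match: {hits[0].DocumentId}");
// Best visual match: img1
```

Keying by vector name lets each modality keep its own dimension (here 3 for "visual", 2 for "color") inside one collection.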

Named vectors enable separate embeddings for:

  • Text (OCR from image)
  • Visual (CLIP embedding)
  • Color (color histogram)
  • Motion (optical flow)
  • Voice (ECAPA-TDNN speaker embedding)

Implementation Status

Phase 1: Foundation ✅ COMPLETE

  • Create project structure
  • Define IVectorStore interface
  • Define VectorDocument, VectorSearchQuery, VectorSearchResult models
  • Configuration classes with mode-specific factories
  • DI extension methods

Phase 2: DuckDB Implementation ✅ COMPLETE

  • Port DuckDBVectorStore from DataSummarizer
  • VSS extension integration with graceful fallback
  • In-memory cosine similarity fallback when VSS unavailable
  • HNSW index configuration and creation
  • Content hash-based caching for deduplication
  • Summary caching support
  • Schema migration for dimension changes
  • All IVectorStore methods implemented

Key Features:

  • ✅ VSS extension detection and loading
  • ✅ Experimental persistence: SET hnsw_enable_experimental_persistence = true
  • ✅ Dynamic vector dimensions (384 default, configurable)
  • ✅ Vector literal syntax: [val1,val2,...]::FLOAT[dim]
  • ✅ HNSW index with configurable M and ef_construction
  • ✅ Dual search modes: VSS array_distance() or in-memory cosine
  • ✅ Metadata filtering support
  • ✅ Privacy-preserving mode (StoreText = false)
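The vector literal syntax listed above, [val1,val2,...]::FLOAT[dim], must be rendered culture-invariantly; a locale that uses a comma as the decimal separator would otherwise produce an invalid literal. A hypothetical helper for building such a literal from a float[] might look like this (the table and column names in the query are illustrative). Where the DuckDB client in use supports bound parameters, those are preferable to string-built SQL.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper: render an embedding as a DuckDB fixed-size FLOAT array literal.
// "R" gives a round-trippable float; InvariantCulture keeps '.' as the decimal separator.
static string ToDuckDbVectorLiteral(float[] embedding) =>
    "[" + string.Join(",", embedding.Select(v => v.ToString("R", CultureInfo.InvariantCulture)))
        + $"]::FLOAT[{embedding.Length}]";

var literal = ToDuckDbVectorLiteral(new[] { 0.25f, -1.5f, 3f });
Console.WriteLine(literal); // [0.25,-1.5,3]::FLOAT[3]

// Illustrative query using array_distance(), as in the VSS search mode above.
var sql = $"SELECT id, array_distance(embedding, {literal}) AS distance FROM documents ORDER BY distance LIMIT 10";
Console.WriteLine(sql);
```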

Phase 3: Remaining Implementations ✅ COMPLETE

  • Port InMemoryVectorStore from DocSummarizer.Core
  • Port QdrantVectorStore from DocSummarizer.Core
  • Add IMultiVectorStore for multi-modal pipelines (image, audio)

InMemoryVectorStore Features:

  • ✅ ConcurrentDictionary-based storage for thread-safety
  • ✅ Cosine similarity search (brute-force, no index)
  • ✅ LRU eviction with configurable MaxDocuments limit
  • ✅ Content hash-based caching
  • ✅ Summary caching
  • ✅ Metadata filtering support
  • ✅ Zero external dependencies
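LRU eviction under a MaxDocuments cap can be sketched with a dictionary plus a linked list ordered by recency. This sketch assumes that both reads and writes count as "use" and that MaxDocuments = 0 means no limit (suggested by the default in the configuration reference, but not stated). Unlike the ConcurrentDictionary-based store described above, it is not thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int maxDocuments = 2; // assumption: 0 would mean "unlimited"

var map = new Dictionary<string, LinkedListNode<(string Id, float[] Embedding)>>();
var recency = new LinkedList<(string Id, float[] Embedding)>(); // front = most recently used

void Upsert(string id, float[] embedding)
{
    if (map.TryGetValue(id, out var existing))
        recency.Remove(existing);               // update: re-insert at the front below
    else if (maxDocuments > 0 && map.Count >= maxDocuments)
    {
        var lru = recency.Last!;                // least recently used entry
        recency.RemoveLast();
        map.Remove(lru.Value.Id);
    }
    map[id] = recency.AddFirst((id, embedding));
}

float[]? Get(string id)
{
    if (!map.TryGetValue(id, out var node)) return null;
    recency.Remove(node);                       // reading counts as use: move to front
    recency.AddFirst(node);
    return node.Value.Embedding;
}

Upsert("a", new[] { 1f });
Upsert("b", new[] { 2f });
Get("a");                  // "a" is now the most recently used
Upsert("c", new[] { 3f }); // evicts "b", the least recently used

Console.WriteLine(string.Join(", ", recency.Select(e => e.Id))); // c, a
```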

QdrantVectorStore Features:

  • ✅ Production-grade persistent storage
  • ✅ Qdrant gRPC client integration
  • ✅ Batch upserts (100 points per batch)
  • ✅ Filter-based search and deletion
  • ✅ Content hash-based caching
  • ✅ Summary caching with metadata
  • ✅ Privacy-preserving mode support
  • ✅ Configurable distance metrics (Cosine, Euclidean, Dot Product)
  • ✅ HTTPS support
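Splitting an upsert into fixed-size batches (the Qdrant store is described as sending 100 points per batch) is straightforward with Enumerable.Chunk, available from .NET 6. SendBatch below is a hypothetical placeholder for the real client call; here it only records each batch's size.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int batchSize = 100;

var batchSizes = new List<int>();

// Hypothetical placeholder for a client upsert call.
void SendBatch(string[] pointIds) => batchSizes.Add(pointIds.Length);

var ids = Enumerable.Range(0, 250).Select(i => $"doc{i}:segment0").ToArray();

// Chunk yields arrays of up to batchSize items; the last one holds the remainder.
foreach (var batch in ids.Chunk(batchSize))
    SendBatch(batch);

Console.WriteLine(string.Join(", ", batchSizes)); // 100, 100, 50
```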

Phase 4: Migration

  • Migrate DocSummarizer.Core
  • Migrate DataSummarizer
  • Migrate Mostlylucid.RAG
  • Update LucidRAG web app

License

MIT - Part of the LucidRAG project

Target framework compatibility

.NET: net10.0 is compatible. Computed: net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows.

NuGet packages (2)

Showing the top 2 NuGet packages that depend on Mostlylucid.LucidRAG.Storage.Core:

Mostlylucid.LucidRAG.DocSummarizer

Local-first document summarization library using BERT embeddings, RAG, and optional LLM synthesis. Supports markdown, PDF, DOCX, and URLs. Every claim is grounded with citations. Runs entirely offline with ONNX models, or optionally uses Ollama/Docling for enhanced features.

Mostlylucid.LucidRAG.AudioSummarizer

AudioSummarizer - Forensic audio characterization library with speech-to-text and speaker analysis

GitHub repositories

This package is not used by any popular GitHub repositories.

Version            | Downloads | Last Updated
2.7.5              | 595       | 3/30/2026
2.7.4              | 484       | 3/30/2026
2.7.3              | 585       | 3/30/2026
2.7.2              | 231       | 3/30/2026
2.7.1              | 551       | 3/29/2026
2.7.0              | 504       | 3/29/2026
2.6.0              | 516       | 3/29/2026
2.5.0-alpha0       | 628       | 2/10/2026
2.1.0              | 562       | 2/9/2026
2.1.0-preview2     | 550       | 2/9/2026
2.0.1-rc2          | 110       | 2/9/2026
2.0.1-rc0          | 601       | 2/9/2026
2.0.0-rc5          | 207       | 2/9/2026
1.1.1              | 132       | 2/4/2026
1.0.0              | 122       | 2/4/2026
0.0.0-alpha.0.266  | 60        | 2/9/2026
0.0.0-alpha.0.265  | 60        | 2/9/2026
0.0.0-alpha.0.263  | 56        | 2/9/2026
0.0.0-alpha.0.262  | 56        | 2/8/2026
0.0.0-alpha.0.258  | 57        | 2/8/2026