Mostlylucid.LucidRAG.Storage.Core
2.7.5
Mostlylucid.Storage.Core
NuGet: Mostlylucid.LucidRAG.Storage.Core
Unified vector storage library for LucidRAG with support for InMemory, DuckDB, and Qdrant backends.
Overview
This library provides a single, unified interface (IVectorStore) for embedding storage and retrieval across all
LucidRAG pipelines:
- DocumentPipeline - PDF, DOCX, Markdown text embeddings
- ImagePipeline - Image OCR, CLIP visual embeddings
- DataPipeline - Data profile embeddings for semantic search
Supported Backends
| Backend | Persistence | Use Case | Default For |
|---|---|---|---|
| InMemory | ❌ Ephemeral | Tool/MCP mode, one-shot analysis | MCP server, CLI tools |
| DuckDB | ✅ File-based | Standalone mode, development | Standalone apps |
| Qdrant | ✅ Server-based | Production, distributed systems | Production deployments |
Quick Start
Installation
dotnet add package Mostlylucid.LucidRAG.Storage.Core
Tool/MCP Mode (No Persistence)
For one-shot analysis where you don't need to persist embeddings:
services.AddVectorStoreForToolMode();
Standalone Mode (DuckDB Persistence)
For standalone apps with embedded persistence:
services.AddVectorStoreForStandaloneMode(dataDirectory: "./data");
Production Mode (Qdrant)
For production deployments with a dedicated vector database:
services.AddVectorStoreForProductionMode(qdrantHost: "localhost", qdrantPort: 6334);
Custom Configuration
services.AddVectorStore(options =>
{
    options.Backend = VectorStoreBackend.DuckDB;
    options.PersistVectors = true;
    options.ReuseExistingEmbeddings = true;
    options.DuckDB.DatabasePath = "./vectors.duckdb";
    options.DuckDB.EnableVSS = true;
    options.DuckDB.VectorDimension = 384;
});
Usage
Initialize Collection
var vectorStore = serviceProvider.GetRequiredService<IVectorStore>();
var schema = new VectorStoreSchema
{
    VectorDimension = 384,
    DistanceMetric = VectorDistance.Cosine,
    StoreText = true // false for privacy-preserving mode
};
await vectorStore.InitializeAsync("documents", schema);
Insert Documents
var documents = new[]
{
    new VectorDocument
    {
        Id = "doc1:segment0",
        Embedding = new float[384], // your embedding vector
        ParentId = "doc1",
        ContentHash = ContentHasher.ComputeHash(text),
        Text = text,
        Metadata = new Dictionary<string, object>
        {
            ["type"] = "text",
            ["language"] = "en",
            ["confidence"] = 0.95
        }
    }
};
await vectorStore.UpsertDocumentsAsync("documents", documents);
Search
var query = new VectorSearchQuery
{
    QueryEmbedding = queryVector,
    TopK = 10,
    MinScore = 0.7,
    IncludeDocument = false, // privacy-preserving - only return IDs
    Filters = new Dictionary<string, object>
    {
        ["language"] = "en"
    }
};
var results = await vectorStore.SearchAsync("documents", query);
foreach (var result in results)
{
    Console.WriteLine($"{result.Id}: {result.Score:F3}");
    Console.WriteLine($" Metadata: {string.Join(", ", result.Metadata)}");
}
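The Filters dictionary above suggests exact-match filtering on metadata. The following is a minimal sketch of how such a filter could be evaluated against a document's metadata, assuming every filter entry must match by equality. `MetadataFilterSketch` and `Matches` are hypothetical names for illustration; the library's actual filter semantics may differ per backend.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the library: illustrates equality-based metadata filtering.
public static class MetadataFilterSketch
{
    // Returns true only when every filter key exists in the metadata with an equal value.
    // Note: boxed values compare by runtime type, so an int 1 does not equal a long 1.
    public static bool Matches(
        IReadOnlyDictionary<string, object> metadata,
        IReadOnlyDictionary<string, object> filters)
    {
        return filters.All(f =>
            metadata.TryGetValue(f.Key, out var value) && object.Equals(value, f.Value));
    }
}
```

An empty filter set matches every document, which is usually the behavior one wants when no filters are supplied.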
Content Hash-Based Caching
// Get documents by content hash (for deduplication)
var hashes = new[] { "abc123", "def456" };
var cached = await vectorStore.GetDocumentsByHashAsync("documents", hashes);
if (cached.TryGetValue("abc123", out var doc))
{
    Console.WriteLine("Reusing existing embedding!");
}
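The hash lookup above depends on a stable content hash per segment. The README does not document how ContentHasher.ComputeHash works, so the following is only a sketch of one plausible scheme: SHA-256 over lightly normalized UTF-8 text. `ContentHashSketch` is a hypothetical name, and the normalization rules are assumptions.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical stand-in for illustration; the library's ContentHasher may use a different algorithm.
public static class ContentHashSketch
{
    // Normalizes line endings and trims surrounding whitespace (an assumption),
    // then returns the lowercase hex SHA-256 digest of the UTF-8 bytes.
    public static string ComputeHash(string text)
    {
        var normalized = text.Replace("\r\n", "\n").Trim();
        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(normalized));
        return Convert.ToHexString(digest).ToLowerInvariant();
    }
}
```

Normalizing before hashing means that cosmetic differences such as trailing newlines do not force a re-embedding of otherwise identical text.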
Incremental Updates
// Remove stale segments, keep new ones
var validHashes = new[] { "hash1", "hash2", "hash3" };
await vectorStore.RemoveStaleDocumentsAsync("documents", parentId: "doc1", validHashes);
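To illustrate the idea behind the call above, here is a minimal sketch of the set logic it implies, assuming a segment is "stale" when its content hash is not in the list of valid hashes. `StaleSegmentSketch` and `FindStaleIds` are hypothetical names; the library's actual implementation is backend-specific and is not shown in this README.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class StaleSegmentSketch
{
    // Returns the IDs of stored segments whose content hash is absent from validHashes.
    public static List<string> FindStaleIds(
        IEnumerable<(string Id, string ContentHash)> stored,
        IEnumerable<string> validHashes)
    {
        var valid = new HashSet<string>(validHashes);
        return stored
            .Where(segment => !valid.Contains(segment.ContentHash))
            .Select(segment => segment.Id)
            .ToList();
    }
}
```

Segments whose hashes are still valid keep their existing embeddings, so only new or changed segments need to be embedded again.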
Architecture
IMultiVectorStore : IVectorStore (unified interface)
├── InMemoryVectorStore (ephemeral, fastest)
├── DuckDBVectorStore (persistent, embedded, HNSW indexes)
└── QdrantVectorStore (persistent, server-based, production)
Used by:
├── DocSummarizer.Core (text embeddings)
├── ImageSummarizer.Core (OCR + CLIP embeddings)
├── AudioSummarizer.Core (voice embeddings via EmbeddingStorageWave)
├── DataSummarizer.Core (data profile embeddings)
└── LucidRAG (all of the above)
Configuration Reference
appsettings.json
{
  "VectorStore": {
    "Backend": "DuckDB",
    "CollectionName": "documents",
    "PersistVectors": true,
    "ReuseExistingEmbeddings": true,
    "DuckDB": {
      "DatabasePath": "./data/vectors.duckdb",
      "EnableVSS": true,
      "EnablePersistence": true,
      "VectorDimension": 384,
      "HNSW": {
        "M": 16,
        "EfConstruction": 200,
        "EfSearch": 100
      }
    },
    "Qdrant": {
      "Host": "localhost",
      "Port": 6334,
      "ApiKey": null,
      "VectorSize": 384,
      "UseHttps": false
    },
    "InMemory": {
      "MaxDocuments": 0,
      "Verbose": false
    }
  }
}
Backend Comparison
InMemory
Pros:
- Fastest (no disk I/O)
- No external dependencies
- Perfect for testing
Cons:
- Data lost on restart
- Limited by RAM
- No HNSW acceleration
Best for: MCP tools, one-shot CLI analysis, unit tests
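Without an HNSW index, search has to compare the query against every stored vector. The sketch below shows what such a brute-force cosine search could look like, with TopK and MinScore behaving as in the Search example. `BruteForceSearchSketch` is a hypothetical name and this is not the library's InMemoryVectorStore code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class BruteForceSearchSketch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero-length vectors.
    public static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Scores every stored vector, drops results below minScore, and keeps the best topK.
    public static List<(string Id, double Score)> Search(
        IReadOnlyDictionary<string, float[]> vectors,
        float[] query,
        int topK,
        double minScore)
    {
        return vectors
            .Select(kv => (Id: kv.Key, Score: Cosine(kv.Value, query)))
            .Where(result => result.Score >= minScore)
            .OrderByDescending(result => result.Score)
            .Take(topK)
            .ToList();
    }
}
```

The cost grows linearly with the number of stored vectors, which is why this approach suits small, short-lived collections.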
DuckDB
Pros:
- File-based persistence
- No external server needed
- HNSW indexes (with VSS extension)
- Graceful fallback if VSS unavailable
Cons:
- Single-process (no distributed deployment)
- HNSW persistence experimental
Best for: Standalone apps, development, embedded scenarios
Qdrant
Pros:
- Production-grade
- Distributed/multi-node
- Advanced filtering
- Multi-vector support
Cons:
- Requires external server
- More complex deployment
Best for: Production, multi-tenant, high-scale
Performance
DuckDB HNSW vs Brute-Force
| Documents | HNSW (VSS) | Brute-Force | Speedup |
|---|---|---|---|
| 1,000 | 2ms | 15ms | 7.5× |
| 10,000 | 5ms | 150ms | 30× |
| 100,000 | 10ms | 1,500ms | 150× |
Startup Time (Reindex on Restart)
| Backend | 1,000 docs | 10,000 docs | Persistence |
|---|---|---|---|
| InMemory | ~5s | ~50s | ❌ |
| DuckDB | ~2s | ~5s | ✅ |
| Qdrant | ~1s | ~2s | ✅ |
Privacy-Preserving Mode
Set StoreText = false in the schema to avoid storing plaintext:
var schema = new VectorStoreSchema
{
    VectorDimension = 384,
    StoreText = false // Only store embeddings, not text
};
Search results will only return IDs and scores, not text content.
Multi-Vector Support
IMultiVectorStore extends IVectorStore with named vector support for multi-modal embeddings. All three backends (InMemory, DuckDB, Qdrant) implement it.
public interface IMultiVectorStore : IVectorStore
{
    Task InitializeMultiVectorAsync(
        string collectionName,
        VectorStoreSchema primarySchema,
        IEnumerable<NamedVectorConfig> namedVectors,
        CancellationToken ct = default);
    Task UpsertMultiVectorDocumentsAsync(
        string collectionName,
        IEnumerable<MultiVectorDocument> documents,
        CancellationToken ct = default);
    Task<List<VectorSearchResult>> SearchByNamedVectorAsync(
        string collectionName,
        string vectorName,
        VectorSearchQuery query,
        CancellationToken ct = default);
}
Usage
var store = serviceProvider.GetRequiredService<IMultiVectorStore>();
// Initialize with primary + named vectors
await store.InitializeMultiVectorAsync("images", primarySchema, new[]
{
    new NamedVectorConfig { Name = "visual", Dimension = 512 },
    new NamedVectorConfig { Name = "color", Dimension = 128 },
});
// Upsert documents with named vectors
var doc = new MultiVectorDocument
{
    Id = "img1", Embedding = primaryEmbedding,
    NamedVectors = { ["visual"] = clipVector, ["color"] = colorVector }
};
await store.UpsertMultiVectorDocumentsAsync("images", [doc]);
// Search by named vector
var results = await store.SearchByNamedVectorAsync("images", "visual", query);
Backend Implementation Details
| Backend | Named Vector Strategy |
|---|---|
| InMemory | Stored in MultiVectorDocument.NamedVectors dictionary, cosine search in-memory |
| DuckDB | Side table {collection}_named_vectors with (document_id, vector_name) PK, cosine search in-memory |
| Qdrant | Native VectorParamsMap with per-vector HNSW indexes, native search via vectorName parameter |
Enables separate embeddings for:
- Text (OCR from image)
- Visual (CLIP embedding)
- Color (color histogram)
- Motion (optical flow)
- Voice (ECAPA-TDNN speaker embedding)
Implementation Status
Phase 1: Foundation ✅ COMPLETE
- Create project structure
- Define IVectorStore interface
- Define VectorDocument, VectorSearchQuery, VectorSearchResult models
- Configuration classes with mode-specific factories
- DI extension methods
Phase 2: DuckDB Implementation ✅ COMPLETE
- Port DuckDBVectorStore from DataSummarizer
- VSS extension integration with graceful fallback
- In-memory cosine similarity fallback when VSS unavailable
- HNSW index configuration and creation
- Content hash-based caching for deduplication
- Summary caching support
- Schema migration for dimension changes
- All IVectorStore methods implemented
Key Features:
- ✅ VSS extension detection and loading
- ✅ Experimental persistence: SET hnsw_enable_experimental_persistence = true
- ✅ Dynamic vector dimensions (384 default, configurable)
- ✅ Vector literal syntax: [val1,val2,...]::FLOAT[dim]
- ✅ HNSW index with configurable M and ef_construction
- ✅ Dual search modes: VSS array_distance() or in-memory cosine
- ✅ Metadata filtering support
- ✅ Privacy-preserving mode (StoreText = false)
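The vector literal syntax listed above can be produced from a float array with a few lines of code. This is a minimal sketch, assuming invariant-culture formatting so that the decimal separator is always a dot. `DuckDbVectorLiteralSketch` and `ToLiteral` are hypothetical names, not the library's API.

```csharp
using System.Globalization;
using System.Linq;

// Hypothetical helper for illustration only.
public static class DuckDbVectorLiteralSketch
{
    // Formats a vector as [v1,v2,...]::FLOAT[dim] using culture-invariant,
    // round-trippable number formatting.
    public static string ToLiteral(float[] vector)
    {
        var values = string.Join(
            ",",
            vector.Select(v => v.ToString("R", CultureInfo.InvariantCulture)));
        return $"[{values}]::FLOAT[{vector.Length}]";
    }
}
```

Invariant formatting matters on machines whose locale uses a comma as the decimal separator, where a default ToString() call would produce a malformed literal. This sketch does not handle NaN or infinite values.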
Phase 3: Remaining Implementations ✅ COMPLETE
- Port InMemoryVectorStore from DocSummarizer.Core
- Port QdrantVectorStore from DocSummarizer.Core
- Add IMultiVectorStore for multi-modal pipelines (image, audio)
InMemoryVectorStore Features:
- ✅ ConcurrentDictionary-based storage for thread-safety
- ✅ Cosine similarity search (brute-force, no index)
- ✅ LRU eviction with configurable MaxDocuments limit
- ✅ Content hash-based caching
- ✅ Summary caching
- ✅ Metadata filtering support
- ✅ Zero external dependencies
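As an illustration of the LRU eviction listed above, the following sketch keeps at most a fixed number of entries and evicts the least recently used one on overflow. It is not the library's implementation: `LruStoreSketch` is a hypothetical name, it is not thread-safe (the real store is described as ConcurrentDictionary-based), and treating a limit of 0 as "unlimited" is an assumption based on the default in the configuration reference.

```csharp
using System.Collections.Generic;

// Hypothetical, single-threaded sketch for illustration only.
public sealed class LruStoreSketch<TValue>
{
    private readonly int _maxDocuments; // 0 is treated as "no limit" (assumption)
    private readonly Dictionary<string, LinkedListNode<(string Key, TValue Value)>> _index = new();
    private readonly LinkedList<(string Key, TValue Value)> _order = new(); // most recently used first

    public LruStoreSketch(int maxDocuments) => _maxDocuments = maxDocuments;

    public int Count => _index.Count;

    // Inserts or replaces an entry, marks it most recently used, and evicts the oldest on overflow.
    public void Upsert(string id, TValue value)
    {
        if (_index.TryGetValue(id, out var existing))
            _order.Remove(existing);

        _index[id] = _order.AddFirst((id, value));

        if (_maxDocuments > 0 && _index.Count > _maxDocuments)
        {
            var oldest = _order.Last!;
            _order.RemoveLast();
            _index.Remove(oldest.Value.Key);
        }
    }

    // Reads an entry and marks it most recently used.
    public bool TryGet(string id, out TValue? value)
    {
        if (_index.TryGetValue(id, out var node))
        {
            _order.Remove(node);
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default;
        return false;
    }
}
```

Pairing a dictionary with a linked list keeps both lookup and eviction at constant time per operation.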
QdrantVectorStore Features:
- ✅ Production-grade persistent storage
- ✅ Qdrant gRPC client integration
- ✅ Batch upserts (100 points per batch)
- ✅ Filter-based search and deletion
- ✅ Content hash-based caching
- ✅ Summary caching with metadata
- ✅ Privacy-preserving mode support
- ✅ Configurable distance metrics (Cosine, Euclidean, Dot Product)
- ✅ HTTPS support
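The batch upsert behavior listed above (100 points per batch) can be sketched with Enumerable.Chunk. The example takes the send operation as a delegate so that it stays independent of any client library. `BatchingSketch`, `UpsertInBatchesAsync`, and the `sendBatch` parameter are hypothetical names, and no Qdrant client calls are shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
public static class BatchingSketch
{
    // Splits points into batches of batchSize and sends them one at a time.
    // Returns the number of batches sent.
    public static async Task<int> UpsertInBatchesAsync<T>(
        IEnumerable<T> points,
        Func<IReadOnlyList<T>, CancellationToken, Task> sendBatch,
        int batchSize = 100,
        CancellationToken ct = default)
    {
        var sent = 0;
        foreach (var batch in points.Chunk(batchSize))
        {
            ct.ThrowIfCancellationRequested();
            await sendBatch(batch, ct);
            sent++;
        }
        return sent;
    }
}
```

Sending batches sequentially keeps request sizes bounded; whether the library also sends batches in parallel is not stated in this README.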
Phase 4: Migration
- Migrate DocSummarizer.Core
- Migrate DataSummarizer
- Migrate Mostlylucid.RAG
- Update LucidRAG web app
License
MIT - Part of the LucidRAG project
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
Dependencies (net10.0):
- DuckDB.NET.Data.Full (>= 1.5.0)
- Microsoft.Extensions.DependencyInjection.Abstractions (>= 10.0.5)
- Microsoft.Extensions.Logging.Abstractions (>= 10.0.5)
- Microsoft.Extensions.Options (>= 10.0.5)
- Mostlylucid.LucidRAG.Summarizer.Core (>= 2.7.5)
- Qdrant.Client (>= 1.17.0)
NuGet packages (2)
Showing the top 2 NuGet packages that depend on Mostlylucid.LucidRAG.Storage.Core:
| Package | Description |
|---|---|
| Mostlylucid.LucidRAG.DocSummarizer | Local-first document summarization library using BERT embeddings, RAG, and optional LLM synthesis. Supports markdown, PDF, DOCX, and URLs. Every claim is grounded with citations. Runs entirely offline with ONNX models, or optionally uses Ollama/Docling for enhanced features. |
| Mostlylucid.LucidRAG.AudioSummarizer | AudioSummarizer - Forensic audio characterization library with speech-to-text and speaker analysis |
| Version | Downloads | Last Updated |
|---|---|---|
| 2.7.5 | 595 | 3/30/2026 |
| 2.7.4 | 484 | 3/30/2026 |
| 2.7.3 | 585 | 3/30/2026 |
| 2.7.2 | 231 | 3/30/2026 |
| 2.7.1 | 551 | 3/29/2026 |
| 2.7.0 | 504 | 3/29/2026 |
| 2.6.0 | 516 | 3/29/2026 |
| 2.5.0-alpha0 | 628 | 2/10/2026 |
| 2.1.0 | 562 | 2/9/2026 |
| 2.1.0-preview2 | 550 | 2/9/2026 |
| 2.0.1-rc2 | 110 | 2/9/2026 |
| 2.0.1-rc0 | 601 | 2/9/2026 |
| 2.0.0-rc5 | 207 | 2/9/2026 |
| 1.1.1 | 132 | 2/4/2026 |
| 1.0.0 | 122 | 2/4/2026 |
| 0.0.0-alpha.0.266 | 60 | 2/9/2026 |
| 0.0.0-alpha.0.265 | 60 | 2/9/2026 |
| 0.0.0-alpha.0.263 | 56 | 2/9/2026 |
| 0.0.0-alpha.0.262 | 56 | 2/8/2026 |
| 0.0.0-alpha.0.258 | 57 | 2/8/2026 |