ReLinker 1.0.111
Installation
.NET CLI:
dotnet add package ReLinker --version 1.0.111
Package Manager:
NuGet\Install-Package ReLinker -Version 1.0.111
PackageReference:
<PackageReference Include="ReLinker" Version="1.0.111" />
Central Package Management:
<PackageVersion Include="ReLinker" Version="1.0.111" />
<PackageReference Include="ReLinker" />
Paket:
paket add ReLinker --version 1.0.111
Script & Interactive:
#r "nuget: ReLinker, 1.0.111"
Cake:
#addin nuget:?package=ReLinker&version=1.0.111
#tool nuget:?package=ReLinker&version=1.0.111
ReLinker: Record Linkage System Documentation
Overview
ReLinker is a C# record linkage (entity resolution) system that identifies and matches duplicate or related records across datasets. It combines blocking, similarity computation, probabilistic matching, and clustering to process large datasets efficiently and surface potential matches.
Key Concepts
Record Linkage
Record linkage is the process of identifying records that refer to the same entity across different data sources or within the same dataset. This is crucial for:
- Data deduplication
- Master data management
- Entity resolution
- Data integration
System Architecture
The ReLinker system follows a multi-stage pipeline:
- Data Loading - Extract records from various database sources
- Blocking - Reduce comparison space by grouping similar records
- Similarity Computation - Calculate similarity scores between record pairs
- Probabilistic Matching - Use statistical models to determine match likelihood
- Clustering - Group matched records into entities
Core Components
1. Record Structure
public class Record
{
    public string Id { get; set; }
    public Dictionary<string, string> Fields { get; set; }
}
The fundamental data structure representing a single record with:
- Id: Unique identifier for the record
- Fields: Key-value pairs containing the record's attributes
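For illustration, constructing a record by hand might look like this (the field names here are placeholders, not a required schema):
var record = new Record
{
    Id = "cust-001",
    Fields = new Dictionary<string, string>
    {
        { "FirstName", "John" },
        { "LastName", "Smith" },
        { "ZipCode", "90210" }
    }
};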
2. Data Loading System
Interface: IDatabaseLoader
Provides a unified interface for loading records from different database systems:
public interface IDatabaseLoader
{
    Task<List<Record>> LoadRecordsAsync();
    IAsyncEnumerable<Record> LoadRecordsInBatchesAsync(int batchSize, int startOffset = 0);
    List<Record> LoadRecords();
    IEnumerable<Record> LoadRecordsInBatches(int batchSize, int startOffset = 0);
}
Supported Database Types
1. GenericDbLoader
- Supports any ADO.NET-compatible database
- Uses DbProviderFactories for database abstraction
- Parameters: provider name, connection string, SQL query
2. DuckDbLoader
- Specialized loader for DuckDB (analytical database)
- Optimized for columnar data processing
- Parameters: connection string, SQL query
3. RavenDbLoader
- Document database loader for RavenDB
- Handles JSON document structures
- Parameters: URL, database name, collection name
Factory Pattern
public static IDatabaseLoader CreateLoader(string type, string connectionStringOrUrl,
    string queryOrCollection, string providerName = null)
Creates appropriate loader based on database type specification.
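For illustration, calls for the first two loader types might look like the sketch below (connection strings and queries are placeholders; the RavenDB variant takes a URL, database name, and collection rather than a SQL query, so it is omitted here):
// ADO.NET route: provider name is required
var sqliteLoader = DatabaseLoaderFactory.CreateLoader(
    "generic",
    "Data Source=contacts.db",
    "SELECT id, name, email FROM contacts",
    "System.Data.SQLite");

// DuckDB route: no provider name needed
var duckLoader = DatabaseLoaderFactory.CreateLoader(
    "duckdb",
    "Data Source=analytics.duckdb",
    "SELECT id, title, brand FROM products");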
3. Blocking System
Blocking is an optimization technique that reduces the number of record comparisons by only comparing records that share certain characteristics.
BlockingRule
public class BlockingRule
{
    public string Name { get; set; }
    public Func<Record, string> RuleFunc { get; set; }
}
Defines a blocking strategy:
- Name: Descriptive name for the rule
- RuleFunc: Function that extracts a blocking key from a record
BlockingHelper
Generates candidate pairs using blocking rules:
- Processes records in batches for memory efficiency
- Uses parallel processing for performance
- Only compares records with matching blocking keys
Example Blocking Rules (the first and last are sketched in code below):
- First three characters of last name
- Soundex code of name
- ZIP code prefix
- Date of birth year
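A minimal sketch of the first and last of these rules; the field names ("LastName", "DateOfBirth") and the date format are assumptions about your schema:
new BlockingRule("LastName3", r =>
{
    var lastName = r.Fields.GetValueOrDefault("LastName", "").ToLower();
    return lastName.Length >= 3 ? lastName.Substring(0, 3) : lastName;
})

new BlockingRule("BirthYear", r =>
{
    // Assumes "yyyy-MM-dd" strings, so the year is the first four characters
    var dob = r.Fields.GetValueOrDefault("DateOfBirth", "");
    return dob.Length >= 4 ? dob.Substring(0, 4) : "";
})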
4. Similarity Functions
The system implements multiple string similarity algorithms:
Levenshtein Similarity (Token-based with IDF weighting)
- Computes edit distance between tokenized strings
- Weights operations by Inverse Document Frequency (IDF)
- Better handles importance of rare vs. common terms
Jaro Similarity (IDF-weighted)
- Focuses on character transpositions and matches
- Weighted by term importance using IDF
- Good for names with common misspellings
TF-IDF Cosine Similarity
- Treats strings as document vectors
- Uses Term Frequency-Inverse Document Frequency weighting
- Effective for longer text fields
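As a rough sketch of the idea (not ReLinker's exact implementation), TF-IDF cosine similarity builds a weighted term vector for each string and measures the angle between them:
// Illustrative only: tokenizes on spaces and weights each token by its IDF
static double TfIdfCosine(string a, string b, Dictionary<string, double> idf)
{
    var va = Vectorize(a, idf);
    var vb = Vectorize(b, idf);
    double dot = 0, normA = 0, normB = 0;
    foreach (var kv in va)
    {
        normA += kv.Value * kv.Value;
        if (vb.TryGetValue(kv.Key, out var w)) dot += kv.Value * w;
    }
    foreach (var w in vb.Values) normB += w * w;
    return (normA == 0 || normB == 0) ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

static Dictionary<string, double> Vectorize(string s, Dictionary<string, double> idf)
{
    var vec = new Dictionary<string, double>();
    foreach (var token in s.ToLower().Split(' ', StringSplitOptions.RemoveEmptyEntries))
    {
        // term frequency accumulates; unknown terms get a default weight of 1.0
        vec[token] = vec.GetValueOrDefault(token) + idf.GetValueOrDefault(token, 1.0);
    }
    return vec;
}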
SimilarityFunction Structure
public class SimilarityFunction
{
    public string FieldName { get; set; }
    public Func<Record, Record, double> Compute { get; set; }
}
5. Probabilistic Matching
Fellegi-Sunter Model
The system implements the classic Fellegi-Sunter probabilistic record linkage model:
- M-probability: Probability that a field agrees given records are a true match
- U-probability: Probability that a field agrees given records are not a match
- Log-likelihood ratio: Combines evidence from multiple fields
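Concretely, if field i agrees, it contributes log(mᵢ/uᵢ) to the score; if it disagrees, it contributes log((1-mᵢ)/(1-uᵢ)). A hypothetical helper showing the arithmetic (ReLinker's MatchScorer does the equivalent internally; the agreement cutoff is an assumption):
static double LogLikelihoodRatio(double[] sims, double[] m, double[] u,
    double agreeCutoff = 0.8)
{
    double score = 0;
    for (int i = 0; i < sims.Length; i++)
    {
        bool agrees = sims[i] >= agreeCutoff; // treat high similarity as agreement
        score += agrees
            ? Math.Log(m[i] / u[i])              // agreement is evidence for a match
            : Math.Log((1 - m[i]) / (1 - u[i])); // disagreement is evidence against
    }
    return score;
}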
MatchScorer
public static List<ScoredPair> Score(
    IEnumerable<(Record, Record)> pairs,
    List<SimilarityFunction> functions,
    double[] mProbs,
    double[] uProbs)
Computes match scores using the Fellegi-Sunter model.
Expectation-Maximization (EM) Algorithm
public static (double[], double[]) EstimateParametersWithEM(
    List<ScoredPair> scoredPairs,
    List<SimilarityFunction> functions,
    int maxIterations = 10)
Automatically estimates M and U probabilities from unlabeled data:
- E-step: Estimate match probabilities for each pair
- M-step: Update M and U parameters based on estimated probabilities
- Iterate: Repeat until convergence
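A simplified sketch of a single iteration, assuming binary agreement per field and a fixed match proportion λ (a full implementation would also re-estimate λ; this is illustrative, not ReLinker's exact code):
// agreements[p][i] = 1 if candidate pair p agrees on field i, else 0
static (double[] m, double[] u) EmIteration(
    int[][] agreements, double[] m, double[] u, double lambda)
{
    int nPairs = agreements.Length, nFields = m.Length;
    var mNew = new double[nFields];
    var uNew = new double[nFields];
    double matchWeight = 0;
    foreach (var g in agreements)
    {
        // E-step: posterior probability that this pair is a true match
        double pm = lambda, pu = 1 - lambda;
        for (int i = 0; i < nFields; i++)
        {
            pm *= g[i] == 1 ? m[i] : 1 - m[i];
            pu *= g[i] == 1 ? u[i] : 1 - u[i];
        }
        double w = pm / (pm + pu);
        matchWeight += w;
        // M-step accumulators: expected agreement counts
        for (int i = 0; i < nFields; i++)
            if (g[i] == 1) { mNew[i] += w; uNew[i] += 1 - w; }
    }
    for (int i = 0; i < nFields; i++)
    {
        mNew[i] /= matchWeight;          // agreement rate among probable matches
        uNew[i] /= nPairs - matchWeight; // agreement rate among probable non-matches
    }
    return (mNew, uNew);
}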
6. Clustering System
UnionFind Data Structure
Implements efficient clustering using Union-Find (Disjoint Set Union):
public class UnionFind
{
    public string Find(string x)                           // Find root of component
    public void Union(string x, string y)                  // Merge two components
    public Dictionary<string, List<string>> GetClusters()  // Get final clusters
}
Features:
- Path compression optimization
- Near-constant O(α(n)) amortized time per operation, where α is the inverse Ackermann function
- Groups records that should be considered the same entity
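A condensed sketch of the technique (path compression included; this mirrors the interface above but is not necessarily ReLinker's exact code):
public class UnionFindSketch
{
    private readonly Dictionary<string, string> parent = new();

    public string Find(string x)
    {
        if (!parent.ContainsKey(x)) parent[x] = x; // lazily register new ids
        if (parent[x] != x)
            parent[x] = Find(parent[x]);           // path compression
        return parent[x];
    }

    public void Union(string x, string y)
    {
        var rootX = Find(x);
        var rootY = Find(y);
        if (rootX != rootY) parent[rootY] = rootX; // merge components
    }

    public Dictionary<string, List<string>> GetClusters()
    {
        var clusters = new Dictionary<string, List<string>>();
        foreach (var key in parent.Keys.ToList()) // ToList: Find mutates parent
        {
            var root = Find(key);
            if (!clusters.TryGetValue(root, out var members))
                clusters[root] = members = new List<string>();
            members.Add(key);
        }
        return clusters;
    }
}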
System Workflow
1. Data Preparation
// Load records from database
var loader = DatabaseLoaderFactory.CreateLoader("duckdb", connectionString, query);
var records = await loader.LoadRecordsAsync();
2. Define Blocking Rules
var blockingRules = new List<BlockingRule>
{
    // Guard against last names shorter than three characters
    new BlockingRule("LastName3", r =>
    {
        var lastName = r.Fields["LastName"];
        return lastName.Length >= 3 ? lastName.Substring(0, 3) : lastName;
    }),
    new BlockingRule("ZipCode", r => r.Fields["ZipCode"])
};
3. Generate Candidate Pairs
var candidatePairs = BlockingHelper.GenerateCandidatePairsInBatches(
    records, blockingRules, batchSize: 1000);
4. Define Similarity Functions
var similarities = new List<SimilarityFunction>
{
    new SimilarityFunction
    {
        FieldName = "Name",
        Compute = (r1, r2) => Similarity.JaroSimilarity(
            r1.Fields["Name"], r2.Fields["Name"], idfDictionary)
    }
};
5. Score Pairs and Estimate Parameters
// Initial scoring with default parameters
var initialScores = MatchScorer.Score(candidatePairs, similarities, mProbs, uProbs);
// Refine parameters using EM
var (refinedM, refinedU) = MatchScorer.EstimateParametersWithEM(
    initialScores, similarities, maxIterations: 10);
// Final scoring with refined parameters
var finalScores = MatchScorer.Score(candidatePairs, similarities, refinedM, refinedU);
6. Apply Threshold and Cluster
var unionFind = new UnionFind();
foreach (var pair in finalScores.Where(p => p.Score > threshold))
{
    unionFind.Union(pair.Record1.Id, pair.Record2.Id);
}
var clusters = unionFind.GetClusters();
Advanced Features
Batch Processing
The system supports both synchronous and asynchronous batch processing:
- Memory-efficient streaming for large datasets
- Configurable batch sizes
- Offset-based pagination support
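For example, the asynchronous streaming interface from IDatabaseLoader can be consumed with await foreach; records are fetched in batches internally but stream out one at a time (the batch size is a tuning knob):
await foreach (var record in loader.LoadRecordsInBatchesAsync(batchSize: 1000))
{
    // process each record without materializing the full dataset in memory
}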
Parallel Processing
- Multi-threaded candidate pair generation
- Parallel similarity computation
- Thread-safe operations with proper locking
Retry Mechanism
public static T RetryWithBackoff<T>(Func<T> operation,
    int maxRetries = 3, int delayMilliseconds = 500)
Implements exponential backoff for database operations.
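A usage sketch, assuming the method is in scope (its declaring class is not shown above):
var records = RetryWithBackoff(
    () => loader.LoadRecords(), // the operation to retry on failure
    maxRetries: 5,
    delayMilliseconds: 250);    // base delay; it grows exponentially between attempts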
Logging
Simple logging system for monitoring:
- Info-level logging for successful operations
- Error-level logging for exceptions
- Timestamped log entries
Performance Considerations
Blocking Effectiveness
- Good blocking rules can reduce comparisons from O(n²) to O(n·k); for example, one million records imply roughly 5×10¹¹ naive pair comparisons, while blocks of about 100 records each cut this to around 5×10⁷
- Multiple blocking rules can improve recall
- Balance blocking precision against recall: tighter keys produce fewer candidate pairs but risk missing true matches
Memory Management
- Batch processing prevents memory overflow
- Streaming interfaces for large datasets
- Efficient data structures (UnionFind)
Computational Complexity
- Similarity computation: O(n·m) per pair (n,m = string lengths)
- EM algorithm: O(iterations · pairs · features)
- UnionFind clustering: O(n·α(n)) amortized
Configuration Examples
Example 1: Customer Deduplication
// Step 1: Set up database loader
var connectionString = "Data Source=customers.db";
var query = "SELECT id, first_name, last_name, phone, email, address FROM customers";
var loader = DatabaseLoaderFactory.CreateLoader("generic", connectionString, query, "System.Data.SQLite");

// Step 2: Load records
var records = await loader.LoadRecordsAsync();

// Step 3: Build IDF dictionary for weighting (you'd calculate this from your corpus)
var idf = new Dictionary<string, double>
{
    {"john", 0.5}, {"smith", 0.3}, {"the", 0.1}, {"inc", 0.7},
    {"street", 0.2}, {"avenue", 0.2} // Add more terms as needed
};

// Step 4: Define blocking rules to reduce comparisons
var blockingRules = new List<BlockingRule>
{
    // Compare records with same first 6 digits of phone
    new BlockingRule("Phone", r =>
    {
        var phone = r.Fields.GetValueOrDefault("phone", "");
        return phone.Length >= 6 ? phone.Substring(0, 6) : phone;
    }),
    // Compare records with same email domain
    new BlockingRule("Email", r =>
    {
        var email = r.Fields.GetValueOrDefault("email", "");
        var parts = email.Split('@');
        return parts.Length > 1 ? parts[1] : "";
    })
};

// Step 5: Define similarity functions for each field
var similarities = new List<SimilarityFunction>
{
    new SimilarityFunction
    {
        FieldName = "FullName",
        Compute = (r1, r2) =>
        {
            var name1 = $"{r1.Fields.GetValueOrDefault("first_name", "")} {r1.Fields.GetValueOrDefault("last_name", "")}";
            var name2 = $"{r2.Fields.GetValueOrDefault("first_name", "")} {r2.Fields.GetValueOrDefault("last_name", "")}";
            return Similarity.JaroSimilarity(name1, name2, idf);
        }
    },
    new SimilarityFunction
    {
        FieldName = "Address",
        Compute = (r1, r2) => Similarity.LevenshteinSimilarity(
            r1.Fields.GetValueOrDefault("address", ""),
            r2.Fields.GetValueOrDefault("address", ""), idf)
    },
    new SimilarityFunction
    {
        FieldName = "Phone",
        Compute = (r1, r2) =>
        {
            var phone1 = r1.Fields.GetValueOrDefault("phone", "");
            var phone2 = r2.Fields.GetValueOrDefault("phone", "");
            return phone1 == phone2 ? 1.0 : 0.0; // Exact match for phones
        }
    }
};

// Step 6: Generate candidate pairs using blocking
var candidatePairs = BlockingHelper.GenerateCandidatePairsInBatches(
    records, blockingRules, batchSize: 1000);

// Step 7: Initial parameters (will be refined by EM)
double[] initialM = {0.9, 0.8, 0.95}; // High probability of agreement for matches
double[] initialU = {0.1, 0.1, 0.05}; // Low probability of agreement for non-matches

// Step 8: Score pairs and refine parameters
var initialScores = MatchScorer.Score(candidatePairs, similarities, initialM, initialU);
var (refinedM, refinedU) = MatchScorer.EstimateParametersWithEM(initialScores, similarities, 10);
var finalScores = MatchScorer.Score(candidatePairs, similarities, refinedM, refinedU);

// Step 9: Apply threshold and cluster matches
double threshold = 2.0; // Log-likelihood ratio threshold (tune based on your needs)
var unionFind = new UnionFind();
foreach (var pair in finalScores.Where(p => p.Score > threshold))
{
    unionFind.Union(pair.Record1.Id, pair.Record2.Id);
}
var clusters = unionFind.GetClusters();

// Step 10: Process results
foreach (var cluster in clusters.Where(c => c.Value.Count > 1))
{
    Console.WriteLine($"Found duplicate group: {string.Join(", ", cluster.Value)}");
}
Example 2: Product Matching Across Catalogs
// Step 1: Load products from two different sources
var loader1 = DatabaseLoaderFactory.CreateLoader("duckdb",
    "Data Source=:memory:",
    "SELECT id, title, brand, category, price FROM catalog1");
var loader2 = DatabaseLoaderFactory.CreateLoader("duckdb",
    "Data Source=:memory:",
    "SELECT id, product_name, manufacturer, type, cost FROM catalog2");
var catalog1 = await loader1.LoadRecordsAsync();
var catalog2 = await loader2.LoadRecordsAsync();
var allProducts = catalog1.Concat(catalog2).ToList();

// Step 2: Build IDF dictionary for product terms
var idf = new Dictionary<string, double>
{
    {"laptop", 0.8}, {"computer", 0.7}, {"apple", 0.9}, {"samsung", 0.9},
    {"black", 0.2}, {"white", 0.2}, {"pro", 0.4}, {"plus", 0.3},
    {"inch", 0.1}, {"gb", 0.1} // Add domain-specific terms
};

// Step 3: Blocking rules for products
var blockingRules = new List<BlockingRule>
{
    // Group by brand/manufacturer
    new BlockingRule("Brand", r =>
    {
        var brand = r.Fields.GetValueOrDefault("brand", "") +
                    r.Fields.GetValueOrDefault("manufacturer", "");
        return brand.ToLower().Trim();
    }),
    // Group by category/type
    new BlockingRule("Category", r =>
    {
        var category = r.Fields.GetValueOrDefault("category", "") +
                       r.Fields.GetValueOrDefault("type", "");
        return category.ToLower().Trim();
    }),
    // Group by price band
    new BlockingRule("PriceRange", r =>
    {
        var priceStr = r.Fields.GetValueOrDefault("price", "") +
                       r.Fields.GetValueOrDefault("cost", "");
        if (double.TryParse(priceStr, out double price))
        {
            return ((int)(price / 50) * 50).ToString(); // Bucket into $50 bands
        }
        return "unknown";
    })
};

// Step 4: Product similarity functions
var similarities = new List<SimilarityFunction>
{
    new SimilarityFunction
    {
        FieldName = "ProductTitle",
        Compute = (r1, r2) =>
        {
            var title1 = r1.Fields.GetValueOrDefault("title", "") +
                         r1.Fields.GetValueOrDefault("product_name", "");
            var title2 = r2.Fields.GetValueOrDefault("title", "") +
                         r2.Fields.GetValueOrDefault("product_name", "");
            return Similarity.TfIdfSimilarity(title1, title2, idf);
        }
    },
    new SimilarityFunction
    {
        FieldName = "Brand",
        Compute = (r1, r2) =>
        {
            var brand1 = (r1.Fields.GetValueOrDefault("brand", "") +
                          r1.Fields.GetValueOrDefault("manufacturer", "")).ToLower();
            var brand2 = (r2.Fields.GetValueOrDefault("brand", "") +
                          r2.Fields.GetValueOrDefault("manufacturer", "")).ToLower();
            return brand1 == brand2 ? 1.0 : 0.0;
        }
    },
    new SimilarityFunction
    {
        FieldName = "Price",
        Compute = (r1, r2) =>
        {
            var price1Str = r1.Fields.GetValueOrDefault("price", "") +
                            r1.Fields.GetValueOrDefault("cost", "");
            var price2Str = r2.Fields.GetValueOrDefault("price", "") +
                            r2.Fields.GetValueOrDefault("cost", "");
            if (double.TryParse(price1Str, out double price1) &&
                double.TryParse(price2Str, out double price2))
            {
                double priceDiff = Math.Abs(price1 - price2);
                double avgPrice = (price1 + price2) / 2;
                if (avgPrice == 0) return priceDiff == 0 ? 1.0 : 0.0; // Guard against division by zero
                return Math.Max(0, 1 - (priceDiff / avgPrice)); // Similarity decreases with price difference
            }
            return 0.0;
        }
    }
};

// Continue with scoring and clustering as in Example 1...
Example 3: Complete End-to-End Pipeline
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using ReLinker;

public class RecordLinkagePipeline
{
    public async Task<Dictionary<string, List<string>>> RunPipeline(
        string dbType,
        string connectionString,
        string query,
        Dictionary<string, double> idfDictionary,
        double threshold = 2.0)
    {
        try
        {
            // Step 1: Load data
            SimpleLogger.Info("Loading records from database...");
            var loader = DatabaseLoaderFactory.CreateLoader(dbType, connectionString, query);
            var records = await loader.LoadRecordsAsync();
            SimpleLogger.Info($"Loaded {records.Count} records");

            // Step 2: Define blocking and similarity based on your data structure
            var (blockingRules, similarities) = ConfigureForYourDomain(idfDictionary);

            // Step 3: Generate pairs
            SimpleLogger.Info("Generating candidate pairs...");
            var candidatePairs = BlockingHelper.GenerateCandidatePairsInBatches(
                records, blockingRules, batchSize: 5000).ToList();
            SimpleLogger.Info($"Generated {candidatePairs.Count} candidate pairs");

            // Step 4: Score and cluster
            SimpleLogger.Info("Scoring pairs...");
            double[] mProbs = Enumerable.Repeat(0.9, similarities.Count).ToArray();
            double[] uProbs = Enumerable.Repeat(0.1, similarities.Count).ToArray();
            var scores = MatchScorer.Score(candidatePairs, similarities, mProbs, uProbs);

            SimpleLogger.Info("Refining parameters with EM...");
            var (refinedM, refinedU) = MatchScorer.EstimateParametersWithEM(scores, similarities);
            var finalScores = MatchScorer.Score(candidatePairs, similarities, refinedM, refinedU);

            // Step 5: Cluster
            SimpleLogger.Info("Clustering matches...");
            var unionFind = new UnionFind();
            var matchCount = 0;
            foreach (var pair in finalScores.Where(p => p.Score > threshold))
            {
                unionFind.Union(pair.Record1.Id, pair.Record2.Id);
                matchCount++;
            }
            var clusters = unionFind.GetClusters();
            var duplicateGroups = clusters.Count(c => c.Value.Count > 1);
            SimpleLogger.Info($"Found {matchCount} matches forming {duplicateGroups} duplicate groups");
            return clusters;
        }
        catch (Exception ex)
        {
            SimpleLogger.Error($"Pipeline failed: {ex.Message}");
            throw;
        }
    }

    private (List<BlockingRule>, List<SimilarityFunction>) ConfigureForYourDomain(
        Dictionary<string, double> idf)
    {
        // Customize these based on your specific data fields!
        var blocking = new List<BlockingRule>
        {
            new BlockingRule("FirstChar", r =>
            {
                var name = r.Fields.GetValueOrDefault("name", "") +
                           r.Fields.GetValueOrDefault("company_name", "");
                return string.IsNullOrEmpty(name) ? "" : name.Substring(0, 1).ToUpper();
            })
        };

        var similarities = new List<SimilarityFunction>
        {
            new SimilarityFunction
            {
                FieldName = "Name",
                Compute = (r1, r2) =>
                {
                    var name1 = r1.Fields.GetValueOrDefault("name", "") +
                                r1.Fields.GetValueOrDefault("company_name", "");
                    var name2 = r2.Fields.GetValueOrDefault("name", "") +
                                r2.Fields.GetValueOrDefault("company_name", "");
                    return Similarity.JaroSimilarity(name1, name2, idf);
                }
            }
        };
        return (blocking, similarities);
    }
}

// Usage:
var pipeline = new RecordLinkagePipeline();
var idf = BuildIdfDictionary(); // You need to implement this based on your data
var results = await pipeline.RunPipeline("duckdb", "Data Source=mydata.db",
    "SELECT * FROM records", idf, threshold: 1.5);
Getting Started Guide
Quick Start Checklist
Before you begin, you'll need:
Your database connection details:
- Connection string for your database
- SQL query to retrieve records (must have an ID column as first column)
- Provider name for generic databases (e.g., "System.Data.SqlClient", "System.Data.SQLite")
Understanding of your data:
- What fields contain the information you want to match on?
- What are the actual column names in your database?
- What kind of data quality issues do you expect?
Basic parameters to tune:
- Batch size: Start with 1000-5000 records per batch
- Match threshold: Start with 1.5-2.0, adjust based on results
- EM iterations: 10 is usually sufficient
Step-by-Step Setup
Step 1: Prepare Your IDF Dictionary
The IDF (Inverse Document Frequency) dictionary weights the importance of terms. Here's how to build one:
public Dictionary<string, double> BuildIdfDictionary(List<Record> records)
{
    var termDocumentCounts = new Dictionary<string, int>();
    var totalDocuments = records.Count;

    // Count how many records contain each term
    foreach (var record in records)
    {
        var allText = string.Join(" ", record.Fields.Values).ToLower();
        var terms = allText.Split(new[] { ' ', ',', '.', '!', '?' },
            StringSplitOptions.RemoveEmptyEntries);
        var uniqueTerms = new HashSet<string>(terms);
        foreach (var term in uniqueTerms)
        {
            termDocumentCounts[term] = termDocumentCounts.GetValueOrDefault(term, 0) + 1;
        }
    }

    // Calculate IDF: log(total_docs / docs_containing_term)
    var idf = new Dictionary<string, double>();
    foreach (var kvp in termDocumentCounts)
    {
        idf[kvp.Key] = Math.Log((double)totalDocuments / kvp.Value);
    }
    return idf;
}
Step 2: Choose Your Database Loader Parameters
For SQL Server:
var loader = DatabaseLoaderFactory.CreateLoader(
    "generic",                                                 // Type
    "Server=localhost;Database=MyDB;Trusted_Connection=true;", // Connection
    "SELECT id, first_name, last_name, email FROM customers",  // Query
    "System.Data.SqlClient"                                    // Provider
);
For SQLite:
var loader = DatabaseLoaderFactory.CreateLoader(
    "generic",
    "Data Source=C:\\path\\to\\database.db",
    "SELECT rowid, name, address, phone FROM contacts",
    "System.Data.SQLite"
);
For DuckDB:
var loader = DatabaseLoaderFactory.CreateLoader(
    "duckdb",
    "Data Source=:memory:", // or path to a .duckdb file
    "SELECT id, product_name, brand FROM products",
    null // No provider needed for DuckDB
);
Step 3: Understand Blocking Strategy
Blocking is crucial for performance. Choose blocking keys that:
- Group likely matches together
- Distribute records somewhat evenly
- Are not too restrictive (don't miss true matches)
Common blocking strategies:
// For names - first few characters
new BlockingRule("NamePrefix", r =>
{
    var name = r.Fields.GetValueOrDefault("name", "").ToLower().Trim();
    return name.Length >= 3 ? name.Substring(0, 3) : name;
})

// For addresses - ZIP code
new BlockingRule("ZipCode", r =>
{
    var zip = r.Fields.GetValueOrDefault("zip_code", "");
    return zip.Length >= 5 ? zip.Substring(0, 5) : zip;
})

// For emails - domain
new BlockingRule("EmailDomain", r =>
{
    var email = r.Fields.GetValueOrDefault("email", "");
    var parts = email.Split('@');
    return parts.Length > 1 ? parts[1].ToLower() : "";
})

// For phone numbers - area code + first 3 digits
new BlockingRule("PhonePrefix", r =>
{
    var phone = r.Fields.GetValueOrDefault("phone", "").Replace("-", "").Replace("(", "").Replace(")", "");
    return phone.Length >= 6 ? phone.Substring(0, 6) : phone;
})
Best Practices
1. Blocking Strategy
- Use multiple complementary blocking rules
- Choose blocking keys with good discriminative power
- Monitor blocking effectiveness (pairs generated vs. true matches)
2. Similarity Functions
- Select appropriate similarity functions for each field type
- Use IDF weighting for better term importance handling
- Normalize similarity scores to [0,1] range
3. Parameter Tuning
- Use EM algorithm for automatic parameter estimation
- Validate parameters on labeled data when available
- Monitor convergence of EM algorithm
4. Threshold Selection
- Use validation data to select optimal threshold
- Consider precision/recall trade-offs
- Apply different thresholds for different use cases
5. Performance Optimization
- Profile blocking effectiveness
- Monitor memory usage with large datasets
- Use appropriate batch sizes for your system
Limitations and Considerations
Current Limitations
- No support for phonetic similarity (Soundex, Metaphone)
- Limited to string similarity functions
- No machine learning-based similarity
- Basic logging system
Scalability Considerations
- Memory usage grows with dataset size
- Blocking effectiveness crucial for large datasets
- Consider distributed processing for very large datasets
Data Quality Impact
- Poor data quality reduces matching accuracy
- Standardization preprocessing recommended
- Handle missing values appropriately
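As a minimal example of such preprocessing (illustrative only), you might lowercase, trim, and strip punctuation from every field before linkage:
static string Normalize(string value) =>
    new string(value.ToLowerInvariant().Trim()
        .Where(c => !char.IsPunctuation(c)).ToArray());

foreach (var record in records)
    foreach (var key in record.Fields.Keys.ToList()) // ToList: we mutate the dictionary
        record.Fields[key] = Normalize(record.Fields[key] ?? "");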
This documentation covers the ReLinker system, its components, and common usage patterns. The library offers a solid foundation for record linkage tasks, with good performance characteristics and clear extension points.
Compatibility
.NET: net8.0 is compatible. net9.0 and net10.0 were computed, along with platform-specific targets (android, browser, ios, maccatalyst, macos, tvos, windows) for net8.0 through net10.0.
Dependencies (net8.0)
- DuckDB.NET.Data (>= 1.3.0)
- RavenDB.Client (>= 7.0.3)