Faborite.Core 0.1.1


Faborite 🎯


Sync Microsoft Fabric lakehouse data locally for faster development.

Faborite lets you pull sample data from your Fabric Lakehouses to your local machine, so you can develop and test notebooks/scripts without waiting for cloud compute.

Why Faborite?

When working with Microsoft Fabric, you often need to:

  • 🐢 Wait for cloud compute to spin up just to test a simple query
  • 💸 Pay for compute time during development iterations
  • 🔄 Context-switch between local and cloud environments

Faborite solves this by bringing a representative sample of your data locally, enabling:

  • ⚡ Instant iteration - No cold start, no waiting
  • 💰 Cost savings - Develop locally, deploy to cloud
  • 🧪 Better testing - Test with real data patterns locally

Features

  • 🎲 Smart Sampling - Random, recent, stratified, or custom SQL sampling
  • 📦 Multiple Formats - Export to Parquet, CSV, JSON, or DuckDB
  • ⚡ Fast - Parallel downloads, DuckDB-powered sampling
  • 🔧 Configurable - Sensible defaults, fully customizable per-table
  • 🔐 Secure - Uses Azure authentication (CLI, Service Principal, Managed Identity)
  • 🚀 Single Executable - Built with .NET 10 for fast startup and easy deployment
  • 🛡️ Production Ready - Comprehensive validation, logging, and retry policies

Installation

Download Binary

Download the latest release from GitHub Releases:

| Platform | Download |
|----------|----------|
| Windows (x64) | faborite-win-x64.zip |
| Linux (x64) | faborite-linux-x64.tar.gz |
| macOS (x64) | faborite-osx-x64.tar.gz |
| macOS (ARM64) | faborite-osx-arm64.tar.gz |

As .NET Global Tool

dotnet tool install -g faborite

From Source

git clone https://github.com/mjtpena/faborite.git
cd faborite
dotnet build

Quick Start

1. Login to Azure

az login

2. Initialize Configuration

faborite init

Edit faborite.json with your workspace and lakehouse IDs.

3. Sync Data

# Sync all tables with defaults (10,000 random rows each)
faborite sync --workspace <workspace-id> --lakehouse <lakehouse-id>

# Or use the config file
faborite sync

4. Use Your Data

import duckdb
df = duckdb.read_parquet('./local_lakehouse/customers/customers.parquet').df()

CLI Reference

sync

Sync data from OneLake to your local machine.

faborite sync [options]
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| --workspace | -w | Workspace ID (GUID) | From config |
| --lakehouse | -l | Lakehouse ID (GUID) | From config |
| --config | -c | Config file path | faborite.json |
| --rows | -n | Number of rows to sample | 10000 |
| --strategy | -s | Sampling strategy | random |
| --format | -f | Output format | parquet |
| --output | -o | Output directory | ./local_lakehouse |
| --table | -t | Tables to sync (repeatable) | All tables |
| --skip | | Tables to skip (repeatable) | None |
| --parallel | -p | Max parallel downloads | 4 |
| --no-schema | | Skip schema export | false |

Examples:

# Sync specific tables
faborite sync -w <id> -l <id> --table customers --table orders

# Custom sampling
faborite sync -w <id> -l <id> --rows 5000 --strategy recent

# Export as DuckDB database
faborite sync -w <id> -l <id> --format duckdb

# Export as CSV
faborite sync -w <id> -l <id> --format csv

list-tables (alias: ls)

List available tables in a lakehouse.

faborite list-tables -w <workspace-id> -l <lakehouse-id>

init

Generate a sample configuration file.

faborite init [options]
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| --output | -o | Output file path | faborite.json |
| --force | -f | Overwrite existing file | false |

status

Show status of locally synced data.

faborite status [options]
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| --path | -p | Local data directory | ./local_lakehouse |

Sampling Strategies

| Strategy | Description | Use Case |
|----------|-------------|----------|
| random | Random sample using DuckDB's USING SAMPLE | General development |
| recent | Most recent rows by date column | Time-series data |
| head | First N rows | Quick testing |
| tail | Last N rows | Recent additions |
| stratified | Proportional sample by column | Categorical data |
| query | Custom SQL query | Complex filters |
| full | All rows (no sampling) | Small lookup tables |

Configuration

Config File

Create a faborite.json file in your project root:

{
  "workspaceId": "your-workspace-guid",
  "lakehouseId": "your-lakehouse-guid",
  "sample": {
    "rows": 10000,
    "strategy": "random"
  },
  "format": {
    "output": "parquet",
    "compression": "snappy"
  },
  "sync": {
    "localPath": "./local_lakehouse",
    "parallelTables": 4,
    "includeSchema": true
  },
  "auth": {
    "method": "cli"
  },
  "tableOverrides": {
    "large_table": {
      "rows": 1000,
      "strategy": "recent",
      "dateColumn": "created_at"
    },
    "lookup_table": {
      "strategy": "full"
    }
  }
}
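Per-table overrides layer on top of the global sample settings. A hypothetical resolver sketch; the merge rule (override keys win, everything else falls back to the global defaults) is an assumption about the tool's behavior, not documented semantics:

```python
def resolve_table_settings(config: dict, table: str) -> dict:
    """Effective sampling settings for one table: start from the global
    "sample" block, then overlay any "tableOverrides" entry (override wins).
    Assumed merge rule, for illustration only."""
    settings = dict(config.get("sample", {}))
    settings.update(config.get("tableOverrides", {}).get(table, {}))
    return settings

config = {
    "sample": {"rows": 10000, "strategy": "random"},
    "tableOverrides": {
        "large_table": {"rows": 1000, "strategy": "recent", "dateColumn": "created_at"},
        "lookup_table": {"strategy": "full"},
    },
}

large = resolve_table_settings(config, "large_table")
lookup = resolve_table_settings(config, "lookup_table")
print(large, lookup)
```

Under this rule, large_table samples 1,000 recent rows while lookup_table keeps the global rows value but switches to a full (unsampled) export.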

Environment Variables

All configuration can be overridden with environment variables:

| Variable | Description |
|----------|-------------|
| FABORITE_WORKSPACE_ID | Workspace ID |
| FABORITE_LAKEHOUSE_ID | Lakehouse ID |
| FABORITE_OUTPUT_PATH | Output directory |
| FABORITE_SAMPLE_ROWS | Default sample rows |
| FABORITE_FORMAT | Output format |
| AZURE_TENANT_ID | Azure tenant for service principal auth |
| AZURE_CLIENT_ID | Azure client ID for service principal auth |
| AZURE_CLIENT_SECRET | Azure client secret for service principal auth |
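The override mechanics can be sketched as a simple merge. The variable names come from the table above; the "environment wins over config file" precedence is an assumption for illustration:

```python
import os

# Config key -> environment variable that overrides it (names from the table above).
ENV_OVERRIDES = {
    "workspaceId": "FABORITE_WORKSPACE_ID",
    "lakehouseId": "FABORITE_LAKEHOUSE_ID",
    "localPath": "FABORITE_OUTPUT_PATH",
}

def apply_env_overrides(config: dict) -> dict:
    """Return a copy of config with any set environment variables layered on top.
    Assumed precedence: environment beats config file."""
    merged = dict(config)
    for key, var in ENV_OVERRIDES.items():
        if var in os.environ:
            merged[key] = os.environ[var]
    return merged

os.environ["FABORITE_WORKSPACE_ID"] = "11111111-1111-1111-1111-111111111111"
merged = apply_env_overrides({"workspaceId": "from-file", "lakehouseId": "from-file"})
print(merged)
```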

Output Structure

./local_lakehouse/
├── customers/
│   ├── customers.parquet
│   └── _schema.json
├── orders/
│   ├── orders.parquet
│   └── _schema.json
├── products/
│   ├── products.parquet
│   └── _schema.json
└── lakehouse.duckdb     # When using --format duckdb

Authentication

Faborite uses Azure Identity for authentication:

| Method | Description | Config |
|--------|-------------|--------|
| Azure CLI (default) | Uses az login credentials | "method": "cli" |
| Service Principal | App registration with secret | "method": "serviceprincipal" |
| Managed Identity | For Azure-hosted environments | "method": "managedidentity" |
| Default | DefaultAzureCredential chain | "method": "default" |

Service Principal Setup

# Set environment variables
export AZURE_TENANT_ID="your-tenant-id"
export AZURE_CLIENT_ID="your-client-id"
export AZURE_CLIENT_SECRET="your-client-secret"

# Update config
{
  "auth": {
    "method": "serviceprincipal",
    "tenantId": "your-tenant-id",
    "clientId": "your-client-id"
  }
}

Using with Notebooks

After syncing, load data in your local notebooks:

Python with DuckDB

import duckdb

# If exported as DuckDB
conn = duckdb.connect('./local_lakehouse/lakehouse.duckdb')
df = conn.execute("SELECT * FROM customers").df()

# If exported as Parquet
df = duckdb.read_parquet('./local_lakehouse/customers/customers.parquet').df()

Python with Pandas

import pandas as pd

df = pd.read_parquet('./local_lakehouse/customers/customers.parquet')

Python with Polars

import polars as pl

df = pl.read_parquet('./local_lakehouse/customers/customers.parquet')

.NET

using DuckDB.NET.Data;

using var connection = new DuckDBConnection("Data Source=./local_lakehouse/lakehouse.duckdb");
connection.Open();
// Query your data...

Requirements

  • Runtime: .NET 10.0 (or use self-contained builds)
  • Azure: Account with access to Microsoft Fabric
  • Permissions: Read access to the target Lakehouse via OneLake

Finding Your IDs

  1. Workspace ID: Go to your Fabric workspace → Settings → Copy the Workspace ID
  2. Lakehouse ID: Open your Lakehouse → Settings → Copy the Lakehouse ID

Development

Prerequisites

  • .NET 10 SDK
  • Git

Building

# Clone the repository
git clone https://github.com/mjtpena/faborite.git
cd faborite

# Build
dotnet build

# Run tests
dotnet test

# Run with coverage
dotnet test --collect:"XPlat Code Coverage"

# Run the CLI
dotnet run --project src/Faborite.Cli -- sync -w <workspace-id> -l <lakehouse-id>

Publishing

# Publish self-contained for Windows
dotnet publish src/Faborite.Cli -c Release -r win-x64 --self-contained -o publish/win-x64

# Publish self-contained for Linux
dotnet publish src/Faborite.Cli -c Release -r linux-x64 --self-contained -o publish/linux-x64

# Publish self-contained for macOS
dotnet publish src/Faborite.Cli -c Release -r osx-x64 --self-contained -o publish/osx-x64
dotnet publish src/Faborite.Cli -c Release -r osx-arm64 --self-contained -o publish/osx-arm64

Project Structure

faborite/
├── src/
│   ├── Faborite.Core/           # Core library
│   │   ├── Configuration/       # Config loading & validation
│   │   ├── OneLake/             # OneLake ADLS Gen2 client
│   │   ├── Sampling/            # DuckDB-powered sampling
│   │   ├── Export/              # Format exporters
│   │   ├── Logging/             # Logging infrastructure
│   │   ├── Resilience/          # Retry policies (Polly)
│   │   └── FaboriteService.cs   # Main orchestrator
│   └── Faborite.Cli/            # CLI application
│       ├── Commands/            # CLI commands
│       └── Program.cs           # Entry point
├── tests/
│   ├── Faborite.Core.Tests/     # Core library tests
│   └── Faborite.Cli.Tests/      # CLI tests
├── .github/
│   └── workflows/
│       └── ci.yml               # CI/CD pipeline
└── Faborite.sln

Roadmap

  • Delta Lake time travel support
  • Incremental sync (only changed data)
  • Schema drift detection
  • VS Code extension
  • GitHub Action for CI/CD pipelines
  • Support for Fabric Warehouses

Contributing

Contributions are welcome! Please read our Contributing Guide for details on our code of conduct and the process for submitting pull requests.

Quick Contribution Steps

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Security

Please see our Security Policy for reporting vulnerabilities.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments


Made with ❤️ by Michael John Peña
