Faborite.Core 0.1.1
Install the Faborite.Core package with your preferred tooling:
- .NET CLI: dotnet add package Faborite.Core --version 0.1.1
- Package Manager Console: NuGet\Install-Package Faborite.Core -Version 0.1.1
- PackageReference: <PackageReference Include="Faborite.Core" Version="0.1.1" />
- Central package management: <PackageVersion Include="Faborite.Core" Version="0.1.1" /> together with <PackageReference Include="Faborite.Core" />
- Paket: paket add Faborite.Core --version 0.1.1
- F# scripting: #r "nuget: Faborite.Core, 0.1.1"
- File-based apps: #:package Faborite.Core@0.1.1
- Cake addin: #addin nuget:?package=Faborite.Core&version=0.1.1
- Cake tool: #tool nuget:?package=Faborite.Core&version=0.1.1
Faborite
Sync Microsoft Fabric lakehouse data locally for faster development.
Faborite lets you pull sample data from your Fabric Lakehouses to your local machine, so you can develop and test notebooks/scripts without waiting for cloud compute.
Why Faborite?
When working with Microsoft Fabric, you often need to:
- Wait for cloud compute to spin up just to test a simple query
- Pay for compute time during development iterations
- Context-switch between local and cloud environments
Faborite solves this by bringing a representative sample of your data locally, enabling:
- Instant iteration - No cold start, no waiting
- Cost savings - Develop locally, deploy to cloud
- Better testing - Test with real data patterns locally
Features
- Smart Sampling - Random, recent, stratified, or custom SQL sampling
- Multiple Formats - Export to Parquet, CSV, JSON, or DuckDB
- Fast - Parallel downloads, DuckDB-powered sampling
- Configurable - Sensible defaults, fully customizable per-table
- Secure - Uses Azure authentication (CLI, Service Principal, Managed Identity)
- Single Executable - Built with .NET 10 for fast startup and easy deployment
- Production Ready - Comprehensive validation, logging, and retry policies
Installation
Download Binary
Download the latest release from GitHub Releases:
| Platform | Download |
|---|---|
| Windows (x64) | faborite-win-x64.zip |
| Linux (x64) | faborite-linux-x64.tar.gz |
| macOS (x64) | faborite-osx-x64.tar.gz |
| macOS (ARM64) | faborite-osx-arm64.tar.gz |
As a .NET Global Tool
dotnet tool install -g faborite
From Source
git clone https://github.com/mjtpena/faborite.git
cd faborite
dotnet build
Quick Start
1. Login to Azure
az login
2. Initialize Configuration
faborite init
Edit faborite.json with your workspace and lakehouse IDs.
3. Sync Data
# Sync all tables with defaults (10,000 random rows each)
faborite sync --workspace <workspace-id> --lakehouse <lakehouse-id>
# Or use the config file
faborite sync
4. Use Your Data
import duckdb
df = duckdb.read_parquet('./local_lakehouse/customers/customers.parquet').df()
CLI Reference
sync
Sync data from OneLake to your local machine.
faborite sync [options]
| Option | Short | Description | Default |
|---|---|---|---|
| --workspace | -w | Workspace ID (GUID) | From config |
| --lakehouse | -l | Lakehouse ID (GUID) | From config |
| --config | -c | Config file path | faborite.json |
| --rows | -n | Number of rows to sample | 10000 |
| --strategy | -s | Sampling strategy | random |
| --format | -f | Output format | parquet |
| --output | -o | Output directory | ./local_lakehouse |
| --table | -t | Tables to sync (repeatable) | All tables |
| --skip | | Tables to skip (repeatable) | None |
| --parallel | -p | Max parallel downloads | 4 |
| --no-schema | | Skip schema export | false |
Examples:
# Sync specific tables
faborite sync -w <id> -l <id> --table customers --table orders
# Custom sampling
faborite sync -w <id> -l <id> --rows 5000 --strategy recent
# Export as DuckDB database
faborite sync -w <id> -l <id> --format duckdb
# Export as CSV
faborite sync -w <id> -l <id> --format csv
list-tables (alias: ls)
List available tables in a lakehouse.
faborite list-tables -w <workspace-id> -l <lakehouse-id>
init
Generate a sample configuration file.
faborite init [options]
| Option | Short | Description | Default |
|---|---|---|---|
| --output | -o | Output file path | faborite.json |
| --force | -f | Overwrite existing file | false |
status
Show status of locally synced data.
faborite status [options]
| Option | Short | Description | Default |
|---|---|---|---|
| --path | -p | Local data directory | ./local_lakehouse |
Sampling Strategies
| Strategy | Description | Use Case |
|---|---|---|
| random | Random sample using DuckDB's USING SAMPLE | General development |
| recent | Most recent rows by date column | Time-series data |
| head | First N rows | Quick testing |
| tail | Last N rows | Recent additions |
| stratified | Proportional sample by column | Categorical data |
| query | Custom SQL query | Complex filters |
| full | All rows (no sampling) | Small lookup tables |
Configuration
Config File
Create a faborite.json file in your project root:
{
"workspaceId": "your-workspace-guid",
"lakehouseId": "your-lakehouse-guid",
"sample": {
"rows": 10000,
"strategy": "random"
},
"format": {
"output": "parquet",
"compression": "snappy"
},
"sync": {
"localPath": "./local_lakehouse",
"parallelTables": 4,
"includeSchema": true
},
"auth": {
"method": "cli"
},
"tableOverrides": {
"large_table": {
"rows": 1000,
"strategy": "recent",
"dateColumn": "created_at"
},
"lookup_table": {
"strategy": "full"
}
}
}
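The tableOverrides block layers per-table settings on top of the global sample defaults. The sketch below shows one way to read those semantics from Python; it assumes overrides simply take precedence key-by-key over the defaults, which is an interpretation of the config above rather than Faborite's actual resolution code.
import json

# Illustrative only: compute the effective settings for a table by layering its
# tableOverrides entry over the global "sample" defaults (assumed merge semantics).
with open("faborite.json") as f:
    config = json.load(f)

def effective_settings(table_name):
    settings = dict(config.get("sample", {}))                        # global defaults
    settings.update(config.get("tableOverrides", {}).get(table_name, {}))  # per-table overrides
    return settings

print(effective_settings("large_table"))  # {'rows': 1000, 'strategy': 'recent', 'dateColumn': 'created_at'}
print(effective_settings("customers"))    # falls back to the defaults: {'rows': 10000, 'strategy': 'random'}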
Environment Variables
All configuration can be overridden with environment variables:
| Variable | Description |
|---|---|
| FABORITE_WORKSPACE_ID | Workspace ID |
| FABORITE_LAKEHOUSE_ID | Lakehouse ID |
| FABORITE_OUTPUT_PATH | Output directory |
| FABORITE_SAMPLE_ROWS | Default sample rows |
| FABORITE_FORMAT | Output format |
| AZURE_TENANT_ID | Azure tenant for service principal auth |
| AZURE_CLIENT_ID | Azure client ID for service principal auth |
| AZURE_CLIENT_SECRET | Azure client secret for service principal auth |
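A minimal sketch of that precedence in Python, assuming environment variables win over values in faborite.json as described above (the helper below is illustrative and not part of Faborite):
import os, json

# Illustrative only: an env var, when set, overrides the corresponding config value.
with open("faborite.json") as f:
    file_config = json.load(f)

workspace_id = os.environ.get("FABORITE_WORKSPACE_ID", file_config.get("workspaceId"))
lakehouse_id = os.environ.get("FABORITE_LAKEHOUSE_ID", file_config.get("lakehouseId"))
print(workspace_id, lakehouse_id)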
Output Structure
./local_lakehouse/
├── customers/
│   ├── customers.parquet
│   └── _schema.json
├── orders/
│   ├── orders.parquet
│   └── _schema.json
├── products/
│   ├── products.parquet
│   └── _schema.json
└── lakehouse.duckdb        # When using --format duckdb
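A quick way to sanity-check what was synced is to walk this directory from Python. The sketch below assumes the layout shown above and treats _schema.json as plain JSON; adjust table names to match your lakehouse.
from pathlib import Path
import json
import pandas as pd

# Load every synced table into a dict of DataFrames (one subfolder per table).
root = Path("./local_lakehouse")
tables = {}
for table_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    parquet_files = list(table_dir.glob("*.parquet"))
    if not parquet_files:
        continue
    tables[table_dir.name] = pd.read_parquet(parquet_files[0])
    schema_file = table_dir / "_schema.json"
    if schema_file.exists():
        print(table_dir.name, json.loads(schema_file.read_text()))

print({name: len(df) for name, df in tables.items()})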
Authentication
Faborite uses Azure Identity for authentication:
| Method | Description | Config |
|---|---|---|
| Azure CLI (default) | Uses az login credentials | "method": "cli" |
| Service Principal | App registration with secret | "method": "serviceprincipal" |
| Managed Identity | For Azure-hosted environments | "method": "managedidentity" |
| Default | Azure DefaultAzureCredential chain | "method": "default" |
Service Principal Setup
# Set environment variables
export AZURE_TENANT_ID="your-tenant-id"
export AZURE_CLIENT_ID="your-client-id"
export AZURE_CLIENT_SECRET="your-client-secret"
# Update config
{
"auth": {
"method": "serviceprincipal",
"tenantId": "your-tenant-id",
"clientId": "your-client-id"
}
}
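If you want to confirm that your service principal (or az login session) can actually reach the lakehouse before running a sync, a quick check from Python is sketched below. It assumes the azure-identity and azure-storage-file-datalake packages, the standard OneLake ADLS Gen2 endpoint, and a GUID-based OneLake path layout (workspace ID as the filesystem, lakehouse ID as the top-level folder); Faborite itself authenticates through the Azure SDK for .NET, so this is only a credential sanity check.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# DefaultAzureCredential picks up AZURE_TENANT_ID / AZURE_CLIENT_ID / AZURE_CLIENT_SECRET
# when set, and otherwise falls back to az login and other credential sources.
credential = DefaultAzureCredential()
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=credential,
)

# Placeholders: replace with your workspace and lakehouse GUIDs.
fs = service.get_file_system_client("<workspace-id>")
for path in fs.get_paths(path="<lakehouse-id>/Tables", recursive=False):
    print(path.name)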
Using with Notebooks
After syncing, load data in your local notebooks:
Python with DuckDB
import duckdb
# If exported as DuckDB
conn = duckdb.connect('./local_lakehouse/lakehouse.duckdb')
df = conn.execute("SELECT * FROM customers").df()
# If exported as Parquet
df = duckdb.read_parquet('./local_lakehouse/customers/customers.parquet').df()
Python with Pandas
import pandas as pd
df = pd.read_parquet('./local_lakehouse/customers/customers.parquet')
Python with Polars
import polars as pl
df = pl.read_parquet('./local_lakehouse/customers/customers.parquet')
.NET
using DuckDB.NET.Data;
using var connection = new DuckDBConnection("Data Source=./local_lakehouse/lakehouse.duckdb");
connection.Open();
// Query your data
using var command = connection.CreateCommand();
command.CommandText = "SELECT COUNT(*) FROM customers";
var rowCount = command.ExecuteScalar();
Requirements
- Runtime: .NET 10.0 (or use self-contained builds)
- Azure: Account with access to Microsoft Fabric
- Permissions: Read access to the target Lakehouse via OneLake
Finding Your IDs
- Workspace ID: Go to your Fabric workspace → Settings → Copy the Workspace ID
- Lakehouse ID: Open your Lakehouse → Settings → Copy the Lakehouse ID
Development
Prerequisites
- .NET 10.0 SDK
- Azure CLI (for authentication)
Building
# Clone the repository
git clone https://github.com/mjtpena/faborite.git
cd faborite
# Build
dotnet build
# Run tests
dotnet test
# Run with coverage
dotnet test --collect:"XPlat Code Coverage"
# Run the CLI
dotnet run --project src/Faborite.Cli -- sync -w <workspace-id> -l <lakehouse-id>
Publishing
# Publish self-contained for Windows
dotnet publish src/Faborite.Cli -c Release -r win-x64 --self-contained -o publish/win-x64
# Publish self-contained for Linux
dotnet publish src/Faborite.Cli -c Release -r linux-x64 --self-contained -o publish/linux-x64
# Publish self-contained for macOS
dotnet publish src/Faborite.Cli -c Release -r osx-x64 --self-contained -o publish/osx-x64
dotnet publish src/Faborite.Cli -c Release -r osx-arm64 --self-contained -o publish/osx-arm64
Project Structure
faborite/
├── src/
│   ├── Faborite.Core/             # Core library
│   │   ├── Configuration/         # Config loading & validation
│   │   ├── OneLake/               # OneLake ADLS Gen2 client
│   │   ├── Sampling/              # DuckDB-powered sampling
│   │   ├── Export/                # Format exporters
│   │   ├── Logging/               # Logging infrastructure
│   │   ├── Resilience/            # Retry policies (Polly)
│   │   └── FaboriteService.cs     # Main orchestrator
│   └── Faborite.Cli/              # CLI application
│       ├── Commands/              # CLI commands
│       └── Program.cs             # Entry point
├── tests/
│   ├── Faborite.Core.Tests/       # Core library tests
│   └── Faborite.Cli.Tests/        # CLI tests
├── .github/
│   └── workflows/
│       └── ci.yml                 # CI/CD pipeline
└── Faborite.sln
Roadmap
- Delta Lake time travel support
- Incremental sync (only changed data)
- Schema drift detection
- VS Code extension
- GitHub Action for CI/CD pipelines
- Support for Fabric Warehouses
Contributing
Contributions are welcome! Please read our Contributing Guide for details on our code of conduct and the process for submitting pull requests.
Quick Contribution Steps
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Security
Please see our Security Policy for reporting vulnerabilities.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- DuckDB - For blazing fast local analytics
- Azure SDK for .NET - For Azure integration
- Spectre.Console - For beautiful CLI output
- Polly - For resilience policies
Made with ❤️ by Michael John Peña
| Product | Compatible and additional computed target frameworks |
|---|---|
| .NET | net10.0 is compatible. net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, and net10.0-windows were computed. |
Dependencies (net10.0)
- Azure.Identity (>= 1.13.1)
- Azure.Storage.Files.DataLake (>= 12.21.0)
- DuckDB.NET.Data (>= 1.2.0)
- DuckDB.NET.Data.Full (>= 1.2.0)
- Microsoft.Extensions.Configuration (>= 10.0.0)
- Microsoft.Extensions.Configuration.EnvironmentVariables (>= 10.0.0)
- Microsoft.Extensions.Configuration.Json (>= 10.0.0)
- Microsoft.Extensions.Logging.Abstractions (>= 10.0.0)
- Microsoft.Extensions.Options.ConfigurationExtensions (>= 10.0.0)
- Parquet.Net (>= 5.0.2)
- Polly (>= 8.4.2)