Microsoft.Azure.Databricks.Client 2.0.0-rc.2

This is a prerelease version of Microsoft.Azure.Databricks.Client.
There is a newer version of this package available.
See the version list below for details.
dotnet add package Microsoft.Azure.Databricks.Client --version 2.0.0-rc.2                
NuGet\Install-Package Microsoft.Azure.Databricks.Client -Version 2.0.0-rc.2                
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="Microsoft.Azure.Databricks.Client" Version="2.0.0-rc.2" />                
For projects that support PackageReference, copy this XML node into the project file to reference the package.
paket add Microsoft.Azure.Databricks.Client --version 2.0.0-rc.2                
#r "nuget: Microsoft.Azure.Databricks.Client, 2.0.0-rc.2"                
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
// Install Microsoft.Azure.Databricks.Client as a Cake Addin
#addin nuget:?package=Microsoft.Azure.Databricks.Client&version=2.0.0-rc.2&prerelease

// Install Microsoft.Azure.Databricks.Client as a Cake Tool
#tool nuget:?package=Microsoft.Azure.Databricks.Client&version=2.0.0-rc.2&prerelease                

Azure Databricks Client Library



The Azure Databricks Client Library offers a convenient interface for automating your Azure Databricks workspace through the Azure Databricks REST API.

The implementation of this library is based on REST API version 2.0 and above.

The master branch is for version 2.0. Version 1.1 (stable) is in the releases/1.1 branch.

Requirements

You need a personal access token (PAT) or an Azure Active Directory (AAD) token to access the Databricks REST API.

Supported APIs

| REST API | Version | Description |
| --- | --- | --- |
| Clusters | 2.0 | The Clusters API allows you to create, start, edit, list, terminate, and delete clusters. |
| Jobs | 2.1 | The Jobs API allows you to programmatically manage Azure Databricks jobs. |
| Dbfs | 2.0 | The DBFS API is a Databricks API that makes it simple to interact with various data sources without having to include your credentials every time you read a file. |
| Secrets | 2.0 | The Secrets API allows you to manage secrets, secret scopes, and access permissions. |
| Groups | 2.0 | The Groups API allows you to manage groups of users. |
| Libraries | 2.0 | The Libraries API allows you to install and uninstall libraries and get the status of libraries on a cluster. |
| Token | 2.0 | The Token API allows you to create, list, and revoke tokens that can be used to authenticate and access Azure Databricks REST APIs. |
| Workspace | 2.0 | The Workspace API allows you to list, import, export, and delete notebooks and folders. |
| InstancePool | 2.0 | The Instance Pools API allows you to create, edit, delete and list instance pools. |
| 🆕 Permissions | 2.0 | The Permissions API lets you manage permissions for Token, Cluster, Pool, Job, Delta Live Tables pipeline, Notebook, Directory, MLflow experiment, MLflow registered model, SQL warehouse, Repo and Cluster Policies. |
| 🆕 Cluster Policies | 2.0 | The Cluster Policies API allows you to create, list, and edit cluster policies. |
| 🆕 Global Init Scripts | 2.0 | The Global Init Scripts API lets Azure Databricks administrators add global cluster initialization scripts in a secure and controlled manner. |

Usage

See the Sample project for more detailed usage examples.

In the following examples, the baseUrl variable should be set to the workspace base URL, which looks like https://adb-<workspace-id>.<random-number>.azuredatabricks.net, and the token variable should be set to your Databricks personal access token.

Creating client

using (var client = DatabricksClient.CreateClient(baseUrl, token))
{
    // ...
}
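
Hard-coding the token in source makes it easy to leak, so one option is to read both values from the environment before creating the client. The sketch below is only an illustration: the environment variable names DATABRICKS_BASE_URL and DATABRICKS_TOKEN and the LoadWorkspaceSettings helper are hypothetical and not defined by this library. It validates the two values and leaves client creation to DatabricksClient.CreateClient as shown above.

```csharp
using System;

// Reads and validates the workspace settings. The lookup is passed in as a
// delegate so the function can be exercised without real environment variables.
// Both variable names are hypothetical; the library does not define them.
static (string BaseUrl, string Token) LoadWorkspaceSettings(Func<string, string> getVariable)
{
    var baseUrl = getVariable("DATABRICKS_BASE_URL");
    var token = getVariable("DATABRICKS_TOKEN");

    // The workspace base URL is expected to be an absolute https:// URL.
    if (!Uri.TryCreate(baseUrl, UriKind.Absolute, out var uri) || uri.Scheme != Uri.UriSchemeHttps)
    {
        throw new InvalidOperationException("DATABRICKS_BASE_URL must be an absolute https URL.");
    }

    if (string.IsNullOrWhiteSpace(token))
    {
        throw new InvalidOperationException("DATABRICKS_TOKEN is not set.");
    }

    return (baseUrl, token);
}

// Typical call site (commented out so the sketch runs without the variables set):
// var (baseUrl, token) = LoadWorkspaceSettings(Environment.GetEnvironmentVariable);
```

Failing early with a clear message tends to be easier to diagnose than an authentication error returned by the first REST call.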

Cluster API

  • Create a single node cluster:
var clusterConfig = ClusterAttributes
            .GetNewClusterConfiguration("Sample cluster")
            .WithRuntimeVersion(RuntimeVersions.Runtime_10_4)
            .WithAutoScale(3, 7)
            .WithAutoTermination(30)
            .WithClusterLogConf("dbfs:/logs/")
            .WithNodeType(NodeTypes.Standard_D3_v2)
            .WithClusterMode(ClusterMode.SingleNode);

var clusterId = await client.Clusters.Create(clusterConfig);
  • Wait for the cluster to be ready (or fail to start):
using System.Net; // WebException, HttpStatusCode
using Policy = Polly.Policy;

static async Task WaitForCluster(IClustersApi clusterClient, string clusterId, int pollIntervalSeconds = 15)
{
    var retryPolicy = Policy.Handle<WebException>()
        .Or<ClientApiException>(e => e.StatusCode == HttpStatusCode.BadGateway)
        .Or<ClientApiException>(e => e.StatusCode == HttpStatusCode.InternalServerError)
        .Or<ClientApiException>(e => e.Message.Contains("\"error_code\":\"TEMPORARILY_UNAVAILABLE\""))
        .Or<TaskCanceledException>(e => !e.CancellationToken.IsCancellationRequested)
        .OrResult<ClusterInfo>(info => info.State is not (ClusterState.RUNNING or ClusterState.ERROR or ClusterState.TERMINATED))
        .WaitAndRetryForeverAsync(
            _ => TimeSpan.FromSeconds(pollIntervalSeconds),
            (delegateResult, _) =>
            {
                if (delegateResult.Exception != null)
                {
                    Console.WriteLine($"[{DateTime.UtcNow:s}] Failed to query cluster info - {delegateResult.Exception}");
                }
            });
    await retryPolicy.ExecuteAsync(async () =>
    {
        var info = await clusterClient.Get(clusterId);
        Console.WriteLine($"[{DateTime.UtcNow:s}] Cluster:{clusterId}\tState:{info.State}\tMessage:{info.StateMessage}");
        return info;
    });
}

await WaitForCluster(client.Clusters, clusterId);

  • Stop a cluster:
await client.Clusters.Terminate(clusterId);
await WaitForCluster(client.Clusters, clusterId);
  • Delete a cluster:
await client.Clusters.Delete(clusterId);
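
The WaitForCluster example above retries forever, which can hang a deployment script if the cluster never reaches a terminal state. A bounded variant is sketched below without Polly. PollUntilAsync is a hypothetical helper, not part of the library, and unlike the Polly policy it does not swallow transient exceptions. The state query is replaced by an in-memory stand-in so the sketch runs on its own.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Polls `query` until `isDone` accepts the result.
// Throws TimeoutException when another poll would exceed `timeout`.
static async Task<T> PollUntilAsync<T>(
    Func<Task<T>> query,
    Func<T, bool> isDone,
    TimeSpan pollInterval,
    TimeSpan timeout)
{
    var stopwatch = Stopwatch.StartNew();
    while (true)
    {
        var result = await query();
        if (isDone(result))
        {
            return result;
        }

        if (stopwatch.Elapsed + pollInterval > timeout)
        {
            throw new TimeoutException($"Condition not met within {timeout}.");
        }

        await Task.Delay(pollInterval);
    }
}

// Stand-in for a state query: reports "PENDING" twice, then "RUNNING".
var calls = 0;
Func<Task<string>> fakeQuery = () => Task.FromResult(++calls < 3 ? "PENDING" : "RUNNING");

var finalState = await PollUntilAsync(
    fakeQuery,
    state => state is "RUNNING" or "ERROR" or "TERMINATED",
    pollInterval: TimeSpan.FromMilliseconds(10),
    timeout: TimeSpan.FromSeconds(5));

Console.WriteLine(finalState); // prints "RUNNING"
```

Against a real workspace, the query would be a call such as client.Clusters.Get(clusterId) and the predicate would test info.State, as in WaitForCluster.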

Jobs API

  • Create a job:
// Job schedule
var schedule = new CronSchedule
{
    QuartzCronExpression = "0 0 9 ? * MON-FRI",
    TimezoneId = "Europe/London",
    PauseStatus = PauseStatus.UNPAUSED
};

// Run with a job cluster
var newCluster = ClusterAttributes.GetNewClusterConfiguration()
    .WithClusterMode(ClusterMode.SingleNode)
    .WithNodeType(NodeTypes.Standard_D3_v2)
    .WithRuntimeVersion(RuntimeVersions.Runtime_10_4);

// Create job settings
var jobSettings = new JobSettings
{
    MaxConcurrentRuns = 1,
    Schedule = schedule,
    Name = "Sample Job"
};

// Adding 3 tasks to the job settings.
var task1 = jobSettings.AddTask("task1", new NotebookTask { NotebookPath = SampleNotebookPath })
    .WithDescription("Sample Job - task1")
    .WithNewCluster(newCluster);
var task2 = jobSettings.AddTask("task2", new NotebookTask { NotebookPath = SampleNotebookPath })
    .WithDescription("Sample Job - task2")
    .WithNewCluster(newCluster);
jobSettings.AddTask("task3", new NotebookTask { NotebookPath = SampleNotebookPath }, new[] { task1, task2 })
    .WithDescription("Sample Job - task3")
    .WithNewCluster(newCluster);

// Create the job.
Console.WriteLine("Creating new job");
var jobId = await client.Jobs.Create(jobSettings);
Console.WriteLine("Job created: {0}", jobId);
  • Start a job run:
// Start the job and retrieve the run id.
Console.WriteLine("Run now: {0}", jobId);
var runId = await client.Jobs.RunNow(jobId);
  • Wait for a job run to complete:
using System.Net; // WebException, HttpStatusCode
using Policy = Polly.Policy;

static async Task WaitForRun(IJobsApi jobClient, long runId, int pollIntervalSeconds = 15)
{
    var retryPolicy = Policy.Handle<WebException>()
        .Or<ClientApiException>(e => e.StatusCode == HttpStatusCode.BadGateway)
        .Or<ClientApiException>(e => e.StatusCode == HttpStatusCode.InternalServerError)
        .Or<ClientApiException>(e => e.Message.Contains("\"error_code\":\"TEMPORARILY_UNAVAILABLE\""))
        .Or<TaskCanceledException>(e => !e.CancellationToken.IsCancellationRequested)
        .OrResult<RunState>(state =>
            state.LifeCycleState is RunLifeCycleState.PENDING or RunLifeCycleState.RUNNING
                or RunLifeCycleState.TERMINATING)
        .WaitAndRetryForeverAsync(
            _ => TimeSpan.FromSeconds(pollIntervalSeconds),
            (delegateResult, _) =>
            {
                if (delegateResult.Exception != null)
                {
                    Console.WriteLine(
                        $"[{DateTime.UtcNow:s}] Failed to query run - {delegateResult.Exception}");
                }
            });
    await retryPolicy.ExecuteAsync(async () =>
    {
        var (run, _) = await jobClient.RunsGet(runId);
        Console.WriteLine(
            $"[{DateTime.UtcNow:s}] Run:{runId}\tLifeCycleState:{run.State.LifeCycleState}\tResultState:{run.State.ResultState}\tCompleted:{run.IsCompleted}"
        );
        return run.State;
    });
}

await WaitForRun(client.Jobs, runId);
  • Export a job run:
var (run, _) = await client.Jobs.RunsGet(runId);
foreach (var runTask in run.Tasks)
{
    var viewItems = await client.Jobs.RunsExport(runTask.RunId);
    foreach (var viewItem in viewItems)
    {
        Console.WriteLine($"Exported view item from run {runTask.RunId}, task \"{runTask.TaskKey}\", view \"{viewItem.Name}\"");
        Console.WriteLine("====================");
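        // Print at most the first 200 characters; a plain [..200] throws on shorter content.
        Console.WriteLine(viewItem.Content[..Math.Min(200, viewItem.Content.Length)] + "...");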
        Console.WriteLine("====================");
    }
}
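
The schedule in the job-creation example uses a Quartz cron expression, which has six mandatory fields (seconds, minutes, hours, day-of-month, month, day-of-week) plus an optional year, rather than the five fields of a Unix crontab entry. As a rough aid for reading such expressions, the sketch below labels each field. DescribeQuartzCron is a hypothetical helper, not part of the library, and it only checks the field count, not the field contents.

```csharp
using System;
using System.Collections.Generic;

// Pairs each field of a Quartz cron expression with its name.
// Only the number of fields is checked; the values are not validated.
static IReadOnlyList<(string Name, string Value)> DescribeQuartzCron(string expression)
{
    // Field order used by Quartz; the seventh field (year) is optional.
    string[] names = { "seconds", "minutes", "hours", "day-of-month", "month", "day-of-week", "year" };

    var parts = expression.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    if (parts.Length is < 6 or > 7)
    {
        throw new FormatException($"Expected 6 or 7 fields but found {parts.Length}.");
    }

    var fields = new List<(string Name, string Value)>();
    for (var i = 0; i < parts.Length; i++)
    {
        fields.Add((names[i], parts[i]));
    }

    return fields;
}

foreach (var (name, value) in DescribeQuartzCron("0 0 9 ? * MON-FRI"))
{
    Console.WriteLine($"{name,-13}{value}");
}
```

Read this way, "0 0 9 ? * MON-FRI" fires at 09:00:00 on Monday through Friday, evaluated in the time zone given by TimezoneId.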

Secrets API

Create secret scope

const string scope = "SampleScope";
await client.Secrets.CreateScope(scope, null);

Create text secret

var secretName = "secretkey.text";
await client.Secrets.PutSecret("secret text", scope, secretName);

Create binary secret

var secretName = "secretkey.bin";
await client.Secrets.PutSecret(new byte[]{0x01, 0x02, 0x03, 0x04}, scope, secretName);

Resiliency

The clusters/create, jobs/run-now and jobs/runs/submit APIs support an idempotency token. This optional token guarantees the idempotency of requests: if a resource (a cluster or a run) with the provided token already exists, the request does not create a new resource but returns the ID of the existing resource instead.

If you specify the idempotency token, upon failure you can retry until the request succeeds. Databricks guarantees that exactly one resource is launched with that idempotency token.

The following code illustrates how to use Polly to retry the request with idempotency_token if the request fails.

using System.Net; // WebException, HttpStatusCode
using Polly;

double retryIntervalSec = 15;
string idempotencyToken = Guid.NewGuid().ToString();

var clusterInfo = ClusterAttributes.GetNewClusterConfiguration("my-cluster")
    .WithNodeType("Standard_D3_v2")
    .WithNumberOfWorkers(25)
    .WithRuntimeVersion(RuntimeVersions.Runtime_7_3);

var retryPolicy = Policy.Handle<WebException>()
    .Or<ClientApiException>(e => e.StatusCode == HttpStatusCode.BadGateway)
    .Or<ClientApiException>(e => e.StatusCode == HttpStatusCode.InternalServerError)
    .Or<ClientApiException>(e => e.StatusCode == HttpStatusCode.ServiceUnavailable)
    .Or<ClientApiException>(e => e.Message.Contains("\"error_code\":\"TEMPORARILY_UNAVAILABLE\""))
    .Or<TaskCanceledException>(e => !e.CancellationToken.IsCancellationRequested)
    .WaitAndRetryForeverAsync(_ => TimeSpan.FromSeconds(retryIntervalSec));

var clusterId = await retryPolicy.ExecuteAsync(async () => await client.Clusters.Create(clusterInfo, idempotencyToken));
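
To see why retrying with the same token is safe, it can help to model the contract in memory. The sketch below is not the Databricks API: CreateResource is a hypothetical stand-in that mimics the behavior described above, including a response that is lost after the resource has already been created.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for a service that honors idempotency tokens.
var resourcesByToken = new Dictionary<string, string>();
var createdCount = 0;
var failNextResponse = true;

string CreateResource(string idempotencyToken)
{
    // Reuse the existing resource if this token has been seen before.
    if (!resourcesByToken.TryGetValue(idempotencyToken, out var id))
    {
        id = $"resource-{++createdCount}";
        resourcesByToken[idempotencyToken] = id;
    }

    // Simulate a response that is lost after the resource was created.
    if (failNextResponse)
    {
        failNextResponse = false;
        throw new TimeoutException("Simulated lost response.");
    }

    return id;
}

var retryToken = Guid.NewGuid().ToString();
string resourceId;
var attempts = 0;
while (true)
{
    attempts++;
    try
    {
        resourceId = CreateResource(retryToken);
        break;
    }
    catch (TimeoutException)
    {
        Console.WriteLine("Request failed, retrying with the same token");
    }
}

Console.WriteLine($"{resourceId} after {attempts} attempts (resources created: {createdCount})");
// prints "resource-1 after 2 attempts (resources created: 1)"
```

Without the token, the second attempt in this model would have created a second resource, which is the duplicate-cluster situation the idempotency token is meant to prevent.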

Breaking changes

  • Version 2 of the library targets the .NET 6 runtime.

  • The Jobs API was redesigned to align with version 2.1 of the REST API.

    • In the previous version, the Jobs API supported only a single task per job. The new Jobs API supports multiple tasks per job, where the tasks are represented as a DAG.

    • The new version supports two more types of task: Python Wheel task and Delta Live Tables pipeline task.
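
Representing tasks as a DAG means a task may only start once every task it depends on has finished. The sketch below illustrates that ordering constraint for the three tasks of the job example (task3 depends on task1 and task2) using Kahn's algorithm. It is only a model of the concept: OrderTasks is a hypothetical helper, and how the Databricks service actually schedules tasks is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns task keys so that every task appears after the tasks it depends on.
// Throws if the dependencies contain a cycle or name an unknown task.
static List<string> OrderTasks(IReadOnlyDictionary<string, string[]> dependsOn)
{
    // Number of unfinished dependencies per task.
    var remaining = dependsOn.ToDictionary(kv => kv.Key, kv => kv.Value.Distinct().Count());

    // Tasks without dependencies can start immediately.
    var ready = new Queue<string>(
        remaining.Where(kv => kv.Value == 0).Select(kv => kv.Key).OrderBy(k => k, StringComparer.Ordinal));

    var order = new List<string>();
    while (ready.Count > 0)
    {
        var finished = ready.Dequeue();
        order.Add(finished);

        // Release every task whose last open dependency just finished.
        foreach (var (task, dependencies) in dependsOn)
        {
            if (dependencies.Contains(finished) && --remaining[task] == 0)
            {
                ready.Enqueue(task);
            }
        }
    }

    if (order.Count != dependsOn.Count)
    {
        throw new InvalidOperationException("Dependencies contain a cycle or an unknown task.");
    }

    return order;
}

var sampleTasks = new Dictionary<string, string[]>
{
    ["task1"] = Array.Empty<string>(),
    ["task2"] = Array.Empty<string>(),
    ["task3"] = new[] { "task1", "task2" },
};

Console.WriteLine(string.Join(" -> ", OrderTasks(sampleTasks))); // prints "task1 -> task2 -> task3"
```

In this model task1 and task2 have no dependency on each other, so they could run in parallel; only task3 has to wait.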

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit Microsoft Contributor License Agreement (CLA).

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Product Compatible and additional computed target framework versions.
.NET net6.0 is compatible.  net6.0-android was computed.  net6.0-ios was computed.  net6.0-maccatalyst was computed.  net6.0-macos was computed.  net6.0-tvos was computed.  net6.0-windows was computed.  net7.0 was computed.  net7.0-android was computed.  net7.0-ios was computed.  net7.0-maccatalyst was computed.  net7.0-macos was computed.  net7.0-tvos was computed.  net7.0-windows was computed.  net8.0 was computed.  net8.0-android was computed.  net8.0-browser was computed.  net8.0-ios was computed.  net8.0-maccatalyst was computed.  net8.0-macos was computed.  net8.0-tvos was computed.  net8.0-windows was computed. 

NuGet packages (6)

Showing the top 5 NuGet packages that depend on Microsoft.Azure.Databricks.Client:

Package Downloads
Storage.Net.Microsoft.Azure.Databricks.Dbfs

Extension to Storage.Net that provides a read-only filesystem access to Azure Databricks.

Energinet.DataHub.Core.Databricks.Jobs

[Release Notes](https://github.com/Energinet-DataHub/geh-core/blob/master/source/Databricks/documents/release-notes/release-notes.md) [Documentation](https://github.com/Energinet-DataHub/geh-core/blob/master/source/Databricks/documents/documentation.md)

Arcus.BackgroundJobs.Databricks

Provides capabilities for running background jobs to automate Databricks workflows.

Storage.NetCore.Databricks

Extension to Storage.Net that provides access to various aspects of Databricks, such as DBFS, secrets, clusters, workbooks and so on. Azure and AWS are fully supported.

Storage.Net.Databricks

Extension to Storage.Net that provides access to various aspects of Databricks, such as DBFS, secrets, clusters, workbooks and so on. Azure and AWS are fully supported.

GitHub repositories (1)

Showing the top 1 popular GitHub repositories that depend on Microsoft.Azure.Databricks.Client:

Repository Stars
robinrodricks/FluentStorage
A polycloud .NET cloud storage abstraction layer. Provides Blob storage (AWS S3, GCP, FTP, SFTP, Azure Blob/File/Event Hub/Data Lake) and Messaging (AWS SQS, Azure Queue/ServiceBus). Supports .NET 5+ and .NET Standard 2.0+. Pure C#.
Version Downloads Last updated
2.6.0 5,939 9/24/2024
2.5.2 9,526 9/5/2024
2.5.1 13,063 7/10/2024
2.5.0 2,288 6/21/2024
2.4.0 36,986 4/15/2024
2.3.0 12,618 3/8/2024
2.2.1 44,447 11/11/2023
2.2.0 25,977 9/24/2023
2.1.2 12,138 9/10/2023
2.1.2-rc.1 211 9/2/2023
2.1.1 21,290 8/6/2023
2.1.0 15,478 6/15/2023
2.0.0 31,049 3/24/2023
2.0.0-rc.5 729 2/23/2023
2.0.0-rc.4 108 2/17/2023
2.0.0-rc.3 4,781 11/20/2022
2.0.0-rc.2 2,606 11/1/2022
2.0.0-rc.1 1,929 7/19/2022
1.1.2515.1 72,528 11/20/2022
1.1.2395.2 68,200 7/23/2022
1.1.2388.2 14,321 7/16/2022
1.1.2380.6 807 7/8/2022
1.1.2364.2 5,996 6/23/2022
1.1.2304.4 10,513 4/23/2022
1.1.2133.2 72,578 11/3/2021
1.1.2098.1 6,742 9/29/2021
1.1.2014.2 9,781 7/7/2021
1.1.1978.2 5,789 6/1/2021
1.1.1957.4 11,249 5/11/2021
1.1.1944.1 17,132 4/28/2021
1.1.1808.3 36,393 12/13/2020
1.1.1671.1 54,875 7/29/2020
1.1.1613.2 29,863 6/1/2020
1.1.1552.1 19,737 4/1/2020
1.1.1526.2 16,581 3/6/2020
1.1.1491.2 14,223 1/31/2020
1.1.1488.1 889 1/28/2020
1.1.1227.1 2,543 12/28/2019
1.1.924.1 19,088 9/24/2019
1.1.827.2 9,311 8/27/2019
1.1.714.2 28,897 7/15/2019
1.1.614.1 1,022 6/14/2019
1.1.602.3 9,949 6/3/2019
1.1.125.4 30,708 1/25/2019
1.1.103.1 5,561 1/3/2019
1.0.1128.1 1,009 11/28/2018
1.0.1120.1 774 11/21/2018
1.0.1106.2 828 11/7/2018
1.0.813.2 1,839 8/13/2018
1.0.810.3 1,053 8/11/2018
1.0.809.2 1,441 8/10/2018

Changes in 2.0.0:
- The v2 library targets the .NET 6 runtime.
- The v2 library migrated its underlying JSON parsing library from Newtonsoft.Json to System.Text.Json.
- The Jobs API was redesigned to support version 2.1 of the REST API.
- Added support for the Cluster Policies API.
- Added support for the Global Init Scripts API.
- Added support for creating clusters with credential pass-through.
- Added support for configuring the HttpClient object used by DatabricksClient.
- Added unit tests.