What Are Databricks Asset Bundles?
Databricks Asset Bundles (DABs) provide a structured way to package, version, and deploy data assets and their configurations within the Databricks ecosystem. They help organize resources and ensure consistent deployment across different environments, making it easier to manage MLOps and data engineering workflows.
Key Benefits of Databricks Asset Bundles
- Declarative Configuration: Define your workspace resources and their relationships in YAML files
- Environment Management: Easily manage configurations across development, staging, and production
- Version Control: Track changes and maintain history through Git integration
- CI/CD Integration: Seamlessly integrate with existing CI/CD pipelines
- Reproducibility: Ensure consistent deployment across different workspaces
How Do Databricks Asset Bundles Work?
Databricks Asset Bundles use declarative configuration files to define:
- Jobs and workflows (including tasks and schedules)
- MLflow Models
- Experiments
- Permissions
- Variables and Secrets
These configurations are defined in YAML files and can be deployed across different environments while maintaining environment-specific settings.
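For example, a job resource is typically declared in its own YAML file under resources/. The sketch below is illustrative only: the job key, schedule, and notebook path are assumptions, and compute settings are omitted for brevity:

```yaml
# resources/jobs.yml -- a minimal, hypothetical job definition
resources:
  jobs:
    nightly_etl:
      name: nightly_etl
      schedule:
        quartz_cron_expression: "0 0 2 * * ?"  # run daily at 02:00
        timezone_id: UTC
      tasks:
        - task_key: run_notebook
          notebook_task:
            notebook_path: ../src/notebook1.py  # relative to this YAML file
```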
Creating a Databricks Asset Bundle
To create an asset bundle:
- Initialize the Bundle: Create a databricks.yml file in your project root
- Define Resources: Specify workflows, jobs, and other resources in your configuration
- Set Variables: Configure environment-specific variables
- Deploy: Use the Databricks CLI to deploy your bundle (see the commands below)
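These steps map onto a handful of CLI commands. A minimal sequence, assuming the Databricks CLI is installed and authenticated:

```bash
databricks bundle init       # scaffold a bundle from a template (interactive)
databricks bundle validate   # check the configuration for errors
databricks bundle deploy     # deploy to the default target
```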
Example Bundle Structure
```
my_project/
├── databricks.yml      # Main bundle configuration
├── resources/          # Resource definitions
│   ├── jobs.yml
│   └── workflows.yml
├── src/                # Source code
│   ├── notebook1.py
│   └── notebook2.py
└── conf/               # Environment configurations
    ├── dev.yml
    └── prod.yml
```
Example Bundle Configurations
Let’s look at several common configuration patterns for Databricks Asset Bundles:
Basic Bundle Configuration
```yaml
bundle:
  name: my_project

include:
  - resources/*.yml

variables:
  env:
    description: Environment name
    default: dev
```
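Declared variables can be referenced elsewhere in the bundle with ${var.<name>} substitution. A small illustrative sketch (the job key and name here are hypothetical):

```yaml
# Hypothetical resource using the env variable declared above
resources:
  jobs:
    example_job:
      name: my_project_${var.env}_job   # resolves to my_project_dev_job by default
```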
SQL Warehouse Configuration
```yaml
bundle:
  name: default_sql

variables:
  warehouse_id:
    description: The warehouse to use
  catalog:
    description: The catalog to use
  schema:
    description: The schema to use

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://company.databricks.com
    variables:
      warehouse_id: abcdef1234567890
      catalog: main
      schema: ${workspace.current_user.short_name}
```
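A resource can then consume these variables via ${var.<name>}. The SQL file task below is an illustrative sketch; the job key, task key, and src/report.sql file are assumptions:

```yaml
# Hypothetical resources/sql_job.yml consuming the declared variables
resources:
  jobs:
    sql_report:
      name: sql_report
      tasks:
        - task_key: run_report
          sql_task:
            warehouse_id: ${var.warehouse_id}
            file:
              path: ../src/report.sql   # hypothetical SQL file under src/
```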
Environment-Specific Configuration
```yaml
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://company.databricks.com

  prod:
    mode: production
    workspace:
      host: https://company.databricks.com
      root_path: /Users/user@company.com/.bundle/${bundle.name}/${bundle.target}
    permissions:
      - user_name: user@company.com
        level: CAN_MANAGE
    run_as:
      user_name: user@company.com
```
dbt Integration Configuration
```yaml
bundle:
  name: dbt_sql

include:
  - resources/*.yml

targets:
  dev:
    mode: development
    workspace:
      host: https://company.databricks.com
```
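The dbt project itself usually runs as a dbt task inside a bundled job. A minimal sketch, assuming the dbt project sits at the bundle root and that a warehouse_id variable like the one declared earlier exists (compute and library settings omitted):

```yaml
# Hypothetical resources/dbt_job.yml
resources:
  jobs:
    dbt_job:
      name: dbt_job
      tasks:
        - task_key: dbt_run
          dbt_task:
            project_directory: .
            warehouse_id: ${var.warehouse_id}   # assumes this variable is declared
            commands:
              - dbt deps
              - dbt run
```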
Key configuration elements include:
- Bundle Name and Includes: Define the project name and included resource files
- Variables: Declare environment-specific variables
- Targets: Configure different deployment environments (dev, prod)
- Permissions: Set access controls and run-as users
- Workspace Settings: Configure host and root paths
- Mode: Set development or production deployment modes
Deploying Databricks Asset Bundles
Deploy bundles using the new Databricks CLI (version 0.205 or later, which includes the bundle commands):

```bash
databricks bundle deploy
```

To deploy to a specific target:

```bash
databricks bundle deploy --target prod
```
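Beyond deploy, the bundle subcommands cover the rest of the lifecycle (my_job below is a placeholder for a job resource key from your configuration):

```bash
databricks bundle run my_job --target prod   # trigger a deployed job
databricks bundle summary                    # show the deployed resources
databricks bundle destroy                    # tear down deployed resources
```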
Best Practices for Using Databricks Asset Bundles
- Use Git for Version Control: Store your bundles in a Git repository to track changes.
- Leverage Environment Configurations: Maintain separate configuration files for dev, staging, and production.
- Automate Deployments: Use CI/CD pipelines to deploy asset bundles automatically (a GitHub Actions sketch follows this list).
- Monitor and Update Regularly: Ensure assets are updated and maintain compatibility with Databricks updates.
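For the automation point above, one common pattern is a GitHub Actions workflow that installs the CLI and deploys on every push. This is a hedged sketch, not a definitive setup: the workflow path, branch, target, and secret names are assumptions for your environment:

```yaml
# Hypothetical .github/workflows/deploy-bundle.yml
name: deploy-bundle
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main   # installs the Databricks CLI
      - run: databricks bundle deploy --target prod
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}     # assumed secret names
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```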
Using Databricks with Evidence
For teams looking to integrate the two, Evidence provides a powerful way to create real-time reports and dashboards from Databricks data. With Evidence, you can:
- Visualize Databricks Data Pipelines with SQL-based reporting.
- Automate Data Workflows using Databricks and Evidence.
- Collaborate with Teams using version-controlled reporting and analytics.
Learn more about using Databricks with Evidence by visiting the Evidence documentation.