Databricks Asset Bundles: A Complete Guide

A guide to Databricks Asset Bundles, how they work, and how to use them for consistent data workflows.

Archie Sarre Wood

What Are Databricks Asset Bundles?

Databricks Asset Bundles (DABs) provide a structured way to package, version, and deploy data assets and their configurations within the Databricks ecosystem. They help organize resources and ensure consistent deployment across different environments, making it easier to manage MLOps and data engineering workflows.

Key Benefits of Databricks Asset Bundles

  • Declarative Configuration: Define your workspace resources and their relationships in YAML files
  • Environment Management: Easily manage configurations across development, staging, and production
  • Version Control: Track changes and maintain history through Git integration
  • CI/CD Integration: Seamlessly integrate with existing CI/CD pipelines
  • Reproducibility: Ensure consistent deployment across different workspaces

How Do Databricks Asset Bundles Work?

Databricks Asset Bundles use declarative configuration files to define:

  • Jobs and workflows (including tasks and schedules)
  • MLflow Models
  • Experiments
  • Permissions
  • Variables and Secrets

These configurations are defined in YAML files and can be deployed across different environments while maintaining environment-specific settings.
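For example, a job resource in `resources/jobs.yml` might look like the following sketch (the job name, cron expression, and notebook path are illustrative assumptions, not a definitive setup):

```yaml
# resources/jobs.yml — a minimal sketch; names and paths are placeholders
resources:
  jobs:
    daily_ingest:
      name: daily_ingest
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"
        timezone_id: UTC
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ../src/notebook1.py
```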

Creating a Databricks Asset Bundle

To create an asset bundle:

  1. Initialize the Bundle: Create a databricks.yml file in your project root (or scaffold one from a template with databricks bundle init)
  2. Define Resources: Specify workflows, jobs, and other resources in your configuration
  3. Set Variables: Configure environment-specific variables
  4. Deploy: Use the Databricks CLI to deploy your bundle

Example Bundle Structure

my_project/
├── databricks.yml      # Main bundle configuration
├── resources/          # Resource definitions
│   ├── jobs.yml
│   └── workflows.yml
├── src/                # Source code
│   ├── notebook1.py
│   └── notebook2.py
└── conf/               # Environment configurations
    ├── dev.yml
    └── prod.yml

Example Bundle Configurations

Let’s look at several common configuration patterns for Databricks Asset Bundles:

Basic Bundle Configuration

bundle:
  name: my_project
  include:
    - resources/*.yml

variables:
  env:
    description: Environment name
    default: dev

SQL Warehouse Configuration

bundle:
  name: default_sql

variables:
  warehouse_id:
    description: The warehouse to use
  catalog:
    description: The catalog to use
  schema:
    description: The schema to use

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://company.databricks.com
    variables:
      warehouse_id: abcdef1234567890
      catalog: main
      schema: ${workspace.current_user.short_name}
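Variables declared this way can be referenced elsewhere in the bundle using the ${var.<name>} substitution syntax. A hedged sketch of a job task consuming them (the job key and SQL file path are assumptions):

```yaml
resources:
  jobs:
    refresh_reports:
      name: refresh_reports
      tasks:
        - task_key: refresh
          sql_task:
            # Resolved per-target from the variables block above
            warehouse_id: ${var.warehouse_id}
            file:
              path: ../src/report.sql
```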

Environment-Specific Configuration

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://company.databricks.com

  prod:
    mode: production
    workspace:
      host: https://company.databricks.com
      root_path: /Users/user@company.com/.bundle/${bundle.name}/${bundle.target}
    permissions:
      - user_name: user@company.com
        level: CAN_MANAGE
    run_as:
      user_name: user@company.com

dbt Integration Configuration

bundle:
  name: dbt_sql
  include:
    - resources/*.yml

targets:
  dev:
    mode: development
    workspace:
      host: https://company.databricks.com
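To actually run dbt as part of the bundle, a job with a dbt_task can be declared under resources/. The sketch below assumes the project directory and dbt commands; adjust them to your project:

```yaml
# resources/dbt_job.yml — a sketch; project layout and commands are assumptions
resources:
  jobs:
    dbt_run:
      name: dbt_run
      tasks:
        - task_key: dbt
          dbt_task:
            project_directory: ./
            commands:
              - "dbt deps"
              - "dbt run"
```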

Key configuration elements include:

  • Bundle Name and Includes: Define the project name and included resource files
  • Variables: Declare environment-specific variables
  • Targets: Configure different deployment environments (dev, prod)
  • Permissions: Set access controls and run-as users
  • Workspace Settings: Configure host and root paths
  • Mode: Set development or production deployment modes

Deploying Databricks Asset Bundles

Deploy bundles using the Databricks CLI v2:

databricks bundle deploy

For specific environments:

databricks bundle deploy --target prod
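Beyond deploy, the CLI offers a few related lifecycle commands. A sketch of a typical sequence (`my_job` is a placeholder resource key):

```shell
# Check the bundle configuration for errors before deploying
databricks bundle validate

# Deploy to the production target
databricks bundle deploy --target prod

# Trigger a deployed job by its resource key
databricks bundle run my_job --target prod

# Tear down everything the bundle deployed
databricks bundle destroy --target prod
```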

Best Practices for Using Databricks Asset Bundles

  1. Use Git for Version Control: Store your bundles in a Git repository to track changes.
  2. Leverage Environment Configurations: Maintain separate configuration files for dev, staging, and production.
  3. Automate Deployments: Use CI/CD pipelines to deploy asset bundles seamlessly.
  4. Monitor and Update Regularly: Ensure assets are updated and maintain compatibility with Databricks updates.
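As one hedged example of automating deployments, a minimal GitHub Actions workflow could deploy the bundle on every push to main. This sketch assumes the databricks/setup-cli action and DATABRICKS_HOST/DATABRICKS_TOKEN repository secrets:

```yaml
# .github/workflows/deploy.yml — a sketch, not a definitive pipeline
name: deploy-bundle
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle deploy --target prod
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```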

Using Databricks with Evidence

For teams looking to build reporting on top of Databricks, Evidence provides a powerful way to create real-time reports and dashboards from Databricks data. With Evidence, you can:

  • Visualize Databricks Data Pipelines with SQL-based reporting.
  • Automate Data Workflows using Databricks and Evidence.
  • Collaborate with Teams using version-controlled reporting and analytics.
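As a rough illustration, an Evidence page over a connected Databricks source defines queries in SQL fences and renders them with components. The table names and chart fields below are assumptions:

````markdown
```sql daily_orders
-- Placeholder catalog/schema/table in the connected Databricks source
select order_date, count(*) as orders
from main.sales.orders
group by order_date
```

<LineChart data={daily_orders} x=order_date y=orders />
````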

Learn more about using Databricks with Evidence by visiting the Evidence documentation.
