What Are Databricks Asset Bundles?
Databricks Asset Bundles (DABs) provide a structured way to package, version, and deploy data assets and their configurations within the Databricks ecosystem. They help organize resources and ensure consistent deployment across different environments, making it easier to manage MLOps and data engineering workflows.
Key Benefits of Databricks Asset Bundles
- Declarative Configuration: Define your workspace resources and their relationships in YAML files
- Environment Management: Easily manage configurations across development, staging, and production
- Version Control: Track changes and maintain history through Git integration
- CI/CD Integration: Seamlessly integrate with existing CI/CD pipelines
- Reproducibility: Ensure consistent deployment across different workspaces
How Do Databricks Asset Bundles Work?
Databricks Asset Bundles use declarative configuration files to define:
- Jobs and workflows (including tasks and schedules)
- MLflow Models
- Experiments
- Permissions
- Variables and Secrets
These configurations are defined in YAML files and can be deployed across different environments while maintaining environment-specific settings.
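For example, a job resource is typically declared in its own YAML file under resources/. The sketch below is illustrative only: the job key, schedule, and notebook path are assumptions, and compute settings are omitted for brevity:

```yaml
# resources/jobs.yml -- a minimal, hypothetical job definition
resources:
  jobs:
    nightly_etl:
      name: nightly_etl
      schedule:
        quartz_cron_expression: "0 0 2 * * ?"  # run daily at 02:00
        timezone_id: UTC
      tasks:
        - task_key: run_notebook
          notebook_task:
            notebook_path: ../src/notebook1.py  # relative to this YAML file
```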
Creating a Databricks Asset Bundle
To create an asset bundle:
- Initialize the Bundle: Create a databricks.yml file in your project root
- Define Resources: Specify workflows, jobs, and other resources in your configuration
- Set Variables: Configure environment-specific variables
- Deploy: Use the Databricks CLI to deploy your bundle (see the commands below)
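These steps map onto a handful of CLI commands. A minimal sequence, assuming the Databricks CLI is installed and authenticated:

```bash
databricks bundle init       # scaffold a bundle from a template (interactive)
databricks bundle validate   # check the configuration for errors
databricks bundle deploy     # deploy to the default target
```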
Example Bundle Structure
```
my_project/
├── databricks.yml      # Main bundle configuration
├── resources/          # Resource definitions
│   ├── jobs.yml
│   └── workflows.yml
├── src/                # Source code
│   ├── notebook1.py
│   └── notebook2.py
└── conf/               # Environment configurations
    ├── dev.yml
    └── prod.yml
```
Example Bundle Configurations
Let’s look at several common configuration patterns for Databricks Asset Bundles:
Basic Bundle Configuration
```yaml
bundle:
  name: my_project

include:
  - resources/*.yml

variables:
  env:
    description: Environment name
    default: dev
```
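Declared variables can be referenced elsewhere in the bundle with ${var.<name>} substitution. A small illustrative sketch (the job key and name here are hypothetical):

```yaml
# Hypothetical resource using the env variable declared above
resources:
  jobs:
    example_job:
      name: my_project_${var.env}_job   # resolves to my_project_dev_job by default
```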
SQL Warehouse Configuration
```yaml
bundle:
  name: default_sql

variables:
  warehouse_id:
    description: The warehouse to use
  catalog:
    description: The catalog to use
  schema:
    description: The schema to use

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://company.databricks.com
    variables:
      warehouse_id: abcdef1234567890
      catalog: main
      schema: ${workspace.current_user.short_name}
```
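A resource can then consume these variables via ${var.<name>}. The SQL file task below is an illustrative sketch; the job key, task key, and src/report.sql file are assumptions:

```yaml
# Hypothetical resources/sql_job.yml consuming the declared variables
resources:
  jobs:
    sql_report:
      name: sql_report
      tasks:
        - task_key: run_report
          sql_task:
            warehouse_id: ${var.warehouse_id}
            file:
              path: ../src/report.sql   # hypothetical SQL file under src/
```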
Environment-Specific Configuration
```yaml
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://company.databricks.com

  prod:
    mode: production
    workspace:
      host: https://company.databricks.com
      root_path: /Users/user@company.com/.bundle/${bundle.name}/${bundle.target}
    permissions:
      - user_name: user@company.com
        level: CAN_MANAGE
    run_as:
      user_name: user@company.com
```
dbt Integration Configuration
```yaml
bundle:
  name: dbt_sql

include:
  - resources/*.yml

targets:
  dev:
    mode: development
    workspace:
      host: https://company.databricks.com
```
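The dbt project itself usually runs as a dbt task inside a bundled job. A minimal sketch, assuming the dbt project sits at the bundle root and that a warehouse_id variable like the one declared earlier exists (compute and library settings omitted):

```yaml
# Hypothetical resources/dbt_job.yml
resources:
  jobs:
    dbt_job:
      name: dbt_job
      tasks:
        - task_key: dbt_run
          dbt_task:
            project_directory: .
            warehouse_id: ${var.warehouse_id}   # assumes this variable is declared
            commands:
              - dbt deps
              - dbt run
```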
Key configuration elements include:
- Bundle Name and Includes: Define the project name and included resource files
- Variables: Declare environment-specific variables
- Targets: Configure different deployment environments (dev, prod)
- Permissions: Set access controls and run-as users
- Workspace Settings: Configure host and root paths
- Mode: Set development or production deployment modes
Deploying Databricks Asset Bundles
Deploy bundles using the new Databricks CLI (version 0.205 or later, which includes the bundle commands):

```bash
databricks bundle deploy
```

To deploy to a specific target:

```bash
databricks bundle deploy --target prod
```
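Beyond deploy, the bundle subcommands cover the rest of the lifecycle (my_job below is a placeholder for a job resource key from your configuration):

```bash
databricks bundle run my_job --target prod   # trigger a deployed job
databricks bundle summary                    # show the deployed resources
databricks bundle destroy                    # tear down deployed resources
```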
Best Practices for Using Databricks Asset Bundles
- Use Git for Version Control: Store your bundles in a Git repository to track changes.
- Leverage Environment Configurations: Maintain separate configuration files for dev, staging, and production.
- Automate Deployments: Use CI/CD pipelines to deploy asset bundles automatically (a GitHub Actions sketch follows this list).
- Monitor and Update Regularly: Ensure assets are updated and maintain compatibility with Databricks updates.
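For the automation point above, one common pattern is a GitHub Actions workflow that installs the CLI and deploys on every push. This is a hedged sketch, not a definitive setup: the workflow path, branch, target, and secret names are assumptions for your environment:

```yaml
# Hypothetical .github/workflows/deploy-bundle.yml
name: deploy-bundle
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main   # installs the Databricks CLI
      - run: databricks bundle deploy --target prod
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}     # assumed secret names
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
```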
Using Databricks with Evidence
For teams looking to integrate the two, Evidence provides a powerful way to create real-time reports and dashboards from Databricks data. With Evidence, you can:
- Visualize Databricks Data Pipelines with SQL-based reporting.
- Automate Data Workflows using Databricks and Evidence.
- Collaborate with Teams using version-controlled reporting and analytics.
Learn more about using Databricks with Evidence by visiting the Evidence documentation.