Databricks: Pass Parameters To Notebooks With Python
So, you're diving into the world of Databricks and want to figure out how to pass parameters to your notebooks using Python? You've come to the right place! Passing parameters is super useful for making your notebooks dynamic and reusable. Let's break down exactly how to do this, making it easy to understand and implement.
Why Pass Parameters to Databricks Notebooks?
Before we dive into the how-to, let's quickly chat about why you'd even want to do this. Imagine you have a notebook that processes data, but you want to run it on different datasets or with different configurations each time. Instead of creating multiple notebooks (which would be a maintenance nightmare), you can pass parameters to control the notebook's behavior. This way, you're keeping things DRY (Don't Repeat Yourself) and making your workflow much more efficient.
Passing parameters helps you achieve:
- Reusability: Use the same notebook for different scenarios.
- Flexibility: Adjust the notebook's behavior without modifying the code.
- Automation: Integrate notebooks into automated workflows with varying inputs.
So, if you want to level up your Databricks game and make your notebooks more powerful, learning how to pass parameters is a must!
Methods for Passing Parameters
There are a couple of ways to pass parameters to a Databricks notebook using Python. We'll cover two common methods:
- Using dbutils.widgets
- Using the %run command
Method 1: Using dbutils.widgets
The dbutils.widgets utility is specifically designed for creating interactive widgets within Databricks notebooks. These widgets can then be used as parameters in your code.
Step 1: Create Widgets
First, you need to create widgets in your notebook. You can create different types of widgets like text boxes, dropdowns, and more. Here’s how to create a text widget:
dbutils.widgets.text("input_param", "", "Enter a value:")
In this code:
- "input_param" is the name of the widget (and the parameter you'll use later).
- "" is the default value (an empty string in this case).
- "Enter a value:" is the label displayed next to the widget.
You can also create other types of widgets. For example, a dropdown widget:
dbutils.widgets.dropdown("dropdown_param", "option1", ["option1", "option2", "option3"], "Select an option:")
Here:
- "dropdown_param" is the name of the dropdown widget.
- "option1" is the default selected option.
- ["option1", "option2", "option3"] is the list of available options.
- "Select an option:" is the label for the dropdown.
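The dbutils object only exists inside Databricks, which makes widget-driven code awkward to test elsewhere. A minimal sketch of one workaround, assuming you want to exercise notebook logic locally: a hypothetical in-memory stand-in that mirrors the positional argument order of the text() and dropdown() calls above. The class FakeWidgets and its set() method are our own invention for illustration, not part of any Databricks API.

```python
# Hypothetical local stand-in for dbutils.widgets; NOT part of Databricks.
# It mimics only text(), dropdown() and get() so notebook logic can be
# exercised outside a Databricks cluster.
class FakeWidgets:
    def __init__(self):
        self._values = {}   # widget name -> current value (always a string)
        self._choices = {}  # dropdown name -> list of allowed options

    def text(self, name, default_value, label=None):
        # Register a text widget whose initial value is the default.
        self._values[name] = str(default_value)

    def dropdown(self, name, default_value, choices, label=None):
        # This fake rejects a default that is not one of the choices.
        if default_value not in choices:
            raise ValueError(f"default {default_value!r} not in {choices!r}")
        self._choices[name] = list(choices)
        self._values[name] = default_value

    def set(self, name, value):
        # Hypothetical helper simulating a user typing or selecting a value.
        if name not in self._values:
            raise KeyError(f"no widget named {name!r}")
        if name in self._choices and value not in self._choices[name]:
            raise ValueError(f"{value!r} is not a valid option for {name!r}")
        self._values[name] = str(value)

    def get(self, name):
        # Like dbutils.widgets.get(), return the current value as a string.
        return self._values[name]


widgets = FakeWidgets()
widgets.text("input_param", "", "Enter a value:")
widgets.dropdown("dropdown_param", "option1", ["option1", "option2", "option3"], "Select an option:")
widgets.set("input_param", "25")
print(widgets.get("input_param"))     # 25
print(widgets.get("dropdown_param"))  # option1
```

Because every value is stored as a string, the fake also reproduces the need to convert values before doing arithmetic or comparisons with them.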
Step 2: Access Widget Values
Once you've created your widgets, you can access their values using dbutils.widgets.get():
input_value = dbutils.widgets.get("input_param")
dropdown_value = dbutils.widgets.get("dropdown_param")
print(f"Input Value: {input_value}")
print(f"Dropdown Value: {dropdown_value}")
This code retrieves the values entered or selected in the widgets and stores them in variables. Now you can use these variables in your notebook code.
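When a notebook reads several parameters, it can help to gather them in one place and validate them before any processing starts. The sketch below assumes the raw strings have already been collected into a dict (in a notebook, for example, with `{name: dbutils.widgets.get(name) for name in ("input_param", "dropdown_param")}`); the NotebookParams class and build_params() function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Allowed dropdown options, copied from the dropdown example above.
VALID_OPTIONS = ("option1", "option2", "option3")


@dataclass(frozen=True)
class NotebookParams:
    # Hypothetical container for the two widget values used in this section.
    input_param: str
    dropdown_param: str


def build_params(raw):
    """Build a NotebookParams from a dict of raw widget strings.

    `raw` stands in for the strings returned by dbutils.widgets.get().
    """
    choice = raw["dropdown_param"]
    # Defensive check so an unexpected value fails fast with a clear message.
    if choice not in VALID_OPTIONS:
        raise ValueError(f"unexpected dropdown value: {choice!r}")
    # Strip stray whitespace a user may have typed into the text box.
    return NotebookParams(input_param=raw["input_param"].strip(), dropdown_param=choice)


params = build_params({"input_param": " 25 ", "dropdown_param": "option2"})
print(params)  # NotebookParams(input_param='25', dropdown_param='option2')
```

A frozen dataclass keeps the parameters read-only for the rest of the notebook, so later cells cannot accidentally overwrite them.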
Step 3: Use Parameters in Your Code
Now that you have the parameter values, you can use them in your data processing or analysis.
# Example: Filtering a DataFrame based on the input parameter
import pandas as pd
# Sample DataFrame
data = {
'ID': [1, 2, 3, 4, 5],
'Value': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)
# Filter the DataFrame based on the input value
filtered_df = df[df['Value'] > int(input_value)]
print("Filtered DataFrame:")
print(filtered_df)
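In this example, we're filtering a pandas DataFrame based on the value entered in the input_param widget. Widget values are always returned as strings, so convert the input value to the appropriate data type (e.g., with int() or float()) before comparing it. Note that int() raises a ValueError if the widget is left at its empty default.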
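One way to avoid the ValueError from an empty widget is to convert the string with an explicit fallback. This is a minimal sketch using a hypothetical helper, parse_int_param(); the literal "" passed to it below stands in for the value dbutils.widgets.get("input_param") would return when the widget is left empty.

```python
import pandas as pd


def parse_int_param(raw, default=None):
    """Convert a widget string to int, treating an empty value as `default`.

    Hypothetical helper: widget values arrive as strings, and int("")
    raises ValueError.
    """
    text = raw.strip()
    if text == "":
        if default is None:
            raise ValueError("parameter is empty and no default was given")
        return default
    try:
        return int(text)
    except ValueError:
        raise ValueError(f"expected an integer, got {raw!r}") from None


df = pd.DataFrame({"ID": [1, 2, 3, 4, 5], "Value": [10, 20, 30, 40, 50]})

# In Databricks the string would come from dbutils.widgets.get("input_param").
threshold = parse_int_param("", default=0)
filtered_df = df[df["Value"] > threshold]
print(filtered_df)
```

With the fallback of 0, every row passes the filter; a non-numeric entry such as "abc" still fails, but with a message that names the offending value.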
Complete Example
Here's a complete example of how to use dbutils.widgets to pass parameters and process data:
import pandas as pd
# Create a text widget
dbutils.widgets.text("threshold", "5", "Enter a threshold value:")
# Get the value from the widget
threshold_value = int(dbutils.widgets.get("threshold"))
# Sample data (replace with your actual data)
data = {
'ID': [1, 2, 3, 4, 5],
'Value': [10, 20, 30, 4, 50]
}
# Build a pandas DataFrame and convert it to a Spark DataFrame (spark is predefined in Databricks notebooks)
df = spark.createDataFrame(pd.DataFrame(data))
# Filter the DataFrame based on the threshold value
filtered_df = df.filter(df["Value"] > threshold_value)
# Show the filtered DataFrame
display(filtered_df)
This code snippet creates a text widget named threshold, reads its value and converts it to an integer, and then keeps only the rows whose Value exceeds that threshold. The Databricks display() function renders the resulting DataFrame as a table.
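A common refinement is to keep the filtering logic in a plain function and read the parameter through a small wrapper, so the same code can run both in Databricks and locally. This sketch assumes that outside Databricks the global name dbutils is undefined, so referencing it raises NameError; get_param() and filter_by_threshold() are hypothetical names, and pandas is used instead of Spark so the example runs without a cluster.

```python
import pandas as pd


def get_param(name, default):
    """Return a widget value, or `default` when dbutils is unavailable.

    Assumption: outside Databricks the global `dbutils` is not defined,
    so referencing it raises NameError. Inside a notebook the widget
    must already exist, otherwise dbutils.widgets.get() raises an error.
    """
    try:
        return dbutils.widgets.get(name)  # noqa: F821 (only defined in Databricks)
    except NameError:
        return default


def filter_by_threshold(df, threshold):
    # Pure function: keep rows whose Value is strictly above the threshold.
    return df[df["Value"] > threshold]


threshold_value = int(get_param("threshold", "5"))
data = pd.DataFrame({"ID": [1, 2, 3, 4, 5], "Value": [10, 20, 30, 4, 50]})
print(filter_by_threshold(data, threshold_value))
```

Run locally, the fallback threshold of 5 drops only the row with Value 4. Keeping the filter in its own function also makes it easy to unit-test without creating any widgets.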
Method 2: Using the %run command
Another way to pass parameters to a Databricks notebook is the %run command. It executes another notebook inline, in the same context as your current notebook, and lets you pass parameter values that the called notebook reads through its widgets.
Step 1: Define Parameters in the Called Notebook
In the notebook that you want to call (the