Defining your first asset
In this course, you’ll use data from NYC OpenData to analyze New York City taxi rides. The first asset you’ll define uses data from TLC Trip Record Data, which contains trip records for several types of vehicles. However, we’ll focus on trip data for yellow cabs in this asset.
Your first asset, which you’ll name taxi_trips_file
, will retrieve the yellow taxi trip data for March 2023 and save it to a location on your local machine.
Project structure
Before we write our first asset, let's talk a little about project structures in Dagster. In the previous lesson we mentioned dg
and how it offers a lot of helpful functionality to quickstart our project. We can use commands like dg scaffold project
to initialize a uv
virtual environment for us but we already took care of that when we set up the course in lesson 2.
However we can use dg
to scaffold a file for our first asset. Run the following command to create the file that will contain our first asset.
dg scaffold dagster.asset assets/trips.py
This will add a trips.py
file to our Dagster project.
.
└── dagster_essentials
└── defs
└── assets
├── __init__.py
├── constants.py # already present
└── trips.py
Note: If we were starting a project from scratch we would use dg init
which will handle the creation of our virtual environment. However since we already have a virtual environment defined, we can skip this step.
Using dg
to scaffold your project will ensure that files are placed in the correct location. We can ensure that everything is configured correctly also using dg
.
> dg check defs
No definitions are defined for this project.
This command will confirm that our project is laid out correctly. Next we can use dg
to list all the objects in our project.
> dg list defs
No definitions are defined for this project.
This makes sense because even though we created the file that will contain our asset, we have not yet included the code.
Defining your first asset
With the files set we can now add our first asset.
Navigate and open the newly created
defs/assets/trips.py
file in your Dagster project. This is where you’ll write your asset code.Within the
trips.py
file, remove the generated code from the scaffolding and replace it with the following imports:import requests from dagster_essentials.defs.assets import constants
Below the imports, let's define a function that takes no inputs and returns nothing (type-annoted with
None
). Add the following code to create a function to do this namedtaxi_trips_file
:def taxi_trips_file() -> None: """ The raw parquet files for the taxi trips dataset. Sourced from the NYC Open Data portal. """ month_to_fetch = '2023-03' raw_trips = requests.get( f"https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_{month_to_fetch}.parquet" ) with open(constants.TAXI_TRIPS_TEMPLATE_FILE_PATH.format(month_to_fetch), "wb") as output_file: output_file.write(raw_trips.content)
To turn the function into an asset in Dagster, you’ll need to do two things:
Import the Dagster library:
import dagster as dg
Add the
@dg.asset
decorator before thetaxi_trips_file
function. At this point, your code should look like this:import requests from dagster_essentials.defs.assets import constants import dagster as dg @dg.asset def taxi_trips_file() -> None: """ The raw parquet files for the taxi trips dataset. Sourced from the NYC Open Data portal. """ month_to_fetch = '2023-03' raw_trips = requests.get( f"https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_{month_to_fetch}.parquet" ) with open(constants.TAXI_TRIPS_TEMPLATE_FILE_PATH.format(month_to_fetch), "wb") as output_file: output_file.write(raw_trips.content)
That’s it - you’ve created your first Dagster asset! Using the @dg.asset
decorator, you can easily turn any existing Python function into a Dagster asset.
We can use dg
again to check our asset:
> dg check defs
No definitions are defined for this project.
And now when we run dg list defs
our asset will register:
> dg list defs
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Key ┃ Group ┃ Deps ┃ Kinds ┃ Description ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ taxi_trips_file │ default │ │ │ The raw parquet files for the taxi trips dataset. Sourced │
│ │ │ │ │ from the NYC Open Data portal. │
└─────────────────┴─────────┴──────┴───────┴──────────────────────────────────────────────────────────────┘
Questions about the -> None
bit? That's a Python feature called type annotation. In this case, it's saying that the function returns nothing. You can learn more about type annotations in the Python documentation. We highly recommend using type annotations in your code to make it easier to read and understand.