Ideas for running dagster in the cloud #49
Comments
For 2, we can use |
Also we could still track it from dagster using the ExternalAsset resource if we wanted to! |
Does this mean we have the option to run it somewhere else, and Dagster can then just track whether the file is there at the end? |
I'm beginning to like 2 and 6 more. It means we have one mission control, but can scale things to different platforms as we need to |
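As a rough illustration of "track whether the file is there at the end", the check itself can be as small as the sketch below. It uses only the standard library; `describe_output` is a hypothetical helper name, and wiring the result into Dagster (for example from a sensor that reports the asset as materialized) is deliberately left out, since the thread does not say how that would be set up.

```python
import os
from datetime import datetime, timezone


def describe_output(path):
    """Return basic metadata for a finished output, or None if it is missing.

    Works for a single file or a directory (a Zarr store is a directory).
    `describe_output` is a hypothetical helper, not part of Dagster.
    """
    if not os.path.exists(path):
        return None
    stat = os.stat(path)
    return {
        "path": path,
        "is_dir": os.path.isdir(path),
        # For a directory this is the size of the directory entry, not its contents.
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
    }
```

Whatever runs the job elsewhere only needs to write the output to a location this check can see; the returned dictionary could then be attached as metadata when the result is recorded.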
For reference on Planetary Computer, this is basically the script I have that processes EUMETSAT data and saves to huggingface: https://github.com/jacobbieker/planetary-datasets/blob/main/planetary_datasets/conversion/zarr/eumetsat.py |
The only addition in the PC version is the steps that install Satip in the VM |
what do you mean? |
I have a very slightly modified version of that script that just has this at the top of the file, to make it simpler to run the script there and to install the missing dependencies in the VM. The VMs come with geospatial stuff already installed.

    """This gathers and uses the global mosaic of geostationary satellites from NOAA on AWS"""
    import subprocess

    def install(package):
        # Install into the notebook conda environment that ships with the VM
        subprocess.check_call(["/srv/conda/envs/notebook/bin/python", "-m", "pip", "install", package])

    install("datasets")
    install("satip")
    """Convert EUMETSAT raw imagery files to Zarr"""
    try:
        import satip
    except ImportError:
        print("Please install Satip to continue")
|
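A possible variation on the install step above, sketched under some assumptions: it skips the pip call when the module is already importable, and it uses `sys.executable` instead of the hard-coded interpreter path, which only matches the original behaviour if the script is run with the notebook environment's Python. `ensure_installed` is a hypothetical name, not something from the linked script.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(package, module_name=None):
    """Install `package` with pip only if its module cannot be found.

    Returns True if an install was attempted, False if the module was
    already available. `module_name` covers packages whose import name
    differs from their pip name; it should be a top-level module name.
    """
    module_name = module_name or package
    if importlib.util.find_spec(module_name) is not None:
        return False
    # Assumption: the current interpreter is the environment to install into.
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])
    # Make the freshly installed package visible to later imports.
    importlib.invalidate_caches()
    return True
```

Calling `ensure_installed("datasets")` and `ensure_installed("satip")` at the top of the script would then avoid reinstalling on every run of a VM that still has the packages.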
When we don't have enough internet bandwidth, it might be worth considering whether we can run things in the cloud.
There are quite a few different options.
Planetary Computer. This is free to run. Resources: 3.2 TB RAM + 400 dask cluster. Can start up VMs too.
Dataproc to run jobs. Lots of different frameworks supported.