Discussions at the prism sprint planning suggest that Azure Data Lake is what Microsoft recommends for data storage feeding into Azure ML. We're currently using Blob storage, which is apparently less performant. It should be easy to:
copy some data into a data lake
create a datastore pointing to the data lake (or the relevant subset of its contents)
create a dataset from the datastore (this should work exactly as it does now, since the datastore abstracts away where the data is actually coming from)
create a notebook to compare data loading performance for the two options
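
A minimal sketch of the timing harness the comparison notebook could use. It is written against plain callables rather than the Azure ML SDK, so `load_from_blob` and `load_from_data_lake` below are hypothetical stand-ins that only sleep; in the real notebook each one would presumably wrap whatever call materialises the dataset for that backend. The function names and the choice of min/median/max as summary statistics are assumptions, not something agreed at planning.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_loader(load: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call `load` `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        load()
        durations.append(time.perf_counter() - start)
    return durations


def compare_loaders(
    loaders: Dict[str, Callable[[], object]], repeats: int = 5
) -> Dict[str, Dict[str, float]]:
    """Time every named loader and summarise its durations."""
    results = {}
    for name, load in loaders.items():
        durations = time_loader(load, repeats)
        results[name] = {
            "min": min(durations),
            "median": statistics.median(durations),
            "max": max(durations),
        }
    return results


if __name__ == "__main__":
    # Hypothetical stand-ins: replace the bodies with the real loading calls
    # for the Blob-backed and Data Lake-backed datasets.
    def load_from_blob() -> None:
        time.sleep(0.02)

    def load_from_data_lake() -> None:
        time.sleep(0.01)

    summary = compare_loaders(
        {"blob": load_from_blob, "data_lake": load_from_data_lake}, repeats=3
    )
    for name, stats in summary.items():
        print(f"{name}: median {stats['median']:.3f}s (min {stats['min']:.3f}s)")
```

Repeating each load matters because the first call may include connection setup or cold caches, so the median and minimum are usually more informative than any single run.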
STRETCH: it might be interesting to try using Azure Data Lake Analytics (ADLA, equivalent to AWS Athena) to query our tabular data files, and to compare getting data directly through it with getting it from an AML dataset. It might also be interesting to think about a lazy-loading structure backed by ADLA, e.g. Zarr.