Replies: 2 comments 6 replies
-
cc @jsignell
-
scatter will, by default, create a copy on every worker. How many workers there are depends on how many you are requesting; if you do not specify anything, it depends on the number of CPUs your machine has. In general, this is not what you are asking for, and unless you have a very specific reason to use scatter (imho, most users use this by accident) a better approach might be:

```python
import pandas
from distributed.client import Client

c = Client()

def generate_dataframe():
    return pandas.DataFrame([1, 2, 3])

# Build the dataframe on a worker instead of scattering it from the client.
df_fut = c.submit(generate_dataframe)

# Only gather the result back to the main process if you actually need it there.
df2 = c.gather(df_fut)

def foo(df):
    do_stuff_with_df(df)  # placeholder for whatever you want to do with the dataframe

# Pass the future directly; the worker resolves it without routing the data
# back through the main process.
c.submit(foo, df_fut)
```

I can also recommend looking into dask.delayed, which is a higher-level interface for a similar set of features (a minimal sketch follows below).
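For anyone new to that interface, here is a minimal sketch of the same pattern with dask.delayed; the function and column names are illustrative, not taken from this thread:

```python
import pandas
import dask
from distributed.client import Client

client = Client()  # a local cluster is assumed for this sketch

@dask.delayed
def generate_dataframe():
    # Runs on a worker, so no dataframe is copied from the main process.
    return pandas.DataFrame([1, 2, 3])

@dask.delayed
def add_doubled_column(df):
    # Also runs on a worker; the intermediate result stays in cluster memory.
    return df.assign(doubled=df[0] * 2)

# Nothing executes until compute(); only the final result is pulled back
# to the main process.
result = add_doubled_column(generate_dataframe()).compute()
```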
-
Hi guys,
I would like to find out how many data copies occur when getting an object from distributed memory. An example:
Would `df2` be a copy of `df` at the main process and `df3` at the worker process (2 copies)? Thanks in advance!