Add documentation for wordbatch.py and update for batcher.py #15

Open: wants to merge 3 commits into master
wordbatch/batcher.py: 11 changes (8 additions & 3 deletions)
@@ -119,7 +119,7 @@ def split_batches(self, data, minibatch_size= None):
return data_split

def merge_batches(self, data):
- """Merge a list of data minibatches into one single instance representing the data
+ """Merge a list of data minibatches into one single data instance

Parameters
----------
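For readers of this diff, the split/merge round trip behind `split_batches` and `merge_batches` can be sketched in plain Python. This is a minimal illustration assuming the data is an ordinary list; `split_into_minibatches` and `merge_minibatches` are hypothetical helpers, not wordbatch's API, and the real methods may support other container types that this sketch does not attempt to reproduce.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_minibatches(data: Sequence[T], minibatch_size: int) -> List[List[T]]:
    """Hypothetical helper: cut `data` into consecutive chunks of `minibatch_size`.

    Every chunk has exactly `minibatch_size` items except possibly the last,
    which holds whatever remains.
    """
    if minibatch_size < 1:
        raise ValueError("minibatch_size must be a positive integer")
    return [list(data[i:i + minibatch_size]) for i in range(0, len(data), minibatch_size)]


def merge_minibatches(batches: List[List[T]]) -> List[T]:
    """Hypothetical helper: concatenate minibatches back into one list, in order."""
    merged: List[T] = []
    for batch in batches:
        merged.extend(batch)
    return merged


batches = split_into_minibatches(list(range(10)), minibatch_size=4)
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(merge_minibatches(batches) == list(range(10)))  # True
```

Merging is the inverse of splitting here only because order is preserved; a parallel backend must likewise return per-batch results in input order for the merged output to line up with the input.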
@@ -137,7 +137,12 @@ def merge_batches(self, data):

def parallelize_batches(self, task, data, args, method=None, timeout=-1, rdd_col= 1, input_split=False,
merge_output= True, minibatch_size= None, procs=None):
- """
+ """Apply a specified function/task to the given data in parallel
+
+ Data will be split into mini-batches unless explicitly told not to.
+ Workers then apply the function to the mini-batches in parallel: at any
+ given time, each process/thread runs exactly one worker, which applies
+ the function to exactly one mini-batch.

Parameters
----------
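The flow described in the new `parallelize_batches` docstring (split, apply in parallel, optionally merge) can be sketched with the standard library's `concurrent.futures`. This is a minimal illustration under stated assumptions, not wordbatch's implementation: `parallelize_sketch` and `scale` are hypothetical names, the data is assumed to be a list, `task` is assumed to be called as `task(minibatch, *args)` and to return a list, and a thread pool stands in for whatever worker pool wordbatch actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Optional, Sequence


def parallelize_sketch(
    task: Callable[..., List[Any]],
    data: Sequence[Any],
    args: Sequence[Any] = (),
    minibatch_size: int = 2,
    procs: Optional[int] = None,
    merge_output: bool = True,
) -> List[Any]:
    """Hypothetical stand-in for the split -> parallel apply -> merge flow."""
    # 1. Split the input into consecutive mini-batches (the last may be shorter).
    batches = [list(data[i:i + minibatch_size]) for i in range(0, len(data), minibatch_size)]
    # 2. Each worker thread applies the task to one mini-batch at a time;
    #    executor.map yields results in input order regardless of finish order.
    with ThreadPoolExecutor(max_workers=procs) as executor:
        results = list(executor.map(lambda batch: task(batch, *args), batches))
    # 3. Either return one result per mini-batch or concatenate them into one list.
    if not merge_output:
        return results
    return [item for batch_result in results for item in batch_result]


def scale(batch: List[int], factor: int) -> List[int]:
    """Example task: multiply every item of a mini-batch by `factor`."""
    return [x * factor for x in batch]


print(parallelize_sketch(scale, [1, 2, 3, 4, 5], args=[10]))
# [10, 20, 30, 40, 50]
print(parallelize_sketch(scale, [1, 2, 3, 4, 5], args=[10], merge_output=False))
# [[10, 20], [30, 40], [50]]
```

Threads keep the sketch simple; for CPU-bound tasks a `ProcessPoolExecutor` would usually be preferred, but then the per-batch callable must be picklable, so the lambda would have to become a module-level function.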
@@ -170,7 +175,7 @@ def parallelize_batches(self, task, data, args, method=None, timeout=-1, rdd_col

merge_output: boolean, default True
If True, results from minibatches will be reduced into one single instance before being returned.

minibatch_size: int
Expected size of each mini-batch on which the task is performed individually. The actual sizes will be
the same as the specified value except the last mini-batch, whose size might be exactly the same