Add explainers and minor datastore implementations #29

Open · wants to merge 14 commits into master

Conversation

connermanuel (Collaborator)

Two things:

  • Adds LIME explainability to the machine learning modules (a minimal sketch follows this list)
  • Implements the datastore (DS) on the fairness module
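
For reference, a minimal sketch of what a LIME tabular explanation looks like; the model, data, and feature names below are made up for illustration and are not this repo's actual integration:

    # Illustrative only: a LIME explanation for one prediction of a toy
    # scikit-learn regressor. Data and feature names are fabricated.
    import numpy as np
    from lime.lime_tabular import LimeTabularExplainer
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 4))
    y_train = 2.0 * X_train[:, 0] + rng.normal(scale=0.1, size=200)
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

    explainer = LimeTabularExplainer(
        X_train,
        feature_names=["f0", "f1", "f2", "f3"],
        mode="regression",
    )
    # Explain a single row; as_list() returns (feature, weight) pairs from the local model.
    explanation = explainer.explain_instance(X_train[0], model.predict, num_features=4)
    print(explanation.as_list())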

TODO:

  • Currently the behavior of the datastore is inconsistent depending on whether or not you have Hadoop installed (PySpark was looking for local files in HDFS by default on my machine).
  • Right now the featurizer uses the datastore data as its unweighted data, but to weight the data we use a trick: for each record with weight w, we make w copies of it (sketched after this list). This creates an entirely new copy of the dataframe, held in memory, and one with duplicated rows at that, so ideally we would avoid it because it consumes a lot of memory.
    (Let me know if you'd like me to make issues for these)
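
A minimal sketch of the row-duplication weighting trick described in the second TODO, assuming an integer weight column; column and variable names are illustrative, not the project's actual schema:

    # Illustrative only: replicate each record `weight` times so downstream code
    # can treat the data as unweighted. This is the step that multiplies memory use,
    # since the weighted frame holds sum(weights) rows.
    # (array_repeat with a Column count needs PySpark 3.x.)
    import pyspark.sql.functions as F
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("a", 1), ("b", 3)], ["record_id", "weight"])

    weighted = (
        df.withColumn("_copy", F.explode(F.array_repeat(F.lit(1), F.col("weight"))))
          .drop("_copy")
    )
    weighted.show()  # "a" appears once, "b" three times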

@LucioMelito (Collaborator) left a comment

Looks good to me!

Two things:

  • It would be great if you could write a bit more in the docstrings, using the reStructuredText format; this will let us populate the API docs automatically (a minimal example is sketched after this comment)
  • We started using pytest for testing instead of notebooks, although we only have tests for the datastore at the moment; if you want to try writing tests like that for fairness it'd be great, but no pressure

Also go ahead and add those two issues.
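
On the first point, a minimal sketch of a docstring in the reStructuredText field-list style that Sphinx can pull into API docs; the function and its parameters are hypothetical, not existing project code:

    def weight_records(df, weight_col="weight"):
        """Replicate each record according to its integer weight.

        :param df: input dataframe with one row per record.
        :type df: pyspark.sql.DataFrame
        :param weight_col: name of the integer weight column, defaults to "weight".
        :type weight_col: str
        :return: dataframe with each row repeated ``weight`` times.
        :rtype: pyspark.sql.DataFrame
        """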

@@ -82,7 +91,8 @@ def __init__(self, cfg_dir: str, spark: bool = True):
self.mobiledata: SparkDataFrame
self.mobilemoney: SparkDataFrame
self.antennas: SparkDataFrame
self.shapefiles: Union[Dict[str, GeoDataFrame]] = {}
# TODO: was this supposed to be a TypedDict?
Collaborator left a comment

That's supposed to be the dictionary of shapefiles for tower Voronoi polygons and/or admin units. I think the Union part is a mistake, and we can change Dict to Mapping.
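
For concreteness, a sketch of what that suggestion could look like; this just spells out the comment above (inside the class it would be self.shapefiles), not a committed change:

    # Suggested shape of the attribute: a mapping from layer name (e.g. tower
    # Voronoi polygons, admin units) to its GeoDataFrame, without the stray Union.
    from typing import Mapping

    from geopandas import GeoDataFrame

    shapefiles: Mapping[str, GeoDataFrame] = {}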

@@ -29,7 +29,8 @@ def __init__(self,
clean_folders: bool = False) -> None:
self.cfg = datastore.cfg
self.ds = datastore
-self.outputs = datastore.outputs + 'featurizer/'
+self.outputs = datastore.outputs + '/featurizer/'
self.spark_outputs = datastore.spark_outputs + '/featurizer/'
Collaborator left a comment

Another thing we should do is change all these statements to use os.path.join; feel free to do it, or we can leave it for later.
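
A sketch of what that change could look like for the lines in this diff; the class name is assumed for illustration, and downstream code that appends filenames to these paths would need os.path.join as well, since the trailing separator goes away:

    import os

    class Featurizer:  # class name assumed for illustration
        def __init__(self, datastore):
            # os.path.join inserts separators itself, so the leading/trailing '/'
            # bookkeeping in the string concatenation above is no longer needed.
            self.outputs = os.path.join(datastore.outputs, 'featurizer')
            self.spark_outputs = os.path.join(datastore.spark_outputs, 'featurizer')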

@review-notebook-app
Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.
