Squashed commit of the following:
commit 60441a1
Author: Steven Goldenberg <[email protected]>
Date:   Thu May 23 11:59:21 2024 -0400

    Upload of basic Tensorflow model

    Still needs a unit test, but the model works well for relatively standard models. Subclassed models may need specialized modules.

commit b009362
Author: Steven Goldenberg <[email protected]>
Date:   Fri May 10 11:46:50 2024 -0400

    Squashed commit of the following:

    commit 7ff070a
    Author: Steven Goldenberg <[email protected]>
    Date:   Thu May 9 17:07:04 2024 -0400

        Format pandas_standard_scaler using black

        Unit tests still pass, and the changes are mostly single quotes converted to double quotes plus removed spaces.

    commit dac7a31
    Author: Steven Goldenberg <[email protected]>
    Date:   Thu May 9 16:17:17 2024 -0400

        Make unittest for config IO

    commit 447a444
    Author: Steven Goldenberg <[email protected]>
    Date:   Thu May 9 15:35:09 2024 -0400

        Fix more documentation in pandas_standard_scaler

    commit 1a46420
    Author: Steven Goldenberg <[email protected]>
    Date:   Thu May 9 14:35:00 2024 -0400

        Update pandas_standard_scaler.py

        Adding lots of additional documentation. Some functions aren't fully documented, but this is a very good start.

    commit de25bd6
    Author: Steven Goldenberg <[email protected]>
    Date:   Thu May 9 13:21:29 2024 -0400

        Rename IO functions for yaml configurations

        [save/load]_config --> [save/load]_yaml_config to avoid confusion if we want another version for JSON or something else.

    commit a001284
    Author: Steven Goldenberg <[email protected]>
    Date:   Wed May 8 14:52:09 2024 -0400

        Simplify save/load config

        Saving and loading configurations are now done by utility functions in the utils folder. This simplifies the modules and allows for unit testing on the config I/O functions outside of any module.
        Utility functions try to properly handle FileNotFoundError and FileExistsError.

    commit 83f019f
    Merge: a3f758f c63e630
    Author: Steven Goldenberg <[email protected]>
    Date:   Thu May 2 09:25:19 2024 -0400

        Merge branch 'main' into 36-make-pandasstandardscaler-for-hugs

    commit a3f758f
    Author: Steven Goldenberg <[email protected]>
    Date:   Mon Apr 29 15:12:56 2024 -0400

        Fix reverse() and add unit test

    commit 79566d2
    Author: Steven Goldenberg <[email protected]>
    Date:   Mon Apr 29 14:53:35 2024 -0400

        Full implementation of scaler with unittests

        The implementation avoids using scikit-learn entirely for a number of reasons including no option for axis changes and no option for changing the epsilon value when dealing with small variances.

    commit ca21fcf
    Author: Steven Goldenberg <[email protected]>
    Date:   Fri Apr 26 17:35:18 2024 -0400

        Started implementation of PandasStandardScaler

        Using scikit-learn's StandardScaler as a base. Saving and loading is a bit tricky as the internal state of the scikit-learn implementation isn't easily saved. It looks like you can save its internal __dict__ and then set the attributes on load, but I'm not sure if this is robust...

    commit 574d0ac
    Author: Steven Goldenberg <[email protected]>
    Date:   Fri Apr 26 16:17:59 2024 -0400

        Fix tab bug in pandas_standard_scaler.py

    commit 676d640
    Author: Steven Goldenberg <[email protected]>
    Date:   Fri Apr 26 16:07:31 2024 -0400

        Create pandas_standard_scaler.py

        Initial upload of a standard scaler that supports pandas dataframes as input. Maybe it should be renamed to DataFrame scaler to match the parser_to_dataframe.py file...

commit c63e630
Merge: eea09a5 ff117c0
Author: Steven Goldenberg <[email protected]>
Date:   Thu May 2 09:22:45 2024 -0400

    Merge pull request #35 from JeffersonLab/24-common-csv-parser

    24 common csv parser

commit ff117c0
Author: Steven Goldenberg <[email protected]>
Date:   Thu May 2 09:18:31 2024 -0400

    Delete csv_parser_v0.py

    The functionality of the CSV parser is now handled by Parser2DataFrame, and csv_parser_v0.py is no longer used as the entry point for the registered CSVParser. Keeping this code would only make the repository more confusing.

commit e11864f
Author: Steven Goldenberg <[email protected]>
Date:   Thu May 2 09:09:36 2024 -0400

    Save_Load Unit Tests

    Also fixes a few documentation errors and the unittest logging (now -v turns on extra logging from the module)
sgoldenCS committed May 23, 2024
1 parent e9bd796 commit a851c5d
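
Commit 79566d2 above describes a scaler written without scikit-learn so that the axis and the small-variance epsilon can be configured. The following is a minimal sketch of that idea, not the toolkit's PandasStandardScaler. The function names `fit_stats` and `transform`, the default `epsilon`, and the choice to replace near-zero standard deviations with 1.0 are all assumptions made for illustration; only the name `reverse` comes from the commit log.

```python
import pandas as pd


def fit_stats(df: pd.DataFrame, axis: int = 0, epsilon: float = 1e-7):
    """Return (mean, scale) computed along `axis` (0: per column, 1: per row)."""
    mean = df.mean(axis=axis)
    std = df.std(axis=axis, ddof=0)  # population standard deviation
    # Assumption: spreads at or below epsilon are treated as constant data,
    # so the scale falls back to 1.0 instead of dividing by nearly zero.
    scale = std.where(std > epsilon, 1.0)
    return mean, scale


def transform(df: pd.DataFrame, mean, scale, axis: int = 0) -> pd.DataFrame:
    # Statistics reduced along `axis` are broadcast along the other axis.
    return df.sub(mean, axis=1 - axis).div(scale, axis=1 - axis)


def reverse(scaled: pd.DataFrame, mean, scale, axis: int = 0) -> pd.DataFrame:
    # Undo transform(): multiply by the scale, then add the mean back.
    return scaled.mul(scale, axis=1 - axis).add(mean, axis=1 - axis)


# Column "b" is constant, so its standard deviation is 0 and the epsilon rule applies.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [5.0, 5.0, 5.0]})
mean, scale = fit_stats(df)
scaled = transform(df, mean, scale)
restored = reverse(scaled, mean, scale)
```

Keeping the statistics as plain pandas Series is what would make them straightforward to store in a configuration file, which is the difficulty the earlier commit ca21fcf notes with scikit-learn's internal state.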
Showing 12 changed files with 673 additions and 143 deletions.
31 changes: 31 additions & 0 deletions jlab_datascience_toolkit/cfgs/base_model_cfg.yaml
@@ -0,0 +1,31 @@
+model_config:
+  class_name: Sequential
+  config:
+    dtype: float32
+    layers:
+    - class_name: Dense
+      config:
+        activation: relu
+        units: 128
+      module: keras.layers
+      registered_name: null
+    - class_name: Dense
+      config:
+        units: 1
+      module: keras.layers
+      registered_name: null
+    name: basic_model
+    trainable: true
+  module: keras
+  registered_name: null
+compile_config:
+  loss:
+    class_name: BinaryCrossentropy
+    config:
+      from_logits: true
+    module: keras.losses
+    registered_name: null
+  metrics:
+  - accuracy
+  optimizer: adam
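
In this config the final Dense layer has no activation and the loss is BinaryCrossentropy with `from_logits: true`, so the model is expected to output raw logits instead of probabilities. The sketch below shows, in plain Python and not Keras, what a loss of that kind computes for a single example and why it matches the sigmoid-based form. The function names are made up for illustration.

```python
import math


def bce_from_logits(logit: float, label: float) -> float:
    """Binary cross-entropy for one raw logit, in a numerically stable form."""
    # max(z, 0) - z*y + log(1 + exp(-|z|)) is algebraically equal to
    # -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))].
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def bce_from_probability(logit: float, label: float) -> float:
    """Same loss via an explicit sigmoid (less stable for large |logit|)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


# Both forms agree for a moderate logit.
stable = bce_from_logits(2.0, 0.0)
naive = bce_from_probability(2.0, 0.0)
```

At inference time the logit would still need a sigmoid applied to be read as a probability.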

128 changes: 0 additions & 128 deletions jlab_datascience_toolkit/data_parser/csv_parser_v0.py

This file was deleted.

31 changes: 17 additions & 14 deletions jlab_datascience_toolkit/data_parser/parser_to_dataframe.py
@@ -1,4 +1,5 @@
 from jlab_datascience_toolkit.core.jdst_data_parser import JDSTDataParser
+from jlab_datascience_toolkit.utils.io import save_yaml_config, load_yaml_config
 from pathlib import Path
 import pandas as pd
 import logging
@@ -30,8 +31,9 @@ class Parser2DataFrame(JDSTDataParser):
         Format of files to parse. Currently supports csv, feather, json
         and pickle. Defaults to csv
     `read_kwargs: dict = {}`
-        Arguments to be passed
+        Arguments to be passed to the read function determined by `file_format`
     `concat_kwargs: dict = {}`
+        Arguments to be passed to pd.concat()

     Attributes
     ----------
@@ -118,11 +120,7 @@ def load(self, path: str):
             path (str): Path to folder containing module files.
         """
         base_path = Path(path)
-        with open(base_path.joinpath('config.yaml'), 'r') as f:
-            loaded_config = yaml.safe_load(f)
-
-        self.config.update(loaded_config)
-        self.setup()
+        self.load_config(base_path)

     def save(self, path: str):
         """Save the entire module state to a folder at `path`
@@ -132,8 +130,7 @@ def save(self, path: str):
         """
         save_dir = Path(path)
         os.makedirs(save_dir)
-        with open(save_dir.joinpath('config.yaml'), 'w') as f:
-            yaml.safe_dump(self.config, f)
+        self.save_config(save_dir)

     def load_data(self) -> pd.DataFrame:
         """ Loads all files listed in `config['filepaths']`
@@ -165,13 +162,19 @@ def load_data(self) -> pd.DataFrame:

         return output

-    def load_config(self, path: str):
-        parser_log.debug('Calling load()...')
-        return self.load(path)
+    def load_config(self, path: Path | str):
+        self.config.update(load_yaml_config(path))
+        self.setup()
+
+    def save_config(self, path: Path | str, overwrite=False):
+        """ Saves this module's configuration to the file specified by path
+        If path is a directory, we save the configuration as config.yaml

-    def save_config(self, path: str):
-        parser_log.debug('Calling save()...')
-        return self.save(path)
+        Args:
+            path (Path | str): Location for saved configuration. Either a filename or directory is
+                acceptable.
+        """
+        save_yaml_config(self.config, path, overwrite)

     def save_data(self):
         return super().save_data()
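
The bodies of `save_yaml_config` and `load_yaml_config` are not part of the files shown here. As a rough, standard-library-only sketch of the behaviour the docstring and commit messages describe (a path that may be a file or a directory, an `overwrite` flag, explicit FileNotFoundError and FileExistsError), here is a JSON analogue. The names `save_json_config` and `load_json_config` and the default filename `config.json` are hypothetical, and the real YAML helpers may differ in detail.

```python
import json
import tempfile
from pathlib import Path


def _resolve(path) -> Path:
    """Treat a directory as shorthand for <directory>/config.json (assumed default)."""
    path = Path(path)
    return path / "config.json" if path.is_dir() else path


def save_json_config(config: dict, path, overwrite: bool = False) -> Path:
    target = _resolve(path)
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists; pass overwrite=True to replace it")
    with open(target, "w") as f:
        json.dump(config, f, indent=2)
    return target


def load_json_config(path) -> dict:
    target = _resolve(path)
    if not target.exists():
        raise FileNotFoundError(f"No configuration found at {target}")
    with open(target) as f:
        return json.load(f)


# Round trip through a temporary directory, then confirm a second save is refused.
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_json_config({"file_format": "csv"}, tmp)
    restored = load_json_config(tmp)
    try:
        save_json_config({}, tmp)
        refused = False
    except FileExistsError:
        refused = True
```

Swapping `json.dump` and `json.load` for `yaml.safe_dump` and `yaml.safe_load` (the calls the removed lines used) would give a YAML form of the same sketch. Keeping this logic outside the module is what lets it be unit tested on its own, as the commit message for a001284 points out.
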
4 changes: 4 additions & 0 deletions jlab_datascience_toolkit/data_prep/__init__.py
@@ -7,3 +7,7 @@

 from jlab_datascience_toolkit.data_prep.numpy_minmax_scaler import NumpyMinMaxScaler

+register(
+    id = "PandasStandardScaler_v0",
+    entry_point="jlab_datascience_toolkit.data_prep.pandas_standard_scaler:PandasStandardScaler"
+)