Add rolling-statistics to Projects #32

Open

jameno wants to merge 70 commits into master from jm/rolling-statistics

Contributor

jameno commented Oct 19, 2018

New pull-request branch for the rolling statistics project

ed-nykaza and others added 3 commits

October 18, 2018 12:23


          setting -2 for endIndex default to the length of the unique donors

05b07d2


          add missing argument to checkIndex

77ef21d


          Add rolling-statistics to Projects

67527e4

New pull-request branch for the rolling statistics project

jameno requested a review from ed-nykaza

October 19, 2018 19:19

ed-nykaza and others added 17 commits

October 23, 2018 10:59


          add batch processing for local time estimate with multicores


          create a place holder when a download in progress

0a29e7c


          run on multiple cores

cf0679a


          add the processing time to the script

59a0804


          check track of run time

4ff76a1


          keep track of run time

526dbed


          make anonymizing optional and change input to a string

6d647f8


          sort columns by alpha order, then <>.<>, and then fields w embedded json

737ffb6


          keep track of processing time and clean up code and output

fafc768


          Merge pull request #33 from tidepool-org/etn/opt-for-aws

230c93b

Etn/opt for aws


          initial commit

5e90969


          Initial commit of node-data-tools

97dc327


          Refactor processing into functions

5da5d25


          Refactor xlsxStreamWriter into function

4c931d7


          merge on "id" since "id" is unique per each row of data

58806a0

This fixes a bug where the scheduleNames were being appended to data types that did not have that data, because the merge was happening on "time" instead of "id". In other words, if another data type occurred at the same "time" as the scheduleName, then the schedule data was appended to that row of data as well.
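
The failure mode described in this commit message can be reproduced with a small, dependency-free sketch. The rows and the `left_join` helper below are hypothetical (the project itself merges with pandas); they only illustrate why a key that is not unique per row, such as "time", attaches schedule data to unrelated rows, while "id" does not.

```python
def left_join(rows, lookup_rows, key):
    """Left-join lookup_rows onto rows using the given key (hypothetical helper)."""
    lookup = {r[key]: r for r in lookup_rows}
    joined = []
    for row in rows:
        match = lookup.get(row[key], {})
        # Only add fields the original row does not already have.
        extra = {k: v for k, v in match.items() if k not in row}
        joined.append({**row, **extra})
    return joined


# Two records of different types that share the same timestamp.
data = [
    {"id": "a1", "time": "2018-10-01T00:00:00", "type": "basal"},
    {"id": "b2", "time": "2018-10-01T00:00:00", "type": "cbg"},
]
# A schedule name exists only for the basal record.
schedules = [
    {"id": "a1", "time": "2018-10-01T00:00:00", "scheduleName": "Weekday"},
]

by_time = left_join(data, schedules, key="time")
by_id = left_join(data, schedules, key="id")

print([row.get("scheduleName") for row in by_time])  # both rows get "Weekday"
print([row.get("scheduleName") for row in by_id])    # only the basal row does
```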


          Merge pull request #34 from tidepool-org/etn/fix-scheduleName-merge-bug

d08970f

Etn/fix schedule name merge bug


          Further refactoring to export a library

c5691a6

ed-nykaza suggested changes

Contributor

ed-nykaza left a comment

@jameno great work! There are lots of minor, nitpicky things in here, which are not a reflection of the generally good quality of your work. I am happy to discuss if you want.

projects/rolling-statistics/rolling_statistics.py Outdated

+              Created: 9/15/2018
+              author: Jason Meno
+              dependencies:
+                  * requires Tidepool user's data with est.localTime

Contributor

ed-nykaza Oct 19, 2018

I understand why you made this a requirement, but I think that this could be made more general by having the user specify which time field to use. I suggest adding this feature to your TODO list, rather than adding the feature now.
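
One way the suggested feature might look, as a minimal sketch only: the `--time-field` flag, its `timeField` destination, and the `deviceTime` value are assumptions made for illustration and are not part of the script under review. The default keeps the current behavior of using `est.localTime`.

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical option for choosing the time column."""
    parser = argparse.ArgumentParser(description="Rolling statistics (sketch)")
    parser.add_argument(
        "--time-field",
        dest="timeField",
        default="est.localTime",  # current requirement becomes the default
        help="name of the column that holds the timestamps to use",
    )
    return parser


# Passing an explicit argument list keeps the example independent of sys.argv.
args = build_parser().parse_args(["--time-field", "deviceTime"])
print(args.timeField)
```

Downstream code would then read `df[args.timeField]` wherever it currently hard-codes the column name.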

projects/rolling-statistics/rolling_statistics.py Outdated


		TODO:

		-Add to summary statistics:

Contributor

ed-nykaza Oct 19, 2018

I think that we should talk about what we would like our template to be for these types of comments, so that all of our code looks the same, AND I think that we should create a template that we can use, and other contributors can follow. In my code, I created a bulleted list with "*" like you did in line 12. Though, I am not suggesting that we do things my way, but rather that we agree upon a way, add it to a template, and that we both follow our rules/template.

projects/rolling-statistics/rolling_statistics.py

@@ -0,0 +1,699 @@
+              #!/usr/bin/env python3
+              # -*- coding: utf-8 -*-
+              # pylint: disable=C0301

Contributor

ed-nykaza Oct 19, 2018

This didn't work in the Spyder editor for me. Did it work for you? Let's talk about this.

projects/rolling-statistics/rolling_statistics.py Outdated

+              -Vectorize round5minutes function
+              -Verify basal suspends are correctly implemented
+              """
+              # %% REQUIRED LIBRARIES

Contributor

ed-nykaza Oct 19, 2018

I believe that PEP 8 suggests two blank lines between sections

projects/rolling-statistics/rolling_statistics.py Outdated

+              import argparse
+              import time
+              # %% USER INPUTS

Contributor

ed-nykaza Oct 19, 2018

two blank lines

projects/rolling-statistics/rolling_statistics.py Outdated

+                  rolling_df = pd.DataFrame(index=np.arange(len(df)))
+                  rolling_df["est.localTime_rounded"] = df["est.localTime_rounded"]
+                  # Loop through rolling stats for each time prefix
+                  for i in range(0, len(rolling_prefixes)):

Contributor

ed-nykaza Oct 30, 2018

ah, don't be like me! 😉 you already used variable "i" above
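
A small sketch of one way to avoid reusing `i`: iterate over the prefixes directly, so no index variable is needed. The prefix values and the earlier value of `i` are made up for illustration.

```python
# Hypothetical prefixes; the real list comes from the script's arguments.
rolling_prefixes = ["24hr", "7day", "14day"]

i = 5  # stands in for an index that was set earlier in the same scope

# Looping over the values themselves leaves the earlier "i" untouched.
for prefix in rolling_prefixes:
    column_name = prefix + "_cgm_mean"
    print(column_name)

print(i)  # still 5
```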

projects/rolling-statistics/rolling_statistics.py Outdated

+                                                          )))
+                  # Set number of points per rolling window
+                  rolling_points = np.array(pd.Series(args.rollingWindow).map(rolling_dictionary))

Contributor

ed-nykaza Oct 30, 2018

replace args.rollingWindow with rolling_prefixes

projects/rolling-statistics/rolling_statistics.py Outdated

+                      # get estimated HbA1c or Glucose Management Index (GMI)
+                      # GMI(%) = 3.31 + 0.02392 x [mean glucose in mg/dL]
+                      # https://www.jaeb.org/gmi/
+                      rolling_df[rolling_prefixes[i]+"_cgm_eA1c"] = 3.31 + (0.02392*rolling_df[rolling_prefixes[i]+"_cgm_mean"])

Contributor

ed-nykaza Oct 30, 2018

let's call this GMI
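
For reference, the formula quoted in the code comment above can be wrapped in a small function that carries the GMI name. This is only a sketch; the function name and the sample mean glucose are assumptions.

```python
def gmi_percent(mean_glucose_mgdl):
    """Return GMI (%) from mean glucose in mg/dL.

    Uses the formula from the comment under review:
    GMI(%) = 3.31 + 0.02392 x [mean glucose in mg/dL]
    """
    return 3.31 + 0.02392 * mean_glucose_mgdl


gmi = gmi_percent(150.0)
print(round(gmi, 2))  # 6.9
```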

projects/rolling-statistics/rolling_statistics.py

+                  below54_dur = 5*rle_below54[0][np.where((rle_below54[2] == True) & (rle_below54[0] >= 3))]
+                  df["event-below54"] = False
+                  df.loc[below54_loc, "event-below54"] = True
+                  df["dur-below54"] = 0

Contributor

ed-nykaza Oct 30, 2018

when specifying durations, I find that including the units in the variable name or colHeading can be useful. Good to know that we are talking about minutes here.
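
A minimal sketch of what unit-bearing names could look like. It assumes one CGM reading every 5 minutes and an episode of at least 3 consecutive readings, as the code above implies; it uses `itertools.groupby` in place of the script's run-length helper, and all names are hypothetical.

```python
from itertools import groupby

MINUTES_PER_READING = 5  # assumes one CGM reading every 5 minutes


def episode_durations_minutes(below_threshold, min_readings=3):
    """Return the duration in minutes of each qualifying run of True values."""
    durations = []
    for is_below, run in groupby(below_threshold):
        run_length = len(list(run))
        if is_below and run_length >= min_readings:
            durations.append(run_length * MINUTES_PER_READING)
    return durations


# Made-up flags marking readings below 54 mg/dL.
below54 = [False, True, True, True, False, True, True,
           False, True, True, True, True]
dur_below54_minutes = episode_durations_minutes(below54)
print(dur_below54_minutes)  # [15, 20]
```

A column named `dur-below54-minutes` (or similar) would carry the same information into the output file.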

projects/rolling-statistics/rolling_statistics.py

+                  daily_df = daily_df.at_time(daytime_start)
+                  # Move time back 6 hours so that each row represents the appropriate day
+                  daily_df.index = daily_df.index-dt.timedelta(hours=6)

Contributor

ed-nykaza Oct 30, 2018

I would change this so that it is only giving the date, not date time. However, I would also include the start dateTime and the end dateTime to be specific about what time interval you are talking about (i.e., 2017-01-01 06:00 is the start dateTime and 2017-01-02 05:00 is the endTime for the day = 2017-01-01).
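
A sketch of the row layout suggested here, assuming a day that starts at 06:00 and runs for 24 hours. The field names are hypothetical, and the end is treated as exclusive, which is an assumption of this sketch rather than something settled in the review.

```python
import datetime as dt


def day_window(day, day_start_hour=6):
    """Describe one "day" as a date plus explicit start and end datetimes."""
    start = dt.datetime.combine(day, dt.time(hour=day_start_hour))
    end = start + dt.timedelta(days=1)  # exclusive end of the 24-hour window
    return {"date": day, "startDateTime": start, "endDateTime": end}


row = day_window(dt.date(2017, 1, 1))
print(row["date"], row["startDateTime"], row["endDateTime"])
```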

Lennart Goedhart added 8 commits

October 31, 2018 10:07


          Shouldn't commit local settings

de8dafd


          Count affected records during processing, not writing

91aa777


          Code cleanup

24af3a0


          Prepare for npm publishing, add command line utility

7b2a031

- Add README
- Add .npmignore
- Add command line tool


          Give node-data-tools its own .gitignore

e5309cc


          Don't ship .eslintrc in npm module

8d00240


          Add command pathway to cli

6fcf59f


          Update README to match CLI changes

563769f

ed-nykaza and others added 29 commits

January 25, 2019 09:25


          example (WIP) using parser


          additional parsing

2eb7162


          Merge remote-tracking branch 'origin/rpw/loop-report-parser' into rpw…

5fa04d1

…/loop-report-parser


          fixed error in carb store and check for section in dictionary prior t…

d167462

…o parsing


          refactored to return one dictionary and ability to parse multiple fil…

9ef2fab

…es at once.


          added basal profile

5e4ceb4


          updated with premeal, workout and suspendthresholdunit

d1c70b3


          minor updates on naming

289d187


          updated insulin_sensitivity_factor_schedule to match the output style…

ceb9c4c

… of basal_rate_schedule


          fixed the insulin_sensitivity_factor_timeZone bug

fd649f1


          converted carb_ratio_schedule to a list of dicts

957d6f7


          removed commented code

7becfbd


          refactor example and save json

bf66861


          updated strings to floats

b719e60


          test files

eaa9acb


          fixed error in carb


          test loop report

f159f50


          added check for file and directory

4c22140


          added invalid directory test

946316f


          updated error messages

3f8add6


          update example to export csv file

f06e26e


          include index in output and the same file name for input and output

066904b


          minor refactoring

cf148d4


          Merge pull request #46 from tidepool-org/rpw/loop-report-parser

931dc40

Rpw/loop report parser


          Updated Rolling Statistics Code Framework

427bb5a

Refactored code for rolling statistics


          Merge remote-tracking branch 'origin/jm/rolling-statistics' into jm/r…

306f9c9

…olling-statistics


          Update README

4bf7c92

Remove metrics argument


          Update README

bd5b3c7

Formatting


          Update README

21e4d37

Added filename input details

ed-nykaza requested a review from rpwils

April 9, 2019 17:45
