Add the statistics of hypothesis testing #135

Open
wants to merge 2 commits into
base: main

Conversation

260147169

Add an option 'test_stat' to display the test statistics from hypothesis testing (default: False). The statistics are already computed; this option only displays them.

@jraffa
Collaborator

jraffa commented Oct 11, 2022

Thanks for the idea and the PR. A couple suggestions:

Using the README.md example:


import pandas as pd
from tableone import TableOne, load_dataset

data = load_dataset('pn2012')
columns = ['Age', 'SysABP', 'Height', 'Weight', 'ICU', 'death']
categorical = ['ICU', 'death']
groupby = ['death']
nonnormal = ['Age']
labels = {'death': 'mortality'}
# test_stat is the option proposed in this PR
mytable = TableOne(data, columns=columns, categorical=categorical, groupby=groupby, nonnormal=nonnormal, rename=labels, pval=True, test_stat=True)

Works, but the table could use a little cleanup:

                         Grouped by mortality
                                      Missing           Overall                 0                 1 Test-stat P-Value
n                                                          1000               864               136
Age, median [Q1,Q3]                         0  68.0 [53.0,79.0]  66.0 [52.8,78.0]  75.0 [62.0,83.0]    23.882  <0.001
SysABP, mean (SD)                         291      114.3 (40.2)      115.4 (38.3)      107.6 (49.4)     1.510   0.134
Height, mean (SD)                         475      170.1 (22.1)      170.3 (23.2)      168.5 (11.3)     1.030   0.304
Weight, mean (SD)                         302       82.9 (23.8)       83.0 (23.6)       82.3 (25.4)     0.277   0.782
ICU, n (%)          CCU                     0        162 (16.2)        137 (15.9)         25 (18.4)    20.093  <0.001
                    CSRU                             202 (20.2)        194 (22.5)           8 (5.9)    20.093
                    MICU                             380 (38.0)        318 (36.8)         62 (45.6)    20.093
                    SICU                             256 (25.6)        215 (24.9)         41 (30.1)    20.093
mortality, n (%)    0                       0        864 (86.4)       864 (100.0)                     991.508  <0.001
                    1                                136 (13.6)                         136 (100.0)   991.508

The Test-stat column is redundant: the statistic is repeated on every level of a categorical variable. It should appear only once per variable, as the p-value does.
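
A minimal sketch of that cleanup, assuming the rows are available as plain `(variable, level, stat)` tuples in display order. The helper name `blank_repeated_stats` and the tuple layout are hypothetical and are not part of tableone:

```python
def blank_repeated_stats(rows):
    """Keep the test statistic on the first row of each variable only.

    `rows` is a list of (variable, level, stat) tuples in display order.
    """
    seen = set()
    cleaned = []
    for variable, level, stat in rows:
        if variable in seen:
            stat = ""  # later levels of the same variable get a blank cell
        seen.add(variable)
        cleaned.append((variable, level, stat))
    return cleaned


# Values copied from the ICU and mortality rows of the table above
rows = [
    ("ICU", "CCU", "20.093"),
    ("ICU", "CSRU", "20.093"),
    ("ICU", "MICU", "20.093"),
    ("ICU", "SICU", "20.093"),
    ("mortality", "0", "991.508"),
    ("mortality", "1", "991.508"),
]
cleaned = blank_repeated_stats(rows)
```

After this, only the CCU row and the first mortality row still carry a statistic, which mirrors how the P-Value column is already displayed.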

Changing pval to False breaks it:

 mytable = TableOne(data, columns=columns, categorical=categorical, groupby=groupby, nonnormal=nonnormal, rename=labels, pval=False,test_stat=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jraffa/temp/tableone/tableone/tableone.py", line 424, in __init__
    self.cat_table = self._create_cat_table(data, overall)
  File "/home/jraffa/temp/tableone/tableone/tableone.py", line 1348, in _create_cat_table
    table = table.join(self._htest_table[['Test-stat']])
AttributeError: 'TableOne' object has no attribute '_htest_table'
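
The traceback suggests that `_htest_table` is only created when `pval=True`. One possible guard, sketched with a toy class: apart from `_htest_table`, `pval`, and `test_stat`, every name here is hypothetical, and this is not tableone's actual code.

```python
class MiniTable:
    """Toy stand-in for the real class, for illustration only."""

    def __init__(self, stats, pval=False, test_stat=False):
        self._pval = pval
        self._test_stat = test_stat
        # Build the hypothesis-test table if *either* option needs it,
        # rather than only when pval is True.
        if pval or test_stat:
            self._htest_table = dict(stats)

    def row(self, variable):
        """Return only the cells that were requested for one variable."""
        cells = {}
        if self._test_stat:
            cells["Test-stat"] = self._htest_table[variable]["Test-stat"]
        if self._pval:
            cells["P-Value"] = self._htest_table[variable]["P-Value"]
        return cells


# SysABP values taken from the table above
stats = {"SysABP": {"Test-stat": 1.510, "P-Value": 0.134}}
table = MiniTable(stats, pval=False, test_stat=True)
sysabp_cells = table.row("SysABP")
```

With `pval=False, test_stat=True` the row contains only the Test-stat cell, and no attribute lookup fails.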

I also don't think the Fisher test is handled appropriately. There really isn't a test statistic for it, so the cell should be blank, but I believe it reports the chi-squared test statistic alongside the Fisher p-value:

td = pd.DataFrame({'a':[0,0,0,1]*10 + [1],'b':[1,1,1,1]*10 + [0]})
TableOne(td,columns=['a','b'],categorical=['a','b'],pval=True,groupby="b",test_stat=True)
           Grouped by b
                Missing    Overall          0           1 Test-stat P-Value
n                               41          1          40
a, n (%) 1            0  11 (26.8)  1 (100.0)   10 (25.0)     0.280   0.268
         0               30 (73.2)              30 (75.0)     0.280
b, n (%) 0            0    1 (2.4)  1 (100.0)                 9.744   0.024
         1               40 (97.6)             40 (100.0)     9.744
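
As a dependency-free cross-check of the row for `a`, here is a sketch that computes both quantities by hand for the 2x2 table. The function names are made up for this illustration, and the Yates continuity correction is an assumption about how the chi-squared statistic was produced:

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-7))


def yates_chi2_2x2(table):
    """Chi-squared statistic with Yates' continuity correction (2x2 only)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += max(abs(obs - expected) - 0.5, 0.0) ** 2 / expected
    return stat


def categorical_test(table, use_fisher):
    """Return (test name, statistic, p-value); the statistic is None for Fisher."""
    if use_fisher:
        return "Fisher's exact", None, fisher_exact_2x2(table)
    # The chi-squared p-value is omitted to keep the sketch stdlib-only
    return "Chi-squared", yates_chi2_2x2(table), None


# Rows are a=0, a=1; columns are b=0, b=1 (counts from the table above)
counts = [[0, 30], [1, 10]]
name, stat, pvalue = categorical_test(counts, use_fisher=True)
chi2_stat = yates_chi2_2x2(counts)
```

Rounded to three decimals, `chi2_stat` is 0.280 and `pvalue` is 0.268, the same pair shown in the row for `a`, which appears consistent with the statistic and the p-value coming from two different tests. Returning `None` for the Fisher statistic is one way to leave that cell blank.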

I think t-test, ANOVA, MW, and KW all have test statistics. @tompollard, are there any other tests we should worry about? I don't think the mode test is reported like this, so it should be safe.

@260147169
Author

Thanks so much to collaborator @jraffa and owner @tompollard for the help and suggestions.
The update contains the following:

1. The redundancy is cleaned up. The test statistic now appears once per variable, as the p-value does.
The code is the same as above. The results are as follows:

                                      Missing           Overall                 0                 1 Test-stat P-Value
n                                                          1000               864               136
Age, median [Q1,Q3]                         0  68.0 [53.0,79.0]  66.0 [52.8,78.0]  75.0 [62.0,83.0]    23.882  <0.001
SysABP, mean (SD)                         291      114.3 (40.2)      115.4 (38.3)      107.6 (49.4)     1.510   0.134
Height, mean (SD)                         475      170.1 (22.1)      170.3 (23.2)      168.5 (11.3)     1.030   0.304
Weight, mean (SD)                         302       82.9 (23.8)       83.0 (23.6)       82.3 (25.4)     0.277   0.782
ICU, n (%)          CCU                     0        162 (16.2)        137 (15.9)         25 (18.4)    20.093  <0.001
                    CSRU                             202 (20.2)        194 (22.5)           8 (5.9)
                    MICU                             380 (38.0)        318 (36.8)         62 (45.6)
                    SICU                             256 (25.6)        215 (24.9)         41 (30.1)
mortality, n (%)    0                       0        864 (86.4)       864 (100.0)                     991.508  <0.001
                    1                                136 (13.6)                         136 (100.0)

2. Setting pval=False no longer breaks it:

mytable = TableOne(data, columns=columns, categorical=categorical, groupby=groupby, nonnormal=nonnormal, rename=labels, pval=False,test_stat=True)

                                      Missing           Overall                 0                 1 Test-stat
n                                                          1000               864               136
Age, median [Q1,Q3]                         0  68.0 [53.0,79.0]  66.0 [52.8,78.0]  75.0 [62.0,83.0]    23.882
SysABP, mean (SD)                         291      114.3 (40.2)      115.4 (38.3)      107.6 (49.4)     1.510
Height, mean (SD)                         475      170.1 (22.1)      170.3 (23.2)      168.5 (11.3)     1.030
Weight, mean (SD)                         302       82.9 (23.8)       83.0 (23.6)       82.3 (25.4)     0.277
ICU, n (%)          CCU                     0        162 (16.2)        137 (15.9)         25 (18.4)    20.093
                    CSRU                             202 (20.2)        194 (22.5)           8 (5.9)
                    MICU                             380 (38.0)        318 (36.8)         62 (45.6)
                    SICU                             256 (25.6)        215 (24.9)         41 (30.1)
mortality, n (%)    0                       0        864 (86.4)       864 (100.0)                     991.508
                    1                                136 (13.6)                         136 (100.0)

3. Fisher's exact test does not produce a test statistic, so its test_stat is set to None, and a warning message alerts the user:

                Missing    Overall          0           1 Test-stat
n                               41          1          40
a, n (%) 1            0  11 (26.8)  1 (100.0)   10 (25.0)       nan
         0               30 (73.2)              30 (75.0)
b, n (%) 0            0    1 (2.4)  1 (100.0)                   nan
         1               40 (97.6)             40 (100.0)

[1] Fisher's test did not compute statistics of hypothesis testing. The following variables are affected: a, b.
