[ENH] Add fisher z test #7

adam2392 · 2023-04-11T14:26:11Z

Towards #5

Changes proposed in this pull request:

Adds partial correlation test
Setsup initial API design
Includes sphinx docs

bloebp

It shouldn't be part of this PR, but since it is still in a setup period, can we fix the build issues?

pywhy_stats/fisher_z_test.py

tests/test_fisherz_test.py

bloebp · 2023-04-11T15:16:32Z

Can you also extend
https://github.com/py-why/pywhy-stats/blob/main/pywhy_stats/independence.py#L15
accordingly? For now, the AUTO would default to the fisher test.

Regarding the discussions before, lets try to introduce some pattern for the naming. Here for instance:
fisherz_indep and fisherz_condindep (maybe we can discuss a better naming pattern).

adam2392 · 2023-04-11T15:39:59Z

Regarding the discussions before, lets try to introduce some pattern for the naming. Here for instance: fisherz_indep and fisherz_condindep (maybe we can discuss a better naming pattern).

Quick question regarding this now that I'm trying this out. FisherZ test that I have rn nicely implements this with the condition_on being optional. In the two function case, I actually have to split up the function.

What if we have a decorator pattern internally, or tags pattern, so that way a function can be marked where condition_on is supported or not?

bloebp · 2023-04-11T16:12:37Z

Regarding the discussions before, lets try to introduce some pattern for the naming. Here for instance: fisherz_indep and fisherz_condindep (maybe we can discuss a better naming pattern).

Quick question regarding this now that I'm trying this out. FisherZ test that I have rn nicely implements this with the condition_on being optional. In the two function case, I actually have to split up the function.

What if we have a decorator pattern internally, or tags pattern, so that way a function can be marked where condition_on is supported or not?

I was more thinking of a very simple solution like:

def fisherz_indep(X, Y, ...):
   return fisherz(X,Y, ...)

def fisherz_condindep(X, Y, condition_on, ...):
   return fisherz(X,Y, condition_on,...)

The original fisherz implementation is still in there, no need to modify or split that.

Just so that one can always expect the pattern:
module_indep and module_conindep

We can of course still discuss the "indep" and "conindep" name.

adam2392 · 2023-04-11T20:14:03Z

I like the design for a few reasons:

It is clean to read fisherz.ind and fisherz.condind
consistent API

I have a few issues tho and hopefully we can figure out a way around them now before committing:

we cannot type check high-level independence_test
how to properly document the API? We either have a file-level documentation for the whole set of functions inside a file (similar to networkx), or we document each function separately. The cons of documenting each function separately is that there's a lot of repetition when looking at *.ind vs *.condind. It's pretty much exactly the same. The pros of documenting each separately are that we might have a bunch of tests without the *.condind ability.

Some thoughts on an alternative API design proposal

I'm trying to see if the following minor change would help us get the best of both worlds? If instead fisherz(...) was just a function that handles independence and optionally conditional independence, then we could have the following:

class Methods(Enum):
    """Methods for independence testing."""

    AUTO = 0

    # this is no longer a module, but a function
    FISHERZ = fisherz

def independence_test(
    X: ArrayLike,
    Y: ArrayLike,
    condition_on: Optional[ArrayLike] = None,
    method=Methods.AUTO,
    **kwargs,
) -> PValueResult:
    """Perform a (conditional) independence test to determine whether X and Y are independent.

    The test may be conditioned on an optional set of variables. This is, test whether ``X _||_ Y |
    condition_on``, where the null hypothesis is that X and Y are independent.

    Parameters
    ----------
    X : ArrayLike, shape (n_samples, n_features_x)
        Data matrix for X.
    Y : ArrayLike, shape (n_samples, n_features_y)
        Data matrix for Y.
    condition_on : ArrayLike or None, shape (n_samples, n_features_z), optional
        Data matrix for the conditioning variables. If None is given, an unconditional test
        is performed.
    method : Methods, optional
        Independence test method from the Methods enum. Default is Methods.AUTO, which will
        automatically select an appropriate method.
    **kwargs : dict or None, optional
        Additional keyword arguments to be passed to the specific test method

    Returns
    -------
    result : PValueResult
        An instance of the PValueResult data class, containing the p-value, test statistic,
        and any additional information related to the independence test.

    See Also
    --------
    fisherz : Fisher's Z test for independence
    """
    method_func: Callable[[ArrayLike, ArrayLike, Optional[ArrayLike]], PValueResult]
    if method == Methods.AUTO:
        method_func = Methods.FISHERZ
    else:
        method_func = method

    # do some error-checking
    if method_func = Methods.FisherZ:
          pval = scipy.stats.normaltest(X, Y, ...)
          if pval < 0.05: raise warning(your data is not all normal but you're trying to use partial correlation);

    return method_func(X, Y, condition_on, method, **kwargs)

Then, this independence_test(...) is just a place where we can do high-level checks when possible and provide an easy interface with default options. It is also simplified without the if/else check.

fisherz is a one-stop shop for documentation for partial correlation, rather than a two-stop shop with fisherz.ind and fisherz.condind.

All hypothesis tests that do not support condition_on would just raise a runtime error.

Signed-off-by: Adam Li <[email protected]>

bloebp · 2023-04-11T22:02:24Z

For the ind and condind methods, that's right, we would have some (significant) overlap for the documentation. I would probably leave the general explanation on the file-level and just have a shorter one in the actual methods, like "Applies fisher z test. See blub for more details." and then only list the parameters (where copy and paste for them is fine). This is definitely not optimal, but I guess one would rather look on the documentation site to get more details of a method instead of reading the documentation in the python file. Just the parameters or "unusual" things should be flagged there that are not typical for the type of test.

Regarding the high level API, using callables there would be fine for me too, although we most likely still end up having if/elifs if we do want to perform certain checks or preprocessings depending on the methods. Also, having Callables there makes it a bit more difficult to pass kwargs and have a consistent typing at the same time. If you just call the function directly, you don't run into that issue. Otherwise, I really like where this is going with the additional test in case of fisherz for instance. This high level API is really meant for users who want to throw some data in a method and get a p-value. They could also fully configure the tests there, but it is more meant as a simplified helper function that mostly relies on the default parameters of the underlying method. For more advanced users, they would directly go to e.g. the fisher test and there you have the full API documentation.

Regarding the type checking of the high level API, I wouldn't bother. Just pass it to the methods, which should raise an error accordingly. One thing that is also possible, which we e.g. do in DoWhy, you just make sure that you transform all columns to something numeric (e.g., apply one hot encoding to categorical data).

pywhy_stats/fisherz.py

robertness

Cut and dry.

sampan501 · 2023-04-13T17:55:02Z

Curious why you call Z in the conditional tests as condition_on and not just Z?

bloebp · 2023-04-13T18:04:20Z

Curious why you call Z in the conditional tests as condition_on and not just Z?

To make it more explicit that these are the variables we condition on. It also reads clearer as independence(X, Y, condition_on=Z)

Signed-off-by: Adam Li <[email protected]>

adam2392 · 2023-04-14T22:26:03Z

Kay, I made some fixes and an attempt to document the "file" instead of functions. I honestly have never done that before, so we prolly need to make sure this looks right so we don't have documentation issues later on.

@bloebp do you have a suggestion on how to format the documentation, so it's relatively nice and easy to read?

Signed-off-by: Adam Li <[email protected]>

codecov-commenter · 2023-04-15T20:01:46Z

Codecov Report

Merging #7 (a6ca007) into main (d2667af) will increase coverage by 75.40%.
The diff coverage is 69.38%.

@@            Coverage Diff             @@
##            main       #7       +/-   ##
==========================================
+ Coverage   0.00%   75.40%   +75.40%     
==========================================
  Files          3        4        +1     
  Lines         18       61       +43     
  Branches       0        9        +9     
==========================================
+ Hits           0       46       +46     
+ Misses        18       14        -4     
- Partials       0        1        +1

Impacted Files	Coverage Δ
pywhy_stats/independence.py	`50.00% <39.13%> (+50.00%)`	⬆️
pywhy_stats/fisherz.py	`95.65% <95.65%> (ø)`
pywhy_stats/pvalue_result.py	`100.00% <100.00%> (ø)`

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

adam2392 · 2023-04-15T22:42:09Z

https://output.circle-artifacts.com/output/job/8095bed2-db5a-4e08-9944-07198eeb1498/artifacts/0/dev/index.html

Signed-off-by: Adam Li <[email protected]>

adam2392 · 2023-04-16T19:33:33Z

https://output.circle-artifacts.com/output/job/d321692f-73ab-4fa1-bb19-c054a5085f42/artifacts/0/dev/generated/pywhy_stats.Methods.html#pywhy_stats.Methods

bloebp

Looks good! Just some minor comments

pywhy_stats/fisherz.py

pywhy_stats/independence.py

tests/test_fisherz_test.py

Signed-off-by: Adam Li <[email protected]>

adam2392 · 2023-04-19T13:39:58Z

Addressed comments, I will let you merge if happy @bloebp

adam2392 requested a review from bloebp April 11, 2023 14:26

bloebp reviewed Apr 11, 2023

View reviewed changes

pywhy_stats/fisher_z_test.py Outdated Show resolved Hide resolved

tests/test_fisherz_test.py Outdated Show resolved Hide resolved

adam2392 added 8 commits April 11, 2023 17:13

Add fisher z test

6603e3d

Signed-off-by: Adam Li <[email protected]>

Add fisher z

26f96de

Signed-off-by: Adam Li <[email protected]>

Fix

5422bea

Signed-off-by: Adam Li <[email protected]>

Fix circle

bdc9099

Signed-off-by: Adam Li <[email protected]>

Fix unit tests

e75a4a8

Signed-off-by: Adam Li <[email protected]>

Fix test and build

d7b8307

Signed-off-by: Adam Li <[email protected]>

Fix API DOC

f38361b

Signed-off-by: Adam Li <[email protected]>

Fix typing

8dd21c3

Signed-off-by: Adam Li <[email protected]>

adam2392 force-pushed the fisherz branch from c7db6d6 to 8dd21c3 Compare April 11, 2023 21:13

robertness reviewed Apr 13, 2023

View reviewed changes

pywhy_stats/fisherz.py Show resolved Hide resolved

robertness previously approved these changes Apr 13, 2023

View reviewed changes

Try it out

d6c573b

Signed-off-by: Adam Li <[email protected]>

adam2392 dismissed robertness’s stale review via d6c573b April 14, 2023 22:21

Fix

20f6492

Signed-off-by: Adam Li <[email protected]>

adam2392 added 2 commits April 14, 2023 19:53

Fix unit tests?

3593499

Signed-off-by: Adam Li <[email protected]>

Fix the circleCI docs

212b43e

Signed-off-by: Adam Li <[email protected]>

Fix

e6dfcad

Signed-off-by: Adam Li <[email protected]>

adam2392 requested a review from bloebp April 15, 2023 23:00

adam2392 requested a review from robertness April 15, 2023 23:01

adam2392 enabled auto-merge (squash) April 15, 2023 23:02

Fix linkcheck

a6ca007

Signed-off-by: Adam Li <[email protected]>

bloebp reviewed Apr 19, 2023

View reviewed changes

Address comments

3075046

Signed-off-by: Adam Li <[email protected]>

adam2392 requested a review from bloebp April 19, 2023 13:39

bloebp approved these changes Apr 19, 2023

View reviewed changes

adam2392 merged commit c45bbb8 into py-why:main Apr 19, 2023

adam2392 deleted the fisherz branch April 19, 2023 14:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ENH] Add fisher z test #7

[ENH] Add fisher z test #7

adam2392 commented Apr 11, 2023 •

edited

Loading

bloebp left a comment

bloebp commented Apr 11, 2023

adam2392 commented Apr 11, 2023

bloebp commented Apr 11, 2023

adam2392 commented Apr 11, 2023 •

edited

Loading

bloebp commented Apr 11, 2023 •

edited

Loading

robertness left a comment

sampan501 commented Apr 13, 2023

bloebp commented Apr 13, 2023

adam2392 commented Apr 14, 2023

codecov-commenter commented Apr 15, 2023 •

edited

Loading

adam2392 commented Apr 15, 2023

adam2392 commented Apr 16, 2023

bloebp left a comment

adam2392 commented Apr 19, 2023

[ENH] Add fisher z test #7

[ENH] Add fisher z test #7

Conversation

adam2392 commented Apr 11, 2023 • edited Loading

bloebp left a comment

Choose a reason for hiding this comment

bloebp commented Apr 11, 2023

adam2392 commented Apr 11, 2023

bloebp commented Apr 11, 2023

adam2392 commented Apr 11, 2023 • edited Loading

Some thoughts on an alternative API design proposal

bloebp commented Apr 11, 2023 • edited Loading

robertness left a comment

Choose a reason for hiding this comment

sampan501 commented Apr 13, 2023

bloebp commented Apr 13, 2023

adam2392 commented Apr 14, 2023

codecov-commenter commented Apr 15, 2023 • edited Loading

Codecov Report

adam2392 commented Apr 15, 2023

adam2392 commented Apr 16, 2023

bloebp left a comment

Choose a reason for hiding this comment

adam2392 commented Apr 19, 2023

adam2392 commented Apr 11, 2023 •

edited

Loading

adam2392 commented Apr 11, 2023 •

edited

Loading

bloebp commented Apr 11, 2023 •

edited

Loading

codecov-commenter commented Apr 15, 2023 •

edited

Loading