[ENH, MAINT] Refactor directed-undirected graph class #72

Draft
wants to merge 7 commits into
base: main
Changes from 2 commits
2 changes: 1 addition & 1 deletion pywhy_graphs/classes/__init__.py
@@ -1,6 +1,6 @@
from . import timeseries
from .admg import ADMG
from .cpdag import CPDAG
from .diungraph import CG, CPDAG
from .intervention import IPAG, AugmentedGraph, PsiPAG
from .pag import PAG
from .timeseries import (
240 changes: 170 additions & 70 deletions pywhy_graphs/classes/cpdag.py → pywhy_graphs/classes/diungraph.py
@@ -8,67 +8,8 @@
from .base import AncestralMixin, ConservativeMixin


class CPDAG(pywhy_nx.MixedEdgeGraph, AncestralMixin, ConservativeMixin):
"""Completed partially directed acyclic graphs (CPDAG).

CPDAGs generalize causal DAGs by allowing undirected edges.
Undirected edges imply uncertainty in the orientation of the causal
relationship. For example, ``A - B``, can be ``A -> B`` or ``A <- B``,
allowing for a Markov equivalence class of DAGs for each CPDAG.

Parameters
----------
incoming_directed_edges : input directed edges (optional, default: None)
Data to initialize directed edges. All arguments that are accepted
by `networkx.DiGraph` are accepted.
incoming_undirected_edges : input undirected edges (optional, default: None)
Data to initialize undirected edges. All arguments that are accepted
by `networkx.Graph` are accepted.
directed_edge_name : str
The name for the directed edges. By default 'directed'.
undirected_edge_name : str
The name for the undirected edges. By default 'undirected'.
attr : keyword arguments, optional (default= no attributes)
Attributes to add to graph as key=value pairs.

See Also
--------
networkx.DiGraph
networkx.Graph
pywhy_graphs.ADMG
pywhy_graphs.networkx.MixedEdgeGraph

Notes
-----
A CPDAG represents a Markov equivalence class of causal DAGs. The implicit assumption in
these causal graphs is that the Structural Causal Model (SCM) is Markovian, which implies
causal sufficiency: there are no unobserved latent confounders. This allows CPDAGs
to be learned from score-based (such as the "GES" algorithm) and constraint-based
(such as the PC algorithm) approaches for causal structure learning.

One should not use CPDAGs if they suspect their data has unobserved latent confounders.

**Edge Type Subgraphs**

Under the hood, the edges are stored in two networkx graphs:
``networkx.Graph`` for the undirected edges and ``networkx.DiGraph`` for the
directed edges. Undirected edges in a CPDAG stand for uncertainty about the
direction of the corresponding directed edge.

- Directed edges (<-, ->, indicating causal relationship) = `networkx.DiGraph`
The subgraph of directed edges may be accessed by the
`CPDAG.sub_directed_graph`. Their edges in networkx format can be
accessed by `CPDAG.directed_edges` and the corresponding name of the
edge type by `CPDAG.directed_edge_name`.
- Undirected edges (--, indicating uncertainty) = `networkx.Graph`
The subgraph of undirected edges may be accessed by the
`CPDAG.sub_undirected_graph`. Their edges in networkx format can be
accessed by `CPDAG.undirected_edges` and the corresponding name of the
edge type by `CPDAG.undirected_edge_name`.

By definition, no cycles may exist due to the directed edges.
"""
class DiUnGraph(pywhy_nx.MixedEdgeGraph, AncestralMixin):
""" """

def __init__(
self,
@@ -85,15 +26,6 @@ def __init__(
self._directed_name = directed_edge_name
self._undirected_name = undirected_edge_name

from pywhy_graphs import is_valid_mec_graph

# check that construction of CPDAG was valid
is_valid_mec_graph(self)

# extended patterns store unfaithful triples
# these can be used for conservative structure learning algorithm
self._unfaithful_triples: Dict[FrozenSet[Node], None] = dict()

@property
def undirected_edge_name(self) -> str:
"""Name of the undirected edge internal graph."""
@@ -184,6 +116,90 @@ def possible_parents(self, n: Node) -> Iterator[Node]:
"""
return self.sub_undirected_graph().neighbors(n)


class CPDAG(DiUnGraph, ConservativeMixin):
"""Completed partially directed acyclic graphs (CPDAG).
Collaborator

Perhaps it is worth adding a note or warning to the class docstrings that validity is not guaranteed, along the lines of "Please use the validity check if you need to rigorously verify that your construction is valid"?

Same in chaingraph? Maybe also in the ADMG, and PAG too?

Collaborator Author

Yeah this is a good idea. I will address just CPDAG and Chain Graph in this PR but will look to redo the other classes in another PR.
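
For readers wondering what an explicit validity check might involve, here is a minimal standalone sketch. It is not the pywhy_graphs implementation: `has_directed_cycle` is a hypothetical helper, and it tests only one necessary condition (the directed edges must not form a cycle), using plain Python rather than the library's graph classes.

```python
from collections import defaultdict


def has_directed_cycle(directed_edges):
    """Return True if the (u, v) directed edges contain a directed cycle.

    Hypothetical helper for illustration only; it is not part of pywhy_graphs.
    """
    children = defaultdict(list)
    nodes = set()
    for u, v in directed_edges:
        children[u].append(v)
        nodes.update((u, v))

    # Three-color depth-first search: reaching a GRAY node means we
    # followed directed edges back to a node on the current path.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def visit(n):
        color[n] = GRAY
        for child in children[n]:
            if color[child] == GRAY:
                return True
            if color[child] == WHITE and visit(child):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)


print(has_directed_cycle([("A", "B"), ("B", "C")]))  # False
print(has_directed_cycle([("A", "B"), ("B", "A")]))  # True
```

A docstring note could then point users to the library's own validity check instead of relying on the constructor to reject every invalid graph.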


CPDAGs generalize causal DAGs by allowing undirected edges.
Undirected edges imply uncertainty in the orientation of the causal
relationship. For example, ``A - B`` can be ``A -> B`` or ``A <- B``,
so each CPDAG represents a Markov equivalence class of DAGs.

Parameters
----------
incoming_directed_edges : input directed edges (optional, default: None)
Data to initialize directed edges. All arguments that are accepted
by `networkx.DiGraph` are accepted.
incoming_undirected_edges : input undirected edges (optional, default: None)
Data to initialize undirected edges. All arguments that are accepted
by `networkx.Graph` are accepted.
directed_edge_name : str
The name for the directed edges. By default 'directed'.
undirected_edge_name : str
The name for the undirected edges. By default 'undirected'.
attr : keyword arguments, optional (default= no attributes)
Attributes to add to graph as key=value pairs.

See Also
--------
networkx.DiGraph
networkx.Graph
pywhy_graphs.ADMG
pywhy_graphs.networkx.MixedEdgeGraph

Notes
-----
A CPDAG represents a Markov equivalence class of causal DAGs. The implicit assumption in
these causal graphs is that the Structural Causal Model (SCM) is Markovian, which implies
causal sufficiency: there are no unobserved latent confounders. This allows CPDAGs
to be learned from score-based (such as the "GES" algorithm) and constraint-based
(such as the PC algorithm) approaches for causal structure learning.

One should not use CPDAGs if they suspect their data has unobserved latent confounders.

**Edge Type Subgraphs**

Under the hood, the edges are stored in two networkx graphs:
``networkx.Graph`` for the undirected edges and ``networkx.DiGraph`` for the
directed edges.

- Directed edges (<-, ->, indicating causal relationship) = `networkx.DiGraph`
The subgraph of directed edges may be accessed by the
`CPDAG.sub_directed_graph`. Their edges in networkx format can be
accessed by `CPDAG.directed_edges` and the corresponding name of the
edge type by `CPDAG.directed_edge_name`.
- Undirected edges (--, indicating uncertainty) = `networkx.Graph`
The subgraph of undirected edges may be accessed by the
`CPDAG.sub_undirected_graph`. Their edges in networkx format can be
accessed by `CPDAG.undirected_edges` and the corresponding name of the
edge type by `CPDAG.undirected_edge_name`.

By definition, the directed edges may not form a cycle.
"""

def __init__(
self,
incoming_directed_edges=None,
incoming_undirected_edges=None,
directed_edge_name: str = "directed",
undirected_edge_name: str = "undirected",
**attr,
):
super().__init__(
incoming_directed_edges=incoming_directed_edges,
incoming_undirected_edges=incoming_undirected_edges,
directed_edge_name=directed_edge_name,
undirected_edge_name=undirected_edge_name,
)
from pywhy_graphs import is_valid_mec_graph

# check that construction of CPDAG was valid
is_valid_mec_graph(self)

# extended patterns store unfaithful triples
# these can be used for conservative structure learning algorithm
self._unfaithful_triples: Dict[FrozenSet[Node], None] = dict()

def add_edge(self, u_of_edge, v_of_edge, edge_type="all", **attr):
from pywhy_graphs.algorithms.generic import _check_adding_cpdag_edge

@@ -200,3 +216,87 @@ def add_edges_from(self, ebunch_to_add, edge_type, **attr):
self, u_of_edge=u_of_edge, v_of_edge=v_of_edge, edge_type=edge_type
)
return super().add_edges_from(ebunch_to_add, edge_type, **attr)


class CG(DiUnGraph):
"""Chain Graphs (CG).

Chain graphs represent a generalization of DAGs and undirected graphs.
Undirected edges ``A - B`` in a chain graph represent a symmetric association of
two variables due to processes such as dynamic feedback (where ``A``
influences ``B`` and vice versa) or an artefact of selection bias (where the selection
of the sample induces association between ``A`` and ``B``) [1]_.


The implementation supports representation of both Lauritzen-Wermuth-Frydenberg (LWF)
Collaborator

I like this. Maybe citations?

and Andersen-Madigan-Perlman (AMP) chain graphs.


Parameters
----------
incoming_directed_edges : input directed edges (optional, default: None)
Data to initialize directed edges. All arguments that are accepted
by `networkx.DiGraph` are accepted.
incoming_undirected_edges : input undirected edges (optional, default: None)
Data to initialize undirected edges. All arguments that are accepted
by `networkx.Graph` are accepted.
directed_edge_name : str
The name for the directed edges. By default 'directed'.
undirected_edge_name : str
The name for the undirected edges. By default 'undirected'.
attr : keyword arguments, optional (default= no attributes)
Attributes to add to graph as key=value pairs.

References
----------
.. [1] Lauritzen, Steffen L., and Thomas S. Richardson. "Chain
graph models and their causal interpretations." Journal of the
Royal Statistical Society: Series B (Statistical Methodology)
64.3 (2002): 321-348.




See Also
--------
networkx.DiGraph
networkx.Graph
pywhy_graphs.ADMG
pywhy_graphs.networkx.MixedEdgeGraph

Notes
-----
**Edge Type Subgraphs**

Under the hood, the edges are stored in two networkx graphs:
``networkx.Graph`` for the undirected edges and ``networkx.DiGraph`` for the
directed edges.

- Directed edges (<-, ->, indicating causal relationship) = `networkx.DiGraph`
The subgraph of directed edges may be accessed by the
`CG.sub_directed_graph`. Their edges in networkx format can be
accessed by `CG.directed_edges` and the corresponding name of the
edge type by `CG.directed_edge_name`.
- Undirected edges (--, indicating symmetric association) = `networkx.Graph`
The subgraph of undirected edges may be accessed by the
`CG.sub_undirected_graph`. Their edges in networkx format can be
accessed by `CG.undirected_edges` and the corresponding name of the
edge type by `CG.undirected_edge_name`.

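By definition, a chain graph contains no partially directed cycle, that is, no
cycle with at least one directed edge in which all directed edges point the same way.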
"""

def __init__(
self,
incoming_directed_edges=None,
incoming_undirected_edges=None,
directed_edge_name: str = "directed",
Collaborator

whitespace around " = "

undirected_edge_name: str = "undirected",
**attr,
):
super().__init__(
incoming_directed_edges=incoming_directed_edges,
incoming_undirected_edges=incoming_undirected_edges,
directed_edge_name=directed_edge_name,
undirected_edge_name=undirected_edge_name,
)
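
The shape of this refactor can be sketched in plain Python: a shared base class holds the directed and undirected edges, and each subclass adds only its own construction-time checks. Everything below is a toy illustration under stated assumptions; the `*Sketch` class names are hypothetical, the edge storage is simplified to sets, and the CPDAG check shown (no node pair joined by both edge types) is only an example of a subclass-specific rule, not the library's `is_valid_mec_graph`.

```python
class DiUnGraphSketch:
    """Toy stand-in for a graph with directed and undirected edges."""

    def __init__(self, directed_edges=None, undirected_edges=None):
        self.directed_edges = set(directed_edges or [])
        # Store undirected edges as frozensets so (A, B) equals (B, A).
        self.undirected_edges = {
            frozenset(edge) for edge in (undirected_edges or [])
        }

    def possible_parents(self, n):
        """Nodes joined to ``n`` by an undirected edge."""
        return {
            m
            for edge in self.undirected_edges
            if n in edge
            for m in edge
            if m != n
        }


class CPDAGSketch(DiUnGraphSketch):
    """Subclass that validates at construction time (illustrative rule only)."""

    def __init__(self, directed_edges=None, undirected_edges=None):
        super().__init__(directed_edges, undirected_edges)
        for u, v in self.directed_edges:
            if frozenset((u, v)) in self.undirected_edges:
                raise ValueError(
                    f"{u!r} and {v!r} are joined by both edge types"
                )


class CGSketch(DiUnGraphSketch):
    """Subclass that inherits the shared behavior and adds no checks."""


cg = CGSketch(directed_edges=[("A", "B")], undirected_edges=[("B", "C")])
print(cg.possible_parents("B"))  # {'C'}
```

Keeping validation in the subclass constructors leaves the base class free of CPDAG-specific assumptions, which is what lets a chain graph reuse it unchanged.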