Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose all settings in ydata-profiling ProfileReport #2921

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions plugins/flytekit-deck-standard/dev-requirements.in
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,4 @@ pandas
plotly
pygments
ydata-profiling
lxml
15 changes: 12 additions & 3 deletions plugins/flytekit-deck-standard/flytekitplugins/deck/renderer.py
Original file line number Diff line number Diff line change
Expand Up @@ -53,12 +53,21 @@ class FrameProfilingRenderer:
Generate a ProfileReport based on a pandas DataFrame
"""

def __init__(self, title: str = "Pandas Profiling Report"):
def __init__(self, title: Optional[str] = None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API wise, there are now two places to set kwargs for ydata_profiling.ProfileReport, in __init__ (for only title) and in to_html.

By looking at the Renderable protocol, it looks like the design was to have configuration passed in __init__, because the to_html to restricted to one input:

https://github.com/flyteorg/flytekit/blob/master/flytekit/deck/renderer.py#L17-L23

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I saw that, but I don't think we quite adhering to the protocol everywhere, for example the pandera renderer.

This is an indication that all invocations of to_html happen in user-code, so the protocol is "just" a convention? Can we make it more strict (and if so, I think we should add kwargs to the protocol).

self._title = title

def to_html(self, df: "pd.DataFrame") -> str:
def to_html(self, df: "pd.DataFrame", **kwargs) -> str:
"""
Generate a ydata_profiling.ProfileReport based on a pandas DataFrame
"""
assert isinstance(df, pd.DataFrame)
profile = ydata_profiling.ProfileReport(df, title=self._title)
if kwargs is None:
kwargs = {}
# For backwards compatibility, if title is None, it will be set to "Pandas Profiling Report".
# Also, if title in kwargs, it will overwrite the title in the constructor.
if "title" not in kwargs:
kwargs["title"] = "Pandas Profiling Report" if self._title is None else self._title
profile = ydata_profiling.ProfileReport(df, **kwargs)
return profile.to_html()


Expand Down
4 changes: 2 additions & 2 deletions plugins/flytekit-deck-standard/setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,11 +9,11 @@
extras = {
"pandas": ["pandas"],
"pillow": ["pillow"],
"ydata-profiling": ["ydata-profiling"],
"ydata-profiling": ["ydata-profiling>=2.4.0"],
"markdown": ["markdown"],
"plotly": ["plotly"],
"pygments": ["pygments"],
"all": ["pandas", "pillow", "ydata-profiling", "markdown", "plotly", "pygments"],
"all": ["pandas", "pillow", "ydata-profiling>=2.4.0", "markdown", "plotly", "pygments"],
}

__version__ = "0.0.0+develop"
Expand Down
26 changes: 23 additions & 3 deletions plugins/flytekit-deck-standard/tests/test_renderer.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
TableRenderer,
)
from PIL import Image
from lxml import html

from flytekit.types.file import FlyteFile, JPEGImageFile, PNGImageFile

Expand All @@ -31,10 +32,29 @@
)


def test_frame_profiling_renderer():
renderer = FrameProfilingRenderer()
assert "Pandas Profiling Report" in renderer.to_html(df).title()
@pytest.mark.parametrize(
"title, kwargs, expected_title",
[
(None, {}, "Pandas Profiling Report"),
("Custom Title", {}, "Custom Title"),
(None, {"title": "from-kwargs"}, "from-kwargs"),
(None, {"minimal": False}, "Pandas Profiling Report"),
("Custom Title", {"minimal": False}, "Custom Title"),
(None, {"minimal": True}, "Pandas Profiling Report"),
("Custom Title", {"minimal": True}, "Custom Title"),
("Custom Title", { "minimal": True}, "Custom Title"),
("Custom Title", {"missing_diagrams":{"heatmap": False}}, "Custom Title"),
# Test that title in kwargs takes precedence over title in constructor
("title in constructor", {"title": "title in kwargs"}, "title in kwargs")
],
)
def test_frame_profiling_renderer(title, kwargs, expected_title):
fpr_kwargs = {"title": title} if title else {}
renderer = FrameProfilingRenderer(**fpr_kwargs)
profile_report = renderer.to_html(df, **kwargs)

tree = html.fromstring(profile_report)
assert expected_title == tree.xpath('//title/text()')[0]

def test_markdown_renderer():
md_text = "#Hello Flyte\n##Hello Flyte\n###Hello Flyte"
Expand Down
Loading