Provide simple mechanism for adding icons to datasets #480

Closed
1 task done
datajoely opened this issue Jun 14, 2021 · 7 comments

Comments

@datajoely
Contributor

Description

Is your feature request related to a problem? A clear and concise description of what the problem is: "I'm always frustrated when ..."

Users can label their dataset in the catalog and provide layers - but there is very little they can do to differentiate datasets beyond this from a visual perspective. Adding the facility to apply an icon from an existing library of icons would be an effective mechanism for making the pipeline visualisation a clearer and more efficient story-telling tool.

Context

Why is this change important to you? How would you use it? How can it benefit other users?

A simple example for where this would be useful would be to allow users to mark Excel datasources vs SQL datasources at a glance, even more so in the collapsed label-less view.

Possible Implementation

(Optional) Suggest an idea for implementing the addition or change.

On the YAML catalog side there could be an extra key for icon like so:

flight_times:
    type: pandas.CSVDataSet
    layer: raw
    load_args:
        sep: '|'
    icon: carbon-csv

This could pull in the following icon from the Carbon Design System (by IBM), provided by the Iconify framework, which collects several open-source icon libraries:
https://iconify.design/icon-sets/carbon/csv.html

By using the [iconify-react](https://github.com/iconify/iconify-react) library, this would hopefully be a low-effort addition.
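As a rough illustration of the catalog side, here is a minimal Python sketch of how an extra `icon` key might be separated from the rest of a catalog entry before the entry is used to build the dataset. The helper name `split_icon` is hypothetical and is not part of Kedro or Kedro-Viz; the catalog is written as an already-parsed dictionary to keep the example self-contained.

```python
from typing import Any, Dict, Optional, Tuple


def split_icon(entry: Dict[str, Any]) -> Tuple[Dict[str, Any], Optional[str]]:
    """Hypothetical helper: return (entry without 'icon', icon name or None)."""
    remaining = dict(entry)  # shallow copy so the caller's dict is left untouched
    icon = remaining.pop("icon", None)
    return remaining, icon


# The catalog entry from the proposal, as it would look after YAML parsing.
catalog = {
    "flight_times": {
        "type": "pandas.CSVDataSet",
        "layer": "raw",
        "load_args": {"sep": "|"},
        "icon": "carbon-csv",
    }
}

icons: Dict[str, Optional[str]] = {}
dataset_configs: Dict[str, Dict[str, Any]] = {}
for name, entry in catalog.items():
    dataset_configs[name], icons[name] = split_icon(entry)
```

Here `icons` would hold the icon name per dataset for the visualisation to consume, while `dataset_configs` keeps only the keys the dataset itself needs.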

Checklist

  • Include labels so that we can categorise your feature request
@rashidakanchwala rashidakanchwala changed the title Provide simple mechanism for adding icons to datasets [KED-2725] Provide simple mechanism for adding icons to datasets Jun 23, 2021
@datajoely
Contributor Author

I'm going to reopen this, since users on Discord have been asking for the same thing.

@datajoely datajoely reopened this Aug 15, 2022
@tynandebold tynandebold moved this to Inbox in Kedro-Viz Aug 15, 2022
@tynandebold tynandebold changed the title [KED-2725] Provide simple mechanism for adding icons to datasets Provide simple mechanism for adding icons to datasets Aug 15, 2022
@nicolasboisseau

Such a feature would be nice, not only at the dataset level, but also at the node level.

Personally, when I build pipelines with Kedro, I try to make each node responsible for only one main activity (one that can be clearly explained to others). It would be nice to represent each node in Kedro-Viz with an icon summarizing the main task the node performs, e.g. clean, stack, join, filter, train, predict.

And naturally, as you said in the issue title, it would also be interesting to identify dataset types by their icon, giving a hint about the origin of the data source (CSV, SQL, Excel file), as a pipeline is often a mix of various dataset types.

Possibilities are quite numerous if a good icon collection can be provided!

@b4rlw

b4rlw commented Aug 16, 2022

I would second this. Could help keep the pipeline maintainer right as well as improve comprehension among a non-technical audience. It should go without saying that it should not be compulsory, however. Perhaps could be toggled in Viz.

To play devil's advocate, I would also say that the wrong implementation could add unnecessary complexity.
Should Kedro be laissez-faire about what icon packs can be used?
If so, would it still be easy in most cases to distinguish the difference between a node and a dataset?
Should the user be able to specify custom icons for their custom datasets?
If yes, would certain visualisations begin to look messy? If no, could things look incomplete?
Would the OCD among us end up spending more time customising icons than writing code? lol.

I think it's a great idea, but only with the right design choices. Users frequently say they like Kedro because it's opinionated, so whatever the implementation is, it should be congruent with that broader ethos.

That's my two pence anyways!

@tynandebold
Member

We do support this pattern already for image datasets on Kedro so users can differentiate images and know to click on them. It would be great to infer which icons to use based on what Kedro dataset was chosen so that we reduce complexity for end users and so they won't have to spend a lot of time digging through an icon pack figuring out what icon to use.
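The inference idea could look something like the sketch below, which maps the class name of a dataset type to a default icon. This is only an assumption about how such a lookup might work, not how Kedro-Viz does it: `DEFAULT_ICONS`, `FALLBACK_ICON` and `infer_icon` are hypothetical names, and apart from `carbon-csv` (taken from the issue description) the icon names are placeholders rather than verified entries in any icon pack.

```python
from typing import Dict

# Hypothetical mapping from dataset class name to icon name.
# Only "carbon-csv" comes from the issue; the rest are placeholders.
DEFAULT_ICONS: Dict[str, str] = {
    "CSVDataSet": "carbon-csv",
    "ExcelDataSet": "excel-placeholder",
    "SQLTableDataSet": "sql-placeholder",
}
FALLBACK_ICON = "generic-dataset-placeholder"


def infer_icon(dataset_type: str) -> str:
    """Pick an icon from the dataset type string, e.g. 'pandas.CSVDataSet'."""
    # Keep only the class name after the last dot, if there is one.
    class_name = dataset_type.rsplit(".", 1)[-1]
    return DEFAULT_ICONS.get(class_name, FALLBACK_ICON)
```

With a fallback icon, custom datasets that are not in the mapping would still render consistently, and users would not need to pick anything themselves.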

For now we've marked this as a minor priority because our current scope of work consists of working on experiment tracking and increasing the general adoption of Viz.

@tynandebold tynandebold moved this from Inbox to Backlog in Kedro-Viz Aug 22, 2022
@antonymilne
Contributor

antonymilne commented Aug 24, 2022

I like this idea although agree it's not a high priority.

In terms of implementation, to me this is part of a more general question about how to add custom properties to datasets. In this case it actually goes beyond just datasets since @nicolasboisseau said you might like to add an icon to a node, which could obviously not be done by adding a new attribute in catalog.yml. But the question of custom attributes for datasets keeps coming up and we should figure it out. It's particularly relevant for kedro-viz but really a more general kedro problem. My comment from #907:

The question of adding custom properties to datasets comes up quite a bit, e.g. #662 (put number of rows in dataset on kedro-viz), https://github.com/quantumblacklabs/private-kedro/issues/1148 (add metadata to catalog entries that can be consumed by plugins), kedro-org/kedro#1076 (very long-standing issue on how to add metadata to catalog entries). This is not just limited to kedro-viz but there's a more general kedro question of how to attach metadata to a catalog entry.

Also, just for completeness, I mooted the idea of a new viz.yml configuration file in #903 and #907. This is probably the right approach for the implementation here since it would cater for datasets and nodes. In practical terms, solving the question of custom properties is also quickest and easiest to do on the kedro-viz side without needing a general kedro solution which might take a long time.

@tynandebold
Member

I'm going to close this again. We're considering an idea in which we give a more robust set of icons based on datasets and the like, though not anything that could ever be chosen by users. See here #1148.

Repository owner moved this from Backlog to Done in Kedro-Viz Oct 31, 2022
@NeroOkwa
Contributor

Context

This feature request was also an output from the Kedro-Viz adoption synthesis #987

Users want to be able to tag datasets and have those tags inherited by the nodes that use them. Their team could then tag datasets in the catalog, and data scientists could better understand where datasets with those tags flow.

Supporting quotes

  • “Providing additional tags, especially because we could then use that within our data life cycle management and Azure in order to kind of get rid of intermediate data sets in experiment runs that we don't need anymore”.
  • “Ideally then we could also filter on certain dataset tags in the kedro visualisation to just see where do we have a type of information coming from, and which pipelines are affected by that. So when we look at like the raw data layer and we want to just see data pipelines that use datasets that are tagged in a specific way. That could be incredibly useful”.
    • “So what I'm wondering here is, and that might be missing something if these tags are coming from the nodes and the data sets inherit them. I know that that works, but I'm more thinking about the other way around, I don't know if you can tag datasets and then the nodes inherit from that”.
    • “That would allow the data sourcing team to then tag in the catalog, and then the data scientists then understands better where certain datasets with these tags flow”.
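To make the "datasets tag, nodes inherit" direction concrete, here is a small Python sketch under stated assumptions. It does not use the Kedro API: the pipeline is reduced to a plain mapping from node name to input dataset names, and `inherit_tags` and `nodes_with_tag` are hypothetical helpers invented for illustration, as are the example node and tag names.

```python
from typing import Dict, List, Set


def inherit_tags(
    node_inputs: Dict[str, List[str]],
    dataset_tags: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Give each node the union of the tags on its input datasets."""
    node_tags: Dict[str, Set[str]] = {}
    for node, inputs in node_inputs.items():
        tags: Set[str] = set()
        for dataset in inputs:
            tags |= dataset_tags.get(dataset, set())  # untagged datasets add nothing
        node_tags[node] = tags
    return node_tags


def nodes_with_tag(node_tags: Dict[str, Set[str]], tag: str) -> List[str]:
    """Filter nodes by an inherited tag, in sorted order."""
    return sorted(node for node, tags in node_tags.items() if tag in tags)


# Illustrative data only: tags set in the catalog by a data sourcing team.
dataset_tags = {"flight_times": {"external"}, "crew_rosters": {"sensitive"}}
node_inputs = {
    "clean_flights": ["flight_times"],
    "join_crew": ["flight_times", "crew_rosters"],
    "train_model": ["model_input"],
}

node_tags = inherit_tags(node_inputs, dataset_tags)
```

Filtering with `nodes_with_tag(node_tags, "sensitive")` would then show which nodes touch datasets tagged that way. This sketch only looks at direct inputs; whether tags should also propagate further downstream is left open here.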
