Infer visualisations from field definition #637

Closed
lukavdplas opened this issue Mar 1, 2022 · 6 comments
Labels
on hold, visualisation (changes to visualisation features)
Milestone

Comments

@lukavdplas
Contributor

> However, the current situation in which a Field defines the visualizations that apply to it is a combinatorial problem: the code surface grows with the product of the total number of fields and the number of available visualization types. Ideally, Field should not have this responsibility at all; instead, the frontend should automatically infer which visualizations can be applied to a field based on its data type (and possibly other features). This consideration also applies to filters.

Originally posted by @jgonggrijp in #633 (review)

@lukavdplas
Contributor Author

Some thoughts on execution:

Generally speaking, the types of visualisations can be based on the field's es_mapping.type: timeline for date, histogram for keyword, and wordcloud, ngram and related words for text. Future visualisations may want to check for es_mapping.term_vector or es_mapping.fields.
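A minimal sketch of this type-based inference. The visualisation names are the ones used in this thread; the lookup table and the `infer_visualizations` helper are hypothetical, not the project's actual code:

```python
from typing import Dict, List

# Hypothetical table: es_mapping["type"] -> visualisations, following the
# examples in the comment above (names are assumptions, not project code).
VISUALISATIONS_BY_TYPE: Dict[str, List[str]] = {
    "date": ["timeline"],
    "keyword": ["histogram"],
    "text": ["wordcloud", "ngram", "related words"],
}


def infer_visualizations(es_mapping: dict) -> List[str]:
    """Return the visualisations that apply to a field with this mapping."""
    field_type = es_mapping.get("type")
    # Types without an entry (e.g. "integer", "boolean") get no visualisations.
    # Return a copy so callers cannot mutate the shared table.
    return list(VISUALISATIONS_BY_TYPE.get(field_type, []))


print(infer_visualizations({"type": "date"}))     # ['timeline']
print(infer_visualizations({"type": "integer"}))  # []
```

With this shape, adding a visualisation means one new table entry rather than editing every field definition, which is the combinatorial growth the original comment wants to avoid.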

What cannot be inferred is whether a field should be visualised/filtered at all. That is mostly based on whether we think it conveys interesting/relevant information. We could include boolean properties like include_in_visualizations and include_in_filters.

(Alternatively, the interface could be changed so that the user can visualise anything and everything. I don't think that would help user friendliness, but it's worth mentioning.)
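As a sketch of how the proposed `include_in_visualizations` and `include_in_filters` flags could combine with type-based inference: the `Field` dataclass, the lookup table and the set of filterable types below are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical type table, following the examples in this thread.
VISUALISATIONS_BY_TYPE: Dict[str, List[str]] = {
    "date": ["timeline"],
    "keyword": ["histogram"],
    "text": ["wordcloud", "ngram", "related words"],
}

# Assumption: mapping types for which a filter widget exists.
FILTERABLE_TYPES = {"date", "keyword", "integer", "boolean"}


@dataclass
class Field:
    """Hypothetical, trimmed-down field definition."""
    name: str
    es_mapping: dict
    # The only part that cannot be inferred: is this field interesting?
    include_in_visualizations: bool = True
    include_in_filters: bool = True


def visualizations_for(field: Field) -> List[str]:
    """Inferred visualisations, unless the curator opted the field out."""
    if not field.include_in_visualizations:
        return []
    return list(VISUALISATIONS_BY_TYPE.get(field.es_mapping.get("type"), []))


def has_filter(field: Field) -> bool:
    """Offer a filter only if opted in and the type supports one."""
    return field.include_in_filters and field.es_mapping.get("type") in FILTERABLE_TYPES


# An identifier field: a keyword, but not worth a histogram or a filter.
doc_id = Field("id", {"type": "keyword"},
               include_in_visualizations=False, include_in_filters=False)
print(visualizations_for(doc_id), has_filter(doc_id))  # [] False
```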

In the case of related words (and any future visualisations based on word models), it would be necessary to indicate which field the models are trained on. Per #620, that is (or will be) done by listing 'related words' in the field's visualizations. That could be replaced with a boolean property like has_word_models.
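A possible shape for the `has_word_models` alternative, starting from a hypothetical type table: word-model visualisations are added only when the flag is set, since the mapping alone cannot reveal whether models exist. The table contents and function name are assumptions.

```python
from typing import Dict, List

# Hypothetical base table; "related words" is deliberately absent.
BASE_VISUALISATIONS: Dict[str, List[str]] = {
    "date": ["timeline"],
    "keyword": ["histogram"],
    "text": ["wordcloud", "ngram"],
}

# Visualisations that need word models trained on the field (assumed list).
WORD_MODEL_VISUALISATIONS: List[str] = ["related words"]


def infer_visualizations(es_mapping: dict, has_word_models: bool = False) -> List[str]:
    """Type-based visualisations, plus word-model ones when the flag is set."""
    field_type = es_mapping.get("type")
    visualizations = list(BASE_VISUALISATIONS.get(field_type, []))
    # The mapping cannot tell whether models were trained on this field,
    # so the boolean flag supplies that information.
    if has_word_models and field_type == "text":
        visualizations.extend(WORD_MODEL_VISUALISATIONS)
    return visualizations
```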

A potential issue is when visualisations cannot handle every corpus size: either they are too slow for very large datasets, or they are meaningless for small ones.
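One way to express such limits is a per-visualisation range of document counts. This is only a sketch: the `SizeLimits` type, the helper name and especially the numbers are illustrative placeholders, not measured limits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SizeLimits:
    """Hypothetical document-count bounds for a visualisation."""
    min_docs: int = 0
    max_docs: Optional[int] = None  # None means no upper bound


# Illustrative numbers only; real limits would need benchmarking.
LIMITS = {
    "timeline": SizeLimits(min_docs=2),
    "histogram": SizeLimits(min_docs=2),
    "wordcloud": SizeLimits(min_docs=1, max_docs=10_000),
    "ngram": SizeLimits(min_docs=10, max_docs=1_000_000),
}


def usable_for_corpus_size(candidates: List[str], doc_count: int) -> List[str]:
    """Keep only the visualisations whose size limits admit doc_count."""
    usable = []
    for name in candidates:
        limits = LIMITS.get(name, SizeLimits())  # unknown: no limits
        if doc_count < limits.min_docs:
            continue  # too little data to be meaningful
        if limits.max_docs is not None and doc_count > limits.max_docs:
            continue  # likely too slow on this many documents
        usable.append(name)
    return usable
```

Applied after type-based inference, this keeps the size concern separate from the question of which visualisations a field's type supports.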

@lukavdplas lukavdplas added the visualisation label (changes to visualisation features) Mar 2, 2022
@lukavdplas
Contributor Author

> What cannot be inferred is whether a field should be visualised/filtered at all. That is mostly based on whether we think it conveys interesting/relevant information. We could include boolean properties like include_in_visualizations and include_in_filters.

New Elasticsearch knowledge gained: setting doc_values: false in the mapping of a field is a way of saying the field will not be used for aggregations. We could check for this property in the frontend to see if a keyword/date field should be used in the histogram/timeline.
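A sketch of that check, assuming the mapping is available as a plain dict. Elasticsearch enables `doc_values` by default for `keyword` and `date` fields, so only an explicit `false` removes the aggregation-based visualisation; the helper name and the type-to-visualisation table are hypothetical.

```python
from typing import Dict, List

# Hypothetical table of aggregation-based visualisations per mapping type.
AGGREGATION_VISUALISATIONS: Dict[str, str] = {
    "date": "timeline",
    "keyword": "histogram",
}


def aggregation_visualizations(es_mapping: dict) -> List[str]:
    """Offer timeline/histogram only if the field supports aggregations."""
    field_type = es_mapping.get("type")
    if field_type not in AGGREGATION_VISUALISATIONS:
        return []
    # doc_values defaults to true for keyword and date fields; an explicit
    # false signals the field is not meant to be aggregated on.
    if not es_mapping.get("doc_values", True):
        return []
    return [AGGREGATION_VISUALISATIONS[field_type]]
```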

@lukavdplas lukavdplas added this to the P&P other milestone Apr 21, 2022
@lukavdplas
Contributor Author

Cf. #987

@JeltevanBoheemen
Contributor

> What cannot be inferred is whether a field should be visualised/filtered at all. That is mostly based on whether we think it conveys interesting/relevant information. We could include boolean properties like include_in_visualizations and include_in_filters.

It is within the realm of imagination that Field properties can lead to multiple available visualisation types, especially seeing that these types keep expanding. Thus, only a boolean indicating if the field should be visualised at all will not suffice.

@lukavdplas
Contributor Author

Yeah, that's already true. A text content field can have visualizations=['wordcloud', 'ngram'], for example.

The idea was that you would have a single toggle for all available visualisations on the field rather than toggling each individually, but I don't think that's so useful now.

The current status of this is that we have decent validation on whether the given visualisation types are valid, but we don't generate that list automatically. This issue is still relevant for #982, where you want to present a list of options in a corpus form.
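To illustrate how validation and generation could share a single source of truth, here is a sketch in which the same hypothetical table both produces the options for a form and validates a declared list. The function names and table contents are assumptions.

```python
from typing import Dict, List

# Hypothetical single source of truth, following this thread's examples.
VALID_VISUALISATIONS: Dict[str, List[str]] = {
    "date": ["timeline"],
    "keyword": ["histogram"],
    "text": ["wordcloud", "ngram", "related words"],
}


def options_for(es_mapping: dict) -> List[str]:
    """Generate the visualisation choices for a field, e.g. for a corpus form."""
    return list(VALID_VISUALISATIONS.get(es_mapping.get("type"), []))


def validate_visualizations(es_mapping: dict, declared: List[str]) -> None:
    """Raise ValueError if any declared visualisation is not valid for the type."""
    allowed = set(options_for(es_mapping))
    invalid = [name for name in declared if name not in allowed]
    if invalid:
        raise ValueError(
            f"{invalid} not valid for field type {es_mapping.get('type')!r}"
        )


validate_visualizations({"type": "text"}, ["wordcloud", "ngram"])  # passes
```

Because both functions read the same table, the options shown in a form cannot drift from what validation accepts.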

Regarding the combinatorial problem mentioned above: very long-term, #1340 is intended to address issues with expanding the number of visualisations. In the short term, I don't think the lists of 2-3 visualisation types are a problem, but we could just consider removing the option and presenting all available visualisations.

@lukavdplas
Contributor Author

See #987: if you use the JSON API, visualisations are inferred. They still have to be listed for Python corpora, but that's appropriate since that format is meant to provide more detailed control.

As mentioned, I see #1340 as the long-term solution for some of the issues described here, but for now I think we can close this.

Projects
None yet
Development

No branches or pull requests

3 participants