Infer visualisations from field definition #637

lukavdplas · 2022-03-01T17:03:19Z

However, the current situation in which a Field defines the visualizations that apply to it is a combinatorial problem: the code surface grows with the product of the total number of fields and the number of available visualization types. Ideally, Field should not have this responsibility at all; instead, the frontend should automatically infer which visualizations can be applied to a field based on its data type (and possibly other features). This consideration also applies to filters.

Originally posted by @jgonggrijp in #633 (review)


lukavdplas · 2022-03-02T16:10:15Z

Some thoughts on execution:

Generally speaking, the types of visualisations can be based on the field's es_mapping.type: timeline for date, histogram for keyword, and wordcloud, ngram and related words for text. Future visualisations may want to check for es_mapping.term_vector or es_mapping.fields.
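A minimal sketch of the type-based inference described above, assuming a field exposes its Elasticsearch mapping as a plain dict. The names `VISUALISATIONS_BY_TYPE` and `infer_visualisations` are hypothetical, not existing I-analyzer code. The function takes the whole mapping rather than just the type, so later checks on `term_vector` or `fields` could be added in the same place.

```python
# Hypothetical mapping from Elasticsearch field type to visualisation names,
# following the pairing suggested in this comment.
VISUALISATIONS_BY_TYPE = {
    "date": ["timeline"],
    "keyword": ["histogram"],
    "text": ["wordcloud", "ngram", "related words"],
}


def infer_visualisations(es_mapping: dict) -> list:
    """Return the visualisations that apply to a field, based only on the
    'type' entry of its Elasticsearch mapping."""
    field_type = es_mapping.get("type")
    # Unknown or missing types get no visualisations; copy the list so
    # callers cannot mutate the shared table.
    return list(VISUALISATIONS_BY_TYPE.get(field_type, []))
```

For example, `infer_visualisations({"type": "date"})` gives `["timeline"]`, and a field whose type is not in the table gets an empty list.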

What cannot be inferred is whether a field should be visualised/filtered at all. That is mostly based on whether we think it conveys interesting/relevant information. We could include boolean properties like include_in_visualizations and include_in_filters.

(Alternatively, the interface could be changed so that the user can visualise anything and everything. I don't think that would help user friendliness, but it's worth mentioning.)

In the case of related words (and any future visualisations based on word models), it would be necessary to indicate which field the models are trained on. Per #620, this is (or will be) done by listing 'related words' in the field's visualizations. That could be replaced with a boolean property like has_word_models.
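A sketch of how the suggested flags could combine with type-based inference. `FieldSpec`, `include_in_visualizations` and `has_word_models` are hypothetical names taken from the proposals in this comment, not an existing API; restricting word-model visualisations to text fields is also an assumption.

```python
from dataclasses import dataclass

# Visualisations that follow from the field type alone (hypothetical table).
BASE_VISUALISATIONS = {
    "date": ["timeline"],
    "keyword": ["histogram"],
    "text": ["wordcloud", "ngram"],
}


@dataclass
class FieldSpec:
    """Hypothetical minimal field definition with the proposed flags."""
    es_mapping: dict
    include_in_visualizations: bool = True
    has_word_models: bool = False


def visualisations_for(spec: FieldSpec) -> list:
    # Relevance cannot be inferred from the mapping, so an explicit
    # opt-out takes precedence over everything else.
    if not spec.include_in_visualizations:
        return []
    field_type = spec.es_mapping.get("type")
    result = list(BASE_VISUALISATIONS.get(field_type, []))
    # Word-model visualisations only make sense where models were trained.
    if field_type == "text" and spec.has_word_models:
        result.append("related words")
    return result
```

With this split, the mapping decides which visualisations are possible and the flags only record what cannot be inferred: whether the field is interesting, and whether word models exist for it.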

A potential issue is that some visualisations cannot handle every corpus size: either they are too slow for very large datasets, or they are meaningless for small ones.
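One way to handle this would be a per-visualisation size range that filters the inferred list. The sketch below assumes the number of documents is known up front; `SIZE_LIMITS` and `usable_for_corpus_size` are hypothetical, and the numbers are placeholders, not measured limits.

```python
# Hypothetical (minimum, maximum) document counts per visualisation.
# None means "no limit". Placeholder values, not benchmarks.
SIZE_LIMITS = {
    "wordcloud": (None, 100_000),
    "ngram": (50, 500_000),
}


def usable_for_corpus_size(visualisations, n_documents):
    """Keep only the visualisations whose size range includes n_documents."""
    usable = []
    for name in visualisations:
        low, high = SIZE_LIMITS.get(name, (None, None))
        if low is not None and n_documents < low:
            continue  # too few documents for a meaningful result
        if high is not None and n_documents > high:
            continue  # likely too slow on this many documents
        usable.append(name)
    return usable
```

Visualisations without an entry in the table are never filtered out, so the check only constrains the ones known to be size-sensitive.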

lukavdplas · 2022-04-12T12:46:05Z

What cannot be inferred is whether a field should be visualised/filtered at all. That is mostly based on whether we think it conveys interesting/relevant information. We could include boolean properties like include_in_visualizations and include_in_filters.

New Elasticsearch knowledge gained: setting doc_values: false in the mapping of a field is a way of saying the field will not be used for aggregations. We could check for this property in the frontend to see whether a keyword/date field should be used in the histogram/timeline.
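A sketch of that check, written in Python for brevity although the comment suggests doing it in the frontend. `aggregation_visualisation` is a hypothetical helper. It relies on the fact that Elasticsearch enables doc_values by default for field types that support them, so only an explicit `false` should switch the aggregation-based visualisation off.

```python
# Aggregation-based visualisations per field type (hypothetical table).
AGGREGATION_VISUALISATIONS = {"date": "timeline", "keyword": "histogram"}


def aggregation_visualisation(es_mapping: dict):
    """Return the aggregation-based visualisation for a date/keyword field,
    or None if there is none or the field opted out of aggregations."""
    vis = AGGREGATION_VISUALISATIONS.get(es_mapping.get("type"))
    if vis is None:
        return None
    # doc_values defaults to true when absent, so only an explicit
    # False marks the field as not meant for aggregations.
    if es_mapping.get("doc_values", True) is False:
        return None
    return vis
```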

lukavdplas · 2022-12-16T11:37:05Z

Cf. #987

JeltevanBoheemen · 2024-02-01T15:54:02Z

What cannot be inferred is whether a field should be visualised/filtered at all. That is mostly based on whether we think it conveys interesting/relevant information. We could include boolean properties like include_in_visualizations and include_in_filters.

It is conceivable that a field's properties lead to multiple available visualisation types, especially as the number of types keeps growing. A single boolean indicating whether the field should be visualised at all will therefore not suffice.

lukavdplas · 2024-02-01T16:10:50Z

Yeah, that's already true. A text content field can have visualizations=['wordcloud', 'ngram'], for example.

The idea was that you would have a single toggle for all available visualisations on the field rather than toggling each individually, but I don't think that's so useful now.

The current status is that we have decent validation on whether the given visualisation types are valid, but we don't generate that list automatically. This issue is relevant for #982: there, you want to present a list of options in a corpus form, which is still a valid use case.
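Generating the options and validating against them can share one table. The sketch below is hypothetical (`visualisation_options`, `validate_visualisations` and the table are illustrative names, not the current validation code): the same function that validates a given list could also supply the choices a corpus form presents.

```python
# Hypothetical table of visualisations available per field type.
VISUALISATIONS_BY_TYPE = {
    "date": ["timeline"],
    "keyword": ["histogram"],
    "text": ["wordcloud", "ngram", "related words"],
}


def visualisation_options(field_type: str) -> list:
    """Options a corpus form could offer for a field of this type."""
    return list(VISUALISATIONS_BY_TYPE.get(field_type, []))


def validate_visualisations(field_type: str, chosen: list) -> None:
    """Raise ValueError if any chosen visualisation is not an option."""
    options = visualisation_options(field_type)
    invalid = [name for name in chosen if name not in options]
    if invalid:
        raise ValueError(
            f"Invalid visualisations for {field_type!r} field: {invalid}"
        )
```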

Regarding the combinatorial problem mentioned above: very long-term, #1340 is intended to address issues with expanding the number of visualisations. In the short term, I don't think the lists of 2-3 visualisation types are a problem, but we could just consider removing the option and presenting all available visualisations.

lukavdplas · 2024-05-22T09:23:32Z

See #987: if you use the JSON API, visualisations are inferred. They still have to be listed for Python corpora, but that's appropriate since that format is meant to provide more detailed control.
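The split between the two corpus formats amounts to "explicit list if given, otherwise infer". A hypothetical sketch of that precedence (the actual implementation in #987 may differ; `resolve_visualisations` is an illustrative name):

```python
# Hypothetical inference table, as used for JSON corpus definitions.
VISUALISATIONS_BY_TYPE = {
    "date": ["timeline"],
    "keyword": ["histogram"],
    "text": ["wordcloud", "ngram"],
}


def resolve_visualisations(es_mapping: dict, explicit=None) -> list:
    """Use an explicit list when the corpus definition provides one (as a
    Python corpus does); otherwise infer it from the mapping type."""
    if explicit is not None:
        return list(explicit)
    return list(VISUALISATIONS_BY_TYPE.get(es_mapping.get("type"), []))
```

Using `None` as the default distinguishes "no list given" from an explicit empty list, so a Python corpus can still switch off all visualisations for a field.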

As mentioned, I see #1340 as the long-term solution for some of the issues described here, but for now I think we can close this.

lukavdplas added the visualisation label (changes to visualisation features) Mar 2, 2022

lukavdplas mentioned this issue Mar 16, 2022 in Improve visualisation menu #643 (closed)

lukavdplas added this to the P&P other milestone Apr 21, 2022

BeritJanssen added the on hold label Aug 18, 2022

lukavdplas closed this as completed May 22, 2024
