
HowTo InfluxDB Tags and Fields explained

Niels Korschinsky edited this page Aug 26, 2021 · 1 revision

HowTo: InfluxDB Tags and Fields explained

This page gives a short summary of the InfluxDB documentation and some practical advice on how to split the data we query into fields and tags, so that it can be queried effectively later on.

General notice

Tags vs. Fields

Tags and fields are very similar. Effectively, tags are enhanced versions of fields: they are indexed, while fields are not. This allows more operations on tags than on fields:

  • You can select on fields, e.g. where duration > 10
  • You cannot delete on fields, only on tags, e.g. where COMMAND='all'
  • In Grafana, only fields are natively supported for display as data

Note: With some tricks you can also display tags, but that is a little messy.

Fields

This is the data for which you create a series/measurement in the first place: it is what you display in a graph, table, etc., it changes over time, and it is the reason you want to capture this measurement.

Data categorized as fields should have a statistical use and contain changing values that are not used to identify the row itself.

Fields may have the following data types:

  • Float (default)
  • Int (written with an appended i; sppmon adds it automatically if the field is declared as int)
  • String
  • Boolean
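
The type of a field is decided by how its value is written in InfluxDB line protocol. The following is a minimal sketch of that formatting. The function name format_field_value is made up for illustration and is not part of sppmon or of any InfluxDB client library.

```python
def format_field_value(value):
    """Render one field value the way InfluxDB line protocol expects it.

    Hypothetical helper for illustration only.
    """
    # bool must be checked before int, because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        # integers are marked with a trailing "i"
        return f"{value}i"
    if isinstance(value, float):
        # floats are the default type and are written without a suffix
        return repr(value)
    if isinstance(value, str):
        # string field values are double-quoted; backslashes and quotes are escaped
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


print(format_field_value(3104))      # integer field
print(format_field_value(24.35))     # float field
print(format_field_value("vsnap"))   # string field
```

A number written without the trailing i is stored as a float, which is why Float is the default in the list above.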

Tags

Tags hold information that has no direct statistical use, but is used to group the data into different classes. Tags should not change over time and there should only be a handful of them, e.g. hostAddress, Type, Status. An example is the version ID, which lets you show the data for each version individually.

Tags are always indexed. Because of this, Influx recommends a maximum of 100,000 tag values. If a tag is likely to reach this limit, it should probably be a field.

Tags are what you want to select on: Show xx where Tag = 'special value' or Show xx group by Tag

time is always treated as a tag: it is always indexed.
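
Before declaring a column as a tag, it can help to count how many distinct values it holds. The following sketch does this for a list of rows. The helper names and the per-column reading of the 100,000 limit are assumptions made for illustration, not an official InfluxDB rule.

```python
def distinct_values_per_column(rows, columns):
    """Count how many distinct values each candidate tag column holds."""
    return {col: len({row[col] for row in rows}) for col in columns}


def suggest_kind(rows, columns, limit=100_000):
    """Suggest 'tag' or 'field' per column.

    Assumption: the limit from the text is applied to each column on its own.
    """
    counts = distinct_values_per_column(rows, columns)
    return {col: ("tag" if n <= limit else "field") for col, n in counts.items()}


# A few sample rows with values taken from the storages example on this page
rows = [
    {"hostAddress": "cetvm77", "type": "vsnap", "version": "10.1.5-2076"},
    {"hostAddress": "cetvm63", "type": "vsnap", "version": "10.1.6-1625"},
    {"hostAddress": "cetvm63", "type": "vsnap", "version": "10.1.6-1625"},
]

columns = ["hostAddress", "type", "version"]
print(distinct_values_per_column(rows, columns))
print(suggest_kind(rows, columns))
```

Columns with only a few distinct values, such as type or version, are good tag candidates. A column whose value is different in almost every row is better kept as a field.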

Example: our storages measurement

We have the following table:

time              free         hostAddress  isReady  name     pct_free     pct_used     site  storageId  total        type   used         version
24.04.2020 20:47  6,65417E+12  cetvm77      TRUE     cetvm77  75,64933144  24,35066856  3104  2108       8,79607E+12  vsnap  2,1419E+12   10.1.5-2076
25.04.2020 01:46  4,4797E+12   cetvm63      TRUE     cetvm63  81,48549852  18,51450148  3103  2109       5,49755E+12  vsnap  1,01784E+12  10.1.6-1625
25.04.2020 04:51  4,01947E+12  cetvm63      TRUE     cetvm63  73,11395311  26,88604689  3103  2109       5,49755E+12  vsnap  1,47807E+12  10.1.6-1625
25.04.2020 04:59  4,00266E+12  cetvm63      TRUE     cetvm63  72,80804113  27,19195887  3103  2109       5,49755E+12  vsnap  1,49489E+12  10.1.6-1625
25.04.2020 05:05  3,98222E+12  cetvm63      TRUE     cetvm63  72,43636644  27,56363356  3103  2109       5,49755E+12  vsnap  1,51532E+12  10.1.6-1625
25.04.2020 12:19  3,57224E+12  cetvm63      TRUE     cetvm63  64,97873985  35,02126015  3103  2109       5,49755E+12  vsnap  1,92531E+12  10.1.6-1625

Tags explained

time, hostAddress, isReady, name, site, storageId, type and version

  • time: Always a tag. It is indexed, so you can always select on it.
  • name, site, storageId, hostAddress: These act as identifiers. You will probably sometimes run select * from storages where hostAddress = 'xyz', so they should be tags.
  • isReady: This is a tag because it groups the results into ready and not ready. It could also be a field, but I have used it as a tag since we use it to filter and group.

To build a series over the isReady tag, you need to create a new measurement, grouped by time, via a continuous query. Then you can check for what percentage of that time the storage was ready.

You can query statistics from those tags more easily with count(randomField) group by type, time(5m)

  • type, version: These are similar to isReady, but have more distinct values. They still group the results into different classes, so they are tags.
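
To illustrate what count(randomField) group by type, time(5m) computes, the following sketch emulates the grouping in plain Python. It does not talk to InfluxDB, the helper names are made up, and it assumes every point actually contains the counted field and uses naive timestamps.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_start(ts, width=timedelta(minutes=5)):
    """Round a naive timestamp down to the start of its time bucket."""
    epoch = datetime(1970, 1, 1)
    return ts - ((ts - epoch) % width)


def count_by_tag_and_time(points, tag_key, width=timedelta(minutes=5)):
    """Count points per (tag value, time bucket) pair."""
    counts = Counter()
    for point in points:
        counts[(point[tag_key], bucket_start(point["time"], width))] += 1
    return dict(counts)


# Three rows of the storages example, reduced to time and the type tag
points = [
    {"time": datetime(2020, 4, 25, 4, 51), "type": "vsnap"},
    {"time": datetime(2020, 4, 25, 4, 59), "type": "vsnap"},
    {"time": datetime(2020, 4, 25, 5, 5), "type": "vsnap"},
]

for (tag_value, start), count in count_by_tag_and_time(points, "type").items():
    print(tag_value, start, count)
```

Each distinct tag value gets its own series of buckets, which is the reason a value you want to group by has to be a tag.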

Fields explained

free, pct_free, pct_used, total and used

None of these is useful for grouping the results by storage or storage type. You can still run select * from storages where free > 10. The data itself is the result you want to query or build a time series over.
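
Putting the example together, the following sketch splits the first row of the table into tags and fields and renders it as one line of InfluxDB line protocol. The helper names are made up for illustration, the timestamp is assumed to be UTC (the table does not state a time zone), and this is not how sppmon itself writes the data.

```python
from datetime import datetime, timezone

# Tag/field split as described on this page (time is passed separately)
TAG_KEYS = ("hostAddress", "isReady", "name", "site", "storageId", "type", "version")
FIELD_KEYS = ("free", "pct_free", "pct_used", "total", "used")


def escape_tag(text):
    """Escape commas, equals signs and spaces, as line protocol requires for tags."""
    for char in (",", "=", " "):
        text = text.replace(char, "\\" + char)
    return text


def to_line_protocol(measurement, row, timestamp):
    """Render one row as: measurement,tags fields timestamp_in_nanoseconds."""
    tags = ",".join(
        f"{escape_tag(key)}={escape_tag(str(row[key]))}" for key in sorted(TAG_KEYS)
    )
    # all fields of this measurement are floats, so they are written without a suffix
    fields = ",".join(f"{key}={float(row[key])}" for key in FIELD_KEYS)
    nanos = int(timestamp.timestamp()) * 1_000_000_000
    return f"{measurement},{tags} {fields} {nanos}"


# First row of the table above; tag values are always strings
row = {
    "hostAddress": "cetvm77", "isReady": "TRUE", "name": "cetvm77",
    "site": "3104", "storageId": "2108", "type": "vsnap", "version": "10.1.5-2076",
    "free": 6.65417e12, "pct_free": 75.64933144, "pct_used": 24.35066856,
    "total": 8.79607e12, "used": 2.1419e12,
}
when = datetime(2020, 4, 24, 20, 47, tzinfo=timezone.utc)  # assumed UTC

line = to_line_protocol("storages", row, when)
print(line)
```

Everything before the first space identifies the series (measurement plus tags), while the fields after it carry the values that are graphed.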
