Data Column Names #4

spkaluzny · 2019-05-05T04:28:26Z

I think we should reconsider the column names for the eruption data in R. The names from the tsv data file are:
eruptionID geyser eruption_time_epoch has_seconds exact ns ie E A wc ini maj min q duration entrant observer eruption_comment time_updated time_entered associated_primaryID other_comments
It would be good to have descriptive names with consistent character case. Names of similar length would be good as well.

I realize that the data has been available from the archive under the above names for some time, and I don't know whether using different names in R would have any ramifications.

taltstidl · 2019-05-10T09:56:09Z

I agree, the current column names are a product of historical developments and are not quite normalized or self-explanatory. I'm including a draft of possible new names here along with a short description (coming later 😉):

eruption_id: The unique database identifier of the eruption.
geyser: The unique name of the geyser that erupted.
time: The timestamp of the eruption. Note that there are modifiers that can change the interpretation of this timestamp, which are also listed below.
has_seconds: Whether the eruption timestamp was recorded with second-level precision. If not set, any seconds of the timestamp should be disregarded.
exact: A timestamp modifier which indicates that the exact start time was recorded.
near_start: A timestamp modifier which indicates that the exact start time was not recorded, but where circumstantial evidence suggests that the time was near the actual start of the eruption.
in_eruption: A timestamp modifier which indicates that the start time was recorded when the geyser was already in eruption.
electronic: A timestamp modifier which indicates that the start time was inferred using electronic monitoring equipment such as temperature loggers and seismographs.
approximate: A timestamp modifier which indicates that the given start time is only a rough estimate, usually based on post-eruptive evidence.
webcam: Whether the eruption was seen on a webcam rather than in-basin.
initial: Whether the eruption was the initial one in a series of eruptions. This is only applicable to geysers which erupt in series.
major: Whether the eruption was of the major type. This is only applicable to geysers that have minor and major eruptions.
minor: Whether the eruption was of the minor type. This is only applicable to geysers that have minor and major eruptions.
questionable: Indicates that there is uncertainty about the report, usually when the observation conditions make it hard to determine the geyser with certainty.
duration: The duration of the eruption as raw text.
entrant: The username of the user who entered the eruption.
observer: The name of the person who observed the eruption. If not given, the observer is the same as the entrant.
comment: Comments on the eruption, usually consisting of more detailed observations on the eruption or the events leading to the eruption.
time_entered: The timestamp when the eruption was entered, as determined by the entry client.
time_updated: The timestamp when the eruption was last updated, as determined by the entry client. If the eruption was not edited, this coincides with the timestamp when it was entered.
primary_id: The unique database identifier of the primary eruption. Reports of the same eruption are grouped together, with the most representative report being selected as the primary eruption.
other_comments: Comments by other people on the eruption.
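
As a rough illustration of how has_seconds changes the interpretation of time, here is a minimal sketch. It assumes that time holds seconds since the Unix epoch (suggested by the original name eruption_time_epoch) and that has_seconds is stored as a 0/1 flag; neither is confirmed in this thread. The helper normalize_time and the sample values are hypothetical.

```r
# Made-up sample rows for illustration only; not taken from the archive.
eruptions <- data.frame(
  time = c(1557030506, 1557034139),  # assumed: seconds since the Unix epoch
  has_seconds = c(1L, 0L)            # assumed: 1 = seconds recorded, 0 = not
)

# Hypothetical helper: drop the seconds component where has_seconds is not set,
# so that such timestamps fall on a whole minute.
normalize_time <- function(time, has_seconds) {
  ifelse(has_seconds == 1, time, time - (time %% 60))
}

eruptions$time <- normalize_time(eruptions$time, eruptions$has_seconds)

# Convert to a date-time for display; UTC is an arbitrary choice here.
eruptions$time_utc <- as.POSIXct(eruptions$time, origin = "1970-01-01", tz = "UTC")
```

The same pattern could be extended to the other timestamp modifiers, for example by flagging rows where approximate is set before computing intervals.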

While ideally we would change the column names directly within the source TSV files, I'm a bit reluctant as it might break things for people already using our archive files. I'll bring it up at our next meeting though. Also, I'll be looking into adding our parsed durations (a numerical value of the duration in seconds) to the archive files.

taltstidl · 2019-05-12T16:04:34Z

@spkaluzny I've updated the column descriptions. We've decided against renaming the columns within the archive files, so it's probably best to map the names within the gt_get_data function. If there's anything else I can do, please let me know.
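
A minimal sketch of what such a mapping could look like in R follows. The pairing of old and new names is inferred by lining up the two lists in this thread and should be treated as an assumption. The names gt_column_map and rename_eruption_columns are hypothetical and are not part of the actual gt_get_data implementation.

```r
# Assumed mapping from archive column names (left) to proposed names (right).
gt_column_map <- c(
  eruptionID = "eruption_id",
  geyser = "geyser",
  eruption_time_epoch = "time",
  has_seconds = "has_seconds",
  exact = "exact",
  ns = "near_start",
  ie = "in_eruption",
  E = "electronic",
  A = "approximate",
  wc = "webcam",
  ini = "initial",
  maj = "major",
  min = "minor",
  q = "questionable",
  duration = "duration",
  entrant = "entrant",
  observer = "observer",
  eruption_comment = "comment",
  time_updated = "time_updated",
  time_entered = "time_entered",
  associated_primaryID = "primary_id",
  other_comments = "other_comments"
)

# Hypothetical helper: rename only the columns found in the map and leave
# any unknown columns untouched.
rename_eruption_columns <- function(df, map = gt_column_map) {
  hits <- names(df) %in% names(map)
  names(df)[hits] <- unname(map[names(df)[hits]])
  df
}

# Small made-up tab-separated input standing in for a real archive file.
raw <- read.delim(
  text = "eruptionID\tgeyser\teruption_time_epoch\tE\tA\n1\tExample Geyser\t1557030506\t0\t1",
  stringsAsFactors = FALSE
)
renamed <- rename_eruption_columns(raw)
```

Keeping the mapping in a single named vector means the archive files stay unchanged, and any column added to the archive later would pass through with its original name until the map is updated.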

taltstidl mentioned this issue on May 8, 2019 in "Initial package for getting and loading the data." #3 (merged)
