stop abusing git for timing data storage #36

Closed · jrevels opened this issue Jul 24, 2017 · 8 comments

@jrevels (Member) commented Jul 24, 2017

Should've made this issue a long time ago.

Each Nanosoldier run generates a fair amount of timing data, which is currently stored in https://github.com/JuliaCI/BaseBenchmarkReports. As was discussed way back in the early days of Nanosoldier, this is a pretty gross abuse of git/GitHub.

We could instead just dump the data on a publicly accessible filesystem (and eventually, let the data be ingested by a more granularly queryable database).

While not directly tied to this issue, it'd also be nice to tackle the old lightweight/stable/portable serialization issue at the same time. It should be simple enough to write a JSON (de)serializer for the list of (benchmark key, BenchmarkTools.Trial) pairs you'd need to store.
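
As a rough illustration (not Nanosoldier's actual code), such a (de)serializer could be written with JSON.jl against the public fields of a BenchmarkTools.Trial; the helper names below are hypothetical:

```julia
# Sketch of a lightweight JSON (de)serializer for (benchmark key, Trial) pairs.
# Assumes JSON.jl and the Trial fields times/gctimes/memory/allocs.
using JSON, BenchmarkTools

trial_to_dict(t::BenchmarkTools.Trial) = Dict(
    "times"   => t.times,    # per-sample times in nanoseconds
    "gctimes" => t.gctimes,  # per-sample GC times in nanoseconds
    "memory"  => t.memory,   # bytes allocated
    "allocs"  => t.allocs,   # number of allocations
)

# Write a vector of (key, Trial) pairs as a JSON array of {"key", "trial"} objects.
serialize_trials(io::IO, pairs) =
    JSON.print(io, [Dict("key" => string(k), "trial" => trial_to_dict(t)) for (k, t) in pairs])

# Reading back yields plain Dicts/Vectors, which is enough for later analysis
# even if the original Trial objects aren't reconstructed exactly.
deserialize_trials(io::IO) = JSON.parse(io)
```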

@simonbyrne (Member)
What about sticking it in some cloud NoSQL store (DynamoDB, BigTable, etc.)?

@StefanKarpinski
How about producing CSV and pushing it to S3?

@jrevels (Member, Author) commented Jul 28, 2017

> How about producing CSV and pushing it to S3?

The data isn't well structured for CSV (AFAICT you'd end up with many, many smallish files), but yeah, any simple storage solution should work fine for this to start.

@vtjnash (Member) commented Mar 4, 2021

The JSON files compress well (e.g. JuliaCI/BenchmarkTools.jl#79), so while we have generated a lot of data, and may want to add a TSDB for other reasons, the current rate of growth isn't terrible:

$ du -sh NanosoldierReports/
13G     NanosoldierReports/
$ du -sh NanosoldierReports/.git
5.8G    NanosoldierReports/.git

$ du -sh NanosoldierReports/pkgeval/by_date/latest/
11M     NanosoldierReports/pkgeval/by_date/latest/

$ du -sh NanosoldierReports/benchmark/by_date/2021-02/17/
7.4M    NanosoldierReports/benchmark/by_date/2021-02/17/

What might make the biggest difference is writing each by_date run directly as an update to latest (instead of keeping a copy of each run in its own folder). That would take the most advantage of git's ability to compare logs and delta-compress changes.

@KristofferC (Contributor)
I somewhat doubt the value of saving the timing data for every sample of each trial and think it is enough to save the summary statistics. Sure, people said some years ago that someone might want to do some analysis on the data, but looking at the activity in these reports, I doubt that will happen or that it would yield anything actionable for old runs.

@vtjnash (Member) commented Sep 7, 2021

The isdaily reports are generated from the data.tar.xz files.

@KristofferC (Contributor)
Yes, and in there is a ~500 MB JSON file that contains the timing for every sample of every benchmark. I don't think anything has been done with those trial timings other than computing their minimum. So I am saying to just store e.g. the minimum and reduce the file size by a couple of orders of magnitude.
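
To make the suggestion concrete, a minimal sketch (assuming BenchmarkTools.jl, JSON.jl, and Statistics; the function names are hypothetical) of storing only summary statistics per benchmark:

```julia
# Reduce each Trial to a handful of summary statistics instead of every raw sample.
using JSON, BenchmarkTools, Statistics

summarize(t::BenchmarkTools.Trial) = Dict(
    "min_time"    => minimum(t.times),  # ns; the statistic the reports actually use
    "median_time" => median(t.times),   # ns
    "memory"      => t.memory,          # bytes allocated
    "allocs"      => t.allocs,          # allocation count
)

# Write one small JSON object per benchmark key instead of a ~500 MB dump.
write_summaries(io::IO, pairs) =
    JSON.print(io, Dict(string(k) => summarize(t) for (k, t) in pairs))
```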

@maleadt (Member) commented Jan 17, 2023

@maleadt closed this as completed on Jan 17, 2023