Fetching and timestamps
Fetch items all have a fetch method, whether you write it or use a built-in fetcher from vizlab. The fetch method is specified by the fetcher argument in the viz.yaml, and it does the work of actually downloading or creating a dataset.
Because it's time-consuming to fetch data during every build, vizmake() skips the fetch when possible. Files are always fetched the first time (to create them) and whenever their dependencies or their arguments in the viz.yaml change. Beyond those basic remake-type controls, we use two further controls to limit unnecessary fetching:
- fetchTimestamp: The fetchTimestamp concept was motivated by datasets that must be fetched from remote data sources, though now we use it for all fetch items. We usually only want to download data files as often as the remote data change, so we represent the remote data status with a local timestamp file. That file contains a single datetime in a standard text format, as produced by the writeTimestamp helper function. The datetime should describe the time of the most recent update to the remote data. This could be the 'last-modified' time of a remote file or the timestamp of the most recently added observation in a stream of continuous monitoring data. We use fetchTimestamp to query the remote source for changes in that timestamp. If the remote data haven't changed (that is, the timestamp is unchanged), we don't bother refetching the data. If they have changed, we refetch. See the fetchTimestamp methods section below.
- timetolive: We can further avoid querying a remote source too often by setting the fetch item's timetolive to some positive time interval. This makes vizmake wait until that interval has passed before fetching the timestamp, let alone the file. See the timetolive and preferences.yaml section below.
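Putting these rules together, vizmake [re]fetches a fetch item in three cases. It always fetches the item the first time it is needed, and again whenever the item's dependencies or viz.yaml arguments change. Otherwise it refetches only when the item's timetolive has elapsed and its fetchTimestamp method then reports that the remote timestamp has changed.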
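To make those rules concrete, here is a minimal sketch that models the decision as a single R function. It is not vizlab's code. The name should_refetch, its arguments, and the idea of tracking the time of the last timestamp check as last_check are all hypothetical and were chosen only to mirror the rules above. The check_remote argument stands in for calling the item's fetchTimestamp method and reporting whether the timestamp changed.

```r
# Hypothetical model of the refetch rules described above; not vizlab's code.
# local_exists: has the item ever been fetched?
# deps_changed: did its dependencies or viz.yaml arguments change?
# last_check:   POSIXct time of the last timestamp check (NULL if never)
# ttl:          the item's timetolive as a difftime
# check_remote: function standing in for fetchTimestamp; returns TRUE if the
#               remote timestamp changed since the last fetch
should_refetch <- function(local_exists, deps_changed, last_check, ttl,
                           check_remote, now = Sys.time()) {
  # Basic remake-type rules: fetch the first time and whenever inputs change
  if (!local_exists || deps_changed) return(TRUE)
  # timetolive: inside the window, skip even the timestamp query
  if (!is.null(last_check) &&
      difftime(now, last_check, units = "secs") < ttl) {
    return(FALSE)
  }
  # fetchTimestamp: refetch only if the remote timestamp changed
  isTRUE(check_remote())
}

# Example: a 3-hour timetolive with a neverCurrent-style check, modeled here
# as a function that always reports "changed"
t0 <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
three_hours <- as.difftime(3, units = "hours")
should_refetch(TRUE, FALSE, t0, three_hours, function() TRUE, now = t0 + 3600)      # FALSE: still inside the window
should_refetch(TRUE, FALSE, t0, three_hours, function() TRUE, now = t0 + 4 * 3600)  # TRUE: window passed, remote "changed"
```

In this model, an alwaysCurrent-style check would be function() FALSE, so only the first-time and changed-inputs rules would ever trigger a fetch. That is consistent with the table below, which marks timetolive as ignored in that case.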
Here's how to set up your fetcher and timetolive preference to achieve some common goals:
Desired behavior when vizmake is called | Example use case | Set fetchTimestamp as | Set timetolive to
---|---|---|---
Only fetch/create once ever (option 1) | Local function result, no remote data | fetchTimestamp.myfetcher <- alwaysCurrent (exactly this code line) | Ignored
Only fetch/create once ever (option 2) | Remote file that will never change | fetchTimestamp.myfetcher <- [any function] | Inf days
Fetch every time | Small dataset updated remotely every 2 hours | fetchTimestamp.myfetcher <- neverCurrent | 0 secs (or omit)
Fetch only if >= 3 hours have elapsed since the last fetch | Big dataset updated remotely every 5 minutes | fetchTimestamp.myfetcher <- neverCurrent | 3 hours
Fetch from ScienceBase only if the file on SB has changed | SB file that colleagues will update periodically | fetcher: sciencebase in viz.yaml | 0 secs (or omit)
Fetch from URL only if the remote header's "last-modified" value has changed | Single file at URL | fetcher: url in viz.yaml | 0 secs (or omit)
Fetch from URL with complex structure or no "last-modified" | Complex precip shapefiles, dataRetrieval query | fetchTimestamp.myfetcher <- [custom method] | Up to you
fetchTimestamp methods
The fetchTimestamp method for a viz item is specified by the same viz.yaml argument, fetcher, that determines the item's fetch method. In other words, every fetcher must have both a fetch method and a fetchTimestamp method. For common and simple use cases, you can set your custom fetchTimestamp method equal to a built-in function such as alwaysCurrent or neverCurrent. Their names describe what each assumes about the currency of the local timestamp relative to the remote timestamp. Three ways you might define your fetchTimestamp method are below. The third is a simplified version of fetchTimestamp.url; note the use of the readTimestamp and writeTimestamp helpers.
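# Option 1: treat the local timestamp as always current (fetch once ever)
fetchTimestamp.myfetcher <- alwaysCurrent

# Option 2: treat the local timestamp as never current (refetch whenever timetolive allows)
fetchTimestamp.myfetcher <- neverCurrent

# Option 3: compare against the remote file's 'last-modified' header
library(httr)  # provides HEAD(), headers(), and parse_http_date()
fetchTimestamp.myfetcher <- function(viz) {
  # URL will be specified in viz.yaml as remoteURL
  checkRequired(viz, "remoteURL")
  url <- viz$remoteURL
  # read the 'last-modified' value from the URL's response headers
  new.timestamp <- headers(HEAD(url))[['last-modified']]
  # no 'last-modified' header: leave the timestamp file untouched
  if (is.null(new.timestamp)) return(invisible())
  # parse the HTTP date into POSIXct (UTC) for passing to writeTimestamp
  new.timestamp <- parse_http_date(new.timestamp)
  attr(new.timestamp, "tzone") <- "UTC"
  # read the current local timestamp; write the new one only if it changed
  old.timestamp <- readTimestamp(viz)
  if (!is.na(new.timestamp) && (is.na(old.timestamp) || new.timestamp != old.timestamp)) {
    writeTimestamp(new.timestamp, viz)
  }
  invisible() # return nothing
}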
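You can try the third example's pattern without vizlab or a network connection. The pattern is to rewrite the timestamp file only when the remote time is new. In this sketch, read_ts() and update_ts() are hypothetical stand-ins for readTimestamp() and writeTimestamp() plus the comparison logic. They take a file path rather than a viz object. The on-disk format (one line of UTC time as "%Y-%m-%d %H:%M:%S") is an assumption and not necessarily what writeTimestamp produces. Times are compared as text at whole-second precision, which matches the precision of an HTTP last-modified date.

```r
# Hypothetical stand-ins for readTimestamp()/writeTimestamp(); not vizlab's code.
# Assumed file format: a single line of UTC time, e.g. "2024-05-01 12:00:00".
ts_format <- "%Y-%m-%d %H:%M:%S"

# Read the local timestamp file, returning NA if it doesn't exist yet
read_ts <- function(path) {
  if (!file.exists(path)) return(as.POSIXct(NA, tz = "UTC"))
  as.POSIXct(readLines(path, n = 1, warn = FALSE), format = ts_format, tz = "UTC")
}

# Write new_ts only if it is valid and differs from what is already stored;
# returns TRUE (invisibly) if the file was (re)written
update_ts <- function(new_ts, path) {
  if (is.na(new_ts)) return(invisible(FALSE))
  new_txt <- format(new_ts, format = ts_format, tz = "UTC")
  old_txt <- if (file.exists(path)) readLines(path, n = 1, warn = FALSE) else character(0)
  if (length(old_txt) == 0 || new_txt != old_txt) {
    writeLines(new_txt, path)
    return(invisible(TRUE))
  }
  invisible(FALSE)
}

# Usage with a temporary file standing in for the item's timestamp file
p <- tempfile(fileext = ".txt")
remote_time <- as.POSIXct("2024-05-01 12:00:00", tz = "UTC")
update_ts(remote_time, p)       # first check: file written
update_ts(remote_time, p)       # same remote time: file left alone
update_ts(remote_time + 60, p)  # remote data changed: file rewritten
```

Leaving the file untouched when nothing changed means the file itself only changes when the remote data do. That is presumably what lets a changed timestamp act as the signal to refetch.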
Or you can use built-in pairs of fetch and fetchTimestamp methods by setting a known fetcher in the viz.yaml in one of these three ways:
fetcher: sciencebase
fetcher: url
fetcher: file
timetolive and preferences.yaml
Your timetolive settings live in an optional, local file named preferences.yaml. If you create this file, format it like this (where cuyahoga and iris_data are IDs of example fetch items from the viz.yaml):
timetolive:
  cuyahoga: 1 secs
  iris_data: 2 days
Don't git commit this file, because it may need to be different on other computers or on the Jenkins machine. If a fetch item listed in the viz.yaml is missing from preferences.yaml (or if preferences.yaml doesn't exist), the timetolive for that item will be 0 secs.
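The preferences.yaml rules can be sketched in base R. parse_ttl() is a hypothetical helper and not part of vizlab. It takes the timetolive entries as a named list, written out by hand here rather than read from the YAML file so the example has no dependencies. It converts strings such as "2 days" into difftime objects and falls back to 0 secs for items that aren't listed. It assumes each entry is a number followed by one of the units R's as.difftime() accepts (secs, mins, hours, days, weeks).

```r
# Hypothetical helper (not vizlab's API): turn a timetolive entry such as
# "1 secs", "2 days", or "Inf days" into a difftime; unlisted items get 0 secs.
parse_ttl <- function(item_id, ttl_prefs) {
  if (!item_id %in% names(ttl_prefs)) return(as.difftime(0, units = "secs"))
  parts <- strsplit(trimws(ttl_prefs[[item_id]]), "\\s+")[[1]]
  stopifnot(length(parts) == 2)  # expect "<number> <units>"
  as.difftime(as.numeric(parts[1]), units = parts[2])
}

# The timetolive block from the example preferences.yaml, written by hand
prefs <- list(cuyahoga = "1 secs", iris_data = "2 days")

parse_ttl("iris_data", prefs)
parse_ttl("not_listed", prefs)  # not listed, so 0 secs
```

Returning difftime objects means they can be compared directly with elapsed time such as difftime(Sys.time(), last_check). R converts both sides to common units before comparing, so "2 days" and "48 hours" behave the same.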