---
layout: default
---
This repository can be used to host many kinds of large files including PowerPoint presentations, data archives, movies, 3D object files, etc. The repository is configured to handle files with various extensions using GitHub's support for Git Large File Storage (LFS).
Currently, there are layouts defined for data archives and presentations. See the section on adding a new file type if you need to expand the kinds of files hosted here.
Below, we provide instructions for adding binary data archives. Apart from perhaps skipping the first step to create and compress an archive file, the same procedure can be used to add presentations, movies, etc.
The instructions below are written assuming all data files to be added are stored in a directory named foo_data. When creating archives, it is a best practice to ensure all files in the archive expand into a single top-level directory which has the same name as the archive file name without the extension. For example, all files in the foo_data.tar.gz file would expand into a directory named foo_data.
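The single top-level directory convention described above can be checked before an archive is published. The following is a minimal sketch, assuming a POSIX shell and a tar that accepts the z option for gzip; the scratch directory and the sample file a.dat are hypothetical and exist only to make the example self-contained.

```shell
# Build a tiny sample archive in a scratch directory (hypothetical contents).
workdir=$(mktemp -d)
cd "$workdir"
mkdir foo_data
echo "sample" > foo_data/a.dat
tar czf foo_data.tar.gz foo_data

# List the archive, keep only the first path component, and de-duplicate.
archive=foo_data.tar.gz
top_level=$(tar tzf "$archive" | cut -d/ -f1 | sort -u)

# The archive name without its extension should be the only top-level entry.
expected=${archive%.tar.gz}
if [ "$top_level" = "$expected" ]; then
    echo "ok: $archive expands into $expected/"
else
    echo "warning: unexpected top-level entries: $top_level"
fi
```

For a real archive, only the last two stanzas are needed, with archive set to the file being checked.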
-
It is not a requirement, but please try to provide all three download formats, .tar.xz, .tar.gz and .zip, whenever possible. If you provide only one, .tar.gz is probably the best choice because tools to process that format are available on all platforms. The advantages of the various formats are...
.tar.xz
- Usually produces the smallest files, by 2-3x.
- Tools are not pre-installed on all systems, so users may have to download and install them before use.
- A command to produce a maximally compressed .tar.xz data file using 4 threads...
tar cvf - foo_data | xz -9e -T4 - > foo_data.tar.xz
.tar.gz (aka .tgz, tarball)
- Most commonly used on Linux/Unix systems.
- Windows and Mac tools often handle .tar.gz files.
- Best format if you only provide one download option.
- A command to produce a .tar.gz compressed data file...
tar cvf - foo_data | gzip --best > foo_data.tar.gz
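.zip (zip)
- Most commonly used on Windows systems.
- Linux/Unix and Mac tools often handle .zip files.
- A command to produce a .zip compressed data file (the -r option is needed so that the contents of foo_data, not just the directory entry, are included)...
zip -9 -r foo_data.zip foo_data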
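The three commands in this step can be combined into one pass that skips any format whose tool is not installed. This is only a sketch under stated assumptions: the scratch directory and the sample file a.dat are hypothetical, and the -r option is passed to zip so that the directory contents are archived recursively.

```shell
# Scratch data so the sketch is self-contained (hypothetical contents).
workdir=$(mktemp -d)
cd "$workdir"
mkdir foo_data
echo "sample" > foo_data/a.dat

name=foo_data

# .tar.gz: gzip is assumed to be present on essentially every system.
tar cf - "$name" | gzip --best > "$name.tar.gz"

# .tar.xz: only attempted when xz is installed.
if command -v xz >/dev/null 2>&1; then
    tar cf - "$name" | xz -9e -T4 - > "$name.tar.xz"
else
    echo "xz not found; skipping $name.tar.xz"
fi

# .zip: -r recurses into the directory, -q keeps the output quiet.
if command -v zip >/dev/null 2>&1; then
    zip -9 -r -q "$name.zip" "$name"
else
    echo "zip not found; skipping $name.zip"
fi

# Show what was produced, with sizes.
ls -l "$name".*
```

To use it on real data, drop the scratch setup and run the remaining commands from the directory that contains foo_data.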
-
Add your data files to the bindata directory. Because we don't expect content here to be simultaneously revised by multiple developers or to change frequently, it is perfectly fine to do all the work here directly on the master branch.
git pull
git add bindata/foo_data.tar.xz
git add bindata/foo_data.tar.gz
git add bindata/foo_data.zip
git commit -a -m 'adding foo data'
git push
-
Pushing your added data files to GitHub can take a long time depending on file sizes. Once the operation completes, be aware that the files you see in the repo on GitHub will be LFS pointer files, not the data itself. See the instructions regarding download links for how to define links to LFS'd content.
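An LFS pointer file is a small text file whose first line is the Git LFS spec version, followed by an oid line and a size line. The sketch below shows one way to tell a pointer from real content by inspecting that first line. The helper name is_lfs_pointer and both sample files are made up for illustration; the oid and size in the sample pointer are placeholder values.

```shell
# is_lfs_pointer FILE: succeed if FILE starts with the Git LFS pointer header.
is_lfs_pointer() {
    head -n 1 "$1" | grep -q '^version https://git-lfs\.github\.com/spec/v1$'
}

# Fabricated sample files for illustration only.
workdir=$(mktemp -d)
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize 0\n' \
    e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 \
    > "$workdir/pointer.tar.gz"
printf 'real archive content would be here\n' > "$workdir/real.tar.gz"

# Classify each sample file.
for f in "$workdir/pointer.tar.gz" "$workdir/real.tar.gz"; do
    if is_lfs_pointer "$f"; then
        echo "$(basename "$f"): LFS pointer"
    else
        echo "$(basename "$f"): actual content"
    fi
done
```

A file that reports as a pointer in a local clone usually means the LFS content has not been fetched for that checkout.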
-
Create the dataset's landing page by creating a markdown file, foo.md, in the _datarchives collection directory. In the front-matter for this file, you may optionally define the file sizes and the sha256 and md5 checksums for the formats you host. If you don't host a specific format, then don't include the lines for the nbytes member of that format in the front-matter. Also, feel free to include a detailed description of the data in the body of the file.
-
You may optionally add an image for the data. Be sure to create one about 300-600 pixels in width and another, a thumbnail, about 64 pixels in height. Name the files foo.png and foo_thumbnail.png and put them in the _datarchives collection directory along with the foo.md markdown file. If you do this, be sure to set the variable has_image: true in the foo.md front-matter.
git add _datarchives/foo.md
git add _datarchives/foo.png
git add _datarchives/foo_thumbnail.png
git commit -a -m 'adding foo data'
git push
-
Wait for the site to rebuild. This usually takes no more than a few minutes after your push.
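The file sizes and checksums mentioned in the landing page step can be gathered with standard tools. The following is a rough sketch, assuming GNU coreutils (sha256sum and md5sum; on a Mac, shasum -a 256 and md5 are the usual substitutes). The scratch archive is hypothetical, and the output line is only a convenience for copying values; how they are laid out in the front-matter should follow the existing files in _datarchives.

```shell
# Scratch archive so the sketch is self-contained (hypothetical contents).
workdir=$(mktemp -d)
cd "$workdir"
mkdir foo_data
echo "sample" > foo_data/a.dat
tar czf foo_data.tar.gz foo_data

for f in foo_data.tar.xz foo_data.tar.gz foo_data.zip; do
    # Skip formats that are not hosted.
    [ -f "$f" ] || continue
    # Size in bytes, then the two checksums (first field of each tool's output).
    nbytes=$(wc -c < "$f" | tr -d ' ')
    sha256=$(sha256sum "$f" | cut -d' ' -f1)
    md5=$(md5sum "$f" | cut -d' ' -f1)
    echo "$f: nbytes=$nbytes sha256=$sha256 md5=$md5"
done
```

Run the loop from the bindata directory to cover the real archives; formats that are absent are skipped silently.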
Currently, the repository is configured to handle two kinds of files: data archives and presentations. The steps below outline the process to add a new file type.
-
Edit _config.yml to define the new collection. For example, to add a new gorfo collection, add a new member to the collections member like so...
collections:
  gorfo:
    output: true
    formats:
      - .foo
      - .bar
    format_remarks:
      - "foo is for foobirds."
      - "bar is for barbees."
The output: true setting tells the GitHub/Jekyll site generator to output an html file for each member of the gorfo collection. The formats member lists the possible file extensions for this class of file. The format_remarks member holds short text that will appear as a tool-tip in a web browser when a user hovers the mouse over the format options on various pages.
-
Add the file extensions to the .gitattributes file like so...
*.foo filter=lfs diff=lfs merge=lfs -text
*.bar filter=lfs diff=lfs merge=lfs -text
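Before committing any large files, it can be worth confirming that git actually applies the LFS filter to the new extensions. The sketch below uses git check-attr in a scratch repository; the paths bingorfo/fred.foo and notes.txt are hypothetical and do not need to exist for the check to work.

```shell
# Scratch repository with the two example patterns (illustration only).
workdir=$(mktemp -d)
cd "$workdir"
git init -q
printf '%s\n' \
    '*.foo filter=lfs diff=lfs merge=lfs -text' \
    '*.bar filter=lfs diff=lfs merge=lfs -text' > .gitattributes

# Ask git which filter applies to each path.
attr_foo=$(git check-attr filter -- bingorfo/fred.foo)
attr_txt=$(git check-attr filter -- notes.txt)
echo "$attr_foo"
echo "$attr_txt"
```

A path covered by the patterns reports filter: lfs, while any other path reports filter: unspecified. In the real repository only the two check-attr lines are needed.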
-
Create a layout for this new class of file in the _layouts directory named gorfo.html. This layout will define how landing pages for instances of the gorfo class will appear. These landing pages capture information about a gorfo file instance being hosted, such as the size in bytes and the sha256 and md5 checksums for various formats. Use the existing layouts as examples to see how this is done.
-
Create a top level directory named _gorfo. This is where the markdown files used to describe each instance of a gorfo file will be placed, along with any additional assets they may need, such as images. For example, if you add a new member to the gorfo collection named fred, there would be a fred.md file in the _gorfo directory that describes the file and captures its file sizes and integrity checks as front-matter.
-
Create a top level directory, such as bingorfo, where the actual gorfo files will be stored (as LFS'd content).
-
Now, you are free to add members to the _gorfo directory. Each will involve a new .md file. See the examples in the datarchives collection.
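The steps above touch several files and directories, so a quick checklist script can help catch a missed piece. This is a hedged sketch only: the function name check_new_type is made up, the bin prefix for the storage directory follows the bingorfo suggestion above and is an assumption, and the _config.yml check is a simple pattern match rather than a real YAML parse.

```shell
# check_new_type NAME: report which pieces of a new file type are in place.
check_new_type() {
    name=$1
    status=0
    grep -q "^ *$name:" _config.yml 2>/dev/null \
        || { echo "missing: $name entry in _config.yml"; status=1; }
    [ -f .gitattributes ] \
        || { echo "missing: .gitattributes"; status=1; }
    [ -f "_layouts/$name.html" ] \
        || { echo "missing: _layouts/$name.html"; status=1; }
    [ -d "_$name" ] \
        || { echo "missing: _$name directory"; status=1; }
    [ -d "bin$name" ] \
        || { echo "missing: bin$name directory"; status=1; }
    return $status
}

# Scratch site skeleton (illustration only; real files need real content).
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p _layouts _gorfo bingorfo
printf 'collections:\n  gorfo:\n    output: true\n' > _config.yml
touch .gitattributes _layouts/gorfo.html

check_new_type gorfo && echo "gorfo: all pieces present"
```

In the real repository, define the function and call it from the top level directory with the name of the new collection.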