diff --git a/CITATION.cff b/CITATION.cff new file mode 100644 index 000000000..f4a48d544 --- /dev/null +++ b/CITATION.cff @@ -0,0 +1,41 @@ +cff-version: 1.2.0 +message: "If you use this software, please cite it as below." +authors: +- family-names: "Hughes" + given-names: "Liane" +- family-names: "Stark" + given-names: "Katarina Öjefors" +- family-names: "Kochari" + given-names: "Arnold" +- family-names: "Panneerselvam" + given-names: "Senthilkumar" +- family-names: "Ewels" + given-names: "Phil" +- family-names: "Ostberg" + given-names: "Linus" +- family-names: "Kraulis" + given-names: "Per" +- family-names: "Rung" + given-names: "Johan" +- family-names: "Lorenz" + given-names: "Jan" +- family-names: "Asklof" + given-names: "Anna" +- family-names: "Kallberg" + given-names: "Yvonne" +- family-names: "Islam" + given-names: "Kazi Jahurul" +- family-names: "Ouyang" + given-names: "Wei" +- family-names: "Kronander" + given-names: "Elin" +- family-names: "Tewatia" + given-names: "Parul" +- family-names: "Englund" + given-names: "Markus" +- family-names: "Hammaren" + given-names: "Rickard" +- family-names: "Xu" + given-names: "Hao" +title: "Swedish Pathogens Portal" +url: "https://github.com/ScilifelabDataCentre/pathogens-portal" \ No newline at end of file diff --git a/CONTRIBUTING/adding_editing_information.md b/CONTRIBUTING/adding_editing_information.md index 311654516..0b30a6252 100644 --- a/CONTRIBUTING/adding_editing_information.md +++ b/CONTRIBUTING/adding_editing_information.md @@ -1,36 +1,46 @@ -# Instructions on adding and editing information displayed on the Portal +# Instructions on adding and editing information displayed on the portal -We welcome contributions from the community in all sections of our Portal. Here, we describe contributing through GitHub - either through the GitHub web interface or through a local copy on your computer. You should have basic knowledge of the GitHub web interface or CLI in order to be able to use this way of contributing. 
There are multiple other ways to contribute so that you can use the way that is most convenient for you. Please [see this page for information on other ways you can send contributions](https://covid19dataportal.se/contribute/). +Contributions are welcomed for all sections of the portal. This page describes how to make contributions to multiple different types of pages via GitHub. Contributions can be made either in the GitHub web interface (usually used for small additions to existing pages), or by editing a local copy of the portal on your computer and pushing that to GitHub (typically used for larger contributions and new pages). If you would prefer to contribute using a different route, for example, by sending text documents, please email us at [pathogens@scilifelab.se](mailto:pathogens@scilifelab.se). -In short, in order to add or edit information on the Portal, make a fork of this repository and make changes in the corresponding section as described below. After making changes to the section which you would like to edit/add to as described below, you should send a pull request to the `develop` branch. We will review and approve it asap. 
+**Table of contents:** -__Table of contents:__ - -- [How to propose changes or additions](#how-to-propose-changes-or-additions) -- [Available datasets](#available-datasets) +- [Making contributions via GitHub](#making-contributions-via-github) +- [How to add a new page](#how-to-add-a-new-page) +- [Adding available data](#adding-available-data) +- [Dashboard pages](#dashboard-pages) +- [Data highlights](#data-highlights) +- [Editorials](#editorials) +- [Emerging pathogens](#emerging-pathogens) +- [Events](#events) +- [News about the portal](#news-about-the-portal) - [Funding opportunities](#funding-opportunities) - [Ongoing research projects](#ongoing-research-projects) -- [Data highlights](#data-highlights) -- [News about the Portal](#news-about-the-portal) -- [Resources](#resources) +- [Publications included in a table](#publications-included-in-a-table) +- [Pandemic preparedness resources](#pandemic-preparedness-resources) +- [Topics](#topics) + +## Making contributions via GitHub -## How to propose changes or additions +All of the information displayed on the portal is contained within [this GitHub repository](https://github.com/ScilifelabDataCentre/pathogens-portal). When you enter that repository, you will see multiple folders. The majority of the content can be found within the `content` folder (which contains one folder for each language used on the portal). Some sections also use information that is included within the `data` or `static` folders. Please see the sections further down on this page for information on how to locate different types of pages. The URL of a page also gives an idea of where it is located within the folders. For example, all dashboard pages are within the `dashboards` folder, and their URLs follow the structure http://pathogens.se/dashboards/page_name/. -All information displayed on the Portal is contained within [this GitHub repository](https://github.com/ScilifelabDataCentre/pathogens-portal).
Some of the sections use information that is stored in the `data` folder in .JSON format while other sections use information that is stored in the `content` folder in Markdown format. +Please note that we require all commits to be verified, so you must sign your commits. For information on how to set this up, see the [GitHub documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits). ### Using the web interface -- Navigate to the folder and file that is indicated below in the corresponding section. -- Click on the top right corner of the document ("Edit this file"). This should create a fork of the original repository in your own GitHub account. You should see a page where you can directly edit the content. -- Make changes as described below; do not forget to change the last update date on top of the document if needed. -- Scroll to the bottom of the page. Describe what you have changed and press 'Propose changes' -- You should now find yourself on a page where you can create a pull request. Check that the pull request will be sent to the base repository `SciLifeLabDataCentre/pathogens-portal` and base `develop`. You can also review the changes you made. Click on "Create pull request" if everything looks good. -- Once created, a member of the Portal team will review your changes. -Once approved, they will be merged and published. +This route is typically used for making relatively small additions or updates to the content, for example, adding an entry to a dataset, table, or database, or modifying or adding some text on an existing page. +- Navigate to the folder that contains your file (typically the `content` folder), and then to the file within that folder that represents the page/section that you want to modify. +- Click on the pencil icon in the top right-hand corner of the page (when you hover over the pencil, it will show "fork this repository and edit this file" or "edit this file").
This will create a fork of the portal code in your own GitHub account. You should then be able to edit the content of the page. +- Make all of the necessary changes. Remember that if the page appears in both Swedish and English, it is necessary to update both versions of the page. +- Click on "Commit changes". +- Write a description of what you have done in the "Propose changes" pop-up screen, and then press the "Propose changes" button when you are done. +- You should now be taken to a page that allows you to create a pull request. Ensure that the request is being made to the `develop` branch of the base repository `ScilifelabDataCentre/pathogens-portal`. +- You can see the files that you have changed at the bottom of the page, and can double-check that everything looks as intended. If it does, click "Create pull request". +- A member of the portal team will review the pull request as quickly as possible. Once approved, the changes will show on the portal. ### Using a local copy -If you prefer, you can edit the website files on your computer in your favourite text editor. Fork this repository to your account. Then, clone the forked repository to your machine: +It can sometimes be preferable to make changes to files directly on your computer in your favourite text editor and then push those changes to GitHub. This is particularly useful when adding files or making larger changes. To do this, fork this repository to your account and then clone the forked repository to your machine: ```bash git clone git@github.com:[YOUR-USERNAME]/pathogens-portal.git @@ -49,6 +59,8 @@ Then you can fetch changes at any time from this remote: git pull upstream develop ``` +You can now edit or add files with the text editor on your computer. You can also test your changes locally to see how they will look on the site and to ensure that they appear as intended.
See [this page](https://github.com/ScilifelabDataCentre/pathogens-portal/blob/develop/CONTRIBUTING/running_a_local_copy.md) for information on how to see the effect of changes that you're making locally. + When you have finished editing, commit and push to your fork: ```bash @@ -59,103 +71,117 @@ git push Once you're finished with your edits and they are committed and pushed to your forked repository, it's time to open a pull request. In short: -- Visit the main repository: [https://github.com/ScilifelabDataCentre/pathogens-portal](https://github.com/ScilifelabDataCentre/pathogens-portal) -- Click the button that reads _"New Pull Request"_ -- Click the text link near the top that says _"compare across forks"_ -- In the right-hand _"head repository"_ drop down, select your username / fork. -- If you're happy with the list of commits shown, and the diff in the _"Files Changed"_ tab, fill in a title and description and click _"Create pull request"_ +- Go to your repository on GitHub and click "New Pull Request". +- Alternatively, go to the main repository on GitHub ([https://github.com/ScilifelabDataCentre/pathogens-portal](https://github.com/ScilifelabDataCentre/pathogens-portal)) and click "New Pull Request", then click the text link near the top that says "compare across forks". +- At the top of the page, you should see a pale grey box under the "Comparing changes" text. The right-hand "head repository" drop-down should show your username/fork and the branch that you worked on. Under "base repository" you should see "ScilifelabDataCentre/pathogens-portal", and "base" should read "develop". +- If you're happy with the list of commits and file changes shown towards the bottom of the page, click "Create pull request". +- Write a title and description for your pull request, and then click "Create pull request". +- A member of the portal team will review the pull request as quickly as possible. Once approved, the changes will show on the portal.
-Once created, a member of the Portal team will review your changes. -Once approved, they will be merged and published. +## How to add a new page -#### Testing with a local copy of the Portal +The easiest way to add a new page to the portal is to [make a local copy](#using-a-local-copy). You can then navigate to the folder to which you would like to add a page (see the sections below for an idea of where this might be, or check the URL of pages in the same section, as that indicates the file structure). Create a new markdown (.md) file. We recommend copying the metadata from an existing page in the same folder to set up your page. The metadata is the block at the top of the file between two lines of three dashes (i.e. '---'). You will have to change the metadata to reflect information about your new page, but starting from an existing page ensures that your page is set up correctly. -Because the Portal is built on Hugo, it is quite easy to run a full version of the Portal on your computer and see how your changes look while doing that. See [this page for information on how to do that](https://github.com/ScilifelabDataCentre/pathogens-portal/blob/develop/CONTRIBUTING/running_a_local_copy.md). +## Adding available data -## Available datasets +We have a dataset of available data/code from Sweden that can be found [in the available data section of the portal](http://pathogens.se/datasets/all/). Multiple different types of data can be added to this dataset. Please note that this dataset is also updated approximately monthly by the portal team. -_Instructions coming soon._ +To add entries to the dataset, edit the `available_datasets.json` file within the `data` folder. Update the date at the top of the file to the date on which you are making the change, and then add your entry or entries.
Please note that entries must include at least one author affiliated with a Swedish university and must pertain to COVID-19. The data/code should be openly available or, if the data cannot be shared openly, there should be clear instructions on how access can be requested. Each entry should be in the format: -## Funding opportunities +```JSON + { + "doi": "DOI of the paper", + "available_items": [ + { + "type": "data", + "repo_name": "", + "accession_number": "", + "description": "Information that is sufficient for anyone hoping to assess whether they can reuse the data.", + "data_type": ["Biochemistry data", "Protein data", "Serology data", "Public health data", "Health data", "Genomics & transcriptomics data", "Drug discovery data", "Imaging data", "Social science and humanities data", "Other data" (delete as appropriate)], + "data_url": "URL to data" + }, + { + "type": "code", + "repo_name": "", + "accession_number": "", + "description": "Information that is sufficient for anyone hoping to assess whether they can reuse the code.", + "data_type": [see above; all data/code items within one entry should use the same data_type values], + "data_url": "" + } + ], + "title": "title of paper", + "issue": "", + "volume": "", + "publisher": "", + "published": "yyyy-mm-dd", + "author": [ + "List O. A.", "Author T." + ] + }, +``` -We collect funding opportunities relevant for COVID-19, infectious diseases, and antibiotic resistance research. These are displayed under `/funding/`. The data is stored in JSON format in `data/funding.json`. Below is the format used for each entry. All required fields have to be filled out and in some cases there is a specific format. Note that for the field _topic_ you should choose one or more topics corresponding of the call.
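If you are adding several entries, it can help to check them before opening a pull request. The Python sketch below is one possible way to do this and is not part of the portal code. It assumes that every key shown in the template above is expected, that `published` uses the yyyy-mm-dd format, and that all items within one entry share the same `data_type` values. The names `check_entry`, `check_json`, and `is_iso_date` are hypothetical.

```python
import json
import re
from datetime import date

# Allowed "data_type" values, copied from the entry template above.
ALLOWED_DATA_TYPES = {
    "Biochemistry data",
    "Protein data",
    "Serology data",
    "Public health data",
    "Health data",
    "Genomics & transcriptomics data",
    "Drug discovery data",
    "Imaging data",
    "Social science and humanities data",
    "Other data",
}

# Keys shown in the template; this sketch assumes all of them are expected.
ENTRY_KEYS = {"doi", "available_items", "title", "issue", "volume",
              "publisher", "published", "author"}
ITEM_KEYS = {"type", "repo_name", "accession_number", "description",
             "data_type", "data_url"}
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value):
    """True if value is a real calendar date written as yyyy-mm-dd."""
    if not isinstance(value, str) or not DATE_RE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2021-02-30
    except ValueError:
        return False
    return True


def check_entry(entry):
    """Return a list of problems in one dataset entry (empty if none found)."""
    problems = []
    missing = ENTRY_KEYS - set(entry)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if not is_iso_date(entry.get("published")):
        problems.append("'published' must be a date in yyyy-mm-dd format")

    seen_types = set()
    for i, item in enumerate(entry.get("available_items", [])):
        missing_item = ITEM_KEYS - set(item)
        if missing_item:
            problems.append(f"item {i}: missing keys: {sorted(missing_item)}")
        if item.get("type") not in ("data", "code"):
            problems.append(f"item {i}: 'type' must be 'data' or 'code'")
        types = item.get("data_type", [])
        if not isinstance(types, list):
            problems.append(f"item {i}: 'data_type' must be a list")
            continue
        unknown = set(types) - ALLOWED_DATA_TYPES
        if unknown:
            problems.append(f"item {i}: unknown data_type values: {sorted(unknown)}")
        seen_types.add(frozenset(types))

    # All data/code items within one entry should share the same data_type.
    if len(seen_types) > 1:
        problems.append("data_type differs between items in this entry")
    return problems


def check_json(text):
    """Parse the JSON text of a single entry and check it."""
    return check_entry(json.loads(text))
```

To use it, pass the text of a single entry (without the trailing comma that separates entries in the file) to `check_json`. An empty list means that no problems were found.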
+## Dashboard pages -```JSON -{ - "topic": ["COVID-19", "General", "Antibiotic resistance", "Infectious diseases"], - "funder": "funder name, required field", - "call_title": "call title, required field", - "call_url": "call URL, including https://, required field", - "call_description": "call description, optional field, markdown formatting allowed", - "applicant": "information about who can apply, optional field, markdown formatting allowed", - "decision_date": "information about when the decision will be made, optional field", - "funding_period": "information about the duration of funding, optional field", - "funding_amount": "information about the amount one call apply for, optional field", - "submission_opendate": "date in format '2006-01-02', optional field", - "submission_deadline": "date in format '2006-01-02', required field" -} -``` +**Data dashboards** are pages that include data from either research groups or public data sources. They include custom, dynamic visualisations of the data alongside relevant information about the background of the study, the methods used to collect data, and the research groups involved, among other things. -At the beginning of the file `data/funding.json` there is a field for the last updated date of this file. The date here needs to be updated whenever new calls are added or changes are made. +### Data visualisations -```JSON - "last_updated": "2006-01-02", -``` +The portal team are happy to create code for custom, dynamic visualisations of your data. We work with those involved in data collection to create these visualisations, so that they show the data in the most appropriate way. The visualisation code that we have written to date can be found in [our visualisations GitHub repository](https://github.com/ScilifelabDataCentre/pathogens-portal-visualisations). Our visualisations are typically written in Python using Plotly.
To get a visualisation produced for your data, please email us at [pathogens@scilifelab.se](mailto:pathogens@scilifelab.se). -## Ongoing research projects +### Dashboard files -We maintain a database of currently ongoing research projects on COVID-19, infectious diseases, and antibiotic resistance in Sweden. These are displayed under `/research_projects/`. The data is stored in JSON format in `data/research_projects.json`. Below is the format used for each entry. All required fields have to be filled out and in some cases there is a specific format. Note that for the field _topic_ you should choose one or more topics corresponding of the call. +There are multiple dashboard pages. They are located within the `dashboards` folders of the `english` and `svenska` folders of the `content` folder. Some dashboards (e.g. the wastewater dashboard) contain multiple pages. In such cases, the pages are included within the same subfolder. -```JSON -{ - "topic": ["COVID-19","Infectious diseases","Antibiotic resistance"], - "funder": "funder name, required field", - "project_title": "title of the research project, required field", - "project_description": "description of the project, optional field, markdown formatting allowed", - "funding_amount": "funding amount in SEK (e.g., 9.000.000 SEK), optional field", - "pi": "name of the principal investigator responsible for the project, optional field", - "pi_affiliation": "name of the university/institute where the project is carried out, optional field", - "startdate": "project start date in format '2006-01-02', optional field", - "enddate": "expected project end date in format '2006-01-02', required field", - "url": "URL (starting with http:// or https://) where more information about the project can be found, optional field" -} +The file for each page is in markdown format. The top of the file is considered metadata, and it sets up your page. 
It looks as follows: + +```Markdown +--- +title: Title of your page +description: A short (around 100 characters) description of your dashboard +banner: /dashboard_thumbs/picture.jpg (see information on 'illustrations' below) +toc: false (false means that no table of contents will be shown on the page) +plotly: true (true means that Plotly will be used on the page; this is needed for the visualisations) +menu: + dashboard_menu: + identifier: set_name_id + name: "name as it should appear in the dashboard menu in the header" +dashboards_topics: [COVID-19, Infectious diseases] (add the names of the topics that are appropriate; see below for topics) +--- ``` -At the beginning of the file `data/research_projects.json` there is a field for the last updated date of this file. The date here needs to be updated whenever new projects are added or changes are made. +Under the metadata, you can write information about the research and the data. The portal team can help with adding code related to plots within the page. -```JSON - "last_updated": "2006-01-02", -``` +### Illustrations + +Links to dashboard pages are typically shown in cards e.g. in . These cards show a small image that is representative of that dashboard. The image should be 250 px high and 500 px wide. The portal team can help with resizing/editing images. Images should be placed in the `/static/dashboard_thumbs` folder. ## Data highlights -__Data highlights__ is a section of the portal which contains news items promoting recent openly shared data that can potentially be used by many other researchers to make further discoveries or notable data re-use examples. +**Data highlights** are short, data-centric articles focusing on recent research that openly shares data, code, or other research outputs. ### Illustrations -Typically, for each data highlight we prepare two illustrations. One illustration is smaller and appears on overview page with all data highlights.
The other illustration appears on the page of the highlight itself. - -The smaller illustration needs to have the width that is twice the length (i.e., length `300 px` and width `600 px`). This way, it will easily fit the look of the page layouts. - -Both illustrations should be placed in the `/static/highlights/banners` folder. The URL of the images placed here will then be `https://covid19dataportal.se/highlights/banners/file_name.png`. +Each highlight can have two illustrations. One is used as a thumbnail image on the pages showing multiple highlights e.g. . The other can be shown on the page of the highlight itself. Only the first is required, and it should be 250 px high and 500 px wide. The portal team can help with resizing/editing images. The illustrations should be placed in the `/static/highlights/banners` folder. The images can be in .png or .jpg format. ### Data highlight files -The data highlights are generated from Markdown formatted files contained in the `/content/english/highlights/` folder. The file name used here will also be the URL of the data highlight (e.g., `test-highlight.md` will become `https://covid19dataportal.se/highlights/test-highlight/`). +The data highlights are generated from Markdown formatted files contained in the `/content/english/highlights/` folder. The file name used here will also be the URL of the data highlight (e.g., `test-highlight.md` will become `https://pathogens.se/highlights/test-highlight/`). ### Content of the data highlight files -Below is an example of a data highlight file content. You can copy this text into your markdown file and edit it to write your own data highlight. +Below is an example of the content of a data highlight file. You can copy this text into your markdown file and edit it to write your own data highlight, or you can copy an existing highlight and edit the copy. As with the dashboard files and most other pages on the site, the file consists of page metadata followed by the content shown on the page itself.
```Markdown --- -title: Important new dataset shared -date: 2021-01-01 -summary: A new dataset containing a large amount of valuable data has been openly shared. -banner: /highlights/banners/example.png -banner_large: /highlights/banners/example_large.png -banner_caption: "Illustration of X. The image was taken from Y." -topics: [COVID-19, Infectious diseases, Antibiotic resistance] +title: Title of page +date: yyyy-mm-dd +summary: A summary of around 100 characters in length +banner: /highlights/banners/picture.png (thumbnail image, see above for details) +banner_large: /highlights/banners/picture.png (larger image shown on the page) +banner_caption: "caption for the image shown on the page" +highlights_topics: [COVID-19, Infectious diseases] (include the names of any topics that apply; see the topics section for details) +tags: [keyword 1, keyword 2,...] +images: [/highlights/banners/picture.png] --- This is the text of the highlight. This is the first paragraph. Introduce why this is an important topic. @@ -164,7 +190,7 @@ This is the second paragraph of the text of the highlight. Markdown formatting s We typically describe exactly what data has been shared, how it can be re-used, and give links to where it can be downloaded. -#### Data +#### Data and code availability * [Shared dataset 1](https://example.com/data1/): description of shared dataset 1 * [Shared dataset 2](https://example.com/data2/): description of shared dataset 2 @@ -174,42 +200,87 @@ We typically describe exactly what data has been shared, how it can be re-used, DOI: [_put_DOI_here](https://doi.org/_put_DOI_here) -Andersson, M., Johansson, S., Karlsson, A. Title of the journal publication *Journal Title*, **X** (X) (20XX). +Authors. A..... (20XX) Title of the journal publication. In: Journal Title (Vol. XX, Issue XX, XXXX). #### Funding We typically put funder information near the end, here. +#### Infrastructure + +Information about any infrastructure used to complete the study.
``` -On the top of the file, surrounded by `---`, basic information for this data highlight is provided. It contains the title; publication date (desired; Hugo needs to be run on that day or later for it to appear); summary text that appears in the homepage and in the main page of the Data highlights section; location of the illustration to be displayed on the homepage (`banner`); location of the illustration to be displayed on the page of the highlight (`banner_large`); caption text that will appear under the illustration on the page of the highlight. +## Editorials -The title, date, summary, illustrations will appear where they are supposed to be. +**Editorials** are short, opinion-style articles that describe the current state of an area of research. They are contributed by those working in that area. They are similar in format to data highlights. -## News about the Portal +### Illustrations -News items about the Portal are published under `/updates/`. The news items are written about new sections launched, major updates of the Portal, or major achievements. +Each editorial has an illustration that should be 250 px high and 500 px wide. The portal team can help with resizing/editing images. The illustrations should be placed in the `/static/editorials` folder. The images can be in .png or .jpg format. -### Illustrations +### Editorial files -Typically, for each news item we prepare two illustrations. One illustration is smaller and appears on overview page with all news items. The other illustration appears on the page of the news item itself. +The editorials are generated from Markdown formatted files contained in the `/content/english/editorials/` folder. The file name used here will also be the URL of the editorial (e.g., `test-editorial.md` will become `https://pathogens.se/editorials/test-editorial/`). -The smaller illustration needs to have the width that is twice the length (i.e., length `300 px` and width `600 px`).
This way, it will easily fit the look of the page layouts. +### Content of the editorial files -Both illustrations should be placed in the `/static/updates/banners` folder. The URL of the images placed here will then be `https://covid19dataportal.se/updates/banners/file_name.png`. +Below is an example of the content of an editorial file. You can copy this text into your markdown file and edit it to write your own editorial, or you can copy an existing editorial and edit the copy. As with the dashboard files and most other pages on the site, the file consists of page metadata followed by the content shown on the page itself. + +```Markdown +--- +title: "title of editorial" +date: yyyy-mm-dd +summary: a short summary of the editorial (less than 100 characters) +banner: /editorials/image_name.jpg +banner_caption: 'caption of image' +tags: [keyword 1, keyword 2...] +editorials_topics: [Infectious diseases] (any topics that apply, see below for information on topics) +editorials_authors: [Your Name] +images: [/editorials/image_name.jpg] +--- + +This is the text of the editorial. This is the first paragraph. + +This is the second paragraph of the text. Markdown formatting should be used in the text. For example, you can make a piece of text italic by placing an asterisk at the beginning and end, *like this*. You can make a piece of text bold by placing two asterisks at the beginning and end, **like this**. You can also add a link by putting the link text in square brackets followed by the URL in round brackets, [like this](https://example.com/data/). + +#### Cite this editorial + +The portal team can help to upload the editorial to the [SciLifeLab Data Repository](https://figshare.scilifelab.se). Please email us at [pathogens@scilifelab.se](mailto:pathogens@scilifelab.se) to discuss this. +``` + +## Emerging pathogens + +The **emerging pathogens** pages are intended to show the latest information in the event of a new outbreak.
The pages are found in the [emerging pathogens section](https://pathogens.se/pathogens/), and their files are located in the `/content/english/pathogens/` folder. These are the first pages that we publish about an emerging disease, and we publish them as soon as possible. We include all information and resources that are currently available. The structure of the pages varies, but each page only requires a title in the metadata section (see the editorials and highlights sections above for more information) and some text in markdown format below this. +## Events + +The **events** page shows events related to topics around pandemic preparedness, e.g. antibiotic resistance. Users should go to the [events page](https://pathogens.se/events/) on the portal itself and fill in the form to suggest an event. The portal team will typically add the suggestion within one working day. + +## Funding opportunities + +We collect funding opportunities relevant to topics related to pandemic preparedness, e.g. COVID-19, infectious diseases, and antibiotic resistance research. Users should go to the [funding page](https://pathogens.se/funding/) on the portal itself and fill in the form to suggest a funding opportunity. The portal team will typically add the suggestion within one working day. + +## News about the portal + +News items about the portal are published under `/updates/`. News items cover new sections launched, major updates to the portal, or other important information for the community. + +### Illustrations + +Each news item can have two illustrations. One is used as a thumbnail image on the [portal news page](http://pathogens.se/updates/). The other can be shown on the page of the news item itself. Only the first is required, and it should be 250 px high and 500 px wide. The portal team can help with resizing/editing images. The illustrations should be placed in the `/static/updates/banners/` folder. The images can be in .png or .jpg format. ### News files -The news items can be added in the folder `/content/updates/`.
Each news item is a file with extension __.md__. The file name used here will also be the URL of the news item (e.g., `test-news.md` will become `https://covid19dataportal.se/updates/test-news/`). +The news items can be added in the folder `/content/english/updates/`. Each news item is a file with extension **.md**. The file name used here will also be the URL of the news item (e.g., `test-news.md` will become `https://pathogens.se/updates/test-news/`). ### Content of the news files -Below is an example of a news file content. You can copy this text into your markdown file and edit it to write your own news item. +Below is an example of the content of a news file. You can copy this text into your markdown file and edit it to write your own news item. It consists of some basic page metadata (placed between two lines of '---' and used to lay out the page correctly), followed by text in markdown format. ```Markdown --- -title: New section on the Portal launched -date: 2006-01-02 +title: title of news item +date: yyyy-mm-dd summary: Today we are launching a new section on the Portal devoted to... banner: /updates/banners/example.png banner_large: /updates/banners/example_large.png @@ -224,8 +295,81 @@ This is the third paragraph of the news item. ``` -On the top of the file, surrounded by `---`, basic information for this news item is provided. It contains the title; publication date (desired; Hugo needs to be run on that day or later for it to appear); summary text that appears on the overview page; location of the illustration to be displayed on the overview page (`banner`); localtion of the illustration on the page of the news item (`banner_large`); caption text that will appear under the illustration on the page of the news item. +## Ongoing research projects + +We maintain a dataset of currently ongoing research projects on COVID-19. These are displayed under `/research_projects/`.
The data is stored in JSON format in `data/research_projects.json`. Below is the format used for each entry. All required fields must be filled in, and some fields must follow a specific format. Note that for the field _topic_ you should choose one or more topics corresponding to the call. + +```JSON +{ + "topic": ["COVID-19"], + "funder": "funder name, required field", + "project_title": "title of the research project, required field", + "project_description": "description of the project, optional field, markdown formatting allowed", + "funding_amount": "funding amount in SEK (e.g., 9.000.000 SEK), optional field", + "pi": "name of the principal investigator responsible for the project, optional field", + "pi_affiliation": "name of the university/institute where the project is carried out, optional field", + "startdate": "project start date in format '2006-01-02', optional field", + "enddate": "expected project end date in format '2006-01-02', required field", + "url": "URL (starting with http:// or https://) where more information about the project can be found, optional field" +} +``` + +At the beginning of the file `data/research_projects.json` there is a field that records when the file was last updated. This date must be updated whenever new projects are added or changes are made. + +```JSON + "last_updated": "2006-01-02", +``` + +## Publications included in a table + +There are multiple places on the portal where relevant publications are shown in tables. In all cases, these publications are taken from a central publications database that is updated monthly by the portal team. The publications are all about COVID-19, and each has at least one author affiliated with a Swedish research organisation. To see this database in full, go to the [Swedish COVID-19 Publications page](http://pathogens.se/publications/).
Users cannot add publications directly, but can email the pathogens portal team at [pathogens@scilifelab.se](mailto:pathogens@scilifelab.se) to suggest relevant publications. + +## Pandemic preparedness resources + +**Pandemic preparedness resources pages** contain information about resources related to pandemic preparedness that have been developed in Sweden. Currently, these consist primarily of information about projects that are part of SciLifeLab's Pandemic Laboratory Preparedness (PLP) program. The pages are found in the `/content/english/resources/` folder. The pages are in markdown format, and have a '.md' file extension. As with other pages on the portal, they consist of some initial page metadata that is used to set up the page (placed between two lines of '---'), followed by text in markdown format. The format of these pages is shown below. You can copy this to create a new page, or copy and edit an existing page. -## Resources +```Markdown +--- +title: "title" +category: "plp2" used to place in one of the plp categories +resource_info: + name: "project name" + pi: list of PI names, + host_organisation: XXX University + contact: "name
title
Email: [example@example.se](mailto:example@example.se)

A"name
title
Email: [example@example.se](mailto:example@example.se)" +for_background_table: + pi: PI name + pi_affiliation: XXXX University +--- -_Instructions coming soon._ +This is the text of the page. This is the first paragraph. + +This is the second paragraph of the text. Markdown formatting should be used in the text. For example, you can make a piece of text italic by placing an asterisk at the beginning and end, *like this*. You can make a piece of text bold by placing two asterisks at the beginning and end, **like this**. You can also add a link by placing the link text in square brackets, followed by the URL in round brackets, [like this](https://example.com/data/). +``` + +## Topics + +Multiple pandemic preparedness topics are covered on the portal, and users can filter the content of the portal by topic. Topics include, for example, COVID-19, Antibiotic resistance, Influenza, and Mpox. The current topics can be viewed in the [topics section](http://pathogens.se/topics/). To suggest a new topic, please contact the pathogens portal team at [pathogens@scilifelab.se](mailto:pathogens@scilifelab.se). This allows us to put in place the background code needed to make it possible to filter by this topic. + +### Content of topics files + +The format of the topics pages is similar to that of other pages on the portal. The pages are in markdown format, and have a '.md' file extension. As with other pages on the portal, they consist of some initial page metadata that is used to set up the page (placed between two lines of '---'), followed by text in markdown format. They are located in the `/content/english/topics/` folder. The format of these pages is shown below. You can copy this to create a new page, or copy and edit an existing page.
 + +```Markdown +--- +title: title +description: Short description of approximately 100 characters +banner: /topic_thumbs/topic_antibiotic.jpg (picture to be associated with the topic as a thumbnail) +credits: +toc: false (false means that no table of contents will be included) +topic: name of topic +menu: + topics_menu: + name: name of topic + identifier: name_of_topic +--- + +This is the text of the page. This is the first paragraph. + +This is the second paragraph of the text. Markdown formatting should be used in the text. For example, you can make a piece of text italic by placing an asterisk at the beginning and end, *like this*. You can make a piece of text bold by placing two asterisks at the beginning and end, **like this**. You can also add a link by placing the link text in square brackets, followed by the URL in round brackets, [like this](https://example.com/data/). +``` diff --git a/CONTRIBUTING/running_a_local_copy.md b/CONTRIBUTING/running_a_local_copy.md index 2cb9ceb25..4f8a42dfe 100644 --- a/CONTRIBUTING/running_a_local_copy.md +++ b/CONTRIBUTING/running_a_local_copy.md @@ -1,18 +1,28 @@ -# Running a local copy of the Portal +# Running a local copy of the portal -Because the Portal is built using [Hugo](https://gohugo.io/), a static site generator, it is quite easy to run a full version of the Portal on your computer and see how your changes look while doing that. +The portal is built using [Hugo](https://gohugo.io/), a static site generator. This makes it relatively easy to run a full version of the portal on your computer (i.e. locally), so that you can see how your changes would look on the site. + +#### Clone a copy of the portal code + +All of the code behind the portal is stored in [this GitHub repository](https://github.com/ScilifelabDataCentre/pathogens-portal). There are multiple ways to clone a GitHub repository so that you have your own copy on your computer.
Please see [the GitHub documentation](https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository) for instructions on how to do this. #### Using Hugo -To view your changes as they will appear in the final website, you need to install Hugo. You can find instructions on the Hugo website: [https://gohugo.io/](https://gohugo.io/) +To run a local copy, you first need to install Hugo on your computer. Instructions for how to do this are available on the Hugo website: [https://gohugo.io/](https://gohugo.io/). -If you're using Mac OSX, it's recommended to use [Homebrew](https://brew.sh/) - if homebrew is already set up, installing Hugo is just a case of: +On Mac OSX, it is recommended to use [Homebrew](https://brew.sh/) to install Hugo. Once Homebrew is set up, you can run the following command in a terminal window to install Hugo: ```bash brew install hugo ``` -Once Hugo is installed, navigate to the folder where you cloned this repository and simply run the following command in the repository root directory: +Once Hugo has finished installing, you can use it to view the site locally right away. To do this, first navigate to the folder that holds your copy of the portal code (i.e. the cloned copy of the GitHub repository). You can do this with the `cd` command in the terminal window, for example: + +```bash +cd FILE_PATH/TO/CLONED/REPOSITORY +``` + +Once you've navigated to the folder that holds your code, type `hugo serve` in your terminal window. You will then see something like this: ```console $ hugo serve @@ -39,8 +49,8 @@ Press Ctrl+C to stop ``` Use the URL printed at the bottom of this message (here, it's `http://localhost:1313/`) to view the site. -Every time you save a file, the page will automatically refresh in the browser. +Every time you save a file, the page will automatically refresh in the browser, so you can see the effect of the changes in real time.
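By default, Hugo leaves out pages whose `date` lies in the future, so a news item dated after today will not show up in your local preview. A minimal sketch, assuming you want to preview such an item before its publication date: `--buildFuture` is a standard Hugo build flag, but please check `hugo server --help` for the version you have installed.

```bash
# Serve the site locally, also rendering pages whose date is in the future
# (assumes you are in the root folder of your cloned copy of the portal)
hugo serve --buildFuture
```

This flag only affects your local preview. Assuming the live site is built with Hugo's default settings, a future-dated item will only appear there once the site is rebuilt on or after that date.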
#### Using Docker -If you would prefer not to use Hugo, you can use the provided Dockerfile to build and run a container. +If you would prefer not to use Hugo, you can use the provided Dockerfile to build and run a container instead. diff --git a/config.yaml b/config.yaml index 190181f3f..778f110cb 100644 --- a/config.yaml +++ b/config.yaml @@ -10,7 +10,7 @@ params: # There is a hugo i18n package, but it seems like overkill for so few strings lang_strings: en: - home_title: Welcome to the new Swedish Pathogens Portal + home_title: "Swedish Pathogens Portal: supporting pandemic preparedness" enquire_email_footer_dc: "Contact the Swedish Pathogens Portal" support_feedback: Support & Feedback privacy: Privacy Notice @@ -35,7 +35,7 @@ params: twitter_link: Twitter linkedin_link: LinkedIn sv: - home_title: Välkomna till den nya Svenska Patogens Portalen + home_title: "Patogens Portal Sverige: stöd för pandemiberedskap" enquire_email_footer_dc: "Kontakta den Svenska Patogens Portalen" support_feedback: Support och Feedback privacy: Integritetspolicy @@ -97,7 +97,7 @@ permalinks: related: includeNewer: false indices: - - name: tags - weight: 100 - threshold: 80 - toLower: false + - name: tags + weight: 100 + threshold: 80 + toLower: false diff --git a/content/english/citation.md b/content/english/citation.md index 48a741388..6f8669521 100644 --- a/content/english/citation.md +++ b/content/english/citation.md @@ -11,11 +11,11 @@ menu: toc: true --- -In line with the principles of _FAIR_ and _Open Science_, we encourage the reuse of material made available on the Swedish Pathogens Portal. On this page, you will find information about how to cite the portal when reusing/referencing the content. Please note that the information on the portal is updated continuously, therefore it is important to refer to specific versions (or to provide access dates) within citations. 
+In line with the principles of _FAIR_ and _Open Science_, we encourage the reuse and recognition of material made available on the Swedish Pathogens Portal. On this page, you will find information about how to cite the portal when reusing and referencing the content. Please note that the information on the portal is updated continuously, so it is important to refer to specific versions (or to provide access dates) within citations. ## Research community -In this section, you'll find instructions on how to cite the portal website, or underlying code, in reearch publications. +In this section, you'll find instructions on how to cite the portal website, or underlying code, in research publications. ### Citing website content @@ -23,31 +23,31 @@ In this section, you'll find instructions on how to cite the portal website, or The Resource Identification Portal was created in support of the Resource Identification Initiative. It aims to promote the identification, discovery, and reuse of research resources. Research Resource Identifiers (**RRIDs**) are persistent and unique identifiers for referencing a research resource. -The RRID for the Swedish Pathogens Portal is **SCR_024866**. +The RRID for the Swedish Pathogens Portal is [**SCR_024866**](https://scicrunch.org/resources/data/record/nlx_144509-1/SCR_024866/resolver?q=SCR_024866&l=SCR_024866&i=rrid:scr_024866). By citing the portal using the RRID, you will facilitate further reuse of the portal, enable us to track that activity, and allow others to easily find the _Summary Report_ for usage of the Swedish Pathogens Portal. ##### APA format -**In-text citation**: The data was made available on the Swedish Pathogens Portal (RRID: SCR_024866) (year) +For official guidance, see the [SciCrunch page on RRID citations](https://scicrunch.org/resources/about/guidelines). -**Reference list** SciLifeLab Data Centre (2024). Swedish Pathogens Portal, version (version number) from , RRID:SCR_024866.
+**In-text citation**: Swedish Pathogens Portal, SciLifeLab Data Centre, _version number_, RRID: SCR_024866 (access date: _date of access_). -You will find the version of the Portal at the bottom of the footer on any page, or on our Github repository under 'releases'. +**Reference list**: Swedish Pathogens Portal (_access date_), SciLifeLab Data Centre, version (version number) from https://pathogens.se, RRID:SCR_024866. -If you are aiming to cite particular pages of the portal in particular (e.g. the Data Highlights), you may find that an author is mentioned and a date is given. In this case, you should include the appropriate date and author instead, but must still include the RRID. +You will find the version number of the portal at the bottom of the footer on any page, or on our GitHub repository under 'releases'. + +If you want to cite specific pages of the portal (e.g. the Data Highlights), you may find that an author and a date are given. In this case, you should include that author and date instead, but must still include the RRID. Some pages also provide information on how to cite the data itself (e.g. where a DOI is given on a dashboard page). In those cases, you should use that citation. ### Citing underlying code -From the start, the portal has been operated by the SciLifeLab Data Centre and partners. Many individuals from the wider community have also contributed to the code over time. All of the source code used on the website is available on GitHub. The code used to produce the website is available in our pathogens-portal repository, and all code used for visualisations are in our visualisations repository. All of the code that we have produced is available for reuse under an MIT licence. +The portal is operated by the SciLifeLab Data Centre and partners. Many individuals from the wider community have also contributed to the code over time.
All of the source code used on the website is available on GitHub. The code used to produce the website is available in our pathogens-portal repository, and all code used for visualisations is in our visualisations repository. All of the code that we have produced is available for reuse under an MIT licence. ##### APA format -SciLifeLab Data Centre (year) pathogens-portal. version: (version number), DOI: (insert version DOI shown on the badge in the README.md file of our pathogens-portal repository). An example of the bedge is below: - -[![DOI](https://zenodo.org/badge/256458920.svg)](https://zenodo.org/doi/10.5281/zenodo.10629602) +SciLifeLab Data Centre (year) pathogens-portal. version: (version number) [Software]. Zenodo. . -SciLifeLab Data Centre (year) pathogens-portal-visualisations. version: (version number), +SciLifeLab Data Centre (year) pathogens-portal-visualisations. version: (version number) [Software]. Zenodo. _DOI to be confirmed_. ## Journalists diff --git a/content/english/data-management.md b/content/english/data-management.md index 5bc40742c..7edffd8a2 100644 --- a/content/english/data-management.md +++ b/content/english/data-management.md @@ -1,86 +1,78 @@ --- -title: Data management +title: Research data management toc: True aliases: - /support_services/data_management/ - /sv/support_services/data_management/ --- -"Research data management concerns the organization, storage, preservation, and sharing of data that is collected or analysed during a research project. Proper planning and management of research data will make project management easier and more efficient while projects are being performed. It also facilitates sharing and allows others to validate as well as reuse the data. Also, funding agencies are recognizing the importance of research data management and some now request Data Management Plans (DMP) as part of the grant application process."
([NBIS](https://www.nbis.se/infrastructure/data-management/dm-introduction.html)) +_Research data management (RDM)_ refers to how outputs from research projects (e.g. data and code) are handled, organised, and stored. Following best practices in RDM will allow you to better manage your outputs, and will help others to reuse them. In general, it is recommended that research outputs are as open and FAIR (i.e. **F**indable, **A**ccessible, **I**nteroperable, and **R**eusable) as possible. Read the [national guidelines for the promotion of open science in Sweden](https://www.kb.se/samverkan-och-utveckling/nytt-fran-kb/nyheter-samverkan-och-utveckling/2024-01-15-national-guidelines-for-promoting-open-science-in-sweden.html) and [Wilkinson _et al._ (2016)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175/pdf/sdata201618.pdf) to learn more about open science and the FAIR principles, respectively. -Here we give an overview of available resources regarding research data management relevant to Swedish researchers working on COVID-19 and Pandemic Preparedness research. +On this page, you will find information about where to access direct, one-on-one support or general guidance for RDM relevant to infectious disease and pandemic preparedness data generated in Sweden. -## Data management resources +## Get direct support -### National Bioinformatics Infrastructure Sweden (NBIS) and SciLifeLab Data Centre +### Swedish Pathogens Portal
+Whilst the portal contains many general RDM guidelines (e.g. on how to [share data](/share-data/)), we also offer direct support for individual projects. To access this, please send an email to pathogens@scilifelab.se or send a message via the [contact form](/contact/). You can typically expect a response within one working day. +### SciLifeLab Data Management Helpdesk -- [ENA submission guidelines](/support_services/tutorial_ena/tutorial_ena_intro/) - guidance developed to aid submissions to the European Nucleotide Archive (ENA) -- [SciLifeLab Data Stewardship Wizard, DSW](http://dsw.scilifelab.se/) - a customised tool for creating Data Management Plans (DMPs) with templates conforming to the requirements of the Swedish Research Council and other stakeholders -- [SciLifeLab Data Management Helpdesk](mailto:data-management@scilifelab.se) - data stewards from NBIS and SciLifeLab Data Centre are available to discuss and provide support regarding data management questions -- [SciLifeLab Research Data Management Guidelines](https://data-guidelines.scilifelab.se) - NBIS and SciLifeLab Data Centre have aggregated resources related to life science reseach data management in Sweden. Guidelines are available for each stage of the data lifecycle +The SciLifeLab Data Management Helpdesk is operated by experts in RDM from the SciLifeLab Data Centre and National Bioinformatics Infrastructure Sweden (NBIS). The purpose of the helpdesk is to provide direct, customised RDM support for those working in life science research in Sweden. To get in touch, send an email to data-management@scilifelab.se. Alternatively, you can go to the [SciLifeLab Research Data Management Guidelines](https://data-guidelines.scilifelab.se) and submit questions via the contact form. + +## Resources for infectious disease data + +### ENA submission guidelines + +The European Nucleotide Archive (ENA) is a repository for DNA and RNA sequences, including those related to pathogens.
Drawing on information from ENA and their own experience of submitting data to ENA, SciLifeLab Data Centre and NBIS have developed a tutorial for ENA submissions. NBIS can also help to broker submissions, should you need further support. ### Infectious Disease Toolkit (IDTk)
+The Infectious Diseases Toolkit (IDTk) is a community effort to document best practices in RDM for infectious disease data, and to showcase solutions developed to deal with the challenges faced during disease outbreaks. This resource was created as part of the Horizon 2020 BY-COVID project, in which the Swedish Pathogens Portal is a partner. The IDTk includes broadly applicable RDM guidance for many different types of data. It also has information on resources specific to infectious disease data in Sweden. + +### Global Alliance for Genomics and Health + +The Global Alliance for Genomics and Health (GA4GH) is an international community dedicated to advancing human health through genomic data. This community develops technical standards, policy frameworks, and tools related to the responsible and secure use of genomic and health-related data. The Swedish Pathogens Portal is part of the [infectious disease community of GA4GH](https://www.ga4gh.org/what-we-do/communities-of-interest/). + +## General RDM resources -The [Infectious Diseases Toolkit (IDTk)](https://www.infectious-diseases-toolkit.org/) is a community effort to expose best practices and showcase solutions to data challenges affecting the response to infectious diseases outbreaks. It was created as part of the [BY-COVID project](https://by-covid.org/). +### SciLifeLab Research Data Management Guidelines -This resource contains information specific for the management of infectious disease data.
Specifically, it includes information about management at every part of the data lifecycle for data related to [pathogen characterisation](https://www.infectious-diseases-toolkit.org/pathogen-characterisation/), [socioeconomic data](https://www.infectious-diseases-toolkit.org/socioeconomic-data/), [human biomolecular data](https://www.infectious-diseases-toolkit.org/human-biomolecular-data/), and [human clinical and health data](https://www.infectious-diseases-toolkit.org/human-clinical-and-health-data/). +The SciLifeLab Research Data Management Guidelines is a knowledge hub for the management of life science research data in Sweden. NBIS and SciLifeLab Data Centre have collaborated to bring together information related to best practices for life science RDM in Sweden. Information is available for all stages of the research data lifecycle, from creating a Data Management Plan (DMP) prior to conducting the research to preserving the data after the study concludes. -Information is available specifically about the [resources related to infectious disease for Sweden](https://www.infectious-diseases-toolkit.org/national-resources/sweden). +### SciLifeLab Data Stewardship Wizard (DSW) -### ELIXIR Research Data Management Kit (RDMkit) +Data Management Plans (DMPs) are living documents that you first draft before your study begins. They help you to decide how to manage your data throughout your project and beyond. The SciLifeLab Data Stewardship Wizard (DSW) is a tool designed to help you write a DMP that complies with the requirements of the Swedish Research Council and other stakeholders.
+### SciLifeLab FAIR Storage -The [ELIXIR Research Data Management Kit (RDMkit)](https://rdmkit.elixir-europe.org/) is an online guide containing good data management practices applicable to research projects from the beginning to the end. Developed and managed by people who work every day with life science data, the RDMkit has guidelines, information, and pointers to help you with problems throughout the data’s life cycle. RDMkit supports FAIR data — Findable, Accessible, Interoperable and Reusable — by-design, from the first steps of data management planning to the final steps of depositing data in public archives. +SciLifeLab FAIR Storage offers storage resources to support data-driven life science. The service is intended to support data sharing in accordance with the FAIR principles and is available for any project that advances knowledge and discovery within the field of Swedish data-driven life science. The allocation of resources and application process are both managed by SciLifeLab Data Centre. -Information is organised by role (e.g. researcher, data steward), scientific domain (e.g. proteomics, human data, bioimaging data), tasks (e.g. data analysis, data management plan, licensing). +### Research Data Management Kit (RDMkit) -Information is specifically available for [research data management in Sweden](https://rdmkit.elixir-europe.org/se_resources). +The Research Data Management Kit (RDMkit) is similar to IDTk (see above), except that it contains information on RDM related to life science data in general, rather than just infectious disease data. It contains information relevant for all stages of the research data lifecycle, from planning to preservation. ### Swedish National Data Service (SND) -
+The Swedish National Data Service (SND) has put together guidance for RDM that is applicable to all scientific fields. They also offer relevant tools, events, and training. -The Swedish National Data Service has put together guidelines on research data management in general, applicable to all fields of science. +See the following pages for information on: -- [Plan](https://snd.gu.se/en/manage-data/plan) - Data Management Plan, Funding Application, Ethical Review, Agreements with Other Parties, Research Material with Personal Data, Protect the Data -- [Organize](https://snd.gu.se/en/manage-data/organise) - Folder Structure, File Format -- [Document](https://snd.gu.se/en/manage-data/document) -- [Work with Data](https://snd.gu.se/en/manage-data/work-with-data) - Data Loss, Data Errors, Well-Organised Data, Access, Shared Work Files, Software -- [Prepare and Share](https://snd.gu.se/en/manage-data/prepare-and-share) - Reuse, Documentation for Reuse, The FAIR Data Principles, PID (persistent identifiers), Licenses, Embargos, and Restrictions, Publication and Open Access -- [Guides](https://snd.gu.se/en/manage-data/guides) - Checklist for Data Management Plan, Choosing a File Format +- Planning studies, e.g. DMPs, funding applications, ethical reviews, agreements with other parties, and handling personal data. +- Organising data, e.g. folder structures and file formats. +- Documenting data. +- Working with data, e.g. dealing with data errors and data loss, and accessing data. +- Preparing and sharing data, e.g. reusing data, the FAIR principles, PIDs (persistent identifiers), licences, embargoes, publication, and open access. +- Guides for RDM. ### Other useful resources -- [Research Data Management 1 Day Workshop](https://zenodo.org/record/4562630#.YnjAIPNBzlw) (Sara El-Gebali, DOI: 10.5281/zenodo.4562630) - Materials for 1 full-day workshop on Research Data Management basics covering topics including FAIR and Open data, and electronic lab books.
-- [SciLifeLab tutorial videos on Data Management](https://www.youtube.com/playlist?list=PL1nnHOyxN_WdqnzLqbmWJz_i0f2anT9cS) - on what is Data Management, Data Management Plan, introduction to Data Stewardship Wizard. +- Tutorial videos for multiple RDM topics are available on the SciLifeLab Data Management YouTube channel. -## Hands-on data management support +- Materials from a day-long workshop on the basics of RDM (Sara El-Gebali, DOI: 10.5281/zenodo.4562630). It covers topics including FAIR and open data, as well as electronic lab books. -All researchers affiliated with a university or research institute in Sweden working on research topics relevant to pandemic preparedness can receive free individual consultations and hands-on help within reasonable bounds from the _Swedish Pathogens Portal_ team. Simply send an email to [data-management@scilifelab.se](mailto:data-management@scilifelab.se) or [pathogens@scilifelab.se](mailto:pathogens@scilifelab.se). Your question will be assigned to a data steward with relevant expertise who can either help you directly or point you to the correct tool or service. +- SciLifeLab Data Centre and NBIS host events to provide support with RDM. Previous presentations from these sessions are available on the SciLifeLab Data Repository. -You are welcome to send both general questions about best approaches to research data management, data management plans (DMPs), reproducibility, FAIR, and open science as well as specific questions about your research projects such as which repository to choose to deposit data, what the suitable metadata standards would be, which file formats to use, etc. In some cases the data stewards can act as brokers and submit data to repositories on your behalf. +- RDM-related events are listed on the [events page](https://data.scilifelab.se/events/) of the SciLifeLab Data Platform. 
-Send an email with your data management question +- Services, tools, and other resources related to RDM are listed on the [services page](https://data.scilifelab.se/services/) of the SciLifeLab Data Platform. diff --git a/content/english/funding/_index.md b/content/english/funding/_index.md index 3b7172e2b..a74e7c735 100644 --- a/content/english/funding/_index.md +++ b/content/english/funding/_index.md @@ -4,7 +4,6 @@ menu: research_menu: identifier: funding name: Funding opportunities - weight: 20 aliases: - /projects/funding/ - /sv/projects/funding/ diff --git a/content/english/publications/_index.md b/content/english/publications/_index.md index 50d28b02a..977c79215 100644 --- a/content/english/publications/_index.md +++ b/content/english/publications/_index.md @@ -4,7 +4,6 @@ menu: research_menu: identifier: publications name: Swedish COVID-19 Publications - weight: 40 --- This section presents a list of published scientific journal articles and preprints on COVID-19 and SARS-CoV-2 where at least one author has an affiliation with a Swedish research institute. Note that this database is primarily a manually curated database and thus it may not be exhaustive. Note that from May 2023, we began to use the Europe PMC REST API to idenfy publications. The scripts that we use to do this are [openly available on GitHub](https://github.com/ScilifelabDataCentre/pathogens-portal-scripts/tree/main/All_publications) and can be reused with other pathogens. 
diff --git a/content/english/register-based-research.md b/content/english/register-based-research.md index e1b8dba83..8c5b135c1 100644 --- a/content/english/register-based-research.md +++ b/content/english/register-based-research.md @@ -5,7 +5,6 @@ menu: research_menu: name: Register-based research identifier: register-based - weight: 15 aliases: - /data_types/health_data/register_based_research/ - /sv/data_types/health_data/register_based_research/ diff --git a/content/english/research_projects/_index.md b/content/english/research_projects/_index.md index 057f1a779..4e1a7372c 100644 --- a/content/english/research_projects/_index.md +++ b/content/english/research_projects/_index.md @@ -5,7 +5,6 @@ menu: research_menu: name: Ongoing research projects identifier: ongoing_projects - weight: 10 aliases: - /projects/ - /sv/projects/ diff --git a/content/english/support_services/tutorial_ena/tutorial_ena_contact.md b/content/english/support_services/tutorial_ena/tutorial_ena_contact.md index 3a7d93e57..606b273c0 100644 --- a/content/english/support_services/tutorial_ena/tutorial_ena_contact.md +++ b/content/english/support_services/tutorial_ena/tutorial_ena_contact.md @@ -12,7 +12,7 @@ menu: ### About the use of this tutorial/Brokering to ENA -If you have any questions/comments regarding the tutorial itself, need general advice with submission, or would like the Swedish COVID Data Portal team to broker a submission, please contact us [using this form](https://www.covid19dataportal.se/contact/) or send an email to us at [datacentre@scilifelab.se](mailto:datacentre@scilifelab.se). +If you have any questions/comments regarding the tutorial itself, need general advice with submission, or would like the Swedish Pathogens Portal team to broker a submission, please contact us [using this form](https://www.pathogens.se/contact/) or send an email to us at [pathogens@scilifelab.se](mailto:pathogens@scilifelab.se). 
### Whilst making a submission diff --git a/content/english/support_services/tutorial_ena/tutorial_ena_faqs.md b/content/english/support_services/tutorial_ena/tutorial_ena_faqs.md index c979010d5..c99fb61a9 100644 --- a/content/english/support_services/tutorial_ena/tutorial_ena_faqs.md +++ b/content/english/support_services/tutorial_ena/tutorial_ena_faqs.md @@ -1,11 +1,11 @@ --- title: Tutorial for SARS-CoV-2 genome data submission to ENA -toc : false # in case of the ena tutorial pages the table of contents is inserted inside the template, ena_tutorial +toc: false # in case of the ena tutorial pages the table of contents is inserted inside the template, ena_tutorial type: ena_tutorial menu: - ena_tutorial: - name: FAQs - weight: 80 + ena_tutorial: + name: FAQs + weight: 80 --- ## Frequently Asked Questions (FAQs) @@ -20,7 +20,7 @@ By making sequences openly available, and adhering to the FAIR principles (see b The FAIR principles were established in 2016. They were established to increase the **F**indability, **A**ccessibility, **I**nteroperability, and **R**eusability of data. -By submitting data that is FAIR, submitters facilitate the reuse of their data. This is not the same as making data 'open', which refers instead to making openly accessible. +By submitting data that is FAIR, submitters facilitate the reuse of their data. This is not the same as making data 'open', which refers instead to making data openly accessible. For more information on the FAIR principles, please see the [go-fair website](https://www.go-fair.org/fair-principles/). @@ -28,22 +28,20 @@ For more information on the FAIR principles, please see the [go-fair website](ht There are two main international databases in which COVID-19 sequences have been made openly available en masse; the Global Initiative on Sharing Avian Influenza Data ([GISAID](https://www.gisaid.org)), and the European Nucleotide Archive ([ENA](https://www.ebi.ac.uk/ena/browser/home)). -So, which should you use? 
We actually recommend that you submit sequences to both databases where possible (see [information in the Introduction tab](/support_services/tutorial_ena/tutorial_ena_intro) for details), as they each offer relative advantages for research compared to the other. In some cases though, this may not be possible. For example, GISAID only accepts assemblies reflecting a consensus sequence, whereas ENA accepts both 'raw sequences' and assemblies. Thus, in the case of 'raw' sequence data, please submit to ENA. - -Work is ongoing to streamline the process for submitting sequences to both databases. Ultimately, we hope to make it as easy to submit to both databases as it is to submit to just one. +So, which should you use? We actually recommend that you submit sequences to both databases where possible, as they each offer relative advantages for research compared to the other. GISAID contains more SARS-CoV-2 data from all around the world, compared to ENA. However, while GISAID only accepts the consensus sequences of assembled genomes, ENA accepts both consensus sequences and 'raw' sequence data. Further, although the data in GISAID is considered open, access is restricted to individuals with verified accounts, whilst there are no restrictions on who can access the data in ENA. This means that using data from ENA simplifies sharing the data (e.g. between members of your group) and access to the data is less likely to become compromised during a project. ### Who owns/runs ENA? -ENA is maintained by [EMBL-EBI](https://www.ebi.ac.uk/about), and is a core data resource of ELIXIR (the European life-sciences Infrastructure for biological Information). See [here](https://elixir-europe.org/platforms/data/core-data-resources) for more information about what this means. 
+ENA is maintained by [EMBL-EBI](https://www.ebi.ac.uk/about), and is a [core data resource](https://elixir-europe.org/platforms/data/core-data-resources) of [ELIXIR](https://elixir-europe.org/) (the European life-sciences Infrastructure for biological Information). ENA is part of the [INSDC](https://www.insdc.org/) (International Nucleotide Sequence Database Collaboration), and also indexes data from [NCBI](https://www.ncbi.nlm.nih.gov/) (National Center for Biotechnology Information) and [DDBJ](https://www.ddbj.nig.ac.jp/) (DNA Data Bank of Japan). ### Is submitting to ENA secure? -Whilst it is considered openly available, access to data submitted to GISAID is restricted to those with verified accounts. Access to data submitted to ENA is not subject to similar restrictions. Some submitters are therefore concerened that submissions to ENA are somehow less secure. This is not the case though. To access data in GISAID, users must agree to their [terms of use](https://www.gisaid.org/registration/terms-of-use/). This could essentially be considered a licence for use, similar to that used for other types of data (e.g. an MIT licence). ENA can therefore be considered to have a 'more open' licence, which involves fewer restrictions. In theory, the same users can access data in both databases, the difference is that GISAID data cannot be shared as freely as ENA data. In addition, data in GISAID could also be submitted to ENA. +Whilst it is considered openly available, access to data submitted to GISAID is restricted to those with verified accounts. Access to data submitted to ENA is not subject to similar restrictions. Some submitters are therefore concerned that submissions to ENA are somehow less secure. This is not the case though. To access data in GISAID, users must agree to their [terms of use](https://www.gisaid.org/registration/terms-of-use/). This could essentially be considered a licence for use, similar to that used for other types of data (e.g. an MIT licence). 
ENA can therefore be considered to have a 'more open' licence, which involves fewer restrictions. In theory, the same users can access data in both databases; the difference is that GISAID data cannot be shared as freely as ENA data. ### Can I get help submitting my data to ENA? Absolutely, please refer to the [Get Help tab](/support_services/tutorial_ena/tutorial_ena_contact) to find where you can get support for your issue. -### Can I make the sequence data submitted to ENA visible on the Swedish COVID-19 Data Portal? +### Can I make the sequence data submitted to ENA visible on the Swedish Pathogens Portal? -Yes, the Swedish COVID-19 Data Portal is happy to display information about sequences deposited by researchers affiliated to a Swedish research institution. If you would be interested in this, please get in touch with the team by e-mailing [datacentre@scilifelab.se](mailto:datacentre@scilifelab.se) after you submit your sequences. +Yes, the Swedish Pathogens Portal is happy to display information about sequences deposited by researchers affiliated with a Swedish research institution. If you are interested in this, please get in touch with the team by e-mailing [pathogens@scilifelab.se](mailto:pathogens@scilifelab.se) after you have submitted your sequences. diff --git a/content/english/support_services/tutorial_ena/tutorial_ena_intro.md b/content/english/support_services/tutorial_ena/tutorial_ena_intro.md index 70a55009f..a066cd822 100644 --- a/content/english/support_services/tutorial_ena/tutorial_ena_intro.md +++ b/content/english/support_services/tutorial_ena/tutorial_ena_intro.md @@ -11,19 +11,25 @@ menu: type: ena_tutorial --- +
+ ENA logo +
+ ## About this tutorial -The research community has put considerable effort into research on the SARS-CoV-2 virus and COVID-19. Fast and open access to different data types (societal, molecular, epidemiological, among others) has been key to the swift development and deployment of, for example, preventative measures, tests, vaccines, and treatments for COVID-19. The pandemic has thus further highlighted how important making data open and [FAIR (Findable, Accessible, Interoperable, Reusable)](https://www.go-fair.org/fair-principles/) is in facilitating research efforts. +Fast and open access to different data types (societal, molecular, epidemiological, among others) was key to the swift development and deployment of, for example, preventative measures, tests, vaccines, and treatments for COVID-19. The pandemic thus further highlighted the importance of making data open and [FAIR (Findable, Accessible, Interoperable, Reusable)](https://www.go-fair.org/fair-principles/) for facilitating research efforts. Many SARS-CoV-2 genome sequences have been made openly available in international databases, such as the Global Initiative on Sharing Avian Influenza Data ([GISAID](https://www.gisaid.org)), and the European Nucleotide Archive ([ENA](https://www.ebi.ac.uk/ena/browser/home)). + +The aim of this tutorial is to assist researchers in submitting SARS-CoV-2 sequence data to ENA. This should ultimately lead to an increased availability of open data, including ‘raw’ sequence data, thus facilitating greater reproducibility and providing more opportunities to reuse the data to address new scientific questions. -Thanks to efforts globally, many SARS-CoV-2 genome sequences have been made openly available in international databases, such as the Global Initiative on Sharing Avian Influenza Data ([GISAID](https://www.gisaid.org)), and the European Nucleotide Archive ([ENA](https://www.ebi.ac.uk/ena/browser/home)).
The ENA is part of the International Nucleotide Sequence Database Collaboration ([INSDC](https://www.insdc.org)), and also indexes data from the National Centre for the Biotechnology Information ([NCBI](https://www.ncbi.nlm.nih.gov/)) and [DDBJ](https://www.ddbj.nig.ac.jp). -Both GISAID and ENA constitute valuable resources, each with distinct relative advantages for those performing research. For example, as of February 2022, GISAID contains more SARS-CoV-2 data from all around the world. Specifically, GISAID contained almost 8 million SARS-CoV-2 sequences, whereas ENA contained around 800,000 sequences. The data in GISAID thus enables more reliable insights to be made into the situation globally. However, GISAID only accepts the consensus sequences of assembled genomes, whilst ENA accepts both consensus sequences and 'raw' sequence data. Further, although the data in GISAID is considered open, access is restricted to individuals with verified accounts, whilst there are no restrictions on who can access the data in ENA. This means that using data from ENA simplifies sharing the data (e.g. between members of your group) and access to the data is less likely to become compromised during a project. +## Overview -The aim of this tutorial is to assist researchers in submitting SARS-CoV-2 sequence data to ENA. This should ultimately lead to an increased availability of open data, including ‘raw’ sequence data. This would not only facilitate greater reproducibility, but also provide more opportunity for reusing the data to address new scientific questions. +This tutorial is separated into tabs to aid users in moving through the tutorial. If you are unfamiliar with ENA, we recommend reading the [Terminology and Metadata tab](/support_services/tutorial_ena/tutorial_ena_terminology) before commencing with the tutorial. -
- -
+Multiple routes of submission are possible with ENA. We describe two complete routes that can be used for submission. Some preparatory steps are common to both routes. These steps are described in the [Preparations for Submissions tab](/support_services/tutorial_ena/tutorial_ena_subprep). We explain how to determine which of the routes is most likely to work best for you in the [Select Submission Route tab](/support_services/tutorial_ena/tutorial_ena_selectsub). The [Submission Route 1](/support_services/tutorial_ena/tutorial_ena_subroute1) and [Submission Route 2](/support_services/tutorial_ena/tutorial_ena_subroute2) tabs explain different routes to completing submissions to ENA. + +Information about where to get further guidance is given in the [Get Help tab](/support_services/tutorial_ena/tutorial_ena_contact). For answers to frequently asked questions (FAQs) regarding submissions, please see the [FAQs tab](/support_services/tutorial_ena/tutorial_ena_faqs). ## Learning outcomes @@ -41,14 +47,6 @@ By the end of this tutorial you will: No specific knowledge is needed before starting this tutorial. -## Overview - -This tutorial is separated into tabs to aid users in moving through the tutorial. If you are unfamiliar with ENA, we recommend reading the [Terminology and Metadata tab](/support_services/tutorial_ena/tutorial_ena_terminology) before commencing with the tutorial. - -Multiple routes of submission are possible with ENA. We describe two complete routes that can be used for submission. Some preparatory steps are common to both routes. These steps are described in the [Preparations for Submissions tab](/support_services/tutorial_ena/tutorial_ena_subprep). We explain how to determine which of the routes is most likely to work best for you in the [Select Submission Route tab](/support_services/tutorial_ena/tutorial_ena_selectsub). 
The [Submission Route 1](/support_services/tutorial_ena/tutorial_ena_subroute1) and [Submission Route 2](/support_services/tutorial_ena/tutorial_ena_subroute2) tabs explain different routes to completing submissions to ENA. - -Information about where to get further guidance is given in the [Get Help tab](/support_services/tutorial_ena/tutorial_ena_contact). For answers to frequently asked questions (FAQs) regarding submissions, please see the [FAQs tab](/support_services/tutorial_ena/tutorial_ena_faqs). - ## References used for this tutorial Multiple sources of information were used to build this tutorial. Links to the reference material are listed below: diff --git a/content/english/support_services/tutorial_ena/tutorial_ena_selectsub.md b/content/english/support_services/tutorial_ena/tutorial_ena_selectsub.md index b62ab724a..7a9e5c172 100644 --- a/content/english/support_services/tutorial_ena/tutorial_ena_selectsub.md +++ b/content/english/support_services/tutorial_ena/tutorial_ena_selectsub.md @@ -1,16 +1,16 @@ --- title: Tutorial for SARS-CoV-2 genome data submission to ENA -toc : false # in case of the ena tutorial pages the table of contents is inserted inside the template, ena_tutorial +toc: false # in case of the ena tutorial pages the table of contents is inserted inside the template, ena_tutorial type: ena_tutorial menu: - ena_tutorial: - name: Select Submission Route - weight: 40 + ena_tutorial: + name: Select Submission Route + weight: 40 --- ## Introduction to submission routes and methods -There are multiple ways to submit data and sequences into ENA. In order to do a submission of of 'raw' sequences and/or assemblies, you need to use a combination of interactive interfaces in ENA, command-line based software, and APIs for scripts/custom software. The prospect of using multiple tools and determining which tools are better to use for a given part of a submission can seem daunting, especially if you are unfamiliar with the tools. 
In this tutorial, we have devised two submission 'routes' that enable you to do a complete submission, and explain how to to use the tools. Which route will work best for you will depend on your comfort with using the different tools provided by ENA and the size of your submission (i.e. the number of sequences included). +There are multiple ways to submit data and sequences to ENA. To submit 'raw' sequences and/or assemblies, you need to use a combination of interactive interfaces in ENA, command-line based software, and APIs for scripts/custom software. The prospect of using multiple tools and determining which tools are better to use for a given part of a submission can seem daunting, especially if you are unfamiliar with the tools. In this tutorial, we have devised two submission 'routes' that enable you to complete a submission, and we explain how to use the tools. Which route will work best for you will depend on your comfort with using the different tools provided by ENA and the size of your submission (i.e. the number of sequences included). On this page, we first briefly summarise how ENA refers to [different submission methods](/support_services/tutorial_ena/tutorial_ena_selectsub/#submission-methods-described-by-ena). Understanding the descriptions of methods used by ENA will be useful when referencing any resources that they have produced. @@ -18,13 +18,13 @@ Then, we describe the [two routes of submission](/support_services/tutorial_ena/ ## Submission methods described by ENA -ENA described three methods of submission, none of which can be used in isolation to complete all parts of a submission: +ENA describes three methods of submission, none of which can be used in isolation to complete all parts of a submission: -* **Interactive Submission Method** - involves filling out web forms directly in the browser and downloading template spreadsheets that can be completed off-line and uploaded later to ENA.
This is the easiest method to use when getting started, but quickly becomes time-consuming with bulk submission (> 50 records). +- **Interactive Submission Method** - involves filling out web forms directly in the browser and downloading template spreadsheets that can be completed off-line and uploaded later to ENA. This is the easiest method to use when getting started, but quickly becomes time-consuming with bulk submission (> 50 records). -* **Command-Line Submission Method** - uses ENA’s **Webin-CLI program**. Submissions require the preparation of text (manifest) files that are validated before submissions are completed. +- **Command-Line Submission Method** - uses ENA’s **Webin-CLI program**. Submissions require the preparation of text (manifest) files that are validated before submissions are completed. -* **Programmatic Submission Method** - requires the preparation of XML documents that are sent to ENA using cURL or the Webin Portal of ENA. +- **Programmatic Submission Method** - requires the preparation of XML documents that are sent to ENA using cURL or the Webin Portal of ENA. ## Submission routes devised for this tutorial @@ -32,10 +32,10 @@ As mentioned above, a combination of these methods is needed to complete a submi Choose [**Route 1**](/support_services/tutorial_ena/tutorial_ena_subroute1) if: -* You have little to no knowledge of using command line tools. -* You are submitting a small number of sequences (typically one to ten, but could be used for more). +- You have little to no knowledge of using command line tools. +- You are submitting a small number of sequences (typically one to ten, but could be used for more). Choose [**Route 2**](/support_services/tutorial_ena/tutorial_ena_subroute2) if: -* You have advanced knowledge of using command line tools. -* You are doing a bulk submission of sequences (typically more than 50). +- You have advanced knowledge of using command line tools. 
+- You are doing a bulk submission of sequences (typically more than 50). diff --git a/content/english/support_services/tutorial_ena/tutorial_ena_subprep.md b/content/english/support_services/tutorial_ena/tutorial_ena_subprep.md index cbb768631..f4550d8f9 100644 --- a/content/english/support_services/tutorial_ena/tutorial_ena_subprep.md +++ b/content/english/support_services/tutorial_ena/tutorial_ena_subprep.md @@ -23,7 +23,7 @@ In order to do a submission, you need an account in ENA. To create an account, p
- + ENA login

@@ -33,7 +33,7 @@ In order to do a submission, you need an account in ENA. To create an account, p
- + ENA login details

@@ -42,7 +42,7 @@ Once you have filled in all of the required fields, you can log in to the submis ## Obtaining example data -You are welcome to follow this tutorial using your own data. However, you can instead use example data, which you can download [here](/ENA_tutorial_data/example_data.zip). Using the example can be helpful for familiarising yourself with how to structure your data for submission and the steps required to complete a submission. +You are welcome to follow this tutorial using your own data. However, you can instead use example data, which you can download as a [single zip file](/ENA_tutorial_data/example_data.zip). Using the example can be helpful for familiarising yourself with how to structure your data for submission and the steps required to complete a submission. The example data was originally produced by [ENA](https://www.ebi.ac.uk/ena/browser/home), but we have restructured it for use with the two submission routes described in this tutorial. With the example data, you can submit 3 different samples to an example project, together with raw read data and sequences associated with each sample. @@ -54,13 +54,11 @@ Whether you use your own data or the example data, we recommend that you write t To ensure that sample data is registered with at least a minimum amount of metadata, ENA provides “Sample Checklists” which are used during registration to tailor the sample descriptions to fit minimum standards. The most appropriate checklist for SARS-CoV-2 viral submissions is the *[ENA virus pathogen reporting standard checklist (ERC000033)](https://www.ebi.ac.uk/ena/browser/view/ERC000033)*. This includes 9 mandatory, 15 recommended, and 11 optional fields (along with additional user-defined fields that can be used). Please note that some fields are free text, while others have controlled vocabulary. 
-In order to ensure that your SARS-CoV-2 data is properly described, we recommend downloading and filling in [this metadata template](/ENA_tutorial_data/metadata_template_ERC000033.xlsx). This template contains not only the ENA checklist for the data, but also describes all levels of metadata required for a submission. The template is divided into five sheets (each related to a type of metadata object), namely; **study**, **sample**, **experiment**, **run** and **analysis**. The experiment and run sheets are used for describing the raw reads, and the analysis sheet is used for describing the sequence assembly. It is good practice to fill in all relevant sheets in the template, as having all the metadata collected in one place eases the submission process. +In order to ensure that your SARS-CoV-2 data is properly described, we recommend downloading and filling in [this metadata template](/ENA_tutorial_data/metadata_template_ERC000033.xlsx) (Excel). This template contains not only the ENA checklist for the data, but also describes all levels of metadata required for a submission. The template is divided into five sheets (each related to a type of metadata object), namely **study**, **sample**, **experiment**, **run** and **assemblies**. The experiment and run sheets are used for describing the raw reads, and the assemblies sheet is used for describing the sequence assembly. It is good practice to fill in all relevant sheets in the template, as having all the metadata collected in one place eases the submission process. - - In the sample sheet, the first row is the **ENA virus sample checklist field**, the second row is the **ENA definition** (provides a description of the field), and the third row is **ENA requirement status** (whether the field is mandatory, recommended or optional). When you populate the sheet with your metadata, you can delete the second and third row. The default values for SARS-CoV2 submissions are pre-filled (in red) for relevant fields.
Some fields have controlled vocabulary, which are available in the template as drop-down lists (the lists become visible when you click on a cell). -For your convenience, we also provide [this pre-filled version of the metadata template](/ENA_tutorial_data/metadata_template_ERC000033_filled.xlsx), so that you can see how the template should be populated for use with the example data. This can also help with understanding how to fill in such a template for your own data. +For your convenience, we also provide [this pre-filled version of the metadata template](/ENA_tutorial_data/metadata_template_ERC000033_filled.xlsx) (Excel), so that you can see how the template should be populated for use with the example data. This can also help with understanding how to fill in such a template for your own data. **Note**: It is *strongly* recommended that you provide as much information as possible in the metadata sheets. This will increase the [FAIRness](https://www.go-fair.org/fair-principles/) of your data, and thus the probability that it will be useful in future research efforts. diff --git a/content/english/support_services/tutorial_ena/tutorial_ena_subroute1.md b/content/english/support_services/tutorial_ena/tutorial_ena_subroute1.md index 403dbf155..4173b7131 100644 --- a/content/english/support_services/tutorial_ena/tutorial_ena_subroute1.md +++ b/content/english/support_services/tutorial_ena/tutorial_ena_subroute1.md @@ -10,13 +10,13 @@ menu: ## When to use this route -Route 1 is recommended for users with little to no experience using the command line, and for small scale (usually 1-10, but could be more) or infrequent submissions. It makes use of a combination of a web submission interface (Webin submission portal) and a command-line tool (Webin-CLI). +Route 1 is recommended for users with little to no experience using the command line, and for small scale (usually 1-10 samples, but could be more) or infrequent submissions. 
It makes use of a combination of a web submission interface (Webin submission portal) and a command-line tool (Webin-CLI). ## Data required for route 1 -All of the data required to complete this submission can be downloaded together in a single zip file by clicking [here](/ENA_tutorial_data/example_data.zip). In this part of the tutorial, we will make use of the information in the '01-route' and 'data' subfolders. +All of the data required to complete this submission can be downloaded together in a [single zip file](/ENA_tutorial_data/example_data.zip). In this part of the tutorial, we will make use of the information in the '01-route' and 'data' subfolders. -**Note:** If you use your own data, you can fill in the metadata template, for instructions on how to do this, please see [this section in the preparation for submissions tab](/support_services/tutorial_ena/tutorial_ena_subprep/#preparing-the-metadata). +**Note:** If you use your own data, you can fill in the metadata template. For instructions on how to do this, please see the [Preparation for Submissions](/support_services/tutorial_ena/tutorial_ena_subprep/#preparing-the-metadata) tab. ## Doing a test submission vs a 'real' submission @@ -30,30 +30,31 @@ The Webin command line submission interface (Webin-CLI) is used to validate, upl * Download [Zulu Open JDK for the Java Runtime Environment (JRE)](https://www.azul.com/downloads/?package=jdk). -* Download the Webin-CLI Java jar file from [this GitHub repository](https://github.com/enasequence/webin-cli/releases). Put it in an easily accessible folder, e.g. Downloads. For convenience, create a path variable named DOWNLOADS in a terminal window using the below code: +* Download the Webin-CLI Java jar file from the [ENA GitHub repository](https://github.com/enasequence/webin-cli/releases). Put it in an easily accessible folder, e.g. Downloads.
For convenience, create a path variable named DOWNLOADS in a terminal window using the below code: > export DOWNLOADS="/path/to/Downloads/" ## The Webin submission portal -The Webin submission portal will be required to complete a Route 1 submission. You will need an account to access the portal. You can register for an account [here](https://www.ebi.ac.uk/ena/submit/webin/accountInfo). After logging in, you will see the landing page (shown below) that includes multiple options for completing your submission. In this section, you can get more information on what each of the options do. The options that you will need to complete a submission using Route 1 are explained in the subsequent sections of this page. +The Webin submission portal will be required to complete a Route 1 submission. You will need an account to access the portal. For instructions on how to obtain one, please see the [Preparation for Submissions](/support_services/tutorial_ena/tutorial_ena_subprep#obtaining-an-ena-webin-account) tab. After logging in, you will see the landing page (shown below) that includes multiple options for completing your submission. In this section, you can get more information on what each of the options does. The options that you will need to complete a submission using Route 1 are explained in the subsequent sections of this page.
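Once the Webin-CLI jar has been downloaded as described above, each Command-Line submission is driven by a manifest text file that Webin-CLI validates before submitting. A minimal Python sketch, assuming a raw reads submission: it writes a manifest and builds the Webin-CLI command line without running it. The jar file name, accessions, instrument and library values, and credentials are hypothetical placeholders; check the manifest fields and permitted values against ENA's Webin-CLI documentation before use.

```python
import os
import tempfile
from pathlib import Path


def write_manifest(path: Path, fields: list[tuple[str, str]]) -> Path:
    """Write a Webin-CLI manifest file with one 'FIELD<TAB>value' line per entry.

    A list of pairs (rather than a dict) is used because some fields, such as
    FASTQ for paired-end reads, may appear more than once.
    """
    path.write_text("".join(f"{key}\t{value}\n" for key, value in fields), encoding="utf-8")
    return path


def webin_cli_command(jar: Path, manifest: Path, username: str, password: str,
                      submit: bool = False, test: bool = True) -> list[str]:
    """Build (but do not run) a Webin-CLI command for a reads submission.

    By default the command only validates (-validate) and targets the ENA
    test service (-test); set submit=True to submit, and test=False only
    for a 'real' submission.
    """
    cmd = [
        "java", "-jar", str(jar),
        "-context", "reads",
        "-manifest", str(manifest),
        "-userName", username,
        "-password", password,
        "-submit" if submit else "-validate",
    ]
    if test:
        cmd.append("-test")
    return cmd


# Hypothetical values: replace the accessions, file names and credentials with your own.
manifest = write_manifest(Path(tempfile.mkdtemp()) / "manifest.txt", [
    ("STUDY", "PRJEB00000"),
    ("SAMPLE", "ERS0000000"),
    ("NAME", "sample1_reads"),
    ("PLATFORM", "ILLUMINA"),
    ("INSTRUMENT", "Illumina MiSeq"),
    ("LIBRARY_SOURCE", "VIRAL RNA"),
    ("LIBRARY_SELECTION", "RT-PCR"),
    ("LIBRARY_STRATEGY", "AMPLICON"),
    ("FASTQ", "sample1_R1.fastq.gz"),
    ("FASTQ", "sample1_R2.fastq.gz"),
])
# Assumes the jar was saved in the folder referenced by the DOWNLOADS variable.
jar = Path(os.environ.get("DOWNLOADS", ".")) / "webin-cli.jar"
cmd = webin_cli_command(jar, manifest, "Webin-00000", "PASSWORD")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Keeping validation and the test service as defaults means a mistake in the manifest cannot accidentally produce a 'real' submission.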
- + ENA landing page

- -
-
-
+ +
- - + + * **Header Bar (Label 1 in the above figure)**: Use the **Support** option to request assistance from the ENA helpdesk. Click **Manage Account** to e.g. change your contact information or centre name. * **Studies (Projects) (Label 2 in the above figure)**: All related options are coloured yellow. The **Register Study** option leads to an interface where you can register new studies. The **Submit XMLs** interface enables studies to be submitted in XML format. The **Studies Report** option allows you to review and edit previously submitted studies, and to change their release date. @@ -64,9 +65,8 @@ The Webin submission portal will be required to complete a Route 1 submission. Y * **Data Analyses (Label 5 in the above figure)**: All related options are coloured blue. The **Create annotated sequence spreadsheet** option allows you to select, download, and customise a template that can be used to make a submission via Webin-CLI. The **Submit XMLs** interface enables analyses to be submitted in XML format. The **Analyses Report** option enables you to review and edit previously submitted analyses. The **Analysis File Report** option enables you to review the status of files associated with previously submitted analyses. The **Analysis Processing Report** option enables you to review the processing status of files associated with previously submitted analyses. -You can gain more information about how to use the options to complete the functions outlined above by clicking on them in the landing page. - - +You can gain more information about how to use the options to complete the functions outlined above by clicking on them on the landing page of the Webin submission portal. +
@@ -79,23 +79,23 @@ First, log in to the [Webin submission portal test service](https://wwwdev.ebi.a
- + ENA Dashboard

- + ENA Studies section

-Second, enter the details of the project, such as title and description. Asterisks (*) denote mandatory fields. The 'Release date' is the date that the record should become publicly available. This can be updated later, so if you are unsure on a precise date, you can provide an estimated date. If you do this though, please remember to update it accordingly. +Second, enter the details of the project, such as title and description. Asterisks (*) denote mandatory fields. The 'Release date' is the date on which the record should become publicly available; if you are unsure of the exact date, provide an estimate. The release date can be set to a maximum of two years in the future, and can easily be updated later to bring it forward or push it back. When the release date is approaching (two weeks beforehand), ENA will send you an email notification.
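The release-date rules above can be checked before a study is registered. A small Python sketch, assuming the two-year limit and the two-week reminder work as described; `plan_release` and `add_years` are hypothetical helpers, and the validation in the Webin portal itself remains authoritative.

```python
from datetime import date, timedelta


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def plan_release(release: date, today: date) -> date:
    """Check a planned study release date and return the expected reminder date.

    Encodes the two rules described above: the release date may be at most
    two years ahead, and ENA emails a reminder about two weeks beforehand.
    """
    if release < today:
        raise ValueError("release date lies in the past")
    if release > add_years(today, 2):
        raise ValueError("release date is more than two years in the future")
    return release - timedelta(weeks=2)


print(plan_release(date(2025, 6, 1), today=date(2024, 1, 15)))  # prints 2025-05-18
```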
- + ENA register study

@@ -106,23 +106,23 @@ Please see [this video by ENA](https://youtu.be/3nArbshyzIk) for further guidanc ### Prepare information for samples -Before registering the samples, you must prepare the spreadsheets for submission. For instructions on how to do this, please see [this section on preparing for submissions](/support_services/tutorial_ena/tutorial_ena_subprep). +Before registering the samples, you must prepare the spreadsheets for submission. For instructions on how to do this, please see the [Preparation for Submissions](/support_services/tutorial_ena/tutorial_ena_subprep#preparing-the-metadata) tab. -The information that you need to provide will differ dependent on e.g. the type of sampling done. Some fields are mandatory, whilst others are recommended or optional (summarised in the below image). The ENA provides a virus pathogen metadata checklist ([ERC000033](https://www.ebi.ac.uk/ena/browser/view/ERC000033)) that should be used as guidance for the submission of samples for SARS-CoV2. +The information that you need to provide will differ depending on, e.g., the type of sampling done. Some fields are mandatory, whilst others are recommended or optional (summarised in the below image). The ENA provides a virus pathogen metadata checklist ([ERC000033](https://www.ebi.ac.uk/ena/browser/view/ERC000033)) that should be used for submission of SARS-CoV-2 samples.
- -
+ Checklist example fields +
+ The above image is adapted from one produced by Sam Holt for the ENA Facility Day 2020. -*The above image is adapted from one produced by Sam Holt for the ENA Facility Day 2020.* - -

+
- Some metadata fields are mandatory. However, in some cases, data for these fields is unavailable for some reason. This will not prevent submission, but any such missing data must be reported appropriately in ENA. In order to do this, users should fill the respective fields using the appropriate INSDC-approved term, which are used to indicate not only that data is missing, but also why. For more information about reporting missing metadata values and the INSDC-approved terms that can be used, see here. + Some metadata fields are mandatory. However, in some cases, data for these fields is unavailable for some reason. This will not prevent submission, but any such missing data must be reported appropriately in ENA. In order to do this, users should fill the respective fields using the appropriate INSDC-approved term, which are used to indicate not only that data is missing, but also why. For more information about reporting missing metadata values and the INSDC-approved terms that can be used, see ENA on Reporting Missing Values.
@@ -135,19 +135,19 @@ Both of the above options lead to the same place, which gives two options: (1) D
- + ENA Register samples

-Select the latter and upload the filled sample template that you made when [preparing your submission](/support_services/tutorial_ena/tutorial_ena_subprep). Click on **Submit Completed Spreadsheet**, verify that the submission was successful in the pop-up Submission window, and then click **Close**. +Select the latter and upload the filled sample template that you made when [preparing your submission](/support_services/tutorial_ena/tutorial_ena_subprep#preparing-the-metadata). Click on **Submit Completed Spreadsheet**, verify that the submission was successful in the pop-up Submission window, and then click **Close**. **Note:** Example data for three samples is provided in the 'data/samples/' folder of the example data provided with this tutorial in both .xlsx and .tsv formats. 'sample_spreadsheet.xlsx' is annotated such that different features are colour coded, and important features are highlighted. 'sample_spreadsheet.tsv' contains the same data in a tab-separated format, which is the format accepted for submission. Each row of these datasets represents a sample, while each column represents a metadata field.
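The spreadsheet is uploaded as tab-separated text. If a row is missing a tab or has an extra one, its values shift into the wrong metadata columns. As an optional check before upload, the sketch below reports any row whose field count differs from the header row. It is not part of ENA's tooling. `check_tsv_columns` is a hypothetical helper. It assumes that lines starting with `#` are template metadata rather than samples, and that the first remaining line holds the field names.

```shell
# check_tsv_columns FILE
# Hypothetical helper: report rows of a tab-separated sample sheet whose
# number of fields differs from the header row. Assumes lines starting
# with '#' are template metadata and the first other line is the header.
check_tsv_columns() {
  awk -F '\t' '
    /^#/ { next }                          # skip template/comment lines
    NF == 0 { next }                       # skip blank lines
    expected == 0 { expected = NF; next }  # first remaining line = header
    NF != expected {
      printf "line %d: %d fields, expected %d\n", NR, NF, expected
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

If every sample row has as many fields as the header, `check_tsv_columns sample_spreadsheet.tsv` prints nothing and exits with status 0. Otherwise it lists the offending line numbers and exits with status 1.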
- + Register sample success

@@ -156,9 +156,9 @@ Select the latter and upload the filled sample template that you made when [prep #### Explore the example data -In this section, we will use the materials in the *runs* subfolder of the route-01 folder of the example data that you downloaded earlier. +In this section, we will use the materials in the *runs* subfolder of the `route-01` folder of the example data that you downloaded earlier. -To view the contents of this folder, you can open it using a file explorer or using a command line interface. To do the latter, open a command prompt window (Terminal) on your computer and navigate to the *runs* subfolder. You can do this by typing `cd` in the terminal followed by the filepath (depicted as '$WORKSHOP/01-route/runs/' in the example below). Then, on a new line (that you can create by pressing 'enter' on your keyboard), type `ls`. The content of the subfolder will then be printed. +To view the contents of this folder, you can open it using a file explorer or using a command line interface. To do the latter, open a command prompt window (Terminal) on your computer and navigate to the *runs* subfolder. You can do this by typing `cd` in the terminal followed by the filepath (depicted as '$WORKSHOP/01-route/runs/' in the example below). Then, on a new line (that you can create by pressing 'enter' on your keyboard), type `ls`. The content of the subfolder will then be listed. >cd $WORKSHOP/01-route/runs/
ls @@ -197,12 +197,12 @@ The field values for STUDY and SAMPLE are taken from the study and samples metad #### Submit using Webin-CLI -Open a command prompt window and navigate to the *runs* subfolder of the 01-route folder of the example data files, using the `cd` command (e.g. `cd $WORKSHOP/01-route/runs/`). Remember to use `\` instead on Windows. +Open a command prompt window and navigate to the *runs* subfolder of the `01-route` folder of the example data files, using the `cd` command (e.g. `cd $WORKSHOP/01-route/runs/`). Remember to use `\` instead of `/` on Windows. -The Webin-CLI requires that you provide information about who you are and what you want to do before use. You can do this by using one of the available **’options’**. In the command prompt window, type in the appropriate command from the list below to see the available options. Please note that the filepath should give the location of Webin-CLI. Below, it is assumed that a path variable ('DOWNLOADS') has been created that contains the full path to the folder where the program was downloaded. See [this section in the preparation for submissions tab](/support_services/tutorial_ena/tutorial_ena_subprep/#obtaining-example-data) to see how to do this. +Before use, Webin-CLI requires information about who you are and what you want to do. You provide this using the available **‘options’**. In the command prompt window, type in the appropriate command from the list below to see the available options. Please note that the filepath should give the location of Webin-CLI. Below, it is assumed that a path variable ('DOWNLOADS') has been created that contains the full path to the folder where the program was downloaded. See the [Preparation for Submissions](/support_services/tutorial_ena/tutorial_ena_subprep/#obtaining-example-data) tab for instructions on how to do this.
-* **On Windows** - java -jar $DOWNLOADS\webin-cli-4.2.3.jar –help -* **On Mac** - java -jar $DOWNLOADS/webin-cli-4.2.3.jar -help +* **On Windows** - java -jar $DOWNLOADS\webin-cli-7.0.1.jar -help +* **On Mac** - java -jar $DOWNLOADS/webin-cli-7.0.1.jar -help You'll see that multiple options are available. You will use the following: @@ -216,13 +216,20 @@ You'll see that multiple options are available. You will use the following: * `-submit`: validates and submits the files defined in the manifest file. * `-test`: use Webin test service instead of the production service. -*The above definitions were taken from [ENA documentation](https://ena-docs.readthedocs.io/en/latest/submit/general-guide/webin-cli.html#command-line-options), where you can also find more information about the command line options available for Webin-CLI.* +*The above definitions were taken from [ENA documentation](https://ena-docs.readthedocs.io/en/latest/submit/general-guide/webin-cli.html#command-line-options), where you can also find more information about the command line options available for Webin-CLI.* + +You will first use the `-validate` and `-test` options to perform a validation against the test server. Use the `$WORKSHOP\data\raw` folder as the input directory, and `$WORKSHOP\01-route\runs` as the output directory. Use the commands below to do this. Remember to modify the commands according to your filepath, and to replace `Webin-XXX` and `myPassword` with your own account credentials: -You will first use the `-validate` and `-test` options to perform a validation of the test server. Use the `$WORKSHOP\data\raw` folder as the input directory, and `$WORKSHOP\01-route\runs` as output directory. Use the commands below to do this.
Remember to modify the commands according to your filepath, and to replace `Webin-XXX` and `myPassword` with your own account credentials: +* **Windows** -* **Windows** - `java -jar $DOWNLOADS\webin-cli-4.2.3.jar -context reads -userName Webin-XXXXX -password myPassword -manifest $WORKSHOP\01-route\runs\paired_fastq_manifest_sample1.txt -outputDir $WORKSHOP\01-route\runs -inputDir $WORKSHOP\data\raw -validate -test` + ``` + java -jar $DOWNLOADS\webin-cli-7.0.1.jar -context reads -userName Webin-XXXXX -password myPassword -manifest $WORKSHOP\01-route\runs\paired_fastq_manifest_sample1.txt -outputDir $WORKSHOP\01-route\runs -inputDir $WORKSHOP\data\raw -validate -test + ``` -* **Mac** - `java -jar $DOWNLOADS/webin-cli-4.2.3.jar -context reads -userName Webin-XXXXX -password myPassword -manifest $WORKSHOP/01-route/runs/paired_fastq_manifest_sample1.txt -outputDir $WORKSHOP/01-route/runs -inputDir $WORKSHOP/data/raw -validate -test` +* **Mac** + ``` + java -jar $DOWNLOADS/webin-cli-7.0.1.jar -context reads -userName Webin-XXXXX -password myPassword -manifest $WORKSHOP/01-route/runs/paired_fastq_manifest_sample1.txt -outputDir $WORKSHOP/01-route/runs -inputDir $WORKSHOP/data/raw -validate -test + ``` Once your command is entered correctly into your command prompt window, press 'Enter' on your keyboard. If the validation was successful, the last line of the resultant output will read: @@ -238,9 +245,14 @@ If the submission is successful the last output row will read: The assigned accession number will also be displayed. As with failed validation, a failed submission results in the provision of directions to a report file that explains the errors. -When doing a real submission to ENA, you use the same commands as those given above (modified as indicated) but with the `-test` option removed from the commands. In the event that you have multiple manifest files, you can generate a file script that would enable more automation. 
See for example: +When doing a real submission to ENA, you use the same commands as those given above (modified as indicated) but with the `-test` option removed from the commands. + + ### Submit sequence assemblies @@ -248,7 +260,7 @@ When doing a real submission to ENA, you use the same commands as those given ab In this section, we will use the materials in the *sequences* subfolder of the route-01 folder in the example data that you downloaded earlier. -To view the contents of this folder, open it using the file explorer or a command line interface. To do the latter, open a command prompt window (Terminal) on your computer and navigate to the *sequences* subfolder. Do this by typing `cd` followed by the filepath (depicted as $WORKSHOP/01-route/sequences/ below). Press enter and then type `ls` and press enter again. The content of the subfolder will then be printed in the window. +To view the contents of this folder, open it using the file explorer or a command line interface. To do the latter, open a command prompt window (Terminal) on your computer and navigate to the *sequences* subfolder. Do this by typing `cd` followed by the filepath (depicted as $WORKSHOP/01-route/sequences/ below). Press enter, then type `ls` and press enter again. The content of the subfolder will then be listed in the window. >cd $WORKSHOP/01-route/sequences/
ls @@ -256,7 +268,7 @@ Windows users should replace `/` in the above with `\`. In the folder, you'll see that there are three manifest files (named 'hCov-19_isolate_\*_manifest.txt', where \* is a number between 1 and 3); one for each of the 3 example sample data files. The manifest files contain information about the corresponding sequence, including the name of the FASTA file containing that sequence. -There are 6 files in the subfolder in the example data `$WORKSHOP/data/sequences/`; one FASTA sequence file and one chromosome list file for each of the three samples. The chromosome list file defines the list of 'chromosomes', in the case of SARS-CoV-2, we simply list the sequences as 'chromosome 1'. Please remember that the sequences must be given a unique name within the submission that is provided in the FASTA files. It is also essential that the sequence names are consistent between files, for example, the chromosome list file must refer to the chromosome sequences using the unique sequence names. +There are 6 files in the subfolder in the example data `$WORKSHOP/data/sequences/`; one FASTA sequence file and one chromosome list file for each of the three samples. The chromosome list file defines the list of 'chromosomes'; in the case of SARS-CoV-2, we simply list each sequence as 'chromosome 1'. Please remember that each sequence in the FASTA files must be given a name that is unique within the submission. It is essential that the sequence names in the FASTA files are also used in the corresponding chromosome list files. #### Prepare manifest files @@ -264,7 +276,6 @@ Manifest files, which provide information about data files, need to be prepared * **STUDY** - Study accession or unique name (alias). * **SAMPLE** - Sample accession or unique name (alias). -* **RUN_REF** - Comma separated list of Run accession number(s) corresponding to the raw read data it was built from (if available). 
* **ASSEMBLYNAME** - Unique assembly name.
* **ASSEMBLY_TYPE** - For SARS-CoV2 assembly_type is ‘COVID-19 outbreak’. * **COVERAGE** - The estimated depth of sequencing coverage. @@ -273,10 +284,11 @@ Manifest files, which provide information about data files, need to be prepared * **MINGAPLENGTH** - Minimum length of consecutive Ns to be considered a gap (optional). * **MOLECULETYPE** - For SARS-CoV2 molecule type will be either genomic RNA or viral cRNA, depending on your library preparation strategy. * **DESCRIPTION** - Free text description (optional). +* **RUN_REF** - Comma separated list of Run accession number(s) corresponding to the raw read data it was built from (if available). * **FASTA** - Single FASTA file (compressed format). * **CHROMOSOME_LIST** - Chromosome list file (compressed format). -*The above information was adapted from [documentation from ENA](https://buildmedia.readthedocs.org/media/pdf/ena-docs/latest/ena-docs.pdf)* +*The above information was adapted from [documentation from ENA](https://ena-docs.readthedocs.io/en/latest/submit/assembly/genome.html#manifest-files)* The manifest file can be created in a text editor (e.g. Notepad) of your choice. You can use one of the manifest files in the example data folder (that you downloaded earlier for this tutorial) as a reference when making your own manifest file for SARS-CoV-2 sequence data. Please note though, that the field values might differ for your own project. @@ -288,7 +300,7 @@ The field values for STUDY, SAMPLE, and RUN_REF (if raw reads have been submitte * Go back to the Dashboard menu (top left of the Webin Submissions Portal) and click on **Samples Report**. Find the accession number (starting with ERS) for the sample that the sequence file belongs to. Copy this and paste it into the manifest file as the SAMPLE field value. -* **Only needed if you have previously submitted the related raw reads** - go to the Dashboard menu (top left of the Webin Submissions Portal) and click on **Runs Report**. 
Locate the accession number(s) (starting with ERR) for the raw read(s) that the assembly sequence file is based on. Copy the number(s), and paste it/them into the manifest file as the RUN_REF field value. +* **Only needed if you have previously submitted the related raw reads** - go to the Dashboard menu (top left of the Webin Submissions Portal) and click on **Runs Report**. Locate the accession number(s) (starting with ERR) for the raw read(s) that the assembly sequence file is based on. Copy the number(s), and paste it/them into the manifest file as the RUN_REF field value (as a comma-separated list if there are several). #### Submit using Webin-CLI @@ -296,10 +308,10 @@ In a command prompt window, type the command `cd` followed by the filepath to th You then need to 'tell' Webin-CLI who you are and what you want to do. This can be done using 'options'. You can use the below commands to view the available options: -* **On Windows** - java -jar $DOWNLOADS\webin-cli-4.2.3.jar –help -* **On Mac** - java -jar $DOWNLOADS/webin-cli-4.2.3.jar -help +* **On Windows** - java -jar $DOWNLOADS\webin-cli-7.0.1.jar -help +* **On Mac** - java -jar $DOWNLOADS/webin-cli-7.0.1.jar -help -In the above commands, you need to provide the location of Webin-CLI. Here we assume that 'DOWNLOADS' has been created as a path variable, which contains the full path to the folder where the program was downloaded. See [this section in the preparation for submissions tab](/support_services/tutorial_ena/tutorial_ena_subprep/#obtaining-example-data) for information on how to do this. +In the above commands, you need to provide the location of Webin-CLI. Here we assume that 'DOWNLOADS' has been created as a path variable, which contains the full path to the folder where the program was downloaded. See the [Preparation for Submissions](/support_services/tutorial_ena/tutorial_ena_subprep/#obtaining-example-data) tab for information on how to do this.
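This example has three manifest files, one per isolate. With several manifests, you can run the same Webin-CLI validation command for each file from a small script. The sketch below is an illustration, not an ENA-provided tool. It uses only the Webin-CLI options already shown in this tutorial. The function name `validate_manifests`, the variables `WEBIN_JAR`, `WEBIN_USER`, `WEBIN_PASS` and `DRY_RUN`, and the `*_manifest.txt` filename pattern are all assumptions. By default it only prints the commands it would run, so you can check them first. Set `DRY_RUN=0` to execute them.

```shell
# Placeholder settings (assumptions): adjust the jar path and credentials.
WEBIN_JAR="${DOWNLOADS:-.}/webin-cli-7.0.1.jar"
WEBIN_USER="${WEBIN_USER:-Webin-XXXXX}"
WEBIN_PASS="${WEBIN_PASS:-myPassword}"

# validate_manifests CONTEXT MANIFEST_DIR INPUT_DIR
# Hypothetical helper: validate every *_manifest.txt in MANIFEST_DIR against
# the Webin test service (-validate -test). Prints the commands unless
# DRY_RUN=0, in which case it runs them.
validate_manifests() {
  local context="$1" manifest_dir="$2" input_dir="$3" manifest
  for manifest in "$manifest_dir"/*_manifest.txt; do
    [ -e "$manifest" ] || continue   # the pattern matched no files
    # Reuse the positional parameters as a portable argument list.
    set -- java -jar "$WEBIN_JAR" -context "$context" \
      -userName "$WEBIN_USER" -password "$WEBIN_PASS" \
      -manifest "$manifest" -outputDir "$manifest_dir" \
      -inputDir "$input_dir" -validate -test
    if [ "${DRY_RUN:-1}" = 1 ]; then
      printf '%s\n' "$*"
    else
      "$@"
    fi
  done
}
```

For the sequence assemblies in this example, the call would be `validate_manifests genome "$WORKSHOP/01-route/sequences" "$WORKSHOP/data/sequences"`. For a real submission, the `-validate -test` options would be changed as described in this tutorial.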
You will make use of following options: @@ -313,13 +325,19 @@ You will make use of following options: * `-submit`: validates and submits the files defined in the manifest file. * `-test`: use Webin test service instead of the production service. -*Adapted from [documents by ENA](https://nbisweden.github.io/module-repository) that also provide more detail regarding the command line options of Webin-CLI* +*Adapted from [ENA documentation](https://ena-docs.readthedocs.io/en/latest/submit/general-guide/webin-cli.html#command-line-options) that also provides more details regarding the command line options of Webin-CLI.* You will use the `-validate` and `-test` options to complete a validation to the test server. When doing this, use `$WORKSHOP\data\sequences` folder as the input directory, and `$WORKSHOP\01-route\sequences` as the output directory. Input the appropriate command from below into your command prompt window. Remember to replace both `Webin-XXX` and `myPassword`, with your own account credentials: -* **Windows** - `java -jar $DOWNLOADS\webin-cli-4.2.3.jar -context genome -userName Webin-XXXXX -password myPassword -manifest $WORKSHOP\01-route\sequences\hCoV-19_isolate_1_manifest.txt -outputDir $WORKSHOP\01-route\sequences -inputDir $WORKSHOP\data\sequences -validate -test` +* **Windows** + ``` + java -jar $DOWNLOADS\webin-cli-7.0.1.jar -context genome -userName Webin-XXXXX -password myPassword -manifest $WORKSHOP\01-route\sequences\hCoV-19_isolate_1_manifest.txt -outputDir $WORKSHOP\01-route\sequences -inputDir $WORKSHOP\data\sequences -validate -test + ``` -* **Mac** - `java -jar $DOWNLOADS/webin-cli-4.2.3.jar -context genome -userName Webin-XXXXX -password myPassword -manifest $WORKSHOP/01-route/sequences/hCoV-19_isolate_1_manifest.txt -outputDir $WORKSHOP/01-route/sequences -inputDir $WORKSHOP/data/raw -validate -test` +* **Mac** + ``` + java -jar $DOWNLOADS/webin-cli-7.0.1.jar -context genome -userName Webin-XXXXX -password myPassword -manifest 
$WORKSHOP/01-route/sequences/hCoV-19_isolate_1_manifest.txt -outputDir $WORKSHOP/01-route/sequences -inputDir $WORKSHOP/data/sequences -validate -test + ``` Press **Enter**. If the validation was successful, the last row of the output will read: @@ -333,12 +351,12 @@ When you move on to doing a real submission, use the same commands as those give #### What happens after submission -Once the submitted sequence assemblies have been processed, they will be distributed as **EMBL flat files**, see [this example from ENA](https://ena-docs.readthedocs.io/en/latest/submit/fileprep/flat-file-example.html) to understand the format. These files largely comprise of: +Once the submitted sequence assemblies have been processed, they will be distributed as **EMBL flat files** (see [this example from ENA](https://ena-docs.readthedocs.io/en/latest/submit/fileprep/flat-file-example.html) for the format). These files largely consist of: 1. Metadata, such as author names and addresses (contained in lines beginning with `R`, e.g. `RA`, `RL`, `RG`) 2. Sample information, located inside a `source` block 3. The sequence itself -Upon generation of the EMBL file, sequences also acquire a sequence accession number. This accession number will comprise of 2 upper case letters followed by 6 numbers, e.g. [LR991698](https://www.ebi.ac.uk/ena/browser/api/embl/LR991698.2?lineLimit=1000). +Upon generation of the EMBL file, sequences also acquire a sequence accession number. This accession number will consist of 2 upper case letters followed by 6 digits, e.g. [LR991698](https://www.ebi.ac.uk/ena/browser/api/embl/LR991698.2?lineLimit=1000). For each assembly submission, Webin will report a unique accession number (starting with ERZ). For most assemblies, this number is only used for internal processing and will not be visible in the browser. However, for SARS-CoV-2 assemblies, the ERZ records will also be available in the browser to provide a point of access for the submitted file(s).
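This tutorial uses several accession formats: PRJEB for studies, ERS for samples, ERR for runs, ERZ for assembly submissions, and two upper-case letters plus six digits for sequence records. You can use these patterns as a quick sanity check before pasting an accession into a manifest file or JSON object. The function below is a hypothetical helper based only on the patterns described here. ENA issues other accession formats too, so an `unknown` result does not necessarily mean an accession is invalid.

```shell
# matches_ere STRING REGEX: succeed if STRING matches the extended regex.
matches_ere() {
  printf '%s\n' "$1" | grep -Eq "$2"
}

# accession_kind ACCESSION
# Hypothetical helper: classify an accession using only the formats described
# in this tutorial. Prints study, sample, run, assembly, sequence or unknown.
accession_kind() {
  if   matches_ere "$1" '^PRJEB[0-9]+$'; then echo study
  elif matches_ere "$1" '^ERS[0-9]+$';   then echo sample
  elif matches_ere "$1" '^ERR[0-9]+$';   then echo run
  elif matches_ere "$1" '^ERZ[0-9]+$';   then echo assembly
  # Sequence records: 2 upper-case letters + 6 digits, optional .version
  elif matches_ere "$1" '^[A-Z]{2}[0-9]{6}(\.[0-9]+)?$'; then echo sequence
  else echo unknown
  fi
}
```

For example, `accession_kind LR991698.2` prints `sequence`, and `accession_kind ERS1234567` prints `sample`.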
diff --git a/content/english/support_services/tutorial_ena/tutorial_ena_subroute2.md b/content/english/support_services/tutorial_ena/tutorial_ena_subroute2.md index 8cac0f62e..2938fd2ac 100644 --- a/content/english/support_services/tutorial_ena/tutorial_ena_subroute2.md +++ b/content/english/support_services/tutorial_ena/tutorial_ena_subroute2.md @@ -10,9 +10,9 @@ menu: ## When to use this route -Route 2 is recommended for those with advanced knowledge of using command line, and for those doing bulk submissions. It is the most suitable route for high-throughput, frequent submissions and automated systems. For this route, the metadata is typically provided to ENA in XML documents. This is true in all cases except for sequence assemblies, where the metadata is instead provided in JSON format. In either case, the metadata are submitted using cURL. +Route 2 is recommended for those with advanced knowledge of using command line, and for those doing bulk submissions. It is the most suitable route for high-throughput, frequent submissions and automated systems. For this route, the metadata is typically provided to ENA in XML documents. This is true in all cases except for sequence assemblies, where the metadata is instead provided in JSON format. In either case, the metadata is submitted using cURL. -## Necessary preparation +## Necessary preparations Before commencing with the submission, the following things are required: @@ -20,13 +20,13 @@ Before commencing with the submission, the following things are required: 2. **cURL** - A command line facility with cURL installed. -**Note:** This submission route works well in a Mac or Linux environment. For Windows users, we recommend downloading a Ubuntu app or a virtual Linux machine for smooth submissions. +**Note:** This submission route works well in a Mac or Linux environment. For Windows users, we recommend downloading an Ubuntu app or a virtual Linux machine for smooth submissions. 
### Data required for route 2 -All of the data required to complete this submission can be downloaded together in a single zip file by clicking [here](/ENA_tutorial_data/example_data.zip). In this part of the tutorial, we will make use of the information in the '02-route' and 'data' subfolders. +All of the data required to complete this submission can be downloaded as a [single zip file](/ENA_tutorial_data/example_data.zip). In this part of the tutorial, we will make use of the information in the '02-route' and 'data' subfolders. -**Note:** If you use your own data, you can fill in the metadata template, for instructions on how to do this please see [this section in the preparation for submissions tab](/support_services/tutorial_ena/tutorial_ena_subprep/#preparing-the-metadata). +**Note:** If you use your own data, you can fill in the metadata template; for instructions on how to do this, please see the [Preparation for Submissions](/support_services/tutorial_ena/tutorial_ena_subprep/#preparing-the-metadata) tab. ### Create checksums (MD5) @@ -48,7 +48,7 @@ Sequence files and the MD5 checksum files must be uploaded before starting the s {{< tutorial_ena_subroute2_upload >}} -For other options to upload, more detailed instructions, or troubleshooting, please see [this section of ENA's tutorial](https://ena-docs.readthedocs.io/en/latest/submit/fileprep/upload.html). +For other upload options, more detailed instructions, or troubleshooting, please see [ENA documentation on Uploading Files](https://ena-docs.readthedocs.io/en/latest/submit/fileprep/upload.html). **Note**: Always keep a local copy of the uploaded files until the files have been successfully submitted and archived. The ENA dropbox is a temporary transit area and is not backed up.
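You can generate the MD5 checksum for each file from the command line before upload. The sketch below is one way to do it. It uses GNU `md5sum` on Linux and falls back to the BSD `md5` tool that ships with macOS. It writes each checksum to a `<file>.md5` file next to the data file, assuming that common naming convention suits your upload. The helper names are hypothetical. The same 32-character value is what the checksum fields in `runs.xml` expect.

```shell
# md5_hex FILE: print the 32-character hexadecimal MD5 digest of FILE.
# Reading from stdin keeps md5sum from escaping unusual filenames.
md5_hex() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum < "$1" | awk '{ print $1 }'   # GNU coreutils (Linux)
  else
    md5 -q "$1"                          # BSD md5 (macOS)
  fi
}

# write_md5_files FILE...: write FILE.md5 containing the digest of each FILE.
write_md5_files() {
  local f
  for f in "$@"; do
    md5_hex "$f" > "$f.md5"
  done
}
```

For example, run `write_md5_files *.fastq.gz` in the folder that holds the read files.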
@@ -56,29 +56,33 @@ For other options to upload, more detailed instructions, or troubleshooting, ple ### Register a study -In this section, we will use the materials in the *02-route/study/* subfolder of the example data that you downloaded earlier. Use the following commands in the command line to navigate to the folder and view its contents: +In this section, we will use the materials in the `02-route/study/` subfolder of the example data that you downloaded earlier. Use the following commands in the command line to navigate to the folder and view its contents: > cd $WORKSHOP/02-route/study/
ls -In the *study* subfolder, you will find two example XML files (the file type required for this kind of submission). For this type of submission, we will need 2 XML files: +This type of submission requires two XML files: * **project.xml** - This XML file contains the metadata for the study, including e.g. title of the study. -* **submission.xml** - This file declares one or more Webin submission service actions. The action can be ``, which is used to submit new objects. The user can decide the study release date, which is the date that the study will become public, along with all data associated with it. By default, the release date will be set as two months after the date of submission, but the submitter can select any date within 2 years of the present date. This can be done using the `` action, and may look e.g. ``. You can modify release date by replacing `` with ``. The submission.xml file in the $WORKSHOP folder that you set up, defines the `` action that will allow you to create a new study. +* **submission.xml** - This file declares one or more Webin submission service actions. In this example, the action is ``, which is used to submit new objects, followed by the `` action, which is used to set a release date. The release date is the date that the study will become public, along with all data associated with it. If no release date is set, the default is two months after the date of submission, but the submitter can select any date within 2 years of the present date. You can later modify the release date by replacing the `` action with ``. -Edit *project.xml* to create a project alias that is unique to you. Once complete, you are ready to send the project.xml and submission.xml files (using the `` action) to the test service using the following cURL command: +Edit *submission.xml* and set an appropriate release date, then edit *project.xml* to create a project alias that is unique to you.
Once complete, you are ready to send the project.xml and submission.xml files (using the `` action) to the test service using the following cURL command: -`curl -u username:password -F "SUBMISSION=@submission.xml" -F "PROJECT=@project.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"*` +``` +curl -u username:password -F "SUBMISSION=@submission.xml" -F "PROJECT=@project.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/" +``` **Note**: Replace `username` with your Webin username (starting with Webin-), and `password` with your Webin password. After you run the above command in your command prompt, you will receive a ‘receipt.xml’ file. This file will contain information about the contents and success of your submission, as well as your study accession. -The attribute 'success' in the receipt file will have a value of either true or false. If the value is false, it indicates that submission was unsuccessful. In this case, please check the rest of the receipt file for error messages. When you have resolved the errors indicated, you can try the submission again. If the value is true, then it indicates that the submission was successful. The receipt will contain the accession number of the study metdata object that you submitted. The accession number generated will be the one you will include in a publication. Please take note of your study accession number at this stage, we will use this later to submit other objects to this study. +The attribute 'success' in the receipt file will have a value of either true or false. If the value is false, the submission was unsuccessful; check the rest of the receipt file for error messages, resolve the errors indicated, and then try the submission again. If the value is true, the submission was successful. + +The receipt will contain the accession number of the study you submitted; this is the number to include in a publication.
Please take note of your study accession number at this stage; you will need it later to submit other objects to this study. ### Prepare information for samples -In this section, we will use the materials in the *02-route/samples/* subfolder of the example data that you downloaded earlier. This subfolder contains multiple XML files. Use the following commands in the command line prompt to navigate to the folder and view its contents: +In this section, we will use the materials in the `02-route/samples/` subfolder of the example data that you downloaded earlier. This subfolder contains multiple XML files. Use the following commands in the command line prompt to navigate to the folder and view its contents: >cd $WORKSHOP/02-route/samples/
ls @@ -105,7 +109,9 @@ Sample aliases are defined within the `` tag, e.g. `` tag for each submitted sample (when using the sample data for this tutorial, you should see 3). Please take note of each sample alias and accession, as you will need these later when submitting sequence files. @@ -117,7 +123,9 @@ Sometimes, previously uploaded metadata needs to be updated. You can do this by * This time, we will use the `submission_modify.xml` in the submission. This file instructs the service to update an existing sample. The update uses the alias to detect existing samples, so it is important not to change the alias. -`curl -u username:password -F "SUBMISSION=@submission_modify.xml" -F "SAMPLE=@samples.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"` +``` +curl -u username:password -F "SUBMISSION=@submission_modify.xml" -F "SAMPLE=@samples.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/" +``` Check the receipt file to see whether your update was successful. Note that the receipt file will also report which samples have not been updated. @@ -125,11 +133,11 @@ Check the receipt file to see whether your update was successful. Note that the As in all previous steps, this type of submission is performed using XML files. In the case of raw reads, we must submit two types of object: experiments and runs. Experiments hold information about library preparation and sequencing protocols, and also provide a link to the appropriate study and samples. Runs simply link experiments and data files. -Submissions are defined using different metadata objects. To know more about metadata objects, please [read this section](/support_services/tutorial_ena/tutorial_ena_terminology/) of this tutorial. +Submissions are defined using different metadata objects. To learn more about metadata objects, please read [the terminology section](/support_services/tutorial_ena/tutorial_ena_terminology/) of this tutorial.
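The receipt returned for each of these submissions is plain XML, so standard command-line tools can extract its success flag and the alias/accession pairs. This is useful when noting sample aliases and accessions, or when checking an update. The sketch below makes three assumptions. First, the receipt has been saved to a file, for example by redirecting cURL's output with `> receipt.xml`. Second, each submitted object appears as a `PROJECT`, `SAMPLE`, `EXPERIMENT` or `RUN` element carrying `accession` and `alias` attributes on one line. Third, the function names, which are hypothetical. For anything beyond a quick check, a proper XML parser is more robust than `grep` and `sed`.

```shell
# receipt_ok FILE: exit status 0 if the receipt reports success="true".
receipt_ok() {
  grep -q 'success="true"' "$1"
}

# receipt_accessions FILE: print "TYPE<TAB>alias<TAB>accession" for each
# PROJECT, SAMPLE, EXPERIMENT or RUN element found in the receipt.
receipt_accessions() {
  grep -oE '<(PROJECT|SAMPLE|EXPERIMENT|RUN) [^>]*>' "$1" |
  while IFS= read -r element; do
    kind=$(printf '%s\n' "$element" | sed -E 's/^<([A-Z]+).*/\1/')
    acc=$(printf '%s\n' "$element" | sed -nE 's/.* accession="([^"]*)".*/\1/p')
    name=$(printf '%s\n' "$element" | sed -nE 's/.* alias="([^"]*)".*/\1/p')
    printf '%s\t%s\t%s\n' "$kind" "$name" "$acc"
  done
}
```

For example, `receipt_ok receipt.xml && receipt_accessions receipt.xml` lists the alias and accession of every object in a successful submission.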
Please go to the example data that you downloaded earlier, and locate example XMLs for both experiments and runs in the `02-route/runs` directory. Make the following edits: -* In **experiments.xml**, replace all occurrences of PRJEB#### with your study accession number, and all occurrences of SAME###### with the equivalent sample accessions. +* In **experiments.xml**, replace all occurrences of PRJEB#### with your study accession number, and all occurrences of ERS###### with the equivalent sample accessions. * In **runs.xml**, replace the checksum field in each `` tag with those that you computed earlier. These will be used to check for file corruption during upload. @@ -142,17 +150,19 @@ Note that runs reference experiments by their aliases. For example: We will send these XML files to the test service using cURL with the following command: -`curl -u username:password -F "SUBMISSION=@submission.xml" -F "EXPERIMENT=@experiments.xml" -F "RUN=@runs.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/"` +``` +curl -u username:password -F "SUBMISSION=@submission.xml" -F "EXPERIMENT=@experiments.xml" -F "RUN=@runs.xml" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/" +``` -For additional information, please see [this section of the documentation from ENA](https://ena-docs.readthedocs.io/en/latest/submit/reads/programmatic.html). +For additional information, please see ENA documentation on [Submit Raw Reads Programmatically](https://ena-docs.readthedocs.io/en/latest/submit/reads/programmatic.html). ### Validate and submit sequence assemblies A new JSON-based REST service was introduced by ENA during 2021 specifically for the submission of SARS-CoV-2 sequences. Sequences for submissions made using this service are not held in FASTA files. Rather, they are included directly in the JSON itself, thus greatly simplifying the process of submission. This is only made possible due to the small size and relatively low complexity of the genome. 
For more information on this system, including useful code snippets, please see [ENA documentation on this topic](https://ena-covid19-docs.readthedocs.io/en/latest/help_and_guides/webin-cli-rest.html). -In this section, we will use the materials in the 02-route/sequences/ subfolder of the example data that you downloaded earlier. Use the following commands in the command line prompt to navigate to the folder and view its contents: +In this section, we will use the materials in the `02-route/sequences/` subfolder of the example data that you downloaded earlier. Use the following commands in the command line prompt to navigate to the folder and view its contents: -```bash +``` cd $WORKSHOP/02-route/sequences/ ls cat hCoV-19_isolate_1.json @@ -164,12 +174,12 @@ This service also provides both validation and submission services, as with the * Validation: [https://wwwdev.ebi.ac.uk/ena/submit/webin-cli/api/v1/genome/covid-19/validate](https://wwwdev.ebi.ac.uk/ena/submit/webin-cli/api/v1/genome/covid-19/validate). -* Submission: [https://wwwdev.ebi.ac.uk/ena/submit/webin-cli/api/v1/genome/covid-19](https://wwwdev.ebi.ac.uk/ena/submit/webin-cli/api/v1/genome/covid-19/validate). +* Submission: [https://wwwdev.ebi.ac.uk/ena/submit/webin-cli/api/v1/genome/covid-19](https://wwwdev.ebi.ac.uk/ena/submit/webin-cli/api/v1/genome/covid-19/). -Let's validate the contents of `hCoV-19_isolate_1.json`. Remember to replace `-u user:pass` with your webin credentials, and to substitute in your own accessions into the JSON object: +Let's validate the contents of `hCoV-19_isolate_1.json`. 
Remember to replace `-u user:password` with your webin credentials, and to substitute in your own accessions (study, sample and runRef) into the JSON object: -```bash -curl -X 'POST' -u user:pass +``` +curl -X 'POST' -u user:password 'https://wwwdev.ebi.ac.uk/ena/submit/webin-cli/api/v1/genome/covid-19/validate' -H 'accept: application/json' -H 'Content-Type: application/json' diff --git a/content/english/support_services/tutorial_ena/tutorial_ena_terminology.md b/content/english/support_services/tutorial_ena/tutorial_ena_terminology.md index 73adea9d6..d4e40b189 100644 --- a/content/english/support_services/tutorial_ena/tutorial_ena_terminology.md +++ b/content/english/support_services/tutorial_ena/tutorial_ena_terminology.md @@ -14,17 +14,17 @@ menu: * **Analysed sequence data** - Sequence data that has been processed in some way after being obtained from a sequencing instrument. Such data has been normalised, and perhaps also subject to other processing (e.g. removal of outliers, calculation of expression measurements, and statistical analyses). -* **Metadata** - Description of the data that gives, at a minimum, sufficient information to reproduce the data collection method (e.g. description of how the source material was obtained and details about the sequencing process, such as library preparation and the instruments used). In ENA, all metadata related to a research project is represented by different types of metadata objects. See [below](/support_services/tutorial_ena/tutorial_ena_terminology/#definition-of-metadata-objects) for an explanation of different types of object. +* **Metadata** - Description of the data that gives, at a minimum, sufficient information to reproduce the data collection method (e.g. description of how the source material was obtained and details about the sequencing process, such as library preparation and the instruments used). In ENA, all metadata related to a research project is represented by different types of metadata objects. 
See [below](/support_services/tutorial_ena/tutorial_ena_terminology/#definition-of-metadata-objects) for an explanation of different types of objects. ## Definition of metadata objects -ENA recognises multiple 'levels'/'types' of metadata related to sequencing projects. These different 'levels'/'types' of metadata are represented by different metadata objects. For example, general project information (such as the project title) is defined in the **study (project)** object, sequenced source material is included in the **samples** object, and details about the sequencing experiment are captured in the **experiment** object. +ENA recognises multiple 'levels'/'types' of metadata related to sequencing projects. These different 'levels'/'types' of metadata are represented by different metadata objects. For example, general project information (such as the project title) is defined in the **study (project)** object, sequenced source material is included in the **sample** object, and details about the sequencing experiment are captured in the **experiment** object. Below is more information on what each type of metadata object comprises: * **Study** - A study (project) object is used to group together all data submitted to ENA about a given study and to control its release date. A study accession number is typically used to cite data submitted to ENA. Note that all data and metadata associated with a study are made public together with the study when it is released. -* **Sample** - A sample object contains information about the sequenced source material. Checklists are in place to define which fields should be filled when annotating samples. Note that a taxonomic classification system is used to refer to biological organisms; the accepted organism name and classification hierarchy are used, see [here](https://www.gbif.org/dataset/6b6b2923-0a10-4708-b170-5b7c611aceef) for further details. 
+* **Sample** - A sample object contains information about the sequenced source material. Checklists are in place to define which fields should be filled when annotating samples. Note that a taxonomic classification system is used to refer to biological organisms; the accepted organism name and classification hierarchy are used, see [ENA taxonomy](https://www.gbif.org/dataset/6b6b2923-0a10-4708-b170-5b7c611aceef) for further details. * **Experiment** - An experiment object contains all the details about the metholodology used for sequencing, including library and instrument details. @@ -35,7 +35,7 @@ Below is more information on what each type of metadata object comprises: The different metadata objects relate to different stages of the sequencing process. A summary of which metadata objects relate to which stages is shown in the below figure.
- + ENA metadata objects
For more information on how metadata objects are used and how they are related to one another, see this [video](https://youtu.be/M9srsSieEB4) produced by ENA. @@ -47,16 +47,17 @@ For more information on how metadata objects are used and how they are related t A text-based file format used for nucleotide, peptide, or protein sequences. The sequences themselves are preceded by the sequence name, which comprises a single line. The sequence name is preceded by a single right-facing arrow (‘>’) that distinguishes it from the sequence itself. A sequence in FASTA format may look e.g.: - -> \>ENA|MT192765|MT192765.1 Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/PC00101P/2020, complete genome. +``` +>ENA|MT192765|MT192765.1 Severe acute respiratory syndrome coronavirus 2 isolate SARS-CoV-2/human/USA/PC00101P/2020, complete genome. GTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCT... AAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCT... +``` -For more information on this file format, see [here](https://learn.gencore.bio.nyu.edu/ngs-file-formats/fastaa-format/). +For more information on this file format, see [Wikipedia on FASTA format](https://en.wikipedia.org/wiki/FASTA_format). ### FASTQ -A FASTQ file is a text-based file format that contains sequence data (including a sequence name) in a similar way to FASTA files (described above). The main difference between the two is that FASTQ files include information about data quality, whilst FASTA files do not. +A FASTQ file is a text-based file format that contains sequence data (including a sequence name) in a similar way as FASTA files (described above). The main difference between the two is that FASTQ files include information about data quality, whilst FASTA files do not. FASTQ is one of the most widely used formats in sequence analysis and is often the format outputted by sequencing machines. Most analysis tools prefer FASTQ files to FASTA files because they contain more information. 
@@ -70,4 +71,4 @@ The syntax used in FASTQ files is slightly different to that in FASTA files. Eac * **Quality Scores** - The fourth line contains scores related to data quality. -For more information on this file format, see [here](https://learn.gencore.bio.nyu.edu/ngs-file-formats/fastq-format/). +For more information on this file format, see [Wikipedia on FASTQ format](https://en.wikipedia.org/wiki/FASTQ_format). diff --git a/layouts/ena_tutorial/single.html b/layouts/ena_tutorial/single.html index 316cf5cca..e96079704 100644 --- a/layouts/ena_tutorial/single.html +++ b/layouts/ena_tutorial/single.html @@ -13,7 +13,7 @@
diff --git a/layouts/index.html b/layouts/index.html index f7af08dd3..1f3c97132 100644 --- a/layouts/index.html +++ b/layouts/index.html @@ -171,8 +171,8 @@
Biobanks
{{ if ne $.Site.Language.LanguageName "Svenska" }}
-

Topics

-

Click on one of the pandemic preparedness topics below to see only the content that is related to that topic.

+

Pandemic preparedness topics

+

Click on one of the topics below to see only the content that is related to that topic.

{{ range .Site.Menus.topics_menu }} {{ .Name }} {{ end }} diff --git a/layouts/partials/navbar.html b/layouts/partials/navbar.html index 226dd9541..8b9b1d577 100644 --- a/layouts/partials/navbar.html +++ b/layouts/partials/navbar.html @@ -150,15 +150,22 @@ Research & Funding
- +