petermr edited this page Aug 27, 2020 · 6 revisions

Presentation to COAR 2020-09-10

Invitation

COAR> Thursday, September 10th for a 2-hour forum highlighting the role of open science and open repositories in the context of COVID-19.

The forum will showcase several collaborative initiatives from around the world aiming to improve the discovery of and access to COVID-19 research outputs. Participants will learn about the critical need for open science in the time of the pandemic, new workflows and practices that can be adopted in your own local context, and identify possible areas of international collaboration.

OpenVirus is a joint UK-India, open repository-based project to extract multidisciplinary semantic knowledge about viral epidemics (including COVID-19). By analysing tens of thousands of articles we can find clues to predict/prevent/mitigate viral epidemics.

OpenAIRE COVID-19 Gateway is a portal that provides access to publications, research data, projects and software that may be relevant to the Corona Virus Disease (COVID-19). The OpenAIRE COVID-19 Gateway aggregates COVID-19 related records, links them and provides a single access point for discovery and navigation.

Canadian COVID-19 Open Repository Initiative is a collaborative project led by the Canadian Association of Research Libraries to identify as many Canadian research outputs related to COVID-19 as possible and make them available through major discovery systems including the OpenAIRE COVID-19 Gateway.

And more…

The forum will take place from 14h-16h UTC (see https://time.is/UTC) on Thursday, September 10, 2020.

Dates, constraints

Kathleen Shearer

  • Yes, indeed, this is a virtual event and the time shouldn’t be too painful for your Indian colleagues.
  • I will also be inviting people from Africa, Europe, North America, and Latin America.

comments

  • The aim is to see if we can continue the momentum for open access brought about by the pandemic and discuss how to keep this going afterwards.

KS> Can I suggest that you spend more time on the Introduction part of the presentation? Why are you doing this? What problems are you solving? Who is using the content?

KS>I’d like to use this forum as a way to make the case about the value of repositories for open science (in contrast to the commercial publishers who will close content in October), and how we can work together rather than in competition.

KS>You will have about 20 minutes in total, followed by questions for about 5 or 10 minutes.

KS>Also I’d like to put some information about the presentation on the COAR website. Can I re-use your description below?

correspondence

PMR

See https://www.slideshare.net/petermurrayrust/openvirus-tools-for-discovering-literature-on-viruses for a quick overview I did last week for Peter Kraker / Open Knowledge Maps.

Andy Jackson (BL) has liberated 100,000 theses and Clyde Davies has indexed millions of abstracts from DOAJ. We're building a universal scraper and want to scrape journals and publishers. https://blogs.bl.uk/digital-scholarship/2020/05/bringing-metadata-full-text-together.html

Can we extend this to DOAR/COAR? We have 5000 repos and we should be able to search them at a single click. It would need repo managers to think in terms of aggressively opening their repos to citizens. The wins will come when policy makers and health care professionals are able to search for "face masks" or "social distancing" and find useful resources in the literature.

It should be possible for each repo manager to make their search and landing pages open to the world (frictionless) so that there's a universal approach. For example HAL does this, but CORE does not as it requires personal details (no idea why).

=========

Gita and I have hosted an ongoing, really exciting research project on viral epidemics based on Open Access and repositories. We've been host to 14 wonderful Indian students from INYAS (Young Academy) and Nat Inst for Plant Genome Research. This was a "normal internship" until COVID hit and we turned it into a virtual "project" with a major goal of showing that Open Access was USEFUL for tackling the pandemic. The students are sensational and they should be the stars of the show. Trust me. Possibly two of them could present, and Gita and/or I can give context.

They have taken about 7000 Open Access papers from bioscience repositories, and we've built tools that make them semantic - bringing these dusty articles to semi-intelligent life. The ontology is based on dictionaries from Wikidata, which provides a global approach. Some of the students are making this multilingual - Hindi, Tamil, Urdu.

This is a real case where those in the Global South are showing how valuable the Open Access literature is.

I am assuming this is fully virtual. The students are very practised at presenting that way.

  • Gita should introduce the presentation along the lines of what she has written.
  • We ask one of the interns to present what they have done. They are very disciplined and enthusiastic in their presentations. Gita recorded a session yesterday where they gave short presentations and we hope we can show you this in a day or two.
  • PMR finishes up with a slide or two of key technical and academic points that COAR might find useful.

====

BTW We appreciate this is about Repositories. We are using this to mean one or a very small number of places where we can find useful and usable content. Open Access per se does not achieve this as it's spread over many hundreds of sites with incompatible formats. The places we have found useful are:

  • (Europe)PMC. This is perhaps technically not a repo but it idealises the concept.
  • Wikimedia

====

[A sort-of abstract, that Kathleen will be using]

We'd like to present "openVirus", an OpenRepository-based project to extract multidisciplinary semantic knowledge about viral epidemics - not just COVID-19. The hope is that by analysing tens of thousands of articles we can find clues to predict/prevent/mitigate viral epidemics. Over the last 10 weeks a team of 14 interns, along with Gita and PMR and some technical volunteers, have built a platform to automate this process. We've based this on 8 facets: Country, Diseases, Drugs, Funders, Viruses, Test and Trace, Zoonosis and Non-Pharmaceutical Interventions. The project was managed by 8 teams of 2, with an experienced intern mentoring a newer one. Each team managed facet X and:

  • collected 1000 Open articles on "viral epidemics and X"
  • built a dictionary (ontology) for X based on Wikidata, usually with thousands of terms
  • validated and documented the Open toolkit we use

Each team has validated their corpus by manual binary classification (False/True papers) and is extending this with machine learning to extract patterns from the papers. Several teams have extended their dictionaries to Hindi, Tamil, Urdu and other languages. This is an important step for empowering the Global South.
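The per-facet workflow above (a dictionary of terms, applied to a corpus, followed by a True/False relevance label) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the openVirus toolkit itself: the dictionary entries, the placeholder IDs, the function names and the simple hit-count threshold are all hypothetical, and the threshold merely stands in for the manual classification and machine learning described above.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for a "viruses" facet. Keys must be lowercase.
# The IDs are placeholders, not real Wikidata identifiers.
VIRUS_DICTIONARY = {
    "coronavirus": "Q_PLACEHOLDER_1",
    "influenza virus": "Q_PLACEHOLDER_2",
    "zika virus": "Q_PLACEHOLDER_3",
}


def compile_dictionary(dictionary):
    """Build one case-insensitive regex matching any dictionary term as a whole word."""
    # Longest terms first, so multi-word terms win over shorter overlapping ones.
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def annotate(text, dictionary):
    """Count how often each dictionary entry (keyed by its ID) occurs in the text."""
    regex = compile_dictionary(dictionary)
    counts = Counter()
    for match in regex.finditer(text):
        counts[dictionary[match.group(0).lower()]] += 1
    return counts


def classify(text, dictionary, min_hits=1):
    """Crude True/False relevance label: does the text mention enough terms?"""
    return sum(annotate(text, dictionary).values()) >= min_hits
```

For example, `annotate("Coronavirus and Zika virus; the coronavirus spread.", VIRUS_DICTIONARY)` counts two hits for the first entry and one for the third, while `classify("A paper about plant genomes", VIRUS_DICTIONARY)` returns `False`.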

We have also Zoom-met twice weekly and are very used to the format.

We believe this is a novel use of Open Repositories and opens up new ways of working and creating science. We'd like to provide a live presentation of what we have done.

Suggested schedule

  • Introduction (Gita) on the political and scientific background (10%)
  • Technical presentation (Ambreen) (70%)
  • Suggestions for Open Repositories (PMR) (10%)
  • Questions (10%)

This week we livestreamed the Zoom meeting - 1h:22 - without problems. One feature is that several interns work from mobiles and have adjusted the resources to make this possible. We will have a backup strategy if any speaker gets cut off.

My suggestions will highlight the importance of bringing repositories to world citizens outside academia. The critical aspects are a single point for search, no friction (e.g. no logins) and also full-text indexing.

====

Happy to think about this, but also who are the "audience"? If the event is widely livestreamed or captured then it can be addressed to a wide audience. I don't think I've been to a COAR event before - I've been to a few OAI meetings. There the main audience was repo managers, library staff, LIS, and maybe some funders. Probably very few scientists or medics. Probably very few government/policy people. Some educators. Very few undergraduates.

Zoom/online changes that. This work is being done by undergraduates/Masters so that's a potential audience.

Why are we doing this?

  • To create new knowledge that helps in the fight against epi/pandemics. This is a very long shot but we might just luck out.
  • to show the world that Open knowledge resources - when used systematically and automatically - are useful for science. A certain amount of this is done in bioscience but it's very rare. It should be the norm. It also shows that the current method of publication is totally - totally - unsuited for the modern world. Unless the papers are annotated, the data are semantic and published, and the figures are vector diagrams, there is very little the world can do. In simple terms only 2-3 repos are suitable for modern science.
  • to show that undergraduates can make use of this science
  • to show that research can be done during the pandemic.
  • to promote Wikidata as the primary tool for scientific metadata and bibliography.
  • as a learning/teaching environment for Young Scientists in the Global South and elsewhere.
  • to create a self-sustaining Open Source project

What problems are we solving?

  • We're creating a completely new approach to computable metadata (bibliographic and scientific). Computable dictionaries can be created in minutes rather than months.
  • the problem of automated discovery and indexing. We've semantically/scientifically indexed 100,000 theses from the British Library. We've indexed 4 million abstracts from DOAJ. We've built automated pipelines for this.
  • we're making search and indexing multilingual.
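As a rough illustration of how a computable, multilingual dictionary might be pulled from Wikidata, the sketch below builds a SPARQL query and converts a result in the standard SPARQL JSON format into a term table. It is a minimal sketch under assumptions, not the project's actual pipeline: the function names, the `Q_ROOT` and `Q_PLACEHOLDER` identifiers and the sample response are hypothetical, and the query relies on the `wd:` and `wdt:` prefixes that the Wikidata Query Service predefines. No network call is made here.

```python
def build_dictionary_query(root_qid, languages=("en", "hi", "ta", "ur"), limit=1000):
    """Return a SPARQL query for items that are instances of root_qid
    (or of any of its subclasses), with labels in the requested languages."""
    lang_list = ", ".join(f'"{lang}"' for lang in languages)
    # P31 = "instance of", P279 = "subclass of" in Wikidata.
    return f"""
SELECT ?item ?label WHERE {{
  ?item wdt:P31/wdt:P279* wd:{root_qid} .
  ?item rdfs:label ?label .
  FILTER(LANG(?label) IN ({lang_list}))
}}
LIMIT {limit}
""".strip()


def parse_bindings(sparql_json):
    """Turn a SPARQL JSON result into {item_uri: {language: label}}."""
    dictionary = {}
    for row in sparql_json["results"]["bindings"]:
        uri = row["item"]["value"]
        lang = row["label"].get("xml:lang", "und")  # "und" = language not given
        dictionary.setdefault(uri, {})[lang] = row["label"]["value"]
    return dictionary


# Hypothetical response fragment, shaped like the SPARQL 1.1 JSON results format.
SAMPLE_RESPONSE = {
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_PLACEHOLDER"},
                "label": {"type": "literal", "xml:lang": "en", "value": "label-en"},
            },
            {
                "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_PLACEHOLDER"},
                "label": {"type": "literal", "xml:lang": "hi", "value": "label-hi"},
            },
        ]
    }
}
```

Calling `build_dictionary_query("Q_ROOT")` with a real root item in place of the placeholder gives a query that could be pasted into the Wikidata Query Service, and `parse_bindings(SAMPLE_RESPONSE)` groups the labels for each item by language.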

Who is using this?

We are. We are only 8-10 weeks old and are having to create our own material, so not yet anyone else. It's partly a question of "marketing".

KS>I’d like to use this forum as a way to make the case about the value of repositories for open science (in contrast to the commercial publishers who will close content in October), and how we can work together rather than in competition.

PMR>Who do you want to make the case to? This is one of our motivations for this presentation. Is there someone/organization who will be listening? I've spent years trying to promote these ideas and the general response is "Very interesting but we are engaged in buying information systems from Clarivate and Elsevier and we can't do anything ourselves because it will upset publishers". If you think there is a clear target for people to work together then maybe we'll try to address that. But too many repositories are aimed at authors and librarians, not automated reading. The only repos that we can reasonably work with are:

  • Europe/PMC
  • Wikimedia
  • HAL
  • Redalyc (LatAm; we are in close touch with Arianna Becerril-Garcia and will be implementing this)
  • arXiv, and various other preprint servers (non-trivial)

PMR>Almost all other repos - including CORE - have friction - login gateways, reuse permissions etc. But maybe I have missed some.

Zenodo and Figshare are fine but less structured and might come later.

There are no technical problems of scale. Wikimedia run the world's 12th search site and could certainly manage the whole of scientific publications. It's a tragedy that we pay zillions to companies instead.

So if you/we want to make a case for citizen-usable repositories the most important thing is a single search site - like Wikipedia - with HTML content and full-text indexing. And the synergy would transform scientific "publication".

Gita

Interestingly, we are currently collaborating with the Indian State of Kerala’s Higher Education Council and Directorate of Research Monitoring and Advisory council, by participating in a one-week online workshop on Research and Publication Ethics during 17-22 August 2020, where I shall present our group’s experience on the “Importance of Open Science during Covid-19 Pandemic”.

Initial thoughts

PMR

We have been invited to present openVirus to the Confederation of Open Access Repositories (COAR) on Sept 10. It will be presented by @Ambreen H with a short intro from Gita and a very short postscript from @Peter Murray-Rust. You are all authors.

  • We are particularly looking for engaging slides, videos, demonstrations (i.e. not just text)

  • stories that show the value of repositories (EPMC and Wikidata are repositories)

  • any interesting "results" - hard, because we are only just starting to get results.

  • mini-hypotheses. "our team wanted to study the geographical location of funders"

  • anything that shows the value of repositories.

audience

I have asked for the likely demographic of the audience. My guess is that on the day it will be

  • university librarians
  • people running repositories
  • scholarly funders (university infrastructure)
  • funders

However the slides will be available to everyone in the world. We want them to say

  • WOW!!
  • what a great group of people
  • what good progress in 2 months
  • can we use their ideas and code, and collaborate?

It's possible we can work out a semi-interactive program, or make a short video, instead of just slides.

visual material

In this presentation it's a good idea to have some instantly visual material relating to your topic. So, for example, if your miniproject is on drugs, have a picture of some pills in the background. For zoonosis, show pictures of some of the common animal hosts, etc. The brain takes in pictures many times faster than words. It may also be useful to browse some (not all 1000!) of the key articles in your minicorpus and select a few that have a clear message. For example http://europepmc.org/article/PMC/PMC7419072 has pictures of animal hosts...

Any figures you use MUST be from Open sources and must be acknowledged. Wikipedia is one of the safest places to use.

Kathleen 2020-08-20

Dear COAR members and partners,

Join us on Thursday, September 10th for a 2-hour forum highlighting the role of open science and open repositories in the context of COVID-19.


PMR last slide

  • value of young scientists, international and multicultural
  • one-click fulltext search and download
  • semantic federation (JATS and Wikidata)
  • fully open non-proprietary infrastructure