Records and Reports

ShweataNHegde edited this page Dec 3, 2020 · 74 revisions

Meeting Record I

Date

1st June 2020 Monday

Participants

P+K+PMR

Key points

  • Discussion about the agenda behind openVirus: to build a system so that anybody can understand the science behind the current pandemic
  • To write and document everything for InternX
  • Installation of ferret to retrieve papers from medrxiv
  • To document a wiki page on 'how to get started with ferret'
  • To create a 'hello mask' program for ferret

Meeting Record II

Date

4th June 2020 Thursday

Participants

PMR+G+P+K+Pruthiv

Key Points

  • Introduction to the "standup" routine followed by P+K
  • Introduction given to Pruthivrajan and brief overview of his role in project
  • Run getpapers to collect 200 papers on viral epidemics
  • Discussion on installation and running of ami
  • Search with ami using the country, disease and funders dictionaries
  • To read TIGR2ESS on using dictionaries
  • To read about Wikidata
  • The current team goal is to run ferret against medrxiv and retrieve papers from it
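
The getpapers run noted above (200 papers on viral epidemics) could be scripted roughly as follows. This is a minimal sketch: the query string and the output directory name `viralepidemics200` are illustrative assumptions, and the code only builds the argument list rather than running the tool. The long options shown are the ones getpapers documents; check `getpapers --help` for your installed version.

```python
import shlex


def build_getpapers_command(query, outdir, limit, xml=True):
    """Build the argument list for a getpapers run (does not execute it)."""
    args = ["getpapers", "--query", query, "--outdir", outdir, "--limit", str(limit)]
    if xml:
        args.append("--xml")  # ask for full-text XML, which ami works on
    return args


# Hypothetical query and directory name, chosen only for illustration.
cmd = build_getpapers_command("viral epidemics", "viralepidemics200", 200)
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run(cmd)` once getpapers is installed; the output directory then serves as the CProject for later ami steps.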

Agenda

Meeting Record III

Date

8th June 2020 Monday

Participants

P+K+PMR+PR+G+Ambreen

Agenda

  1. Welcome to new members
  2. Allocation of regular responsibilities
  3. Standup (for those present last meeting): what did you do in the last 4 days that helped the team? what are you going to do in the next 3 days that will help the team? are you blocked on anything?
  4. Record of last meeting
  5. Priorities:
  • installing and running ami
  • management of dictionaries (one per member)
  • documenting
  • potential miniprojects
  6. Testing medrxiv on getpapers and comparing with ami download

Key Points

  • General instructions given to all: Gitanjali ma'am is the Personal and Academic Manager for the interns, and PMR is the Project Manager of openVirus.
  • Welcome new intern Ambreen
  • Standup by P+K+Pruthiv and introduction by Ambreen
  • We are going to have a 'Hello Mask' program for getpapers and ami, run on 100 papers
  • Work assigned to each intern
    • Kareena- Meeting record and documentation
    • Ambreen- beta testing
    • Rajan- Technical support for organising biweekly meetings
    • Priya- ????
  • All interns should document their 100 papers on viral epidemics on the wiki in a paragraph
  • Current goal is to install and run ami, followed by allocation of different project dictionaries to all. One per member.
  • Aim to start "miniprojects", in which different viral epidemics will be given to each of us for documentation, a process called "SCOPING"
  • Retrieve papers from medrxiv using getpapers by creating a query on PMC so that only medrxiv papers are downloaded.
  • Everyone must run git, which is required for running ami. Install ami using git.
  • Clone the new ami repository
  • Dictionaries assigned: Create a Wiki of your dictionary.
  • countries (Ambreen)
  • diseases (Priya)
  • viruses (Kareena)
  • drugs (Rajan)
  • funders (Vaishali)

Meeting Record IV

Date

11th June 2020 Thursday

Participants

PMR+P+K+Pruthiv+Ambreen

Agenda

  1. Allocation of regular responsibilities
  2. Standup (for those present last meeting): what did you do in the last 4 days that helped the team? what are you going to do in the next 3 days that will help the team? are you blocked on anything?
  3. Record of last meeting
  4. Priorities:
  • everyone able to install and run getpapers
  • installing and running amisearch with builtin dictionaries
  • running amisearch with local dictionaries
  • creating dictionaries with amidict from lists, wikipedia categories

Key Points

  • Welcome new intern- Vaishali
  • Work assigned to Priya- Support for new interns
  • Discussion on running getpapers to retrieve papers from medrxiv
  • Running maven to build ami
  • Issues faced by interns while running ami after building it successfully; a documentation gap was noted: "How to run ami on output of getpapers?" Report your issues well so that you can receive accurate guidance. How to report problems:
  • what were you trying to do?
  • what did you do?
  • what happened?
  • your assessment of the problem
  • Document your installation steps followed by usage and also issues, if any on the Wiki
  • CProject directory and its relation with getpapers FAQ taken up by Ambreen
  • Explanation of a dictionary given by PMR, using "Ocimum sanctum" in reference to TIGR2ESS. Introduction to the 'xml' markup language and its components such as 'elements' and 'attributes', and the Q number for wikidata
  • Next task
  1. Go to the TIGR2ESS tutorials and read about dictionaries. "What do dictionaries do?"
  2. Try and bring the tutorials across to viral epidemics in openVirus. Search briefly about your particular dictionary. Write your own idea about your dictionary. "What does your dictionary do?" "How do you use it?"
  3. Install and run ami and search about your dictionary through it.
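
To make the terms 'element', 'attribute' and 'Q number' from the explanation above concrete, here is a small sketch that reads a dictionary-like XML fragment. The layout (a `dictionary` element with a `title` attribute, and `entry` elements with `term`, `name` and `wikidataID` attributes) is an assumption made for illustration, and the Wikidata ID is a labelled placeholder rather than the real Q number.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary fragment. The wikidataID value is a placeholder,
# not the real Q number for this plant.
DICTIONARY_XML = """
<dictionary title="plants">
  <entry term="ocimum sanctum" name="Ocimum sanctum" wikidataID="Q_PLACEHOLDER"/>
</dictionary>
"""

root = ET.fromstring(DICTIONARY_XML)
print("element:", root.tag)                    # the outer element
print("attribute title:", root.get("title"))   # an attribute on that element

for entry in root.findall("entry"):            # child elements
    print("element:", entry.tag, "attributes:", entry.attrib)
```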

Meeting Record V

Date

15th June 2020 Monday

Participants

PMR+P+K+Pruthiv+Ambreen+Vaishali

Agenda

  1. Allocation of regular responsibilities
  2. Standup (for those present last meeting)
  • what did you do in the last 4 days that helped the team ?
  • what are you going to do in the next 3 days that will help the team?
  • are you blocked on anything?
  3. Record of last meeting
  4. Update on bringing Vaishali into synchronization for the project
  • Dictionary assigned - ????
  5. Priorities: Each intern will have particular responsibility for:
  • a dictionary
  • an exploration / project

DICTIONARY Most of you will have a nearly correct dictionary, but it will need cleaning and updating. The tasks include:

  • checking title of dictionary is the same as filename (else it will fail)
  • for each entry:
  • checking that Wikipedia links are present
  • checking Wikidata links
  • checking that the term is a useful noun or phrase

Much of this can be done automatically.
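
As a rough illustration of how such checks might be automated, the sketch below tests the three mechanical points from the list: title versus filename, a Wikipedia link per entry, and a well-formed Wikidata ID per entry. The attribute names `title`, `term`, `wikidataID` and `wikipediaURL` are assumptions about the dictionary layout, and `check_dictionary` is a hypothetical helper, not part of ami.

```python
import os
import re
import xml.etree.ElementTree as ET


def check_dictionary(xml_text, filename):
    """Return a list of human-readable problems found in a dictionary."""
    problems = []
    root = ET.fromstring(xml_text)

    # The dictionary title should equal the filename without its extension.
    stem = os.path.splitext(os.path.basename(filename))[0]
    if root.get("title") != stem:
        problems.append(f"title {root.get('title')!r} does not match filename {stem!r}")

    for entry in root.findall("entry"):
        term = entry.get("term", "")
        if not term.strip():
            problems.append("entry with empty term")
        if not entry.get("wikipediaURL"):  # assumed attribute name
            problems.append(f"{term}: missing Wikipedia link")
        if not re.fullmatch(r"Q\d+", entry.get("wikidataID", "")):
            problems.append(f"{term}: missing or malformed Wikidata ID")
    return problems


SAMPLE = """<dictionary title="country">
  <entry term="india" name="India" wikidataID="Q668"
         wikipediaURL="https://en.wikipedia.org/wiki/India"/>
  <entry term="france" name="France"/>
</dictionary>"""

for problem in check_dictionary(SAMPLE, "dictionaries/countries.xml"):
    print(problem)
```

Whether a term is "a useful noun or phrase" still needs a human reader; the script only covers the mechanical checks.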

PROJECT This project is primarily to test the software. DO NOT ASSUME THE RESULTS ARE MORE GENERALLY USEFUL (i.e. don't tell the world you have made a medical breakthrough - we don't have enough data or knowledge.) The project consists of:

  • creating a query, running it, and refining the query iteratively.
  • downloading up to 1000 articles (your CProject)
  • searching them with 3-6 dictionaries for co-occurrence
  • manually evaluating how useful co-occurrence is
  • refining dictionaries
  • repeat
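
The co-occurrence step in the loop above can be pictured with a small paper-level count. This is a sketch only: the three one-sentence "papers" are made-up placeholders, the term lists stand in for real dictionaries, and the matching is naive substring search rather than what ami does.

```python
from collections import Counter
from itertools import product


def cooccurrence(papers, dict_a, dict_b):
    """Count how many papers mention each (term_a, term_b) pair together."""
    counts = Counter()
    for text in papers:
        lowered = text.lower()
        found_a = [t for t in dict_a if t.lower() in lowered]
        found_b = [t for t in dict_b if t.lower() in lowered]
        counts.update(product(found_a, found_b))
    return counts


# Placeholder texts standing in for downloaded articles.
papers = [
    "Zika virus outbreak reported in Brazil.",
    "Ebola virus cases in Guinea and Liberia.",
    "Influenza vaccination campaign in Brazil.",
]
countries = ["Brazil", "Guinea", "Liberia"]
viruses = ["Zika", "Ebola", "Influenza"]

counts = cooccurrence(papers, countries, viruses)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Reading a sample of the papers behind each pair is the manual evaluation step: a high count does not by itself show that the two terms are related.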

Key Points

  1. Dictionary assigned to Vaishali- 'funders'
  2. Discussion on INYAS and interns who will be joining us.
  3. Review of the individual tasks, Interns to come up with their own dictionaries and projects.
  4. Dictionaries:
  • Create your own dictionary and provide answers to "How many entries does your dictionary have?" and "Where was it created from?" Each intern should have an AUTHORITY for their dictionary. For example: country - ISO
  • However, one issue we all may face is SYNONYMS: each term in the dictionary can have synonyms, such as UK / England / United Kingdom of Great Britain and Northern Ireland. Wikidata may solve this issue.
  • All the dictionaries to be placed here https://github.com/petermr/openVirus/tree/master/dictionaries Everyone to create your dictionary folder in it. ( One folder per dictionary lower case names)
  • "Why are we creating dictionaries?" FAQ taken up by Vaishali. "How can we update the dictionaries"? FAQ by Ambreen. You can edit these in the FAQ page.
  5. Projects
  • Each intern to think about and decide on a project which relates to viral epidemics for your use. (Personal interest) https://github.com/petermr/openVirus/tree/master/miniproject for example: face masks in viral epidemics, drugs used in viral epidemics, vaccines and viral epidemics, organizations and funders in viral epidemics, timeline of usage of dictionary terms in scholpub (cf Google trends).
  • Create a new dictionary for your project
  • To create your project, you will need indexing, information retrieval and information extraction
  • Indexes to be used: solr, lucene, ami
  6. Next tasks:
  • Create your dictionary folders in the link given above.
  • Download and mechanically upload your search query results. You can create, read, update, delete and tidy up your dictionaries.
  • Decide a particular project you would like to work on.

Meeting Record VI

Date

18th June 2020 Thursday

Participants

PMR+P+K+Pruthiv+Ambreen+Vaishali

Agenda

  1. Allocation of regular responsibilities
  2. Standup (for those present last meeting)
  • what did you do in the last 4 days that helped the team ?
  • what are you going to do in the next 3 days that will help the team?
  • are you blocked on anything?
  3. Record of last meeting
  4. KARYA students
  5. Priorities:
  • Scientific Strategy.
  • General principles for what we hope to do. Systematic reviews, Discovery of hidden knowledge.
  • Projects. Formalize titles and project owners
  • Technology . amidict, SPARQL
  • Machine Learning.

Key Points

  1. Discussion on KARYA students affiliated to DST Rajasthan who will be joining us from next week. Each intern will be assigned one student, and the two will work on the project together. 2 interns from there will be working long term. We are also taking 5 students from the Indian National Young Academy of Science (INYAS) for 1 month, and each intern will mentor one. The disciplines range over physics, chemistry, maths and bioscience.

  2. Scientific strategy discussion over "Why we are doing these Projects?" To build an organised system using technological tools in order to create informative site related to Viral Epidemics.

  3. Systematic review: an activity that brings research papers together and looks for common words and phrases in order to present a systematic review of our search. Done using ami.

4. Formalizing the Project-

  • Each project should have a scientific target. It must involve technology development.

5. How will you work on the project?

  • Creating a spreadsheet would be the first thing to begin with.
  • Mostly, manual work is involved and we would be starting with a limited number of papers, i.e. 50. You have to be able to go through all the papers, look for the word or phrase you are trying to find (it must not be a false positive), and produce your search results.
  • This can be made easier by SECTIONING your search as each paper contains an introduction, methods followed by conclusion.
  • Decide the tools which you will need. (An exercise for machine learning)

6. AIM- The aim is to come up with an appropriate project plan to achieve its target. Following are the targets for our interns:

  • AMBREEN: Determine the role of country in viral epidemics.
  • PRIYA: Which diseases co-occur in viral epidemics? (Whether the viral spread causes other diseases as well. Example: Spanish flu 1919 caused number of bacterial infections too. )
  • RAJAN: Which drugs are regularly used for treating viral epidemics? Particularly what drugs are used to treat symptoms, and not the virus.
  • VAISHALI: Which funders are the most active in funding research during viral epidemics? Papers contain a particular section about their funding.
  • KAREENA: Which viruses are reported as being involved to cause viral epidemics? Not all viruses cause a pandemic or an epidemic. To find out which viruses can cause or have caused an epidemic.
  • All projects have an element of machine classification ("learning") and natural language processing (NLP). The main uses are: is this paper really/mainly about viral epidemics?, does your concept (above) co-occur in the same sentence as the virus/disease - i.e. is it tightly coupled? For example is "India" related to "virus in India" or is it unrelated (e.g. the reagent came from an Indian supplier?)
  • The main packages will be: ami for sectioning in CProjects and dictionary searching, KNIME for workflow and analytical tools, R for workflow and analytical tools, Keras for machine learning, Jupyter for logging and reusable scripts
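
The "tightly coupled" question above (does the concept occur in the same sentence as the virus?) can be sketched with a simple sentence-level check. This is an illustration under stated assumptions: the sentence splitter is a naive regular expression, the example text is invented to mirror the "India" case in the notes, and `same_sentence` is a hypothetical helper rather than an ami feature.

```python
import re


def same_sentence(text, term_a, term_b):
    """Return the sentences in which both terms occur as whole words."""
    pattern_a = re.compile(rf"\b{re.escape(term_a)}\b", re.IGNORECASE)
    pattern_b = re.compile(rf"\b{re.escape(term_b)}\b", re.IGNORECASE)
    # Naive split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if pattern_a.search(s) and pattern_b.search(s)]


# Invented example mirroring the case described in the notes.
text = (
    "The virus spread rapidly in India during the outbreak. "
    "Reagents were purchased from an Indian supplier."
)
hits = same_sentence(text, "India", "virus")
print(hits)
```

Whole-word matching keeps "Indian supplier" from counting as a mention of "India"; a paper that names the country and the virus only in separate sentences returns no hits.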

7. PROJECT PROPOSAL

  • Each intern should come up with half a page project proposal on what you plan to do on your project. It should be believable and compact mentioning your strategies and goals. Prepare your own queries, plans and mention the tools which you might require. Basically, how you plan to work in order to achieve your target.
  • This project proposal will be presented to the fellow KARYA students so that they can choose what they would like to work on and with who.

8. What are tools which we are going to use in these projects?

  • Firstly we are going to create a COMMUNAL PROJECT CORPUS related to viral epidemics called epidemic50: 50 papers on viral epidemics that allow us to test our software and ideas. Everyone will use this to get trained on the software, and all software should be able to use it. There will be false positives in it and also problem files. Later we have to analyse it independently and come up with our own corpus.
  • ami based tools for retrieving documents and parts of documents
  • ami sectioning
  • beta testing
  • Then, we would need WORKFLOWS. Something like "workflows-->ami-->commandline(CLI)-->KNIME-->GUI"
  • Each of you should install and try KNIME, an alternative to ami that offers more tools to work with. Contact the expert, Clyde.
  • ENTITY EXTRACTION: finding particular words or phrases in papers, done using ami or KNIME
  • Natural Language Processing (NLP), though only a few aspects of it.
  • R : Contains tools for summarizing things
  • Jupyter
  • Keras
  • Excel
  • SPARQL
  9. Create a Wiki page on Github for each of these technologies, covering installation and usage in simple terms.

10. Steps to do the project:

  • RETRIEVAL of papers
  • BINARY CLASSIFICATION
  • SECTIONING
  • IDENTIFY & EXTRACT your information

11. Prepare the following:

  • Spreadsheets
  • Data Displays for scoping review ( Histogram, Timeline, Pie charts etc )

12. NEXT TASKS:

  • Go to the TIGR2ESS tutorial on SPARQL & WikiData - an easy way to create dictionaries.
  • Create your project proposal for the volunteers.
  • Get up to speed with Binary classification using Python/Keras, KNIME and R. Create a wiki page for Binary Classification.

Meeting Record 7

Date

22nd June 2020

Participants

PMR+P+K+Rajan+Ambreen+ Vaishali+ New interns

Agenda

  1. welcome new collaborators Zeyang Charles Li and Vanisha Arora
  2. getting started. This is for interns to add documentation to (https://github.com/petermr/openVirus/wiki/GETTING-STARTED)
  3. Standup (for those present last meeting)
  • what did you do in the last 4 days that helped the team ?
  • what are you going to do in the next 3 days that will help the team?
  • are you blocked on anything?
  4. review of minutes
  5. miniprojects review

Key Points

  1. Welcome new interns Charles and Vanisha. Brief introduction given to both about getting started, projects and dictionaries. Projects assigned
  • Charles - Non pharmacological interventions
  • Vanisha - Testing and Tracing of Viral epidemics
  2. Install and use KNIME, followed by its documentation on the Wiki

3. Common exercise given to all the interns (to be done individually):

  • Analyse the 50 papers on viral epidemics given here https://github.com/petermr/openVirus/tree/master/miniproject/epidemic50noCov
  • Create your own spreadsheet in csv format using Excel and extract valuable information from each paper. Tutorial given by PMR using screen sharing during the meet.
  • For each paper, create questions for analysis such as "Is the paper about viral epidemics?", "Does it mention the country where the epidemic took place?", "Does it talk about the drugs used?", "Does it mention the diseases which co-occur?", "Does it involve other viruses?", "Does it talk about the funders?"
  • These spreadsheets will be key for comparing results with others. Assess without dictionaries.
  • Create queries such as "Does the paper contain annotations (features)?" Features include diagrams, pictures or images.
  • Record human blind annotations: (A) Viral epidemics - yes/no (B) 1 to 7 features present - yes/no (C) Metadata - year of publication (D) Type of paper - research article/abstract/review
  1. Each intern to create a corpus of 950 articles individually using amisearch
  2. Create wiki pages for machine learning, workflows and data analysis tools. Data formats to work on - R , Keras and spreadsheets.
  3. All interns to publish their tool set in miniproject wiki. Create an inventory of tools (for the next meet).
  4. Explanation by PMR about using xml and json.
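
For the common spreadsheet exercise described above, one possible layout is a column per question and a row per paper. The sketch below writes such a file with Python's `csv` module; the column names are assumptions derived from the questions in the notes, and the two rows are placeholder values, not real annotations.

```python
import csv
import io

# Column names are an assumed layout based on the questions listed in the notes.
FIELDS = [
    "paper_id", "about_viral_epidemics", "mentions_country", "mentions_drugs",
    "mentions_cooccurring_diseases", "mentions_other_viruses",
    "mentions_funders", "year", "paper_type",
]

# Placeholder rows for illustration only.
rows = [
    {"paper_id": "paper_001", "about_viral_epidemics": "yes", "mentions_country": "yes",
     "mentions_drugs": "no", "mentions_cooccurring_diseases": "no",
     "mentions_other_viruses": "yes", "mentions_funders": "yes",
     "year": "2015", "paper_type": "research article"},
    {"paper_id": "paper_002", "about_viral_epidemics": "no", "mentions_country": "no",
     "mentions_drugs": "no", "mentions_cooccurring_diseases": "no",
     "mentions_other_viruses": "no", "mentions_funders": "no",
     "year": "2012", "paper_type": "review"},
]

buffer = io.StringIO()  # swap for open("annotations.csv", "w", newline="") to write a file
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Keeping the same column names across interns is what makes the spreadsheets comparable later.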

7. Next tasks:

  • Create your own spreadsheets.
  • Install and run KNIME. Document your experience.
  • Devise your project plan and update it on wiki. Create your project tool set.
  • Install R , Keras

Meeting record 8

Date

25th June 2020 Thursday

Participants

PMR+P+K+Rajan+Ambreen+Vaishali+Vanisha

Agenda

  1. Allocation of regular responsibilities
  2. Standup (for those present last meeting)
  • what did you do in the last 4 days that helped the team ?
  • what are you going to do in the next 3 days that will help the team?
  • are you blocked on anything?
  3. Record of last meeting
  4. Priorities:
  • nonCov50 data set.
  • review of (5) miniprojects (each person to report on their page, resources, progress)
  • widening projects to include 3 new participants
  • review of (5) dictionaries (each owner to report). Discussion of further dictionaries
  • workflow tools (KNIME, Jupyter, ami, etc.). inventory of experience.
  • sectioning (PMR) . ami section
  • review of strategy.

Key Points

  1. Welcome new intern Sana. Brief introduction given about getting started.
  2. Task assigned to Vanisha, to create a spreadsheet containing all intern names and their project details.

3. Review of 50 papers

  • discussion about false positives
  • BLIND assessment of papers
  • Types of papers we all came across- Scientific article/ Abstract only/ Review paper/ Case study, clinical trial or others.
  • For the miniproject, get 1000 papers , develop a classification scheme to divide papers into categories, so that people can know.

4. CLASSIFICATION:

  • For doing Binary classification , we have to split the data set into- "training" , "testing" and "validation"
  • Find a tool like KNIME or Keras
  • Test your classifier and then improve your algorithm
  • Perform the classification for features (words, data, diagrams)
  5. Review of each intern's miniproject- goals, strategies, progress, queries.
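
The three-way split mentioned under CLASSIFICATION can be sketched in plain Python before moving to KNIME or Keras. The 70/15/15 proportions and the fixed seed are assumptions chosen for illustration; the notes do not prescribe them.

```python
import random


def split_dataset(items, train_pct=70, validation_pct=15, seed=0):
    """Shuffle items reproducibly and split into training/validation/testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    testing = shuffled[n_train + n_validation:]
    return training, validation, testing


# 50 placeholder paper identifiers standing in for a manually labelled set.
paper_ids = [f"paper_{i:03d}" for i in range(50)]
training, validation, testing = split_dataset(paper_ids)
print(len(training), len(validation), len(testing))
```

The classifier is fitted on the training part, tuned against the validation part, and scored once on the testing part, which it has never seen.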

6. Review of each intern's dictionary:

  • Create a dictionary of your miniproject.
  • create a communal dictionary (builtin)
  • Each dictionary should have "findability", "comprehensiveness", "syntax", "maintenance", "documentation"
  • Dictionaries can be created using 3 options: copy from authority (such as ISO for country), copy from wikidata, SPARQL, using list of terms.
  • Dictionaries can be found as: inbuilt, contentmine dictionary.
  • Every dictionary should have a wiki page (total 8), written by the intern, documenting how they created it.

7. SECTIONING:

  • Run by ami section (automatic); works on PMC papers. It converts JATS into sections (JATS -ami-> sections)
  • 3 sections of a paper: front (bibliography, abstract, journal, title, author, DOI), body (intro, background, methods, experimental, discussion), back (funders, admin, ethics, references, citations, acknowledgment)
  • ami search annotates the body. We need to use a new approach called 'xpath', a way of navigating sections in a paper, e.g. front/article/title
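
To show what navigating a paper by path looks like, here is a sketch using the limited XPath support in Python's standard `xml.etree.ElementTree`. The XML below is a hand-written, much simplified JATS-like fragment, so the element paths are illustrative assumptions and will differ from the section files that ami produces.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written JATS-like fragment (not real ami output).
JATS = """<article>
  <front>
    <article-meta>
      <title-group><article-title>Example title</article-title></title-group>
    </article-meta>
  </front>
  <body>
    <sec><title>Introduction</title><p>Intro text.</p></sec>
    <sec><title>Methods</title><p>Methods text.</p></sec>
  </body>
  <back>
    <ack><p>Funded by an example funder.</p></ack>
  </back>
</article>"""

article = ET.fromstring(JATS)

# front: the article title
title = article.findtext("front/article-meta/title-group/article-title")
# body: the headings of the top-level sections
section_titles = [sec.findtext("title") for sec in article.findall("body/sec")]
# back: acknowledgement text, where funders are often named
acknowledgement = article.findtext("back/ack/p")

print(title)
print(section_titles)
print(acknowledgement)
```

Restricting a dictionary search to one path (for example only `back/ack` for funders) is one way to cut down false positives from other parts of the paper.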

8. Next tasks:

  • Vanisha to create spreadsheet
  • Charles and Sana to come up with an introduction for interns
  • Each intern to begin building their dictionary and create a wiki page to document their methods and experience.
  • Each intern to update their miniproject wiki page: what did you do? what are your next steps? what are you blocked on?
  • Perform Binary classification using tools which you prefer
  • Create your corpora of 950 papers

Meeting Record 9

Date

29th June 2020

Participants

PMR+P+K+Rajan+Ambreen+Vaishali+Vanisha+Sana

Agenda

  • Allocation of regular responsibilities
  • record of last meeting
  • Standup by each intern
  • dictionaries. Please report on 7 public facing dictionaries, especially the core 5 (which INYAS will use) . country, disease, drugs, viruses, funders.
  • noncov50 dataset. Report any ongoing problems
  • miniprojects . Please report public facing project pages.
  • brief review of workflow
  • brief review of sectioning (PMR)
  • review of strategy.
  • multilingual dictionaries (Hindi?)

Key Points

  • Preparing introduction for INYAS interns (getting started)
  • To create wiki tool of ami dict for creating dictionaries
  • To create SPARQL wiki tool for extracting wikidata search attributes
  • Discussion on false positives and how they can be classified during binary classification
  • Review of each intern's dictionary. Each one to create their dictionary's own wiki page. Assigned Ambreen as maintainer of index of dictionaries.
  • Review of each intern's miniproject wiki page. Display your corpus of 950 articles on github. (to be continued.)

Meeting record 10

Date

2nd July 2020

Participants

PMR+P+K+Rajan+Ambreen+Vaishali+Vanisha+Sana+INYAS interns

Agenda

  • Allocation of regular responsibilities
  • record of last meeting
  • Standup by each intern
  • welcome to INYAS interns. Review of induction and any problems. Getting_started materials
  • review of miniprojects, especially so INYAS can appreciate their roles.
  • review of dictionaries. INYAS can immediately have a role in checking dictionaries.
  • problems and debugging. (out of memory error)

Key Points

  • Welcome to INYAS interns: Pooja, Urja, Simranleen, Dheeraj and Jitu. Brief introduction by each intern and getting started.
  • Discussion on creating dictionary using a text file containing list of terms. Explained by Ambreen and PMR.
  • Allocating INYAS interns to their miniprojects.
  • Review of PMC papers to explain sectioning and the output xml files to the interns.
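
The idea of building a dictionary from a text file of terms can be illustrated as below. In the project this was done with amidict; the sketch only shows the general shape of the transformation, and the `dictionary`/`entry` layout with `title`, `term` and `name` attributes is an assumption, as is the helper name `terms_to_dictionary`.

```python
import xml.etree.ElementTree as ET


def terms_to_dictionary(title, lines):
    """Turn a list of term lines into a dictionary XML element."""
    root = ET.Element("dictionary", {"title": title})
    seen = set()
    for line in lines:
        term = line.strip()
        key = term.lower()
        if not term or key in seen:  # skip blank lines and duplicates
            continue
        seen.add(key)
        ET.SubElement(root, "entry", {"term": key, "name": term})
    return root


# In practice the lines would come from open("terms.txt").read().splitlines().
lines = ["Zika virus", "", "Ebola virus", "zika virus "]
dictionary = terms_to_dictionary("virus", lines)
print(ET.tostring(dictionary, encoding="unicode"))
```

A list-built dictionary has no Wikidata or Wikipedia links yet, so those still need to be added and checked afterwards.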

Meeting Record 11

Date 6th July 2020 Monday

  • HACKATHON over slack #coordination
  • Review of all miniprojects and progress made by mentors and their mentees

Summary

  • each INYAS student and their mentor should discuss how to create a single page , in Markdown, for Thursday which reports the work to their classmates. The INYAS student should create this but ask for help whenever they need it. It can address:
  • what is the aim of the miniproject?
  • what resources are you using (don't just give a list; try to write something they would understand).
  • what has been done so far (again in terms they would understand)
  • mentors now each have a miniproject and a dictionary (possibly two). These should all have the same format and be organized in a consistent directory structure. Work between yourselves to ensure this (i.e. look at each others' miniprojects and dictionaries).

Meeting Record 12

Date

9th July 2020 Thursday

Participants

PMR,Priya,Kareena,Rajan,Ambreen,Sana,Vaishali,Vanisha,Charles,INYAS interns,GY

Agenda

  • Record of last hackathon
  • Standup by each intern
  • INYAS presentations 1-page summary of the project (by INYAS) intern
  • summary of progress (by mentor) on (miniproject page)

Priorities

  • KNIME
  • ami section/search/amidict
  • machine-learning

Key Points

  • Allocation of regular responsibilities: Kareena (records and reports), Priya (software management), Rajan (technical management), Ambreen (miniproject management), Vaishali (dictionary management), Sana (managing and coordinating INYAS interns), Vanisha (managing mini hackathon), Charles (???)
  • Presentation by each INYAS intern on Monday, presenting their review in such a way that their classmates are able to understand it.
  • Rajan to coordinate with people using different OS (Windows10, Mac/Unix, Windows7, Mobile etc). INYAS interns Dheeraj and Om to create a wiki page summarizing the mobile properties (Github and Slack)
  • Review of each miniproject (update your project pages with progress made, Create pages for tools which you use)
  • Everyone to commit their miniproject data on Github
  • Everyone to identify true negatives manually (papers not about viral epidemics)
  • Update your ami problems on PMR TODO for PMR to take action.

Meeting Record 13

Date

13th July 2020 Monday

Participants

PMR,Priya,Kareena,Rajan,Ambreen,Sana,Vaishali,Vanisha,Charles,INYAS students (6),Clyde

Agenda

  • Allocation of regular responsibilities: Kareena (records and reports), Priya (software management), Rajan (technical management), Ambreen (miniproject management), Vaishali (dictionary management), Sana (managing and coordinating INYAS interns), Vanisha (managing mini hackathon), Charles (???)
  • Record of last meeting
  • Standup by each intern

Key Points

  • Review of all dictionaries by core miniproject owners given together here https://github.com/petermr/openVirus/tree/master/dictionaries/test
  • Review of all miniprojects and the progress made (release of dictionary, release of corpus950, release of full.data.Tables using amisearch )
  • Discussion on bringing up a communal project for all INYAS students to create a new dictionary of Indian geo-names (states, cities, etc.) . This will allow us to pinpoint papers describing viral epidemics in Indian regions. Vanisha and Sana will coordinate this. The resulting dictionary will have permanent value as it can support a wide range of projects (e.g. TIGR2ESS crops, climate change, etc.) INYAS students will also continue to be associated with their core project.

Meeting Record 14

Date

16th July 2020 Thursday

Participants

PMR,Priya,Kareena,Rajan,Sana,Vaishali,Vanisha,Charles,INYAS interns

Key Points

  • Allocation of regular responsibilities: Kareena (records and reports), Priya (software management), Rajan (technical management), Ambreen (miniproject management), Vaishali (dictionary management), Sana (managing and coordinating INYAS interns), Vanisha (managing mini hackathon), Charles (???)
  • Record of last meeting
  • Standup by each core intern + inyas intern
  • Discussion on Wikidata volunteers, for starting communication with Wikipedia people and editing pages (that means getting a Wikipedia name)
  • Dheeraj, Jitu and Om - to create a wiki page on using software on mobile. Priya to coordinate with them.
  • SPARQL and Wikidata issues (if any) and progress made by interns
  • Use of wikibase language and label
  • Sparql tutorial to be created by all the interns who have used it.
  • any other technical issues faced by anyone

Midweek update (debug)

  • For all those who faced the empty co-occurrence bug after running ami search on a corpus: re-run git pull and re-install ami to get the desired co-occurrence output.

Meeting Record 15

Date

20th July 2020 Monday

Participants

PMR,Priya,Ambreen,Vaishali,Charles,Sana,INYAS interns(Urja,Dheeraj,Pooja,Simranleen)

Key points

  1. Each intern's mini-project was reviewed => Project review (similar to Code review). The other attendees critiqued the wiki and the presentation.
  2. Ambreen and PMR explained the importance and use of smoke tests and ML techniques in mini-projects.
  3. Vaishali and Priya have done a smoke test for KNIME. Ambreen is pursuing an ML technique for her mini-project.
  4. The separate project for the INYAS interns was called off; they were told to continue working with their mentors on the mini-projects.
  5. Standups given by INYAS interns. They have evolved into "middle management" and the beta-testers in their mini-projects.
  6. The issues regarding dictionaries were reported.
  7. Needs for additional tools, especially (AMI, AMIDict) <---> toolBox (Jupyter, R, KNIME), will be submitted as Issues in the form of feature requests.
  8. Turning the presentations into short video clips is being considered as part of the output.

Meeting Record 16

Date

23rd July 2020 Thursday

Participants

PMR,GY,Priya,Kareena,Rajan,Ambreen,Sana,Vaishali,Vanisha,INYAS interns (Dheeraj,Pooja,Jitu,Simranleen)

Key Points

  • Brief project review by PMR to GY regarding the progress made and upcoming tasks. Each intern to record a 2 minute clip on his/her miniproject and their learning experiences after they joined openVirus. To describe the work they did in this project and how it can help the world. Planning to conduct a live video con session on youtube.
  • Review by each core intern and inyas interns about their experiences and ideas if any
  • Discussion and solving ami issues faced by few
  • Discussion on SPARQL queries and downloading the .xml file

Meeting Record 17

Date

27th July 2020 Monday

Participants

PMR,Priya,Kareena,Rajan,Sana,Ambreen,Vaishali,Vanisha,Charles,INYAS interns (Dheeraj,Pooja,Urja,Jitu)

Agenda

  1. Review of the 5 main projects:
  • country
  • disease
  • drug
  • funder
  • virus
  2. Please be prepared to report:
  • analysis of corpus950 (or smaller)
  • manual classification
  • creation of dictionary
  • machine learning
  • notebooks
  3. DICTIONARY
  • We should now put our dictionaries in one place, separate from ami3, and check them out regularly. I have started this. It has a semi-structured directory of dictionaries in a repository and a symbolic reference that AMI can use (maybe), referencing dictionaries through URLs. NOTE: I haven't finished mapping ami names to SPARQL names. We should discuss having default names in SPARQL. We should also review progress on terms in Hindi, Tamil and other languages. We should now converge on the essential parts of a SPARQL query.

Key Points

  • Allocation of regular responsibilities
  • Standup by everyone
  • Review of five main mini projects PROGRESS AND PLANNING - country(ambreen), disease (priya), drug (rajan), funder (vaishali), virus (kareena) followed by zoonosis (sana), testing and tracing (vanisha)
  • People to report on issues faced during uploading corpus or in getpapers so that PMR can fix it
  • Check whether everybody is able to use the new ami release
  • Review of each dictionary: whether each contains name, term, wikidata ID, wikidata label, description, Wikipedia URL. Specific entries include ISO3166 code (country), ICD10 code (disease), CrossRef ID (funder), ICTV virus ID (virus)
  • Review from each INYAS student about their learning experience, work, progress, if any blockers, work on mobile for jitu, dheeraj and om
  • Discussion on open access projects. Arianna Becerril García from Mexico.
  • Final video clips to be created by each intern and submitted to GY for review by the date 7th Aug
  • Discussion on machine learning tools and NLP. Hindi part of speech (POS) tagging to sentences
  • Extended discussion on dictionaries, editing the wikipage dict schema, different terms and elements of wikidata
  • ami commands to create a wikisparql dictionary

Meeting Record 18

Date

30th July 2020

Participants

PMR,GY,Priya,Kareena,Rajan,Ambreen,Vaishali,Vanisha,Charles,INYAS interns (Urja,Pooja,Dheeraj,Simranleen)

Key Points

  • Last official meeting with the INYAS students as they complete their four weeks internship. Review by each INYAS student about their learning experiences in this project. GY told them that they can informally continue with their work and attend the meetings if willing to do so in future.
  • All interns to prepare for the live videocon meeting to be streamed on youtube on 6th August 2020 Thursday.
  • All interns to prepare a 2 minute video clip about their own experiences and review of the project followed by their work. (all videos to be compiled together and directed by Simranleen)
  • Discussion by PMR on Open Access, what scientists do and why, how search engines work, other resources such as Redalyc/MX, India and Indonesia rxiv, theses in repositories, data scrape/clean and lots more. We are iterating in the design <-> prototype <-> deployment chain. We have advanced designs for dictionaries and sectioned documents. We have built prototypes and are testing them. This means a small amount of redesign. We now try to share all development on the wiki. Interns to put queries in on the wiki and everyone can comment. This will be particularly important for NLP and machine learning. Most of the NLP and ML tasks can be supported by packages and libraries.

Meeting Record 19

Date

6th August 2020 Thursday

Participants

PMR, GY, priya,kareena,rajan,ambreen,vaishali,vanisha,sana,charles,dheeraj

Key Points

  • Update on dictionaries by each intern (use of wikisparql) (amidict update) To create dictionaries containing synonyms of terms
  • LIVE streaming on YouTube in the coming week (INYAS YouTube channel). Interns to prepare a 1 minute intro/slides and a GitHub wiki page giving an overview of their miniproject
  • Discussion on retrieval of material/information from pre-existing literature: clean and annotate the data using tools, analyse and display it, and later publish
  • Discussion on creating SPARQL queries for languages other than English: MULTILINGUALITY (taken up by Rajan in Tamil)
  • Everyone to re-run ami for updates
  • Discussion on creating a SPARQL dictionary that includes AltLabel and synonyms.
  • Overview by PMR on machine learning tools and progress
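The AltLabel and multilinguality points above can be sketched as a small query builder. This is an illustration only, not the query used in the meeting: the function name `build_dictionary_query` and its parameters are hypothetical, and the class QID (Q12136, "disease") is just an example. It relies on the standard Wikidata Query Service prefixes (`wd:`, `wdt:`, `skos:`, `wikibase:`, `bd:`).

```python
def build_dictionary_query(class_qid, language="en", limit=100):
    """Build a SPARQL query for items of a Wikidata class, with label and aliases.

    Hypothetical helper: class_qid is any Wikidata class ID, language is a
    language code such as "en" or "ta" (Tamil).
    """
    return f"""
SELECT ?item ?itemLabel (GROUP_CONCAT(DISTINCT ?alt; separator="|") AS ?altLabels)
WHERE {{
  ?item wdt:P31 wd:{class_qid} .
  OPTIONAL {{ ?item skos:altLabel ?alt . FILTER(LANG(?alt) = "{language}") }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}
}}
GROUP BY ?item ?itemLabel
LIMIT {limit}
""".strip()


# Example: diseases with Tamil labels and aliases (paste the text into the query service).
tamil_query = build_dictionary_query("Q12136", language="ta", limit=50)
print(tamil_query)
```

Aliases are joined with `|` so that each item stays on one row; they would still need to be split into separate synonym elements when the dictionary is written.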

Meeting Record 20

Date

10th August 2020 Monday

Participants

PMR, priya,kareena,rajan,ambreen,vaishali,vanisha,sana,charles,dheeraj

Key Points

  1. Running different queries on sparql for creating a dictionary (SPARQL to AMI)
  2. UPDATE by each intern on miniproject (Google workbook created, taken up by Ambreen) "Our Progress so far" https://docs.google.com/spreadsheets/d/1DI3sJnLq7MntJElah-xD4crHVEF-gLpkAL_-Qp35qx0/edit#gid=0
  3. Machine learning tools, brief explanation given by Ambreen
  4. Data analysis tools discussion by PMR
  5. Discussion on update, progress, usage, blockers in ami section and ami summary

Meeting Record 21 (YOUTUBE LIVE STREAMING OF THE MEETING)

Date

12th August 2020 Wednesday

Participants (16)

PMR, GY, priya,kareena,rajan,ambreen,vaishali,vanisha,sana,charles,dheeraj,jitu ram,urja,pooja,om prakash, simranleen

Key Points

Meeting Record 22

Date

17th August 2020 Monday

Participants

PMR,GY,priya,kareena,rajan,ambreen,vaishali,vanisha,charles,sana

Key Points

Meeting Record 23

Date

20th August 2020 Thursday

Participants

PMR,GY,priya,kareena,rajan,ambreen,vaishali,vanisha,charles

Key Points

  • standups
  • a brief review of internships and newcomers
  • ideas from interns about the COAR presentation
  • movie status to be updated by Simranleen soon
  • technical issues that affect more than one person
  • Interns facing issues with ami search: use minicorpus10 for tutorials (testing the dictionary against a small corpus)
  • Discussed the ami summary tool
  • About the importance of Open Science
  • PMR: Idea of adding tooltips to ami search tables in different languages.
  • Ambreen to present a workshop on Jupyter Notebook in the next meet.
  • Thanks to Dheeraj for adding the concept of multilinguality to the dictionaries and for staying with us.

Meeting Record 24

Date

24th August 2020 Monday

Participants

PMR,priya,kareena,rajan,ambreen,vaishali,vanisha,charles,dheeraj

Key Points

  1. Review by each intern, how helpful is the project corpus? Any useful information to mention?
  2. ami update: --delete command to clear the edits (not manually)
  3. Representing the results/info graphically eg: name of funder- logo, use of statistics to display data
  4. Important terms/ commonest terms you find in corpus or dictionaries, SUMMARIZE these in pictorial representation form (use wikipedia)
  5. ami problems:
  • install ami
  • run acceptance tests
  • try tutorial for standard dictionaries
  • create own dict and validate
  • run standard corpus against standard dict
  • run standard corpus against own dict
  • Put in automated validation, ami search -> results.xml (test that results.xml exists, used by cooccurrence and data tables)
  6. Screen sharing by Ambreen on how to execute code in Jupyter (code, functions, cleantext, libraries)
  • Classify data in excel(csv)
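The automated validation item above (check that `ami search` produced `results.xml`) could look roughly like the sketch below. It assumes only that the output files are named `results.xml` and sit somewhere under the project directory; the function names are hypothetical and the exact directory layout is not taken from these notes.

```python
from pathlib import Path


def find_results_files(project_dir):
    """Return every results.xml found anywhere below project_dir, sorted by path."""
    return sorted(Path(project_dir).rglob("results.xml"))


def check_search_ran(project_dir):
    """Raise if no results.xml exists or any is empty; otherwise return the file count."""
    files = find_results_files(project_dir)
    if not files:
        raise FileNotFoundError(f"no results.xml found under {project_dir}")
    empty = [path for path in files if path.stat().st_size == 0]
    if empty:
        raise ValueError(f"{len(empty)} empty results.xml file(s), first: {empty[0]}")
    return len(files)
```

Running `check_search_ran("path/to/project")` right after a search makes a failed run visible before the cooccurrence and data table steps try to read the files.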

Meeting Record 25

Date

27th August 2020 Thursday

Participants

PMR,GY,priya,kareena,rajan,ambreen,vaishali,vanisha,charles

Key Points

  • Catchup by all interns and review of each miniproject
  • Discussion on COAR Presentation by Ambreen (reviews by others on ppt)
  • Discussion by PMR on Cambridge Hackathon (online hackathon on genomics, bioinformatics, databases) Project: Cambridge-India "openVirus"
  • Put together software, databases, dictionaries, so that people can see it in virtual environment- (Virtual sensors)
  • PMR Debugging people's problems using share screen

Meeting Record 26

Date

7th September 2020 Monday

Participants

PMR,priya,kareena,rajan,ambreen,vaishali,vanisha,charles, dheeraj, anugrah, shweta

Key Points

  • Welcome new interns Anugrah and Shweta
  • Standup by interns (those who are present)
  • Review for COAR presentation (10th) given by Ambreen. Registration to be done by each intern
  • Changes made in COAR ppt
  • Debugging people's problems and resolving issues

Meeting Record 27

Date: 12th Oct. 2020

Participants:

PMR, Rajan, Vanisha, Shweata, Ambreen, Anugrah, Ayush, Mukul, Dheeraj

Key points:

  • Future directions to the project. Labelling dictionaries with associated concepts(relative terms, broader terms). Finding the main subject in relevant papers.
  • Preprint, Introduction of Hypergraph.
  • Review of Ambreen's Jupyter Notebook(ML).
  • New members were allocated mini-projects. Ayush to work with Ambreen on Countries, and Mukul to work with Kareena on Virus.
  • openVirus repository is becoming very huge. New GitHub repository specially dedicated to Dictionaries and Mini-corpora.

Meeting Record 28

Date: 19th Oct. 2020

Participants: PMR, GY, Rajan, Shweata, Ambreen, Ayush, Vanisha, Vaishali, Priya, Dheeraj

Key points

  • New Dictionary Manager - Rajan
  • New repository exclusively for dictionaries
  • Ayush went over the code he had written to display frequency from results.xml
  • Enhance the dashboard to include links to Wikidata, and make it multilingual.
  • Discussed the rough outline for Wikicite presentation. More information can be found here
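For readers who want to reproduce the frequency display mentioned above, here is a minimal sketch. It is not Ayush's code. It assumes each hit in `results.xml` is a `<result>` element carrying the matched text in an `exact` attribute; if your files use a different attribute, pass its name in.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def term_frequencies(xml_text, attribute="exact"):
    """Count matched terms (case-insensitively) in the text of a results.xml file."""
    root = ET.fromstring(xml_text)
    return Counter(
        element.get(attribute).lower()
        for element in root.iter("result")
        if element.get(attribute)  # skip results without the attribute
    )


# Made-up sample data, for illustration only.
sample = '<results><result exact="Zika"/><result exact="zika"/><result exact="dengue"/></results>'
for term, count in term_frequencies(sample).most_common():
    print(term, count)
```

The resulting `Counter` can be passed straight to a plotting library or written to CSV for a dashboard.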

Meeting Record

Date: 2nd November 2020

Participants:

PMR, GY, Shweata, Aishwarya, Ambreen, Ayush, Vanisha, Vaishali, Dheeraj, Kareena, Rajan, Anugrah

Key Points:

  1. Discussed the new and exciting directions for the project. Work on the mini-projects will continue. In addition, a Plant Science component will be added to the project in the future.

  2. We will also have new software projects:

  • getpapers in Python
  • ami-search in Python
  • ami-words in Python
  • display in Python
  • containerisation using Docker
  • Dictionary testing

Each of these software projects will have an issue, and it will have to:

  • collect mini teams
  • specify goals in detail
  • propose an architecture
  • build proof-of-concept (PoC)
  • Test-driven development (TDD)
  3. Briefly went over the test Jupyter Notebook PMR had written. We then reviewed various libraries useful for our text mining purposes. Link to the notebook discussed can be found, here

Meeting Record

Date: 5th Nov. 2020

Participants

PMR, Ayush, Dheeraj, Vanisha, Vaishali, Shweata, Mukul, Rajan

Key Points

  • Getting to know each other's computational backgrounds.
  • New Repository for development purposes

Technical Discussion

  • Algorithms + Data Structure(ami dictionaries, CProject, in our case) = Programs

  • JATS (Journal Article Tag Suite; the link is to its Journal Archiving and Interchange Tag Set) https://jats.nlm.nih.gov/archiving/tag-library/1.3d1/element/arc-elem-sec-intro.html

  • What are the problems that we run into when we use terms(given by the dictionary) to search the papers?

    • Synonyms- That's where Wikidata is helpful
    • Not knowing the context in which the terms are used
    • General concepts(like 'illness' or anything that can't be represented in a term) can't be retrieved from papers easily
  • EPMC(where you get data from, in JATS) -> clean, classify -> text mining
    Text mining tools in Python: nltk, textblob, glob

  • Went over PMR's Jupyter Notebook. https://github.com/petermr/ami3/blob/master/src/ipynb/text.ipynb
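The synonym problem raised above can be illustrated with a short sketch: a dictionary entry matches a paper if either its term or any of its synonyms occurs as a whole word. The function names and the sample entry are hypothetical, and this uses only Python's `re` module rather than ami itself.

```python
import re


def build_term_pattern(terms):
    """Compile one case-insensitive, whole-word pattern for a term and its synonyms."""
    # Longest alternatives first, so multi-word synonyms win over shorter ones.
    ordered = sorted(set(terms), key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def count_entries(text, entries):
    """Map each entry term to the number of hits for the term or any synonym."""
    hits = {}
    for term, synonyms in entries.items():
        count = len(build_term_pattern([term, *synonyms]).findall(text))
        if count:
            hits[term] = count
    return hits


entries = {"COVID-19": ["coronavirus disease 2019"]}  # illustrative entry
text = "COVID-19 spreads quickly. The coronavirus disease 2019 outbreak began in 2019."
print(count_entries(text, entries))
```

This handles synonyms but not the other two problems in the list: it knows nothing about context, and a general concept such as 'illness' still needs a curated list of terms.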

Meeting Record

Date: 09th Nov. 2020

Participants: PMR, GY, Anugrah, Ayush, Dheeraj, Shweata

Key Points

  • Presentation(MPhil Computational Biology Seminar Series) on Wednesday. PMR, Ambreen and Shweata to present.
  • Discussed the current status of each member's project

Technical Discussions

Meeting Record:

Date: 12th Nov. 2020

Participants: PMR, Aishwarya, Ayush, Dheeraj, Mukul, Rajan, Shweata

Key Points:

  • Everybody shared their ideas for new software tools and described each in a sentence. The following are the ideas that came up.

New software tools proposed by each member:

  • Words tool (Keywords and Stopwords) - Ayush and Aishwarya
  • Dictionary Editor - Shweata
  • Sections Search - PMR
  • Summarizer - Mukul
  • Wikipedia links - Rajan
  • Enhanced Display - Dheeraj

Dictionary Editor, Sections Search and Words tool are the most relevant to the project's current requirements.
We first decided to work on Dictionary Editor.

  • Dictionary Editor: Each of the members present in the meeting came up with the below list of items which the Dictionary Editor needs to encompass.
    • Remove unnecessary terms
    • Duplicate terms
    • Heteronyms
    • Collaborative editor
    • Context
    • Versioning system
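As one concrete case from the list above, duplicate terms can be detected by normalising case and whitespace before comparing. This is a minimal sketch with a hypothetical function name, not a design for the editor itself.

```python
from collections import defaultdict


def find_duplicate_terms(terms):
    """Group terms that differ only by case or surrounding whitespace."""
    groups = defaultdict(list)
    for term in terms:
        groups[term.strip().casefold()].append(term)
    # Keep only the normalised keys that more than one original term maps to.
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


print(find_duplicate_terms(["Malaria", "malaria ", "Zika"]))
```

Returning the original variants rather than deleting them leaves the decision about which spelling to keep with the person editing the dictionary.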

Tasks:

  • Ayush: To find more about Version Control in GitHub
  • Unit Test

Meeting Record

Date: 16th Nov. 2020

Participants: PMR, Ayush, Aishwarya, Anugrah, Dheeraj, Shweata

Key Points

  • Dictionary Editor- Opened an issue
  • Review of what a Dictionary is
    • Dictionaries are in .XML format
    • Root element is Dictionary, and it must have a title. And it's got a number of entry elements. Entry element has a large number of attributes.
    • Synonyms are child elements under entry
  • update.ipynb A Jupyter Notebook to validate the dictionaries.
  • What we hope to do is to validate our dictionary against the openvirus schema in Jupyter Notebook
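The structure reviewed above can be checked with a few lines of Python. This sketch is not `update.ipynb` and does not implement the openvirus schema. It assumes a lowercase `dictionary` root with a `title` attribute, `entry` children with a `term` attribute, and `synonym` child elements; adjust the names to match the actual schema.

```python
import xml.etree.ElementTree as ET


def validate_dictionary(xml_text):
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "dictionary":  # assumed root element name
        problems.append(f"root element is <{root.tag}>, expected <dictionary>")
    if not root.get("title"):
        problems.append("root element has no title attribute")
    entries = root.findall("entry")
    if not entries:
        problems.append("dictionary has no entry elements")
    for number, entry in enumerate(entries, start=1):
        if not entry.get("term"):  # assumed attribute name
            problems.append(f"entry {number} has no term attribute")
        for synonym in entry.findall("synonym"):
            if not (synonym.text or "").strip():
                problems.append(f"entry {number} has an empty synonym")
    return problems


# Made-up sample dictionary, for illustration only.
sample = '<dictionary title="disease"><entry term="malaria"><synonym>paludism</synonym></entry></dictionary>'
print(validate_dictionary(sample))
```

Collecting all problems instead of stopping at the first one makes it easier to fix a dictionary in a single pass.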

Meeting Record

Participants: PMR, Shweata, Rajan, Ambreen, Vanisha, Dheeraj, Anugrah

Date: 19th Nov. 2020

Key Points

  • Found out where the latest dictionaries are. Moved the latest and new ones to the dictionary repository.
  • We then checked and validated those dictionaries using the Jupyter Notebook Peter had written (available on our dictionary repository).

Meeting Record

Participants: PMR, Ayush, Vanisha, Dheeraj, Anugrah, Aishwarya

Date: 23rd Nov. 2020

Key Points:

  • Dictionary specification: PMR created an issue https://github.com/petermr/dictionary/issues/2
  • What a dictionary contains- Dictionary elements: Attributes, child elements, entry, etc.
  • How software is developed in practice: how "customers" provide formal requirements and the implementer creates and tests the code.
  • A brief discussion on Regular expressions (RegExp)
  • Miniprojects updates:
    • Aishwarya to work with Dheeraj on the miniproject: "Diseases".
    • Ayush to work with Vanisha on "Test and trace".
    • Anugrah to work on " Non-pharmaceutical interventions".

Meeting Record

Participants: PMR, Ambreen, Dheeraj, Shweata, Vanisha, Anugrah

Date: 30th Nov. 2020

Key points:

  • Discussed the recent problem that we encountered with a SPARQL query, as reported by Dheeraj. More info here: https://www.wikidata.org/wiki/Wikidata:Request_a_query#Re-running_queries_on_earlier_versions_of_Wikidata
  • We should record our Wikidata Queries so that we don't encounter similar problems.
  • Reviewed Test and Trace dictionary created by Vanisha. Synonyms and language equivalents need to be added.
  • Reviewed Country, Organisation and Disease dictionary as well.
  • Related items need to be added in the organisation dictionary.
  • PMR raised several questions about Wikidata to the Wikimedia community.
  • Communal Tasks:
    • Retrieve entries for the list of Q Ids to add synonyms, language equivalents, etc.
  • Ambreen to draft a list of ancillary files for creating and maintaining dictionaries.
  • Examples:
    • Jupyter Notebook
    • MD for explaining the files, names and purposes. (Converge on a communal naming scheme),
    • SPARQL query,
    • SPARQL-XML output
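For the communal task above (retrieving entries for a list of Q IDs), one option is the Wikidata `wbgetentities` API action, which returns labels and aliases in the requested languages. The sketch below only builds the request URLs, so it runs without network access. The function name is hypothetical, the QIDs are examples, and batching at 50 reflects the API's per-request limit for ordinary users.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def wbgetentities_urls(qids, languages=("en",), batch_size=50):
    """Build wbgetentities URLs requesting labels and aliases, in batches of IDs."""
    urls = []
    for start in range(0, len(qids), batch_size):
        batch = qids[start:start + batch_size]
        params = {
            "action": "wbgetentities",
            "ids": "|".join(batch),
            "props": "labels|aliases",  # aliases serve as synonyms
            "languages": "|".join(languages),
            "format": "json",
        }
        urls.append(WIKIDATA_API + "?" + urlencode(params))
    return urls


# Example QIDs: Q12136 (disease) and Q6256 (country), in English and Hindi.
for url in wbgetentities_urls(["Q12136", "Q6256"], languages=("en", "hi")):
    print(url)
```

Each URL can be fetched with any HTTP client; labels give the language equivalents and aliases give candidate synonyms for the dictionary entry.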

Meeting Record

Participants: PMR, Shweata, Dheeraj, Vanisha, Matthew Dunstan

Date: 3rd Dec. 2020

Key points:

  • We had a chat about the importance of AI in science. We are building some of the foundations of this revolution. We also discussed DeepMind's recent breakthrough on the protein folding problem.
  • People were added to the dictionary repository.
  • github.com/petermr/dictionary/issues/3 We now have a way to save all our queries (with the help of RESTful URL) in the dictionary itself. Look at the comment of this issue to know more.
  • We were joined by Matthew Dunstan, today. Peter demonstrated the progress on the battery project so far.
  • We are starting to come up with a unified Dictionary Naming Scheme. Follow the link to know more. https://github.com/petermr/dictionary/wiki/Dictionary:-Naming-Scheme
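To illustrate the idea of saving a query as a RESTful URL, the sketch below encodes a SPARQL query into a Wikidata Query Service URL and recovers it again. How the URL is actually stored in the dictionary is described in the issue linked above and is not reproduced here; the function names and the sample query are illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def query_to_url(sparql):
    """Encode a SPARQL query as a GET URL for the Wikidata Query Service."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


def url_to_query(url):
    """Recover the original SPARQL text from a URL built by query_to_url."""
    return parse_qs(urlsplit(url).query)["query"][0]


sparql = "SELECT ?item WHERE { ?item wdt:P31 wd:Q12136 } LIMIT 10"  # example query
url = query_to_url(sparql)
print(url)
print(url_to_query(url) == sparql)
```

Because the query round-trips exactly, a saved URL documents how a dictionary was generated and lets anyone re-run it later.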