Records and Reports
1st June 2020 Monday
P+K+PMR
- Discussion about the agenda behind openVirus: to build a system so that anybody can understand the science behind the current pandemic
- To write and document everything for InternX
- Installation of `ferret` to retrieve papers from `medrxiv`
- To document a wiki about 'how to get started with `ferret`'
- To create a 'hello mask' program for `ferret`
4th June 2020 Thursday
PMR+G+P+K+Pruthiv
- Introduction to the "standup" routine followed by P+K
- Introduction given to Pruthivrajan and brief overview of his role in project
- Run `getpapers` to collect 200 papers on viral epidemics
- Discussion on installation and running of `ami`
- Search with `ami` using the country, disease and funders dictionaries
- To read TIGR2ESS on using dictionaries
- To read about Wikidata
- The current team goal is to run `ferret` against `medrxiv` and retrieve papers from it
- welcome to Pruthivrajan and brief overview of his role in project
- record of last meeting (June 1 Monday)
- "standup" (2 mins each, P and K). See https://en.wikipedia.org/wiki/Stand-up_meeting#Software_development
This is a no-blame experiment. Let's see how it goes. The current "goal" is to get Ferret running against `medrxiv`.
8th June 2020 Monday
P+K+PMR+PR+G+Ambreen
- Welcome to new members
- Allocation of regular responsibilities
- Standup (for those present last meeting)
- what did you do in the last 4 days that helped the team?
- what are you going to do in the next 3 days that will help the team?
- are you blocked on anything?
- Record of last meeting
- Priorities:
- installing and running `ami`
- management of dictionaries (one per member)
- documenting
- potential miniprojects
- Testing `medrxiv` on `getpapers` and comparing with `ami download`
- General instructions given to all: Gitanjali ma'am is the Personal and Academic Manager for the interns, and PMR is the Project Manager of openVirus.
- Welcome new intern Ambreen
- Standup by P+K+Pruthiv and introduction by Ambreen
- We are going to have a Hello Mask program for `getpapers` and `ami` on 100 papers
- Work assigned to each intern
- Kareena- Meeting record and documentation
- Ambreen- beta testing
- Rajan- Technical support for organising biweekly meetings
- Priya- ????
- All interns should document their 100 papers on viral epidemics on the wiki in a paragraph
- Current goal is to install and run `ami`, followed by allocation of different project dictionaries to all, one per member.
- Aim to start "miniprojects", where different viral epidemics will be given to all of us for documentation, a process called "SCOPING"
- Retrieve papers from `medrxiv` using `getpapers` by creating a query on PMC so that only `medrxiv` papers are downloaded.
- Everyone must run `git`. Required for running `ami`. Install `ami` using `git`
- Clone the new `ami` repository
- Dictionaries assigned: Create a Wiki of your dictionary.
- `countries` (Ambreen)
- `diseases` (Priya)
- `viruses` (Kareena)
- `drugs` (Rajan)
- `funders` (Vaishali)
11th June 2020 Thursday
PMR+P+K+Pruthiv+Ambreen
- Allocation of regular responsibilities
- Standup (for those present last meeting)
- what did you do in the last 4 days that helped the team?
- what are you going to do in the next 3 days that will help the team?
- are you blocked on anything?
- Record of last meeting
- Priorities:
- everyone able to install and run `getpapers`
- installing and running `amisearch` with builtin dictionaries
- running `amisearch` with local dictionaries
- creating dictionaries with `amidict` from lists, wikipedia categories
- Welcome new intern- Vaishali
- Work assigned to Priya- Support for new interns
- Discussion on running `getpapers` to retrieve papers from `medrxiv`
- Running `maven` to build `ami`
- Issues faced by interns while running `ami` after building it successfully; documentation error: "How to run `ami` on output of `getpapers`?"
- Report your issues well so that you can receive accurate guidance. How to report problems:
- what were you trying to do?
- what did you do?
- what happened?
- your assessment of the problem
- Document your installation steps followed by usage and also issues, if any on the Wiki
- CProject directory and its relation with `getpapers`: FAQ taken up by Ambreen
- Explanation of a dictionary given by PMR, using "Ocimum sanctum" in reference to TIGR2ESS. Introduction to the 'xml' markup language and its components such as 'elements' and 'attributes', and the Q number for Wikidata
- Next task
- Go to the TIGR2ESS tutorials and read about dictionaries. "What do dictionaries do?"
- Try and bring the tutorials across to viral epidemics in openVirus. Search briefly about your particular dictionary. Write your own idea about your dictionary. "What does your dictionary do?" "How do you use it?"
- Install and run `ami` and search about your dictionary through it.
15th June 2020 Monday
PMR+P+K+Pruthiv+Ambreen+Vaishali
- Allocation of regular responsibilities
- Standup (for those present last meeting)
- what did you do in the last 4 days that helped the team ?
- what are you going to do in the next 3 days that will help the team?
- are you blocked on anything?
- Record of last meeting
- Update on bringing Vaishali into synchronization for the project
- Dictionary assigned - ????
- Priorities:Each intern will have particular responsibility for:
- a dictionary
- an exploration / project
DICTIONARY Most of you will have a nearly correct dictionary, but it will need cleaning and updating. The tasks include:
- checking title of dictionary is the same as filename (else it will fail)
- for each entry:
- checking that Wikipedia links are present
- checking Wikidata links
- checking that term is a useful noun or phrase
Much of this can be done automatically.
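Several of the checks listed above lend themselves to a short script. The following is a minimal sketch in Python, assuming a dictionary is an XML file with a `title` attribute on the root element and `term`, `wikidataID` and `wikipediaURL` attributes on each `entry`. These attribute names and the sample dictionary are assumptions for illustration, not a confirmed `ami` schema.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def check_dictionary(xml_text, filename):
    """Return a list of human-readable problems found in one dictionary."""
    problems = []
    root = ET.fromstring(xml_text)

    # The dictionary title must match the filename (without extension)
    title = root.get("title")
    stem = Path(filename).stem
    if title != stem:
        problems.append(f"title '{title}' does not match filename '{stem}'")

    for entry in root.iter("entry"):
        term = entry.get("term", "").strip()
        if not term:
            problems.append("entry with empty term")
            continue
        # Attribute names below are assumed, adjust to the real schema
        if not entry.get("wikipediaURL"):
            problems.append(f"{term}: missing Wikipedia link")
        qid = entry.get("wikidataID", "")
        if not (qid.startswith("Q") and qid[1:].isdigit()):
            problems.append(f"{term}: missing or malformed Wikidata ID")
    return problems

# Hypothetical two-entry dictionary used only to exercise the checks
SAMPLE = """<dictionary title="country">
  <entry term="India" wikidataID="Q668" wikipediaURL="https://en.wikipedia.org/wiki/India"/>
  <entry term="Brazil" wikidataID=""/>
</dictionary>"""

for problem in check_dictionary(SAMPLE, "country.xml"):
    print(problem)
```

Whether a term is "a useful noun or phrase" still needs a human reader; the script only covers the mechanical checks.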
PROJECT This project is primarily to test the software. DO NOT ASSUME THE RESULTS ARE MORE GENERALLY USEFUL (i.e. don't tell the world you have made a medical breakthrough - we don't have enough data or knowledge.) The project consists of:
- creating a query, running it, and refining the query iteratively.
- downloading up to 1000 articles (your CProject)
- searching them with 3-6 dictionaries for co-occurrence
- manually evaluating how useful co-occurrence is
- refining dictionaries
- repeat
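To make the co-occurrence step in the loop above concrete, here is a small sketch in Python that counts how often terms from two dictionaries appear in the same document. It uses plain case-insensitive substring matching on a made-up three-document corpus; this is an illustration of the idea only and not how `ami` performs its search.

```python
from collections import Counter
from itertools import product

def cooccurrence(docs, dict_a, dict_b):
    """Count documents in which a term from dict_a and a term from dict_b both appear."""
    counts = Counter()
    for text in docs:
        lowered = text.lower()
        hits_a = [term for term in dict_a if term.lower() in lowered]
        hits_b = [term for term in dict_b if term.lower() in lowered]
        # Every (a, b) pair found in this document counts once
        counts.update(product(hits_a, hits_b))
    return counts

# Hypothetical mini corpus and dictionaries
docs = [
    "Zika virus outbreak in Brazil",
    "Ebola in Guinea and Liberia",
    "Zika cases reported in Brazil and Colombia",
]
countries = ["Brazil", "Guinea", "Liberia", "Colombia"]
viruses = ["Zika", "Ebola"]

for pair, n in cooccurrence(docs, countries, viruses).most_common():
    print(pair, n)
```

Substring matching produces false positives (for example "India" inside "Indiana"), which is one reason the manual evaluation step in the loop matters.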
- Dictionary assigned to Vaishali- 'funders'
- Discussion on INYAS and interns who will be joining us.
- Review of the individual tasks, Interns to come up with their own dictionaries and projects.
- Dictionaries:
- Creating your own dictionary and provide answers to "How many entries does your dictionary have?" "Where was it created from?" Each Intern should have AUTHORITY for their dictionary. For eg: country - ISO
- However, one issue we all may face is SYNONYMS, each term in the dictionary has potential for synonyms such as UK/ England/ Democratic Republic of Great Britain. Wikidata may solve this issue.
- All the dictionaries to be placed here https://github.com/petermr/openVirus/tree/master/dictionaries Everyone to create your dictionary folder in it. ( One folder per dictionary lower case names)
- "Why are we creating dictionaries?" FAQ taken up by Vaishali. "How can we update the dictionaries"? FAQ by Ambreen. You can edit these in the FAQ page.
- Projects
- Each intern to think and decide of a project which relates to viral epidemics for your use. (Personal interest) https://github.com/petermr/openVirus/tree/master/miniproject for example: face masks in viral epidemics, drugs used in viral epidemics, vaccines and viral epidemics, organizations and funders in viral epidemics, timeline of usage of dictionary terms in scholpub (cf Google trends).
- Create a new dictionary for your project
- To create your project, you will need indexing, information retrieval and information extraction
- Indexes to be used: `solr`, `lucene`, `ami`
- Next tasks:
- Create your dictionary folders in the link given above.
- Download and mechanically upload your search query results. You can create, read, update, delete and tidy up your dictionaries.
- Decide a particular project you would like to work on.
18th June 2020 Thursday
PMR+P+K+Pruthiv+Ambreen+Vaishali
- Allocation of regular responsibilities
- Standup (for those present last meeting)
- what did you do in the last 4 days that helped the team ?
- what are you going to do in the next 3 days that will help the team?
- are you blocked on anything?
- Record of last meeting
- KARYA students
- Priorities:
- Scientific Strategy.
- General principles for what we hope to do. Systematic reviews, Discovery of hidden knowledge.
- Projects. Formalize titles and project owners
- Technology: `amidict`, `SPARQL`
- Machine Learning.
- Discussion on KARYA students affiliated to DST Rajasthan who will be joining us from next week. Each intern will be assigned one student and the two will work on the project together. 2 interns from there will be working long term. There are also 5 students from INYAS (Indian National Young Academy of Science), joining for 1 month; each intern will mentor one. The disciplines range over physics, chemistry, maths, bioscience.
- Scientific strategy discussion over "Why are we doing these projects?" To build an organised system using technological tools in order to create an informative site related to viral epidemics.
- Systematic review: an activity that brings research papers together and looks for common words and phrases to present a systematic review of our search. Done using `ami`
- Each project should have a scientific target. It must involve technology development.
- Creating a spreadsheet would be the first thing to begin with.
- Mostly, manual work is involved and we would be starting with limited number of papers i.e; 50. You have to be able to go through all the papers, look for what word or phrase (must not be a false positive) you are trying to find, and produce your search results.
- This can be made easier by SECTIONING your search as each paper contains an introduction, methods followed by conclusion.
- Decide the tools which you will need. (An exercise for machine learning)
- AIM: The aim is to come up with an appropriate project plan to achieve its target. Following are the targets for our interns:
- AMBREEN: Determine the role of country in viral epidemics.
- PRIYA: Which diseases co-occur in viral epidemics? (Whether the viral spread causes other diseases as well. Example: Spanish flu 1919 caused a number of bacterial infections too.)
- RAJAN: Which drugs are regularly used for treating viral epidemics? Particularly which drugs are used to treat symptoms, and not the virus.
- VAISHALI: Which funders are the most active in funding research during viral epidemics? Papers contain a particular section about their funding.
- KAREENA: Which viruses are reported as causing viral epidemics? Not all viruses cause a pandemic or an epidemic. To find out which viruses can cause or have caused an epidemic.
- All projects have an element of machine classification ("learning") and natural language processing (NLP). The main uses are: is this paper really/mainly about viral epidemics?, does your concept (above) co-occur in the same sentence as the virus/disease - i.e. is it tightly coupled? For example is "India" related to "virus in India" or is it unrelated (e.g. the reagent came from an Indian supplier?)
- The main packages will be:
- `ami` for sectioning in CProjects and dictionary searching
- `KNIME` for workflow and analytical tools
- `R` for workflow and analytical tools
- `Keras` for machine learning
- `Jupyter` for logging and reusable scripts
- Each intern should come up with half a page project proposal on what you plan to do on your project. It should be believable and compact mentioning your strategies and goals. Prepare your own queries, plans and mention the tools which you might require. Basically, how you plan to work in order to achieve your target.
- This project proposal will be presented to the fellow KARYA students so that they can choose what they would like to work on and with who.
- Firstly we are going to create a COMMUNAL PROJECT CORPUS related to viral epidemics called `epidemic50`: 50 papers on viral epidemics that allow us to test our software and ideas. Everyone will use this to get trained on software, and all software should be able to use it. There will be false positives in it and also problem files. Later we have to analyse it independently and come up with our own corpus.
- `ami` based tools for retrieving documents and parts of documents
- `ami` sectioning
- beta testing
- Then, we would need WORKFLOWS. Something like "workflows-->ami-->commandline(CLI)-->KNIME-->GUI"
- Each of you should install and try `KNIME`, an alternative to `ami` that offers more tools to work with. Contact the expert, Clyde.
- ENTITY EXTRACTION: finding particular words or phrases in papers, done using `ami` or `KNIME`
- Natural Language Processing (NLP), though only a few aspects of it.
- `R`: contains tools for summarizing things
- `Jupyter`
- `Keras`
- Excel
- SPARQL
- Create a Wiki page for each of these technologies for simplification on Github. (installation and usage)
- RETRIEVAL of papers
- BINARY CLASSIFICATION
- SECTIONING
- IDENTIFY & EXTRACT your information
- Spreadsheets
- Data Displays for scoping review ( Histogram, Timeline, Pie charts etc )
- Go to the TIGR2ESS tutorial on SPARQL & WikiData - an easy way to create dictionaries.
- Create your project proposal for the volunteers.
- Get up to speed with Binary classification using `Python/Keras`, `KNIME` and `R`. Create a wiki page for Binary Classification.
22nd June 2020
PMR+P+K+Rajan+Ambreen+ Vaishali+ New interns
- welcome new collaborators Zeyang Charles Li and Vanisha Arora
- getting started. This is for interns to add documentation to (https://github.com/petermr/openVirus/wiki/GETTING-STARTED)
- Standup (for those present last meeting)
- what did you do in the last 4 days that helped the team ?
- what are you going to do in the next 3 days that will help the team?
- are you blocked on anything?
- review of minutes
- miniprojects review
- Welcome new interns Charles and Vanisha. Brief introduction given to both about getting started, projects and dictionaries. Projects assigned
- Charles - Non pharmacological interventions
- Vanisha - Testing and Tracing of Viral epidemics
- Install and use `KNIME` followed by its documentation on Wiki
- Analyse the 50 papers on viral epidemics given here https://github.com/petermr/openVirus/tree/master/miniproject/epidemic50noCov
- Create your own spreadsheet in csv format using Excel and extract valuable information from each paper. Tutorial given by PMR using screen sharing during the meet.
- About each paper, create questions for analysis such as "Is the paper about viral epidemics?", "Does it mention the country where the epidemic took place?", "Does it talk about the drugs used?", "Does it mention the diseases which co-occur?", "Does it involve other viruses?", "Does it talk about the funders?"
- These spreadsheets will be key for comparing results with others. Assess without dictionaries.
- Create queries such as "Does the paper contain annotations (features)?" Features include diagrams, pictures or images.
- Mention human blind annotations: (A) Viral epidemics - yes/no (B) 1 to 7 features present - yes/no (C) Metadata- year of publication (D) Type of paper- research article/abstract/review
- Each intern to create a corpus of 950 articles individually using `amisearch`
- Create wiki pages for machine learning, workflows and data analysis tools. Data formats to work on: `R`, `Keras` and spreadsheets.
- All interns to publish their tool set in miniproject wiki. Create an inventory of tools (for the next meet).
- Explanation by PMR about using xml and json.
- Create your own spreadsheets.
- Install and run `KNIME`. Document your experience.
- Devise your project plan and update it on wiki. Create your project tool set.
- Install `R`, `Keras`
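The annotation spreadsheet discussed in this meeting can also be produced programmatically, which keeps the column layout identical across interns. The sketch below writes the human blind annotations (A) to (D) as CSV using Python's standard `csv` module. The column names, paper identifiers and values are invented for illustration and are not an agreed team format.

```python
import csv
import io

# Assumed column names for the blind annotations (A)-(D)
FIELDS = ["paper_id", "viral_epidemic", "features_present", "year", "paper_type"]

def write_annotations(rows):
    """Serialise annotation rows to CSV text with a fixed header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical annotations for two papers
rows = [
    {"paper_id": "PMC0000001", "viral_epidemic": "yes", "features_present": "yes",
     "year": 2016, "paper_type": "research article"},
    {"paper_id": "PMC0000002", "viral_epidemic": "no", "features_present": "no",
     "year": 2019, "paper_type": "review"},
]

csv_text = write_annotations(rows)
print(csv_text)
```

A fixed header makes it straightforward to compare one annotator's sheet against another's later on.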
25th June 2020 Thursday
PMR+P+K+Rajan+Ambreen+Vaishali+Vanisha
- Allocation of regular responsibilities
- Standup (for those present last meeting)
- what did you do in the last 4 days that helped the team ?
- what are you going to do in the next 3 days that will help the team?
- are you blocked on anything?
- Record of last meeting
- Priorities:
- nonCov50 data set.
- review of (5) miniprojects (each person to report on their page, resources, progress)
- widening projects to include 3 new participants
- review of (5) dictionaries (each owner to report). Discussion of further dictionaries
- workflow tools (KNIME, Jupyter, ami, etc.). inventory of experience.
- sectioning (PMR): `ami section`
- review of strategy.
- Welcome new intern Sana. Brief introduction given about getting started.
- Task assigned to Vanisha, to create a spreadsheet containing all intern names and their project details.
- discussion about false positives
- BLIND assessment of papers
- Types of papers we all came across- Scientific article/ Abstract only/ Review paper/ Case study, clinical trial or others.
- For the miniproject, get 1000 papers , develop a classification scheme to divide papers into categories, so that people can know.
- For binary classification, we have to split the data set into "training", "testing" and "validation"
- Find a tool like `KNIME` or `Keras`
- Test your classifier and then improve your algorithm
- Perform the classification for features (words, data, diagrams)
- Review of each intern's miniproject- goals, strategies, progress, queries.
- Create a dictionary of your miniproject.
- create a communal dictionary (builtin)
- Each dictionary should have "findability", "comprehensiveness", "syntax", "maintenance", "documentation"
- Dictionaries can be created using 3 options: copy from an authority (such as ISO for country), copy from Wikidata using SPARQL, or use a list of terms.
- Dictionaries can be found as: inbuilt, contentmine dictionary.
- Every dictionary should have a wiki page (total 8), written by the intern, documenting how they created it.
- Sectioning is run by `ami section` (automatic) and works on PMC papers. It converts JATS into sections (JATS -ami-> sections)
- 3 sections of a paper: front (bibliography, abstract, journal, title, author, DOI), body (intro, background, methods, experimental, discussion), back (funders, admin, ethics, references, citations, acknowledgment)
- `ami search` annotates the body. We need to use a new approach called 'xpath', a way of navigating sections in a paper, e.g. front/article/title
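As an illustration of navigating front, body and back with path expressions, the sketch below uses the limited XPath support in Python's standard `xml.etree.ElementTree` on a hand-written, heavily simplified JATS-like fragment. The fragment and its contents are invented; real PMC files are much larger, and the exact paths used by `ami` may differ.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up JATS-like article with front, body and back
JATS = """<article>
  <front><article-meta><title-group>
    <article-title>Zika virus outbreak</article-title>
  </title-group></article-meta></front>
  <body>
    <sec><title>Introduction</title><p>Background text.</p></sec>
    <sec><title>Methods</title><p>Sampling text.</p></sec>
  </body>
  <back><ack><p>Funded by an example agency.</p></ack></back>
</article>"""

root = ET.fromstring(JATS)

# Path into the front matter: the article title
title = root.findtext("front/article-meta/title-group/article-title")

# All section headings in the body
section_titles = [t.text for t in root.findall("body/sec/title")]

# Predicate: the paragraph of the section whose title is exactly "Methods"
methods = root.find("body/sec[title='Methods']/p").text

# Acknowledgment text from the back matter
acknowledgment = root.findtext("back/ack/p")

print(title, section_titles, methods, acknowledgment, sep="\n")
```

Restricting a search to one such path (for example only the Methods section, or only the acknowledgments) is one way to cut down false positives.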
- vanisha to create spreadsheet
- Charles and Sana to come up with an introduction for interns
- Each intern to begin with building their dictionary and create a wiki page to document their methods and experience.
- Each intern to update their miniproject wiki page: what did you do? what are your next steps? what are you blocked on?
- Perform Binary classification using tools which you prefer
- Create your corpora of 950 papers
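For the binary classification task, the split into training, validation and testing sets discussed in this meeting can be sketched in plain Python before any tool such as `KNIME` or `Keras` is involved. The 70/15/15 proportions, the fixed seed and the paper identifiers below are assumptions chosen for illustration.

```python
import random

def split_corpus(paper_ids, train_pct=70, validation_pct=15, seed=0):
    """Shuffle paper ids reproducibly and split them into three disjoint sets."""
    ids = list(paper_ids)
    random.Random(seed).shuffle(ids)
    # Integer arithmetic avoids floating-point rounding surprises
    n_train = len(ids) * train_pct // 100
    n_val = len(ids) * validation_pct // 100
    train = ids[:n_train]
    validation = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # whatever remains
    return train, validation, test

# Hypothetical identifiers standing in for a 950-paper corpus
corpus = [f"paper{i:03d}" for i in range(950)]
train, validation, test = split_corpus(corpus)
print(len(train), len(validation), len(test))  # 665 142 143
```

Keeping the seed fixed means every intern who runs the split on the same corpus gets the same three sets, so classifier results stay comparable.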
29th June 2020
PMR+P+K+Rajan+Ambreen+Vaishali+Vanisha+Sana
- Allocation of regular responsibilities
- record of last meeting
- Standup by each intern
- dictionaries. Please report on 7 public facing dictionaries, especially the core 5 (which INYAS will use) . country, disease, drugs, viruses, funders.
- noncov50 dataset. Report any ongoing problems
- miniprojects . Please report public facing project pages.
- brief review of workflow
- brief review of sectioning (PMR)
- review of strategy.
- multilingual dictionaries (Hindi?)
- Preparing introduction for INYAS interns (getting started)
- To create wiki tool of `ami dict` for creating dictionaries
- To create SPARQL wiki tool for extracting wikidata search attributes
- Discussion on false positives and how they can be classified during binary classification
- Review of each intern's dictionary. Each one to create their dictionary's own wiki page. Assigned Ambreen as maintainer of index of dictionaries.
- Review of each intern's miniproject wiki page. Display your corpus of 950 articles on github. (to be continued.)
2nd July 2020
PMR+P+K+Rajan+Ambreen+Vaishali+Vanisha+Sana+INYAS interns
- Allocation of regular responsibilities
- record of last meeting
- Standup by each intern
- welcome to INYAS interns. Review of induction and any problems. Getting_started materials
- review of miniprojects, especially so INYAS can appreciate their roles.
- review of dictionaries. INYAS can immediately have a role in checking dictionaries.
- problems and debugging. (out of memory error)
- Welcome to INYAS interns: Pooja, Urja, Simranleen, Dheeraj and Jitu. Brief introduction by each intern and getting started.
- Discussion on creating dictionary using a text file containing list of terms. Explained by Ambreen and PMR.
- Allocating INYAS interns to their miniprojects.
- Review of PMC papers to explain sectioning and the output of xml files to the interns.
- HACKATHON over slack #coordination
- Review of all miniprojects and progress made by mentors and their mentees
- each INYAS student and their mentor should discuss how to create a single page , in Markdown, for Thursday which reports the work to their classmates. The INYAS student should create this but ask for help whenever they need it. It can address:
- what is the aim of the miniproject?
- what resources are you using (don't just give a list; try to write something they would understand).
- what has been done so far (again in terms they would understand)
- mentors now each have a miniproject and a dictionary (possibly two). These should all have the same format and be organized in a consistent directory structure. Work between yourselves to ensure this (i.e. look at each others' miniprojects and dictionaries).
9th July 2020 Thursday
PMR,Priya,Kareena,Rajan,Ambreen,Sana,Vaishali,Vanisha,Charles,INYAS interns,GY
- Record of last hackathon
- Standup by each intern
- INYAS presentations 1-page summary of the project (by INYAS) intern
- summary of progress (by mentor) on (miniproject page)
- KNIME
- ami section/search/amidict
- machine-learning
- Allocation of regular responsibilities: Kareena (records and reports), Priya (software management), Rajan (technical management), Ambreen (miniproject management), Vaishali (dictionary management), Sana (managing and coordinating INYAS interns), Vanisha (managing mini hackathon), Charles (???)
- Presentation by each INYAS intern on Monday, To present their review in such a way so that their classmates are able to understand it.
- Rajan to coordinate with people using different OS (Windows10, Mac/Unix, Windows7, Mobile etc). INYAS interns Dheeraj and Om to create a wiki page summarizing the mobile properties (Github and Slack)
- Review of each miniproject (update your project pages with progress made, Create pages for tools which you use)
- Everyone to commit their miniproject data on Github
- Everyone to identify true negatives manually (papers not about viral epidemics)
- Update your `ami` problems on PMR TODO for PMR to take action.
13th July 2020 Monday
PMR,priya,kareena,rajan,ambreen,sana,vaishali,vanisha,charles,INYAS students (6),Clyde
- Allocation of regular responsibilities: Kareena (records and reports), Priya (software management), Rajan (technical management), Ambreen (miniproject management), Vaishali (dictionary management), Sana (managing and coordinating INYAS interns), Vanisha (managing mini hackathon), Charles (???)
- Record of last meeting
- Standup by each intern
- Review of all dictionaries by core miniproject owners given together here https://github.com/petermr/openVirus/tree/master/dictionaries/test
- Review of all miniprojects and the progress made (release of dictionary, release of corpus950, release of full.data.Tables using `amisearch`)
- Discussion on bringing up a communal project for all INYAS students to create a new dictionary of Indian geo-names (states, cities, etc.). This will allow us to pinpoint papers describing viral epidemics in Indian regions. Vanisha and Sana will coordinate this. The resulting dictionary will have permanent value as it can support a wide range of projects (e.g. TIGR2ESS crops, climate change, etc.). INYAS students will also continue to be associated with their core project.
16th July 2020 Thursday
PMR,priya,kareena,rajan,sana,vaishali,vanisha,charles,INYAS interns
- Allocation of regular responsibilities: Kareena (records and reports), Priya (software management), Rajan (technical management), Ambreen (miniproject management), Vaishali (dictionary management), Sana (managing and coordinating INYAS interns), Vanisha (managing mini hackathon), Charles (???)
- Record of last meeting
- Standup by each core intern + inyas intern
- Discussion on Wikidata volunteers, for starting communication with Wikipedia people and editing pages (that means getting a Wikipedia name)
- Dheeraj, Jitu and Om - to create a wiki page on using software on mobile. Priya to coordinate with them.
- SPARQL and Wikidata issues (if any) and progress made by interns
- Use of wikibase language and label
- Sparql tutorial to be created by all the interns who have used it.
- any other technical issues faced by anyone
- For all those who faced the issue of empty co-occurrence (bug) after running `ami search` on a corpus: re-run `git pull` and re-install `ami` to get the desired co-occurrence output.
20th July 2020 Monday
PMR,Priya,Ambreen,Vaishali,Charles,Sana,INYAS interns(Urja,Dheeraj,Pooja,Simranleen)
- Each intern reviewed their mini-project => Project review (similar to Code review). The other attendees critiqued the wiki and the presentation.
- Ambreen and PMR explained the importance and use of smoke tests and ML techniques in mini-projects.
- Vaishali and Priya did a smoke test for KNIME. Ambreen is pursuing an ML technique for her mini-project.
- The separate project for the INYAS interns was called off; they were told to continue working with their mentors on their mini-projects.
- Standups given by INYAS interns. They have evolved into "middle management" and beta-testers in their mini-projects.
- The issues regarding dictionaries were reported.
- Needs for additional tools, especially (AMI, AMIDict) <---> toolBox (Jupyter, R, KNIME), will be submitted as Issues (feature requests).
- The presentations are being considered for release as short video clips as part of the output.
23rd July 2020 Thursday
PMR,GY,priya,kareena,rajan,ambreen,sana,vaishali,vanisha,inyas interns(dheeraj,pooja,jitu,simranleen)
- Brief project review by PMR to GY regarding the progress made and upcoming tasks. Each intern to record a 2 minute clip on his/her miniproject and their learning experiences after they joined openVirus. To describe the work they did in this project and how it can help the world. Planning to conduct a live video con session on youtube.
- Review by each core intern and inyas interns about their experiences and ideas if any
- Discussion and solving of `ami` issues faced by a few
- Discussion on SPARQL queries and downloading the .xml file
27th July 2020 Monday
PMR,priya,kareena,rajan,sana,ambreen,vaishali,vanisha,charles, inyas interns (dheeraj,pooja,urja,jitu)
- Review of the 5 main projects:
- country
- disease
- drug
- funder
- virus
- Please be prepared to report:
- analysis of corpus950 (or smaller)
- manual classification
- creation of dictionary
- machine learning
- notebooks
- DICTIONARY
- We should now put our dictionaries in one place, separate from `ami3`, and check out regularly. I have started this. It has a semi-structured directory of dictionaries in a repository and a symbolic reference that AMI can use (maybe), referencing dictionaries through URLs. NOTE: I haven't finished mapping ami names to SPARQL names. We should discuss having default names in SPARQL. Also we should review progress on terms in Hindi, Tamil and other languages. We should now converge on the essential parts of a SPARQL query.
- Allocation of regular responsibilities
- Standup by everyone
- Review of five main mini projects PROGRESS AND PLANNING - country(ambreen), disease (priya), drug (rajan), funder (vaishali), virus (kareena) followed by zoonosis (sana), testing and tracing (vanisha)
- People to report on issues faced during uploading corpus or in getpapers so that PMR can fix it
- If everybody is able to use new ami release
- Review of each dictionary: whether each contains name, term, Wikidata ID, Wikidata label, description, Wikipedia URL. Specific entries include ISO3166 code (country), ICD10 code (disease), CrossRef ID (funder), ICTV virus ID (virus)
- Review from each INYAS student about their learning experience, work, progress, if any blockers, work on mobile for jitu, dheeraj and om
- Discussion on open access projects. Arianna Becerril García from Mexico.
- Final video clips to be created by each intern and submitted to GY for review by the date 7th Aug
- Discussion on machine learning tools and NLP. Hindi part of speech (POS) tagging to sentences
- Extended discussion on dictionaries, editing the wikipage dict schema, different terms and elements of wikidata
- `ami` commands to create a wikisparql dictionary
30th July 2020
PMR,GY,priya,kareena,rajan,ambreen,vaishali,vanisha,charles,INYAS interns- urja,pooja,dheeraj,simranleen
- Last official meeting with the INYAS students as they complete their four weeks internship. Review by each INYAS student about their learning experiences in this project. GY told them that they can informally continue with their work and attend the meetings if willing to do so in future.
- All interns to prepare for the live videocon meeting to be streamed on youtube on 6th August 2020 Thursday.
- All interns to prepare a 2 minute video clip about their own experiences and review of the project followed by their work. (all videos to be compiled together and directed by Simranleen)
- Discussion by PMR on Open Access, what scientists do and why, how search engines work, other resources such as Redalyc/MX, India and Indonesia rxiv, theses in repositories, data scrape/clean and lots more. We are iterating in the design <-> prototype <-> deployment chain. We have advanced designs for dictionaries and sectioned documents. We have built prototypes and are testing them. This means a small amount of redesign. We now try to share all development on the wiki. Interns to put queries in on the wiki and everyone can comment. This will be particularly important for NLP and machine learning. Most of the NLP and ML tasks can be supported by packages and libraries.
6th August 2020 Thursday
PMR, GY, priya,kareena,rajan,ambreen,vaishali,vanisha,sana,charles,dheeraj
- Update on dictionaries by each intern (use of wikisparql, amidict update). To create dictionaries containing synonyms of terms
- LIVE streaming on YouTube in the coming week (INYAS YouTube channel); 1 minute intro, slides and GitHub wiki to be prepared by the interns to give an overview of their miniproject
- Discussion on retrieval of material/information from pre-existing literature: clean and annotate the data using tools, analyse and display it, and later publish
- Discussion on creating SPARQL queries for languages other than English: MULTILINGUALITY (taken up by Rajan in Tamil)
- Everyone to re-run ami for updates
- Discussion on creating a SPARQL dictionary that includes AltLabel and synonyms.
- Overview by PMR on machine learning tools and progress
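The SPARQL points from this meeting (labels in a chosen language, plus AltLabel for synonyms) can be sketched as follows. The Python below builds a Wikidata query string and groups result rows into dictionary-like entries. It does not contact the query service: the sample rows are hand-written in the standard SPARQL JSON results shape, and the output field names (`term`, `wikidataID`, `synonyms`) are assumptions, not the `amidict` format.

```python
def build_query(class_qid, language="en", limit=50):
    """Build a Wikidata SPARQL query for items that are instances (P31) of class_qid."""
    return f"""SELECT ?item ?itemLabel ?altLabel WHERE {{
  ?item wdt:P31 wd:{class_qid} .
  OPTIONAL {{ ?item skos:altLabel ?altLabel . FILTER(LANG(?altLabel) = "{language}") }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}". }}
}}
LIMIT {limit}"""

def bindings_to_entries(bindings):
    """Group SPARQL JSON result rows into one entry per Wikidata item."""
    entries = {}
    for row in bindings:
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        entry = entries.setdefault(
            qid, {"wikidataID": qid, "term": row["itemLabel"]["value"], "synonyms": []}
        )
        alt = row.get("altLabel", {}).get("value")
        if alt and alt not in entry["synonyms"]:
            entry["synonyms"].append(alt)
    return list(entries.values())

# Hand-written sample rows, not real query output
sample = [
    {"item": {"value": "http://www.wikidata.org/entity/Q668"},
     "itemLabel": {"value": "India"}, "altLabel": {"value": "Bharat"}},
    {"item": {"value": "http://www.wikidata.org/entity/Q668"},
     "itemLabel": {"value": "India"}, "altLabel": {"value": "Republic of India"}},
    {"item": {"value": "http://www.wikidata.org/entity/Q155"},
     "itemLabel": {"value": "Brazil"}},
]

print(build_query("Q6256", language="hi"))  # Q6256 is the Wikidata class "country"
print(bindings_to_entries(sample))
```

Changing the `language` argument (for example to `hi` or `ta`) is the only change needed to request labels and synonyms in another language, which is the multilinguality idea raised above.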
10th August 2020 Monday
PMR, priya,kareena,rajan,ambreen,vaishali,vanisha,sana,charles,dheeraj
- Running different queries on sparql for creating a dictionary (SPARQL to AMI)
- UPDATE by each intern on miniproject (Google workbook created, taken up by Ambreen) "Our Progress so far" https://docs.google.com/spreadsheets/d/1DI3sJnLq7MntJElah-xD4crHVEF-gLpkAL_-Qp35qx0/edit#gid=0
- Machine learning tools, brief explanation given by Ambreen
- Data analyses tools discussion by PMR
- Discussion on update, progress, usage, blockers in `ami section` and `ami summary`
12th August 2020 Wednesday
PMR, GY, priya,kareena,rajan,ambreen,vaishali,vanisha,sana,charles,dheeraj,jitu ram,urja,pooja,om prakash, simranleen
- First live meeting session conducted by the openVirus team, (INYAS-KARYA) including the INYAS interns, streamed on the INYAS youtube channel https://www.youtube.com/watch?v=XiTngk-POm8
17th August 2020 Monday
PMR,GY,priya,kareena,rajan,ambreen,vaishali,vanisha,charles,sana
- Updates on status
- Standup
- Debugging of dictionaries
- Debugging of search
- Progress on machine learning.
- Invitation from COAR for Sept 10th 2020 (shared by PMR, everyone is welcome to attend, Ambreen to present )
- See https://github.com/petermr/openVirus/wiki/Presentation-COAR
20th August 2020 Thursday
PMR,GY,priya,kareena,rajan,ambreen,vaishali,vanisha,charles
- standups
- a brief review of internships and newcomers
- ideas from interns about the COAR presentation
- movie status to be updated by Simranleen soon
- technical issues that affect more than one person
- Interns facing issues with ami search: use minicorpus10 for tutorials (testing the dictionary against a small corpus)
- Discussed the ami summary tool
- About the importance of Open Science
- PMR: Idea of adding tooltips to ami search tables in different languages.
- Ambreen to present a workshop on Jupyter Notebook in the next meet.
- Thanks to Dheeraj for adding the concept of multilinguality to the dictionaries and for staying with us.
24th August 2020 Monday
PMR,priya,kareena,rajan,ambreen,vaishali,vanisha,charles,dheeraj
- Review by each intern, how helpful is the project corpus? Any useful information to mention?
- `ami` update: --delete command to clear the edits (not manually)
- Representing the results/info graphically, e.g. name of funder -> logo; use of statistics to display data
- Important terms/ commonest terms you find in corpus or dictionaries, SUMMARIZE these in pictorial representation form (use wikipedia)
- `ami` problems:
- install ami
- run acceptance tests
- try tutorial for standard dictionaries
- create own dict and validate
- run standard corpus against standard dict
- run standard corpus against own dict
- Put in automated validation: ami search -> results.xml (test that results.xml exists; it is used by cooccurrence and data tables)
- Screen sharing by Ambreen: how to execute code in Jupyter (code, functions, cleantext, libraries)
- Classify data in Excel (CSV)
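The automated validation mentioned above (test that results.xml exists after a search) could be sketched roughly as follows. This assumes each article has its own subdirectory inside the project directory and that search output lands somewhere below it in a file called `results.xml`. The helper name `missing_results` and the article directory names are hypothetical.

```python
from pathlib import Path
import tempfile


def missing_results(project_dir):
    """Return names of article directories that contain no results.xml."""
    project = Path(project_dir)
    return [
        d.name
        for d in sorted(project.iterdir())
        if d.is_dir() and not any(d.rglob("results.xml"))
    ]


# Build a throwaway project with hypothetical article names to try it out.
with tempfile.TemporaryDirectory() as tmp:
    done = Path(tmp) / "PMC0000001" / "results" / "search" / "country"
    done.mkdir(parents=True)
    (done / "results.xml").write_text("<results/>", encoding="utf-8")
    (Path(tmp) / "PMC0000002").mkdir()  # no search output for this one

    missing = missing_results(tmp)
    print(missing)
```

An empty list would mean every article directory has at least one results.xml, so later steps such as cooccurrence and data tables have something to read.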
27th August 2020 Thursday
PMR,GY,priya,kareena,rajan,ambreen,vaishali,vanisha,charles
- Catchup by all interns and review of each miniproject
- Discussion on COAR Presentation by Ambreen (reviews by others on ppt)
- Discussion by PMR on Cambridge Hackathon (online hackathon on genomics, bioinformatics, databases). Project: Cambridge-India "openVirus"
- Put together software, databases, dictionaries, so that people can see it in a virtual environment (Virtual sensors)
- PMR Debugging people's problems using share screen
7th September 2020 Monday
PMR,priya,kareena,rajan,ambreen,vaishali,vanisha,charles, dheeraj, anugrah, shweta
- Welcome new interns Anugrah and Shweta
- Standup by interns (those who are present)
- Review for COAR presentation (10th) given by Ambreen. Registration to be done by each intern
- Changes made in COAR ppt
- Debugging people's problems and resolving issues
PMR, Rajan, Vanisha, Shweata, Ambreen, Anugrah, Ayush, Mukul, Dheeraj
- Future directions for the project: labelling dictionaries with associated concepts (related terms, broader terms), and finding the main subject in relevant papers.
- Preprint, Introduction of Hypergraph.
- Review of Ambreen's Jupyter Notebook(ML).
- New members were allocated mini-projects. Ayush to work with Ambreen on Countries, and Mukul to work with Kareena on Virus.
- The openVirus repository is becoming very large. New GitHub repository dedicated specifically to Dictionaries and Mini-corpora.
- New Dictionary Manager - Rajan
- New repository exclusively for dictionaries
- Ayush went over the code he had written to display frequencies from results.xml
- Enhance the dashboard to include links to Wikidata, and make it multilingual.
- Discussed the rough outline for Wikicite presentation. More information can be found here
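For the frequency display from results.xml mentioned above, a minimal sketch is given here. It is not Ayush's code. It assumes each match is a `result` element whose matched text is in an `exact` attribute; the sample content is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample; the element and attribute names are assumptions.
SAMPLE_RESULTS = """<results title="country">
  <result pre="cases reported in " exact="India" post=" during 2020"/>
  <result pre="hospitals across " exact="Brazil" post=" were surveyed"/>
  <result pre="a cohort from " exact="India" post=" was followed"/>
</results>"""


def count_matches(results_xml):
    """Count how often each matched term occurs in a results.xml string."""
    root = ET.fromstring(results_xml)
    return Counter(
        r.get("exact") for r in root.iter("result") if r.get("exact")
    )


counts = count_matches(SAMPLE_RESULTS)
for term, n in counts.most_common():
    print(term, n)
```

The resulting `Counter` can be passed on to a plotting library or written out as CSV for a dashboard.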
PMR, GY, Shweata, Aishwarya, Ambreen, Ayush, Vanisha, Vaishali, Dheeraj, Kareena, Rajan, Anugrah
- Discussed new and exciting directions for the project. The mini-projects will continue to be worked on. In addition, the project will have a Plant Science component in the future.
- We will also have new software projects:
- `getpapers` in Python
- `ami-search` in Python
- `ami-words` in Python
- display in Python
- containerisation using Docker
- Dictionary testing
Each of these software projects will have an issue, and it will have to:
- collect mini teams
- specify goals in detail
- propose an architecture
- build proof-of-concept (PoC)
- Test-driven development (TDD)
- Briefly went over the test Jupyter Notebook PMR had written. We then reviewed various libraries useful for our text mining purposes. Link to the notebook discussed can be found here
PMR, Ayush, Dheeraj, Vanisha, Vaishali, Ayush, Shweata, Mukul, Rajan
- Getting to know each other's computational backgrounds.
- New Repository for development purposes
- Algorithms + Data Structures (`ami` dictionaries, CProject, in our case) = Programs
- JATS (Journal Archiving and Interchange Tag Set) https://jats.nlm.nih.gov/archiving/tag-library/1.3d1/element/arc-elem-sec-intro.html
- What are the problems that we run into when we use terms (given by the dictionary) to search the papers?
- Synonyms: that's where Wikidata is helpful
- Not knowing the context in which the terms are used
- General concepts (like 'illness', or anything that can't be represented in a term) can't be retrieved from papers easily
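To illustrate the synonym problem listed above, here is a small sketch of a dictionary-based search in which every synonym maps back to its main term. The toy dictionary, the function names and the matching rules (case-insensitive, whole words) are assumptions for illustration, not how `ami` does it.

```python
import re


def build_matcher(dictionary):
    """Compile one regex for all terms and synonyms; map each variant to its term."""
    lookup = {}
    for term, synonyms in dictionary.items():
        for variant in [term, *synonyms]:
            lookup[variant.lower()] = term
    # Longest variants first so multi-word names win over shorter ones.
    variants = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(v) for v in variants) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def find_terms(text, dictionary):
    """Return the main term for every term or synonym found in the text."""
    pattern, lookup = build_matcher(dictionary)
    return [lookup[m.group(1).lower()] for m in pattern.finditer(text)]


# Toy dictionary: main term -> synonyms (illustrative only).
toy = {
    "COVID-19": ["coronavirus disease 2019"],
    "influenza": ["flu"],
}
hits = find_terms("Flu and coronavirus disease 2019 spread differently.", toy)
print(hits)
```

This handles synonyms only. It still knows nothing about the context in which a term is used, and it cannot find general concepts that have no fixed wording, which are the other two problems in the list.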
- EPMC (where you get data from, in JATS) -> clean, classify -> text mining
- Text mining tools in Python: `nltk`, `textblob`, `glob`
- Went over PMR's Jupyter Notebook. https://github.com/petermr/ami3/blob/master/src/ipynb/text.ipynb
- Presentation(MPhil Computational Biology Seminar Series) on Wednesday. PMR, Ambreen and Shweata to present.
- Discussed the current status of each member's project
- https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
- https://docs.python.org/3/howto/unicode.html
- Have classes in Python. Classes have attributes and methods.
- Everybody shared their ideas for new software tools and described each in a sentence. The following are the ideas that came up.
- Words tool (Keywords and Stopwords) - Ayush and Aishwarya
- Dictionary Editor - Shweata
- Sections Search - PMR
- Summarizer - Mukul
- Wikipedia links - Rajan
- Enhanced Display - Dheeraj
Dictionary Editor, Sections Search and Words tool are the most relevant to the current requirements of the project.
We first decided to work on Dictionary Editor.
- Dictionary Editor: The members present in the meeting came up with the list below of items that the Dictionary Editor needs to handle.
- Remove unnecessary terms
- Duplicate terms
- Heteronyms
- Collaborative editor
- Context
- Versioning system
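One item from the list above, duplicate terms, could be detected with something like the following sketch. The function name and the idea of treating terms as duplicates when they differ only in case or spacing are assumptions, not agreed requirements.

```python
from collections import defaultdict


def find_duplicates(terms):
    """Group terms that are identical once case and extra spaces are ignored."""
    groups = defaultdict(list)
    for term in terms:
        key = " ".join(term.split()).casefold()
        groups[key].append(term)
    # Keep only the groups with more than one spelling of the same term.
    return {key: found for key, found in groups.items() if len(found) > 1}


duplicates = find_duplicates(["Face mask", "face  mask", "Ventilator"])
print(duplicates)
```

An editor would probably report such groups to a person rather than delete anything automatically.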
Tasks:
- Ayush: To find more about Version Control in GitHub
- Unit Test
- Dictionary Editor- Opened an issue
- Review of what a Dictionary is
- Dictionaries are in .XML format
- The root element is Dictionary, and it must have a title. It contains a number of entry elements. Each entry element can have a large number of attributes.
- Synonyms are child elements under entry
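The structure reviewed above can be sketched as a small XML sample with a few basic checks. This is not the project's schema or validation notebook: the lower-case element names, the `title`, `term` and `wikidataID` attribute names, and the function names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative dictionary; element and attribute names are assumptions.
SAMPLE_DICTIONARY = """<dictionary title="country">
  <entry term="India" wikidataID="Q668">
    <synonym>Bharat</synonym>
  </entry>
  <entry term="Brazil" wikidataID="Q155"/>
</dictionary>"""


def check_dictionary(xml_text):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "dictionary":
        problems.append(f"root element is <{root.tag}>, expected <dictionary>")
    if not root.get("title"):
        problems.append("root element has no title attribute")
    entries = root.findall("entry")
    if not entries:
        problems.append("no entry elements found")
    for i, entry in enumerate(entries, start=1):
        if not entry.get("term"):
            problems.append(f"entry {i} has no term attribute")
    return problems


def synonyms_by_term(xml_text):
    """Map each entry's term to the text of its synonym child elements."""
    root = ET.fromstring(xml_text)
    return {
        entry.get("term"): [s.text for s in entry.findall("synonym")]
        for entry in root.findall("entry")
    }


print(check_dictionary(SAMPLE_DICTIONARY))
print(synonyms_by_term(SAMPLE_DICTIONARY))
```

Returning a list of problems rather than stopping at the first one makes it easier to report everything that is wrong with a dictionary in a single pass.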
- `update.ipynb`: a Jupyter Notebook to validate the dictionaries.
- What we hope to do is to validate our dictionary against the openvirus schema in Jupyter Notebook
- Find out where the latest dictionaries are. Moved the latest (new) ones to the `dictionary` repository.
- We then checked and validated those dictionaries using the Jupyter Notebook Peter had written (available on our dictionary repository).
- Dictionary specification: PMR created an issue https://github.com/petermr/dictionary/issues/2
- What a dictionary contains: dictionary elements (attributes, child elements, entry, etc.)
- How software is developed in practice: how "customers" provide formal requirements and the implementer creates and tests the code.
- A brief discussion on Regular expressions (RegExp)
- Miniprojects updates:
- Aishwarya to work with Dheeraj on the miniproject: "Diseases".
- Ayush to work with Vanisha on "Test and trace".
- Anugrah to work on "Non-pharmaceutical interventions".
- Discussed the recent problem that we encountered with a SPARQL query, as reported by Dheeraj. More info here: https://www.wikidata.org/wiki/Wikidata:Request_a_query#Re-running_queries_on_earlier_versions_of_Wikidata
- We should record our Wikidata Queries so that we don't encounter similar problems.
- Reviewed Test and Trace dictionary created by Vanisha. Synonyms and language equivalents need to be added.
- Reviewed Country, Organisation and Disease dictionary as well.
- Related items need to be added in the organisation dictionary.
- PMR raised several questions about Wikidata to the Wikimedia community.
- Communal Tasks:
- Retrieve entries for the list of Q Ids to add synonyms, language equivalents, etc.
- Ambreen to draft a list of ancillary files for creating and maintaining dictionaries.
- Examples:
- Jupyter Notebook
- MD for explaining the files, names and purposes. (Converge on a communal naming scheme),
- SPARQL query,
- SPARQL-XML output
- We had a chat about the importance of AI in Science. We are building some of the foundations of this revolution. We also discussed DeepMind's recent breakthrough on the protein folding problem.
- People were added to the dictionary repository.
- https://github.com/petermr/dictionary/issues/3 We now have a way to save all our queries (with the help of a RESTful URL) in the dictionary itself. See the comments on this issue to know more.
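As a rough illustration of saving a query as a RESTful URL, the sketch below encodes a SPARQL query into a Wikidata Query Service URL and decodes it again. The exact form used in the issue is not reproduced here; the function names and the example query are assumptions.

```python
from urllib.parse import quote, unquote

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def query_to_url(sparql):
    """Encode a SPARQL query as a single URL that can be stored as text."""
    return f"{WDQS_ENDPOINT}?query={quote(sparql, safe='')}"


def url_to_query(url):
    """Recover the SPARQL query from a URL made by query_to_url."""
    return unquote(url.split("?query=", 1)[1])


# Example query; the item ID is only an illustrative value.
sparql = "SELECT ?item WHERE { ?item wdt:P31 wd:Q12136 } LIMIT 10"
url = query_to_url(sparql)
print(url)
print(url_to_query(url))
```

Storing the URL alongside the dictionary records exactly which query produced it, though re-running the query later may still return different results if Wikidata has changed in the meantime.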
- We were joined by Matthew Dunstan, today. Peter demonstrated the progress on the battery project so far.
- We are starting to come up with a unified Dictionary Naming Scheme. Follow the link to know more. https://github.com/petermr/dictionary/wiki/Dictionary:-Naming-Scheme
- One person running a project is always fragile. Communal projects are the way to go.
- Mini-Tech Project:
- Dictionary-based search
- Updating dictionaries
- Divide ourselves into groups to work on the mini-tech project.
Updating Dictionary | Dictionary-Based Search |
---|---|
Shweata | Aishwarya |
Dheeraj | Ayush |
Ambreen | Anugrah |
Rajan | Vanisha |
New PlantScience Intern | Vaishali |
- Ayush, Ambreen: Tech-Lead
- Aishwarya, Shweata: Project Manager (keeping records and keeping things up to date: are the unit tests* passing? Are there tutorials? Where is everyone at right now? and so on)
- *Unit tests are important: automated tests on an application to ensure that it meets the intended design.
- alpha testing -> Preliminary tests for software.
- beta testing -> Find and report errors. The errors could involve keyboards, characters, time zones, dates or the file system.
- REPORT PROBLEM. Don't try to fix it yourself.
- Wiki pages for each of our two tech projects
- Requirements for Search Mini-Project:
- a way of determining the type of an article (scientific article, review, comment, editorial ...).
- a way of identifying and (re)naming sections
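The second requirement, identifying and (re)naming sections, might look something like the sketch below for articles in JATS, where sections are `sec` elements with a `title` child. The mapping table, the fallback name "other" and the function names are assumptions; the sample article is made up.

```python
import re
import xml.etree.ElementTree as ET

# Made-up JATS fragment for illustration.
SAMPLE_JATS = """<article><body>
  <sec><title>1. Introduction</title><p>Text.</p></sec>
  <sec><title>Materials and Methods</title><p>Text.</p></sec>
  <sec><title>RESULTS</title><p>Text.</p></sec>
  <sec><title>Funding</title><p>Text.</p></sec>
</body></article>"""

# Hypothetical mapping from titles seen in papers to agreed section names.
CANONICAL = {
    "introduction": "introduction",
    "background": "introduction",
    "methods": "methods",
    "materials and methods": "methods",
    "results": "results",
    "discussion": "discussion",
}


def normalise_title(title):
    """Strip leading numbering, lower-case, and look up the agreed name."""
    cleaned = re.sub(r"^[\d.\s]+", "", title).strip().lower()
    return CANONICAL.get(cleaned, "other")


def section_names(jats_xml):
    """Return (original title, agreed name) for each top-level body section."""
    root = ET.fromstring(jats_xml)
    return [
        (sec.findtext("title", default=""),
         normalise_title(sec.findtext("title", default="")))
        for sec in root.findall("./body/sec")
    ]


sections = section_names(SAMPLE_JATS)
for original, agreed in sections:
    print(f"{original!r} -> {agreed}")
```

Keeping the mapping in a plain table fits the point about agreeing on data first: the table can be reviewed and extended without touching the code.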
- PMR: We need to agree on data and not the code.
- PMR suggested the use of PyCharm for writing code
- Ambreen demonstrated Smart Sheep Breeder (Decision Support System developed by her), received feedback.
- All new topics will be discussed in Discussions rather than Slack.
- Discussed the use of unittests and PMR demonstrated with an example (test code)
- Jupyter Notebook, though useful on several fronts, isn't scalable.
- https://dev.to/codemouse92/series/290 -> We will follow the structures given in this series.
- Structuring projects is really important and often not talked about.
- PMR created a new project in dictionary repository.
- unittest
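For readers new to `unittest`, a minimal sketch of a test file is given here. It is not the example PMR demonstrated; the helper `normalise_term` and the test names are hypothetical.

```python
import unittest


def normalise_term(term):
    """Trim and collapse whitespace and lower-case a term (hypothetical helper)."""
    cleaned = " ".join(term.split()).lower()
    if not cleaned:
        raise ValueError("term must not be empty")
    return cleaned


class NormaliseTermTest(unittest.TestCase):
    def test_trims_and_collapses_whitespace(self):
        self.assertEqual(normalise_term("  face   mask "), "face mask")

    def test_lower_cases(self):
        self.assertEqual(normalise_term("Ventilator"), "ventilator")

    def test_empty_term_is_rejected(self):
        with self.assertRaises(ValueError):
            normalise_term("   ")


# Run the tests without exiting the interpreter, so this also works in a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormaliseTermTest)
result = unittest.TextTestRunner(verbosity=1).run(suite)
```

In a structured project the function would live in its own module and the tests in a separate `test_*.py` file, run with `python -m unittest`.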
Records for the meetings are now moved to https://github.com/petermr/dictionary/wiki/Records-of-Meetings
All further meetings are recorded in the dictionary repository.