
Coding Sessions: Meeting Record

ShweataNHegde edited this page Jun 23, 2021 · 14 revisions

Meeting Record 1

Date: June 1 2021

Participants: PMR, Shweata, Chaitanya, Sagar, Radhu

Key Points

  • Changes were made to ami_gui.py and search_lib.py in order to add functionality to the pre-existing code.
  • PMR talked about the importance of debugging and logging in Python.
  • Different development methodologies were discussed, pair programming being the primary one. Test-driven development was also discussed briefly.
  • ami_gui.py was tested with different ami dictionaries (such as plant activity and plant compound). A Python dictionary was created to record the hits/matches and the ctrees and sections in which they occur. It was then pretty-printed with icecream, a Python package.
  • The next step would be to store the data on hits in a .xml or .json format.
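The hit-recording and storage steps above could be sketched roughly as follows. This is only an illustration, not the code in ami_gui.py or search_lib.py: the function names, the dictionary layout, and the sample terms and ctree names are all assumptions. The session used icecream for pretty-printing; the sketch uses the standard-library `pprint` so that it runs without extra installs (with icecream installed, `from icecream import ic; ic(hits)` would play the same role).

```python
import json
from collections import defaultdict
from pprint import pprint


def new_hits():
    """Create an empty store: dictionary name -> term -> list of locations."""
    return defaultdict(lambda: defaultdict(list))


def record_hit(hits, dictionary, term, ctree, section):
    """Record that `term` from `dictionary` was found in `section` of `ctree`."""
    hits[dictionary][term].append({"ctree": ctree, "section": section})


# Hypothetical sample data; real names would come from the search itself.
hits = new_hits()
record_hit(hits, "plant_compound", "limonene", "ctree_A", "results")
record_hit(hits, "plant_compound", "limonene", "ctree_B", "abstract")
record_hit(hits, "plant_activity", "antibacterial", "ctree_A", "discussion")

# Serialise to JSON text; defaultdict is a dict subclass, so json accepts it.
hits_json = json.dumps(hits, indent=2, sort_keys=True)

# Pretty-print the plain-dict form for inspection during debugging.
pprint(json.loads(hits_json))
```

Writing `hits_json` to a file would give the .json output mentioned as the next step; an .xml form would need a separate serialiser.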

Immediate Tasks

  • Radhu and others: Smoke test the latest ami_gui.py
  • Chaitanya: Explore pygetpapers

Meeting Record 2

Date: June 7 2021

Attendees: PMR, Kanishka, Radhu, Sagar, Shweata, Bhavini, Chaitanya, Vasant

Key Points:

  • Interns whose internships are about to end should focus on their deliverables and thesis. Please also prepare a 2-3 minute video presentation explaining your mini project/literature for interns who join us in the future (keep it simple, so that people from all domains can follow it).
    "Your research is only as important as your documentation of it." -PMR

  • Initiated the process of preparing a workflow for the machine learning mini project which aims to classify text, develop labels and cluster similar scientific literature.

  • We are, potentially, looking to do text classification at:

    • document level: how is a given paper similar to other papers
    • section level: how are sections within a paper different from each other based on the words used.
  • Paragraphs are well-defined, both in EPMC and PDFs, and will be our basic unit.

  • Standard machine learning methods will be used, since scientific literature is much more structured than other online text such as tweets (which might require deep learning models). TF-IDF and a count vectorizer are candidate tools for our purposes.

  • Step 1 of workflow: Come up with a list of useful labels.
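To make the count-vector and TF-IDF ideas above concrete, here is a small standard-library sketch that treats each paragraph as a document. It is not the project's implementation: the sample paragraphs and function names are invented, and the weighting is the plain textbook form (term frequency times log(N / document frequency)), which differs from the smoothed variants that libraries such as scikit-learn apply by default.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_vectors(paragraphs):
    """One bag-of-words Counter per paragraph (a simple count vectorizer)."""
    return [Counter(tokenize(p)) for p in paragraphs]


def tfidf_vectors(paragraphs):
    """Weight each term by tf * log(N / df); terms in every paragraph get 0."""
    counts = count_vectors(paragraphs)
    n_docs = len(counts)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each paragraph counts once per term
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append(
            {t: (n / total) * math.log(n_docs / doc_freq[t]) for t, n in c.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented paragraphs standing in for paragraphs taken from papers.
paragraphs = [
    "Essential oils were extracted from the plant leaves.",
    "The extracted oils showed antibacterial activity.",
    "All participants gave informed consent for the study.",
]
vecs = tfidf_vectors(paragraphs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

In this toy data the first two paragraphs share weighted terms ("oils", "extracted"), while the third shares only "the", which occurs in every paragraph and therefore carries zero weight.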

TODO:

  • Improve documentation of pyami. Emphasis on debugging.
  • Jupyter Notebooks can come in handy for the machine learning project. Merits: Ease of packaging.

Meeting Record 3

Date: June 16 2021

Participants: Shweata, Bhavini, Ayush, Radhu, Sagar, PMR

Key Points

  • Code review of the ethics statement named entity recognition mini project (using spaCy) by Shweata. Debugging and logging of the code were carried out. Docstrings were added to improve documentation.
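As a rough illustration of the docstring and logging practices mentioned above, the sketch below shows a documented helper that logs what it does. It is not the reviewed project code and does not use spaCy; the function name, keywords and sample paragraphs are hypothetical.

```python
import logging

logger = logging.getLogger("ethics_ner_sketch")


def find_candidate_paragraphs(paragraphs, keywords=("ethics", "ethical", "consent")):
    """Return the paragraphs that mention any keyword (case-insensitive).

    Args:
        paragraphs: list of paragraph strings to scan.
        keywords: substrings that mark a paragraph as a candidate.

    Returns:
        The matching paragraphs, in their original order.
    """
    candidates = []
    for index, paragraph in enumerate(paragraphs):
        lowered = paragraph.lower()
        matched = [k for k in keywords if k in lowered]
        # DEBUG messages show per-paragraph decisions while debugging.
        logger.debug("paragraph %d matched %s", index, matched)
        if matched:
            candidates.append(paragraph)
    logger.info("kept %d of %d paragraphs", len(candidates), len(paragraphs))
    return candidates


logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")

# Hypothetical sample text.
sample = [
    "Leaves were collected in June.",
    "The study was approved by the institutional ethics committee.",
]
selected = find_candidate_paragraphs(sample)
```

Raising the level to `logging.INFO` keeps the summary line and hides the per-paragraph detail, which is one reason to prefer logging over scattered `print` calls.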

Date: June 23, 2021

Agenda

PLEASE SUBMIT REQUESTS FOR ITEMS

  • These can be:
    • things you have done (e.g. systems to present)
    • code reviews
    • major problems
    • discussions of style and conformance
  • Please support these with Wiki pages, Github code, etc.
  • Possible items, in arbitrary order (please indicate whether you wish to present):
  • pygetpapers (@Ayush Garg)
  • pyami config.ini files and strategy (@Peter Murray-Rust); these will provide symbolic names for
    • dictionaries
    • projects
    • support files (e.g. stopwords)
  • These config files will form the basis of the pyami software. Each user will need their own config file, which they configure to point to their projects and dictionaries.
  • review of ML software (@Chaitanya Sharma). Please include what data will be input and what ancillary files are needed
  • review of data display/analysis (@Bhavini Malhotra). Please include what data will be input and what ancillary files are needed
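One way the symbolic-name idea for config.ini files could look is sketched below with the standard-library `configparser`. The section names, keys and paths are assumptions made for illustration; the actual pyami config layout was still to be presented at this meeting.

```python
import configparser

# Hypothetical config.ini content; a real file would live on the user's disk.
EXAMPLE_CONFIG = """
[dictionaries]
plant_compound = dictionaries/plant_compound.xml
plant_activity = dictionaries/plant_activity.xml

[projects]
oil_demo = projects/oil_demo

[support_files]
stopwords = support/stopwords.txt
"""


def load_symbols(config_text):
    """Parse INI text into {section: {symbolic_name: path}}."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {section: dict(parser[section]) for section in parser.sections()}


def resolve(symbols, section, name):
    """Look up the path behind a symbolic name, with a readable error."""
    try:
        return symbols[section][name]
    except KeyError:
        raise KeyError(f"no symbol {name!r} in section [{section}]") from None


symbols = load_symbols(EXAMPLE_CONFIG)
print(resolve(symbols, "dictionaries", "plant_compound"))
print(resolve(symbols, "support_files", "stopwords"))
```

With this arrangement, code refers to `plant_compound` or `stopwords` by name, and each user edits only their own config file to say where those resources are stored. Reading from a file instead of a string would use `parser.read(path)`.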

Bug tracking

Participants: PMR, Ayush, Bhavini, Chaitanya, Radhu, Sagar, Shweata
