Generate Automated Layouts for Pathways with Large-Language Model (e.g., ChatGPT) Planning #232

Open
cannin opened this issue Jan 12, 2024 · 3 comments


cannin commented Jan 12, 2024

Background

Pathway diagrams help researchers understand complex biological processes (i.e., pathways). The Systems Biology Graphical Notation (SBGN, https://sbgn.github.io/) is a formalism with a set of interconnected tools and file formats (SBGNML) for generating diagrams of these processes. A lot of pathway content exists in textual databases, and laying this content out automatically can be challenging. Manually laid out pathways tend to convey a specific narrative that is lost when using automated layout algorithms that lack an understanding of biology (https://academic.oup.com/bib/article/22/5/bbab103/6217719).
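
To make "layout" concrete: in an SBGNML file, each glyph carries a bounding box, and the layout is essentially the set of those boxes. The snippet below is a minimal sketch that reads them with the Python standard library. The two-glyph map is a hand-written, hypothetical example, and the namespace URI assumes SBGN-ML version 0.2; real files may use a different version.

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses the SBGN-ML 0.2 namespace.
SBGN_NS = {"sbgn": "http://sbgn.org/libsbgn/0.2"}

# Hypothetical two-glyph map, written by hand for illustration only.
SBGNML = """<sbgn xmlns="http://sbgn.org/libsbgn/0.2">
  <map language="process description">
    <glyph id="g1" class="macromolecule">
      <label text="RAF"/>
      <bbox x="40" y="60" w="80" h="40"/>
    </glyph>
    <glyph id="g2" class="macromolecule">
      <label text="MEK"/>
      <bbox x="240" y="60" w="80" h="40"/>
    </glyph>
  </map>
</sbgn>"""


def read_layout(sbgnml_text):
    """Return {glyph id: (label, x, y, w, h)} for every glyph that has a bbox."""
    root = ET.fromstring(sbgnml_text)
    layout = {}
    for glyph in root.findall(".//sbgn:glyph", SBGN_NS):
        bbox = glyph.find("sbgn:bbox", SBGN_NS)
        if bbox is None:
            continue  # glyph has not been placed yet
        label = glyph.find("sbgn:label", SBGN_NS)
        text = label.get("text", "") if label is not None else ""
        coords = tuple(float(bbox.get(key)) for key in "xywh")
        layout[glyph.get("id")] = (text,) + coords
    return layout


for glyph_id, entry in read_layout(SBGNML).items():
    print(glyph_id, entry)
```

An automated layout step would change only the `bbox` values (and arc endpoints, not shown here), which is why a text-producing model could in principle propose one.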

Large Language Models (LLMs; e.g., ChatGPT, LLaMA) and multimodal models (e.g., GPT-4V, LLaVA) have been used for a variety of tasks, such as answering questions and writing content, thanks to the abundance of text on which they have been trained. Using text-based formats, LLMs can also generate diagrams (https://www.mermaidchart.com/blog/posts/mermaid-chart-chatgpt-plugin-combines-generative-ai-and-smart-diagramming). Separately, the training data of ChatGPT and related models includes SBGN content, because diagrams are stored in the text-based SBGNML format.
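
As an illustration of what a text-based diagram format looks like, the sketch below serializes a small list of interactions as a Mermaid flowchart. The `to_mermaid` helper and the example edges are hypothetical and only meant to show that a diagram can be expressed as plain text, which is the kind of output an LLM can produce.

```python
def to_mermaid(edges, direction="LR"):
    """Serialize (source, target) label pairs as Mermaid flowchart text."""
    ids = {}  # label -> short node id
    lines = [f"flowchart {direction}"]
    for source, target in edges:
        for label in (source, target):
            if label not in ids:
                ids[label] = f"n{len(ids)}"
                # Declare each node once, with its label in quotes.
                lines.append(f'    {ids[label]}["{label}"]')
        lines.append(f"    {ids[source]} --> {ids[target]}")
    return "\n".join(lines)


# Hypothetical example edges.
print(to_mermaid([("RAF", "MEK"), ("MEK", "ERK")]))
```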

Recent research has shown that LLMs can be leveraged to aid in diagram generation and layout (https://github.com/aszala/DiagrammerGPT) through a two-stage process (planning then generation).
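
A rough sketch of how a planning-then-generation split could look for SBGNML follows. This is not DiagrammerGPT's implementation: `fake_llm`, `plan_layout`, `apply_layout`, the prompt wording, and the JSON plan shape are all assumptions made for illustration, and the model call is stubbed so the example runs offline. The namespace URI again assumes SBGN-ML 0.2.

```python
import json
import xml.etree.ElementTree as ET

SBGN_URI = "http://sbgn.org/libsbgn/0.2"  # assumed SBGN-ML version
NS = {"sbgn": SBGN_URI}

# Hypothetical map with no layout information yet.
UNPLACED = """<sbgn xmlns="http://sbgn.org/libsbgn/0.2">
  <map language="process description">
    <glyph id="g1" class="macromolecule"><label text="RAF"/></glyph>
    <glyph id="g2" class="macromolecule"><label text="MEK"/></glyph>
  </map>
</sbgn>"""


def fake_llm(prompt):
    """Stand-in for a real model call; always returns the same JSON plan."""
    return json.dumps({"g1": {"x": 40, "y": 60}, "g2": {"x": 240, "y": 60}})


def plan_layout(glyph_labels, llm=fake_llm):
    """Stage 1 (planning): ask the model for a coarse position per glyph id."""
    prompt = (
        "Place these pathway entities left to right in signaling order. "
        "Answer with JSON mapping each id to {x, y}.\n" + json.dumps(glyph_labels)
    )
    plan = json.loads(llm(prompt))
    missing = set(glyph_labels) - set(plan)
    if missing:
        raise ValueError(f"plan is missing glyphs: {sorted(missing)}")
    return plan


def apply_layout(sbgnml_text, plan, width=80, height=40):
    """Stage 2 (generation): write the planned positions into each bbox."""
    ET.register_namespace("", SBGN_URI)  # keep the default namespace on output
    root = ET.fromstring(sbgnml_text)
    for glyph in root.findall(".//sbgn:glyph", NS):
        pos = plan.get(glyph.get("id"))
        if pos is None:
            continue  # glyph not covered by the plan; leave it untouched
        bbox = glyph.find("sbgn:bbox", NS)
        if bbox is None:
            bbox = ET.SubElement(glyph, f"{{{SBGN_URI}}}bbox")
        bbox.set("x", str(pos["x"]))
        bbox.set("y", str(pos["y"]))
        bbox.set("w", str(width))
        bbox.set("h", str(height))
    return ET.tostring(root, encoding="unicode")


plan = plan_layout({"g1": "RAF", "g2": "MEK"})
laid_out = apply_layout(UNPLACED, plan)
print(laid_out)
```

Keeping the plan as a small JSON object, separate from the XML, means the model output can be validated (for example, checked for missing glyphs) before any file is modified.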

Goal

The goal is to build a pipeline that uses LLMs (e.g., ChatGPT) to aid the automatic layout of SBGN diagrams.

Difficulty

Easy-Medium: easy to start, difficult to do well

Size and Length of Project

medium: 175 hours
12 weeks preferred

Skills

Python

Public Repository

Potential Mentors

Augustin Luna
Adrien Rougny


Raya679 commented Jan 13, 2024

Hello @cannin,
I am Raya Chakravarty, currently pursuing my BTech in Computer Science. I am particularly interested in this issue and would like to contribute to this project during the GSoC program.

I have prior experience with Large Language Models (LLMs) and have developed a healthcare chatbot by fine-tuning LLMs, specifically Llama.

I am going through the resources and links you have provided above. Currently, I am exploring the SBGN Documentation.
Are there any additional tasks you would like me to undertake apart from these?

@7070Shreyash

Hey @cannin, my name is Shreyash, and I'm a B.Tech CSE student proficient in Python and machine learning/deep learning. I also have experience with Large Language Models (LLMs).

Having reviewed the project goal and provided resources, I'm keenly interested in contributing to this issue through the GSoC program. I'm currently immersed in the documentation and links, and I'm eager to put my skills to use.

Thanks


khanspers commented Feb 22, 2024

NRNB has been accepted as a mentoring organization for GSoC 2024. The contributor application period is March 18 – April 2. Here are some useful links:

GSoC contributor guide
NRNB project proposal template
Eligibility requirements
Full program timeline
