[2024.12.04] Our PodGPT preprint is available online! Please check it out!
[2024.7.14] Our AI Platform PodGPT is publicly available. It is an online platform for deploying our latest multimodal foundation models for STEMM education and research. Please try it out if you are interested!
[2024.7.12] We are releasing a new benchmark encompassing the latest USMLE Step 1, Step 2, Step 3, and Ethics to further advance the field. Check our database here.
[2024.7.11] We open-sourced the source code of our PodGPT: STEMM LLMs in your pocket and benchmarks for multilingual STEMM LLMs.
- PodGPT
- Installation
- Quick Start
- Performance Evaluation
- Dataset Description
- Benchmarks and Results
- Real-world Deployment
- Automatic Speech Recognition
- Dataset Builder
- Upload and Download Models
- Structure of the Code
- Citation
- Contact
- Contribution
- Acknowledgement
Our proposed PodGPT computational framework for research and education
pip install -r requirements.txt
For lightweight models (2B, 7B, 8B, and 9B), we optimize the entire model. Please check and set up the hyperparameters and Hugging Face READ/WRITE tokens in config_small.yml.
python main_small.py
For larger models (>9B), we optimize a Low-Rank Adapter (LoRA). Please check and set up the hyperparameters and Hugging Face READ/WRITE tokens in config_large.yml.
python main_large.py
We also provide support for quantizing larger models, e.g., the LLaMA 3.1 70B model, using the GPTQ algorithm and then optimizing a LoRA. After quantization, these large models can be deployed on consumer GPUs.
We can directly use the Hugging Face transformers package to conduct quantization.
python quantization_HF.py --repo "meta-llama/Meta-Llama-3.1-70B-Instruct" --bits 4 --group_size 128
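For reference, here is a minimal sketch of this route, assuming the GPTQConfig API in the transformers package; the values mirror the flags above, but the actual quantization_HF.py may differ:

```python
# A sketch of GPTQ quantization via Hugging Face transformers.
# Assumption: this mirrors quantization_HF.py only conceptually.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

repo = "meta-llama/Meta-Llama-3.1-70B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(repo)

# 4-bit weights, group size 128, calibrated on the default "c4" corpus
gptq_config = GPTQConfig(bits=4, group_size=128, dataset="c4", tokenizer=tokenizer)

# Passing a quantization config makes from_pretrained quantize on the fly
model = AutoModelForCausalLM.from_pretrained(
    repo, quantization_config=gptq_config, device_map="auto"
)
model.save_pretrained("./gptq_model")
tokenizer.save_pretrained("./gptq_model")
```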
Alternatively, we support the Python GPTQModel package for quantization. First, install it:
pip install -v gptqmodel --no-build-isolation
Then,
python quantization_GPTQModel.py "meta-llama/Llama-3.3-70B-Instruct" "./gptq_model" --bits 4 --group_size 128 --seqlen 2048 --damp 0.01 --desc_act 1 --dtype bfloat16
Alternatively, we also provide a quantization script using the Python AutoGPTQ package.
python quantization.py "meta-llama/Meta-Llama-3.1-70B-Instruct" "./gptq_model" --bits 4 --group_size 128 --desc_act 1 --dtype bfloat16 --seqlen 2048 --damp 0.01
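Conceptually, the AutoGPTQ route looks like the sketch below (assumed API; the calibration text is a placeholder, and the real script presumably uses a proper calibration corpus and the flags shown above):

```python
# Condensed AutoGPTQ flow; a sketch, not the exact quantization.py.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

repo = "meta-llama/Meta-Llama-3.1-70B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(repo)

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=True)
model = AutoGPTQForCausalLM.from_pretrained(repo, quantize_config)

# Placeholder calibration example; use a real calibration corpus in practice
examples = [tokenizer("PodGPT is a multilingual STEMM model.", return_tensors="pt")]
model.quantize(examples)
model.save_quantized("./gptq_model")
```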
Then, we need to upload the model to Hugging Face, for example,
python upload_quantized_model.py --repo "shuyuej/MedLLaMA3-70B-BASE-MODEL-QUANT" --folder_path "./gptq_model"
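Under the hood this is a standard huggingface_hub upload; a minimal sketch (the repo name is the example from the command above):

```python
# Sketch: upload a quantized model folder with huggingface_hub.
from huggingface_hub import HfApi

api = HfApi()  # reads the WRITE token from HF_TOKEN or the local login
api.create_repo("shuyuej/MedLLaMA3-70B-BASE-MODEL-QUANT", exist_ok=True)
api.upload_folder(
    repo_id="shuyuej/MedLLaMA3-70B-BASE-MODEL-QUANT",
    folder_path="./gptq_model",
    repo_type="model",
)
```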
Lastly, we optimize the LoRA module,
python main_quantization.py
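Conceptually, this step wraps the quantized base model with trainable low-rank adapters via PEFT. A minimal sketch follows; the rank, alpha, and target modules below are assumptions, and the real hyperparameters live in config_quantization.yml:

```python
# Minimal LoRA setup with PEFT; hyperparameter values are illustrative.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("./gptq_model", device_map="auto")

lora_config = LoraConfig(
    r=16,                                  # low-rank dimension (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # only adapter weights are trainable
```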
Special Notice:
- Check this solution if you cannot successfully start the model training.
- Check this solution if your adapters cannot be saved due to PEFT.
- Unexpected issues can arise in model quantization, model training, checkpoint saving, and vLLM inference. Please submit a GitHub issue if you cannot solve them. We have most likely encountered these problems before, in both single-GPU and distributed multi-GPU (e.g., 4× A100 80 GB) settings.
All inference is conducted using the vLLM engine.
We use inference.py to sequentially evaluate the performance of multiple checkpoints (models).
Please check here for more information.
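For orientation, offline generation with vLLM boils down to the following sketch (the model name and sampling settings are illustrative, not the exact inference.py configuration):

```python
# Offline batch inference with the vLLM engine; a sketch, not inference.py.
from vllm import LLM, SamplingParams

llm = LLM(model="shuyuej/DrGemma2B")   # any HF repo or local checkpoint path
params = SamplingParams(temperature=0.0, max_tokens=128)

prompts = ["<question and options> Directly answer the best option:"]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```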
We simply use "Directly answer the best option:" instead of "Answer:" to better guide LLMs to generate the best option and to make the chosen option easier to extract from the responses (a hypothetical extraction helper is sketched below the prompts). Please modify these lines if you want to try other prompts.
english_prompt = "Directly answer the best option:"
english_prompt_pubmedqa = "Directly answer yes/no/maybe:"
hindi_prompt = "सीधे सबसे अच्छे विकल्प के साथ जवाब दें:"  # Hindi: "Directly answer the best option:"
french_prompt = "Répondez directement avec la meilleure option:"  # French: "Directly answer the best option:"
spanish_prompt = "Responde directamente con la mejor opción:"  # Spanish: "Directly answer the best option:"
chinese_prompt = "直接回答最优选项:"  # Chinese: "Directly answer the best option:"
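With these prompts, the model tends to lead with the option letter, so a simple pattern match can recover it. A hypothetical helper (extract_best_option is our illustration, not the repository's actual utility):

```python
import re

def extract_best_option(response: str) -> str | None:
    """Hypothetical helper: return the first standalone option letter (A-E)."""
    match = re.search(r"\b([A-E])\b", response)
    return match.group(1) if match else None

print(extract_best_option("B. Morphine is the best option."))  # -> "B"
```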
We sequentially evaluate the performance of multiple checkpoints (models). Please note that the --eval_pretrain flag indicates whether to also evaluate the original pre-trained model.
python inference.py --mode small --eval_pretrain True --id 35166 52749 70332 87915
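In effect, the command above amounts to a loop like this sketch (evaluate is a stand-in for the real benchmark run, and the model names and paths are assumptions):

```python
# Hypothetical view of the sequential evaluation loop in inference.py.
def evaluate(model_path: str) -> None:
    print(f"Evaluating {model_path} ...")   # stand-in for the benchmark run

base_model = "google/gemma-2b-it"           # assumed pre-trained base model
checkpoint_ids = [35166, 52749, 70332, 87915]

evaluate(base_model)                        # because --eval_pretrain True
for ckpt in checkpoint_ids:                 # from --id 35166 52749 70332 87915
    evaluate(f"./save_folder/checkpoint-{ckpt}")
```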
We also offer support for running OpenAI ChatGPT inference via the API. Please enter your OpenAI API key here.
Warning
Please note that the OpenAI ChatGPT API is extremely expensive. Only use it if you have the budget for it!
python inference.py --mode chatgpt
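A minimal sketch of such a call with the official openai package (the model name is illustrative):

```python
# Sketch of a ChatGPT API call; reads OPENAI_API_KEY from the environment.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user",
               "content": "<question and options> Directly answer the best option:"}],
    temperature=0.0,
)
print(response.choices[0].message.content)
```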
Please follow our instructions to transcribe your own podcasts and build your own dataset.
The podcast data used for the continual pre-training of PodGPT:
We utilized a comprehensive set of medical benchmarks from the most widely spoken languages in the world, including English, Mandarin, French, Spanish, and Hindi.
| Language | Dataset | # test examples | # of choices | Link | Ref |
|---|---|---|---|---|---|
| English | MedExpQA | 125 | 5 | Link | Paper |
| | MedQA | 1273 | 4 | Link | Paper |
| | MedMCQA | 4183 | 4 | Link | Paper |
| | PubMedQA | 500 | 3 | Link | Paper |
| | MMLU - Anatomy | 135 | 4 | Link | Paper |
| | MMLU - Clinical Knowledge | 265 | 4 | Link | Paper |
| | MMLU - College Biology | 144 | 4 | Link | Paper |
| | MMLU - College Medicine | 173 | 4 | Link | Paper |
| | MMLU - Medical Genetics | 100 | 4 | Link | Paper |
| | MMLU - Professional Medicine | 272 | 4 | Link | Paper |
| French | MedExpQA | 125 | 5 | Link | Paper |
| | MedMCQA | 622 | 5 | Link | Paper |
| | MMLU - Anatomy | 135 | 4 | Link | Paper |
| | MMLU - Clinical Knowledge | 265 | 4 | Link | Paper |
| | MMLU - College Biology | 144 | 4 | Link | Paper |
| | MMLU - College Medicine | 173 | 4 | Link | Paper |
| | MMLU - Medical Genetics | 100 | 4 | Link | Paper |
| | MMLU - Professional Medicine | 272 | 4 | Link | Paper |
| Spanish | HEAD-QA | 2742 | 4 | Link | Paper |
| | MedExpQA | 125 | 5 | Link | Paper |
| | MMLU - Anatomy | 135 | 4 | Link | Paper |
| | MMLU - Clinical Knowledge | 265 | 4 | Link | Paper |
| | MMLU - College Biology | 144 | 4 | Link | Paper |
| | MMLU - College Medicine | 173 | 4 | Link | Paper |
| | MMLU - Medical Genetics | 100 | 4 | Link | Paper |
| | MMLU - Professional Medicine | 272 | 4 | Link | Paper |
| Chinese | MedQA-MCMLE | 3426 | 4 | Link | Paper |
| | CMMLU - Anatomy | 148 | 4 | Link | Paper |
| | CMMLU - Clinical Knowledge | 237 | 4 | Link | Paper |
| | CMMLU - College Medicine | 273 | 4 | Link | Paper |
| | CMMLU - Medical Genetics | 176 | 4 | Link | Paper |
| | CMMLU - Traditional Chinese Medicine | 185 | 4 | Link | Paper |
| | CMMLU - Virology | 169 | 4 | Link | Paper |
| Hindi | MMLU - Anatomy | 135 | 4 | Link | Paper |
| | MMLU - Clinical Knowledge | 265 | 4 | Link | Paper |
| | MMLU - College Biology | 144 | 4 | Link | Paper |
| | MMLU - College Medicine | 173 | 4 | Link | Paper |
| | MMLU - Medical Genetics | 100 | 4 | Link | Paper |
| | MMLU - Professional Medicine | 272 | 4 | Link | Paper |
For real-world deployment, please refer to the vLLM Distributed Inference and Serving and OpenAI Compatible Server documentation.
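Once an OpenAI-compatible vLLM server is up, any OpenAI client can talk to it; a sketch assuming the server's default local address and a served PodGPT checkpoint:

```python
# Query a running vLLM OpenAI-compatible server (assumed to be serving
# on localhost:8000; see the vLLM serving documentation for launch options).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="shuyuej/DrGemma2B",  # the model the server was launched with
    messages=[{"role": "user", "content": "Hello, PodGPT!"}],
)
print(response.choices[0].message.content)
```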
In this file, we provide an Automatic Speech Recognition (ASR) service:
python audio2text.py
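A minimal sketch of the transcription step, assuming the open-source whisper package (the model size and file name are placeholders; audio2text.py may use a different backend):

```python
# Sketch of podcast transcription with openai-whisper.
import whisper

model = whisper.load_model("large-v3")            # model size is an assumption
result = model.transcribe("podcast_episode.mp3")  # placeholder file name
print(result["text"])                             # full transcript as one string
```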
We used the following script to pre-process our transcripts and generate the training dataset:
python database_builder.py
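Conceptually, the builder splits each transcript into fixed-length passages and writes them as JSON lines; a hypothetical sketch (field names, paths, and chunk size are assumptions):

```python
# Hypothetical transcript-to-dataset sketch; the real database_builder.py
# may use different chunking, cleaning, and field names.
import json
from pathlib import Path

CHUNK_WORDS = 512  # assumed passage length

with open("train.jsonl", "w", encoding="utf-8") as out:
    for path in Path("transcripts").glob("*.txt"):
        words = path.read_text(encoding="utf-8").split()
        for i in range(0, len(words), CHUNK_WORDS):
            passage = " ".join(words[i:i + CHUNK_WORDS])
            out.write(json.dumps({"text": passage}, ensure_ascii=False) + "\n")
```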
In the scripts folder, we offer support for both uploading and downloading models.
To upload your checkpoints to a Hugging Face model repo,
python upload_model.py --repo "shuyuej/DrGemma2B" --id 35166 52749 70332 87915
To download your model or files from a Hugging Face repo,
python download_model.py --repo "shuyuej/DrGemma2B" --repo_type "model" --save_dir "./save_folder"
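Equivalently, a snapshot of the repo can be pulled with huggingface_hub; a minimal sketch using the same example arguments:

```python
# Sketch: download a full repo snapshot with huggingface_hub.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="shuyuej/DrGemma2B",
    repo_type="model",
    local_dir="./save_folder",
)
```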
At the root of the project, you will see:
├── config_benchmark.yml
├── config_chatgpt.yml
├── config_large.yml
├── config_quantization.yml
├── config_small.yml
├── main_large.py
├── main_quantization.py
├── main_small.py
├── lib
│ ├── data_manager.py
│ ├── evaluation.py
│ ├── model_loader_large.py
│ ├── model_loader_quantization.py
│ └── model_loader_small.py
├── inference
│ └── inference.py
├── quantization
│ ├── model_split.py
│ ├── quantization.py
│ ├── quantization_HF.py
│ └── upload_quantized_model.py
├── download_files
│ ├── download_model_from_hf.py
│ └── download_model_to_local.py
├── requirements.txt
├── benchmark
├── results
├── save_folder
├── scripts
│ ├── audio2text.py
│ ├── database_builder.py
│ ├── download_model.py
│ └── upload_model.py
└── utils
├── answer_utils.py
├── benchmark_utils.py
├── eval_utils.py
└── utils.py
If you find our work useful in your research, please consider citing it in your publications. We provide a BibTeX entry below.
@article{Jia2024podgpt,
author = {Jia, Shuyue and Bit, Subhrangshu and Searls, Edward and Lauber, Meagan V. and Claus, Lindsey A. and Fan, Pengrui and Jasodanand, Varuna H. and Veerapaneni, Divya and Wang, William M. and Au, Rhoda and Kolachalama, Vijaya B.},
title = {{PodGPT}: An audio-augmented large language model for research and education},
elocation-id = {2024.07.11.24310304},
year = {2024},
doi = {10.1101/2024.07.11.24310304},
publisher = {Cold Spring Harbor Laboratory Press},
URL = {https://www.medrxiv.org/content/early/2024/11/27/2024.07.11.24310304},
eprint = {https://www.medrxiv.org/content/early/2024/11/27/2024.07.11.24310304.full.pdf},
journal = {medRxiv}
}
Core Contributor and Maintainer (Equal Contributions):
Database Contributor and Maintainer:
If you have any questions, please drop us an email at [email protected], [email protected], and [email protected].
We always welcome contributions to help make PodGPT better. If you would like to contribute, please submit a pull request.
This repository is maintained by members of the Kolachalama Laboratory.