chemlift

Chemical language interfaced predictions using large language models.

💪 Getting Started

With ChemLIFT you can use large language models to make predictions on chemical data. You can use two different approaches:

Few-shot learning: Provide a few examples in the prompt along with the points you want to predict and the model will learn to predict the property of interest.
Fine-tuning: Fine-tune a large language model on a dataset of your choice and use it to make predictions.

Fine-tuning updates the weights of the model, while few-shot learning does not.

Few-shot learning

from chemlift.icl.fewshotclassifier import FewShotClassifier
from langchain.llms import OpenAI

llm = OpenAI()
fsc = FewShotClassifier(llm, property_name='bandgap')

# Train on a few examples
fsc.fit(['ethane', 'propane', 'butane'], [0,1,0])

# Predict on a few more
fsc.predict(['pentane', 'hexane', 'heptane'])
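
To make the idea of "examples in the prompt" concrete, here is a minimal sketch of how a few-shot prompt could be assembled from labeled molecules. The function `build_few_shot_prompt` and the prompt wording are hypothetical illustrations; they are not taken from ChemLIFT, whose actual prompt template may differ.

```python
def build_few_shot_prompt(examples, labels, query, property_name):
    """Assemble a plain-text few-shot prompt (illustrative format only)."""
    if len(examples) != len(labels):
        raise ValueError("examples and labels must have the same length")
    blocks = [f"Predict the {property_name} class of each molecule."]
    # One solved example per labeled molecule.
    for molecule, label in zip(examples, labels):
        blocks.append(f"Molecule: {molecule}\n{property_name}: {label}")
    # The final entry is left open so the model completes the label.
    blocks.append(f"Molecule: {query}\n{property_name}:")
    return "\n\n".join(blocks)


prompt = build_few_shot_prompt(
    ["ethane", "propane", "butane"], [0, 1, 0], "pentane", "bandgap"
)
print(prompt)
```

No model weights change in this approach: the labeled examples only appear as text in the prompt sent to the model.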

Fine-tuning

from chemlift.finetuning.classifier import ChemLIFTClassifierFactory

model = ChemLIFTClassifierFactory('property name',
                                    model_name='EleutherAI/pythia-1b-deduped').create_model()
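# X and y are your own data: a list of molecule representations (strings)
# and a list of class labels of the same length
model.fit(X, y)
# Predict class labels for a list of molecule representations
model.predict(X)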
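
When fine-tuning, you will usually want to judge predictions on molecules that the model did not see during training. The following sketch shows one way to hold out part of a dataset and score predictions, using only the standard library. The helpers `train_test_split` and `accuracy` and the toy labels are illustrative assumptions and are not part of ChemLIFT.

```python
import random


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle indices reproducibly and split the data into train and test parts."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(round(len(X) * test_fraction)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data with placeholder labels (not real measurements).
X = ["ethane", "propane", "butane", "pentane",
     "hexane", "heptane", "octane", "nonane"]
y = [0, 1, 0, 1, 0, 1, 0, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y)

# With a model created as above, one would then run:
#   model.fit(X_train, y_train)
#   y_pred = model.predict(X_test)
#   print(accuracy(y_test, y_pred))
```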

🚀 Installation

The most recent code and data can be installed directly from GitHub with:

$ pip install git+https://github.com/lamalab-org/chemlift.git

👐 Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.

👋 Attribution

⚖️ License

The code in this package is licensed under the MIT License.

📖 Citation

If you use ChemLIFT in your work, please cite:

@article{Jablonka_2023,
    doi = {10.26434/chemrxiv-2023-fw8n4},
    url = {https://doi.org/10.26434%2Fchemrxiv-2023-fw8n4},
    year = 2023,
    month = {feb},
    publisher = {American Chemical Society ({ACS})},
    author = {Kevin Maik Jablonka and Philippe Schwaller and Andres Ortega-Guerrero and Berend Smit},
    title = {Is {GPT}-3 all you need for low-data discovery in chemistry?}
}

🎁 Support

The work of the LAMALab is supported by the Carl-Zeiss foundation.

In addition, the work was supported by the MARVEL National Centre for Competence in Research funded by the Swiss National Science Foundation (grant agreement ID 51NF40-182892). We also acknowledge support from the USorb-DAC Project, which is funded by a grant from The Grantham Foundation for the Protection of the Environment to RMI’s climate tech accelerator program, Third Derivative.

🛠️ For Developers

This final section of the README is for those who want to get involved by making a code contribution.

Development Installation

To install in development mode, use the following:

$ git clone https://github.com/lamalab-org/chemlift.git
$ cd chemlift
$ pip install -e .

🥼 Testing

After cloning the repository and installing tox with pip install tox, the unit tests in the tests/ folder can be run reproducibly with:

$ tox

Additionally, these tests are automatically re-run with each commit in a GitHub Action.

📖 Building the Documentation

The documentation can be built locally using the following:

$ git clone https://github.com/lamalab-org/chemlift.git
$ cd chemlift
$ tox -e docs
$ open docs/build/html/index.html

The documentation build automatically installs the package as well as the docs extra specified in setup.cfg. Sphinx plugins like texext can be added there. They also need to be added to the extensions list in docs/source/conf.py.
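
As a rough illustration of the second step, a fragment of docs/source/conf.py with a plugin registered could look like the following. The exact contents of this project's extensions list are an assumption; `sphinx.ext.autodoc` ships with Sphinx and `texext.math_dollar` is one of the extensions provided by the texext package.

```python
# docs/source/conf.py (fragment, illustrative only)
# Every plugin listed here must also be installed through the docs extra
# in setup.cfg, otherwise the documentation build cannot import it.
extensions = [
    "sphinx.ext.autodoc",   # bundled with Sphinx
    "texext.math_dollar",   # from the texext package
]
```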

📦 Making a Release

After installing the package in development mode and installing tox with pip install tox, the commands for making a new release are contained within the finish environment in tox.ini. Run the following from the shell:

$ tox -e finish

This script does the following:

Uses Bump2Version to remove the -dev suffix from the version number in setup.cfg, src/chemlift/version.py, and docs/source/conf.py
Packages the code in both a tar archive and a wheel using build
Uploads to PyPI using twine. Be sure to have a .pypirc file configured to avoid the need for manual input at this step
Pushes to GitHub. You'll need to create a release on GitHub for the commit where the version was bumped.
Bumps the version to the next patch. If you made big changes and want to bump the minor version instead, you can run tox -e bumpversion -- minor afterwards.
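
The version changes in the first and last steps can be sketched in plain Python. This is only an illustration of the transitions, assuming a MAJOR.MINOR.PATCH scheme in which development versions carry a -dev suffix; the helpers `strip_dev` and `bump` are hypothetical, and the real behavior is defined by the repository's Bump2Version configuration.

```python
def strip_dev(version):
    """Drop a trailing '-dev' suffix, e.g. '0.0.1-dev' -> '0.0.1'."""
    suffix = "-dev"
    return version[: -len(suffix)] if version.endswith(suffix) else version


def bump(version, part="patch"):
    """Return the next development version after a release (assumed scheme)."""
    major, minor, patch = (int(p) for p in strip_dev(version).split("."))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown version part: {part!r}")
    return f"{major}.{minor}.{patch}-dev"


release = strip_dev("0.0.1-dev")     # version that gets packaged and uploaded
next_patch = bump(release)           # default: continue on the next patch
next_minor = bump(release, "minor")  # alternative after larger changes
print(release, next_patch, next_minor)
```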
