A Python utility for wrapping Rosetta command line tools.
RosettaPy is a Python module that locates Rosetta biomolecular modeling suite binaries following a specific naming pattern and runs Rosetta from the command line. The module includes:
- An object-oriented RosettaFinder class to search for binaries.
- A RosettaBinary dataclass to represent the binary and its attributes.
- A command-line wrapper dataclass Rosetta for handling Rosetta runs.
- A RosettaScriptsVariableGroup dataclass to represent Rosetta scripts variables.
- A simplified result analyzer RosettaEnergyUnitAnalyser to read and interpret Rosetta output score files.
- A series of example applications that follow the design elements and patterns described above:
  - PROSS
  - FastRelax
  - RosettaLigand
  - Supercharge
  - MutateRelax
  - Cartesian ddG (in progress)
- Unit tests to ensure reliability and correctness.
- Flexible Binary Search: Finds Rosetta binaries based on their naming convention.
- Platform Support: Supports Linux and macOS operating systems.
- Customizable Search Paths: Allows specification of custom directories to search.
- Structured Binary Representation: Uses a dataclass to encapsulate binary attributes.
- Command-Line Shortcut: Provides a quick way to find binaries via the command line.
- Available on PyPI: Installable via pip without the need to clone the repository.
- Unit Tested: Includes tests for both classes to ensure functionality.
The binaries are expected to follow this naming pattern:
rosetta_scripts[[.mode].oscompilerrelease]
- Binary Name: rosetta_scripts (default) or specified.
- Mode (optional): default, mpi, or static.
- OS (optional): linux or macos.
- Compiler (optional): gcc or clang.
- Release (optional): release or debug.
Examples of valid binary filenames:
- rosetta_scripts (dockerized Rosetta)
- rosetta_scripts.linuxgccrelease
- rosetta_scripts.mpi.macosclangdebug
- rosetta_scripts.static.linuxgccrelease
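The naming pattern above can be checked with a regular expression. The following is a minimal sketch, not RosettaPy's internal implementation: the pattern `BINARY_PATTERN` and the helper `parse_binary_filename` are hypothetical names introduced here, and the sketch assumes binary names consist only of letters, digits, and underscores.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for illustration; RosettaPy may use a different one.
# Assumes the binary name contains only letters, digits, and underscores.
BINARY_PATTERN = re.compile(
    r"^(?P<binary_name>[A-Za-z0-9_]+)"
    r"(?:"
    r"(?:\.(?P<mode>default|mpi|static))?"  # optional mode
    r"\.(?P<os>linux|macos)"                # OS
    r"(?P<compiler>gcc|clang)"              # compiler
    r"(?P<release>release|debug)"           # release type
    r")?$"
)


def parse_binary_filename(filename: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a filename into its naming-pattern parts, or return None if it does not match."""
    match = BINARY_PATTERN.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name in (
        "rosetta_scripts",
        "rosetta_scripts.linuxgccrelease",
        "rosetta_scripts.mpi.macosclangdebug",
        "rosetta_scripts.txt",
    ):
        print(name, "->", parse_binary_filename(name))
```

Parts that are absent from a filename come back as `None`, so a bare `rosetta_scripts` still matches while an unrelated suffix such as `.txt` is rejected.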
Ensure you have Python 3.8 or higher installed.
You can install RosettaPy directly from PyPI:
pip install RosettaPy -U
RosettaPy provides a command-line shortcut to quickly locate Rosetta binaries.
After installing RosettaPy, you can use the whichrosetta command in your terminal.
whichrosetta <binary_name>
Example:
To find the relax binary:
relax_bin=$(whichrosetta relax)
echo $relax_bin
This command assigns the full path of the relax binary to the relax_bin variable and prints it.
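The same shortcut can be called from a Python script when importing RosettaPy directly is not convenient. This is a minimal sketch under stated assumptions: the helper `locate_with_cli` is a hypothetical name, and the sketch assumes that `whichrosetta` prints the path to standard output (as the shell example suggests) and exits with a non-zero status when nothing is found, which the text above does not confirm.

```python
import shutil
import subprocess
from typing import Optional


def locate_with_cli(binary_name: str, tool: str = "whichrosetta") -> Optional[str]:
    """Return the path printed by `tool binary_name`, or None if unavailable.

    Assumption: the tool writes the path to stdout and signals failure
    with a non-zero exit code.
    """
    # Skip the call entirely if the command is not installed.
    if shutil.which(tool) is None:
        return None
    result = subprocess.run(
        [tool, binary_name],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    path = result.stdout.strip()
    return path or None


if __name__ == "__main__":
    print(locate_with_cli("relax"))
```

Returning `None` instead of raising keeps the caller free to fall back to another lookup method.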
You can also use RosettaPy in your Python scripts.
from RosettaPy import RosettaFinder, RosettaBinary
# Initialize the finder (optional custom search path)
finder = RosettaFinder(search_path='/custom/path/to/rosetta/bin')
# Find the binary (default is 'rosetta_scripts')
rosetta_binary = finder.find_binary('rosetta_scripts')
# Access binary attributes
print(f"Binary Name: {rosetta_binary.binary_name}")
print(f"Mode: {rosetta_binary.mode}")
print(f"OS: {rosetta_binary.os}")
print(f"Compiler: {rosetta_binary.compiler}")
print(f"Release: {rosetta_binary.release}")
print(f"Full Path: {rosetta_binary.full_path}")
# Imports
import os
from RosettaPy import Rosetta, RosettaScriptsVariableGroup, RosettaEnergyUnitAnalyser
# Create a Rosetta object with the desired parameters
rosetta = Rosetta(
    bin="rosetta_scripts",
    flags=[...],
    opts=[
        "-in:file:s", os.path.abspath(pdb),
        "-parser:protocol", "/path/to/my_rosetta_scripts.xml",
    ],
    output_dir=...,
    save_all_together=True,
    job_id=...,
)
# Run with the Rosetta tasks
tasks = [  # Create one task per variant
    {
        "rsv": RosettaScriptsVariableGroup.from_dict(
            {
                "var1": ...,
                "var2": ...,
                "var3": ...,
            }
        ),
        "-out:file:scorefile": f"{variant}.sc",
        "-out:prefix": f"{variant}.",
    }
    for variant in variants
]
# Run the tasks
rosetta.run(inputs=tasks)
# Or create distributed runs labeled by structure number (-nstruct)
options = [...]  # Optional list of options applied to every structure model
rosetta.run(nstruct=nstruct, inputs=options)
# Analyze the results
analyser = RosettaEnergyUnitAnalyser(score_file=rosetta.output_scorefile_dir)
best_hit = analyser.best_decoy
pdb_path = os.path.join(rosetta.output_pdb_dir, f'{best_hit["decoy"]}.pdb')
print("Analysis of the best decoy:")
print("-" * 79)
print(analyser.df.sort_values(by=analyser.score_term))
print("-" * 79)
print(f'Best Hit on this run: {best_hit["decoy"]} - {best_hit["score"]}: {pdb_path}')
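To make the analysis step more concrete, the sketch below reads a Rosetta-style score file by hand and picks the lowest-scoring decoy. It is not the RosettaEnergyUnitAnalyser implementation: the helpers `read_score_rows` and `best_decoy` are hypothetical names, the example scores are made up for illustration, and the sketch assumes the usual score file layout in which the first `SCORE:` line holds the column names and the last column is `description`.

```python
import io
from typing import Dict, Iterable, List


def read_score_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse `SCORE:` lines into dictionaries keyed by column name."""
    header = None
    rows = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE: and any other lines
        fields = line.split()[1:]
        if header is None:
            header = fields  # the first SCORE: line names the columns
            continue
        if len(fields) != len(header):
            continue  # skip truncated or malformed rows
        rows.append(dict(zip(header, fields)))
    return rows


def best_decoy(rows: List[Dict[str, str]], score_term: str = "total_score") -> Dict[str, str]:
    """Return the row with the lowest value for score_term (lower energy is better)."""
    return min(rows, key=lambda row: float(row[score_term]))


# Made-up example data, for illustration only.
EXAMPLE = """SEQUENCE:
SCORE: total_score fa_atr description
SCORE: -250.5 -400.1 variant_0001
SCORE: -262.3 -410.7 variant_0002
"""

rows = read_score_rows(io.StringIO(EXAMPLE))
best = best_decoy(rows)
print(best["description"], best["total_score"])  # prints: variant_0002 -262.3
```

The analyser in RosettaPy exposes a DataFrame (`analyser.df`) instead of plain dictionaries, so real code would normally go through that interface.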
The RosettaFinder searches the following directories by default:
- PATH, which is commonly used in dockerized Rosetta images.
- The path specified in the ROSETTA_BIN environment variable.
- ROSETTA3/bin
- ROSETTA/main/source/bin/
- A custom search path provided during initialization.
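The list above can be sketched as a small directory-scanning routine. This is an illustration rather than RosettaFinder's actual code: `candidate_dirs` and `find_first` are hypothetical helpers, the search order simply follows the list, and treating ROSETTA3 and ROSETTA as environment variables is an assumption.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_dirs(custom: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None) -> List[Path]:
    """Collect search directories in the order of the list above."""
    env = os.environ if env is None else env
    dirs: List[Path] = []
    # Every entry on PATH (typical for dockerized Rosetta).
    dirs.extend(Path(p) for p in env.get("PATH", "").split(os.pathsep) if p)
    if env.get("ROSETTA_BIN"):
        dirs.append(Path(env["ROSETTA_BIN"]))
    # Assumption: ROSETTA3 and ROSETTA are environment variables.
    if env.get("ROSETTA3"):
        dirs.append(Path(env["ROSETTA3"]) / "bin")
    if env.get("ROSETTA"):
        dirs.append(Path(env["ROSETTA"]) / "main" / "source" / "bin")
    if custom:
        dirs.append(Path(custom))
    return dirs


def find_first(binary_name: str, dirs: List[Path]) -> Optional[Path]:
    """Return the first file named binary_name or binary_name.<suffix>."""
    for directory in dirs:
        if not directory.is_dir():
            continue
        for entry in sorted(directory.iterdir()):
            if entry.is_file() and (
                entry.name == binary_name
                or entry.name.startswith(binary_name + ".")
            ):
                return entry
    return None


if __name__ == "__main__":
    print(find_first("rosetta_scripts", candidate_dirs()))
```

Passing `env` explicitly makes the directory list easy to inspect without touching the real environment.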
The project includes unit tests using Python's pytest framework.
- Clone the repository (if not already done):
  git clone https://github.com/YaoYinYing/RosettaPy.git
- Navigate to the project directory:
  cd RosettaPy
- Run the tests:
  python -m pytest ./tests
Contributions are welcome! Please submit a pull request or open an issue for bug reports and feature requests.
This project is licensed under the MIT License.
- Rosetta Commons: The Rosetta software suite for the computational modeling and analysis of protein structures.
For questions or support, please contact:
- Name: Yinying Yao
- Email: yaoyy.hi(a)gmail.com