diff --git a/.nojekyll b/.nojekyll deleted file mode 100644 index e69de29..0000000 diff --git a/404.html b/404.html deleted file mode 100644 index 6478616..0000000 --- a/404.html +++ /dev/null @@ -1,679 +0,0 @@ - - - -
- - - - - - - - - - - - - - -All notable changes to this project will be documented in this file.
-The format is based on Keep a Changelog, -and this project adheres to Semantic Versioning.
-We have completely reworked the data module. -Depthcharge now uses Apache Arrow-based formats instead of HDF5; spectra are converted either to Parquet or streamed with PyArrow, optionally into Lance datasets.
-pyarrow.RecordBatch
objects.custom_fields
parameter.SpectrumDataset
and its subclasses have been moved to the spectra_to_*
functions in the data module.SpectrumDataset
and its subclasses now return dictionaries of data rather than a tuple of data. This allows us to incorporate arbitrary additional data. StreamingSpectrumDataset
for fast inference.spectra_to_df
, spectra_to_parquet
, spectra_to_stream
to the depthcharge.data
module.FloatEncoder
and PeakEncoder
.tgt_mask
in the PeptideTransformerDecoder
was the incorrect type.
- Now it is bool
as it should be.
- Thanks @justin-a-sanders!spectrum_utils
and pyteomics
.spectrum_utils
💪MassEncoder
is now FloatEncoder
, because it's generally useful for encoding floating-point numbers.detokenize()
method now returns a list instead of a string.We as members, contributors, and leaders pledge to make participation in our -community a harassment-free experience for everyone, regardless of age, body -size, visible or invisible disability, ethnicity, sex characteristics, gender -identity and expression, level of experience, education, socio-economic status, -nationality, personal appearance, race, religion, or sexual identity -and orientation.
-We pledge to act and interact in ways that contribute to an open, welcoming, -diverse, inclusive, and healthy community.
-Examples of behavior that contributes to a positive environment for our -community include:
-Examples of unacceptable behavior include:
-Community leaders are responsible for clarifying and enforcing our standards of -acceptable behavior and will take appropriate and fair corrective action in -response to any behavior that they deem inappropriate, threatening, offensive, -or harmful.
-Community leaders have the right and responsibility to remove, edit, or reject -comments, commits, code, wiki edits, issues, and other contributions that are -not aligned to this Code of Conduct, and will communicate reasons for moderation -decisions when appropriate.
-This Code of Conduct applies within all community spaces, and also applies when -an individual is officially representing the community in public spaces. -Examples of representing our community include using an official e-mail address, -posting via an official social media account, or acting as an appointed -representative at an online or offline event.
-Instances of abusive, harassing, or otherwise unacceptable behavior may be -reported to the community leaders responsible for enforcement at wfondrie@uw.edu. -All complaints will be reviewed and investigated promptly and fairly.
-All community leaders are obligated to respect the privacy and security of the -reporter of any incident.
-Community leaders will follow these Community Impact Guidelines in determining -the consequences for any action they deem in violation of this Code of Conduct:
-Community Impact: Use of inappropriate language or other behavior deemed -unprofessional or unwelcome in the community.
-Consequence: A private, written warning from community leaders, providing -clarity around the nature of the violation and an explanation of why the -behavior was inappropriate. A public apology may be requested.
-Community Impact: A violation through a single incident or series -of actions.
-Consequence: A warning with consequences for continued behavior. No -interaction with the people involved, including unsolicited interaction with -those enforcing the Code of Conduct, for a specified period of time. This -includes avoiding interactions in community spaces as well as external channels -like social media. Violating these terms may lead to a temporary or -permanent ban.
-Community Impact: A serious violation of community standards, including -sustained inappropriate behavior.
-Consequence: A temporary ban from any sort of interaction or public -communication with the community for a specified period of time. No public or -private interaction with the people involved, including unsolicited interaction -with those enforcing the Code of Conduct, is allowed during this period. -Violating these terms may lead to a permanent ban.
-Community Impact: Demonstrating a pattern of violation of community -standards, including sustained inappropriate behavior, harassment of an -individual, or aggression toward or disparagement of classes of individuals.
-Consequence: A permanent ban from any sort of public interaction within -the community.
-This Code of Conduct is adapted from the Contributor Covenant, -version 2.0, available at -https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
- - - - - - -First off, thank you for taking the time to contribute.
-The following document provides guidelines for contributing to the -documentation and the code of Depthcharge. No contribution is too small! Even -fixing a simple typo in the documentation is immensely helpful.
-We use mkdocs to generate our -documentation and deploy it to this site. Most of the pages on the site are -created from simple text files written in the Markdown markup language. -There are three exceptions to this:
-The API.
-The Vignettes are created from Jupyter notebooks.
-The Code of Conduct, Release Notes, Changelog, and this Contributing document are - markdown files that live in the root of the Depthcharge repository.
-The easiest way to edit a document is by clicking the “Edit on GitHub” link in -the top right hand corner of each page. You’ll be taken to GitHub where -you can click on the pencil to edit the document.
-You can then make your changes directly on GitHub. Once you’re finished, fill -in a description of what you changed and click the “Propose Changes” button.
-Alternatively, these documents live in the docs/
directory of the
-repository and can be edited like code. See Contributing to the
-code below for more details on contributing this
-way.
We welcome contributions to the source code of Depthcharge—particularly -ones that address discussed issues.
-Contributions to Depthcharge follow a standard GitHub contribution workflow:
-Create your own fork of the Depthcharge repository on GitHub.
-Clone your forked Depthcharge repository to work on locally.
-Install the pre-commit hooks. - These will automatically lint and verify that new code matches our standard formatting with each new commit. -
-Create a new branch with a descriptive name for your changes:
-Make your changes (make sure to read below first).
-Add, commit, and push your changes to your forked repository.
-On the GitHub page for your forked repository, click “Pull request” to propose - adding your changes to Depthcharge.
-We’ll review, discuss, and help you make any revisions that are required. If - all goes well, your changes will be added to Depthcharge - in the next release!
-The Depthcharge project follows the PEP 8 guidelines for Python code style. -More specifically, we use Black to automatically format code and Ruff to automatically lint Python code in Depthcharge.
-We highly recommend setting up our pre-commit hooks. -These will run Black, Ruff, and some other checks during each commit, fixing problems that can be fixed automatically. -Because we check code formatting with Black as part of our tests, setting up this hook can save you from having to revise code formatting. Take the following steps to set up the pre-commit hooks:
-Once the hook is installed, black will be run before any commit is made. If a
-file is changed by black, then you need to git add
the file again before
-finishing the commit.
When you’re ready, open a pull request with your changes and we’ll start the review process. -Thank you for your contribution! :tada:
- - - - - - -depthcharge.data
)depthcharge.data.SpectrumDataset(spectra, path=None, **kwargs)
-
-
- Bases: Dataset
, CollateFnMixin
Store and access a collection of mass spectra.
-Parse and/or add mass spectra to an index in the -lance data format. -This format enables fast random access to spectra for training. -This file is then served as a PyTorch Dataset, allowing spectra -to be accessed efficiently for training and inference.
-If you wish to use an existing lance dataset, use the from_lance()
-method.
PARAMETER | -DESCRIPTION | -
---|---|
spectra |
-
-
-
- Spectra to add to this collection. These may be a DataFrame parsed
-with
-
- TYPE:
- |
-
path |
-
-
-
- The name and path of the lance dataset. If the path does
-not contain the
-
- TYPE:
- |
-
**kwargs |
-
-
-
- Keyword arguments passed
-
- TYPE:
- |
-
ATTRIBUTE | -DESCRIPTION | -
---|---|
peak_files |
-
-
-
-
-
-
- TYPE:
- |
-
path |
-
-
-
-
-
-
- TYPE:
- |
-
path: Path
-
-
- property
-
-
-The path to the underlying lance dataset.
-peak_files: list[str]
-
-
- property
-
-
-The files currently in the lance dataset.
-__del__()
-
-Cleanup the temporary directory.
- -__getitem__(idx)
-
-Access a mass spectrum.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
idx |
-
-
-
- The index of the mass spectrum to look up. -
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
-
- dict
-
- |
-
-
-
- A dictionary representing a row of the dataset. Each -key is a column and the value is the value for that -row. List columns are automatically converted to -PyTorch tensors if the nested data type is compatible. - |
-
__len__()
-
-The number of spectra in the lance dataset.
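The dictionary-per-row access pattern described for `__getitem__` can be sketched with a toy map-style dataset. This is an illustrative, hypothetical class in plain Python, not the depthcharge implementation, which reads from a lance dataset and converts list columns to PyTorch tensors:

```python
# Hypothetical sketch: a minimal map-style dataset whose __getitem__
# returns a dictionary per row, mirroring the behavior described above.
class DictDataset:
    def __init__(self, columns):
        # columns: mapping of column name -> list of per-row values.
        self.columns = columns

    def __len__(self):
        # Number of rows, taken from any one column.
        return len(next(iter(self.columns.values())))

    def __getitem__(self, idx):
        # One row: a dict of column name -> value for that row.
        return {name: values[idx] for name, values in self.columns.items()}


ds = DictDataset({"mz_array": [[100.0, 200.0]], "scan_id": [1]})
row = ds[0]
```

Returning a dictionary rather than a tuple lets arbitrary extra columns flow through a data loader without changing downstream code.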
- -add_spectra(spectra, **kwargs)
-
-Add mass spectrometry data to the lance dataset.
-Note that depthcharge does not verify whether the provided spectra -already exist in the lance dataset.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
spectra |
-
-
-
- Spectra to add to this collection. These may be a DataFrame parsed
-with
-
- TYPE:
- |
-
**kwargs |
-
-
-
- Keyword arguments passed
-
- TYPE:
- |
-
from_lance(path, **kwargs)
-
-
- classmethod
-
-
-Load a previously created lance dataset.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
path |
-
-
-
- The path of the lance dataset. -
-
- TYPE:
- |
-
**kwargs |
-
-
-
- Keyword arguments passed
-
- TYPE:
- |
-
depthcharge.data.AnnotatedSpectrumDataset(spectra, annotations, tokenizer, path=None, **kwargs)
-
-
- Bases: SpectrumDataset
Store and access a collection of annotated mass spectra.
-Parse and/or add mass spectra to an index in the -lance data format. -This format enables fast random access to spectra for training. -This file is then served as a PyTorch Dataset, allowing spectra -to be accessed efficiently for training and inference.
-If you wish to use an existing lance dataset, use the from_lance()
-method.
PARAMETER | -DESCRIPTION | -
---|---|
spectra |
-
-
-
- Spectra to add to this collection. These may be a DataFrame parsed
-with
-
- TYPE:
- |
-
annotations |
-
-
-
- The column name containing the annotations. -
-
- TYPE:
- |
-
tokenizer |
-
-
-
- The tokenizer used to transform the annotations into PyTorch -tensors. -
-
- TYPE:
- |
-
path |
-
-
-
- The name and path of the lance dataset. If the path does
-not contain the
-
- TYPE:
- |
-
**kwargs |
-
-
-
- Keyword arguments passed
-
- TYPE:
- |
-
ATTRIBUTE | -DESCRIPTION | -
---|---|
peak_files |
-
-
-
-
-
-
- TYPE:
- |
-
path |
-
-
-
-
-
-
- TYPE:
- |
-
tokenizer |
-
-
-
- The tokenizer for the annotations. -
-
- TYPE:
- |
-
annotations |
-
-
-
- The annotation column in the dataset. -
-
- TYPE:
- |
-
collate_fn(batch)
-
-The collate function for an AnnotatedSpectrumDataset.
-The mass spectra must be padded so that they fit nicely as a tensor. -However, the padded elements are ignored during the subsequent steps.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
batch |
-
-
-
- A batch of data from an AnnotatedSpectrumDataset. -
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
-
- dict of str, tensor or list
-
- |
-
-
-
- A dictionary mapping the columns of the lance dataset -to a PyTorch tensor or list of values. - |
-
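The zero-padding step described above can be sketched in plain Python. This is a hypothetical `pad_batch` helper, not the actual collate function, which also stacks the result into PyTorch tensors:

```python
# Sketch of the padding a collate function performs: ragged per-spectrum
# peak lists are zero-padded to a common length so the batch can be
# stacked into one rectangular tensor.
def pad_batch(batch):
    """Zero-pad a list of variable-length m/z lists."""
    longest = max(len(spectrum) for spectrum in batch)
    return [spectrum + [0.0] * (longest - len(spectrum)) for spectrum in batch]


padded = pad_batch([[100.0, 200.0, 300.0], [150.0]])
```

The padded elements are then masked out in subsequent model steps, as noted above.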
from_lance(path, annotations, tokenizer, **kwargs)
-
-
- classmethod
-
-
-Load a previously created lance dataset.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
path |
-
-
-
- The path of the lance dataset. -
-
- TYPE:
- |
-
annotations |
-
-
-
- The column name containing the annotations. -
-
- TYPE:
- |
-
tokenizer |
-
-
-
- The tokenizer used to transform the annotations into PyTorch -tensors. -
-
- TYPE:
- |
-
**kwargs |
-
-
-
- Keyword arguments passed
-
- TYPE:
- |
-
depthcharge.data.PeptideDataset(tokenizer, sequences, charges, *args)
-
-
- Bases: TensorDataset
A dataset for peptide sequences.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
tokenizer |
-
-
-
- A tokenizer specifying how to transform peptide sequences -into tokens. -
-
- TYPE:
- |
-
sequences |
-
-
-
- The peptide sequences in a format compatible with -your tokenizer. ProForma is preferred. -
-
- TYPE:
- |
-
charges |
-
-
-
- The charge state for each peptide. -
-
- TYPE:
- |
-
*args |
-
-
-
- Additional values to include during data loading. -
-
- TYPE:
- |
-
charges: torch.Tensor
-
-
- property
-
-
-The peptide charges.
-tokens: torch.Tensor
-
-
- property
-
-
-The peptide sequence tokens.
-loader(*args, **kwargs)
-
-A PyTorch DataLoader for peptides.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
*args |
-
-
-
- Arguments passed to initialize a torch.utils.data.DataLoader,
-excluding
-
- TYPE:
- |
-
**kwargs |
-
-
-
- Keyword arguments passed to initialize a torch.utils.data.DataLoader,
-excluding
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
-
- DataLoader
-
- |
-
-
-
- A DataLoader for the peptides. - |
-
depthcharge.encoders
)depthcharge.encoders.PositionalEncoder(d_model, min_wavelength=1.0, max_wavelength=100000.0)
-
-
- Bases: FloatEncoder
The positional encoder for sequences.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
d_model |
-
-
-
- The number of features to output. -
-
- TYPE:
- |
-
min_wavelength |
-
-
-
- The shortest wavelength in the geometric progression. -
-
- TYPE:
- |
-
max_wavelength |
-
-
-
- The longest wavelength in the geometric progression. -
-
- TYPE:
- |
-
forward(X)
-
-Encode positions in a sequence.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
X |
-
-
-
- The first dimension should be the batch size (i.e. each is one -peptide) and the second dimension should be the sequence (i.e. -each should be an amino acid representation). -
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
-
- torch.Tensor of shape (batch_size, n_sequence, n_features)
-
- |
-
-
-
- The encoded features for the mass spectra. - |
-
depthcharge.encoders.FloatEncoder(d_model, min_wavelength=0.001, max_wavelength=10000, learnable_wavelengths=False)
-
-
- Bases: Module
Encode floating point values using sine and cosine waves.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
d_model |
-
-
-
- The number of features to output. -
-
- TYPE:
- |
-
min_wavelength |
-
-
-
- The minimum wavelength to use. -
-
- TYPE:
- |
-
max_wavelength |
-
-
-
- The maximum wavelength to use. -
-
- TYPE:
- |
-
learnable_wavelengths |
-
-
-
- Allow the selected wavelengths to be fine-tuned -by the model. -
-
- TYPE:
- |
-
forward(X)
-
-Encode m/z values.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
X |
-
-
-
- The masses to embed. -
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
-
- torch.Tensor of shape (batch_size, n_float, d_model)
-
- |
-
-
-
- The encoded features for the floating point numbers. - |
-
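The encoding scheme can be sketched in plain Python. The parameter names follow the signature above, but the wavelength schedule is simplified and illustrative; the real FloatEncoder operates on tensors and may differ in detail:

```python
import math

# Illustrative sketch of sinusoidal float encoding: half of the d_model
# features are sines and half cosines, with wavelengths spaced
# geometrically between min_wavelength and max_wavelength.
def encode_float(value, d_model=8, min_wavelength=0.001, max_wavelength=10000.0):
    n_sin = d_model // 2
    scale = max_wavelength / min_wavelength
    # Geometric progression of wavelengths.
    wavelengths = [min_wavelength * scale ** (i / (n_sin - 1)) for i in range(n_sin)]
    sines = [math.sin(2 * math.pi * value / w) for w in wavelengths]
    cosines = [math.cos(2 * math.pi * value / w) for w in wavelengths]
    return sines + cosines


vec = encode_float(500.79)
```

Because each wavelength responds to the input at a different scale, nearby floats map to nearby feature vectors while remaining distinguishable across many orders of magnitude.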
depthcharge.encoders.PeakEncoder(d_model, min_mz_wavelength=0.001, max_mz_wavelength=10000, min_intensity_wavelength=1e-06, max_intensity_wavelength=1, learnable_wavelengths=False)
-
-
- Bases: Module
Encode mass spectrum.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
d_model |
-
-
-
- The number of features to output. -
-
- TYPE:
- |
-
min_mz_wavelength |
-
-
-
- The minimum wavelength to use for m/z. -
-
- TYPE:
- |
-
max_mz_wavelength |
-
-
-
- The maximum wavelength to use for m/z. -
-
- TYPE:
- |
-
min_intensity_wavelength |
-
-
-
- The minimum wavelength to use for intensity. The default assumes -intensities between [0, 1]. -
-
- TYPE:
- |
-
max_intensity_wavelength |
-
-
-
- The maximum wavelength to use for intensity. The default assumes -intensities between [0, 1]. -
-
- TYPE:
- |
-
learnable_wavelengths |
-
-
-
- Allow the selected wavelengths to be fine-tuned -by the model. -
-
- TYPE:
- |
-
forward(X)
-
-Encode m/z values and intensities.
-Note that we expect intensities to fall within the interval [0, 1].
- - - -PARAMETER | -DESCRIPTION | -
---|---|
X |
-
-
-
- The spectra to embed. Axis 0 represents a mass spectrum, axis 1 -contains the peaks in the mass spectrum, and axis 2 is essentially -a 2-tuple specifying the m/z-intensity pair for each peak. These -should be zero-padded, such that all of the spectra in the batch -are the same length. -
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
-
- torch.Tensor of shape (n_spectra, n_peaks, d_model)
-
- |
-
-
-
- The encoded features for the mass spectra. - |
-
The depthcharge package provides utilities and classes to parse, store, and use mass spectra, peptides, and small molecules in Transformer models. -Although depthcharge is primarily focused on Transformers, many of these classes can be used as building blocks for models of your own design.
-Tokenizers split string inputs, such as peptide sequences or SMILES strings, into the tokens that we want to model with our neural networks. -For example, peptide sequences need to be broken into their constituent amino acids, with or without modifications (the tokens). -In addition to specifying how tokens are created, the tokenizer classes also aid in mass calculations that are useful for modeling mass spectrometry data.
-Tokenizers live in the tokenizers submodule.
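The tokenize-and-pad idea can be sketched with a toy vocabulary. This is hypothetical; the real PeptideTokenizer also parses ProForma modifications and computes masses:

```python
# Toy tokenizer sketch: integerize sequences and zero-pad them to equal
# length, with 0 reserved for padding and "$" as a stop token.
VOCAB = {"$": 1, "A": 2, "C": 3, "D": 4, "E": 5}


def tokenize(sequences, add_stop=False):
    """Map each residue to an integer and zero-pad the batch."""
    token_lists = [
        [VOCAB[aa] for aa in seq] + ([VOCAB["$"]] if add_stop else [])
        for seq in sequences
    ]
    longest = max(len(tokens) for tokens in token_lists)
    return [tokens + [0] * (longest - len(tokens)) for tokens in token_lists]


batch = tokenize(["ACE", "AD"], add_stop=True)
```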
-Datasets are subclasses of PyTorch Datasets, specifically designed to store and retrieve mass spectrometry data, including the mass spectra themselves and analyte representations.
-Datasets live in the datasets submodule.
-Often the most useful representation for an input is not the raw parsed format. -In depthcharge, we provide a collection of sinusoidal encoders to represent general floating point numbers, positions in a sequence, and peaks in a mass spectrum.
-Encoders live in the encoders submodule.
-The Transformer architecture is a powerful tool for modeling sequence data, such as the amino acids of a peptide or even the peaks in a mass spectrum. -In depthcharge, we provide Transformer models specifically tailored for modeling different pieces of mass spectrometry and related data. -These classes are compound classes: they often combine the appropriate encoders with a traditional Transformer model to directly model the data of interest.
-Transformers live in the transformers submodule.
-Although most users will not interact with our primitive classes directly, they are the building blocks for representing mass spectrometry data types. -These classes are not primitives in the traditional sense of the word; rather, they define how to parse and store data such as peptides, mass spectra, and small molecules.
-Primitives are available as top-level classes in Depthcharge.
- - - - - - -depthcharge.MassSpectrum(filename, scan_id, mz, intensity, ms_level=None, retention_time=None, ion_mobility=None, precursor_mz=None, precursor_charge=None, label=None)
-
-
- Bases: MsmsSpectrum
A mass spectrum.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
filename |
-
-
-
- The file from which the spectrum originated. -
-
- TYPE:
- |
-
scan_id |
-
-
-
- The Hupo PSI standard scan identifier. -
-
- TYPE:
- |
-
mz |
-
-
-
- The m/z values. -
-
- TYPE:
- |
-
intensity |
-
-
-
- The intensity values. -
-
- TYPE:
- |
-
retention_time |
-
-
-
- The measured retention time. -
-
- TYPE:
- |
-
ion_mobility |
-
-
-
- The measured ion mobility. -
-
- TYPE:
- |
-
precursor_mz |
-
-
-
- The precursor ion m/z, if applicable. -
-
- TYPE:
- |
-
precursor_charge |
-
-
-
- The precursor charge, if applicable. -
-
- TYPE:
- |
-
label |
-
-
-
- A label for the mass spectrum. This is typically an -annotation, such as the generating peptide sequence, -but is distinct from spectrum_utils’ annotation. -
-
- TYPE:
- |
-
precursor_mass: float
-
-
- property
-
-
-The monoisotopic mass.
-usi: str
-
-
- property
-
-
-The Universal Spectrum Identifier.
-to_tensor()
-
-Combine the m/z and intensity arrays into a single tensor.
- - - -RETURNS | -DESCRIPTION | -
---|---|
-
- torch.tensor of shape (n_peaks, 2)
-
- |
-
-
-
- The mass spectrum information. - |
-
depthcharge.Peptide
-
-
-
- dataclass
-
-
-A peptide sequence with or without modification and/or charge.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
sequence |
-
-
-
- The bare amino acid sequence. -
-
- TYPE:
- |
-
modifications |
-
-
-
- The modification at each amino acid. This should be the length of
-
-
- TYPE:
- |
-
charge |
-
-
-
- The charge of the peptide. -
-
- TYPE:
- |
-
__post_init__()
-
-Validate the provided parameters.
- -from_massivekb(sequence, charge=None)
-
-
- classmethod
-
-
-Create a Peptide from MassIVE-KB annotations.
-MassIVE-KB includes N-term carbamylation, NH3-loss, acetylation, -as well as M oxidation, and deamidation of N and Q, in a -manner that does not comply with the ProForma standard.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
sequence |
-
-
-
- The peptide sequence from MassIVE-KB -
-
- TYPE:
- |
-
charge |
-
-
-
- The charge state of the peptide. -
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
-
- Peptide
-
- |
-
-
-
- The parsed MassIVE peptide after conversion to a ProForma -format. - |
-
from_proforma(sequence)
-
-
- classmethod
-
-
-Create a Peptide from a ProForma 2.0 string.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
sequence |
-
-
-
- A ProForma 2.0-compliant string. -
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
-
- Peptide
-
- |
-
-
-
- The parsed ProForma peptide. - |
-
massivekb_to_proforma(sequence, charge=None)
-
-
- classmethod
-
-
-Convert a MassIVE-KB peptide sequence to ProForma.
-MassIVE-KB includes N-term carbamylation, NH3-loss, acetylation, -as well as M oxidation, and deamidation of N and Q, in a -manner that does not comply with ProForma.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
sequence |
-
-
-
- The peptide sequence from MassIVE-KB -
-
- TYPE:
- |
-
charge |
-
-
-
- The charge state of the peptide. -
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
-
- str
-
- |
-
-
-
- The parsed MassIVE peptide after conversion to a ProForma -format. - |
-
split()
-
-Split the modified peptide for tokenization.
- -depthcharge.PeptideIons
-
-
-
- dataclass
-
-
-The ions generated by a peptide.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
tokens |
-
-
-
- The string tokens that comprise the peptide sequence. -
-
- TYPE:
- |
-
precursor |
-
-
-
- The monoisotopic m/z of the precursor ion. -
-
- TYPE:
- |
-
fragments |
-
-
-
- The generated fragment ions originated from the peptide. -
-
- TYPE:
- |
-
b_ions: torch.Tensor[float]
-
-
- property
-
-
-The b ion series.
-sequence: str
-
-
- property
-
-
-The peptide sequence.
-y_ions: torch.Tensor[float]
-
-
- property
-
-
-The y ion series.
-depthcharge.Molecule
-
-
-
- dataclass
-
-
-A representation of a molecule.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
smiles |
-
-
-
- A SMILES string defining the molecule. -
-
- TYPE:
- |
-
charge |
-
-
-
- The charge of the molecule. -
-
- TYPE:
- |
-
__post_init__()
-
-Validate parameters.
- -from_selfies(selfies, charge=None)
-
-
- classmethod
-
-
-Create a molecule from a SELFIES string.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
selfies |
-
-
-
- The SELFIES string defining the molecule. -
-
- TYPE:
- |
-
charge |
-
-
-
- The charge of the molecule. -
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
-
- Molecule
-
- |
-
-
-
- The parsed Molecule. - |
-
show(**kwargs)
-
-Show the molecule in 2D.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
**kwargs |
-
-
-
- Keyword arguments passed to
-
- TYPE:
- |
-
to_selfies()
-
-Convert SMILES to a SELFIES representaion.
- -depthcharge.tokenizers
)depthcharge.tokenizers.PeptideTokenizer(residues=None, replace_isoleucine_with_leucine=False, reverse=False)
-
-
- Bases: Tokenizer
A tokenizer for ProForma peptide sequences.
-Parse and tokenize ProForma-compliant peptide sequences. Additionally, -use this class to calculate fragment and precursor ion masses.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
residues |
-
-
-
- Residues and modifications to add to the vocabulary beyond the -standard 20 amino acids. -
-
- TYPE:
- |
-
replace_isoleucine_with_leucine |
-
-
-
- Replace I with L residues, because they are isobaric and often -indistinguishable by mass spectrometry. -
-
- TYPE:
- |
-
reverse |
-
-
-
- Reverse the sequence for tokenization, C-terminus to N-terminus. -
-
- TYPE:
- |
-
ATTRIBUTE | -DESCRIPTION | -
---|---|
residues |
-
-
-
- The residues and modifications and their associated masses.
-terminal modifications are indicated by
-
- TYPE:
- |
-
index |
-
-
-
- The mapping of residues and modifications to integer representations. -
-
- TYPE:
- |
-
reverse_index |
-
-
-
- The ordered residues and modifications where the list index is the -integer representation for a token. -
-
- TYPE:
- |
-
stop_token |
-
-
-
- The stop token. -
-
- TYPE:
- |
-
__getstate__()
-
-How to pickle the object.
- -__setstate__(state)
-
-How to unpickle the object.
- -from_massivekb(replace_isoleucine_with_leucine=True, reverse=True)
-
-
- staticmethod
-
-
-Create a tokenizer with the observed peptide modications.
-Modifications are parsed from MassIVE-KB peptide strings -and added to the vocabulary.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
replace_isoleucine_with_leucine |
-
-
-
- Replace I with L residues, because they are isobaric and often -indistinguishable by mass spectrometry. -
-
- TYPE:
- |
-
reverse |
-
-
-
- Reverse the sequence for tokenization, C-terminus to N-terminus. -
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
-
- MskbPeptideTokenizer
-
- |
-
-
-
- A tokenizer for peptides with the observed modifications. - |
-
from_proforma(sequences, replace_isoleucine_with_leucine=True, reverse=True)
-
-
- classmethod
-
-
-Create a tokenizer with the observed peptide modications.
-Modifications are parsed from ProForma 2.0-compliant peptide strings -and added to the vocabulary.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
sequences |
-
-
-
- The peptides from which to parse modifications. -
-
- TYPE:
- |
-
replace_isoleucine_with_leucine |
-
-
-
- Replace I with L residues, because they are isobaric and often -indistinguishable by mass spectrometry. -
-
- TYPE:
- |
-
reverse |
-
-
-
- Reverse the sequence for tokenization, C-terminus to N-terminus. -
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
-
- PeptideTokenizer
-
- |
-
-
-
- A tokenizer for peptides with the observed modifications. - |
-
ions(sequences, precursor_charges, max_fragment_charge=None)
-
-Calculate the m/z for the precursor and fragment ions.
-Currently depthcharge only support b and y ions.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
sequences |
-
-
-
- The peptide sequences. -
-
- TYPE:
- |
-
precursor_charges |
-
-
-
- The charge of each precursor ion. If
-
- TYPE:
- |
-
max_fragment_charge |
-
-
-
- The maximum charge for fragment ions. The default is to consider
-up to the
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
-
- list of PeptideIons
-
- |
-
-
-
- The precursor and fragment ions generated by the peptide. - |
-
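As a rough sketch of the underlying arithmetic, singly charged b- and y-ion m/z values follow from standard monoisotopic residue masses. This is simplified relative to the tokenizer, which also handles modifications and multiple fragment charges:

```python
# Singly charged b- and y-ion m/z values from monoisotopic masses (Da).
PROTON = 1.007276
WATER = 18.010565
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203}


def b_y_ions(sequence):
    """Return ([b1..b(n-1)], [y1..y(n-1)]) m/z values at charge 1."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    b_ions, running = [], 0.0
    for mass in masses[:-1]:  # b-ions keep the N-terminus
        running += mass
        b_ions.append(running + PROTON)
    y_ions, running = [], 0.0
    for mass in reversed(masses[1:]):  # y-ions keep the C-terminus
        running += mass
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


b_ions, y_ions = b_y_ions("GAS")
```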
split(sequence)
-
-Split a ProForma peptide sequence.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
sequence |
-
-
-
- The peptide sequence. -
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
-
- list[str]
-
- |
-
-
-
- The tokens that comprise the peptide sequence. - |
-
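A rough sketch of such a split for ProForma-like sequences (simplified and hypothetical; real ProForma parsing also covers terminal modifications, charge notation, and more):

```python
import re

# Split a ProForma-like sequence into tokens: each amino acid plus any
# immediately following bracketed modification becomes one token.
def split_proforma(sequence):
    return re.findall(r"[A-Z](?:\[[^\]]+\])?", sequence)


tokens = split_proforma("ACDM[Oxidation]K")
```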
depthcharge.tokenizers.Tokenizer(tokens, stop_token='$')
-
-
- Bases: ABC
An abstract base class for Depthcharge tokenizers.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
tokens |
-
-
-
- The tokens to consider. -
-
- TYPE:
- |
-
stop_token |
-
-
-
- The stop token to use. -
-
- TYPE:
- |
-
__len__()
-
-The number of tokens.
- -detokenize(tokens, join=True, trim_stop_token=True)
-
-Retreive sequences from tokens.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
tokens |
-
-
-
- The zero-padded tensor of integerized tokens to decode. -
-
- TYPE:
- |
-
join |
-
-
-
- Join tokens into strings? -
-
- TYPE:
- |
-
trim_stop_token |
-
-
-
- Remove the stop token from the end of a sequence. -
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
-
- list[str] or list[list[str]]
-
- |
-
-
-
- The decoded sequences, each as a string or list of strings. - |
-
split(sequence)
-
-
- abstractmethod
-
-
-Split a sequence into the constituent string tokens.
- -tokenize(sequences, to_strings=False, add_stop=False)
-
-Tokenize the input sequences.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
sequences |
-
-
-
- The sequences to tokenize. -
-
- TYPE:
- |
-
to_strings |
-
-
-
- Return each as a list of token strings rather than a -tensor. This is useful for debugging. -
-
- TYPE:
- |
-
add_stop |
-
-
-
- Append the stop token tothe end of the sequence. -
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
-
- torch.Tensor of shape (n_sequences, max_length) or list[list[str]]
-
- |
-
-
-
- Either a tensor containing the integer values for each -token, padded with 0’s, or the list of tokens comprising -each sequence. - |
-
depthcharge.transformers
)depthcharge.transformers.SpectrumTransformerEncoder(d_model=128, nhead=8, dim_feedforward=1024, n_layers=1, dropout=0, peak_encoder=True)
-
-
- Bases: Module
A Transformer encoder for input mass spectra.
-Use this PyTorch module to embed mass spectra. By default, nothing
-other than the m/z and intensity arrays for each mass spectrum are
-considered. However, arbitrary information can be integrated into the
-spectrum representation by subclassing this class and overwriting the
-precursor_hook()
method.
PARAMETER | -DESCRIPTION | -
---|---|
d_model |
-
-
-
- The latent dimensionality to represent peaks in the mass spectrum. -
-
- TYPE:
- |
-
nhead |
-
-
-
- The number of attention heads in each layer.
-
- TYPE:
- |
-
dim_feedforward |
-
-
-
- The dimensionality of the fully connected layers in the Transformer -layers of the model. -
-
- TYPE:
- |
-
n_layers |
-
-
-
- The number of Transformer layers. -
-
- TYPE:
- |
-
dropout |
-
-
-
- The dropout probability for all layers. -
-
- TYPE:
- |
-
peak_encoder |
-
-
-
- The function to encode the (m/z, intensity) tuples of each mass
-spectrum.
-
- TYPE:
- |
-
ATTRIBUTE | -DESCRIPTION | -
---|---|
d_model |
-
-
-
-
-
-
- TYPE:
- |
-
nhead |
-
-
-
-
-
-
- TYPE:
- |
-
dim_feedforward |
-
-
-
-
-
-
- TYPE:
- |
-
n_layers |
-
-
-
-
-
-
- TYPE:
- |
-
dropout |
-
-
-
-
-
-
- TYPE:
- |
-
peak_encoder |
-
-
-
- The function to encode the (m/z, intensity) tuples of each mass -spectrum. -
-
- TYPE:
- |
-
transformer_encoder |
-
-
-
- The Transformer encoder layers. -
-
- TYPE:
- |
-
d_model: int
-
-
- property
-
-
-The latent dimensionality of the model.
-device: torch.device
-
-
- property
-
-
-The current device for the model.
-dim_feedforward: int
-
-
- property
-
-
-The dimensionality of the Transformer feedforward layers.
-dropout: float
-
-
- property
-
-
-The dropout for the transformer layers.
-n_layers: int
-
-
- property
-
-
-The number of Transformer layers.
-nhead: int
-
-
- property
-
-
-The number of attention heads.
-forward(mz_array, intensity_array, **kwargs)
-
-Embed a batch of mass spectra.
- - - -PARAMETER | -DESCRIPTION | -
---|---|
mz_array |
-
-
-
- The zero-padded m/z dimension for a batch of mass spectra. -
-
- TYPE:
- |
-
intensity_array |
-
-
-
- The zero-padded intensity dimension for a batch of mass spectra. -
-
- TYPE:
- |
-
**kwargs |
-
-
-
- Additional fields provided by the data loader. These may be
-used by overwriting the
-
- TYPE:
- |
-
RETURNS | -DESCRIPTION | -
---|---|
- latent
- |
-
-
-
- The latent representations for the spectrum and each of its -peaks. -
-
- TYPE:
- |
-
- mem_mask
- |
-
-
-
- The memory mask specifying which elements were padding in X. -
-
- TYPE:
- |
-
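The zero-padding convention used by `forward()` can be sketched in plain Python. The `pad_batch` helper below is hypothetical (not part of depthcharge), and plain lists stand in for tensors; it shows how ragged peak lists become a rectangular batch and how a mask can flag the padded positions, mirroring the role of the returned memory mask.

```python
def pad_batch(spectra, pad_value=0.0):
    """Zero-pad ragged m/z lists into a rectangular batch and build a
    mask that is True wherever an element is padding."""
    width = max(len(s) for s in spectra)
    batch = [s + [pad_value] * (width - len(s)) for s in spectra]
    mask = [[i >= len(s) for i in range(width)] for s in spectra]
    return batch, mask

# Two spectra with different peak counts; made-up m/z values.
batch, mask = pad_batch([[100.2, 250.7], [88.1, 120.4, 301.9]])
```

In the real model the same mask is what lets attention ignore the padded slots.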
`precursor_hook(mz_array, intensity_array, **kwargs)`

Define how additional information in the batch may be used.

Overwrite this method to define custom functionality that depends on information in the batch. Examples include incorporating any combination of the mass, charge, retention time, or ion mobility of a precursor ion.

The representation returned by this method is prepended to the peak representations that are fed into the Transformer encoder, and it ultimately contributes to the spectrum representation that is the first element of the sequence in the model output.

By default, this method returns a tensor of zeros.
PARAMETER | DESCRIPTION
---|---
`mz_array` | The zero-padded m/z dimension for a batch of mass spectra.
`intensity_array` | The zero-padded intensity dimension for a batch of mass spectra.
`**kwargs` | The additional data passed with the batch.

RETURNS | DESCRIPTION
---|---
`torch.Tensor` of shape `(batch_size, d_model)` | The precursor representations.
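The prepend behavior described above can be illustrated without torch; plain lists stand in for tensors, the peak representations are invented for illustration, and the zero vector matches the default hook's output.

```python
d_model = 4

# Default hook behavior: a zero vector per spectrum.
precursor_repr = [0.0] * d_model

# Hypothetical peak representations for one spectrum with two peaks.
peak_reprs = [[0.1] * d_model, [0.2] * d_model]

# The precursor representation is prepended, so the first element of
# the encoder's output sequence becomes the spectrum representation.
sequence = [precursor_repr] + peak_reprs
```

A custom hook would replace the zero vector with, say, an encoding of the precursor mass and charge, while leaving the rest of the pipeline unchanged.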
`depthcharge.transformers.PeptideTransformerEncoder(n_tokens, d_model=128, nhead=8, dim_feedforward=1024, n_layers=1, dropout=0, positional_encoder=True, max_charge=5)`

Bases: `_PeptideTransformer`

A transformer encoder for peptide sequences.
PARAMETER | DESCRIPTION
---|---
`n_tokens` | The number of tokens used to tokenize peptide sequences.
`d_model` | The latent dimensionality to represent the amino acids in a peptide sequence.
`nhead` | The number of attention heads in each layer.
`dim_feedforward` | The dimensionality of the fully connected layers in the Transformer layers of the model.
`n_layers` | The number of Transformer layers.
`dropout` | The dropout probability for all layers.
`positional_encoder` | The positional encodings to use for the amino acid sequence. If …
`max_charge` | The maximum charge state for peptide sequences.
`forward(tokens, charges)`

Embed a collection of peptide sequences.
PARAMETER | DESCRIPTION
---|---
`tokens` | The integer tokens describing each peptide sequence, padded to the maximum peptide length in the batch with 0s.
`charges` | The charge state of each peptide.

RETURNS | DESCRIPTION
---|---
`latent` | The latent representations for the peptide sequence and each of its amino acids.
`mem_mask` | The memory mask specifying which elements were padding in X.
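The padding convention for `tokens` can be sketched as follows; the token ids are made up for illustration (a real vocabulary comes from the tokenizer), and `pad_tokens` is a hypothetical helper, not part of depthcharge.

```python
def pad_tokens(sequences, pad_id=0):
    """Pad integer token sequences to the batch maximum with 0s, the
    convention the encoder's forward() expects for its tokens input."""
    width = max(len(s) for s in sequences)
    return [s + [pad_id] * (width - len(s)) for s in sequences]

# Two peptides of different lengths, as made-up token ids.
tokens = pad_tokens([[3, 7, 7, 1], [3, 2]])
```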
`depthcharge.transformers.PeptideTransformerDecoder(n_tokens, d_model=128, nhead=8, dim_feedforward=1024, n_layers=1, dropout=0, positional_encoder=True, max_charge=5)`

Bases: `_PeptideTransformer`

A transformer decoder for peptide sequences.
PARAMETER | DESCRIPTION
---|---
`n_tokens` | The number of tokens used to tokenize peptide sequences.
`d_model` | The latent dimensionality to represent the amino acids in a peptide sequence.
`nhead` | The number of attention heads in each layer.
`dim_feedforward` | The dimensionality of the fully connected layers in the Transformer layers of the model.
`n_layers` | The number of Transformer layers.
`dropout` | The dropout probability for all layers.
`positional_encoder` | The positional encodings to use for the amino acid sequence. If …
`max_charge` | The maximum charge state for peptide sequences.
`forward(tokens, precursors, memory, memory_key_padding_mask)`

Predict the next amino acid for a collection of sequences.
PARAMETER | DESCRIPTION
---|---
`tokens` | The partial peptide sequences for which to predict the next amino acid. Optionally, these may be the token indices instead of a string.
`precursors` | The precursor mass (axis 0) and charge (axis 1).
`memory` | The representations from a Transformer encoder.
`memory_key_padding_mask` | The mask that indicates which elements of `memory` are padding.

RETURNS | DESCRIPTION
---|---
`scores` | The raw output of the final linear layer. These can be softmax-transformed to yield the probability of each amino acid for the prediction.
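The softmax transform mentioned for `scores` can be sketched in plain Python; the score values are invented for illustration, and in practice one would call the equivalent torch operation on the full score tensor.

```python
import math

def softmax(scores):
    """Convert the raw scores for one sequence position into a
    probability distribution over amino-acid tokens."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for a 3-token vocabulary.
probs = softmax([2.0, 1.0, 0.1])
```

The probabilities sum to one, and the ordering of the raw scores is preserved, which is why the raw scores alone suffice for greedy decoding.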