Input data

PacBio full-length variant sequencing

PacBio_runs.csv contains information on PacBio runs linking the barcodes to the mutations. It has the following columns:

  • library: name of the library sequenced
  • run: date of the PacBio library submission (use this date to find the matching entry in the experimental notebook)
  • fastq: FASTQ file from running CCS

The amplicon itself is defined in PacBio_amplicon.gb, and PacBio_feature_parse_specs.yaml specifies how to parse it with alignparse.
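As a minimal sketch, the column requirements for PacBio_runs.csv could be checked before running the pipeline. The helper name `read_pacbio_runs` and the example row (library name, date, FASTQ path) are hypothetical and only illustrate the layout described above.

```python
import csv
import io

# Columns that the README lists for PacBio_runs.csv.
REQUIRED_PACBIO_COLUMNS = {"library", "run", "fastq"}


def read_pacbio_runs(handle):
    """Read PacBio runs from an open CSV handle, checking the required columns."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_PACBIO_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"PacBio_runs.csv is missing columns: {sorted(missing)}")
    return list(reader)


# Hypothetical file content; the values are made up for illustration.
example = io.StringIO(
    "library,run,fastq\n"
    "libA,2022-01-15,results/ccs/libA_ccs.fastq.gz\n"
)
pacbio_runs = read_pacbio_runs(example)
```

For a real file, pass `open("PacBio_runs.csv", newline="")` instead of the in-memory example.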

Illumina barcode sequencing

barcode_runs.csv has the Illumina barcode-sequencing runs used to count barcodes in different conditions. It describes the samples, which should be named as clearly as possible. It has the following columns:

  • date: date the experiment was performed, in YYYY-MM-DD format.
  • virus_batch: batch of virus used for the experiment.
  • library: which virus library was used.
  • sample_type: can be one of the following:
    • VSVG_control: entry mediated by VSVG
    • no-antibody_control: entry mediated by VEP of interest
    • antibody: encompasses sera and antibodies
  • antibody: name of the antibody if this sample has sample_type of antibody
  • antibody_concentration: concentration of antibody if this sample has sample_type of antibody. For sera, this should be the dilution expressed as a fraction < 1 (for example, 0.01 for a 1:100 dilution), not the dilution factor.
  • replicate: experimental replicate.
  • fastq_R1: path to the R1 FASTQ file, or a semicolon-delimited list of multiple FASTQs.
  • exclude_after_counts: set to yes if the barcode run should be excluded after counting barcodes.
  • notes: any other notes about the sample.
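The column rules above can be sketched as a small per-row check. The helpers `split_fastqs`, `dilution_factor_to_fraction`, and `check_barcode_run` are hypothetical names, not part of the pipeline, and the example row uses made-up values.

```python
import csv
import io
from datetime import datetime

# sample_type values listed in the README.
ALLOWED_SAMPLE_TYPES = {"VSVG_control", "no-antibody_control", "antibody"}


def split_fastqs(fastq_field):
    """Split the semicolon-delimited fastq_R1 field into a list of paths."""
    return [path.strip() for path in fastq_field.split(";") if path.strip()]


def dilution_factor_to_fraction(factor):
    """Convert a serum dilution factor (100 for 1:100) to the fraction the file expects."""
    return 1 / factor


def check_barcode_run(row):
    """Raise ValueError if a row breaks the column rules; else return its FASTQ list."""
    datetime.strptime(row["date"], "%Y-%m-%d")  # ValueError if not a valid date
    if row["sample_type"] not in ALLOWED_SAMPLE_TYPES:
        raise ValueError(f"unknown sample_type: {row['sample_type']!r}")
    if row["sample_type"] == "antibody":
        if not row["antibody"]:
            raise ValueError("antibody samples need a value in the antibody column")
        if float(row["antibody_concentration"]) <= 0:
            raise ValueError("antibody_concentration must be positive")
    return split_fastqs(row["fastq_R1"])


# Hypothetical row; every value here is invented for illustration.
example = io.StringIO(
    "date,virus_batch,library,sample_type,antibody,antibody_concentration,"
    "replicate,fastq_R1,exclude_after_counts,notes\n"
    "2022-03-01,batch1,libA,antibody,serum_1,0.01,1,"
    "data/s1_L001_R1.fastq.gz;data/s1_L002_R1.fastq.gz,no,\n"
)
for row in csv.DictReader(example):
    fastqs = check_barcode_run(row)
```

The check cannot tell sera from purified antibodies, so it only requires a positive concentration; whether a value is a fraction or a dilution factor still has to be confirmed by hand.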

The file neutralization_standard_barcodes.csv lists the barcodes for the neutralization standard, which is not expected to be neutralized by the sera or antibodies.

Assign sites to protein regions (domains)

reference_site_to_region.csv assigns sites in the protein to a region (protein domain).

Configuration for polyclonal analysis

polyclonal_config.yaml specifies how the analysis with polyclonal is done. For each antibody listed in barcode_runs.csv, specify:

  • max_epitopes: the maximum number of epitopes to test. The fitting keeps testing more epitopes up to this max until additional epitopes have no additional value.
  • n_bootstrap_samples: number of bootstrap samples to use.
  • min_epitope_activity_to_include: keep adding epitopes until the activity of the newly added epitope is <= this value.
  • fit_kwargs: keyword arguments passed to Polyclonal.fit
  • plot_kwargs: keyword arguments passed to Polyclonal.plot_mut_escape.
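A rough sketch of how the per-antibody settings might be checked once the YAML is loaded. It assumes the parsed file is a dictionary keyed by antibody name, which the README implies but does not state. The function name `missing_config_entries`, the antibody names, and all values are hypothetical.

```python
# Keys that the README says must be specified for each antibody.
REQUIRED_KEYS = {
    "max_epitopes",
    "n_bootstrap_samples",
    "min_epitope_activity_to_include",
    "fit_kwargs",
    "plot_kwargs",
}


def missing_config_entries(config, antibodies):
    """Map each antibody to the sorted keys it lacks; absent antibodies lack all keys."""
    problems = {}
    for antibody in antibodies:
        missing = REQUIRED_KEYS - set(config.get(antibody, {}))
        if missing:
            problems[antibody] = sorted(missing)
    return problems


# Assumed shape of the parsed polyclonal_config.yaml; values are placeholders.
config = {
    "serum_1": {
        "max_epitopes": 2,
        "n_bootstrap_samples": 50,
        "min_epitope_activity_to_include": 0.2,
        "fit_kwargs": {},   # passed to Polyclonal.fit
        "plot_kwargs": {},  # passed to Polyclonal.plot_mut_escape
    },
    "serum_2": {"max_epitopes": 1},
}
problems = missing_config_entries(config, ["serum_1", "serum_2", "serum_3"])
```

Here `problems` has no entry for serum_1, lists four missing keys for serum_2, and lists all five for serum_3, which has no entry in the config at all.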

Validation neutralization assays

validation_IC50s.csv lists mutations for which we want to calculate IC50s from the polyclonal fits and compare them to the values measured in validation experiments. The required columns are "antibody", "aa_substitutions", and "measured IC50". If a variant has multiple mutations, list them space-delimited in the "aa_substitutions" column.
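A minimal sketch of reading this file and splitting the space-delimited mutations. The helper name `read_validation_ic50s`, the antibody name, the mutation names, and the IC50 values are hypothetical.

```python
import csv
import io

# Required columns named in the README.
REQUIRED_VALIDATION_COLUMNS = ["antibody", "aa_substitutions", "measured IC50"]


def read_validation_ic50s(handle):
    """Read validation rows, splitting aa_substitutions into a list of mutations."""
    reader = csv.DictReader(handle)
    fieldnames = reader.fieldnames or []
    missing = [col for col in REQUIRED_VALIDATION_COLUMNS if col not in fieldnames]
    if missing:
        raise ValueError(f"validation_IC50s.csv is missing columns: {missing}")
    records = []
    for row in reader:
        records.append(
            {
                "antibody": row["antibody"],
                # str.split() with no argument splits on any run of whitespace,
                # so an empty field gives an empty list.
                "mutations": row["aa_substitutions"].split(),
                "measured_IC50": float(row["measured IC50"]),
            }
        )
    return records


# Hypothetical content; mutation names and values are invented.
example = io.StringIO(
    "antibody,aa_substitutions,measured IC50\n"
    "serum_1,A12G T45K,0.25\n"
    "serum_1,,0.5\n"
)
records = read_validation_ic50s(example)
```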