Identification of protein coding genes putatively involved in infection by combining metagenomics analysis and protein orthologue clustering.
Christine Sambles and David Studholme. University of Exeter, Devon.
To identify fungal protein-coding genes associated with Fraxinus:Hymenoscyphus in planta interactions, we took an orthologue clustering approach. Identifying fungal transcripts that are present in the four samples taken from infected ash, and removing those that are also present in the KW1 isolate, could reveal infection-related transcripts from H. pseudoalbidus. Additionally, F. excelsior transcripts that are present in the infected material but absent from F. excelsior with no signs of infection could identify transcripts involved in the plant's response to infection by H. pseudoalbidus.
Transcriptome assemblies:
F. excelsior: ATU1
C. fraxinea: KW1
Mixed material: AT1, AT2, Upton, Holt
Output from BLASTX searches against GenBank:
F. excelsior: ATU1
C. fraxinea: KW1
Mixed material: AT1, AT2, Upton, Holt
We used MEGAN, as previously described (http://oadb.tsl.ac.uk/?p=704), to assign transcripts to taxonomic bins. These transcripts came from five transcript assemblies:
* 1 H. pseudoalbidus isolate (KW1) and
* 4 mixed-material samples (AT1, AT2, Holt & Upton).
This resulted in 36,945 transcripts being allocated to the bin for order Helotiales.
The longest open reading frame for each Helotiales-binned transcript (Table 1) was translated into a predicted protein sequence. These protein sequences were clustered using OrthoMCL.
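As an illustration of the translation step, the following is a minimal Python sketch of picking the longest open reading frame across all six reading frames and translating it. It is not the script used for this analysis: the text does not say which tool or ORF definition was used, so the function names, the use of the standard genetic code, and the choice to treat an ORF as the longest stretch between stop codons (with an optional requirement for a start methionine) are all assumptions.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf_protein(transcript, require_start=False):
    """Hypothetical helper: longest stop-free peptide over six frames.

    With require_start=True, a peptide must begin at its first methionine.
    Ties are resolved in favour of the first frame examined.
    """
    seq = transcript.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                if require_start:
                    start = segment.find("M")
                    if start < 0:
                        continue
                    segment = segment[start:]
                if len(segment) > len(best):
                    best = segment
    return best
```

Leaving `require_start` off keeps peptides from transcripts that are truncated at the 5' end, at the cost of accepting frames that may not be genuine coding sequence.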
Table 1: Numbers of transcripts and percentages of all transcripts for each sample or isolate that were binned to the order Helotiales using MEGAN.
| | AT1 | AT2 | Holt | Upton | KW1 | ATU1 |
|---|---|---|---|---|---|---|
| Helotiales | 8,214 | 7,403 | 6,930 | 7,410 | 6,561 | 0 |
| % all transcripts | 15.61% | 8.80% | 6.44% | 12.25% | 31.75% | 0.00% |
Between 4,548 and 5,551 proteins were clustered from each sample; the number of protein clusters was 6,505 in total. A Venn diagram of the clustered proteins can be seen in Figure 1.
Fig 1: Venn diagram of Helotiales-binned proteins clustered with OrthoMCL for one H. pseudoalbidus isolate (KW1) and four mixed material samples from H. pseudoalbidus-infected F. excelsior (AT1, AT2, Holt and Upton).
There was a core set of 3,118 protein clusters from detectable transcripts. A set of 113 protein clusters was identified only in H. pseudoalbidus samples that were from infected F. excelsior (AT1, AT2, Holt & Upton) and 33 only identified in KW1, a H. pseudoalbidus isolate. These will be referred to as the ‘in planta’ and ‘ex planta’ groups respectively.
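The grouping of clusters by sample membership can be sketched in Python as below. This is only an illustration under stated assumptions: the names `MIXED`, `ISOLATE` and `classify_cluster` are hypothetical, and the sketch assumes that an in planta cluster must contain proteins from all four infected samples and none from KW1, which follows the description at the top of this page but is not spelled out as a rule.

```python
# Sample labels as used in this report.
MIXED = frozenset({"AT1", "AT2", "Holt", "Upton"})
ISOLATE = "KW1"


def classify_cluster(samples):
    """Assign a cluster to a group based on the samples it contains.

    Assumption: 'in planta' requires all four mixed-material samples and
    no KW1 protein; 'ex planta' means KW1 only; 'core' means all five.
    """
    present = frozenset(samples)
    if present == MIXED | {ISOLATE}:
        return "core"
    if present == MIXED:
        return "in planta"
    if present == {ISOLATE}:
        return "ex planta"
    return "other"


# Example memberships taken from the CFEM cluster list later in this report
# (sample names normalised to the spelling used above).
clusters = {
    "HELO2454": ["AT1", "AT2", "Holt", "Upton"],
    "HELO4337": ["AT1", "AT2", "Holt", "Upton", "KW1"],
    "HELO5213": ["AT1", "Holt", "Upton", "KW1"],
    "HELO5952": ["AT2", "Upton", "KW1"],
}
groups = {cid: classify_cluster(members) for cid, members in clusters.items()}
```

Under these assumptions HELO2454 falls in the in planta group and HELO4337 in the core set, while the two clusters found in only some samples are labelled "other".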
The 113 protein clusters found only in H. pseudoalbidus infected F. excelsior (in planta) contained a total of 565 transcripts (459 excluding isoforms). We annotated the transcript sequences based on results of BLASTX searches. Additionally the GO, EC, KEGG, PFAM and CAZy (Carbohydrate-Active enzymes) databases were used to annotate the full set of 565 transcripts.
GO, EC and KEGG annotations were inferred using annot8r (Schmid and Blaxter 2008), PFAM domains were identified with Pfam scan (a wrapper script around hmmpfam), and CAZy family members were annotated using the CAZymes Analysis Toolkit (CAT) (Park, Karpinets et al. 2010).
GO analysis revealed a reduction in growth-related proteins and an increase in cell differentiation and proliferation proteins in infected material (Fig 2).
[Fig 2](http://figshare.com/articles/Gene_Ontology_GO_analysis_of_the_the_pan_proteome_KW1_AT1_AT2_Upton_Holt_compared_to_in_planta_proteins_/988717) : Gene Ontology (GO) analysis of the pan-proteome (KW1, AT1, AT2, Upton, Holt) compared to in planta proteins. The in planta proteins were translated from Helotiales-binned transcripts (MEGAN) and were identified only in H. pseudoalbidus samples that were from infected F. excelsior (AT1, AT2, Holt & Upton). The pan-proteome proteins were also translated from Helotiales-binned transcripts (MEGAN) and include the isolate, KW1.
PFAM and CAZy analysis of the 565 in planta transcripts resulted in 88 PFAM domains/families and the following CAZy families:
* Glycosyl hydrolases family 18 (Pfam: Glyco_hydro_18, PF00704)
* Alcohol dehydrogenase GroES-like domain (Pfam: ADH_N, PF08240) & Zinc-binding dehydrogenase (Pfam: ADH_zinc_N, PF00107)
* alpha/beta hydrolase fold (Pfam: Abhydrolase_3, PF07859)
* Protein of unknown function, a putative transmembrane protein from bacteria. It is likely to be conserved between Mycobacterium species (Pfam: DUF2029, PF09594) & PAP2 superfamily (Pfam: PAP2_3, PF14378)
* Regulator of chromosome condensation (RCC1) repeat (Pfam: RCC1, PF00415)
* Chalcone-flavanone isomerase (Pfam: Chalcone, PF02431)
* Myosin head (motor domain) (Pfam: Myosin_head, PF00063) & Chitin synthase (Pfam: Chitin_synth_2, PF03142)
* RhgB_N|fn3_3|CBM-like
BLASTX hits from the in planta transcripts included a putative CFEM domain-containing protein (Marssonina brunnea) and a galactose mutarotase-like protein (Glarea lozoyensis). The galactose mutarotase-like protein is of interest because it is also similar to the rhamnogalacturonate lyase found in Aspergillus spp., an enzyme known to degrade plant cell walls by cleaving the pectin backbone (de Vries and Visser 2001). Some CFEM-containing proteins are proposed to have important roles in fungal pathogenesis (Kulkarni, Kelkar et al. 2003).
PFAM domains and families in the ‘pan-proteome’ of KW1, AT1, AT2, Holt & Upton were identified using the hmmpfam wrapper script, Pfam scan. These were compared to the PFAM annotation of the ‘in planta’ group to identify over-representation of specific domains within this group. The domains and families in which >80% annotations were present in the ‘in planta’ group when compared to the ‘pan-proteome’ are shown in Table 1.
Table 1: Pfam domains and families in which >80% of ‘pan-proteome’ annotations were present in the ‘in planta’ group (http://pfam.sanger.ac.uk/).

| Domain/Family | Name | Pfam accession |
|---|---|---|
| ATP12 | ATP12 chaperone protein | PF07542 |
| BOP1NT | BOP1NT (NUC169) domain | PF08145 |
| iPGM_N | BPG-independent PGAM N-terminus | PF06415 |
| CDC37_M | Cdc37 Hsp90 binding domain | PF08565 |
| CDC37_N | Cdc37 N terminal kinase binding domain | PF03234 |
| CDC37_C | Cdc37 C terminal domain | PF08564 |
| Chalcone | Chalcone-flavanone isomerase | PF02431 |
| Copper-bind | Copper binding proteins plastocyanin/azurin family | PF00127 |
| Sdh5 | Flavinator of succinate dehydrogenase | PF03937 |
| HD_3 | HD domain | PF13023 |
| Hpt | Hpt domain | PF01627 |
| Metalloenzyme | Metalloenzyme superfamily | PF01676 |
| CENP-I | Mis6 | PF07778 |
| Myosin_tail_1 | Myosin tail | PF01576 |
| TRM | N2,N2-dimethylguanosine tRNA methyltransferase | PF02005 |
| Es2 | Nuclear protein Es2 | PF09751 |
| Tom37 | Outer mitochondrial membrane transport complex protein | PF10568 |
| PAP2_3 | PAP2 superfamily | PF14378 |
| PMC2NT | PMC2NT (NUC016) domain | PF08066 |
| Porphobil_deam | Porphobilinogen deaminase dipyromethane cofactor binding domain | PF01379 |
| Porphobil_deamC | Porphobilinogen deaminase C-terminal domain | PF03900 |
| DUF2012 | Protein of unknown function | PF09430 |
| DUF775 | Protein of unknown function | PF05603 |
| Prp31_C | Prp31 C terminal domain | PF09785 |
| Ribosomal_L32p | Ribosomal L32p protein family | PF01783 |
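The >80% filter behind this table amounts to a simple ratio per domain: the number of annotations in the ‘in planta’ group divided by the number in the whole ‘pan-proteome’. The Python sketch below shows that calculation for the four domains whose counts are quoted in the text of this report; the names `COUNTS`, `THRESHOLD` and `over_represented` are hypothetical and this is not the script used to build the table.

```python
# (in planta count, pan-proteome count) for domains whose counts are quoted
# in this report; all other domains are omitted because their counts are
# not given here.
COUNTS = {
    "Porphobil_deam": (6, 6),
    "Copper-bind": (3, 3),
    "Hpt": (5, 5),
    "CFEM": (5, 14),
}

THRESHOLD = 0.80  # the >80% cut-off described above


def in_planta_fraction(in_planta_count, pan_count):
    """Fraction of a domain's pan-proteome annotations found in planta."""
    if pan_count <= 0:
        raise ValueError("domain must occur at least once in the pan-proteome")
    return in_planta_count / pan_count


def over_represented(counts, threshold=THRESHOLD):
    """Return the domains whose in planta fraction exceeds the threshold."""
    return sorted(
        name
        for name, (n_in, n_pan) in counts.items()
        if in_planta_fraction(n_in, n_pan) > threshold
    )


selected = over_represented(COUNTS)
cfem_percent = round(100 * in_planta_fraction(*COUNTS["CFEM"]), 2)
```

With these counts the three domains found exclusively in planta pass the filter, while CFEM, at 5 of 14 annotations, does not.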
Several of the Pfam hits struck us as interesting; these are described below. The pairs of numbers in brackets are the number found within the in planta group / number found in entire ‘pan-proteome’:
Porphobil_deam and Porphobil_deamC (6/6) were found in two AT1 isoforms, AT2, two Holt isoforms and Upton. There were no peptides with this domain in the Helotiales-binned KW1 proteome. Heme-biosynthetic porphobilinogen deaminase protects Aspergillus nidulans from nitrosative stress. In A. nidulans, the nitric oxide (NO)-tolerant protein PBG-D (the heme biosynthesis enzyme porphobilinogen deaminase) modulates the reduction of environmental NO and nitrite by flavohemoglobin (FHB, encoded by fhbA and fhbB) and nitrite reductase (NiR, encoded by niiA) (Zhou, Narukami et al. 2012). NO is part of the plant hypersensitive response, a localized programmed cell death that confines the pathogen to the site of attempted infection (Mur, Carver et al. 2006).
Proteins matching the ‘copper binding proteins, plastocyanin/azurin’ family (Pfam: Copper-bind, PF00127) (3/3) were found in AT1, Holt & Upton. OrthoMCL clustered an AT2 protein with them, but the assembled transcript was incomplete at the 5’ end and the PF00127 domain was therefore not present. BLASTX searches indicated amino acid sequence similarity to cupredoxin from Glarea lozoyensis, and HHpred predicts similarity to cucumber stellacyanin. Given the amino acid sequence similarity between the phytocyanins and fungal laccases, this protein may be a laccase. White-rot fungi (e.g. Trametes cinnabarina, Trametes versicolor and Phlebia radiata) are reported to produce laccases which degrade lignin (Tuor, Winterhalter et al. 1995; Eggert, Temp et al. 1997), and laccase-mediated detoxification of phytoalexins generated by the plant defence systems has been observed in Botrytis cinerea (Pezet, Pont et al. 1991; Sbaghi, Jeandet et al. 1996; Adrian, Rajaei et al. 1998; Breuil, Jeandet et al. 1999).
The Hpt domain (Pfam: Hpt, PF01627) (5/5) was identified in two AT1 isoforms, AT2, Upton & Holt. The histidine-containing phosphotransfer (HPt) domain is a novel protein module with an active histidine residue that mediates phosphotransfer reactions in two-component signalling systems (Catlett, Yoder et al. 2003).
Although below the threshold of 80%, 35.71% (5/14) of the CFEM domains identified in the ‘pan-proteome’ of KW1, AT1, AT2, Holt & Upton were present in the ‘in planta’ group and none were present in the ‘ex planta’ group. The CFEM domains were distributed across 4 clusters, only one of which is not present in KW1:
ClusterID: Clustered protein present in:
HELO2454: AT1, AT2, HOLT, UPTON
HELO4337: AT1, AT2, HOLT, UPTON, KW1
HELO5213: AT1, HOLT, UPTON, KW1
HELO5952: AT2, UPTON, KW1
Fig 2: Phylogenetic tree of H. pseudoalbidus sequences from four OrthoMCL clusters where at least one sequence in the cluster contains a CFEM domain (Pfam: PF05730). The names of full-length proteins are shown in black; shown in grey are the names of shorter proteins from incomplete transcript assemblies that lack a CFEM domain but cluster with CFEM domain sequences due to sequence similarity and inferred orthology. Orthologue clustering was performed on all translated transcripts binned to the Helotiales using MEGAN from the one H. pseudoalbidus isolate (KW1) and all four H. pseudoalbidus samples that were from infected F. excelsior (AT1, AT2, Holt & Upton).
The 33 clusters (representing 72 peptides) in the ex planta group which were only identified in the isolate KW1 were annotated with PFAM as previously described. This resulted in identification of 17 Pfam domains/families (Table 2).
Table 2: Pfam domains/families identified in the ex planta group

| Domain/Family | Name | Pfam accession |
|---|---|---|
| COX1 | Cytochrome C and Quinol oxidase polypeptide I | PF00115 |
| DASH_Spc34 | DASH complex subunit Spc34 | PF08657 |
| Pentapeptide_4 | Pentapeptide repeats | PF13599 |
| Vac7 | Vacuolar segregation subunit 7 P | PF12751 |
| DHQ_synthase | 3-dehydroquinate synthase | PF01761 |
| LtrA | Bacterial low temperature requirement A protein | PF06772 |
| FSH1 | Serine hydrolase | PF03959 |
| Tyrosinase | Common central domain of tyrosinase | PF00264 |
| Glyco_hydro_47 | Glycosyl hydrolase family 47 | PF01532 |
| DUF202 | Domain of unknown function | PF02656 |
| SET | SET domain | PF00856 |
| Abhydrolase_1 | alpha/beta hydrolase fold | PF00561 |
| adh_short_C2 | Enoyl-(Acyl carrier protein) reductase | PF13561 |
| Glyco_hydro_3 | Glycosyl hydrolase family 3 N terminal domain | PF00933 |
| ADH_zinc_N | Zinc-binding dehydrogenase | PF00107 |
| AAA | ATPase family associated with various cellular activities | PF00004 |
| adh_short | short chain dehydrogenase | PF00106 |
The low number of peptides found only in KW1, and not in any of the H. pseudoalbidus-infected ash samples, limits the scope for comparative analysis.
Proteins putatively involved in plant-pathogen interactions have been identified from groups of translated transcripts that were found exclusively in planta and not in isolate KW1. They included a copper binding protein within the plastocyanin/azurin family, porphobilinogen deaminase, a CFEM domain-containing protein and a galactose mutarotase-like protein.