I am broadly interested in the evolution and structure of host-associated microbial communities. Of the numerous taxa that compose the Arabidopsis microbiome, fungi make up a substantial portion, but studies to date have tended to focus on the bacterial portion. With the help of my labmate Manon Guilberteau, I have cultured over thirty unique fungal species from natural populations of Arabidopsis. By infecting sterile Arabidopsis with specific microbial taxa under tightly controlled environmental conditions, I will investigate the role of fungi in formation of the non-mycorrhizal plant microbiome.
The 1001 Genomes Consortium set out to provide detailed whole-genome sequences of at least 1001 genotypes of the model plant Arabidopsis thaliana. In a worldwide collaboration, including (past) lab members Angela Hancock, Matthew Horton, Wayan Muliyati, Gianluca Sperone and Joy Bergelson, the consortium released 1,135 genome sequences of A. thaliana. The joint effort resulted in a publicly available, invaluable resource for studying phenotypic variation and adaptation in plants.
The release of the genomes in Cell 2016, 166, provides a fascinating insight into A. thaliana’s global population structure, migration patterns, and evolutionary history. When combined with the RegMap panel, we now have 2,029 natural A. thaliana genotypes with high-quality polymorphism data that will greatly expand our ability to study how wild plants adapt to biotic and abiotic environments.
In scientific manuscripts, we tell stories of our research, generally in straight-line fashion with clear motivations and results. In my experience, research rarely unfolds this way; the stories, motivations, and applications are often realized only post hoc. This is the nature of science, and our recent ISMEJ publication is no different.
With “16Stimator: statistical estimation of ribosomal gene copy numbers from draft genome assemblies”, we introduce a method to generate 16S rRNA gene (16S) copy number estimates for bacterial genomes by comparing the sequencing read depths of ribosomal and single-copy gene regions. Applying this method produced 16S copy number estimates for hundreds of bacterial species without closed genome representatives. This extended database of known 16S copy numbers, combined with phylogeny-based normalization methods (such as PICRUSt) for 16S amplicon sequencing studies, will lead to more accurate organismal abundance measurements. Note: Our article is not open access, but the code is freely available.
These are valid and important motivations and applications, but really, we just wanted to know the 16S copy numbers for a handful of isolates so we could properly measure their abundances by amplicon sequencing in controlled community studies.
So here is the actual development route of 16Stimator:
A caveat of 16S amplicon sequencing studies is that, due to variation in bacterial 16S copy number, sequencing read and organismal abundances are not equivalent. For our controlled community experiments using leaf endophytic bacteria originally isolated from Arabidopsis thaliana, we needed to determine each isolate’s 16S copy number. We chose whole genome sequencing and assembly for this task. That was a horrible choice.
Current assembly algorithms do a poor job of resolving repetitive genomic regions. Longer reads or larger insert sizes can overcome this limitation, but alas, we had short-read Illumina sequencing libraries with insert sizes smaller than ribosomal RNA gene regions. After assembly, the 16S rRNA gene was found in one to a few contigs. When we mapped reads back to the assembly, the coverage of the 16S contig was much greater than the average genomic coverage, so we sought to use read depths to resolve 16S copy numbers. By statistically comparing the coverage of 16S to that of single-copy, conserved genes, we were able to accurately estimate copy numbers.
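The core idea of the read-depth comparison can be illustrated with a minimal sketch. This is not the 16Stimator code itself (that lives in the repository linked below); the function name, the input layout, and the use of medians as the summary statistic are all assumptions made for illustration.

```python
from statistics import median


def estimate_16s_copy_number(depth_16s, single_copy_depths):
    """Estimate 16S copy number as a ratio of read depths.

    depth_16s: per-base read depths across the assembled 16S region.
    single_copy_depths: dict mapping a single-copy gene name to its
        per-base read depths (gene names here are illustrative).
    Both inputs are hypothetical; real depths would come from mapping
    reads back to the draft assembly.
    """
    if not depth_16s or not single_copy_depths:
        raise ValueError("depth data must not be empty")

    # Summarize each single-copy gene by its median depth, then take the
    # median across genes as the one-copy baseline.
    per_gene = [median(depths) for depths in single_copy_depths.values()]
    baseline = median(per_gene)
    if baseline <= 0:
        raise ValueError("single-copy baseline depth must be positive")

    # A collapsed multi-copy region accumulates reads from every copy,
    # so its depth relative to the baseline approximates the copy number.
    return median(depth_16s) / baseline


example_16s = [200, 210, 190, 205]
example_single_copy = {
    "rpoB": [50, 52, 48],
    "gyrB": [49, 51, 50],
    "recA": [51, 50, 53],
}
estimate = estimate_16s_copy_number(example_16s, example_single_copy)
print(round(estimate))
```

Medians are used here only because they are robust to local coverage spikes; the published method applies a statistical comparison that this sketch does not attempt to reproduce.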
Though the focus of the paper is on the sequencing read-depth approach, we did confirm 16S copy numbers experimentally, using an efficient qPCR approach. We compared amplification of 16S to single copy, conserved genes to determine copy number. The IDT-DNA gBlocks provided a convenient alternative to plasmid construction for creating standards with a 1:1 ratio of 16S to single copy gene.
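One common way to turn such qPCR measurements into a copy number is a relative (delta-delta Ct) calculation, sketched below. The text above does not state which quantification scheme was used, so the formula, the function name, and the default assumption of perfect doubling per cycle are illustrative assumptions rather than the protocol from the paper.

```python
def qpcr_copy_number(ct_16s_sample, ct_scg_sample,
                     ct_16s_standard, ct_scg_standard,
                     efficiency=2.0):
    """Relative 16S copy number from qPCR threshold cycles (Ct).

    The standard is assumed to carry 16S and the single-copy gene (scg)
    at a 1:1 ratio, so its Ct difference captures assay bias alone.
    efficiency=2.0 assumes perfect doubling per cycle for both assays.
    """
    delta_sample = ct_16s_sample - ct_scg_sample
    delta_standard = ct_16s_standard - ct_scg_standard
    delta_delta_ct = delta_sample - delta_standard
    # Fewer cycles to threshold means more starting template.
    return efficiency ** (-delta_delta_ct)


# Hypothetical Ct values: 16S crosses threshold two cycles earlier than
# the single-copy gene in the sample, but at the same cycle in the standard.
copies = qpcr_copy_number(18.0, 20.0, 20.0, 20.0)
print(copies)
```

Normalizing against the 1:1 standard is what lets differences in primer efficiency between the two assays cancel, at least to the extent that both amplify with similar efficiency.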
Only after resolving 16S copy numbers for our isolates of interest did we realize that this method could be applied to thousands of other draft genomes. We scaled 16Stimator to process tens of thousands of sequencing libraries deposited in SRA, resulting in 16S copy number estimations for hundreds of species without closed genome representatives. A large and diverse database of 16S copy numbers combined with methods to correct for copy number bias in 16S amplicon sequencing studies will ultimately result in more accurate abundance and diversity estimates. If sequencing reads are publicly deposited along with draft genome sequences, then the database can continue to grow.
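To make the copy number correction concrete, here is a minimal sketch of how known 16S copy numbers could be used to convert amplicon read counts into organismal relative abundances. The function and taxon names are hypothetical, and dedicated normalization tools do considerably more (for example, inferring copy numbers for taxa that lack a measurement).

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Convert 16S read counts to copy-number-corrected relative abundances.

    read_counts: dict of taxon -> 16S amplicon read count.
    copy_numbers: dict of taxon -> 16S copies per genome.
    """
    adjusted = {}
    for taxon, reads in read_counts.items():
        copies = copy_numbers[taxon]
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        # Dividing by copies per genome approximates the number of genomes
        # (cells) that contributed the reads.
        adjusted[taxon] = reads / copies

    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: value / total for taxon, value in adjusted.items()}


# Hypothetical two-member community: isolate_A has 7 copies, isolate_B has 1.
abundances = correct_for_copy_number(
    {"isolate_A": 700, "isolate_B": 300},
    {"isolate_A": 7, "isolate_B": 1},
)
print(abundances)
```

In this toy case isolate_A contributes 70% of the reads but only 25% of the cells, which is the kind of discrepancy the correction is meant to remove.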
Though we did not initially intend to create a method to estimate 16S copy numbers from draft genomes, science threw us a curveball and 16Stimator was our response. All the scripts and data are publicly available at https://bitbucket.org/perisin/16stimator. We look forward to feedback on our method to continue to improve and generate 16Stimates!
Graduate student Alice MacQueen investigated the transcriptome-wide patterns of mRNA editing in a collaboration with the group of Chuan He at the Department of Chemistry and Institute for Biophysical Dynamics at the University of Chicago. m6A mRNA editing is essential for plant development, but the role this editing mark plays in the cell is still unknown. The research team found that m6A editing in plants is distinct from editing in yeast and mammals: it is enriched not only around the stop codon and within 3′-untranslated regions, but also around the start codon.
Deposition of this editing mark around the start codon was associated with chloroplast-specific genes and increased mRNA abundance, which suggests a regulatory role for m6A editing in plants distinct from other eukaryotes described to date.
During the last two decades, scientists achieved a better understanding of the molecular basis of host-parasite co-evolution. However, many studies focused on the interaction of the genetic plant model species Arabidopsis thaliana and the highly pathogenic but non-specific tomato pathogen Pseudomonas syringae pv. tomato DC3000.
The Bergelson lab studies the interaction of A. thaliana and one of its highly abundant bacterial residents, P. viridiflava. We previously identified broad-scale natural variation in resistance phenotypes towards two distinct clades of P. viridiflava. While some genotypes of A. thaliana show few signs of disease or low bacterial titers, others suffer from severe hydrolysis of leaf tissue.
In a collaboration with Fabrice Roux, Joy Bergelson and Madlen Vetter, we are currently identifying and confirming the genetic loci underlying strain-specific and general defense mechanisms of A. thaliana against its natural pathogen P. viridiflava.
The outcome of host-microbe interactions is influenced by host genetics and interactions among bacterial community members. Previous studies described the bacterial community associated with Arabidopsis thaliana in the field. Using controlled greenhouse experiments we now aim to characterize how endophytic species composition influences plant-pathogen interactions. We furthermore seek to identify host genetic loci underlying the putative control of bacterial community composition.
Plants recognize potential pathogens and induce a complex immune response by detecting pathogen-associated molecular patterns (PAMPs). While immune responses are beneficial for mitigating the detrimental effects of pathogens, PAMP perception comes at the cost of growth reduction in seedlings. The genetic basis of growth versus defense trade-offs is poorly understood. A genome-wide association study identified genetic loci contributing to natural variation in the costs of innate immune responses. We experimentally validated several a priori and de novo candidate genes, which contribute significantly to decreases or increases in biomass after PAMP-triggered seedling growth inhibition.