In prior studies, our lab found that costs of resistance to pathogens in the absence of disease were ~5-10% for the resistance (R) genes Rps5 and Rpm1, respectively. However, Arabidopsis thaliana has 149 R genes, so it is unlikely that many of them incur such a high cost. The now-published research of former PhD student Alice MacQueen focuses on Rps2, which exists as an ancient balanced polymorphism with two long-lived clades of alleles. Alice conducted field trials showing that, in the absence of disease, Arabidopsis thaliana plants with a resistant Rps2 allele are no less fit than those with a susceptible Rps2 allele. Both resistant and susceptible Rps2 alleles contribute to controlling defense and stress gene expression, a pleiotropic effect that could explain the maintenance of both alleles.
“These results demonstrate how profoundly the magnitude of fitness costs associated with disease resistance may be shaped by genomic architecture and pleiotropy… These findings shed much-needed light on how the full repertoire of R genes is maintained in the A. thaliana genome. More broadly, these results show that the nature of fitness costs and trade-offs of disease resistance vary among loci even within the same host. Such information is crucial for crop breeding, where the challenge lies in producing high-yield crops while minimizing the cost of disease control.”
We illustrated this post with Sir John Tenniel’s drawing of the Red Queen and Alice from Lewis Carroll’s Through the Looking-Glass. The Red Queen tells Alice: “Now, here, you see, it takes all the running you can do, to keep in the same place”. This is commonly used as an analogy for co-evolution, as hosts and parasites have to adapt rapidly to each other in order not to lose the race, a concept introduced in Leigh Van Valen’s 1973 article. The rate of this co-evolutionary arms race is expected to be constrained by fitness costs.
Alice MacQueen performed fitness experiments as part of her doctoral dissertation and is now a postdoctoral researcher with the Juenger lab in Austin, Texas.
Xiaoqin Sun worked with the Bergelson lab from 2007 to 2009 and is now at the Institute of Botany, Jiangsu Province and Chinese Academy of Sciences, Nanjing.
The 1001 Genomes Consortium set out to provide detailed whole-genome sequences of at least 1001 genotypes of the model plant Arabidopsis thaliana. In a worldwide collaboration that included (past) lab members Angela Hancock, Matthew Horton, Wayan Muliyati, Gianluca Sperone and Joy Bergelson, the consortium released 1,135 genome sequences of A. thaliana. The joint effort has resulted in a publicly available, invaluable resource for studying phenotypic variation and adaptation in plants.
The release of the genomes in Cell 2016, 166, provides a fascinating insight into A. thaliana’s global population structure, migration patterns, and evolutionary history. When combined with the RegMap panel, we now have 2,029 natural A. thaliana genotypes with high-quality polymorphism data, which will greatly expand our ability to study how wild plants adapt to biotic and abiotic environments.
In scientific manuscripts, we tell the stories of our research, generally in straight-line fashion with clear motivations and results. In my experience, research rarely unfolds this way; the stories, motivations, and applications are often only realized post hoc. This is the nature of science, and our recent ISMEJ publication is no different.
With “16Stimator: statistical estimation of ribosomal gene copy numbers from draft genome assemblies”, we introduce an exciting method to generate 16S rRNA gene (16S) copy number estimates for bacterial genomes based on comparison of sequencing read depths of ribosomal and single-copy gene regions. Application of this method resulted in 16S copy number estimates for hundreds of bacterial species without closed genome representatives. This extended database of known 16S copy numbers, combined with phylogeny-based normalization methods (e.g., PICRUSt) for 16S amplicon sequencing studies, will lead to more accurate organismal abundance measurements. Note: Our article is not open access, but the code is freely available.
These are valid and important motivations and applications, but really, we just wanted to know the 16S copy numbers for a handful of isolates so we could properly measure their abundances by amplicon sequencing in controlled community studies.
So here is the actual development route of 16Stimator:
A caveat of 16S amplicon sequencing studies is that, due to variation in bacterial 16S copy number, sequencing read and organismal abundances are not equivalent. For our controlled community experiments using leaf endophytic bacteria originally isolated from Arabidopsis thaliana, we needed to determine each isolate’s 16S copy number. We chose whole genome sequencing and assembly for this task. That was a horrible choice.
Current assembly algorithms do a poor job resolving repetitive genomic regions. Longer reads or larger insert sizes can overcome this limitation, but alas, we had short-read Illumina sequencing libraries with insert sizes smaller than ribosomal RNA gene regions. After assembly, the 16S rRNA gene was found in one to a few contigs. When we mapped reads back to the assembly, the coverage of the 16S contig was much greater than the average genomic coverage, so we sought to use read depths to resolve 16S copy numbers. By statistically comparing the coverage of 16S to that of single-copy, conserved genes, we were able to accurately estimate copy numbers.
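The read-depth idea can be illustrated with a minimal sketch. This is not the 16Stimator code: it substitutes a simple ratio of 16S depth to the median depth of single-copy genes for the statistical comparison described in the paper. The function name and all depth values are hypothetical and chosen only for illustration.

```python
from statistics import median


def estimate_copy_number(depth_16s, single_copy_depths):
    """Estimate 16S copy number as 16S depth divided by the median
    read depth of single-copy genes (a simplification, not the
    published statistical method)."""
    if not single_copy_depths:
        raise ValueError("need at least one single-copy gene depth")
    baseline = median(single_copy_depths)
    if baseline <= 0:
        raise ValueError("single-copy baseline depth must be positive")
    return depth_16s / baseline


# Hypothetical mean read depths (made-up values, not data from the paper).
single_copy = [48.0, 52.0, 50.0, 47.0, 53.0]
estimate = estimate_copy_number(250.0, single_copy)
print(round(estimate, 2))
```

The median is used here as the baseline because it is less sensitive than the mean to a single gene with unusual coverage; the actual method may handle this differently.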
Though the focus of the paper is on the sequencing read-depth approach, we did confirm 16S copy numbers experimentally, using an efficient qPCR approach. We compared amplification of 16S to that of single-copy, conserved genes to determine copy number. The IDT-DNA gBlocks provided a convenient alternative to plasmid construction for creating standards with a 1:1 ratio of 16S to single-copy gene.
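One way such a qPCR comparison could be computed is sketched below. It assumes a delta-delta-Ct style calculation with perfect doubling per cycle, using the 1:1 standard to cancel differences between the two primer sets. The post does not say which calculation was actually used, so the formula, the function name, and the Ct values are all assumptions for illustration.

```python
def copy_number_from_ct(ct_16s_sample, ct_single_sample,
                        ct_16s_standard, ct_single_standard):
    """Estimate the 16S : single-copy gene ratio from Ct values.

    Assumes 100% amplification efficiency (doubling each cycle) and a
    standard containing both targets at a 1:1 ratio.
    """
    delta_sample = ct_16s_sample - ct_single_sample
    delta_standard = ct_16s_standard - ct_single_standard
    # A lower Ct for 16S relative to the standard means more 16S template.
    return 2.0 ** (delta_standard - delta_sample)


# Hypothetical Ct values (made up for illustration).
ratio = copy_number_from_ct(
    ct_16s_sample=18.0, ct_single_sample=20.5,
    ct_16s_standard=20.0, ct_single_standard=20.5,
)
print(ratio)
```

With real data, measured primer efficiencies would replace the fixed base of 2, and replicate reactions would be averaged before taking the ratio.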
Only after resolving 16S copy numbers for our isolates of interest did we realize that this method could be applied to thousands of other draft genomes. We scaled 16Stimator to process tens of thousands of sequencing libraries deposited in SRA, resulting in 16S copy number estimations for hundreds of species without closed genome representatives. A large and diverse database of 16S copy numbers combined with methods to correct for copy number bias in 16S amplicon sequencing studies will ultimately result in more accurate abundance and diversity estimates. If sequencing reads are publicly deposited along with draft genome sequences, then the database can continue to grow.
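The copy number correction mentioned above can be sketched in a few lines. This is a generic illustration, not part of 16Stimator or PICRUSt: read counts are divided by each taxon's 16S copy number and then renormalized to relative abundances. The taxon names, counts, and copy numbers are hypothetical.

```python
def correct_abundances(read_counts, copy_numbers):
    """Convert 16S read counts to relative organismal abundances by
    dividing each count by that taxon's 16S copy number."""
    adjusted = {
        taxon: read_counts[taxon] / copy_numbers[taxon]
        for taxon in read_counts
    }
    total = sum(adjusted.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: value / total for taxon, value in adjusted.items()}


# Hypothetical two-member community (made-up numbers).
reads = {"isolate_A": 600, "isolate_B": 400}
copies = {"isolate_A": 6, "isolate_B": 2}
corrected = correct_abundances(reads, copies)
print(corrected)
```

In this toy case isolate_A contributes 60% of the reads but, after correction, only about a third of the cells, which is the kind of distortion the caveat about copy number variation describes.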
Though we did not initially intend to create a method to estimate 16S copy numbers from draft genomes, science threw us a curveball and 16Stimator was our response. All the scripts and data are publicly available at https://bitbucket.org/perisin/16stimator. We look forward to feedback on our method to continue to improve and generate 16Stimates!