Behind the paper: 16Stimator

In scientific manuscripts, we tell the stories of our research, generally in straight-line fashion with clear motivations and results. In my experience, research rarely unfolds this way: the stories, motivations, and applications are often realized only post hoc. This is the nature of science, and our recent ISMEJ publication is no different.

With "16Stimator: statistical estimation of ribosomal gene copy numbers from draft genome assemblies", we introduce an exciting method to generate 16S rRNA gene (16S) copy number estimates for bacterial genomes by comparing the sequencing read depths of ribosomal and single-copy gene regions. Applying this method produced 16S copy number estimates for hundreds of bacterial species without closed genome representatives. This extended database of known 16S copy numbers, combined with phylogeny-based normalization methods [PICRUSt] for 16S amplicon sequencing studies, will lead to more accurate organismal abundance measurements. Note: Our article is not open access, but the code is freely available.

These are valid and important motivations and applications, but really, we just wanted to know the 16S copy numbers for a handful of isolates so we could properly measure their abundances by amplicon sequencing in controlled community studies.

So here is the actual development route of 16Stimator:

A caveat of 16S amplicon sequencing studies is that, due to variation in bacterial 16S copy number, sequencing read and organismal abundances are not equivalent. For our controlled community experiments using leaf endophytic bacteria originally isolated from Arabidopsis thaliana, we needed to determine each isolate’s 16S copy number. We chose whole genome sequencing and assembly for this task. That was a horrible choice.
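To make the caveat concrete, here is a minimal sketch of how known copy numbers can turn read counts into relative organismal abundances. The function name, isolate labels, and numbers are hypothetical and chosen for illustration only; this is not code from the 16Stimator repository.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Convert 16S read counts into relative organismal abundances.

    read_counts and copy_numbers are dicts keyed by taxon name
    (hypothetical inputs for this illustration).
    """
    # Dividing reads by 16S copy number approximates the number of genomes (cells).
    adjusted = {taxon: reads / copy_numbers[taxon]
                for taxon, reads in read_counts.items()}
    total = sum(adjusted.values())
    # Normalize so the corrected abundances sum to 1.
    return {taxon: value / total for taxon, value in adjusted.items()}


# Hypothetical two-member community: isolate_A carries 7 copies, isolate_B carries 1.
reads = {"isolate_A": 700, "isolate_B": 300}
copies = {"isolate_A": 7, "isolate_B": 1}
abundances = correct_for_copy_number(reads, copies)
print(abundances)
```

In this toy case isolate_A contributes 70% of the reads but only 25% of the cells, which is exactly the kind of distortion we needed to avoid in our controlled communities.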

Current assembly algorithms do a poor job resolving repetitive genomic regions. Longer reads or larger insert sizes can overcome this limitation, but alas, we had short-read Illumina sequencing libraries with insert sizes smaller than the ribosomal RNA gene regions. After assembly, the 16S rRNA gene was found in one to a few contigs. When we mapped reads back to the assembly, the coverage of the 16S contig was much greater than the average genomic coverage, so we sought to use read depths to resolve 16S copy numbers. By statistically comparing the coverage of 16S to that of single-copy, conserved genes, we were able to accurately estimate copy numbers.
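The core idea can be sketched in a few lines. The figure caption in this post describes the estimate as the ratio of median coverage for 16S and single-copy genes; the snippet below illustrates that ratio with made-up per-position depths. The function name and the input values are assumptions for illustration, not the actual 16Stimator implementation.

```python
from statistics import median


def estimate_copy_number(cov_16s, cov_single_copy):
    """Estimate 16S copy number as a ratio of median read depths.

    cov_16s: read depths over the assembled (collapsed) 16S region.
    cov_single_copy: read depths over single-copy, conserved genes.
    """
    # A collapsed repeat attracts reads from every genomic copy, so its depth
    # scales with copy number relative to single-copy regions.
    return median(cov_16s) / median(cov_single_copy)


# Hypothetical depths: the 16S contig is covered about four times as deeply.
cov_16s = [410, 395, 402, 420, 388]
cov_single = [98, 102, 100, 105, 95, 101]
estimate = estimate_copy_number(cov_16s, cov_single)
print(round(estimate, 2))
```

Medians are used here rather than means because they are less sensitive to coverage spikes and dips along a contig.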

16Stimator pipeline overview.

Though the focus of the paper is on the sequencing read-depth approach, we did confirm 16S copy numbers experimentally, using an efficient qPCR approach. We compared amplification of 16S to single copy, conserved genes to determine copy number. The IDT-DNA gBlocks provided a convenient alternative to plasmid construction for creating standards with a 1:1 ratio of 16S to single copy gene.

16S copy number estimates from de novo assemblies. For each endophytic isolate, paired-end sequencing reads (R1, R2) were generated on the Illumina HiSeq 2000 from short (~250 bp) and long (~2500 bp) insert libraries (Short_Ins and Long_Ins, respectively). For closed-genome controls, similarly generated sequencing reads were downloaded from SRA: Escherichia coli TY-2482 (GCA_000217695.2, SRR292678, SRR292862), Bacteroides fragilis HMW 615 (GCA_000297735.1, SRR488169, SRR488170), Pseudomonas aeruginosa PAO1 (GCA_000006765.1, SRR032420, SRR032832) and Staphylococcus aureus KPL1828 (GCA_000507725.1, SRR835799, SRR958927). The 16Stimator pipeline was used to estimate 16S copy number as the ratio of median coverage for 16S and single-copy genes. Confidence intervals (95%) were calculated either as in Price and Bonett (2002) (PB) or via permutations (Perm). For endophytic isolates, 16S copy numbers were independently verified by absolute quantification via qPCR; the mean and standard deviation of technical replicates are shown. For closed-genome controls, each horizontal line marks the rrnDB (Stoddard et al., 2014) consensus 16S copy number for that species. Note: the short-insert library for MEDvA23 and the long-insert library for MEB061 did not meet quality thresholds. 16S copy number was not experimentally determined by qPCR for E. coli TY-2482, B. fragilis HMW 615, P. aeruginosa PAO1 and S. aureus KPL1828.
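For readers who want a feel for how an interval around such a ratio can be obtained, here is a rough sketch using a percentile bootstrap. To be clear, this is a swapped-in, generic resampling technique; it is neither the Price and Bonett (2002) interval nor the permutation procedure used in the paper. The function name, parameters, and depth values are all hypothetical.

```python
import random
from statistics import median


def bootstrap_ratio_ci(cov_16s, cov_single_copy, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a ratio of median coverages.

    Illustrative only: a generic bootstrap, not the PB or permutation
    intervals reported in the figure.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ratios = []
    for _ in range(n_boot):
        # Resample each coverage set with replacement, then recompute the ratio.
        resampled_16s = rng.choices(cov_16s, k=len(cov_16s))
        resampled_sc = rng.choices(cov_single_copy, k=len(cov_single_copy))
        ratios.append(median(resampled_16s) / median(resampled_sc))
    ratios.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled ratios.
    lower = ratios[int((alpha / 2) * n_boot)]
    upper = ratios[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical depths, as in a genome with roughly four 16S copies.
cov_16s = [410, 395, 402, 420, 388]
cov_single = [98, 102, 100, 105, 95, 101]
lower, upper = bootstrap_ratio_ci(cov_16s, cov_single)
print(lower, upper)
```

Real coverage values along a contig are correlated between neighboring positions, so a naive bootstrap like this one would likely be too narrow on real data; it is only meant to show the shape of the calculation.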

Only after resolving 16S copy numbers for our isolates of interest did we realize that this method could be applied to thousands of other draft genomes. We scaled 16Stimator to process tens of thousands of sequencing libraries deposited in SRA, resulting in 16S copy number estimations for hundreds of species without closed genome representatives. A large and diverse database of 16S copy numbers combined with methods to correct for copy number bias in 16S amplicon sequencing studies will ultimately result in more accurate abundance and diversity estimates. If sequencing reads are publicly deposited along with draft genome sequences, then the database can continue to grow.

Though we did not initially intend to create a method to estimate 16S copy numbers from draft genomes, science threw us a curveball and 16Stimator was our response. All the scripts and data are publicly available. We look forward to feedback on our method so that we can continue to improve it and generate more 16Stimates!

This post first appeared on

Kembel, Steven W., Martin Wu, Jonathan A. Eisen, and Jessica L. Green. 2012. “Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance.” PLoS Comput Biol 8 (10): e1002743.
Langille, Morgan G. I., Jesse Zaneveld, J. Gregory Caporaso, Daniel McDonald, Dan Knights, Joshua A. Reyes, Jose C. Clemente, et al. 2013. “Predictive Functional Profiling of Microbial Communities Using 16S rRNA Marker Gene Sequences.” Nature Biotechnology 31 (9): 814–821.
