Comprehensive profiling of retroviral integration sites using target enrichment methods from historical koala samples without an assembled reference genome
Pin Cui1,*, Ulrike Löber1,2,*, David E. Alquezar-Planas1, Yasuko Ishida3, Alexandre Courtiol4, Peter Timms5, Rebecca N. Johnson6, Dorina Lenz7, Kristofer M. Helgen8,9, Alfred L. Roca3, Stefanie Hartman2 and Alex D. Greenwood1
1 Department of Wildlife Diseases, Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
2 Institute of Biochemistry & Biology, University of Potsdam, Potsdam, Germany
3 Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, United States
4 Department of Evolutionary Ecology, Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
5 University of the Sunshine Coast, Sippy Downs, Queensland, Australia
6 Australian Centre for Wildlife Genomics, Australian Museum, Sydney, Australia
7 Department of Evolutionary Genetics, Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
8 National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
9 Department of Veterinary Medicine, Freie Universität Berlin, Berlin, Germany
* These authors contributed equally to this work.
Background. Retroviral integration into the host germline results in permanent viral colonization of vertebrate genomes. The koala retrovirus (KoRV) is currently invading the germline of the koala (Phascolarctos cinereus) and provides a unique opportunity for studying retroviral endogenization. Previous analysis of KoRV integration patterns in modern koalas demonstrated that koalas share integration sites primarily if they are related, indicating that the process is currently driven by vertical transmission rather than by infection. However, owing to methodological challenges, KoRV integrations have not yet been comprehensively characterized.
Results. To overcome these challenges, we applied and compared three target enrichment techniques coupled with next-generation sequencing (NGS), together with a newly customized computational pipeline based on sequence clustering, to determine the integration sites in 10 museum koala samples from Queensland and New South Wales (NSW) collected between the 1870s and the late 1980s. A secondary aim of this study was to identify integration sites common to modern and historical specimens by comparing our dataset with previously published studies. Several million sequences were processed, and the KoRV integration sites in each koala were characterized.
Conclusions. Although each of the three enrichment methods exhibited bias in integration site retrieval, a combination of two methods, Primer Extension Capture and hybridization capture, is recommended for future studies of historical samples. Moreover, the integration sites identified show that the proportion of integration sites shared between any two koalas is quite small.
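Without an assembled reference genome, integration sites cannot be located by mapping reads to chromosome coordinates; instead, host sequences flanking the KoRV insertion can be grouped by similarity, with each group treated as one putative site. The sketch below illustrates that idea and the pairwise "proportion shared" comparison. It is a simplified illustration under stated assumptions, not the authors' pipeline: the function names (identity, cluster_flanks, shared_proportion), the ungapped prefix identity measure, the 0.9 identity threshold, the greedy clustering strategy, the use of the Jaccard index as the sharing measure, and the toy sequences are all hypothetical.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Ungapped identity over the overlapping prefix of two flanking sequences.
    Assumption: no indels; real reads would need proper alignment."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a[:n], b[:n]) if x == y)
    return matches / n


def cluster_flanks(reads_by_koala: dict[str, list[str]],
                   min_identity: float = 0.9) -> dict[str, set[int]]:
    """Hypothetical greedy clustering: each read joins the first cluster whose
    representative it matches at >= min_identity; otherwise it founds a new
    cluster. Returns, per koala, the set of cluster IDs (putative sites)."""
    representatives: list[str] = []
    sites: dict[str, set[int]] = {k: set() for k in reads_by_koala}
    for koala, reads in reads_by_koala.items():
        for read in reads:
            for cid, rep in enumerate(representatives):
                if identity(read, rep) >= min_identity:
                    sites[koala].add(cid)
                    break
            else:
                # No existing cluster matched: this read defines a new site.
                representatives.append(read)
                sites[koala].add(len(representatives) - 1)
    return sites


def shared_proportion(a: set[int], b: set[int]) -> float:
    """Assumed sharing measure (Jaccard index): sites carried by both koalas
    divided by sites carried by either koala."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy flanking sequences (invented for illustration only).
    reads = {
        "koala_A": ["ACGTACGTAC", "TTGCATTGCA", "ACGTACGTAA"],
        "koala_B": ["ACGTACGTAC", "GGCCGGCCGG"],
    }
    sites = cluster_flanks(reads)
    for a, b in combinations(sorted(sites), 2):
        print(a, b, round(shared_proportion(sites[a], sites[b]), 3))
```

In this toy run, koala_A carries putative sites {0, 1} and koala_B carries {0, 2}, so one of three distinct sites is shared and the printed proportion is 0.333. Greedy clustering is order-dependent and was chosen here only for brevity; a production pipeline would more likely rely on a dedicated clustering tool and gapped alignment.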