A transcriptome resource for the koala (Phascolarctos cinereus): insights into koala retrovirus transcription and sequence diversity
Matthew Hobbs1, Ana Pavasovic2,3, Andrew G King1, Peter J Prentis4, Mark DB Eldridge1, Zhiliang Chen5, Donald J Colgan1, Adam Polkinghorne3,6, Marc R Wilkins5, Cheyne Flanagan7, Amber Gillett8, Jon Hanger9, Rebecca N Johnson1* and Peter Timms3,6
1 Australian Museum Research Institute, Australian Museum, 6 College Street, Sydney, NSW 2010, Australia.
2 School of Biomedical Sciences, Queensland University of Technology, 2 George Street, Brisbane, Queensland 4001, Australia.
3 Institute of Health and Biomedical Innovation, Queensland University of Technology, 60 Musk Avenue, Kelvin Grove, Queensland 4059, Australia.
4 School of Earth, Environmental and Biological Sciences, Queensland University of Technology, 2 George Street, Brisbane, Queensland 4001, Australia.
5 Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia.
6 Current affiliation: Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Locked Bag 4, Maroochydore DC, Queensland 4558, Australia.
7 Port Macquarie Koala Hospital, Cnr. Roto Place and Lord St, Port Macquarie, NSW 2444, Australia.
8 Australia Zoo Wildlife Hospital, 1638 Steve Irwin Way, Beerwah, Queensland 4519, Australia.
9 Endeavour Veterinary Ecology Pty Ltd, 1695 Pumicestone Road, Toorbul, Queensland 4510, Australia.
Background The koala, Phascolarctos cinereus, is a biologically unique and evolutionarily distinct Australian arboreal marsupial. The goal of this study was to sequence the transcriptome from several tissues of two geographically separate koalas, and to create the first comprehensive catalog of annotated transcripts for this species, enabling detailed analysis of the unique attributes of this threatened native marsupial, including infection by the koala retrovirus.
Results RNA-Seq data were generated from a range of tissues from one male and one female koala and assembled de novo into transcripts using Velvet-Oases. Transcript abundance in each tissue was estimated. Transcripts were searched for likely protein-coding regions and a non-redundant set of 117,563 putative protein sequences was produced. In similarity searches, 84,907 of these sequences (72%) aligned to at least one sequence in the NCBI nr protein database. The best alignments were to sequences from other marsupials. After applying a reciprocal best hit requirement of koala sequences to those from tammar wallaby, Tasmanian devil and the gray short-tailed opossum, we estimate that our transcriptome dataset represents approximately 15,000 koala genes. The marsupial alignment information was used to look for potential gene duplications and we report evidence for copy number expansion of the alpha amylase gene, and of an aldehyde reductase gene. Koala retrovirus (KoRV) transcripts were detected in the transcriptomes. These were analysed in detail and the structure of the spliced envelope gene transcript was determined. There was appreciable sequence diversity within KoRV, with 233 sites in the KoRV genome showing small insertions/deletions or single nucleotide polymorphisms. Both koalas had sequences from the KoRV-A subtype, but the male koala transcriptome had, in addition, sequences more closely related to the KoRV-B subtype. This is the first report of a KoRV-B-like sequence in a wild population.
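The reciprocal best hit requirement mentioned above can be illustrated with a minimal sketch. It assumes that similarity search results are available as (query, subject, score) records in both directions, koala against another marsupial and the reverse. The sequence identifiers, the scores and the function names are hypothetical and are not taken from the study; the abstract does not state which score or tie-breaking rule was used, so this sketch simply keeps the first hit seen when scores tie.

```python
from typing import Dict, Iterable, List, Tuple

# One similarity-search hit: (query_id, subject_id, score).
# The score is assumed to be "higher is better" (for example a bit score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject.

    On a tie, the hit encountered first is kept (an arbitrary choice).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit],
                         reverse: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward)   # species A query -> best species B subject
    rev = best_hits(reverse)   # species B query -> best species A subject
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Hypothetical example data: koala sequences searched against devil
# sequences, and devil sequences searched against koala sequences.
koala_vs_devil = [
    ("koala_t1", "devil_p1", 250.0),
    ("koala_t1", "devil_p2", 90.0),
    ("koala_t2", "devil_p1", 120.0),
    ("koala_t3", "devil_p3", 300.0),
]
devil_vs_koala = [
    ("devil_p1", "koala_t1", 245.0),
    ("devil_p1", "koala_t2", 118.0),
    ("devil_p2", "koala_t1", 88.0),
    ("devil_p3", "koala_t3", 295.0),
]

pairs = reciprocal_best_hits(koala_vs_devil, devil_vs_koala)
print(pairs)
```

In this toy data, koala_t2 has devil_p1 as its best hit, but devil_p1 aligns best to koala_t1, so koala_t2 is excluded. Counting the koala sequences that survive such a filter against each reference species is one plausible way to arrive at a gene-number estimate of the kind reported above.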
Conclusions This transcriptomic dataset is a useful resource for molecular genetic studies of the koala, for evolutionary genetic studies of marsupials, for validation and annotation of the koala genome sequence, and for investigation of koala retrovirus. Annotated transcripts can be browsed and queried at http://koalagenome.org.