Wednesday, October 30, 2013

Natural barcode and rRNA genes

Ribosomal RNA (rRNA) is part of the ribosome, where proteins are synthesized from mRNAs. The ribosome is composed of two major classes of rRNA (large subunit, LSU, and small subunit, SSU) and more than 50 ribosomal proteins.


The ribosome in eukaryotes is called the 80S ribosome, where S (Svedberg) is the unit of the sedimentation coefficient. Its large subunit is called 60S and its small subunit 40S; S values are not additive, which is why 60S and 40S combine into 80S rather than 100S. In most eukaryotes the small ribosomal subunit contains the 18S rRNA, while the large subunit contains three rRNA species: the 5S, 5.8S and 28S.

In eukaryotes, the rRNAs are generally encoded by many copies of rRNA genes in the genome. Mammalian cells have 2 mitochondrial rRNA molecules (12S and 16S) and 4 types of cytoplasmic rRNA (28S, 5.8S, 18S, and 5S). The 28S, 5.8S, and 18S rRNAs are encoded by a single transcription unit (45S), in which they are separated by 2 internal transcribed spacers (ITS) and flanked by external transcribed spacers (ETS) at both ends. The 45S rDNA is organized into 5 clusters (each with 30-40 repeats) on chromosomes 13, 14, 15, 21, and 22. Within a cluster, the region between adjacent repeats is called the intergenic spacer (IGS). The 45S units are transcribed by RNA polymerase I (source: Wikipedia). See below for details.
Credit: Xianjun Dong
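
The layout of the 45S transcription unit described above can be written down as an ordered list of segments. Here is a minimal Python sketch of that idea. The 5'-to-3' order follows the text, but the segment lengths are made-up placeholders (not real coordinates), and `segment_coordinates` is a hypothetical helper name used only for illustration.

```python
# Order of segments in the 45S pre-rRNA, 5' to 3'.
# Lengths are ILLUSTRATIVE placeholders, not measured values.
SEGMENTS = [
    ("5'ETS", 30),
    ("18S", 20),
    ("ITS1", 10),
    ("5.8S", 5),
    ("ITS2", 10),
    ("28S", 40),
    ("3'ETS", 5),
]


def segment_coordinates(segments):
    """Return {name: (start, end)} using 0-based, end-exclusive coordinates."""
    coords = {}
    pos = 0
    for name, length in segments:
        coords[name] = (pos, pos + length)
        pos += length
    return coords


coords = segment_coordinates(SEGMENTS)

# The two internal transcribed spacers sit between the mature rRNAs.
its_regions = {name: span for name, span in coords.items() if name.startswith("ITS")}
print(its_regions)
```

With real coordinates from a GenBank record, the same dictionary could be used to slice the ITS1 and ITS2 sequences out of the full 45S sequence.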

One example of 45S rRNA in GenBank:

What interested me is that the ITS sequence can be used as a barcode to identify different species, because it is highly variable and easy to amplify.
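
To make the barcode idea concrete, here is a toy Python sketch that assigns a query ITS sequence to the reference with the highest percent identity. Everything in it is an assumption for illustration: the sequences and species names are invented, the sequences are assumed to be already aligned to the same length, and a real workflow would more likely use an alignment or BLAST search against a curated ITS database.

```python
def percent_identity(a, b):
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Invented reference "ITS barcodes" (NOT real sequences).
REFERENCE_ITS = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTTCGTAA",
    "species_C": "TTGTACGGAC",
}


def best_match(query, references):
    """Return the reference name whose sequence is most similar to the query."""
    return max(references, key=lambda name: percent_identity(query, references[name]))


query = "ACGTTCGTTA"  # invented query sequence
for name, seq in REFERENCE_ITS.items():
    print(name, percent_identity(query, seq))
print("best match:", best_match(query, REFERENCE_ITS))
```

The approach only works because ITS differs enough between species for the best match to stand out; a conserved region such as 18S would give near-identical scores for closely related species.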

The 5S rRNA genes occur in tandem arrays (~200-300 true 5S genes plus many dispersed pseudogenes), the largest of which is on chromosome 1q41-42. 5S rRNA is transcribed by RNA polymerase III. Here is one 5S rRNA record in GenBank:

Animal mitochondrial genomes typically contain 37 genes: 2 rRNA, 22 tRNA and 13 protein-coding (mRNA) genes. The 2 rRNAs are the 12S rRNA and the 16S rRNA, which are encoded by the genes MT-RNR1 and MT-RNR2, respectively. Research has shown that MT-RNR2 is associated with Alzheimer's disease and MT-RNR1 with hearing loss.

Due to the high copy number of rRNA genes, rRNA usually makes up a large proportion of total RNA, which biases RNA-seq results. So we need to remove the rRNAs from the library before sequencing (e.g. with RiboMinus) and mask them after sequencing (e.g. with the cufflinks -M option).
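
The cufflinks -M (--mask-file) option takes a GTF file of transcripts to ignore. Here is a minimal Python sketch of how such an rRNA mask could be built from an annotation. It assumes the GTF marks biotype with a `gene_biotype` (Ensembl-style) or `gene_type` (GENCODE-style) attribute and that the values `rRNA` and `Mt_rRNA` are the ones of interest; the example lines and the helper name `rrna_mask_lines` are invented for illustration.

```python
import re

# Biotype values treated as rRNA (an assumption; adjust to your annotation).
RRNA_BIOTYPES = {"rRNA", "Mt_rRNA"}
BIOTYPE_RE = re.compile(r'(?:gene_biotype|gene_type) "([^"]+)"')


def rrna_mask_lines(gtf_lines):
    """Yield the GTF lines whose attribute column declares an rRNA biotype."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a valid 9-column GTF record
        match = BIOTYPE_RE.search(fields[8])
        if match and match.group(1) in RRNA_BIOTYPES:
            yield line


# Toy annotation lines (invented coordinates and IDs).
example_gtf = [
    "#!toy-annotation\n",
    'chr1\ttoy\texon\t100\t220\t.\t+\t.\tgene_id "g1"; gene_biotype "rRNA";\n',
    'chr1\ttoy\texon\t500\t900\t.\t-\t.\tgene_id "g2"; gene_biotype "protein_coding";\n',
    'chrM\ttoy\texon\t650\t1600\t.\t+\t.\tgene_id "g3"; gene_type "Mt_rRNA";\n',
]

mask = list(rrna_mask_lines(example_gtf))
print("".join(mask), end="")
```

In practice the lines would be read from the real annotation file and written to something like rRNA_mask.gtf (a hypothetical file name), which is then passed to cufflinks with -M.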

