Thursday, November 29, 2012

Don't trust Cufflinks FPKM for short genes

Here is what Cole (author of Cufflinks) commented to the observation of "very high RPKM values from Cufflink":
This issue has been discussed elsewhere on this board. As Nicholas points out, RNA-Seq really isn't reliable for very short transcripts. The reason is that all the fragments that map to these transcripts come from the "tail" of the distribution of library fragment lengths. That is, fragments that map to microRNAs are much, much shorter than most fragments in the library - by design in the RNA-Seq protocol, which size selects away very short inserts. Thus, Cufflinks infers that even though relatively few fragments actually mapped to the microRNAs, there were probably TONS of individual microRNA molecules in the transcriptome before all of the various size selection parts of the protocol kicked in. Cufflinks accordingly increases the FPKM of these short transcripts to compensate for the bias against short fragments in the library.

This compensation was designed to improve accuracy for transcripts that are in the 500bp-1kb range - for longer transcripts, the "edge effects" due to library fragment size aren't much of an issue. However, I wouldn't trust FPKM values for transcripts shorter than your average fragment length. There's really just not enough data in most standard RNA-Seq libraries to say much about small RNA abundance.
I should also point out that other methods use this same bias correction technique (RSEM for example). As far as I'm aware, the "count-based" methods don't, but that doesn't mean they shouldn't. Most of those methods are strictly for differential analysis, where any edge effects are assumed to be affecting each condition the same way. That may or may not be the case in your data.
In any case, the quick answer to this problem is to simply remove or ignore transcripts shorter than around 300bp from your GTF. In a future version, we will be flagging these transcripts as too short for reliable quantification where appropriate.
Here is the full discussion:


  1. When you use a genuine service, you will be able to provide instructions, share materials and choose the formatting style. High Trust Flow

  2. Superbly written article, if only all bloggers offered the same content as you, the internet would be a far better place.. seo rank tracker


  3. Took me time to read all the comments, but I really enjoyed the article. It proved to be Very helpful to me and I am sure to all the commenters here! It’s always nice when you can not only be informed, but also entertained! click here right now

  4. This post is very simple to read and appreciate without leaving any details out. Great work! ecommerce platforms

  5. Wonderful article, thanks for putting this together! This is obviously one great post. Thanks for the valuable information and insights you have so provided here. this hyperlink

  6. This is any SEO's dream: a link from a quality website with the perfect anchor text. TF backlinks