Wednesday, October 08, 2014

Cufflinks mask option (-M/--mask-file) works when ...

Obviously I am not the only one who had questions on the "-M/--mask-file" mask GTF option in Cufflinks:

http://seqanswers.com/forums/showthread.php?t=8190
https://www.biostars.org/p/110289/
http://seqanswers.com/forums/showthread.php?t=29975

And too bad that no one from the Tuxedo group ever offered a clue!

Here are a few tips I found necessary to share in order to make it work:

1. The mask GTF file should have all 9 fields in the required format. For example, the strand column should be '+', '-', or '.', and nothing else. A GTF/GFF file can be extracted from GENCODE (http://www.gencodegenes.org/) or downloaded from the UCSC Table Browser. It can also be converted from a BED file using Kent's bedToGenePred --> genePredToGtf. But be aware that the BED file should have at least 6 columns (i.e. including the strand column); otherwise the converted GTF file will have a "^@" in the strand column, which results in an invalid GTF.

For example, if you want to exclude all reads mapped to the human mitochondrial genome, you can use
echo "chrM 0 16571 mt 0 ." | bedToGenePred stdin stdout | genePredToGtf file stdin chrM.gtf
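
Before running Cufflinks, a quick sanity check on the mask file can catch the strand problem described above. This is only a minimal sketch: the function name check_mask_gtf is made up for illustration, it assumes a tab-delimited file, and it checks just the field count and the strand column rather than the full GTF format.

```shell
# Hypothetical helper (not part of Cufflinks or the Kent tools):
# report GTF lines that do not have 9 tab-separated fields, or whose
# strand (column 7) is not '+', '-' or '.'.
check_mask_gtf() {
    awk -F'\t' '
        /^#/    { next }    # skip comment lines
        NF == 0 { next }    # skip empty lines
        NF != 9 {
            printf "line %d: expected 9 fields, found %d\n", NR, NF
            bad = 1
            next
        }
        $7 != "+" && $7 != "-" && $7 != "." {
            printf "line %d: invalid strand in column 7\n", NR
            bad = 1
        }
        END { exit bad }    # non-zero exit status if any line failed
    ' "$1"
}
```

Usage would be something like: check_mask_gtf chrM.gtf && echo "mask GTF looks OK"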

2. The "-M" option also works when assembling novel transcripts with a reference guide (cufflinks -g).

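3. Using the "-M" option should theoretically increase the FPKM values of the unmasked transcripts (compared to no mask), because the masked reads no longer count toward the total number of mapped fragments in the FPKM denominator. So, if you observe the opposite trend, there must be something wrong.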
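
The expectation in tip 3 can be written out with the usual FPKM definition. This is a sketch under assumptions: it assumes the default normalization by total mapped fragments, and that masked fragments are simply dropped from that total. The symbols are my own notation: C_t is the number of fragments assigned to transcript t, L_t its length in bases, N the total number of mapped fragments, and M the number of fragments falling in the masked regions (0 <= M < N).

```latex
\mathrm{FPKM}_t = \frac{10^{9}\, C_t}{N \, L_t},
\qquad
\mathrm{FPKM}_t^{\text{mask}} = \frac{10^{9}\, C_t}{(N - M)\, L_t}
\quad\Longrightarrow\quad
\frac{\mathrm{FPKM}_t^{\text{mask}}}{\mathrm{FPKM}_t} = \frac{N}{N - M} \ge 1 .
```

With made-up numbers, N = 20 million fragments of which M = 5 million fall on chrM, every unmasked transcript would be scaled up by 20/15, i.e. about 1.33-fold.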

4. If you expect a lot of reads from the masked regions (e.g. chrM, rRNAs), you can remove those reads from your BAM file before feeding it to Cufflinks, for example by keeping only the reads in the regions you want with "samtools view -L retained_region.bed".
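
One possible way to build retained_region.bed for the simple case of dropping a whole chromosome such as chrM is sketched here. The helper name make_retained_bed and the file names in.bam and in.no_chrM.bam are placeholders of mine, and the sketch assumes an indexed BAM so that samtools idxstats can list the reference names and lengths.

```shell
# Hypothetical helper: turn "name<TAB>length" lines (e.g. the first two
# columns of `samtools idxstats`) into whole-chromosome BED intervals,
# skipping the chromosome named in $1 and the "*" (unmapped reads) row.
make_retained_bed() {
    awk -v skip="$1" 'BEGIN { FS = OFS = "\t" }
        $1 != skip && $1 != "*" { print $1, 0, $2 }'
}

# Example usage (assumes samtools is on the PATH and in.bam is indexed):
#   samtools idxstats in.bam | make_retained_bed chrM > retained_region.bed
#   samtools view -b -L retained_region.bed in.bam > in.no_chrM.bam
```

Masking regions inside a chromosome (e.g. rRNA loci) would need the complement of those intervals instead of whole chromosomes, which this sketch does not cover.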
