Wednesday, August 22, 2012

How Tophat call junctions from deletions?

We noticed that in the output SAM file of Tophat, there are two kinds of CIGAR line indicating the same type of alignment - deletion:

xxxxMxxxxDxxxxM  and xxxxMxxxxNxxxxM

Here is what I parsed from the source code of Tophat

 -------------- long_spanning_reads.cpp --------------

       * FIXME, currently just differentiating between a deletion and a
       * reference skip based on length. However, would probably be better
       * to denote the difference explicitly, this would allow the user
       * to supply their own (very large) deletions
      if ((lb->right - lb->left - 1) <= max_deletion_length)
          new_cigar.push_back(CigarOp(DEL, lb->right - lb->left - 1));
          antisense_closure = prev_hit->is_spliced() ? prev_hit->antisense_splice() : curr_hit->antisense_splice();
          new_cigar.push_back(CigarOp(REF_SKIP, lb->right - lb->left - 1));
          antisense_closure = lb->antisense;

As the author commented, this is still to fix.

From the code above, it seems that max_deletion_length is the one to control the DEL from REF_SKIP. But it does not change the output if I tune the value of --max-deletion-length in Tophat.

I also tried to confirm this by compiling the code myself, but have an error from making at the moment ("tophat_reports.cpp:105: error: ‘boost::random::mt19937’ has not been declared"). mt19937 is found in ~/include/boost/tr1/random.hpp, but even I include <boost/tr1/random.hpp> in tophat_reports.cpp, it still doesn't work. Don;t know why...

No comments:

Post a Comment