Thursday, July 12, 2012

Three ways to convert bam/bed file to bigwig, separated by strand

Here are three ways to convert bam/bed to bigwig, separated by strand:


# -----------  method 1


bamToBed -i accepted_hits.bam -split > accepted_hits.bed


awk '{if($6=="+") print}' accepted_hits.bed | sort -k1,1 | bedItemOverlapCount mm9 -chromSize=ChromInfo.txt stdin | sort -k1,1 -k2,2n > accepted_hits.plus.bedGraph
awk '{if($6=="-") print}' accepted_hits.bed | sort -k1,1 | bedItemOverlapCount mm9 -chromSize=ChromInfo.txt stdin | sort -k1,1 -k2,2n | awk '{OFS="\t"; print $1,$2,$3,"-"$4}' > accepted_hits.minus.bedGraph

bedGraphToBigWig accepted_hits.plus.bedGraph ChromInfo.txt accepted_hits.plus.bw
bedGraphToBigWig accepted_hits.minus.bedGraph ChromInfo.txt accepted_hits.minus.bw


# ----------- method 2

bamToBed -i accepted_hits.bam -split > accepted_hits.bed


sort -k1,1 accepted_hits.bed | awk -v '{print $0 >> "accepted_hits.bed"$6}'
bedItemOverlapCount $index -chromSize=ChromInfo.txt accepted_hits.bed+ | sort -k1,1 -k2,2n > accepted_hits.plus.bedGraph
bedItemOverlapCount $index -chromSize=ChromInfo.txt accepted_hits.bed- | sort -k1,1 -k2,2n | awk '{OFS="\t"; print $1,$2,$3,"-"$4}' > accepted_hits.minus.bedGraph

bedGraphToBigWig accepted_hits.plus.bedGraph ChromInfo.txt accepted_hits.plus.bw
bedGraphToBigWig accepted_hits.minus.bedGraph ChromInfo.txt accepted_hits.minus.bw


# ----------- method 3

bedtools genomecov -ibam -bg -split -strand + -i accepted_hits.bam -g ChromInfo.txt accepted_hits.plus.bedGraph
bedtools genomecov -ibam -bg -split -strand - -i accepted_hits.bam -g ChromInfo.txt accepted_hits.minus.bedGraph

bedGraphToBigWig accepted_hits.plus.bedGraph ChromInfo.txt accepted_hits.plus.bw
bedGraphToBigWig accepted_hits.minus.bedGraph ChromInfo.txt accepted_hits.minus.bw

8 comments:

  1. Hi!! Very useful post!!

    I first tried the 3rd method because looks more simpler that the others. I put in a bash for loop like this:


    for i in $(ls *.bam)

    do

    bedtools genomecov -ibam -bg -split -strand + -i $i -g chromInfo.txt > $i.plus.bedGraph
    bedtools genomecov -ibam -bg -split -strand - -i $i -g chromInfo.txt > $i.minus.bedGraph

    bedGraphToBigWig $i.plus.bedGraph chromInfo.txt accepted_hits.plus.bw
    bedGraphToBigWig $i.minus.bedGraph chromInfo.txt accepted_hits.minus.bw

    done

    But this error appears:

    Input error: Chromosome chr10 found in non-sequential lines. This suggests that the input file is not sorted correctly.
    Input error: Chromosome chr18 found in non-sequential lines. This suggests that the input file is not sorted correctly.
    Input error: Chromosome chr11 found in non-sequential lines. This suggests that the input file is not sorted correctly.
    Input error: Chromosome chr2 found in non-sequential lines. This suggests that the input file is not sorted correctly.
    Input error: Chromosome chr12 found in non-sequential lines. This suggests that the input file is not sorted correctly.

    So, it's seems that a sort step is missing

    But the first method works great, here is the bash for loop if some want to apply this pipeline to a bath of bams.

    for i in $(ls *.bam)

    do

    name=${i%.bam}

    echo $name

    bamToBed -i $name.bam -split > $name.bed


    awk '{if($6=="+") print}' $name.bed | sort -k1,1 | bedItemOverlapCount hg19 -chromSize=chromInfo.txt stdin | sort -k1,1 -k2,2n > $name.plus.bedGraph
    awk '{if($6=="-") print}' $name.bed | sort -k1,1 | bedItemOverlapCount hg19 -chromSize=chromInfo.txt stdin | sort -k1,1 -k2,2n | awk '{OFS="\t"; print $1,$2,$3,"-"$4}'$

    bedGraphToBigWig $name.plus.bedGraph chromInfo.txt $name.plus.bw
    bedGraphToBigWig $name.minus.bedGraph chromInfo.txt $name.minus.bw


    done


    cheers.

    ReplyDelete
    Replies
    1. bedtools require a sorted bam file, use samtools sort to sort it first

      Delete
  2. Anonymous2:26 PM

    The bedtools genomecov syntax in the 3rd method doesn't generate the right bedGraph, instead, it generates the data for coverage historgram.

    ReplyDelete
  3. There is a little correction as bedtools expects bed/gff/vcf files. This was tested with bedtools v2.17.0.

    Method 3 should be

    bedtools genomecov -bg -split -strand + -ibam accepted_hits.bam -g ChromInfo.txt > accepted_hits.plus.bedGraph
    bedtools genomecov -bg -split -strand - -ibam accepted_hits.bam -g ChromInfo.txt > accepted_hits.minus.bedGraph

    bedGraphToBigWig accepted_hits.plus.bedGraph ChromInfo.txt accepted_hits.plus.bw
    bedGraphToBigWig accepted_hits.minus.bedGraph ChromInfo.txt accepted_hits.minus.bw

    ReplyDelete
  4. Anonymous9:11 AM

    Hi!
    That is very helpful, but I have the problem that when I load my file on UCSC what it seems is that my minus file is just a shifted reflection of the plus and I do not see the strand specific information.
    I have mapped with STAR and than with samtools generated a bam file and than I used the third method.
    I am very puzzled.
    Best
    Giulia

    ReplyDelete
  5. Anonymous3:30 PM

    for f in *.bedgraph.gz; do
    zcat $f | grep -v track > $f.temp &&
    bedGraphToBigWig $f.temp ./hg19.chrom.sizes `echo $f | sed -e s/bedgraph.gz/bw/` &&
    rm $f.temp &
    done;

    ReplyDelete
  6. You may want to use bam2bigwig.py (https://github.com/lpryszcz/bin).
    More info here: http://bioinfoexpert.com/?p=235

    ReplyDelete
  7. Many of today's FTP software, also known as FTP clients, make use of graphical interface. This makes it very easy for new web hosting customers to transfer files to the server in a drag & drop environment.
    download shareit app

    ReplyDelete