Tuesday, August 14, 2012

convert bed to gtf, gtf to bed

bed --> gtf:

bedToGenePred input.bed stdout | genePredToGtf file stdin output.gtf

Here is an example from UCSC wiki (http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format):

Some gene tracks are in a bed format in the database, perhaps with extra columns past the standard bed format. In this case, extract the standard bed columns, convert it to a genePred and then to a gtf. For example

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e "select chrom,chromStart,chromEnd,name,score,strand,thickStart,thickEnd from wgRna;" hg19 | bedToGenePred stdin stdout | genePredToGtf file stdin wgRna.gtf

gtf --> bed:

Download gtf2bed.pl from Eric's site (the ea-utils project), then run:

perl gtf2bed.pl input.gtf > output.bed

If you only need to convert gtf to bed6 (no exon/intron info), the following bash line works. gtf starts are 1-based while bed starts are 0-based, hence $4-1:
fgrep -w transcript gencode.v17.annotation.gtf | sed 's/[";]//g;' | awk '{OFS="\t"; print $1, $4-1,$5,$12,0,$7}'
Or, for bed6 plus three extra columns (gene name, gene type, gene ID):
fgrep -w transcript gencode.v17.annotation.gtf | sed 's/[";]//g;' | awk '{OFS="\t"; print $1, $4-1,$5,$12,0,$7,$18,$14,$10}'
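
To make the coordinate shift concrete, here is a minimal sketch of the same bed6 conversion wrapped in a shell function and run on a single made-up record. The function name gtf_to_bed6 and the sample line are illustrative assumptions, not part of any tool. It filters on column 3 instead of using fgrep, and it still assumes the GENCODE attribute order, where field 12 holds the transcript ID once quotes and semicolons are stripped.

```shell
# Hypothetical helper: GTF transcript records on stdin -> BED6 on stdout.
# Assumes GENCODE attribute order, so field 12 is the transcript_id value
# after quotes and semicolons have been removed.
gtf_to_bed6() {
  sed 's/[";]//g' |
    awk 'BEGIN { OFS = "\t" }
         # GTF start is 1-based inclusive; BED start is 0-based, so subtract 1.
         $3 == "transcript" { print $1, $4 - 1, $5, $12, 0, $7 }'
}

# Made-up sample record (not real annotation data).
printf 'chr1\tHAVANA\ttranscript\t11869\t14409\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' | gtf_to_bed6
# prints (tab-separated): chr1 11868 14409 T1 0 +
```

The end coordinate is left alone because a 1-based inclusive end and a 0-based exclusive end are the same number.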

Also, I edited Eric's code a bit to output a more meaningful gene ID, e.g.:

($transid) = $f[8]=~ /transcript_id "([^"]+)"/;
($geneid) = $f[8]=~ /gene_id "([^"]+)"/;
($gene_type) = $f[8]=~ /gene_type "([^"]+)"/;
($gene_name) = $f[8]=~ /gene_name "([^"]+)"/;
($trans_type) = $f[8]=~ /transcript_type "([^"]+)"/;
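
The same look-up-by-name idea can be sketched in shell for use with the one-liners, so that the result does not depend on the order of the attributes. This is only a sketch: gtf_attr is a hypothetical helper name, and the attribute string in the usage lines is made up.

```shell
# Hypothetical helper: print the value of one named attribute from a GTF
# attribute string (column 9). Prints an empty line if the key is absent.
gtf_attr() {
  # $1 = attribute key (e.g. gene_name), $2 = attribute string
  printf '%s\n' "$2" | awk -v key="$1" '{
    # Build the pattern: key, one space, then a double-quoted value.
    pat = key " \"[^\"]+\""
    if (match($0, pat)) {
      # Skip the key, the space and the opening quote; drop the closing quote.
      print substr($0, RSTART + length(key) + 2, RLENGTH - length(key) - 3)
    } else {
      print ""
    }
  }'
}

# Made-up attribute string for illustration.
attrs='gene_id "G1"; transcript_id "T1"; gene_type "lincRNA"; gene_name "ABC";'
gtf_attr gene_name "$attrs"
# prints: ABC
```

Like the Perl regexes above, the pattern matches the first occurrence of the key followed by a quoted value, so it assumes each key appears at most once per record.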


  1. There is a bug in the gtf2bed.pl script. See https://code.google.com/p/ea-utils/issues/list

    1. It has already been clarified that this is not a bug. See https://code.google.com/p/ea-utils/issues/detail?id=28
