[BioC] Renaming the seqlevels in a transcript database from GenomicFeatures
Sonali Arora
sarora at fhcrc.org
Fri Jun 27 21:18:13 CEST 2014
Hi Fong,
A TranscriptDb cannot be directly subset with 'keepSeqlevels' and
'dropSeqlevels'.
But the GRanges or GRangesLists extracted from the TranscriptDb can be
subset.
Below I show an example for how to extract GRanges, subset and change the
chromosome name style from "UCSC" to "NCBI" ( ie from "chr1" to "1")
## load packages
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg18.knownGene)
## extract GRanges
txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene
seqlevels(txdb)
txbygene <- transcriptsBy(txdb, "gene")
seqlevels(txbygene)
## subset to get only ranges from chr 1 to 22
auto <- extractSeqlevelsByGroup(species="Homo sapiens", style="UCSC",
group="auto")
txbygene <- keepSeqlevels(txbygene,auto)
seqlevels(txbygene)
##convert GRangesList to GRanges
ucsc_gr <- unlist(txbygene)
## renaming styles.
newStyle <- mapSeqlevels(seqlevels(ucsc_gr),"NCBI")
ncbi_gr <- renameSeqlevels(ucsc_gr, newStyle)
##check that naming was successful
ncbi_gr[seqnames(ncbi_gr)=="1"]
ucsc_gr[seqnames(ucsc_gr)=="chr1"]
You can easily adapt this example for your scenario - to convert from
"1" to "chr1". For more details : Please look at examples section of:
http://www.bioconductor.org/packages/devel/bioc/vignettes/GenomeInfoDb/inst/doc/GenomeInfoDb.pdf
and the man page for ?keepSeqlevels.
Thanks and Regards,
Sonali.
On 6/26/2014 2:02 PM, Fong Chun Chan wrote:
> Hi,
>
> I am using the GenomicFeatures package to extract exons from a transcript
> database file. I am using the ensembl transcript database which has no chr,
> yet my bam files that I am working have the chr prefix.
>
> I was thinking one could append chr to the seqlevels in the transcript
> database like this:
>
> library('GenomicFeatures')
> txdb <- loadDb(txdbFile)
> newSeqNames <- paste('chr', seqlevels(txdb), sep = '')
> names(newSeqNames) <- seqlevels(txdb)
> txdb <- renameSeqlevels( txdb, newSeqNames )
> seqlevels(txdb)
>
> [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10"
> "chr11" "chr12" [13] "chr13" "chr14"
>
> This appears to work.
>
> But when I run:
>
> transcripts(txdb, vals = list( tx_chrom = "chr1") )
>
> GRanges with 0 ranges and 2 metadata columns:
> seqnames ranges strand | tx_id tx_name
> <Rle> <IRanges> <Rle> | <integer> <character>
> ---
> seqlengths:
>
> It suggests that the chromosome renaming is not working...because when I
> run:
>
> transcripts(txdb, vals = list( tx_chrom = '1') )
>
> GRanges with 17239 ranges and 2 metadata columns:
> seqnames ranges strand | tx_id
> tx_name
> <Rle> <IRanges> <Rle> | <integer>
> <character>
> [1] chr1 [11869, 14409] + | 126213
> ENST00000456328
>
> Has anyone had any luck in getting this to work?
>
> Thanks,
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Thanks and Regards,
Sonali
More information about the Bioconductor
mailing list