[BioC] Renaming the seqlevels in a transcript database from GenomicFeatures

Hervé Pagès hpages at fhcrc.org
Sat Jun 28 07:54:53 CEST 2014


Hi guys,

Indeed, 'transcripts(txdb, vals=list(tx_chrom="some_chrom"))'
is broken when the "effective" seqlevels are different from the
"native" ones. More precisely, the transcripts(), exons(), and
cds() extractors are not translating the user-supplied seqlevels
to native seqlevels before injecting them in the SQL query they
generate. We need to fix that.

We will post again here when it's done. Thanks for the report.

H.


On 06/27/2014 07:53 PM, Michael Lawrence wrote:
> Hi Sonali,
>
> I'm pretty sure what Fong has reported is a legitimate bug. The seqlevels
> are mapped at  some level, but the values for tx_chrom have not been
> updated.
>
> Also, what you say about keepSeqlevels and TranscriptDb is not true, at
> least for devel (and this stuff was written a year ago):
>
>> seqlevels(keepSeqlevels(txdb, "1"))
> [1] "1"
>
> Michael
>
>
> On Fri, Jun 27, 2014 at 12:18 PM, Sonali Arora <sarora at fhcrc.org> wrote:
>
>> Hi Fong,
>>
>> A TranscriptDb cannot be directly subset with 'keepSeqlevels' and
>> 'dropSeqlevels'.
>> But the GRanges or GRangesLists extracted from the TranscriptDb can be
>> subset.
>> Below I show an example for how to extract GRanges, subset and change the
>> chromosome name style from "UCSC" to "NCBI" ( ie from "chr1" to "1")
>>
>> ## load packages
>> library(GenomicFeatures)
>> library(TxDb.Hsapiens.UCSC.hg18.knownGene)
>>
>> ## extract GRanges
>> txdb <- TxDb.Hsapiens.UCSC.hg18.knownGene
>> seqlevels(txdb)
>> txbygene <- transcriptsBy(txdb, "gene")
>> seqlevels(txbygene)
>>
>> ## subset to get only ranges from chr 1 to 22
>> auto <- extractSeqlevelsByGroup(species="Homo sapiens", style="UCSC",
>> group="auto")
>> txbygene <- keepSeqlevels(txbygene,auto)
>> seqlevels(txbygene)
>>
>> ##convert GRangesList to GRanges
>> ucsc_gr <- unlist(txbygene)
>>
>> ## renaming styles.
>> newStyle <- mapSeqlevels(seqlevels(ucsc_gr),"NCBI")
>> ncbi_gr <- renameSeqlevels(ucsc_gr, newStyle)
>>
>> ##check that naming was successful
>> ncbi_gr[seqnames(ncbi_gr)=="1"]
>> ucsc_gr[seqnames(ucsc_gr)=="chr1"]
>>
>> You can easily adapt this example for your scenario - to convert from
>> "1" to "chr1". For more details : Please  look at examples section of:
>> http://www.bioconductor.org/packages/devel/bioc/vignettes/
>> GenomeInfoDb/inst/doc/GenomeInfoDb.pdf
>> and the man page for ?keepSeqlevels.
>>
>> Thanks and Regards,
>> Sonali.
>>
>>
>>
>>
>> On 6/26/2014 2:02 PM, Fong Chun Chan wrote:
>>
>>> Hi,
>>>
>>> I am using the GenomicFeatures package to extract exons from a transcript
>>> database file. I am using the ensembl transcript database which has no
>>> chr,
>>> yet my bam files that I am working have the chr prefix.
>>>
>>> I was thinking one could append chr to the seqlevels in the transcript
>>> database like this:
>>>
>>> library('GenomicFeatures')
>>> txdb <- loadDb(txdbFile)
>>> newSeqNames <- paste('chr', seqlevels(txdb), sep = '')
>>> names(newSeqNames) <- seqlevels(txdb)
>>> txdb <- renameSeqlevels( txdb, newSeqNames )
>>> seqlevels(txdb)
>>>
>>> [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10"
>>> "chr11" "chr12" [13] "chr13" "chr14"
>>>
>>> This appears to work.
>>>
>>> But when I run:
>>>
>>> transcripts(txdb, vals = list( tx_chrom = "chr1") )
>>>
>>> GRanges with 0 ranges and 2 metadata columns:
>>>      seqnames    ranges strand |     tx_id     tx_name
>>>         <Rle> <IRanges>  <Rle> | <integer> <character>
>>>     ---
>>>     seqlengths:
>>>
>>> It suggests that the chromosome renaming is not working...because when I
>>> run:
>>>
>>> transcripts(txdb, vals = list( tx_chrom = '1') )
>>>
>>> GRanges with 17239 ranges and 2 metadata columns:
>>>             seqnames                 ranges strand   |     tx_id
>>> tx_name
>>>                <Rle>              <IRanges>  <Rle>   | <integer>
>>> <character>
>>>         [1]     chr1         [11869, 14409]      +   |    126213
>>> ENST00000456328
>>>
>>> Has anyone had any luck in getting this to work?
>>>
>>> Thanks,
>>>
>>>          [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.
>>> science.biology.informatics.conductor
>>>
>>
>> --
>> Thanks and Regards,
>> Sonali
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.
>> science.biology.informatics.conductor
>>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list