[BioC] biomaRt: retrieve total chromosome lengths

De Bondt, An-7114 [PRDBE] ADBONDT at PRDBE.jnj.com
Tue Oct 31 09:15:08 CET 2006


Hi Steffen,
Hi Jim,

Thanks for your suggestions!  
To avoid hard-coding the values, I will indeed retrieve the end position of
the last transcript on each chromosome. Relatively speaking, this is pretty
close to the real length of the chromosome.
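A minimal R sketch of this approach follows. It assumes a recent biomaRt release: `useEnsembl()` and the attribute/filter names `chromosome_name` and `transcript_end` belong to today's interface, not the 2006 one used in this thread. The helper names `fetch_transcript_ends` and `approx_chrom_lengths` are hypothetical. The default chromosome list includes 22, which the `c(1:21, "x","y")` call quoted further down leaves out.

```r
# Sketch: approximate chromosome lengths by the largest transcript end position.
# Assumes a recent biomaRt release; helper names are hypothetical.

# Hypothetical helper: query Ensembl for transcript end positions.
# Needs the biomaRt package and network access when called.
fetch_transcript_ends <- function(chromosomes = c(1:22, "X", "Y")) {
  mart <- biomaRt::useEnsembl(biomart = "ensembl",
                              dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(attributes = c("chromosome_name", "transcript_end"),
                 filters    = "chromosome_name",
                 values     = chromosomes,
                 mart       = mart)
}

# Hypothetical helper: for each chromosome, keep the largest transcript end.
# `ends` is a data frame with columns chromosome_name and transcript_end.
approx_chrom_lengths <- function(ends) {
  vapply(split(ends$transcript_end, ends$chromosome_name), max, numeric(1))
}

# Usage (requires network access):
# lengths <- approx_chrom_lengths(fetch_transcript_ends())
```

Because the last transcript rarely ends exactly at the telomere, these values will somewhat underestimate the true lengths. `split()` also orders chromosomes as strings ("1", "10", "11", ...).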

An

-----Original Message-----
From: Steffen Durinck [mailto:durincks at mail.nih.gov]
Sent: Monday, 30 October 2006 21:17
To: James W. MacDonald
Cc: De Bondt, An-7114 [PRDBE]; 'bioconductor at stat.math.ethz.ch'
Subject: Re: [BioC] biomaRt: retrieve total chromosome lengths


Hi An,

There is no way to retrieve chromosome lengths with biomaRt when it is
used with Ensembl. The closest you'll get with biomaRt is to subtract the
position of the 'first' transcript from the position of the 'last' transcript.
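A minimal R sketch of this "last minus first transcript" estimate follows. It reads the suggestion as the end of the last transcript minus the start of the first one, and it assumes a data frame shaped like the output of a recent biomaRt `getBM()` query. The helper name `transcript_span` and the attribute names `transcript_start` and `transcript_end` are assumptions, not part of the reply.

```r
# Sketch of the "last minus first transcript" estimate.
# Hypothetical helper: `tx` is assumed to be a data frame shaped like
# biomaRt::getBM() output with columns chromosome_name, transcript_start
# and transcript_end (attribute names assumed from current Ensembl marts).
transcript_span <- function(tx) {
  per_chrom <- split(tx, tx$chromosome_name)
  # For each chromosome: largest end coordinate minus smallest start coordinate
  vapply(per_chrom,
         function(d) max(d$transcript_end) - min(d$transcript_start),
         numeric(1))
}

# Example query (requires biomaRt and network access):
# mart <- biomaRt::useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# tx <- biomaRt::getBM(attributes = c("chromosome_name", "transcript_start", "transcript_end"),
#                      filters = "chromosome_name", values = c(1:22, "X", "Y"), mart = mart)
# transcript_span(tx)
```

This estimate drops both the stretch before the first transcript and the stretch after the last one. It will therefore be somewhat smaller than the "end of the last transcript" figure.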

If you want to use the Ensembl data to get this information (you'll need 
to do some browser clicking), you can select your species of interest at
http://www.ensembl.org/

For hsapiens:

http://www.ensembl.org/Homo_sapiens/index.html

Then select a chromosome, e.g.:

http://www.ensembl.org/Homo_sapiens/mapview?chr=1

and there you'll find its length.

Cheers,
Steffen

James W. MacDonald wrote:
> Hi An,
>
> De Bondt, An-7114 [PRDBE] wrote:
>   
>> Hi,
>>
>> How can I retrieve, for a certain organism (e.g. human), the total length of
>> each of its chromosomes using biomaRt?
>> 	library(biomaRt)
>> 	mart <- useMart("ensembl")
>> 	mart <- useDataset("hsapiens_gene_ensembl", mart)
>> 	chr.lengths <- ???
>>     
>
> Well, this doesn't agree exactly with what I see on this webpage:
>
> http://www.ornl.gov/sci/techresources/Human_Genome/posters/chromosome/faqs.shtml
>
> But it is pretty close. Of course I am finding the end of the 'last' 
> transcript on a given chromosome rather than the end of the chromosome 
> itself, so there will likely be differences. However, I don't see an 
> attribute that looks like it gives chromosomal information without first 
> being mapped through a gene, so I don't know if you can get exactly what 
> you want.
>
> If there is a way, Steffen Durinck will undoubtedly know what it is, but 
> I haven't seen a response from him as yet.
>
> Anyway, here is what I did.
>
>  > mart <- useMart("ensembl", "hsapiens_gene_ensembl")
> Checking attributes and filters ... ok
>  > a <- getBM("hsapiens_gene_ensembl_structure.transcript_chrom_end", 
> "chromosome_name", c(1:21, "x","y"), mart, output="list")
>  > sapply(a[[1]], max)
>          1         2         3         4         5
> 247197891 242713278 199439629 191246650 180727832
>          6         7         8         9        10
> 170735623 158630410 146252219 140191642 135347681
>         11        12        13        14        15
> 134361903 132289533 114110907 106354309 100334282
>         16        17        18        19        20
>   88771793  78646005  76106388  63802660  62429769
>         21         x         y
>   46935585 154908521  57767721
>
> Best,
>
> Jim
>
>
>   
>> Thanks in advance!
>> An
>
>
>   


-- 
Steffen Durinck, Ph.D.

Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/

Phone: 301-402-8103
Address:
Advanced Technology Center,
8717 Grovemont Circle
Gaithersburg, MD 20877


