[BioC] "start" filter in biomaRt
Shi, Tao
shidaxia at yahoo.com
Mon Aug 25 23:43:30 CEST 2008
Hi Steffen,
Thanks!
...Tao
----- Original Message ----
From: "steffen at stat.Berkeley.EDU" <steffen at stat.Berkeley.EDU>
To: "Shi, Tao" <shidaxia at yahoo.com>
Cc: bioconductor at stat.math.ethz.ch
Sent: Monday, August 25, 2008 12:33:29 PM
Subject: Re: [BioC] "start" filter in biomaRt
Hi Tao,
If you add the strand attribute (see below), you'll notice that the first
gene lies on the reverse strand. In Ensembl, all coordinates are given on
the forward strand. Because this gene lies on the reverse strand, its start
position is in fact 1029881, and this is why it was returned.
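library(biomaRt)
mart = useMart("ensembl", dataset="hsapiens_gene_ensembl")
# request the strand alongside the gene coordinates
g = getBM(c("ensembl_gene_id","start_position","end_position", "strand"),
          filters = c("chromosome_name","start"), values=list(17,1000000),
          mart = mart)
# sort by the forward-strand start position and show the first ten genes
ord=order(g[,2])
g=g[ord,]
g[1:10,]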
     ensembl_gene_id start_position end_position strand
1135 ENSG00000159842         853510      1029881     -1
1442 ENSG00000205899        1120603      1121504      1
690  ENSG00000184811        1129707      1151031      1
1134 ENSG00000209456        1156460      1156529     -1
452  ENSG00000108953        1194595      1250267     -1
451  ENSG00000167193        1272190      1306294     -1
1133 ENSG00000197879        1314232      1342633     -1
450  ENSG00000132376        1344622      1366719     -1
689  ENSG00000174238        1368037      1412835     -1
449  ENSG00000167703        1424448      1478880     -1
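
If what you want is only the genes whose forward-strand start_position is at
or beyond 1000000, one option is to filter the returned data frame locally.
The following is a minimal sketch, not part of biomaRt: it rebuilds the first
four rows of the output above by hand so that it runs without a mart
connection, and the names strand_start, cutoff and g_fwd are made up for
illustration.

```r
# First four rows copied from the output above (chromosome 17).
g <- data.frame(
  ensembl_gene_id = c("ENSG00000159842", "ENSG00000205899",
                      "ENSG00000184811", "ENSG00000209456"),
  start_position  = c(853510, 1120603, 1129707, 1156460),
  end_position    = c(1029881, 1121504, 1151031, 1156529),
  strand          = c(-1, 1, 1, -1),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: the strand-aware start, i.e. end_position
# for reverse-strand genes and start_position otherwise.
g$strand_start <- ifelse(g$strand == -1, g$end_position, g$start_position)

# Keep only genes whose forward-strand start_position is at or beyond the cutoff.
cutoff <- 1000000
g_fwd <- g[g$start_position >= cutoff, ]
```

With these four rows, ENSG00000159842 gets a strand_start of 1029881 and is
dropped from g_fwd, since its start_position of 853510 is below the cutoff.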
cheers,
Steffen
> Hi list,
>
> In the following code, I'm using 'biomaRt' to retrieve all the genes that
> start beyond 1000000 on chromosome 17. However, I'm not expecting the
> first gene which starts at 853510 to be in the resulting list. It seems
> the "start" filter is not just simple ">=". More explanations, please!
>
> Thanks!
>
> ...Tao
>
>
>
>> library(biomaRt)
> Loading required package: RCurl
>> mart = useMart("ensembl", dataset="hsapiens_gene_ensembl")
> Checking attributes and filters ... ok
>> tmp2 <- getBM(attributes=c("ensembl_gene_id","start_position","end_position"),
>>               filters = c("chromosome_name", "start"),
>>               values=list("17", "1000000"), mart = mart)
>> tmp2[order(tmp2$start_position),][1:10,]
>      ensembl_gene_id start_position end_position
> 1135 ENSG00000159842         853510      1029881
> 1442 ENSG00000205899        1120603      1121504
> 690  ENSG00000184811        1129707      1151031
> 1134 ENSG00000209456        1156460      1156529
> 452  ENSG00000108953        1194595      1250267
> 451  ENSG00000167193        1272190      1306294
> 1133 ENSG00000197879        1314232      1342633
> 450  ENSG00000132376        1344622      1366719
> 689  ENSG00000174238        1368037      1412835
> 449  ENSG00000167703        1424448      1478880
>> sessionInfo()
> R version 2.7.0 (2008-04-22)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] biomaRt_1.14.0 RCurl_0.9-3
>
> loaded via a namespace (and not attached):
> [1] XML_1.95-2
>>
>