[BioC] error/problem with biomaRt gene symbol query

James W. MacDonald jmacdon at med.umich.edu
Tue Jul 6 19:08:58 CEST 2010


Hi Sergii,

On 7/5/2010 5:20 AM, Sergii Ivakhno wrote:
> Dear All,
>
> I want to retrieve gene symbols for microarray probes such that I
> receive some output even if no gene spans the probe. I am using position
> information in biomaRt for this:
>
> genes = getBM(attributes = c("hgnc_symbol"),
>               filters = c("chromosome_name", "start", "end"),
>               values = list(rep(i, length(posnew)), posnew, posnew + 10),
>               mart = ensembl)
>
> Unfortunately, biomaRt does not seem to return NULL output for probes
> that fall outside genes, so it is not possible to map the results back
> to the individual probes.

Rather than trying to get biomaRt to churn out NULL results, why not 
just get back the positions that match gene positions, and then 
merge() the result with your original position data?

See ?merge, as well as the all.x argument.
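A minimal sketch of that suggestion in R follows. It assumes the gene table has already been retrieved from biomaRt together with gene coordinates (for example via attributes such as hgnc_symbol, chromosome_name, start_position and end_position; treat those attribute choices as an assumption). Here a small hand-made data frame stands in for that getBM() result, and the probe IDs, gene names and coordinates are hypothetical. Overlaps are computed locally using the same posnew to posnew + 10 window as the original query, and merge(..., all.x = TRUE) keeps every probe, giving NA where no gene spans it.

```r
# Hypothetical probe positions (stand-in for posnew on a single chromosome).
probes <- data.frame(
  probe_id = c("p1", "p2", "p3", "p4"),
  chromosome_name = "1",
  position = c(1500, 5200, 9000, 20000),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for a getBM() result with gene coordinates; assumed
# attributes: hgnc_symbol, chromosome_name, start_position, end_position.
genes <- data.frame(
  hgnc_symbol = c("GENEA", "GENEB"),
  chromosome_name = "1",
  start_position = c(1000, 8000),
  end_position = c(2000, 12000),
  stringsAsFactors = FALSE
)

# For each gene, find probes whose [position, position + 10] window overlaps it,
# and record one row per probe/gene hit.
hits <- do.call(rbind, lapply(seq_len(nrow(genes)), function(g) {
  overlap <- probes$chromosome_name == genes$chromosome_name[g] &
    probes$position <= genes$end_position[g] &
    probes$position + 10 >= genes$start_position[g]
  data.frame(probe_id = probes$probe_id[overlap],
             hgnc_symbol = rep(genes$hgnc_symbol[g], sum(overlap)),
             stringsAsFactors = FALSE)
}))

# all.x = TRUE keeps every probe; probes with no overlapping gene get NA.
annotated <- merge(probes, hits, by = "probe_id", all.x = TRUE)
print(annotated)
```

A probe that overlaps several genes will appear once per gene in the merged result. For a million probes, a vectorised interval-overlap method would scale better than this per-gene loop, but the merge() step with all.x = TRUE stays the same.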


>
> posnew = positions of the array probes
> length(posnew) = 24760
> length(genes) = 336 (only!)
>
> I tried a few tricks:
>
> 1) Explicitly specifying na.value = "no gene";
>
> 2) Also retrieving "chromosome_name", as this is bound to
> provide output for every value in posnew.
>
> genes = getBM(attributes = c("chromosome_location", "hgnc_symbol"),
>               filters = c("chromosome_name", "start", "end"),
>               values = list(rep(i, length(posnew)), posnew, posnew + 10),
>               mart = ensembl, na.value = "no gene")
>
> The query returns an error:
>
> 1 Query ERROR: caught BioMart::Exception::Usage: Attributes from
> multiple attribute pages are not allowed
>
> Error in getBM(attributes = c("chromosome_location", "hgnc_symbol"),
> filters = c("chromosome_name",  :
>
>    Number of columns in the query result doesn't equal number of
> attributes in query.  This is probably an internal error, please report.
>
> I would be grateful for suggestions. I realise that you can use biomaRt
> within an lapply loop to query one position at a time, but this proves
> too time-consuming when you have 1 million probes.

Yes, and repeatedly hitting online resources in a tight loop is an 
optimized strategy for getting your IP banned.

Best,

Jim


>
> Many thanks!
>
> Sergii
>
>
>
>> sessionInfo()
> R version 2.7.0 (2008-04-22)
> x86_64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines   grid      tools     stats     graphics  grDevices utils
> [8] datasets  methods   base
>
> other attached packages:
>  [1] biomaRt_1.14.1       RCurl_0.9-4          snapCGH_1.8.0
>  [4] aCGH_1.14.0          sma_0.5.15           multtest_1.20.0
>  [7] cluster_1.11.10      GLAD_1.16.0          DNAcopy_1.14.0
> [10] tilingArray_1.18.0   pixmap_0.4-7         geneplotter_1.18.0
> [13] annotate_1.18.0      xtable_1.5-2         AnnotationDbi_1.2.1
> [16] RSQLite_0.6-8        DBI_0.2-4            genefilter_1.20.0
> [19] survival_2.34-1      vsn_3.6.0            lattice_0.17-6
> [22] strucchange_1.3-3    sandwich_2.1-0       zoo_1.5-3
> [25] RColorBrewer_1.0-2   affy_1.18.1          preprocessCore_1.2.0
> [28] affyio_1.8.0         Biobase_2.0.1        limma_2.14.5
>
> loaded via a namespace (and not attached):
> [1] KernSmooth_2.22-22 XML_1.96-0
>
>
> ----------------------------------------------
> Sergii Ivakhno
>
> PhD student
>
> Computational Biology Group
> Cancer Research UK Cambridge Research Institute
> Li Ka Shing Centre
> Robinson Way
> Cambridge CB2 0RE
> England
>
> +44 (0)1223 404293 (O)
> +44 (0)1223 404128 (F)
>
> http://www.compbio.group.cam.ac.uk/
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 


