[BioC] getBM returns shorter vectors than values

Hans-Rudolf Hotz hrh at fmi.ch
Thu Sep 12 11:47:16 CEST 2013


Hi Francesco

This is due to the actual BioMart server, which is accessed by the 
Bioconductor package biomaRt. Unless I am unaware of a recent change in 
the BioMart server, there is no way to preserve the order of the input 
(or keep duplicates, or indicate which ID does not have a result, etc.).

Of course, there is a quick and dirty (and bad) solution: You loop over 
your gene IDs and make an individual request for each gene....
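
A less painful alternative is to restore the order on the R side after a 
single query. The following is only a minimal sketch: align_to_input() is a 
hypothetical helper (not part of biomaRt), and the data.frame 'res' is a 
hand-made stand-in for a getBM() result so the example runs without a 
server. It assumes the filter was also requested as an attribute, as 
Steffen suggests below.

```r
# Hypothetical helper (not a biomaRt function): align a getBM()-style
# result to the order of the input IDs, with NA rows for IDs not found.
align_to_input <- function(ids, result, id_col) {
  # For each input ID, match() gives the first matching row, or NA
  idx <- match(ids, result[[id_col]])
  aligned <- result[idx, , drop = FALSE]
  # Keep the queried ID itself, even where nothing was returned
  aligned[[id_col]] <- ids
  rownames(aligned) <- NULL
  aligned
}

# Stand-in for a getBM() result (values taken from the example in this thread)
res <- data.frame(
  ensembl_gene_id  = c("ENSMUSG00000016024", "ENSMUSG00000020472",
                       "ENSMUSG00000027255"),
  external_gene_id = c("Lbp", "Zkscan17", "Arfgap2"),
  stringsAsFactors = FALSE
)

# Input with a duplicate and a made-up ID that has no result
ids <- c("ENSMUSG00000027255", "ENSMUSG00000020472",
         "ENSMUSG00000099999", "ENSMUSG00000027255")

aligned <- align_to_input(ids, res, "ensembl_gene_id")
```

Note that match() only keeps the first hit per ID, so if one ID maps to 
several rows (several transcripts, for example) the others are dropped; in 
that case merge() with all.x=TRUE is probably the safer choice.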


Regards, Hans-Rudolf



On 09/12/2013 11:21 AM, Francesco Lescai wrote:
> Hi Steffen,
> thanks for your reply, yes it works this way :-)
>
> however, getBM doesn't seem to return results in the same order. here's a simple test
>
>> tesgenes
> [1] "ENSMUSG00000027255" "ENSMUSG00000020472" "ENSMUSG00000020807" "ENSMUSG00000086769" "ENSMUSG00000016024"
>> getBM(filters=c("ensembl_gene_id"), attributes=c("ensembl_gene_id", "external_gene_id"), values=tesgenes, mart=ensembl)
>       ensembl_gene_id external_gene_id
> 1 ENSMUSG00000016024              Lbp
> 2 ENSMUSG00000020472         Zkscan17
> 3 ENSMUSG00000020807    4933427D14Rik
> 4 ENSMUSG00000027255          Arfgap2
> 5 ENSMUSG00000086769          Gm15587
>
> therefore if I have a data.frame with gene IDs and I just make a cbind, it doesn't match.
> I solved it by merging the two data.frame by columns id like this
>
> MyResults <- merge(
>    MyResults,
>    getBM(filters=c("ensembl_gene_id"), attributes=c("ensembl_gene_id", "external_gene_id"), values= MyResults$geneID, mart=ensembl),
>    by.x="geneID",
>    by.y="ensembl_gene_id"
>    )
>
> is there any way to make getBM() return data in the same order as the vector of values, or is this behaviour due to the way the query works?
>
> thanks for your prompt reply,
> Francesco
>
> On 11 Sep 2013, at 17:46, Steffen Durinck <durinck.steffen at gene.com> wrote:
>
> Hi Francesco,
>
> That is correct, biomaRt doesn't return anything if it can't find it.  It is designed to work just like the BioMart web services at www.biomart.org, which behave the same way.
> I usually add the filter as an attribute so I can match things up and figure out what did return a result.
> Your query would be:
>
> Anno <- getBM(attributes=c("affy_huex_1_0_st_v2","strand","transcript_start","chromosome_name","hgnc_symbol"),filters=c("affy_huex_1_0_st_v2"),values=ID,mart=ensembl)
>
> If you want a vector back with the same length as ID and with NA's where you didn't get a result, you could write a wrapper function around getBM that does that for you.
>
> Best,
> Steffen
>
>
> On Wed, Sep 11, 2013 at 6:15 AM, Francesco Lescai <francesco.lescai at hum-gen.au.dk> wrote:
> Hi,
> I have the same problem, and it has been this way for as long as I have used biomaRt.
> is there any way to force getBM to return NA when the attribute corresponding to the filter cannot be found?
> At least when annotating your results you'd be able to get same length vectors, and it would be much easier to do that in data.frames.
>
> thanks for any suggestions,
> cheers,
> Francesco
>
>
> On 29 Aug 2013, at 05:40, Atul <atulkakrana at outlook.com> wrote:
>
> Hi All,
>
> I am using the oligo package to analyse samples generated using the HuEx 1.0 ST v2 chip. The problem I am facing is with annotating the results.
>
> Here is my code (simplified):
>
> celFilesA <- list.celfiles()
> AF_data.A <- read.celfiles(celFilesA,pkgname='pd.huex.1.0.st.v2')
> AF.eset.RMA <- rma(AF_data.A,target='core')
>
>> dim(exprs(AF.eset.RMA))
> [1] 22011    10
>
> ##Attempt to annotate
> library(biomaRt)
> ID <- rownames(AF.eset.RMA)
> ensembl <- useMart('ensembl', dataset = 'hsapiens_gene_ensembl')
> Anno <- getBM(attributes=c("strand","transcript_start","chromosome_name","hgnc_symbol"),filters=c("affy_huex_1_0_st_v2"),values=ID,mart=ensembl)
>
>> dim(Anno)
> [1] 1635    4
>
> As you can see, out of a total of 22011 genes/probesets I can annotate only 1635. Is there any way I can get the annotations for all of the genes/probesets and add them back to my expression set (AF.eset.RMA), so that the annotations are included in the final results?
>
>
> Usually, with other chips I do this:
> ID <- featureNames(AF.eset.RMA)
> Symbol <- getSYMBOL(ID, 'mouse4302.db')
> Name <- as.character(lookUp(ID, "mouse4302.db", "GENENAME"))
> tmp <- data.frame(ID=ID, Symbol=Symbol, Name=Name, stringsAsFactors=FALSE)
> tmp[tmp=="NA"] <- NA
> fData(AF.eset.RMA) <- tmp
>
> And this is what I want to achieve in present case. I would appreciate your help.
>
> Thanks
>
> AK
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


