[BioC] getBM returns shorter vectors than values

Sean Davis sdavis2 at mail.nih.gov
Thu Sep 12 11:51:55 CEST 2013


On Thu, Sep 12, 2013 at 5:47 AM, Hans-Rudolf Hotz <hrh at fmi.ch> wrote:
> Hi Francesco
>
> This is due to the actual BioMart server, which is accessed by the Bioconductor
> package biomaRt. Unless I am unaware of a recent change in the BioMart
> server, there is no way to preserve the order of the input (or keep
> duplicates, or indicate which ID does not have a result, etc.).

Francesco,

It is an extra step, but see the match() and merge() functions to
reconcile your input vector with the results from biomaRt.
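
A minimal sketch of the match() route, using the five IDs and gene names
from Francesco's test quoted further down. The data frame `bm` stands in for
a real getBM() result so the example runs offline, and the sixth ID is made
up to show what happens when an ID has no hit:

```r
# Input IDs in the order we care about; the last one is a hypothetical
# ID that the server would not know.
ids <- c("ENSMUSG00000027255", "ENSMUSG00000020472", "ENSMUSG00000020807",
         "ENSMUSG00000086769", "ENSMUSG00000016024", "ENSMUSG_NOT_IN_MART")

# Stand-in for the getBM() output: sorted by ID, no row for the unknown ID.
bm <- data.frame(
  ensembl_gene_id  = c("ENSMUSG00000016024", "ENSMUSG00000020472",
                       "ENSMUSG00000020807", "ENSMUSG00000027255",
                       "ENSMUSG00000086769"),
  external_gene_id = c("Lbp", "Zkscan17", "4933427D14Rik", "Arfgap2",
                       "Gm15587"),
  stringsAsFactors = FALSE
)

# match() gives, for each input ID, its row in bm (NA when absent).
idx <- match(ids, bm$ensembl_gene_id)
annotated <- data.frame(geneID = ids,
                        symbol = bm$external_gene_id[idx],
                        stringsAsFactors = FALSE)
print(annotated)
```

Note that match() keeps only the first hit per ID, so for attributes that
can return several rows per ID, merge() is probably the safer choice.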

Sean

> Of course, there is a quick and dirty (and bad) solution: You loop over your
> gene IDs and make an individual request for each gene....
>
>
> Regards, Hans-Rudolf
>
>
>
> On 09/12/2013 11:21 AM, Francesco Lescai wrote:
>>
>> Hi Steffen,
>> thanks for your reply, yes it works this way :-)
>>
>> However, getBM doesn't seem to return results in the same order. Here's a
>> simple test:
>>
>>> tesgenes
>>
>> [1] "ENSMUSG00000027255" "ENSMUSG00000020472" "ENSMUSG00000020807"
>> "ENSMUSG00000086769" "ENSMUSG00000016024"
>>>
>>> getBM(filters=c("ensembl_gene_id"), attributes=c("ensembl_gene_id",
>>> "external_gene_id"), values=tesgenes, mart=ensembl)
>>
>>       ensembl_gene_id external_gene_id
>> 1 ENSMUSG00000016024              Lbp
>> 2 ENSMUSG00000020472         Zkscan17
>> 3 ENSMUSG00000020807    4933427D14Rik
>> 4 ENSMUSG00000027255          Arfgap2
>> 5 ENSMUSG00000086769          Gm15587
>>
>> Therefore, if I have a data.frame with gene IDs and I just cbind the
>> result, the rows don't match.
>> I solved it by merging the two data.frames on their ID columns, like this:
>>
>> MyResults <- merge(
>>    MyResults,
>>    getBM(filters="ensembl_gene_id",
>>          attributes=c("ensembl_gene_id", "external_gene_id"),
>>          values=MyResults$geneID, mart=ensembl),
>>    by.x="geneID",
>>    by.y="ensembl_gene_id"
>>    )
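
A hedged variation on the merge() call above: by default merge() drops rows
without a match and sorts by the key, so the sketch below adds all.x = TRUE
and restores the input order afterwards. The data frames are small stand-ins
(no server call); `score`, `row_order` and the third ID are made up for
illustration:

```r
# Stand-in for the user's results table; the third ID is hypothetical.
MyResults <- data.frame(
  geneID = c("ENSMUSG00000027255", "ENSMUSG00000020472", "ENSMUSG_NOT_IN_MART"),
  score  = c(1.5, -0.3, 2.1),
  stringsAsFactors = FALSE
)

# Stand-in for the getBM() output (sorted, nothing for the unknown ID).
bm <- data.frame(
  ensembl_gene_id  = c("ENSMUSG00000020472", "ENSMUSG00000027255"),
  external_gene_id = c("Zkscan17", "Arfgap2"),
  stringsAsFactors = FALSE
)

MyResults$row_order <- seq_len(nrow(MyResults))   # remember the input order
merged <- merge(MyResults, bm,
                by.x = "geneID", by.y = "ensembl_gene_id",
                all.x = TRUE)                      # keep unmatched IDs, filled with NA
merged <- merged[order(merged$row_order), ]        # undo merge()'s sorting by key
merged$row_order <- NULL
rownames(merged) <- NULL
print(merged)
```

If getBM() returns more than one row for an ID, this merge yields more rows
than the input, which may or may not be what is wanted.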
>>
>> Is there any way to make getBM() return data in the same order as
>> the vector of values, or is this behaviour due to the way the query works?
>>
>> thanks for your prompt reply,
>> Francesco
>>
>> On 11 Sep 2013, at 17:46, Steffen Durinck
>> <durinck.steffen at gene.com> wrote:
>>
>> Hi Francesco,
>>
>> That is correct, biomaRt doesn't return anything if it can't find it.  It is
>> designed to work just like the BioMart web services at
>> www.biomart.org, which behave the same.
>> I usually add the filter as an attribute so I can match things up and
>> figure out which IDs did return a result.
>> Your query would be:
>>
>> Anno <- getBM(attributes=c("affy_huex_1_0_st_v2", "strand",
>>                            "transcript_start", "chromosome_name",
>>                            "hgnc_symbol"),
>>               filters="affy_huex_1_0_st_v2", values=ID, mart=ensembl)
>>
>> If you want a vector back with the same length as ID and with NA's where
>> you didn't get a result, you could write a wrapper function around getBM
>> that does that for you.
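
One way such a wrapper might look, as a sketch only. The name getBM_aligned
and its `query` argument are not part of biomaRt; they are assumptions made
here so that the function can be tried offline with a stub in place of
getBM(). With biomaRt loaded and a real mart object, `query` can be left at
its default:

```r
# Hypothetical wrapper: one value per input ID, NA where nothing came back.
getBM_aligned <- function(values, filter, attribute, mart,
                          query = biomaRt::getBM) {
  # Ask for the filter as an attribute too, so results can be matched up.
  res <- query(attributes = c(filter, attribute), filters = filter,
               values = unique(values), mart = mart)
  # match() yields NA for IDs without a hit; the first hit wins if an ID
  # comes back more than once.
  res[[attribute]][match(values, res[[filter]])]
}

# Offline stub that mimics getBM(): rows only for IDs it knows about.
stub <- function(attributes, filters, values, mart) {
  known <- data.frame(
    ensembl_gene_id  = c("ENSMUSG00000016024", "ENSMUSG00000027255"),
    external_gene_id = c("Lbp", "Arfgap2"),
    stringsAsFactors = FALSE
  )
  known[known$ensembl_gene_id %in% values, attributes, drop = FALSE]
}

# Duplicated and unknown IDs (the second one is made up) are both handled.
symbols <- getBM_aligned(
  c("ENSMUSG00000027255", "ENSMUSG_MISSING",
    "ENSMUSG00000016024", "ENSMUSG00000027255"),
  filter = "ensembl_gene_id", attribute = "external_gene_id",
  mart = NULL, query = stub
)
print(symbols)
```

The result has the same length and order as the input, so it can be added
to a data.frame column directly.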
>>
>> Best,
>> Steffen
>>
>>
>> On Wed, Sep 11, 2013 at 6:15 AM, Francesco Lescai
>> <francesco.lescai at hum-gen.au.dk> wrote:
>> Hi,
>> I have the same problem, and it has been this way for as long as I have
>> used biomaRt.
>> Is there any way to force getBM to return NA when the attribute
>> corresponding to the filter cannot be found?
>> At least when annotating your results you'd be able to get same-length
>> vectors, and it would be much easier to do that in data.frames.
>>
>> thanks for any suggestions,
>> cheers,
>> Francesco
>>
>>
>> On 29 Aug 2013, at 05:40, Atul <atulkakrana at outlook.com> wrote:
>>
>> Hi All,
>>
>> I am using the oligo package to analyse samples generated using the HuEx 1.0 ST v2
>> chip. The problem I am facing is with annotating the results.
>>
>> Here is my code (simplified):
>>
>> celFilesA <- list.celfiles()
>> AF_data.A <- read.celfiles(celFilesA,pkgname='pd.huex.1.0.st.v2')
>> AF.eset.RMA <- rma(AF_data.A,target='core')
>>
>>> dim(exprs(AF.eset.RMA))
>>
>> [1] 22011    10
>>
>> ##Attempt to annotate
>> library(biomaRt)
>> ID <- rownames(AF.eset.RMA)
>> ensembl <- useMart('ensembl', dataset = 'hsapiens_gene_ensembl')
>> Anno <- getBM(attributes=c("strand", "transcript_start",
>>                            "chromosome_name", "hgnc_symbol"),
>>               filters="affy_huex_1_0_st_v2", values=ID, mart=ensembl)
>>
>>> dim(Anno)
>>
>> [1] 1635    4
>>
>> As you see, out of a total of 22011 genes/probesets I can annotate only 1635.
>> Is there any way I can get the annotations for all of the
>> genes/probesets and add them back to my expression set (AF.eset.RMA), so
>> that the annotations are included in the final results?
>>
>>
>> Usually, with other chips I do this:
>> ID <- featureNames(AF.eset.RMA)
>> Symbol <- getSYMBOL(ID, 'mouse4302.db')
>> Name <- as.character(lookUp(ID, "mouse4302.db", "GENENAME"))
>> tmp <- data.frame(ID=ID, Symbol=Symbol, Name=Name, stringsAsFactors=FALSE)
>> tmp[tmp=="NA"] <- NA
>> fData(AF.eset.RMA) <- tmp
>>
>> And this is what I want to achieve in present case. I would appreciate
>> your help.
>>
>> Thanks
>>
>> AK
>>
>> _______________________________________________
>> Bioconductor mailing list
>>
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>


