[BioC] biomaRt ensembl mmusculus does not contain all ensembl IDs (lincRNA, miRNA etc)?

Duke duke.lists at gmx.com
Mon Apr 18 22:51:37 CEST 2011


Hi Steffen,

Thanks so much for the quick response. Yes, removing entrezgene does help!
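
For the archives, a minimal sketch of why the type of table join matters. It uses base R merge() on toy tables, not BioMart's actual server-side query. The tables reuse the IDs from this thread, and the assumption that only Gnai3 has an entrezgene entry is made purely for illustration. An inner join drops the transcripts that have no match in the entrezgene table, while a left outer join keeps them and fills in NA.

```r
# Toy tables built from the IDs quoted in this thread (not fetched from BioMart).
transcripts <- data.frame(
  ensembl_transcript_id  = c("ENSMUST00000000001",
                             "ENSMUST00000042585",
                             "ENSMUST00000083463"),
  external_transcript_id = c("Gnai3-001", "Gm9725-201", "Mir155-201"),
  stringsAsFactors = FALSE
)

# Assumption for illustration only: just Gnai3 has an entrezgene entry.
entrez <- data.frame(
  ensembl_transcript_id = "ENSMUST00000000001",
  entrezgene            = 14679L,
  stringsAsFactors = FALSE
)

# Inner join: transcripts without a match in 'entrez' are dropped.
inner <- merge(transcripts, entrez, by = "ensembl_transcript_id")

# Left outer join: every transcript is kept; missing entrezgene becomes NA.
left <- merge(transcripts, entrez, by = "ensembl_transcript_id", all.x = TRUE)

print(inner)
print(left)
```

The same idea could serve as a client-side workaround: fetch entrezgene in a separate getBM() call and combine the two results with merge(..., all.x = TRUE).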

Best,

D.

On 4/18/11 4:41 PM, Steffen Durinck wrote:
> Hi Duke,
>
> It looks like this is a BioMart server issue where the wrong type of
> table join is made with the entrezgene table.
> If you remove the entrezgene attribute you'll get everything back:
>
> > getBM(filters="ensembl_transcript_id",
> +       attributes=c("ensembl_transcript_id", "ensembl_gene_id",
> +                    "external_transcript_id", "refseq_dna"),
> +       values=ensTransIDs, mart=mart)
>    ensembl_transcript_id    ensembl_gene_id external_transcript_id refseq_dna
> 1    ENSMUST00000000001 ENSMUSG00000000001              Gnai3-001  NM_010306
> 2    ENSMUST00000042585 ENSMUSG00000037982             Gm9725-201
> 3    ENSMUST00000083463 ENSMUSG00000065397             Mir155-201  NR_029565
>
>
> We notified the BioMart team of this behavior a while ago, and they
> said they would make a change in the next release.
>
> Cheers,
> Steffen
>
>
>
> On Mon, Apr 18, 2011 at 1:33 PM, Duke <duke.lists at gmx.com> wrote:
>> Hi folks,
>>
>> Following the biomaRt usage instructions, I am trying to get information
>> for our mmu data. The code I used is below:
>>
>> ----------
>> library(biomaRt)
>> mart <- useDataset("mmusculus_gene_ensembl", useMart("ensembl"))
>> ensTransIDs <- c("ENSMUST00000000001", "ENSMUST00000083463",
>>                  "ENSMUST00000042585")
>> getBM(filters = "ensembl_transcript_id",
>>       attributes = c("ensembl_transcript_id", "ensembl_gene_id",
>>                      "external_transcript_id", "external_gene_id",
>>                      "refseq_dna", "entrezgene"),
>>       values = ensTransIDs, mart = mart)
>> ----------
>>
>> This code runs fine with some transcript IDs, but for others (for
>> example, lincRNAs or miRNAs) it returns empty results. For example, the
>> code above, run for one gene, one lincRNA and one miRNA, produced this result:
>>
>>   ensembl_transcript_id    ensembl_gene_id external_transcript_id
>> 1    ENSMUST00000000001 ENSMUSG00000000001              Gnai3-001
>>   external_gene_id refseq_dna entrezgene
>> 1            Gnai3  NM_010306      14679
>>
>>
>> => only the gene Gnai3 is returned; the other two are not.
>>
>> Does anybody know what I am doing wrong here, or is it just that the
>> Ensembl database does not contain all the available transcript_id data?
>>
>> For the record, here is my sessionInfo():
>>
>>> sessionInfo()
>> R version 2.12.2 (2011-02-25)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] biomaRt_2.6.0
>>
>> loaded via a namespace (and not attached):
>> [1] RCurl_1.4-3  XML_3.2-0    tools_2.12.2
>>
>> Thanks,
>>
>> D.