[BioC] creating GSEA files using biomart

Juliet Hannah juliet.hannah at gmail.com
Thu Sep 13 18:08:32 CEST 2012


Thanks Steffen for the helpful answers. "description", how embarrassing!


On Thu, Sep 13, 2012 at 11:42 AM, Steffen Durinck
<durinck.steffen at gene.com> wrote:
> Hi Juliet,
>
> The third attribute you're looking for is 'description':
>
>  idens <- getBM(attributes = c("affy_hg_u133a","hgnc_symbol","description"),
> filters ="affy_hg_u133a",values = probeSets, mart = ensembl)
>
> Gives:
>
>   affy_hg_u133a hgnc_symbol description
> 1     219666_at      MS4A6A membrane-spanning 4-domains, subfamily A, member 6A [Source:HGNC Symbol;Acc:13375]
> 2   220547_s_at      FAM35B family with sequence similarity 35, member B [Source:HGNC Symbol;Acc:31425]
> 3     218034_at        FIS1 fission 1 (mitochondrial outer membrane) homolog (S. cerevisiae) [Source:HGNC Symbol;Acc:21689]
> 4   220547_s_at     FAM35B2 family with sequence similarity 35, member B2 (pseudogene) [Source:HGNC Symbol;Acc:34038]
> 5   220547_s_at      FAM35A family with sequence similarity 35, member A [Source:HGNC Symbol;Acc:28773]
>
>
> There is no systematic way to figure out which attribute name you need to
> use; all you have is the attribute name and a description of the attribute.
> The more you get used to looking at those, the easier it gets to figure out
> which one you need, and once you know the attributes you need, you'll be
> using a similar set of attributes most of the time.
>
>
> It is interesting to see in your example that one probeset maps to three
> different but closely related genes.  In the past I thought Ensembl would
> remove such ambiguous mappers.  I think the best thing to do in this case is
> to remove all probes that map to multiple genes, as there is no way to tell
> which gene you'll be measuring.  I'll report this example to the Ensembl
> team, as they used to do this for us.
>
> Cheers,
> Steffen
>
> On Thu, Sep 13, 2012 at 8:29 AM, Juliet Hannah <juliet.hannah at gmail.com>
> wrote:
>>
>> All,
>>
>> I am trying to create the GSEA chip file. This example uses Affy data,
>> and the chip file is already available. I'm
>> doing this as an exercise in preparation for other platforms.
>>
>> The chip file should look like:
>>
>>
>> Probe Set ID    Gene Symbol     Gene Title
>> 244901_at       ORF25   hypothetical protein
>> 244902_at       NAD4L   NADH dehydrogenase subunit 4L
>> 244912_at       CCB382  cytochrome c biogenesis orf382
>> 244919_at       CCB203  cytochrome c biogenesis orf203
>> 244925_at       NAD7    NADH dehydrogenase subunit 7
>>
>> How can I obtain the third column from biomart? I tried searching the
>> attributes, but couldn't find the right name. Is it a matter of trial
>> and error to find the correct attribute, or
>> are there systematic ways to find it? Here is what I have so far:
>>
>> library("biomaRt")
>> probeSets <- c("219666_at", "220547_s_at", "218034_at")
>>
>> ensembl = useMart("ensembl")
>> ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
>>
>> idens <- getBM(attributes = c("affy_hg_u133a","hgnc_symbol"), filters
>> = "affy_hg_u133a",values = probeSets, mart = ensembl)
>>
>>
>> Also, does anyone have any suggestions regarding how to handle the
>> duplicates (seen in this example) with respect to GSEA?
>>
>> Thanks,
>>
>> Juliet Hannah
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
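
On the question of finding the right attribute name: listAttributes() in
biomaRt returns a data frame with "name" and "description" columns, so one
way to narrow the list is a case-insensitive pattern search over both
columns. Below is a minimal sketch of that idea. The helper name
find_attributes is hypothetical, and the small attrs data frame is a
hand-made stand-in (its description strings are illustrative, not copied
from Ensembl) so that the example runs without a network connection. With a
live connection one would pass listAttributes(ensembl) instead.

```r
# Hypothetical helper: search an attribute table (shaped like the result of
# listAttributes(mart)) by name or description, ignoring case.
find_attributes <- function(attrs, pattern) {
  hit <- grepl(pattern, attrs$name, ignore.case = TRUE) |
    grepl(pattern, attrs$description, ignore.case = TRUE)
  attrs[hit, , drop = FALSE]
}

# Hand-made stand-in for listAttributes(ensembl); the description strings
# are illustrative assumptions.
attrs <- data.frame(
  name = c("ensembl_gene_id", "hgnc_symbol", "description", "affy_hg_u133a"),
  description = c("Ensembl Gene ID", "HGNC symbol", "Description",
                  "Affy HG U133A"),
  stringsAsFactors = FALSE
)

# Look for anything that mentions "descr" in its name or description.
hits <- find_attributes(attrs, "descr")
print(hits)
```

Searching with a short fragment such as "descr" or "symbol" tends to be
more forgiving than guessing the exact attribute name.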

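The suggestion in this thread to drop probes that map to multiple genes, and
the three-column chip layout from the original question, can be sketched as
follows. This is only one possible approach under stated assumptions: the
idens data frame is typed in by hand from the output shown in the thread
(so no biomaRt query is made), the chip file is assumed to be tab-delimited
with the header shown in the question, stripping the trailing "[Source:...]"
tag from the description is an optional choice made here, and the output
file name example.chip is made up.

```r
# Result of the getBM() call from the thread, entered by hand so the
# example runs offline.
idens <- data.frame(
  affy_hg_u133a = c("219666_at", "220547_s_at", "218034_at",
                    "220547_s_at", "220547_s_at"),
  hgnc_symbol = c("MS4A6A", "FAM35B", "FIS1", "FAM35B2", "FAM35A"),
  description = c(
    "membrane-spanning 4-domains, subfamily A, member 6A [Source:HGNC Symbol;Acc:13375]",
    "family with sequence similarity 35, member B [Source:HGNC Symbol;Acc:31425]",
    "fission 1 (mitochondrial outer membrane) homolog (S. cerevisiae) [Source:HGNC Symbol;Acc:21689]",
    "family with sequence similarity 35, member B2 (pseudogene) [Source:HGNC Symbol;Acc:34038]",
    "family with sequence similarity 35, member A [Source:HGNC Symbol;Acc:28773]"
  ),
  stringsAsFactors = FALSE
)

# Count the distinct gene symbols each probe set maps to.
n_genes <- tapply(idens$hgnc_symbol, idens$affy_hg_u133a,
                  function(x) length(unique(x)))

# Keep only probe sets that map to exactly one gene.
unique_probes <- names(n_genes)[n_genes == 1]
chip <- idens[idens$affy_hg_u133a %in% unique_probes, , drop = FALSE]

# Optional (an assumption, not required by the thread): drop the trailing
# "[Source:...]" tag so the title resembles the example chip file.
chip$description <- sub(" *\\[Source:.*\\]$", "", chip$description)

# Use the column headers shown in the original question.
colnames(chip) <- c("Probe Set ID", "Gene Symbol", "Gene Title")

# Write a tab-delimited file; the file name is hypothetical.
chip_file <- file.path(tempdir(), "example.chip")
write.table(chip, file = chip_file, sep = "\t", quote = FALSE,
            row.names = FALSE)
```

Here 220547_s_at is dropped because it maps to FAM35A, FAM35B and FAM35B2,
while 219666_at and 218034_at are kept.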

