[BioC] Exon array annotation with limma?

James MacDonald jmacdon at med.umich.edu
Tue Dec 22 01:33:05 CET 2009


Oh yeah, it finally worked its way through my dim bulb. All the IDs in that annotation package are based on Entrez Gene IDs, not Affymetrix IDs (this is true of all the MBNI chips - the IDs are based on the annotation database they used for re-mapping the probes).
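
A quick way to confirm this kind of mismatch is to strip the _at suffix from the package keys and check whether anything overlaps with the IDs on the chip. Below is a minimal base-R sketch. The two ID vectors are made-up stand-ins (only "3948543" and the _at keys appear in this thread); in a real session they would presumably come from featureNames(eSet.gene) and Lkeys(huex10stv2hsentrezgGENENAME).

```r
# Hypothetical stand-ins for illustration only. In a real session use
# featureNames(eSet.gene) and Lkeys(huex10stv2hsentrezgGENENAME).
chip_ids <- c("3948543", "2315251", "3063685")
pkg_keys <- c("10000_at", "10001_at", "9994_at", "100033423_at")

# Strip the trailing "_at" that MBNI appends to every probeset name.
pkg_ids <- sub("_at$", "", pkg_keys)

# Count how many chip IDs have a counterpart in the annotation package.
n_shared <- length(intersect(chip_ids, pkg_ids))

# Compare ID shapes: the transcript cluster IDs are always 7 digits,
# whereas Entrez Gene IDs vary in length.
chip_is_7_digit <- all(grepl("^[0-9]{7}$", chip_ids))
pkg_is_7_digit  <- all(grepl("^[0-9]{7}$", pkg_ids))

cat("shared IDs:", n_shared, "\n")
cat("chip IDs all 7 digits:", chip_is_7_digit, "\n")
cat("package IDs all 7 digits:", pkg_is_7_digit, "\n")
```

If the overlap is zero and the ID shapes differ like this, no amount of adding or removing _at will make the two ID sets match, because they come from different ID systems.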

There is a package on BioC, pd.huex.1.0.st.v2, that has annotation information for the probes, but you would need to use oligo for the processing. I would be surprised if you could then use limma2annaffy for annotation, however.

Best,

Jim



James W. MacDonald, M.S.
Biostatistician
Douglas Lab
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
>>> Michael Imbeault <michael.imbeault at sympatico.ca> 12/21/09 6:05 PM >>>
Hello James,

"Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
   value for "3948543_at" not found"

You're right, it's a mismatch problem; I just looked at all the "_at" 
probes in the db file and their IDs don't even match - some look like 
9994_at, which has no equivalent among the transcript cluster IDs on my 
chip, which are of the format "3948543" - always 7 digits.

I don't know why it is so in the db file, but it doesn't match anything 
on my chip, nor on NetAffx, so I don't know why the MBNI folks did it 
that way.

My next options are to look at the source code of onechannelGUI (which 
doesn't use the MBNI file to accomplish its magic) or try to fix my 
RCurl / Biomart problem. I'll probably open a new mailing list topic for 
this one. Much frustration so far with the exon array chip workflow, 
it's so unlike the other chips; there really should be a wiki for this :)

Thanks for the help,
Michael

On 21/12/2009 3:54 PM, James W. MacDonald wrote:
> Hi Michael,
>
> Michael Imbeault wrote:
>> Thanks for the help James,
>>
>> I did:
>>
>> featureNames(eSet.gene) <- paste(featureNames(eSet.gene), "_at" , 
>> sep="")
>>
>> (note the  sep="", without it the probes were like "1000 _at"). 
>> Sadly, the end result is the same, except that as a side effect, the 
>> first column probe links don't work anymore (because of the added 
>> _at, they don't link to the right probe on the Affy site).
>>
>> I verified that the probes in eSet.gene contain _at after the 
>> operation. I build my eset with:
>>
>> eSet.gene <- new("ExpressionSet", exprs = rma.gene, phenoData = 
>> phenoData)
>>
>> Should I add annotation="huex10stv2" or "huex10stv2hsentrezg" or 
>> something similar? Do I need the cdf file in addition to the .db one?
>
> No. I think there is just a mismatch problem here. As you mentioned 
> below, onechannelGUI is able to create a table with annotation.
>
> All limma2annaffy is doing is passing the probeset IDs on to annaffy. 
> All the matching and link building are done there, but if the IDs 
> don't match to anything in the annotation package then annaffy will 
> just create an empty cell in the table.
>
> If you take the first 10 or so featureNames (with the _at appended) 
> and do e.g.,
>
> mget(<thefeaturenames>, huex10stv2hsentrezgUNIGENE)
>
> do you get anything returned?
>
> Best,
>
> Jim
>>
>> Thanks,
>> Michael
>>
>> On 21/12/2009 1:01 PM, James W. MacDonald wrote:
>>> Hi Michael,
>>>
>>> Michael Imbeault wrote:
>>>> Hello,
>>>>
>>>> I'm analyzing human exon arrays, using Affymetrix Power Tools for 
>>>> normalization (with 'core' probes) and limma to find 
>>>> significantly modulated genes (all at the gene level, of course).
>>>>
>>>> The limma2annaffy function produces tables, but with all annotation 
>>>> table cells empty. I'm doing:
>>>>
>>>> limma2annaffy(eSet.gene, fit2, design,cont.matrix, lib = 
>>>> "huex10stv2hsentrezg.db", interactive=F, pfilt=0.05, fldfilt=0.8)
>>>>
>>>> where huex10stv2hsentrezg.db is from: 
>>>> http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/12.1.0/entrezg.asp 
>>>>
>>>>
>>>> Is it the right file to use?
>>>
>>> Most likely. However, the MBNI folks have an unfortunate habit of 
>>> adding _at to the end of all their probesets, regardless of the source. 
>>> So for instance, if I look at the probesets in this package, I get 
>>> something like this:
>>>
>>> > head(Lkeys(huex10stv2hsentrezgGENENAME))
>>> [1] "10000_at"     "10001_at"     "10002_at"
>>> [4] "100033423_at" "100033424_at" "100033425_at"
>>>
>>> And I am betting if you do something like 
>>> head(featureNames(eSet.gene)), you won't have any of those nasty _at 
>>> extensions.
>>>
>>> A simple albeit kludgy fix would be for you to first do
>>>
>>> featureNames(eSet.gene) <- paste(featureNames(eSet.gene), "_at")
>>>
>>> and then run limma2annaffy().
>>>>
>>>> Using onechannelGUI produces the same tables but with annotations, 
>>>> so I know there's a way to do it.
>>>
>>> I am betting that the onechannelGUI folks know about the extra _at 
>>> extensions and are silently stripping them. I could hypothetically 
>>> do the same, but I rebel against the idea that I should have to put 
>>> code in my package to protect people from infelicities in other 
>>> people's packages.
>>>
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>>
>>>> To complicate things further, limma2biomaRt, which is another 
>>>> option, fails with:
>>>>
>>>>     "Request to BioMart web service failed. Verify if you are still
>>>>     connected to the internet.  Alternatively the BioMart web service is
>>>>     temporarily down."
>>>>
>>>> which, judging from the mailing list, seems to be an RCurl problem. I 
>>>> tried both the latest and an older (0.92) version; using 
>>>> --internet2 doesn't solve it, and as far as I know I'm not using a 
>>>> proxy to connect to the net. I'm under Windows 7.
>>>>
>>>> Any help would be appreciated,
>>>> Michael
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: 
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>
