[BioC] Analysing Human Gene ST 1.0 Arrays with oligo and oneChannelGUI yield different number of probesets

Benilton Carvalho bcarvalh at jhsph.edu
Fri Oct 30 23:00:57 CET 2009


Javier,

The gene array is a "subset" of the exon array, so its probesets map
to exons. The (core) MPS file groups (reliable) probesets into "meta
probesets", which map to genes and, AFAIK, do not include controls.

I'm not sure what you mean by "best package available for
annotation". The "pd.hugene*" package is the one oligo uses so that you
can preprocess the data.

The hugene10stprobeset.db will give you information on the *probeset*.

If you summarize to the gene level, you'll want the
hugene10sttranscriptcluster.db package instead. In this case, say you
want the ENTREZID for "7896759"; then you can just run:

library(hugene10sttranscriptcluster.db)
hugene10sttranscriptclusterENTREZID[["7896759"]]
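
If you need the mapping for several IDs at once (for example, everything in featureNames() of the ExpressionSet), something along these lines might work. This is only a sketch: lookup_entrez is a hypothetical helper name, not part of any package, and the last part assumes the annotation package is installed.

```r
# Hypothetical helper (not from any package): look up Entrez IDs for
# several transcript-cluster IDs in one call.
lookup_entrez <- function(ids, map) {
  # mget() returns one list element per ID; IDs without an entry get NA
  hits <- mget(ids, envir = map, ifnotfound = NA)
  # keep only the first Entrez ID when a cluster maps to more than one
  vapply(hits, function(x) as.character(x[1]), character(1))
}

# Assumption: the annotation package is installed. In practice 'ids'
# would be featureNames(results), where 'results' is the ExpressionSet
# returned by rma(object, target = "core").
if (requireNamespace("hugene10sttranscriptcluster.db", quietly = TRUE)) {
  library(hugene10sttranscriptcluster.db)
  entrez <- lookup_entrez("7896759", hugene10sttranscriptclusterENTREZID)
}
```

Clusters that come back as NA have no Entrez annotation in the package, so check those before dropping them from the analysis.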

Cheers,
b

On Oct 30, 2009, at 10:41 AM, Javier Pérez Florido wrote:

> Dear Benilton,
> Thanks for your help. I have more questions. What is summarization
> at the gene level? I thought that a probeset = gene.
> Are the "new probesets" defined in the MPS file related to the
> experiment, or are they controls?
>
> Two more things:
> - I would like to perform an analysis without the control genes. How can
> I tell which genes are controls so that I can remove them from the
> analysis?
> - What is the best package available for annotation?
> hugene10stprobeset.db? I suppose that, using the featureNames of the
> expression set, I can get the ENTREZID of the probesets through this
> annotation package.
>
> Thanks again,
> Javier
>
>
> Benilton Carvalho wrote:
>> That makes me think that I forgot one 'svn commit' sometime in the
>> past... Apologies for that.
>>
>> In the meantime, please use the following description.
>>
>> Until BioC 2.4, oligo summarized only to the probeset level (as
>> defined in the PGF file). Affymetrix made available meta-probeset
>> files (MPS) that define "new probesets", which allow summarization to
>> the gene level. For exon arrays, there are 3 MPSs (depending on the
>> quality): core (best), extended and full. For gene arrays, there's
>> only the "core" MPS.
>>
>> Therefore, summaries to the gene level should use this additional
>> annotation.
>>
>> So, using the 'target' argument, you can set the level at which you
>> want the summarization to happen: "probeset", "core", "extended" and
>> "full" are the possible values (this is available starting with
>> BioC 2.5).
>>
>> I'll make sure the documentation is updated soon to reflect this change.
>>
>> Once again, apologies.
>>
>> b
>>
>> On Oct 29, 2009, at 8:21 PM, Javier Pérez Florido wrote:
>>
>>> Dear Benilton,
>>> Thanks for your quick reply. Now it works with the target argument.
>>> However, I searched the web for the meaning of this argument and
>>> couldn't find anything. What is "target" for?
>>> Why does oligo's manual say: "The ExpressionSet returned when either
>>> Exon/Gene-FeatureSet objects are passed contain extra annotation on the
>>> featureData slot that the user should take into account for
>>> exon/gene-level analyses"?
>>> I haven't worked with Human Gene ST arrays before, so I am quite new
>>> to this topic.
>>> Thanks again,
>>> Javier
>>>
>>>
>>>
>>>
>>>
>>> Benilton Carvalho wrote:
>>>> Dear Javier,
>>>>
>>>> You have not provided the exact call to RMA you used nor your
>>>> sessionInfo() information.
>>>>
>>>> If you're using the latest oligo (BioC 2.5), you can call:
>>>>
>>>> results = rma(object, target="core")
>>>>
>>>> to get the 33297 "probesets" you refer to...
>>>>
>>>> Note that building the package yourself is a nice exercise, but you
>>>> could just download it via biocLite().
>>>>
>>>> Cheers,
>>>>
>>>> b
>>>>
>>>> On Oct 29, 2009, at 5:42 PM, Javier Pérez Florido wrote:
>>>>
>>>>> Dear list,
>>>>> Some time ago I analysed a set of Human Gene ST arrays with
>>>>> oneChannelGUI. Now I'm trying to reproduce the results using the
>>>>> oligo package, but I am quite surprised by the results. With the
>>>>> oligo package, after preprocessing with rma, the number of probesets
>>>>> is 253002, while with oneChannelGUI the number of probesets is 33297,
>>>>> and the CEL files are the same!
>>>>>
>>>>> For the oligo package, prior to reading the CEL files, I had to build
>>>>> the annotation package using pdInfoBuilder, since the CDF file is not
>>>>> supported by Affymetrix. For this purpose, I first had to download the
>>>>> library files "Human Gene 1.0 ST Array, Analysis" from the Affymetrix
>>>>> website. The necessary files for building the package are:
>>>>> HuGene-1_0-st-v1.r4.pgf
>>>>> HuGene-1_0-st-v1.r4.clf
>>>>> HuGene-1_0-st-v1.na29.hg18.probeset (CSV file)
>>>>>
>>>>> Then, I executed the following commands:
>>>>> library(pdInfoBuilder)
>>>>> baseDir <- "pathWhereTheFilesAre"
>>>>> (pgf <- list.files(baseDir, pattern = ".pgf", full.names = TRUE))
>>>>> (clf <- list.files(baseDir, pattern = ".clf", full.names = TRUE))
>>>>> (prob <- list.files(baseDir, pattern = ".probeset.csv", full.names = TRUE))
>>>>> seed <- new("AffyGenePDInfoPkgSeed", pgfFile = pgf, clfFile = clf,
>>>>>             probeFile = prob, author = "Javier", email = "email",
>>>>>             biocViews = "AnnotationData", genomebuild = "NCBI Build 36",
>>>>>             organism = "Human", species = "Homo Sapiens", url = "")
>>>>> makePdInfoPackage(seed, destDir = ".")
>>>>>
>>>>> And I installed the package:
>>>>> R CMD INSTALL pd.hugene.1.0.st.v1\
>>>>>
>>>>> The package installed OK, and I read and preprocessed the CEL files
>>>>> using RMA, but the number of probesets is 253002! That is far more
>>>>> probesets than oneChannelGUI gives.
>>>>>
>>>>> Any comments on such a big difference?
>>>>> Thanks,
>>>>> Javier
>>>>>
>>>>
>>>>
>>>
>>
>>
>


