[BioC] Issues about how to filter and annotate the MoGene-2_0-st and MoEx-1_0-st-v1 array probe sets
James W. MacDonald
jmacdon at uw.edu
Tue Jun 10 16:26:47 CEST 2014
On 6/8/2014 10:37 AM, 张超 wrote:
> Dear list,
> I would like to use paCalls from the oligo package to filter out probe sets whose transcripts are absent. My data are from the MoGene-2_0-st and MoEx-1_0-st-v1 arrays (Affymetrix). After reading the CEL files I have a GeneFeatureSet with 12 samples (6 control and 6 experimental). What should I do with the values computed by paCalls with "PSDABG", as below?
>> OligoRawData <- read.celfiles(list.celfiles())
>> dagbPS <- paCalls(OligoRawData, "PSDABG")
> What should I do next to filter the probe sets? Could you please send me a complete example and a detailed explanation?
You need to decide what constitutes 'present' and how many samples have
to be present in order to keep the probeset.
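So if I were to say that p < 0.05 is present and I needed at least 6
such samples (you have 12 samples, so the cutoff cannot exceed 12), I
could do
keep <- rowSums(dagbPS < 0.05) >= 6
eset <- eset[keep,]
where eset is the summarized data from rma(), with rows matching dagbPS.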
If the above code is mysterious to you, then you need to read 'An
Introduction to R'.
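To make the filtering step concrete, here is a minimal sketch that runs in base R without any CEL files. The p-value matrix is simulated and only stands in for the output of paCalls(); the names pvals, p_cutoff and min_samples, and the cutoff values themselves, are assumptions chosen for illustration, not recommendations.

```r
# Simulated stand-in for the matrix returned by paCalls(OligoRawData, "PSDABG"):
# rows are probesets, columns are samples, values are detection p-values.
set.seed(1)
n_probesets <- 1000   # hypothetical size, far smaller than a real array
n_samples   <- 12     # 6 control + 6 experimental, as in the question
pvals <- matrix(runif(n_probesets * n_samples), nrow = n_probesets,
                dimnames = list(paste0("ps", seq_len(n_probesets)),
                                paste0("sample", seq_len(n_samples))))
# Force the first 100 probesets to look clearly "present" in every sample
pvals[1:100, ] <- pvals[1:100, ] * 0.01

p_cutoff    <- 0.05   # assumed definition of "present"
min_samples <- 6      # assumed: present in at least 6 of the 12 samples

present   <- pvals < p_cutoff        # logical matrix, TRUE where "present"
n_present <- rowSums(present)        # number of "present" samples per probeset
keep      <- n_present >= min_samples  # one TRUE/FALSE per probeset

table(keep)                          # how many probesets survive the filter

# keep can index any object whose rows line up with those of pvals
filtered <- pvals[keep, , drop = FALSE]
```

The logical vector keep is only meaningful for an object whose rows are in the same order, and at the same summarization level, as the p-value matrix, so it is worth checking that the row names agree before subsetting an ExpressionSet with it.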
> In addition, moex10sttranscriptcluster.db can be used to annotate data from the MoEx-1_0-st-v1 array, and both mogene20stprobeset.db and mogene20sttranscriptcluster.db can be used for data from MoGene-2_0-st (which includes both gene and lncRNA content). But only a little more than half of the probe sets are annotated with gene symbols by the commands below.
>> results<-decideTests(fit2, method="global", adjust.method="fdr", p.value=0.05, lfc=0.5) #DEGs determination by t tests
>> genesymbol = getText(aafSymbol(rownames(results), "moex10sttranscriptcluster.db" ));#annotated by moex10sttranscriptcluster.db for data get from MoEx-1_0-st-v1 array
> For the MoGene-2_0-st data, only 1217 and 24709 probe sets can be annotated by mogene20stprobeset.db and mogene20sttranscriptcluster.db respectively (length(genesymbol[which(genesymbol!="")])), out of a total of 41345 (length(results)). For the MoEx-1_0-st-v1 data, only 14966 can be mapped by moex10sttranscriptcluster.db, out of a total of 23332 (length(results)). Do I need to add more annotation packages?
The annotation packages with 'transcriptcluster' in their names are for
instances where you have summarized probesets at the transcript level
(which is the default for rma() in oligo). If you want to summarize at
the probeset level (which I would not recommend doing, btw), you need to
use target = "probeset" in your call to rma().
In other words, you should only be using the transcriptcluster
annotation packages. Although please note that the
moex10sttranscriptcluster.db package is for the Mouse Exon 1.0 ST array,
not the Gene ST array.
There are any number of reasons that only a subset of probesets on the
array have symbols. First, there are lots of controls, which won't have
gene symbols. Second, the lincRNA/snoRNA/miRNA probesets that Affy put
on these arrays won't have gene symbols either (because they aren't
genes). Third, there is still some speculative content on these arrays;
things that might end up being genes, with gene names, in the future,
but which are just hypothetical at this point in time. Fourth, the
annaffy package uses the old style methods of getting annotations, in
which case any probeset that matches more than one gene symbol will be
You would be much better served if you were to do something like
gns <- select(mogene10sttranscriptcluster.db, featureNames(eset),
Which will result in a warning that you have multiple mappings. You will
have to deal with those multiple mappings as you see fit. But after
doing so, you can then do
fit$genes <- gns
and your topTable object will then be populated with the annotations.
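As a rough illustration of dealing with multiple mappings, the sketch below uses a small made-up data frame in place of real select() output, so it runs in base R with no annotation package installed. The column names PROBEID and SYMBOL mimic what select() typically returns; the IDs, the symbols and the helper collapse() are hypothetical. It shows two possible policies (keep the first mapping, or collapse all symbols into one string), which are only two of several reasonable choices.

```r
# Simulated stand-in for the data frame returned by select(); values are made up.
ids <- c("id1", "id2", "id3", "id4")   # plays the role of featureNames(eset)
gns <- data.frame(
  PROBEID = c("id1", "id2", "id2", "id3", "id4"),
  SYMBOL  = c("GeneA", "GeneB", "GeneC", NA, "GeneD"),
  stringsAsFactors = FALSE
)

# Policy 1 (naive): keep only the first mapping listed for each ID
first_only <- gns[!duplicated(gns$PROBEID), ]

# Policy 2: collapse every symbol for an ID into a single string
collapse <- function(x) {
  x <- unique(x[!is.na(x)])
  if (length(x) == 0) NA_character_ else paste(x, collapse = "; ")
}
collapsed <- tapply(gns$SYMBOL, gns$PROBEID, collapse)
collapsed <- data.frame(PROBEID = names(collapsed),
                        SYMBOL  = as.vector(collapsed),
                        stringsAsFactors = FALSE)

# Either way, reorder so there is exactly one row per ID, in the same
# order as the expression data, before attaching it to the fit object
first_only <- first_only[match(ids, first_only$PROBEID), ]
collapsed  <- collapsed[match(ids, collapsed$PROBEID), ]
```

The match() step matters because the annotation is attached by position: the rows of the annotation data frame need to be in the same order as the rows of the fitted model.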
You might then consider using the ReportingTools package, which is under
active development and maintenance, rather than the annaffy package,
which may still be actively maintained but is no longer, AFAICT, under
active development.
> BTW, I am a beginner in this field. I found very little documentation with examples of how to use the functions in the oligo package. Could you please also give me some suggestions? Looking forward to your reply. I really appreciate any help.
> Thanks again.
> Best regards.
James W. MacDonald, M.S.
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099