[BioC] paCalls oligo package

James W. MacDonald jmacdon at uw.edu
Mon Sep 24 17:52:55 CEST 2012

Hi Juan,

On 9/24/2012 11:09 AM, Juan Fernández Tajes wrote:
> Dear list,
> I would like to use the paCalls from oligo package for filtering probe sets with absence of transcripts. My data are from Hugene 1.1 st array (Affymetrix)
> My data after reading CEL files is a GeneFeatureSet with 1178100 features and 23 samples
>> data
> GeneFeatureSet (storageMode: lockedEnvironment)
> assayData: 1178100 features, 23 samples
> element names: exprs
> protocolData
> rowNames: 10SE191_2 10SE207 ... 10SE360 (23 total)
> varLabels: exprs dates
> varMetadata: labelDescription channel
> phenoData
> rowNames: 10SE191_2 10SE207 ... 10SE360 (23 total)
> varLabels: Sample_ID INIBIC_ID ... Cluster2 (11 total)
> varMetadata: labelDescription
> featureData: none
> experimentData: use 'experimentData(object)'
> Annotation: pd.hugene.1.1.st.v1
> I called the paCalls function:
>> dabgPS<- paCalls(data, "PSDABG")
> And I obtained a matrix of 257430x23. How can I use this information to filter those probes without transcript?
> My aim is to obtain an average expression value in only those probes with a "true" transcription.

I would argue that you can't actually do this with microarray data. What 
you can do is say if the probeset intensities for a given transcript are 
significantly brighter than background probesets. I think that is a very 
different thing from saying a transcript isn't expressed, but opinions 
differ on that point.

Please note that the matrix that paCalls() returns is made up of 
p-values testing the hypothesis that the given probeset is not different 
from background probesets (so a small p-value causes you to reject the 
null hypothesis, and conclude that they *are* different).
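
As a rough illustration of how to read that matrix, here is a minimal sketch on a small simulated p-value matrix. It is not real paCalls() output: the name `pvals`, the probeset and sample labels, and the 0.05 cutoff are all assumptions made for the example.

```r
## Simulated stand-in for the matrix returned by paCalls():
## rows are probesets, columns are samples, entries are p-values.
set.seed(1)
pvals <- matrix(runif(20), nrow = 5,
                dimnames = list(paste0("ps", 1:5), paste0("s", 1:4)))
pvals["ps1", ] <- 0.001   # clearly brighter than background in every sample
pvals["ps2", ] <- 0.8     # indistinguishable from background everywhere

## Small p-value => reject "same as background" => call it detected.
detected <- pvals < 0.05

## Number of samples in which each probeset is detected.
n.detected <- rowSums(detected)
```

Here `n.detected` is 4 for ps1 and 0 for ps2; the remaining rows depend on the random draws.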

Also note that it appears you have summarized your data at the exon
level, whereas you ran paCalls at the transcript level. This won't work,
so you either have to do rma() using target = "core", or paCalls() using
"DABG" in order to be consistent. Personally I wouldn't use rma(target =
"probeset") for the Gene ST arrays, because tons of the probesets only
have one probe at that summarization level.
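
One cheap way to catch such a mismatch before filtering is to compare the row and column names of the two matrices. The sketch below uses simulated stand-ins rather than real oligo output; `same_level`, `expr` and `pvals` are hypothetical names introduced only for this example.

```r
## Hypothetical helper: TRUE only when both matrices describe the same
## features (rows) and the same samples (columns), in the same order.
same_level <- function(expr, pvals) {
  identical(rownames(expr), rownames(pvals)) &&
    identical(colnames(expr), colnames(pvals))
}

## Simulated stand-ins, not real oligo output.
samples <- paste0("s", 1:3)
expr  <- matrix(rnorm(6),  nrow = 2, dimnames = list(c("t1", "t2"), samples))
pvals <- matrix(runif(12), nrow = 4, dimnames = list(paste0("p", 1:4), samples))

same_level(expr, pvals)  # FALSE: 2 rows versus 4 rows
same_level(expr, expr)   # TRUE
```

If the check fails on real data, one of the two objects needs to be recomputed at the other's level before the p-values are used to index the expression values.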

So the next question is what should you do with these data? You could
for instance require that a given probeset have a p-value < 0.05 in at
least N samples, where N = the number of samples in the smallest group
you are comparing. That way, if the gene is transcribed in at least one
group, you retain it (e.g., if a gene is transcribed in one group and
not in any other, this is still an interesting result).

Something like

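N <- 5 ## or whatever
## TRUE for probesets with p < 0.05 in at least N samples
ind <- apply(dabgPS, 1, function(x) sum(x < 0.05) >= N)
## 'data' here must be summarized so that its rows match those of dabgPS
data.filtered <- data[ind,]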
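
To make the choice of N concrete, here is a small self-contained sketch that derives N from the group sizes and applies the same rule. Everything in it is simulated: `group`, `pvals` and the "ctl"/"trt" labels are assumptions for illustration, not taken from your data.

```r
## Simulated stand-ins: 'pvals' for the paCalls() matrix and 'group' for
## the experimental factor; neither comes from the original data.
set.seed(42)
group <- factor(c("ctl", "ctl", "ctl", "trt", "trt", "trt", "trt"))
pvals <- matrix(runif(4 * length(group), min = 0.06, max = 1), nrow = 4,
                dimnames = list(paste0("ps", 1:4), NULL))
pvals["ps1", ] <- 0.01                  # detected in every sample
pvals["ps2", group == "ctl"] <- 0.01    # detected in the smaller group only
pvals["ps3", 1] <- 0.01                 # detected in a single sample

## N = size of the smallest group being compared.
N <- min(table(group))

## Keep probesets detected (p < 0.05) in at least N samples.
keep <- rowSums(pvals < 0.05) >= N
```

With these inputs N is 3, so ps1 and ps2 are kept while ps3 and ps4 are dropped. Using rowSums() on the logical matrix gives the same result as the apply() call and is usually faster on a matrix with hundreds of thousands of rows.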



> Many thanks in advance
> Juan
> my sessionInfo is:
> R version 2.15.1 (2012-06-22)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
> locale:
> [1] es_ES.UTF-8/es_ES.UTF-8/es_ES.UTF-8/C/es_ES.UTF-8/es_ES.UTF-8
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> other attached packages:
> [1] affy_1.34.0 pd.hugene.1.1.st.v1_3.6.0 genefilter_1.38.0
> [4] limma_3.12.1 annotate_1.34.1 multtest_2.12.0
> [7] oligo_1.20.4 oligoClasses_1.18.0 hugene11sttranscriptcluster.db_4.0.1
> [10] org.Hs.eg.db_2.7.1 RSQLite_0.11.1 DBI_0.2-5
> [13] AnnotationDbi_1.18.1 Biobase_2.16.0 BiocGenerics_0.2.0
> loaded via a namespace (and not attached):
> [1] affxparser_1.28.1 affyio_1.24.0 BiocInstaller_1.4.7 Biostrings_2.24.1 bit_1.1-8
> [6] codetools_0.2-8 ff_2.2-7 foreach_1.4.0 IRanges_1.14.4 iterators_1.0.6
> [11] MASS_7.3-20 preprocessCore_1.18.0 splines_2.15.1 stats4_2.15.1 survival_2.36-14
> [16] tools_2.15.1 XML_3.9-4 xtable_1.7-0 zlibbioc_1.2.0
> ---------------------------------------------------------------
> Juan Fernandez Tajes, ph. D
> Departamento de Biología Celular y Molecular
> Facultad de Ciencias-Universidade da Coruña
> Tlf. +34 981 167000 ext 2030
> e-mail: jfernandezt at udc.es
> ----------------------------------------------------------------

James W. MacDonald, M.S.
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
