[BioC] paCalls oligo package

James W. MacDonald jmacdon at uw.edu
Mon Sep 24 18:27:42 CEST 2012


Hi Juan,

On 9/24/2012 12:24 PM, Juan Fernández Tajes wrote:
> Hi James,
>
> Many thanks for your answer, for instance, I want to check if the 
> probeset intensities for a given transcript are significantly brighter 
> than background probesets, but I wasn´t able to explain it properly. I 
> have another question regarding your explanation:
>
> >Also note that it appears you have summarized your data at the exon
> >level, whereas you ran paCalls at the transcript level. This won't work,
> >so you either have to do rma() using target = "core", or paCalls() using
> >"DAGB" in order to be consistent
>
> I've obtained my object "data" running the following code:
>
> geneCELs <- list.celfiles("./CEL", full.names=T)
> affyGeneFS <- read.celfiles(geneCELs)
> data <- affyGeneFS
> sampleNames(data) <- sub("\\.CEL$", "", sampleNames(dat))
> metadata_array <- read.delim(file="metadata_array_oligo.txt", 
> header=T, sep="\t")
> rownames(metadata_array) <- metadata_array$Sample_ID
> phenoData(data) <- new("AnnotatedDataFrame", data=metadata_array)
>
> When you say that I should do rma() using target="core" or paCalls() 
> using "DAGB" in order to be consistent, should I do the following?
>
> dabgPS<- paCalls(data, "PSDABG")
> dat.rma <- rma(data, target="core")
> ind <- apply(dagbPS, 1, function(x) sum(x < 0.05) > N)
> data.filtered <- dat.rma[ind,]
>
> Am I right?

Yes, contingent upon your smallest group having five samples.

Best,

Jim


>
> Juan
>
>
> ---------------------------------------------------------------
> Juan Fernandez Tajes, ph. D
> Grupo XENOMAR
> Departamento de Biología Celular y Molecular
> Facultad de Ciencias-Universidade da Coruña
> Tlf. +34 981 167000 ext 2030
> e-mail: jfernandezt at udc.es
> ----------------------------------------------------------------
>
>
> ------------------------------------------------------------------------
> *De: *"James W. MacDonald" <jmacdon at uw.edu>
> *Para: *"Juan Fernández Tajes" <jfernandezt at udc.es>
> *CC: *bioconductor at r-project.org
> *Enviados: *Lunes, 24 de Septiembre 2012 17:52:55
> *Asunto: *Re: [BioC] paCalls oligo package
>
> Hi Juan,
>
> On 9/24/2012 11:09 AM, Juan Fernández Tajes wrote:
> > Dear list,
> >
> > I would like to use the paCalls from oligo package for filtering 
> probe sets with absence of transcripts. My data are from Hugene 1.1 st 
> array (Affymetrix)
> > My data after reading CEL files is a GeneFeatureSet with 1178100 
> features and 23 samples
> >
> >> data
> > GeneFeatureSet (storageMode: lockedEnvironment)
> > assayData: 1178100 features, 23 samples
> > element names: exprs
> > protocolData
> > rowNames: 10SE191_2 10SE207 ... 10SE360 (23 total)
> > varLabels: exprs dates
> > varMetadata: labelDescription channel
> > phenoData
> > rowNames: 10SE191_2 10SE207 ... 10SE360 (23 total)
> > varLabels: Sample_ID INIBIC_ID ... Cluster2 (11 total)
> > varMetadata: labelDescription
> > featureData: none
> > experimentData: use 'experimentData(object)'
> > Annotation: pd.hugene.1.1.st.v1
> >
> > I called the paCalls function:
> >
> >> dabgPS<- paCalls(data, "PSDABG")
> > And I obtained a matrix of 257430x23, how can I used this 
> information to filter those probes without transcript?
> >
> > My aim is to obtain an average expression value in only those probes 
> with a "true" transcription.
>
> I would argue that you can't actually do this with microarray data. What
> you can do is say if the probeset intensities for a given transcript are
> significantly brighter than background probesets. I think that is a very
> different thing from saying a transcript isn't expressed, but opinions
> differ on that point.
>
> Please note that the matrix that paCalls() returns is made up of
> p-values testing the hypothesis that the given probeset is not different
> from background probesets (so a small p-value causes you to reject the
> null hypothesis, and conclude that they *are* different).
>
> Also note that it appears you have summarized your data at the exon
> level, whereas you ran paCalls at the transcript level. This won't work,
> so you either have to do rma() using target = "core", or paCalls() using
> "DAGB" in order to be consistent. Personally I wouldn't use rma(target =
> "probeset") for the Gene ST arrays, because tons of the probesets only
> have one probe at that summarization level.
>
> So the next question is what should you do with these data? You could
> for instance say that at least N of the probesets for a given gene have
> to have a p-value < 0.05, where N = the number of samples in the
> smallest group you are comparing. That way, if the gene is transcribed
> in at least one sample, you retain it (e.g., if a gene is transcribed in
> one sample and not in any other, this is still an interesting result).
>
> Something like
>
> N <- 5 ## or whatever
> ind <- apply(dagbPS, 1, function(x) sum(x < 0.05) > N)
> data.filtered <- data[ind,]
>
>
> Best,
>
> Jim
>
>
> >
> > Many thanks in advance
> >
> > Juan
> >
> > my sessionInfo is:
> >
> > R version 2.15.1 (2012-06-22)
> > Platform: i386-apple-darwin9.8.0/i386 (32-bit)
> >
> > locale:
> > [1] es_ES.UTF-8/es_ES.UTF-8/es_ES.UTF-8/C/es_ES.UTF-8/es_ES.UTF-8
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> > other attached packages:
> > [1] affy_1.34.0 pd.hugene.1.1.st.v1_3.6.0 genefilter_1.38.0
> > [4] limma_3.12.1 annotate_1.34.1 multtest_2.12.0
> > [7] oligo_1.20.4 oligoClasses_1.18.0 
> hugene11sttranscriptcluster.db_4.0.1
> > [10] org.Hs.eg.db_2.7.1 RSQLite_0.11.1 DBI_0.2-5
> > [13] AnnotationDbi_1.18.1 Biobase_2.16.0 BiocGenerics_0.2.0
> >
> > loaded via a namespace (and not attached):
> > [1] affxparser_1.28.1 affyio_1.24.0 BiocInstaller_1.4.7 
> Biostrings_2.24.1 bit_1.1-8
> > [6] codetools_0.2-8 ff_2.2-7 foreach_1.4.0 IRanges_1.14.4 
> iterators_1.0.6
> > [11] MASS_7.3-20 preprocessCore_1.18.0 splines_2.15.1 stats4_2.15.1 
> survival_2.36-14
> > [16] tools_2.15.1 XML_3.9-4 xtable_1.7-0 zlibbioc_1.2.0
> >
> >
> > ---------------------------------------------------------------
> > Juan Fernandez Tajes, ph. D
> > Grupo XENOMAR
> > Departamento de Biología Celular y Molecular
> > Facultad de Ciencias-Universidade da Coruña
> > Tlf. +34 981 167000 ext 2030
> > e-mail: jfernandezt at udc.es
> > ----------------------------------------------------------------
> >
> >
> >
> >         [[alternative HTML version deleted]]
> >
> >
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list