[BioC] paCalls oligo package

Tue Sep 25 16:25:55 CEST 2012

Hi Juan,

On 9/25/2012 7:16 AM, Juan Fernández Tajes wrote:
> Hi Jim,
>
> Sorry for bothering you, but I need to clarify my code because I think 
> I´ve misunderstood something, here is the code that I´m using:
>
> > dat <- affyGeneFS
> > sampleNames(dat) <- sub("\\.CEL$", "", sampleNames(myAB))
> > metadata_array <- read.delim(file="metadata_array_oligo.txt", 
> header=T, sep="\t")
> > rownames(metadata_array) <- metadata_array$Sample_ID
> > phenoData(dat) <- new("AnnotatedDataFrame", data=metadata_array)

So I don't know what this ^^^^^^ business is all about. But it appears 
to me that you are trying to do something tricky when simple is the best 
recourse.

dat <- read.celfiles(filenames = list.celfiles()) ## if celfiles aren't 
in working dir, add path to list.celfiles.
eset <- rma(dat)
dagbPS <- paCalls(dat, "PSDAGB")
ind <- apply(dagbPS, 1, function(x) sum(x < 0.05) > 6)
eset.filtered <- eset[ind,]

Best,

Jim

>
> > dabgPS <- paCalls(dat, "PSDABG")
> > N <- 6
> > ind <- apply(dabgPS, 1, function(x) sum(x<0.05) > N)
> > dat.filtered <- dat[ind,]
> > dat.rma <- rma(dat.filtered, target="core")
>
> Is this right? or Should do I change "PSDABG" by "DABG", i.e:
>
> > dabgS <- paCalls(dat, "DABG")
> > N <- 6
> > ind <- apply(dabgS, 1, function(x) sum(x<0.05) > N)
> > dat.filtered <- dat[ind,]
> > dat.rma <- rma(dat.filtered, target="core")
>
> Many thanks in advance
>
> BW,
>
> Juan
>
>
> ---------------------------------------------------------------
> Juan Fernandez Tajes, ph. D
> Grupo XENOMAR
> Departamento de Biología Celular y Molecular
> Facultad de Ciencias-Universidade da Coruña
> Tlf. +34 981 167000 ext 2030
> e-mail: jfernandezt at udc.es
> ----------------------------------------------------------------
>
>
> ------------------------------------------------------------------------
> *De: *"James W. MacDonald" <jmacdon at uw.edu>
> *Para: *"Juan Fernández Tajes" <jfernandezt at udc.es>
> *CC: *bioconductor at r-project.org
> *Enviados: *Lunes, 24 de Septiembre 2012 18:27:42
> *Asunto: *Re: [BioC] paCalls oligo package
>
> Hi Juan,
>
> On 9/24/2012 12:24 PM, Juan Fernández Tajes wrote:
> > Hi James,
> >
> > Many thanks for your answer, for instance, I want to check if the
> > probeset intensities for a given transcript are significantly brighter
> > than background probesets, but I wasn´t able to explain it properly. I
> > have another question regarding your explanation:
> >
> > >Also note that it appears you have summarized your data at the exon
> > >level, whereas you ran paCalls at the transcript level. This won't 
> work,
> > >so you either have to do rma() using target = "core", or paCalls() 
> using
> > >"DAGB" in order to be consistent
> >
> > I've obtained my object "data" running the following code:
> >
> > geneCELs <- list.celfiles("./CEL", full.names=T)
> > affyGeneFS <- read.celfiles(geneCELs)
> > data <- affyGeneFS
> > sampleNames(data) <- sub("\\.CEL$", "", sampleNames(dat))
> > metadata_array <- read.delim(file="metadata_array_oligo.txt",
> > header=T, sep="\t")
> > rownames(metadata_array) <- metadata_array$Sample_ID
> > phenoData(data) <- new("AnnotatedDataFrame", data=metadata_array)
> >
> > When you say that I should do rma() using target="core" or paCalls()
> > using "DAGB" in order to be consistent, should I do the following?
> >
> > dabgPS<- paCalls(data, "PSDABG")
> > dat.rma <- rma(data, target="core")
> > ind <- apply(dagbPS, 1, function(x) sum(x < 0.05) > N)
> > data.filtered <- dat.rma[ind,]
> >
> > Am I right?
>
> Yes, contingent upon your smallest group having five samples.
>
> Best,
>
> Jim
>
>
> >
> > Juan
> >
> >
> > ---------------------------------------------------------------
> > Juan Fernandez Tajes, ph. D
> > Grupo XENOMAR
> > Departamento de Biología Celular y Molecular
> > Facultad de Ciencias-Universidade da Coruña
> > Tlf. +34 981 167000 ext 2030
> > e-mail: jfernandezt at udc.es
> > ----------------------------------------------------------------
> >
> >
> > ------------------------------------------------------------------------
> > *De: *"James W. MacDonald" <jmacdon at uw.edu>
> > *Para: *"Juan Fernández Tajes" <jfernandezt at udc.es>
> > *CC: *bioconductor at r-project.org
> > *Enviados: *Lunes, 24 de Septiembre 2012 17:52:55
> > *Asunto: *Re: [BioC] paCalls oligo package
> >
> > Hi Juan,
> >
> > On 9/24/2012 11:09 AM, Juan Fernández Tajes wrote:
> > > Dear list,
> > >
> > > I would like to use the paCalls from oligo package for filtering
> > probe sets with absence of transcripts. My data are from Hugene 1.1 st
> > array (Affymetrix)
> > > My data after reading CEL files is a GeneFeatureSet with 1178100
> > features and 23 samples
> > >
> > >> data
> > > GeneFeatureSet (storageMode: lockedEnvironment)
> > > assayData: 1178100 features, 23 samples
> > > element names: exprs
> > > protocolData
> > > rowNames: 10SE191_2 10SE207 ... 10SE360 (23 total)
> > > varLabels: exprs dates
> > > varMetadata: labelDescription channel
> > > phenoData
> > > rowNames: 10SE191_2 10SE207 ... 10SE360 (23 total)
> > > varLabels: Sample_ID INIBIC_ID ... Cluster2 (11 total)
> > > varMetadata: labelDescription
> > > featureData: none
> > > experimentData: use 'experimentData(object)'
> > > Annotation: pd.hugene.1.1.st.v1
> > >
> > > I called the paCalls function:
> > >
> > >> dabgPS<- paCalls(data, "PSDABG")
> > > And I obtained a matrix of 257430x23, how can I used this
> > information to filter those probes without transcript?
> > >
> > > My aim is to obtain an average expression value in only those probes
> > with a "true" transcription.
> >
> > I would argue that you can't actually do this with microarray data. What
> > you can do is say if the probeset intensities for a given transcript are
> > significantly brighter than background probesets. I think that is a very
> > different thing from saying a transcript isn't expressed, but opinions
> > differ on that point.
> >
> > Please note that the matrix that paCalls() returns is made up of
> > p-values testing the hypothesis that the given probeset is not different
> > from background probesets (so a small p-value causes you to reject the
> > null hypothesis, and conclude that they *are* different).
> >
> > Also note that it appears you have summarized your data at the exon
> > level, whereas you ran paCalls at the transcript level. This won't work,
> > so you either have to do rma() using target = "core", or paCalls() using
> > "DAGB" in order to be consistent. Personally I wouldn't use rma(target =
> > "probeset") for the Gene ST arrays, because tons of the probesets only
> > have one probe at that summarization level.
> >
> > So the next question is what should you do with these data? You could
> > for instance say that at least N of the probesets for a given gene have
> > to have a p-value < 0.05, where N = the number of samples in the
> > smallest group you are comparing. That way, if the gene is transcribed
> > in at least one sample, you retain it (e.g., if a gene is transcribed in
> > one sample and not in any other, this is still an interesting result).
> >
> > Something like
> >
> > N <- 5 ## or whatever
> > ind <- apply(dagbPS, 1, function(x) sum(x < 0.05) > N)
> > data.filtered <- data[ind,]
> >
> >
> > Best,
> >
> > Jim
> >
> >
> > >
> > > Many thanks in advance
> > >
> > > Juan
> > >
> > > my sessionInfo is:
> > >
> > > R version 2.15.1 (2012-06-22)
> > > Platform: i386-apple-darwin9.8.0/i386 (32-bit)
> > >
> > > locale:
> > > [1] es_ES.UTF-8/es_ES.UTF-8/es_ES.UTF-8/C/es_ES.UTF-8/es_ES.UTF-8
> > >
> > > attached base packages:
> > > [1] stats graphics grDevices utils datasets methods base
> > >
> > > other attached packages:
> > > [1] affy_1.34.0 pd.hugene.1.1.st.v1_3.6.0 genefilter_1.38.0
> > > [4] limma_3.12.1 annotate_1.34.1 multtest_2.12.0
> > > [7] oligo_1.20.4 oligoClasses_1.18.0
> > hugene11sttranscriptcluster.db_4.0.1
> > > [10] org.Hs.eg.db_2.7.1 RSQLite_0.11.1 DBI_0.2-5
> > > [13] AnnotationDbi_1.18.1 Biobase_2.16.0 BiocGenerics_0.2.0
> > >
> > > loaded via a namespace (and not attached):
> > > [1] affxparser_1.28.1 affyio_1.24.0 BiocInstaller_1.4.7
> > Biostrings_2.24.1 bit_1.1-8
> > > [6] codetools_0.2-8 ff_2.2-7 foreach_1.4.0 IRanges_1.14.4
> > iterators_1.0.6
> > > [11] MASS_7.3-20 preprocessCore_1.18.0 splines_2.15.1 stats4_2.15.1
> > survival_2.36-14
> > > [16] tools_2.15.1 XML_3.9-4 xtable_1.7-0 zlibbioc_1.2.0
> > >
> > >
> > > ---------------------------------------------------------------
> > > Juan Fernandez Tajes, ph. D
> > > Grupo XENOMAR
> > > Departamento de BiologÃa Celular y Molecular
> > > Facultad de Ciencias-Universidade da CoruÃ±a
> > > Tlf. +34 981 167000 ext 2030
> > > e-mail: jfernandezt at udc.es
> > > ----------------------------------------------------------------
> > >
> > >
> > >
> > >         [[alternative HTML version deleted]]
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at r-project.org
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > University of Washington
> > Environmental and Occupational Health Sciences
> > 4225 Roosevelt Way NE, # 100
> > Seattle WA 98105-6099
> >
>
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099