[BioC] Analyzing expression Affymetrix Hugene1.0.st array

James W. MacDonald jmacdon at uw.edu
Fri Sep 28 17:21:31 CEST 2012


Hi Juan,

On 9/28/2012 11:05 AM, Juan Fernández Tajes wrote:
> Dear James,
>
> Many thanks for your quick and easy-to-understand answer. I would like 
> to ask whether you could recommend a method for determining the level 
> above which expression values can be distinguished from 
> noise.

I don't know of any method that purports to do this. In the past I have 
seen people recommend things like using the negative controls as a lower 
bound (since they are supposedly not expressed).
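
A minimal sketch of that negative-control idea, using simulated log2 values rather than real array data. The matrix `expr`, the choice of which probesets count as negative controls (`neg_ids`), and the 95th-percentile cutoff are all assumptions made for illustration; nothing here comes from an actual HuGene 1.0 ST annotation, and the cutoff itself is arbitrary.

```r
# Simulated log2 expression matrix: 1000 probesets x 6 samples (hypothetical data).
set.seed(1)
expr <- matrix(rnorm(6000, mean = 7, sd = 2), nrow = 1000,
               dimnames = list(paste0("ps", 1:1000), paste0("s", 1:6)))

# Assumption: the first 50 probesets stand in for negative controls.
neg_ids <- rownames(expr)[1:50]
expr[neg_ids, ] <- rnorm(50 * 6, mean = 3, sd = 0.7)

# Lower bound: 95th percentile of all negative-control values (arbitrary cutoff).
noise_floor <- unname(quantile(as.vector(expr[neg_ids, ]), probs = 0.95))

# Flag probesets whose mean log2 signal exceeds the bound.
above_floor <- rowMeans(expr) > noise_floor

# Count the non-control probesets that clear the bound.
n_above <- sum(above_floor[setdiff(rownames(expr), neg_ids)])
```

A bound like this is only as trustworthy as the assumption that the chosen controls are truly unexpressed and free of cross-hybridization.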

I think this becomes a bit more difficult with the Gene ST arrays, as 
the negative controls have a nasty habit of looking not only expressed, 
but differentially expressed. A lot of these controls are supposed to 
target introns, which makes me wonder how much of the total RNA 
extracted from a cell is mRNA for which the introns have yet to be excised.

Anyway, to me it looks like a chicken-and-egg problem. You want to see what 
sort of expression values you get for genes that are almost surely not 
expressed, but given the data it is hard to decide which of the things 
that aren't supposed to be expressed are actually not expressed (or 
alternatively, which of the controls have absolutely no 
cross-hybridization with transcripts that are expressed).

So if you are still trying to come up with a method of saying if a gene 
is expressed or not, I don't think you can do that with microarray data 
unless you are willing to make a bunch of (likely unfounded) assumptions.

Best,

Jim


>
> Juan
>
> ---------------------------------------------------------------
> Juan Fernandez Tajes, ph. D
> Grupo XENOMAR
> Departamento de Biología Celular y Molecular
> Facultad de Ciencias-Universidade da Coruña
> Tlf. +34 981 167000 ext 2030
> e-mail: jfernandezt at udc.es
> ----------------------------------------------------------------
>
>
> ------------------------------------------------------------------------
> *De: *"James W. MacDonald" <jmacdon at uw.edu>
> *Para: *"Juan Fernández Tajes" <jfernandezt at udc.es>
> *CC: *"bioconductor" <bioconductor at r-project.org>
> *Enviados: *Viernes, 28 de Septiembre 2012 16:48:10
> *Asunto: *Re: [BioC] Analyzing expression Affymetrix Hugene1.0.st array
>
> Hi Juan,
>
> On 9/28/2012 6:10 AM, Juan Fernández Tajes wrote:
> > Dear List,
> >
> > I'm working with expression data obtained from the Affymetrix HuGene
> > 1.0 ST array. I'm interested in knowing how many genes are expressed
> > on chromosome 16. Surprisingly, all the genes (808) included on the
> > array and mapped to chromosome 16 have expression values (from 2.01 to
> > 12.4). Can I conclude that all these genes are expressed in this tissue?
>
> Not really. Microarrays are not suitable for determining if a gene is
> being expressed or not. The only use IMO of microarray data is to
> determine if a gene is *differentially* expressed. This is what Benilton
> is getting at in his response to your question.
>
> The expression values we generate from a set of microarrays are very far
> removed from the actual amount of mRNA that existed in the samples we
> are measuring, and have undergone quite a bit of manipulation. In
> addition, there is quite a bit of technical noise introduced in each
> step of the process. So the best we can hope for is that the expression
> value for a given gene is proportional to the amount of mRNA that
> existed in the original sample, but not that we are quantifying the
> amount of mRNA.
>
> In addition, the expression values are based off of data from a 16 bit
> TIFF image. So the values have a maximum range from 2^0 to 2^16 - 1, or
> 1-65535 on the natural scale. Given that fact, do you really want to
> contend that a gene with an expression of 2^2.01 is being expressed?
> That expression level is likely not distinguishable from noise. So one
> more difficulty in deciding if a gene is expressed is deciding at which
> point you can distinguish signal from underlying noise.
>
> Best,
>
> Jim
>
>
> >
> > Many thanks in advance
> >
> > Here is my code:
> >
> >
> > library(oligo)
> > library(hugene10sttranscriptcluster.db)
> >
> > # Read the CEL files in the working directory and tidy the sample names
> > geneCELs.N<- list.celfiles(getwd(), full.names=TRUE)
> > affyGeneFS.N<- read.celfiles(geneCELs.N)
> > myAB.N<- affyGeneFS.N
> > sampleNames(myAB.N)<- sub("\\.CEL$", "", sampleNames(myAB.N))
> > # Attach the sample metadata, then summarize to transcript clusters with RMA
> > metadata_array.N<- read.delim(file="metadata.txt", header=TRUE, sep="\t")
> > rownames(metadata_array.N)<- metadata_array.N$Sample_ID
> > phenoData(myAB.N)<- new("AnnotatedDataFrame", data=metadata_array.N)
> > myAB.N_rma<- rma(myAB.N, target="core")
> > annotation(myAB.N_rma)<- "hugene10sttranscriptcluster.db"
> >
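> > # Build a regular expression anchored at the start of the string
> > ppc<- function(x) paste("^", x, sep="")
> > # Return the entries of an annotation map whose value starts with 'which'
> > myFindMap<- function(mapEnv, which){
> >     myg<- ppc(which)
> >     a1<- eapply(mapEnv, function(x)
> >         grep(myg, x, value=TRUE))
> >     unlist(a1)
> > }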
> > chr16.N<- myFindMap(hugene10sttranscriptclusterCHR, 16)
> > chr16.N<- as.data.frame(chr16.N)
> > chr16.N$probes<- rownames(chr16.N)
> > probes.chr16.N<- chr16.N$probes
> > sel.N<- match(probes.chr16.N, featureNames(myAB.N_rma), nomatch=0)
> > es2_chr16.N<- myAB.N_rma[sel.N,]
> > data.exprs.N<- as.data.frame(exprs(es2_chr16.N))
> > g.N<- featureNames(es2_chr16.N)
> > linked.N<- links(hugene10sttranscriptclusterSYMBOL)
> > data.exprs.N.symbol<- merge(data.exprs.N, linked.N,
> >     by.x="row.names", by.y="probe_id")
> > row.names(data.exprs.N.symbol)<- data.exprs.N.symbol[[1]]
> > data.exprs.N.symbol<- data.exprs.N.symbol[, -1]
> > data.exprs.N.symbol$Mean.Exprs<- rowMeans(data.exprs.N.symbol[, 1:12])
> >
> >
> > Juan
> >
> >
>
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


