[BioC] What to do with multiple probes?

Wed Nov 30 11:59:04 CET 2005

Hi
Thanks Robert and Sean for your comments on my problem.

Robert Gentleman wrote:
> Hi,
>  Sean has already answered some of your questions, but I will provide a 
> few of my thoughts on this.
> 
>  1) there is little discussion because it is a reasonably difficult 
> topic and there are no clear-cut answers beyond "it depends", and it 
> does depend on a lot of different things.
> 
>  For example, you might exclude the information on some probe sets if 
> they are far from the poly-A tail and dT-priming was used. If random 
> priming was used, then all should be equally good (but I am not aware of 
> a comprehensive comparison).
> 
>  In some cases, depending on the data, processing etc, you can develop 
> tools for comparing duplicate probe sets and combining the information 
> to get better estimates for whether genes are expressed and at what 
> levels (you could compare the probes and see if they are unique in the 
> genome, for example). In these situations, using R is a good thing, 
> since you can pretty much do any reasonable analysis, but you need to 
> know some statistics and some programming to do it, and there is no 
> clear recipe to follow.

I have custom Agilent microarrays covering a complete bacterial genome.
In my case "custom" means that we designed all the probes ourselves
(not Agilent), assigned quality scores to them, and tried to avoid
"poor" probes.

The quality metrics for the probes should be equally good. It is a 
random priming experiment. This is a bacterial platform, so no tissue 
specificity is possible; moreover, I expect that there is not too much 
biological variation between biological replicates.

Taking into account the discussion above about multiple probes, I guess 
I can only make decisions individually, gene by gene.
For that I can imagine the layout in my point 2.
My question is: what kind of script would rearrange the MAList and 
save it in text format?
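
A minimal sketch of such a rearrangement in base R, under several
assumptions: the per-probe table has the columns shown in point 2 below,
the gene ID is everything before the last underscore of Probename, and no
gene has more than three probes. The toy values, the helper name
collapse_gene and the output file name are invented for illustration.

```r
# Toy stand-in for the write.fit output; values are invented, not real data.
probes <- data.frame(
  A = c(10.0, 11.0, 12.0, 9.0, 9.5),
  M = c(1.2, 0.3, 1.5, -2.0, -1.8),
  p = c(0.0001, 0.2, 0.0002, 0.00001, 0.0003),
  Result = c(1, 0, 1, -1, -1),
  Probename = c("xxx1111_123", "xxx1111_566", "xxx1111_1050",
                "xxx2222_40", "xxx2222_700"),
  stringsAsFactors = FALSE
)

# Assumption: gene ID = Probename with the final "_<position>" stripped.
probes$Gene <- sub("_[^_]*$", "", probes$Probename)

# Hypothetical helper: collapse all probes of one gene into a single row.
collapse_gene <- function(d, max_probes = 3) {
  # Pad with NA so every gene gets the same number of M.x / p.x columns.
  pad <- function(x) c(x, rep(NA, max_probes - length(x)))
  r <- d$Result
  # Strict rule from point 2: call the gene only if all probes agree.
  new_result <- if (all(r == 1)) 1 else if (all(r == -1)) -1 else 0
  out <- data.frame(Gene = d$Gene[1], A.mean = mean(d$A), M.mean = mean(d$M),
                    New.Result = new_result, stringsAsFactors = FALSE)
  out[paste0("M.", seq_len(max_probes))] <- as.list(pad(d$M))
  out[paste0("p.", seq_len(max_probes))] <- as.list(pad(d$p))
  out
}

genes <- do.call(rbind, lapply(split(probes, probes$Gene), collapse_gene))
rownames(genes) <- NULL

# Save as tab-delimited text (hypothetical file name in a temp directory).
out_file <- file.path(tempdir(), "genes.txt")
write.table(genes, file = out_file, sep = "\t", quote = FALSE,
            row.names = FALSE)
```

The all-or-nothing rule is only one possible choice; swapping the body of
new_result for, say, a majority vote would not change the rest of the
script.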

> krasikov at science.uva.nl wrote:
> 
>> Dear all,
>>
>> 1.
>> I have a general question about multiple probes for each gene.
>> This question has been discussed several times by the BioC community,
>> but I didn't find any clear solution.
>>
>> My array platform is a custom bacterial Agilent oligo microarray.
>> It consists of 8000 unique probes for a bit more than 3000 genes 
>> (complete bacterial genome) with 1, 2 or 3 probes per gene (mostly 
>> depending on the length of the gene: 1 for short and 3 for long ones).
>>
>> The generated list contains statistics for each probe.
>> What should I do to generate the gene list (which is normally needed 
>> for biology-related research)?
>> It's fine when all three probes call the gene regulated in the same
>> direction, but what to do if they do not?
>> Should I exclude such genes from the final list?
>> Can anybody give me a clue how to deal with that?
>>
>> 2. This is my own solution for the time being,
>> which is maybe far too strict.
>>
>> My list contains info like this
>> (result of write.fit,
>> for three probes of the same gene):
>> A    M    p    Result    Probename
>> *    *    *    1    xxx1111_123
>> *    *    *    0    xxx1111_566
>> *    *    *    1    xxx1111_1050
>>
>> How can I arrange it in an elegant way, like this:
>> A.mean    M.mean    New.Result xxx1111    M.1    M.2    M.3    p.1    p.2    p.3 ?
>> where A.mean and M.mean are means over all probes for that gene
>> and the new Result is a logical combination (something like: all
>> three 1 then 1, all three -1 then -1, and 0 if at least one probe is
>> zero or the probes point in opposite directions)
>>
>> 3.
>> For my experiment (in strictly controlled conditions, with 5 
>> biological replicates and some dye-swaps for them), 3500 of my
>> 8000 probes are decided to be regulated, which is almost half of the 
>> complete set (a big part of the decisions is biologically relevant,
>> which is nice).
> 
> 
>  Do you mean about 3500 are showing differential expression? This seems 
> very large, and you do realize that it violates most of the principles 
> that underlie the usual normalization procedures? That may be more of a 
> problem for you than the duplicate probes. And fixing it, or convincing 
> yourself that the outputs of the normalization are ok, will take some 
> time and statistical expertise. In my experience these are way outside 
> of what can easily be dealt with on a mailing list - local expertise is 
> what is needed.

The general question:
How can I validate the normalization outcome?
Density plots?
I have tried "loess with Aquantile" and "vsn", and the outcome of 
decideTests is more or less the same: a lot of probes with differential 
expression.
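
One simple way to look at this numerically, next to density plots, is to
summarise the M-values per array. The sketch below is base R on a
simulated matrix standing in for MA$M (rows are probes, columns are
arrays); the helper names m_summary and plot_m_densities are made up for
this example. It only describes the distributions. It cannot show that
the "most genes unchanged" assumption holds, and on dye-swapped arrays
the sign of M is flipped. In limma itself, plotDensities and plotMA can
be used for the graphical side.

```r
set.seed(1)
# Simulated stand-in for MA$M (rows = probes, columns = arrays); not real data.
M <- matrix(rnorm(8000 * 4, mean = 0, sd = 1), ncol = 4,
            dimnames = list(NULL, paste0("array", 1:4)))

# Per-array location, spread and fraction of positive log-ratios.
m_summary <- function(M) {
  data.frame(
    array = colnames(M),
    median = apply(M, 2, median, na.rm = TRUE),
    iqr = apply(M, 2, IQR, na.rm = TRUE),
    frac_up = colMeans(M > 0, na.rm = TRUE),
    row.names = NULL
  )
}

# Overlay one density curve per array, with a reference line at M = 0.
plot_m_densities <- function(M) {
  dens <- lapply(seq_len(ncol(M)), function(j) density(M[, j], na.rm = TRUE))
  plot(NULL, xlim = range(M, na.rm = TRUE),
       ylim = c(0, max(sapply(dens, function(d) max(d$y)))),
       xlab = "M", ylab = "density")
  for (j in seq_along(dens)) lines(dens[[j]], col = j)
  abline(v = 0, lty = 2)
}

s <- m_summary(M)
print(s)
if (interactive()) plot_m_densities(M)
```

Running the same summary on the M-values before and after each
normalization step shows how much each step moves the distributions.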

Below is the code I used in limma:

RG <- read.maimages(...)
# ... assigning spotTypes
# ... removing control spots from RG
RGb <- backgroundCorrect(RG, method = "minimum")
MA <- normalizeWithinArrays(RGb, method = "loess")
MA <- normalizeBetweenArrays(MA, method = "Aquantile")
# ... design
fit <- lmFit(MA, design)
# ... contrast.matrix
fit <- contrasts.fit(fit, contrast.matrix)
fit <- eBayes(fit)
res <- decideTests(fit, method = "separate", adjust.method = "BH",
                   p.value = 0.001)
write.fit(fit, results = res, file = "...", digits = 2, adjust = "BH", sep = "\t")

With these settings I got 1800 up and 1800 down probes (out of 8100).
Decreasing p.value to 0.0001 gave me 800 up and 800 down.
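
To make the effect of the cut-off concrete, here is a rough base R
stand-in for the per-contrast call: BH-adjust the p-values, then mark
each probe as up (1), down (-1) or unchanged (0). The function
call_probes and the five toy values are invented for illustration, and
the exact comparison used inside decideTests may differ. In limma,
summary(res) gives the corresponding tally directly.

```r
# Hypothetical helper: BH-adjust p-values, then sign the significant probes.
call_probes <- function(p, M, cutoff = 0.001) {
  adj <- p.adjust(p, method = "BH")
  ifelse(adj < cutoff, sign(M), 0)
}

# Toy p-values and log-ratios; invented, not taken from the experiment.
p <- c(1e-6, 2e-5, 4e-4, 0.01, 0.5)
M <- c(2, -1.5, 1, -0.5, 0.1)

calls_001  <- call_probes(p, M, cutoff = 0.001)
calls_0001 <- call_probes(p, M, cutoff = 0.0001)

# Tally down / unchanged / up at each cut-off.
tally <- rbind(
  "0.001"  = table(factor(calls_001,  levels = c(-1, 0, 1))),
  "0.0001" = table(factor(calls_0001, levels = c(-1, 0, 1)))
)
print(tally)
```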

I would like to mention here that quite a big part of the obtained data
is physiologically relevant in my experiment,
and the nature of the experiment suggests large differential expression.

Any suggestions?

Best wishes
Vladimir

>  Best wishes,
>    Robert
> 
> 
>> Isn't that too much? (I'm thinking about the statistical assumption
>> that most genes should not change.) However, physiologically my 
>> experiment should produce rather large differential expression.
>>
>> I used a direct ratio design, loess and then Aquantile normalization,
>> with BH correction in decideTests and a p-value cut-off of 0.001.
>>
>> Thanks in advance for any help.
>> Vladimir