[BioC] vennSelect error

James W. MacDonald jmacdon at med.umich.edu
Mon Jun 6 15:20:58 CEST 2011


Hi Mali,

On 6/5/2011 2:36 AM, mali salmon wrote:
> OK, I have noticed that when I formulate the model in a different way, I get
> the expected results, and the expected columns are printed.
> The model that works fine is:
>
>> TS<- paste(targets$Strain, targets$Treatment, sep=".")
>> TS<- factor(TS, levels=c("WT.U","WT.S","Mu.U","Mu.S"))
>> design<- model.matrix(~0+TS)
>> colnames(design)<- levels(TS)
>> fit<- lmFit(filtered, design)
>> cont.matrix<- makeContrasts(WT.SvsU=WT.S-WT.U, Mu.SvsU=Mu.S-Mu.U,
> Diff=(Mu.S-Mu.U)-(WT.S-WT.U),levels=design)
>> fit2<- contrasts.fit(fit, cont.matrix)
>> fit2<- eBayes(fit2)
>
> Then when I run "limma2annaffy" I get the correct columns.
>
> However, when I use a different approach (but equivalent according to the
> user guide)  below, the wrong columns are printed when I use
> "limma2annaffy":
>> Strain<- factor(targets$Strain, levels=c("WT","Mu"))
>> Treatment<- factor(targets$Treatment, levels=c("U","S"))
>> design<- model.matrix(~Strain*Treatment)
>> fit<- lmFit(filtered, design)
>> cont.matrix<- cbind(WT.SvsU=c(0,0,1,0),Mu.SvsU=c(0,0,1,1),Diff=c(0,0,0,1))
>> fit2<- contrasts.fit(fit, cont.matrix)
>> fit2<- eBayes(fit2)
>
> Any idea what I did wrong?

You didn't do anything wrong per se, if anything it is a shortcoming of 
limma2annaffy(). In order to make the code simpler, I require that the 
model be specified using a cell means model (as you did above where it 
worked for you), rather than a factor effects model (the 
parameterization you tried that didn't work).

Maybe this isn't stated as explicitly as it should be. I'll take a look 
at the documentation and see if I can make some changes.

Best,

Jim


> Thank
> Mali
>
>
> On Sun, Jun 5, 2011 at 8:54 AM, mali salmon<shalmom1 at gmail.com>  wrote:
>
>> Dear Jim
>> Thanks to your help I can generate HTML lists now, but the expression is
>> missing for some of the samples.
>> I also tried just to output lists of genes that are significant in each of
>> the contrasts, and out of 10 samples, the expression of only 2-5 is printed
>> (in both the txt and html files).
>> Below is the function I used
>>> limma2annaffy(filtered, fit2, design,cont.matrix, annotation(eset),adjust
>> = "fdr",anncols = aaf.handler(chip="yeast2.db")[-c(5,6)],
>> fldfilt=1,tstat=TRUE,pfilt = 0.05,text=TRUE,expression = TRUE)
>>
>> For "Diff" contrast genes only the expression of "Mu.S.1", "Mu.S.2" is
>> printed
>> For  "Mu.SvsU" and "WT.SvsU" contrasts  only the expression of  "WT.S.1"
>> "WT.S.2"  "WT.S.3"  "Mu.S.1"  "Mu.S.2" is printed
>>
>> Below are the targets, contrast matrix and design objects
>>
>>> target
>>         FileName Strain Treatment
>> 1     WT.U.1     WT         U
>> 2     WT.U.2     WT         U
>> 3     WT.U.3     WT         U
>> 4     WT.S.1     WT         S
>> 5     WT.S.2     WT         S
>> 6     WT.S.3     WT         S
>> 7     Mu.U.1     Mu         U
>> 8     Mu.U.2     Mu         U
>> 9     Mu.S.1     Mu         S
>> 10   Mu.S.2     Mu         S
>>
>>
>>> cont.matrix
>>       WT.SvsU Mu.SvsU Diff
>> [1,]       0       0    0
>> [2,]       0       0    0
>> [3,]       1       1    0
>> [4,]       0       1    1
>>
>>> design
>>     (Intercept) StrainMu TreatmentS StrainMu:TreatmentS
>> 1            1        0          0                   0
>> 2            1        0          0                   0
>> 3            1        0          0                   0
>> 4            1        0          1                   0
>> 5            1        0          1                   0
>> 6            1        0          1                   0
>> 7            1        1          0                   0
>> 8            1        1          0                   0
>> 9            1        1          1                   1
>> 10           1        1          1                   1
>>
>> Thanks
>> Mali
>>
>>
>>
>>
>>
>> On Fri, Jun 3, 2011 at 6:51 AM, mali salmon<shalmom1 at gmail.com>  wrote:
>>
>>> Thanks a lot Jim for your prompt support
>>> Mali
>>>
>>>
>>> On Thu, Jun 2, 2011 at 8:36 PM, James W. MacDonald<jmacdon at med.umich.edu
>>>> wrote:
>>>
>>>> Hi Mali,
>>>>
>>>>
>>>> On 6/2/2011 3:00 PM, mali salmon wrote:
>>>>
>>>>> Thanks Jim for your reply.
>>>>> I repeated the analysis without resetting the annotation slot but I
>>>>> still
>>>>> get the same error.
>>>>> Below is the code I used:
>>>>>
>>>>>   library(limma)
>>>>>> library(affy)
>>>>>> library(affycoretools)
>>>>>> data<- ReadAffy()
>>>>>> eset<- rma(data)
>>>>>> #remove low variance genes
>>>>>> library(genefilter)
>>>>>> eset2<-varFilter(eset,var.func=IQR,var.cutoff=0.1)
>>>>>> #remove control probes (AFFX)
>>>>>> filtered<-featureFilter(eset2, require.entrez=F, remove.dupEntrez=F)
>>>>>> targets<-readTargets("targets.txt")
>>>>>> TS<- paste(targets$Strain, targets$Treatment, sep=".")
>>>>>> Strain<- factor(targets$Strain, levels=c("WT","Mu"))
>>>>>> Treatment<- factor(targets$Treatment, levels=c("U","S"))
>>>>>> design<- model.matrix(~Strain*Treatment)
>>>>>> fit<- lmFit(filtered, design)
>>>>>> cont.matrix<-
>>>>>> cbind(WT.SvsU=c(0,0,1,0),Mu.SvsU=c(0,0,1,1),Diff=c(0,0,0,1))
>>>>>> fit2<- contrasts.fit(fit, cont.matrix)
>>>>>> fit2<- eBayes(fit2)
>>>>>> results<- decideTests(fit2,lfc=1)
>>>>>> vennSelect(filtered, design, results, cont.matrix, fit2)
>>>>>>
>>>>>
>>>> The problem is that you are by default trying to get annotation for
>>>>
>>>>> aaf.handler()[c(1:3,6:7, 9:12)]
>>>> [1] "Probe"         "Symbol"        "Description"   "GenBank"
>>>> [5] "Gene"          "UniGene"       "PubMed"        "Gene Ontology"
>>>> [9] "Pathway"
>>>>
>>>> But since this is a yeast chip, not all of these are available. For this
>>>> sort of issue I have included a '...' argument to vennSelect() that allows
>>>> you to pass arguments to lower level functions, so you can just do something
>>>> like
>>>>
>>>> library(annaffy) ## we need this to expose the aaf.handler() function
>>>> vennSelect(filtered, design, results, cont.matrix, fit2, anncols =
>>>> aaf.handler(chip = "yeast2.db"))
>>>>
>>>> Which will then give you all possible annotations for the yeast2.db chip,
>>>> which are
>>>>
>>>>> aaf.handler(chip = "yeast2.db")
>>>> [1] "Probe"               "Description"         "Chromosome"
>>>> [4] "Chromosome Location" "PubMed"              "Gene Ontology"
>>>> [7] "Pathway"
>>>>
>>>> Unfortunately, you will still get the error you see, which comes from the
>>>> call for Gene Ontology data, so if you amend to
>>>>
>>>> vennSelect(filtered, design, results, cont.matrix, fit2, anncols =
>>>> aaf.handler(chip = "yeast2.db")[-6])
>>>>
>>>> Then it will work. I will contact Colin Smith, the maintainer of annaffy,
>>>> separately to see if he can track down the error you see with GO terms.
>>>>
>>>> Best,
>>>>
>>>> Jim
>>>>
>>>>
>>>>
>>>>> Loading required package: yeast2.db
>>>>> Loading required package: org.Sc.sgd.db
>>>>> Error in sqliteExecStatement(con, statement, bind.data) :
>>>>> RS-DBI driver: (error in statement: cannot join using column gene_id -
>>>>> column not present in both tables)
>>>>>
>>>>> Thanks
>>>>> Mali
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jun 2, 2011 at 2:45 PM, James W. MacDonald<
>>>>> jmacdon at med.umich.edu>wrote:
>>>>>
>>>>>   Hi Mali,
>>>>>>
>>>>>>
>>>>>> On 6/2/2011 7:40 AM, mali salmon wrote:
>>>>>>
>>>>>>   Dear list
>>>>>>> I am trying to use affycoretools for the analysis of affy yeast
>>>>>>> arrays.
>>>>>>> When I tried to use vennSelect in order generate HTML files containing
>>>>>>> the
>>>>>>> results of decideTests I got the following error:
>>>>>>>
>>>>>>>   annotation(eset)<-"yeast2.db"
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>   Good thing you didn't snip the above line from your code. In the
>>>>>> future, it
>>>>>> is better to give us more than what you think we need, rather than
>>>>>> less.
>>>>>>
>>>>>> The problem here is that you are resetting the annotation slot for your
>>>>>> ExpressionSet (unnecessarily, and incorrectly). Not sure why you are
>>>>>> doing
>>>>>> this; did you see some code somewhere that lead you to believe this is
>>>>>> either desired or necessary? If so, please tell us so we can hopefully
>>>>>> get
>>>>>> it corrected.
>>>>>>
>>>>>> Note that the annotation slot for an ExpressionSet doesn't imply
>>>>>> annotation
>>>>>> of (in this case) probeset IDs to genes, etc. The annotation slot
>>>>>> contains
>>>>>> name of the package that has the mappings of probes to probesets that
>>>>>> were
>>>>>> used to created the summarized values in that ExpressionSet. This is
>>>>>> nice to
>>>>>> have if you save the object, and wonder later how exactly the data were
>>>>>> summarized.
>>>>>>
>>>>>> So start back at the line preceding the line above where you reset the
>>>>>> annotation slot, and try again, without resetting the annotation slot.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Jim
>>>>>>
>>>>>>
>>>>>>   ...
>>>>>>
>>>>>>>
>>>>>>>   fit2<- eBayes(fit2)
>>>>>>>> results<- decideTests(fit2,lfc=1)
>>>>>>>> vennSelect(filtered,design,results,cont.matrix,fit2)
>>>>>>>>
>>>>>>>>   Error in sqliteExecStatement(con, statement, bind.data) :
>>>>>>>    RS-DBI driver: (error in statement: cannot join using column gene_id
>>>>>>> -
>>>>>>> column not present in both tables)
>>>>>>>
>>>>>>> Any idea why I get this error?
>>>>>>> Thanks
>>>>>>> Mali
>>>>>>>
>>>>>>>   sessionInfo()
>>>>>>>
>>>>>>>>
>>>>>>>>   R version 2.12.1 (2010-12-16)
>>>>>>> Platform: i386-redhat-linux-gnu (32-bit)
>>>>>>>
>>>>>>> locale:
>>>>>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>>>   [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>
>>>>>>> attached base packages:
>>>>>>> [1] grid      stats     graphics  grDevices utils     datasets
>>>>>>>   methods
>>>>>>> [8] base
>>>>>>>
>>>>>>> other attached packages:
>>>>>>>   [1] hgu133a2.db_2.4.5    org.Hs.eg.db_2.4.6   annaffy_1.22.0
>>>>>>>   [4] affycoretools_1.22.0 KEGG.db_2.4.5        GO.db_2.4.5
>>>>>>>   [7] gplots_2.8.0         caTools_1.11         bitops_1.0-4.1
>>>>>>> [10] gdata_2.8.1          gtools_2.6.2         yeast2.db_2.4.5
>>>>>>> [13] org.Sc.sgd.db_2.4.6  RSQLite_0.9-4        DBI_0.2-5
>>>>>>> [16] annotate_1.28.1      AnnotationDbi_1.12.0 genefilter_1.32.0
>>>>>>> [19] yeast2cdf_2.7.0      affy_1.28.0          Biobase_2.10.0
>>>>>>> [22] limma_3.6.9
>>>>>>>
>>>>>>> loaded via a namespace (and not attached):
>>>>>>>   [1] affyio_1.18.0         biomaRt_2.6.0         Biostrings_2.18.2
>>>>>>>   [4] Category_2.16.1       gcrma_2.22.0          GOstats_2.16.0
>>>>>>>   [7] graph_1.28.0          GSEABase_1.12.2       IRanges_1.8.9
>>>>>>> [10] preprocessCore_1.12.0 RBGL_1.26.0           RCurl_1.5-0
>>>>>>> [13] splines_2.12.1        survival_2.36-2       tcltk_2.12.1
>>>>>>> [16] tools_2.12.1          XML_3.2-0             xtable_1.5-6
>>>>>>>
>>>>>>>         [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives:
>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> James W. MacDonald, M.S.
>>>>>> Biostatistician
>>>>>> Douglas Lab
>>>>>> University of Michigan
>>>>>> Department of Human Genetics
>>>>>> 5912 Buhl
>>>>>> 1241 E. Catherine St.
>>>>>> Ann Arbor MI 48109-5618
>>>>>> 734-615-7826
>>>>>> **********************************************************
>>>>>> Electronic Mail is not secure, may not be read every day, and should
>>>>>> not be
>>>>>> used for urgent or sensitive issues
>>>>>>
>>>>>>
>>>>>
>>>> --
>>>> James W. MacDonald, M.S.
>>>> Biostatistician
>>>> Douglas Lab
>>>> University of Michigan
>>>> Department of Human Genetics
>>>> 5912 Buhl
>>>> 1241 E. Catherine St.
>>>> Ann Arbor MI 48109-5618
>>>> 734-615-7826
>>>> **********************************************************
>>>> Electronic Mail is not secure, may not be read every day, and should not
>>>> be used for urgent or sensitive issues
>>>>
>>>
>>>
>>
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 



More information about the Bioconductor mailing list