[BioC] Entrez Gene ID to Probe Set Name

Thu Oct 23 18:17:37 CEST 2008

Here is the previous query with a more descriptive subject.

-----Original Message-----
From: McGee, Monnie
Sent: Thu 10/23/2008 11:14 AM
To: bioconductor at stat.math.ethz.ch
Subject: RE: Bioconductor Digest, Vol 68, Issue 23

Dear List,

Is there an elegant way to obtain the name of a probe set from an Affymetrix platform (doesn't matter which one) corresponding to a given ENTREZ gene ID?  It seems that it is fairly simple to obtain the entrez ID if you have a probe set, but the reverse problem seems non-trival -at least it is to me.  

There's no particular reason I need to know.  I just want to know if it's possible.

Thanks!
Monnie

Monnie McGee, Ph.D.
Associate Professor
Department of Statistical Science
Southern Methodist University
Ph: 214-768-2462
Fax: 214-768-4035

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch on behalf of bioconductor-request at stat.math.ethz.ch
Sent: Thu 10/23/2008 5:00 AM
To: bioconductor at stat.math.ethz.ch
Subject: Bioconductor Digest, Vol 68, Issue 23

Send Bioconductor mailing list submissions to
	bioconductor at stat.math.ethz.ch

To subscribe or unsubscribe via the World Wide Web, visit
	https://stat.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
	bioconductor-request at stat.math.ethz.ch

You can reach the person managing the list at
	bioconductor-owner at stat.math.ethz.ch

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioconductor digest..."

Today's Topics:

   1. GOstat: listing genes from hyperGTest (Tim Smith)
   2. export toptables into Genespring (Pemmasani, Kalyani)
   3. Re: Limma contrasts question (James W. MacDonald)
   4. Re: GOstat: listing genes from hyperGTest (James W. MacDonald)
   5. Re: Limma contrasts question (Daniel Brewer)
   6. quality assessment and preprocessing for tiling array-based
      CGH data (Leon Yee)
   7. GOstats and org.EcK12.eg.db (Robert Castelo)
   8. Re: quality assessment and preprocessing for tiling
      array-based CGH data (Sean Davis)
   9. Re: GOstat: listing genes from hyperGTest (Tim Smith)
  10. Re: quality assessment and preprocessing for tiling
      array-based CGH data (Leon Yee)
  11. Re: Beadarray and illumina methylation arrays (Mark Dunning)
  12. Re: quality assessment and preprocessing for tiling
      array-based CGH data (Sean Davis)
  13. Problem using Rgraphviz (edge weights going missing). (Dan Bolser)
  14. Re: newbie problems with AnnBuilder (Mark Kimpel)
  15. Re: newbie problems with AnnBuilder (Sean Davis)
  16. Re: newbie problems with AnnBuilder (Mark Kimpel)
  17. Re: GOstats and org.EcK12.eg.db (Robert Gentleman)
  18. Re: quality assessment and preprocessing for tiling
      array-based CGH data (Leon Yee)
  19. Bioconductor installation problem: unable to access
      repository (Shinichiro Wachi)
  20. Re: quality assessment and preprocessing for tiling
      array-based CGH data (Sean Davis)
  21. Re: GOstat: listing genes from hyperGTest (James W. MacDonald)
  22. Re: Bioconductor installation problem: unable to	access
      repository (Patrick Aboyoun)
  23. Bioconductor 2.3 is released (Patrick Aboyoun)
  24. Re: How to save result from limma (Jenny Drnevich)
  25. scale questions (Hui-Yi Chu)
  26. Re: [Fwd: batch info for cellHTS] (Florian Hahne)
  27. problem with Category package and custom annotationDbi
      (Mark Kimpel)
  28. Re: problem with Category package and custom annotationDbi
      (Marc Carlson)
  29. Re: scale questions (Sean Davis)
  30. Re: scale questions (Sean Davis)
  31. Re: problem with Category package and custom annotationDbi
      (Mark Kimpel)
  32. Re: How to save result from limma (Gordon K Smyth)
  33. Package "xps" "import.expr.scheme" error (Wei,Caimiao)
  34. Re: Lumi and Beadstudio 1.5.13 (Leon Peshkin)
  35. Offre exceptionnelle suite au probl?me technique
      (Clara de Dessous Ch?ri)

----------------------------------------------------------------------

Message: 1
Date: Wed, 22 Oct 2008 03:43:33 -0700 (PDT)
From: Tim Smith <tim_smith_666 at yahoo.com>
Subject: [BioC] GOstat: listing genes from hyperGTest
To: bioc <bioconductor at stat.math.ethz.ch>
Message-ID: <257981.79114.qm at web58005.mail.re3.yahoo.com>
Content-Type: text/plain

Hi,

I
was performing a hyperGTest for genes in homo-sapiens. For a set of
input genes, this function returns some 'significant' GO terms. What I
wanted to now do was to co-relate each significant GO term (returned by
this function) with genes (from my set of input genes) associated with
that GO term. However, I think that I may be using the wrong
package/function to get the releveant set of genes.

Currently, what I'm doing is finding the significant GO terms by using the following code:

-----------------------
### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs 

 paramsGO <- new("GOHyperGParams", geneIds = genes1,
          universeGeneIds = allGenes, annotation = "org.Hs.eg.db", 
          ontology = "BP", pvalueCutoff = 1, conditional = FALSE, 
          testDirection = "over")

GO <- hyperGTest(paramsGO)
--------------------------
This
gives me a set of significant GO terms. Now, I would like to find which
subset of genes in 'genes1' is associated with each of the significant
GO term. To do this I map all GO terms to their Entrez IDs using the
'org.Hs.eg.db' package using the following:

xx <- as.list(org.Hs.egGO2EG)

to
get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't
this number small?) that map to at least one Entrez ID. So, from here I
look up which Entrez IDs are associated with my GO term of interest.

My
problem is that often, the GO term from hyperGTest is not associated
with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described
above ), i.e. the GO term/ID is not in the list obtained from
'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by
hyperGTest, but does not appear to be associated with any Entrez IDs in
the org.Hs.eg.db package. Where could I be going wrong?

I would give a set of genes so that the example is reproducible, but [[elided Yahoo spam]]

Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here....

My sessionInfo() is:
--------------------------------
R version 2.7.2 (2008-08-25) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United
States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
 [1] grid      splines   tools     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1]
gplots_2.6.0         gmodels_2.14.1       gtools_2.4.0        
gdata_2.4.1          Rgraphviz_1.18.1     GOstats_2.6.0       
Category_2.6.0      
 [8] RBGL_1.16.0          annotate_1.18.0     
xtable_1.5-2         graph_1.18.0         PFAM.db_2.2.0       
GO.db_2.2.0          KEGG.db_2.2.0       
[15] org.Hs.eg.db_2.2.0   AnnotationDbi_1.2.0  RSQLite_0.6-8        DBI_0.2-4            genefilter_1.20.0    survival_2.34-1      affy_1.18.0         
[22] preprocessCore_1.2.0 affyio_1.8.0         Biobase_2.0.0       

loaded via a namespace (and not attached):
[1] cluster_1.11.11 MASS_7.2-44    

---------------------------------

	[[alternative HTML version deleted]]

------------------------------

Message: 2
Date: Wed, 22 Oct 2008 12:34:38 +0100
From: "Pemmasani, Kalyani" <kalyani.pemmasani at nuigalway.ie>
Subject: [BioC] export toptables into Genespring
To: <bioconductor at stat.math.ethz.ch>
Message-ID:
	<6B017AD2AE2F6F489087FC986588136B88FA42 at EVS1.ac.nuigalway.ie>
Content-Type: text/plain;	charset="iso-8859-1"

Hi all,

Is there a way to export toptables from LIMMA into Genespring software program (from Agilent technologies) for clustering?

Best regards,
Kalyani
-------------------------------------------
Kalyani Pemmasani
Marie Curie research fellow
National Diagnostics Centre
National University of Ireland
Galway, IRELAND
e.mail: kalyani.pemmasani at nuigalway.ie
Ph.no: +353(0)91492815
Fax: +353 (0) 91586570 

------------------------------

Message: 3
Date: Wed, 22 Oct 2008 09:07:16 -0400
From: "James W. MacDonald" <jmacdon at med.umich.edu>
Subject: Re: [BioC] Limma contrasts question
To: Daniel Brewer <daniel.brewer at icr.ac.uk>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <48FF2584.5010509 at med.umich.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Daniel Brewer wrote:

> Hi Jim,
> 
> Could you go into the maths of the contrast formulas a bit?  I would
> like to get a really solid understanding of what it is doing for future
> analyses.

Once you understand what the coefficients are, the contrasts are just 
simple algebra. In your case, all of the coefficients are estimating the 
difference between the sample and PC3M (e.g., Knockdown - PC3M).

So the algebra is something like this:

2(Knockdown - PC3M) - (Scramble - PC3M)
=
2Knockdown - 2PC3M - Scramble + PC3M
=
2Knockdown - Scramble - PC3M
=
Knockdown - (Scramble + PC3M)/2

Which is knockdown minus the mean of the controls.

Note that this will be the numerator of the resulting t-statistic. The 
denominator will be sort of an average of the variability within each of 
the three groups being compared. So the question being answered is 'What 
genes are different in Knockdown as compared to the average of the 
controls?'. However, there is nothing here to test if the two controls 
are similar at all (and you might not care).

So for instance, you might have a gene with average expression like this:

Knockdown = 10
PC3M = 4
Scramble = 7

If the intra-group variability is small for each sample type, then you 
will likely get a significant t-statistic even though the two controls 
are probably significantly different as well. Which is why I mentioned 
earlier that you might want to test the Scramble - PC3M contrast as well.

Best,

Jim

> 
> Many thanks
> 
> Dan
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662

------------------------------

Message: 4
Date: Wed, 22 Oct 2008 09:10:39 -0400
From: "James W. MacDonald" <jmacdon at med.umich.edu>
Subject: Re: [BioC] GOstat: listing genes from hyperGTest

Cc: bioc <bioconductor at stat.math.ethz.ch>
Message-ID: <48FF264F.50404 at med.umich.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Tim,

Does probeSetSummary() do what you want?

Best,

Jim

Tim Smith wrote:
>  
> Hi,
> 
> I
> was performing a hyperGTest for genes in homo-sapiens. For a set of
> input genes, this function returns some 'significant' GO terms. What I
> wanted to now do was to co-relate each significant GO term (returned by
> this function) with genes (from my set of input genes) associated with
> that GO term. However, I think that I may be using the wrong
> package/function to get the releveant set of genes.
> 
> Currently, what I'm doing is finding the significant GO terms by using the following code:
> 
> -----------------------
> ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs 
> 
>  paramsGO <- new("GOHyperGParams", geneIds = genes1,
>           universeGeneIds = allGenes, annotation = "org.Hs.eg.db", 
>           ontology = "BP", pvalueCutoff = 1, conditional = FALSE, 
>           testDirection = "over")
> 
> GO <- hyperGTest(paramsGO)
> --------------------------
> This
> gives me a set of significant GO terms. Now, I would like to find which
> subset of genes in 'genes1' is associated with each of the significant
> GO term. To do this I map all GO terms to their Entrez IDs using the
> 'org.Hs.eg.db' package using the following:
> 
> xx <- as.list(org.Hs.egGO2EG)
> 
> to
> get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't
> this number small?) that map to at least one Entrez ID. So, from here I
> look up which Entrez IDs are associated with my GO term of interest.
> 
> My
> problem is that often, the GO term from hyperGTest is not associated
> with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described
> above ), i.e. the GO term/ID is not in the list obtained from
> 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by
> hyperGTest, but does not appear to be associated with any Entrez IDs in
> the org.Hs.eg.db package. Where could I be going wrong?
> 
> I would give a set of genes so that the example is reproducible, but [[elided Yahoo spam]]
> 
> Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here....
> 
> My sessionInfo() is:
> --------------------------------
> R version 2.7.2 (2008-08-25) 
> i386-pc-mingw32 
> 
> locale:
> LC_COLLATE=English_United
> States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> 
> attached base packages:
>  [1] grid      splines   tools     stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
>  [1]
> gplots_2.6.0         gmodels_2.14.1       gtools_2.4.0        
> gdata_2.4.1          Rgraphviz_1.18.1     GOstats_2.6.0       
> Category_2.6.0      
>  [8] RBGL_1.16.0          annotate_1.18.0     
> xtable_1.5-2         graph_1.18.0         PFAM.db_2.2.0       
> GO.db_2.2.0          KEGG.db_2.2.0       
> [15] org.Hs.eg.db_2.2.0   AnnotationDbi_1.2.0  RSQLite_0.6-8        DBI_0.2-4            genefilter_1.20.0    survival_2.34-1      affy_1.18.0         
> [22] preprocessCore_1.2.0 affyio_1.8.0         Biobase_2.0.0       
> 
> loaded via a namespace (and not attached):
> [1] cluster_1.11.11 MASS_7.2-44    
> 
> 
> ---------------------------------
> 
> 
>       
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662

------------------------------

Message: 5
Date: Wed, 22 Oct 2008 14:50:06 +0100
From: Daniel Brewer <daniel.brewer at icr.ac.uk>
Subject: Re: [BioC] Limma contrasts question
To: "James W. MacDonald" <jmacdon at med.umich.edu>,
	bioconductor at stat.math.ethz.ch
Message-ID: <48FF2F8E.4020208 at icr.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1

James W. MacDonald wrote:
> Daniel Brewer wrote:
> 
>> Hi Jim,
>>
>> Could you go into the maths of the contrast formulas a bit?  I would
>> like to get a really solid understanding of what it is doing for future
>> analyses.
> 
> Once you understand what the coefficients are, the contrasts are just
> simple algebra. In your case, all of the coefficients are estimating the
> difference between the sample and PC3M (e.g., Knockdown - PC3M).
> 
> So the algebra is something like this:
> 
> 2(Knockdown - PC3M) - (Scramble - PC3M)
> =
> 2Knockdown - 2PC3M - Scramble + PC3M
> =
> 2Knockdown - Scramble - PC3M
> =
> Knockdown - (Scramble + PC3M)/2
> 
> Which is knockdown minus the mean of the controls.
> 
> Note that this will be the numerator of the resulting t-statistic. The
> denominator will be sort of an average of the variability within each of
> the three groups being compared. So the question being answered is 'What
> genes are different in Knockdown as compared to the average of the
> controls?'. However, there is nothing here to test if the two controls
> are similar at all (and you might not care).
> 
> So for instance, you might have a gene with average expression like this:
> 
> Knockdown = 10
> PC3M = 4
> Scramble = 7
> 
> If the intra-group variability is small for each sample type, then you
> will likely get a significant t-statistic even though the two controls
> are probably significantly different as well. Which is why I mentioned
> earlier that you might want to test the Scramble - PC3M contrast as well.
> 
> Best,
> 
> Jim
> 
> 
>>
>> Many thanks
>>
>> Dan
>>
> 

Thanks for that, that is brilliant and has clarified everything.  I did
a comparison of the controls and there were no significant genes found
which is encouraging.

Many thanks

Dan

-- 
**************************************************************
Daniel Brewer, Ph.D.

Institute of Cancer Research
Molecular Carcinogenesis
Email: daniel.brewer at icr.ac.uk
**************************************************************

The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP.

This e-mail message is confidential and for use by the a...{{dropped:2}}

------------------------------

Message: 6
Date: Wed, 22 Oct 2008 21:51:24 +0800
From: Leon Yee <yee.leon at gmail.com>
Subject: [BioC] quality assessment and preprocessing for tiling
	array-based	CGH data
To: bioconductor at stat.math.ethz.ch
Message-ID: <48FF2FDC.6020203 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dear all,

     Is there any well-established routine for quality assessment and 
preprocessing of array CGH data, especially tiling array-based CGH data? 
I found most of the quality assessment of array data are about 
expression array, while few are related to array CGH data.
     We are using agilent 244k CGH array of rat, and now we have the 
text files produced by Feature Extraction, don't know whether they are 
of good quality. Could anyone help provide some clues? Thanks in advance!

     After read.maimage(), we got the RGlist object, which contain 
several components including R, G, Rb, Gb, and so on.  The probes are of 
3 types: -1, 1 and 0. 0 means normal probe; -1 mean negative control, i 
guess, and the probe names are like (-)3xSLv1, NC1_00000002, etc[no 
corresponding probe sequence]; 1 means positive control, i guess, and 
the probe names are like DarkCorner, DCP_008001.0, RnCGHBrightCorner, 
SRN_800002, etc[no corresponding probe sequence].  The number of -1 is 
1275, while the number of 1 is 4217, each of which has its R, Rb, G, Gb 
values. Do we need these values for quality assessment and 
normalization? How?
     In addition, in the normal probes, we have 1000 probes repeating 3 
times in the array. How could we use these data for quality assessment 
and normalization?

Regards,
Leon

------------------------------

Message: 7
Date: Wed, 22 Oct 2008 15:48:22 +0200
From: Robert Castelo <robert.castelo at upf.edu>
Subject: [BioC] GOstats and org.EcK12.eg.db
To: bioconductor at stat.math.ethz.ch
Message-ID: <1224683302.5889.35.camel at llull.imim.es>
Content-Type: text/plain

dear list,

I cannot get to work GOstats with the annotation for E. coli in
org.EcK12.eg.db. Please find below the code that reproduces the problem
including the error message, and my sessionInfo() at the end of this
email. I have included the same exercise with the human annotation
package org.Hs.eg.db which runs fine in my system. Any help with this
will be very much appreciated.

thanks!!
robert.
==========CODE STARTS HERE===========

library(org.Hs.eg.db)
library(org.EcK12.eg.db)
library(GOstats)

geneuniverse <- mappedkeys(org.EcK12.egSYMBOL)
set.seed(12345)
geneset <- sample(geneuniverse, size=100, replace=FALSE)

goHypGparams <- new("GOHyperGParams",
                    geneIds=geneset,
                    universeGeneIds=geneuniverse,
                    annotation="org.EcK12.eg.db", ontology="BP",
                    pvalueCutoff=1.0, conditional=TRUE,
                    testDirection="over")

goHypGcond <- hyperGTest(goHypGparams)

Error in get(mapName, envir = pkgEnv, inherits = FALSE) : 
  variable "org.EcK12.egENTREZID" was not found
Error in mget(probes, ID2EntrezID(datPkg)) : 
  error in evaluating the argument 'envir' in selecting a method for
function 'mget'

geneuniverse <- mappedkeys(org.Hs.egSYMBOL)
set.seed(12345)
geneset <- sample(geneuniverse, size=100, replace=FALSE)

goHypGparams <- new("GOHyperGParams",
                    geneIds=geneset,
                    universeGeneIds=geneuniverse,
                    annotation="org.Hs.eg.db", ontology="BP",
                    pvalueCutoff=1.0, conditional=TRUE,
                    testDirection="over")

goHypGcond <- hyperGTest(goHypGparams)

sessionInfo()
R version 2.8.0 beta (2008-10-05 r46601) 
x86_64-unknown-linux-gnu 

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] splines   tools     stats     graphics  grDevices utils
datasets 
[8] methods   base     

other attached packages:
 [1] org.Hs.eg.db_2.2.6    GOstats_2.7.0         Category_2.7.6       
 [4] genefilter_1.21.5     survival_2.34-1       RBGL_1.17.2          
 [7] annotate_1.19.2       xtable_1.5-4          GO.db_2.2.5          
[10] graph_1.19.6          org.EcK12.eg.db_2.2.6 AnnotationDbi_1.3.12 
[13] RSQLite_0.7-0         DBI_0.2-4             Biobase_2.1.7        

loaded via a namespace (and not attached):
[1] cluster_1.11.11 GSEABase_1.3.6  XML_1.98-1

------------------------------

Message: 8
Date: Wed, 22 Oct 2008 10:02:35 -0400
From: "Sean Davis" <sdavis2 at mail.nih.gov>
Subject: Re: [BioC] quality assessment and preprocessing for tiling
	array-based CGH data
To: "Leon Yee" <yee.leon at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Message-ID:
	<264855a00810220702m3ee17381y221a8e5ec18ee6ff at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

On Wed, Oct 22, 2008 at 9:51 AM, Leon Yee <yee.leon at gmail.com> wrote:
> Dear all,
>
>    Is there any well-established routine for quality assessment and
> preprocessing of array CGH data, especially tiling array-based CGH data? I
> found most of the quality assessment of array data are about expression
> array, while few are related to array CGH data.
>    We are using agilent 244k CGH array of rat, and now we have the text
> files produced by Feature Extraction, don't know whether they are of good
[[elided Yahoo spam]]
>
>    After read.maimage(), we got the RGlist object, which contain several
> components including R, G, Rb, Gb, and so on.  The probes are of 3 types:
> -1, 1 and 0. 0 means normal probe; -1 mean negative control, i guess, and
> the probe names are like (-)3xSLv1, NC1_00000002, etc[no corresponding probe
> sequence]; 1 means positive control, i guess, and the probe names are like
> DarkCorner, DCP_008001.0, RnCGHBrightCorner, SRN_800002, etc[no
> corresponding probe sequence].  The number of -1 is 1275, while the number
> of 1 is 4217, each of which has its R, Rb, G, Gb values. Do we need these
> values for quality assessment and normalization? How?
>    In addition, in the normal probes, we have 1000 probes repeating 3 times
> in the array. How could we use these data for quality assessment and
> normalization?

You generally will not want to do any normalization besides a possible
shift of the center.  Any linear normalization that affects the slope
of the M vs. A plot or nonlinear normalization will likely decrease
signal.  As for quality control, a good, general measure to track is
the dlrs, a robust measure of the standard deviation.

dlrs <-
  function(x) {
    nx <- length(x)
    if (nx<3) {
      stop("Vector length>2 needed for computation")
    }
    tmp <- embed(x,2)
    diffs <- tmp[,2]-tmp[,1]
    dlrs <- IQR(diffs)/(sqrt(2)*1.34)
    return(dlrs)
  }

For agilent arrays, most of the dlrs should be around or under 0.2,
generally.  However, this might vary a bit based on lab-to-lab
variation.  In any case, if there is a significant outlier, that is
suspect.  The input to the above function is the log ratios for a
single array arranged in chromosome and position order.

Sean

------------------------------

Message: 9
Date: Wed, 22 Oct 2008 07:24:50 -0700 (PDT)

Subject: Re: [BioC] GOstat: listing genes from hyperGTest
To: "James W. MacDonald" <jmacdon at med.umich.edu>
Cc: bioc <bioconductor at stat.math.ethz.ch>
Message-ID: <418040.17653.qm at web58004.mail.re3.yahoo.com>
Content-Type: text/plain

Thanks James. If I can tweak that function, I'll get exactly what I want. 

I tried what you suggested and got the following error:

---------------------------
 ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs 

  paramsGO <- new("GOHyperGParams", geneIds = genes1,
           universeGeneIds = allGenes, annotation = "org.Hs.eg.db", 
           ontology = "BP", pvalueCutoff = 1, conditional = FALSE, 
           testDirection = "over")

 GO <- hyperGTest(paramsGO)
 ps <- probeSetSummary(GO)

Error in get(mapName, envir = pkgEnv, inherits = FALSE) : 
  variable "org.Hs.egENTREZID" was not found
--------------------------------

I guess the function would return the probe ids if I was using them, but I have Entrez IDs as input.

Or am I doing something wrong?

thanks!

----- Original Message ----
From: James W. MacDonald <jmacdon at med.umich.edu>

Cc: bioc <bioconductor at stat.math.ethz.ch>
Sent: Wednesday, October 22, 2008 9:10:39 AM
Subject: Re: [BioC] GOstat: listing genes from hyperGTest

Hi Tim,

Does probeSetSummary() do what you want?

Best,

Jim

Tim Smith wrote:
>  
> Hi,
> 
> I
> was performing a hyperGTest for genes in homo-sapiens. For a set of
> input genes, this function returns some 'significant' GO terms. What I
> wanted to now do was to co-relate each significant GO term (returned by
> this function) with genes (from my set of input genes) associated with
> that GO term. However, I think that I may be using the wrong
> package/function to get the releveant set of genes.
> 
> Currently, what I'm doing is finding the significant GO terms by using the following code:
> 
> -----------------------
> ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs 
> 
>  paramsGO <- new("GOHyperGParams", geneIds = genes1,
>           universeGeneIds = allGenes, annotation = "org.Hs.eg.db", 
>           ontology = "BP", pvalueCutoff = 1, conditional = FALSE, 
>           testDirection = "over")
> 
> GO <- hyperGTest(paramsGO)
> --------------------------
> This
> gives me a set of significant GO terms. Now, I would like to find which
> subset of genes in 'genes1' is associated with each of the significant
> GO term. To do this I map all GO terms to their Entrez IDs using the
> 'org.Hs.eg.db' package using the following:
> 
> xx <- as.list(org.Hs.egGO2EG)
> 
> to
> get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't
> this number small?) that map to at least one Entrez ID. So, from here I
> look up which Entrez IDs are associated with my GO term of interest.
> 
> My
> problem is that often, the GO term from hyperGTest is not associated
> with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described
> above ), i.e. the GO term/ID is not in the list obtained from
> 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by
> hyperGTest, but does not appear to be associated with any Entrez IDs in
> the org.Hs.eg.db package. Where could I be going wrong?
> 
[[elided Yahoo spam]]
> 
> Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here....
> 
> My sessionInfo() is:
> --------------------------------
> R version 2.7.2 (2008-08-25) 
> i386-pc-mingw32 
> 
> locale:
> LC_COLLATE=English_United
> States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> 
> attached base packages:
>  [1] grid      splines   tools     stats     graphics  grDevices utils     datasets  methods   base    
> 
> other attached packages:
>  [1]
> gplots_2.6.0         gmodels_2.14.1       gtools_2.4.0        
> gdata_2.4.1          Rgraphviz_1.18.1     GOstats_2.6.0      
> Category_2.6.0      
>  [8] RBGL_1.16.0          annotate_1.18.0    
> xtable_1.5-2         graph_1.18.0         PFAM.db_2.2.0      
> GO.db_2.2.0          KEGG.db_2.2.0      
> [15] org.Hs.eg.db_2.2.0   AnnotationDbi_1.2.0  RSQLite_0.6-8        DBI_0.2-4            genefilter_1.20.0    survival_2.34-1      affy_1.18.0        
> [22] preprocessCore_1.2.0 affyio_1.8.0         Biobase_2.0.0      
> 
> loaded via a namespace (and not attached):
> [1] cluster_1.11.11 MASS_7.2-44    
> 
> 
> ---------------------------------
> 
> 
>      
>     [[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662

	[[alternative HTML version deleted]]

------------------------------

Message: 10
Date: Wed, 22 Oct 2008 22:32:34 +0800
From: Leon Yee <yee.leon at gmail.com>
Subject: Re: [BioC] quality assessment and preprocessing for tiling
	array-based CGH data
To: Sean Davis <sdavis2 at mail.nih.gov>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <48FF3982.9020005 at gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

Sean Davis wrote:
> On Wed, Oct 22, 2008 at 9:51 AM, Leon Yee <yee.leon at gmail.com> wrote:
>> Dear all,
>>
>>    Is there any well-established routine for quality assessment and
>> preprocessing of array CGH data, especially tiling array-based CGH data? I
>> found most of the quality assessment of array data are about expression
>> array, while few are related to array CGH data.
>>    We are using agilent 244k CGH array of rat, and now we have the text
>> files produced by Feature Extraction, don't know whether they are of good
[[elided Yahoo spam]]
>>
>>    After read.maimage(), we got the RGlist object, which contain several
>> components including R, G, Rb, Gb, and so on.  The probes are of 3 types:
>> -1, 1 and 0. 0 means normal probe; -1 mean negative control, i guess, and
>> the probe names are like (-)3xSLv1, NC1_00000002, etc[no corresponding probe
>> sequence]; 1 means positive control, i guess, and the probe names are like
>> DarkCorner, DCP_008001.0, RnCGHBrightCorner, SRN_800002, etc[no
>> corresponding probe sequence].  The number of -1 is 1275, while the number
>> of 1 is 4217, each of which has its R, Rb, G, Gb values. Do we need these
>> values for quality assessment and normalization? How?
>>    In addition, in the normal probes, we have 1000 probes repeating 3 times
>> in the array. How could we use these data for quality assessment and
>> normalization?
> 
> You generally will not want to do any normalization besides a possible
> shift of the center.  Any linear normalization that affects the slope
> of the M vs. A plot or nonlinear normalization will likely decrease
> signal.  As for quality control, a good, general measure to track is
> the dlrs, a robust measure of the standard deviation.
> 
> 
> dlrs <-
>   function(x) {
>     nx <- length(x)
>     if (nx<3) {
>       stop("Vector length>2 needed for computation")
>     }
>     tmp <- embed(x,2)
>     diffs <- tmp[,2]-tmp[,1]
>     dlrs <- IQR(diffs)/(sqrt(2)*1.34)
>     return(dlrs)
>   }
> 
> For agilent arrays, most of the dlrs should be around or under 0.2,
> generally.  However, this might vary a bit based on lab-to-lab
> variation.  In any case, if there is a significant outlier, that is
> suspect.  The input to the above function is the log ratios for a
> single array arranged in chromosome and position order.
> 
> Sean
> 

Hi, Sean

    Thanks for your advice. However, I have still several questions:

    1. The input of dlrs is the log ratios, the log ration extracted 
from the text file produced by Feature Extraction? or calculated from 
RGlist --> MAlist ?  I have searched the mailist and seen a post of you 
mentioned the difference of log ration from Feature Extraction and the 
default M value from read.maimages.

    2. I can get the log ratios of all features including control type 
of -1 and 1, but these features don't have chromosome positions, does 
this mean I don't need all of them for quality assessment?

    3. Some probes with the name of "chr2_random:xxxxx-yyyyyy" will not 
get a proper mapping on the chromosome, so I should remove these values 
from the input of dlrs. Is it so?

    4. How could I handle those 1000 probes repeating 3 times?  They 
will be mapped on the same chromosome position by three per group.

Regards,
Leon

------------------------------

Message: 11
Date: Wed, 22 Oct 2008 15:42:11 +0100
From: "Mark Dunning" <md392 at cam.ac.uk>
Subject: Re: [BioC] Beadarray and illumina methylation arrays
To: "'Katrina bell'" <katrina.bell at mcri.edu.au>,
	<bioconductor at stat.math.ethz.ch>
Message-ID: <000c01c93454$5dc9e720$195db560$@ac.uk>
Content-Type: text/plain;	charset="us-ascii"

Hi Katrina,

I only have limited experience with methylation data, but hopefully I might
be able to give you a few pointers.

-The error with readIllumina is quite hard to diagnose without seeing the
example. I haven't seen any data from this type of Methylation array. What
do the first few lines of the bead-level text files (.csv in your case) look
like? It could be that the x and y coordinates are in a slightly different
format to that we have seen before.

-Yes, 25% of outliers does seem a little high I'm afraid. Have you also
looked at whereabouts the outliers are located on the arrays or made some
imageplots? We have just developed a new function for automatic artefact
detection called BASH that will be available in the forthcoming Bioconductor
release. I could be interesting to run that on your data as Illumina do
sometimes miss some beads in obvious artefacts and remove too many beads on
the rest of the array. BASH should be a good compromise.

-Yes, currently the only way of reading methylation data into beadarray is
by using the bead-level data.

-I'm not very familiar with the output of BeadStudio. Do you get separate
detection values for the green and red channels? If so, then I don't think
it would be problem if a bead-type was detected in one channel but not the
other (since they are measure of either methylated or unmethylated
respectively). Bead types that are not detected in either channel could be a
problem though.

-I haven't really seen many guidelines for normalization and this is
something I would like to look into. There is an obvious dye-bias that needs
to be corrected and the background normalisation used by Illumina might
actually help in this regard (although I wouldn't usually recommend it for
other Illumina data). Quantile normalisation has been used for other types
of two-colour Illumina data (http://www.biomedcentral.com/1471-2105/9/409,
http://genome.cshlp.org/cgi/content/abstract/17/3/368) so it could work
here. 

Hope this helps,

Mark

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Katrina bell
Sent: 21 October 2008 03:04
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Beadarray and illumina methylation arrays

Hello,

This is the first time I have used beadarray . I am using it  for the 
analysis of an illumina 27 methylation array and I am having a few issues I 
hope that you could help me with.

1.      The first time I tried to load the methylation data, I didn't write 
in singleChannel=FALSE. It happily read in my 12 arrays with no problems 
what so ever. I tried plotting a few things which worked fine. Seeing my 
mistake, I then went back to reload my data with the red channel 
(singleChannel=FALSE) and got the following error.

 > BLData = readIllumina(arrayNames = targets$ChipBarcode, textType=".csv", 
targets=targets, backgroundMethod="none", singleChannel=FALSE)
Found 12 arrays
Reading pixels of ./4408100017_A_Grn.tif
Calculating background
Sharpening Image
Calculating foregound
Background correcting: method = none
Reading pixels of ./4408100017_A_Red.tif
Calculating background

*** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
1: .C("readBeadImage", as.character(tifFiles2[i]), as.double(RedX[ord]), 
as.double(RedY[ord]), as.integer(numBeads), foreGround = double(length = 
numBeads), backGround = double(length = numBeads), 
as.integer(backgroundSize), as.integer(manip), as.integer(fground), PACKAGE 
= "beadarray")
2: readIllumina(arrayNames = targets$ChipBarcode, textType = ".csv", 
targets = targets, backgroundMethod = "none", singleChannel = FALSE)

session info Below.

So I ended up loading in the data with images=FALSE, which worked, but I 
would like to be able to look at the background. Is there a way around this 
issue?

2. When I plotted the outliers (bar chart) I got an astounding 25% for the 
majority of my 12 samples, both in the red and green channel (unlogged 
data). In addition, 3 of the samples had a peak of intensity at 5 in the 
green channel, leading me to believe that I have some real quality control 
issues with my samples. Any opinions/suggestions on these results would be 
most welcome.

3. Is it correct that readBeadSummaryData, is not set up for two colour 
arrays such as the methylation arrays? So the only way to look at 
methylation data is through reading in BLData?

4. Some of my samples seem to have a large number of targets which have a p 
value detection rate above 0.05 (beadstudio output). Illumina have 
indicated that they disregard these. If I can not read in the bead summary 
data from bead studio, I am assuming that these detection p values can not 
be taken into account in the analysis? Or are there other methods that 
remove/down grade these less than optimal probes (most removed as
outliers?).

5. Has any one had any experience with normalisation of the methylation 
arrays? I know that many of the usual array methods are out of the question 
due to the assumption that most probes on the array will not be 
differentially expressed is invalid. I read in a bioconductor list someone 
suggesting quantile normalisation? I would really appreciate any feeback 
from people who have tried this or other methods, especially if they have 
verified their methylation results.

Thanks for any help/advice you may be able to give.

Cheers
Katrina

 > sessionInfo() below
R version 2.7.0 (2008-04-22)
x86_64-redhat-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8
;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADD
RESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] tools stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] beadarray_1.8.0 affy_1.18.2 preprocessCore_1.2.1
[4] affyio_1.8.0 geneplotter_1.18.0 annotate_1.18.0
[7] xtable_1.5-2 AnnotationDbi_1.2.2 RSQLite_0.6-9
[10] DBI_0.2-4 lattice_0.17-6 Biobase_2.0.1
[13] limma_2.14.5

loaded via a namespace (and not attached):
[1] grid_2.7.0 KernSmooth_2.22-22 RColorBrewer_1.0-2

	[[alternative HTML version deleted]]

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor

------------------------------

Message: 12
Date: Wed, 22 Oct 2008 10:46:24 -0400
From: "Sean Davis" <sdavis2 at mail.nih.gov>
Subject: Re: [BioC] quality assessment and preprocessing for tiling
	array-based CGH data
To: "Leon Yee" <yee.leon at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Message-ID:
	<264855a00810220746w6d5ecb11ica5542bf46042f20 at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

On Wed, Oct 22, 2008 at 10:32 AM, Leon Yee <yee.leon at gmail.com> wrote:
> Sean Davis wrote:
>>
>> On Wed, Oct 22, 2008 at 9:51 AM, Leon Yee <yee.leon at gmail.com> wrote:
>>>
>>> Dear all,
>>>
>>>   Is there any well-established routine for quality assessment and
>>> preprocessing of array CGH data, especially tiling array-based CGH data?
>>> I
>>> found most of the quality assessment of array data are about expression
>>> array, while few are related to array CGH data.
>>>   We are using agilent 244k CGH array of rat, and now we have the text
>>> files produced by Feature Extraction, don't know whether they are of good
[[elided Yahoo spam]]
>>>
>>>   After read.maimage(), we got the RGlist object, which contain several
>>> components including R, G, Rb, Gb, and so on.  The probes are of 3 types:
>>> -1, 1 and 0. 0 means normal probe; -1 mean negative control, i guess, and
>>> the probe names are like (-)3xSLv1, NC1_00000002, etc[no corresponding
>>> probe
>>> sequence]; 1 means positive control, i guess, and the probe names are
>>> like
>>> DarkCorner, DCP_008001.0, RnCGHBrightCorner, SRN_800002, etc[no
>>> corresponding probe sequence].  The number of -1 is 1275, while the
>>> number
>>> of 1 is 4217, each of which has its R, Rb, G, Gb values. Do we need these
>>> values for quality assessment and normalization? How?
>>>   In addition, in the normal probes, we have 1000 probes repeating 3
>>> times
>>> in the array. How could we use these data for quality assessment and
>>> normalization?
>>
>> You generally will not want to do any normalization besides a possible
>> shift of the center.  Any linear normalization that affects the slope
>> of the M vs. A plot or nonlinear normalization will likely decrease
>> signal.  As for quality control, a good, general measure to track is
>> the dlrs, a robust measure of the standard deviation.
>>
>>
>> dlrs <-
>>  function(x) {
>>    nx <- length(x)
>>    if (nx<3) {
>>      stop("Vector length>2 needed for computation")
>>    }
>>    tmp <- embed(x,2)
>>    diffs <- tmp[,2]-tmp[,1]
>>    dlrs <- IQR(diffs)/(sqrt(2)*1.34)
>>    return(dlrs)
>>  }
>>
>> For agilent arrays, most of the dlrs should be around or under 0.2,
>> generally.  However, this might vary a bit based on lab-to-lab
>> variation.  In any case, if there is a significant outlier, that is
>> suspect.  The input to the above function is the log ratios for a
>> single array arranged in chromosome and position order.
>>
>> Sean
>>
>
> Hi, Sean
>
>   Thanks for your advice. However, I have still several questions:
>
>   1. The input of dlrs is the log ratios, the log ration extracted from the
> text file produced by Feature Extraction? or calculated from RGlist -->
> MAlist ?  I have searched the mailist and seen a post of you mentioned the
> difference of log ration from Feature Extraction and the default M value
> from read.maimages.

You can read the Agilent FE manual for more details, but you can
probably use either and come to very similar conclusions.  If you use
the MAlist version, make sure to use only median centering or none for
normalization.

>   2. I can get the log ratios of all features including control type of -1
> and 1, but these features don't have chromosome positions, does this mean I
> don't need all of them for quality assessment?

We have not routinely used these probes, no.  If an array fails
miserably, then these control probes might be useful for determining
the reason for the failure, though.

>   3. Some probes with the name of "chr2_random:xxxxx-yyyyyy" will not get a
> proper mapping on the chromosome, so I should remove these values from the
> input of dlrs. Is it so?

You can either remove them or treat chr2_random as a separate chromosome.

>   4. How could I handle those 1000 probes repeating 3 times?  They will be
> mapped on the same chromosome position by three per group.

You could choose one at random or use a mean or median of them.  My
guess is that they agree very closely with one another so the choice
should not affect the results much.

Sean

------------------------------

Message: 13
Date: Wed, 22 Oct 2008 15:54:25 +0100
From: "Dan Bolser" <dan.bolser at gmail.com>
Subject: [BioC] Problem using Rgraphviz (edge weights going missing).
To: bioconductor at stat.math.ethz.ch
Message-ID:
	<2c8757af0810220754r3a16507l29dd77f43a5151d7 at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hi all,

After successfully installing Rgraphviz on this Centos 5.2 (RHEL 4)
box (basically with yum install *graphviz*** ..., then using the
standard BioC install procedure) and then navigating the packaging
error (that actually caused R to crash) described here:

http://lists.rpmforge.net/pipermail/users/2007-August/000958.html

(Just manually run dot -c as root)

I finally got to produce some nice graphs of my data. However, I found
that when I tried to write my graphs to file in dot format, the edge
weights go missing.

I load my graph like this:

library(Rgraphviz)

myNodes <-
  paste(nodes$MAPC)

myG <- new("graphNEL", nodes=myNodes)

for(i in 1:nrow(dat)){
  paste(dat[i,"A"], dat[i,"B"])
  myG <-
    addEdge(paste(dat[i,"A"]),
            paste(dat[i,"B"]), myG, dat[i,"N"])
}

plot(myG, "neato")

and all well and good (except that I couldn't get the edge weights to
translate into line thickness - more on that below).

Then I (foolishly?) tried:

agwrite(agopen(myG,"test"), filename="test")

This produced a file, but the weight attribute of the edges (in dot
format?) were all clearly set to 1 (and not the range of values that I
have stored in dat$N). I confirmed that my loop was putting the
weights somewhere with the following one-liner:

lapply(myG at edgeData@data, function(l){l$weight})

but I don't really know what this means. So I think its a bug. The
graph should be written to file with the correct weight attributes.
AFAICT.

Poking around in the data structure I was confused by the apparent
number of edges...

ew <-
  unlist(edgeWeights(myG))
length(ew)

Why is this not equal to the number of edges that I put in (and the
number reported by just printing the graph object)?

I tried to set the width of the edges plotted on the graph by using
the penwidth attribute, but it had no effect, and was not written to
file as an attribute of the graph.

## Try defining some EDGE attributes
eAttrs <- list()

## Get the edge weights
ew <-
  unlist(edgeWeights(myG))
length(ew)

## Some need removing for some reason somehow...
ew <-
  ew[setdiff(seq(along = ew), removedEdges(myG))]
length(ew)

## Label the attributes vector (so it can be used properly)
names(ew) <-
  edgeNames(myG)

eAttrs$penwidth <-
  ew

plot(myG, "neato", edgeAttrs=eAttrs)

toFile(agopen(myG, name="test"), filename="test", edgeAttrs=eAttrs)

So I read in the vignette,

"                                   A list of all available attributes
is accessible
online at: http://www.graphviz.org/pub/scm/graphviz2/doc/info/attrs.
html. Note that there are some di?erences between default values and also
some attributes will not have an e?ect in Rgraphviz. Please see the man page
for graphvizAttributes for more details."

Where is the "man page for graphvizAttributes for more details" ??

It would be good to know what attributes Rgraphviz is ignoring and if
penwidth is one of them in particular.

Also the vignette didn't tell me who to contact in case of feedback,
bugs, questions, etc. Who else should I be bugging with this email?

Thanks for reading!

Dan.

--

R2spec --bioc -u
http://www.bioconductor.org/packages/release/bioc/src/contrib/Rgraphviz_1.18.1.tar.gz

------------------------------

Message: 14
Date: Wed, 22 Oct 2008 11:07:02 -0400
From: "Mark Kimpel" <mwkimpel at gmail.com>
Subject: Re: [BioC] newbie problems with AnnBuilder
To: "Marc Carlson" <mcarlson at fhcrc.org>
Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
Message-ID:
	<6b93d1830810220807l788e52f8wa4bafef9726892d0 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Build with AnnotationDbi was uneventful, but I have been unable to
install the package or use it as is.

If the package is just placed in my site-library, I get:
   'ragene10stv1.db' is not a valid package -- installed < 2.0.0?

If I tar the package and try R CMD INSTALL, I get:
   cannot extract package from 'ragene10stv1.db.tar.gz'

What approach should I be using?

Mark

------------------------------------------------------------
Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219  Home
Skype:  mkimpel

"The real problem is not whether machines think but whether men do."
-- B. F. Skinner
******************************************************************

On Tue, Oct 21, 2008 at 6:41 PM, Marc Carlson <mcarlson at fhcrc.org> wrote:
>
> Hi Mark,
>
> AnnBuilder is on its way out.  Please have a look at the SQLforge.pdf
> vignette which can be found here:
>
> http://www.bioconductor.org/packages/2.3/bioc/html/AnnotationDbi.html
>
> If you have further questions after reading this, then please just ask,
> and we should be able to help you out.
>
>
>  Marc
>
>
>
> Mark Kimpel wrote:
> > I'm having problems getting AnnBuilder to work with the Affy Rat Gene ST
> > array data supplied by Affy. Using the code chunk below, I am able to get
> > AnnBuilder to create the annotation object, but it crashes, I believe, when
> > it tries to save it. I should also mention that I had a previous crash when
> > I had a madecdfenv package directory in place that used the same name. I got
> > a "file lock" error, so I temporarily renamed the directory to see if this
> > fixed the problem. As below, the error changed, but I still can't get the
> > script to work.
> >
> > I suspect that there is a fundamental misunderstaning on my part related to
> > how the annotation package should relate to the cdf package or some naming
> > convention related to one or both.
> >
> > Mark
> >
> >
> >> require(AnnBuilder); require(makecdfenv)
> >> myBase <- "RaGene-1_0-st-v1.na26.rn4.transcript.probe-entrez_gene.csv"
> >> myBaseType <- "ll"
> >> mySrcUrls <- getSrcUrl("all", "Rattus norvegicus")
> >>
> >> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
> >>
> > +           myBaseType, pkgName =
> > substring(cleancdfname("RaGene-1_0-st-v1"),
> > +                         1, (nchar(cleancdfname("RaGene-1_0-st-v1")) - 3)),
> > +           pkgPath = '~/R_HOME/site-library-2.8.0', organism = "Rattus
> > norvegicus", version = "1.0",
> > +           author = list(authors = "Mark W Kimpel", maintainer =
> > +                           "mkimpel at iupui.edu"), fromWeb = TRUE)
> > Read 1 item
> > Read 1 item
> > [1] "4450 2 2"
> > Error in lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook) :
> >   cannot open file
> > '~/R_HOME/site-library-2.8.0/ragene10stv1/data/Rdata.rdb': No such file or
> > directory
> >
> > Enter a frame number, or 0 to exit
> >
> > 1: ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
> > myBaseTy
> > 2: makeLLDB(file.path(pkgPath, pkgName), compress = TRUE)
> > 3: tools:::makeLazyLoadDB(dataEnv, dbbase, compress = compress)
> > 4: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,
> > envhook)
> > 5: function (e)
> > 6: lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook)
> >
> > Selection: 1
> > Browse[1]> annotation[1:2,]
> >      ENTREZID    PROBE      ACCNUM GO
> > [1,] "100008565" "10766774" NA     NA
> > [2,] "100034253" "10937540" NA     "GO:0005525 at IEA;GO:0005622 at IEA"
> >      PMID                SYMBOL
> > [1,] "16804107;17292978" "Slc39a4l"
> > [2,] "8889548"           "Gnl3l"
> >      GENENAME                                                     CHR
> > [1,] "solute carrier family 39 (zinc transporter), member 4-like" NA
> > [2,] "guanine nucleotide binding protein-like 3 (nucleolar)-like" "X"
> >      MAP        REFSEQ                      UNIGENE     CHRLOC        PATH
> > [1,] NA         "NM_001101021;NP_001094491" "Rn.10120"  "NA"          NA
> > [2,] "Xq14-q21" "NM_001081958;NP_001075427" "Rn.164274" "-40301218 at X" NA
> >      ENZYME PFAM                  PROSITE
> > [1,] NA     "NA at IPI00817057"      "NA at IPI00817057"
> > [2,] NA     "PF01926 at IPI00362844" "NA at IPI00362844"
> > ------------------------------------------------------------
> > Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
> > Indiana University School of Medicine
> >
> > 15032 Hunter Court, Westfield, IN  46074
> >
> > (317) 490-5129 Work, & Mobile & VoiceMail
> > (317) 399-1219  Home
> > Skype:  mkimpel
> >
> > "The real problem is not whether machines think but whether men do." -- B.
> > F. Skinner
> > ******************************************************************
> >
> >       [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> >
>

------------------------------

Message: 15
Date: Wed, 22 Oct 2008 11:53:59 -0400
From: "Sean Davis" <sdavis2 at mail.nih.gov>
Subject: Re: [BioC] newbie problems with AnnBuilder
To: "Mark Kimpel" <mwkimpel at gmail.com>
Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
Message-ID:
	<264855a00810220853m5afad74apfb3df17499f90f49 at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

On Wed, Oct 22, 2008 at 11:07 AM, Mark Kimpel <mwkimpel at gmail.com> wrote:
> Build with AnnotationDbi was uneventful, but I have been unable to
> install the package or use it as is.
>
> If the package is just placed in my site-library, I get:
>   'ragene10stv1.db' is not a valid package -- installed < 2.0.0?
>
> If I tar the package and try R CMD INSTALL, I get:
>   cannot extract package from 'ragene10stv1.db.tar.gz'
>
> What approach should I be using?

You can just

R CMD INSTALL ragene10stv1.db

where ragene10stv1.db is the directory that contains the package
(right above the DESCRIPTION file).

Sean

> On Tue, Oct 21, 2008 at 6:41 PM, Marc Carlson <mcarlson at fhcrc.org> wrote:
>>
>> Hi Mark,
>>
>> AnnBuilder is on its way out.  Please have a look at the SQLforge.pdf
>> vignette which can be found here:
>>
>> http://www.bioconductor.org/packages/2.3/bioc/html/AnnotationDbi.html
>>
>> If you have further questions after reading this, then please just ask,
>> and we should be able to help you out.
>>
>>
>>  Marc
>>
>>
>>
>> Mark Kimpel wrote:
>> > I'm having problems getting AnnBuilder to work with the Affy Rat Gene ST
>> > array data supplied by Affy. Using the code chunk below, I am able to get
>> > AnnBuilder to create the annotation object, but it crashes, I believe, when
>> > it tries to save it. I should also mention that I had a previous crash when
>> > I had a madecdfenv package directory in place that used the same name. I got
>> > a "file lock" error, so I temporarily renamed the directory to see if this
>> > fixed the problem. As below, the error changed, but I still can't get the
>> > script to work.
>> >
>> > I suspect that there is a fundamental misunderstaning on my part related to
>> > how the annotation package should relate to the cdf package or some naming
>> > convention related to one or both.
>> >
>> > Mark
>> >
>> >
>> >> require(AnnBuilder); require(makecdfenv)
>> >> myBase <- "RaGene-1_0-st-v1.na26.rn4.transcript.probe-entrez_gene.csv"
>> >> myBaseType <- "ll"
>> >> mySrcUrls <- getSrcUrl("all", "Rattus norvegicus")
>> >>
>> >> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
>> >>
>> > +           myBaseType, pkgName =
>> > substring(cleancdfname("RaGene-1_0-st-v1"),
>> > +                         1, (nchar(cleancdfname("RaGene-1_0-st-v1")) - 3)),
>> > +           pkgPath = '~/R_HOME/site-library-2.8.0', organism = "Rattus
>> > norvegicus", version = "1.0",
>> > +           author = list(authors = "Mark W Kimpel", maintainer =
>> > +                           "mkimpel at iupui.edu"), fromWeb = TRUE)
>> > Read 1 item
>> > Read 1 item
>> > [1] "4450 2 2"
>> > Error in lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook) :
>> >   cannot open file
>> > '~/R_HOME/site-library-2.8.0/ragene10stv1/data/Rdata.rdb': No such file or
>> > directory
>> >
>> > Enter a frame number, or 0 to exit
>> >
>> > 1: ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
>> > myBaseTy
>> > 2: makeLLDB(file.path(pkgPath, pkgName), compress = TRUE)
>> > 3: tools:::makeLazyLoadDB(dataEnv, dbbase, compress = compress)
>> > 4: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,
>> > envhook)
>> > 5: function (e)
>> > 6: lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook)
>> >
>> > Selection: 1
>> > Browse[1]> annotation[1:2,]
>> >      ENTREZID    PROBE      ACCNUM GO
>> > [1,] "100008565" "10766774" NA     NA
>> > [2,] "100034253" "10937540" NA     "GO:0005525 at IEA;GO:0005622 at IEA"
>> >      PMID                SYMBOL
>> > [1,] "16804107;17292978" "Slc39a4l"
>> > [2,] "8889548"           "Gnl3l"
>> >      GENENAME                                                     CHR
>> > [1,] "solute carrier family 39 (zinc transporter), member 4-like" NA
>> > [2,] "guanine nucleotide binding protein-like 3 (nucleolar)-like" "X"
>> >      MAP        REFSEQ                      UNIGENE     CHRLOC        PATH
>> > [1,] NA         "NM_001101021;NP_001094491" "Rn.10120"  "NA"          NA
>> > [2,] "Xq14-q21" "NM_001081958;NP_001075427" "Rn.164274" "-40301218 at X" NA
>> >      ENZYME PFAM                  PROSITE
>> > [1,] NA     "NA at IPI00817057"      "NA at IPI00817057"
>> > [2,] NA     "PF01926 at IPI00362844" "NA at IPI00362844"
>> > ------------------------------------------------------------
>> > Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
>> > Indiana University School of Medicine
>> >
>> > 15032 Hunter Court, Westfield, IN  46074
>> >
>> > (317) 490-5129 Work, & Mobile & VoiceMail
>> > (317) 399-1219  Home
>> > Skype:  mkimpel
>> >
>> > "The real problem is not whether machines think but whether men do." -- B.
>> > F. Skinner
>> > ******************************************************************
>> >
>> >       [[alternative HTML version deleted]]
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at stat.math.ethz.ch
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>> >
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

------------------------------

Message: 16
Date: Wed, 22 Oct 2008 12:05:25 -0400
From: "Mark Kimpel" <mwkimpel at gmail.com>
Subject: Re: [BioC] newbie problems with AnnBuilder
To: "Sean Davis" <sdavis2 at mail.nih.gov>
Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
Message-ID:
	<6b93d1830810220905m46a4e41kfe052fd768d458e5 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Interesting, it works from a command line outside of R, but when I had
tried R> system('R CMD INSTALL ragene10stv1.db') I received an error
message. I have used this approach successfully with other packages to
avoid leaving R and starting a shell, but with this package I get:
'* Installing to library '/home/mkimpel/R_HOME/site-library-2.8.0'
ERROR: no packages specified'

Well, thanks, it now is installed. Any comment on why my system()
approach might not have worked?

Mark
------------------------------------------------------------
Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219  Home
Skype:  mkimpel

"The real problem is not whether machines think but whether men do."
-- B. F. Skinner
******************************************************************

On Wed, Oct 22, 2008 at 11:53 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> On Wed, Oct 22, 2008 at 11:07 AM, Mark Kimpel <mwkimpel at gmail.com> wrote:
>> Build with AnnotationDbi was uneventful, but I have been unable to
>> install the package or use it as is.
>>
>> If the package is just placed in my site-library, I get:
>>   'ragene10stv1.db' is not a valid package -- installed < 2.0.0?
>>
>> If I tar the package and try R CMD INSTALL, I get:
>>   cannot extract package from 'ragene10stv1.db.tar.gz'
>>
>> What approach should I be using?
>
> You can just
>
> R CMD INSTALL ragene10stv1.db
>
> where ragene10stv1.db is the directory that contains the package
> (right above the DESCRIPTION file).
>
> Sean
>
>
>> On Tue, Oct 21, 2008 at 6:41 PM, Marc Carlson <mcarlson at fhcrc.org> wrote:
>>>
>>> Hi Mark,
>>>
>>> AnnBuilder is on its way out.  Please have a look at the SQLforge.pdf
>>> vignette which can be found here:
>>>
>>> http://www.bioconductor.org/packages/2.3/bioc/html/AnnotationDbi.html
>>>
>>> If you have further questions after reading this, then please just ask,
>>> and we should be able to help you out.
>>>
>>>
>>>  Marc
>>>
>>>
>>>
>>> Mark Kimpel wrote:
>>> > I'm having problems getting AnnBuilder to work with the Affy Rat Gene ST
>>> > array data supplied by Affy. Using the code chunk below, I am able to get
>>> > AnnBuilder to create the annotation object, but it crashes, I believe, when
>>> > it tries to save it. I should also mention that I had a previous crash when
>>> > I had a madecdfenv package directory in place that used the same name. I got
>>> > a "file lock" error, so I temporarily renamed the directory to see if this
>>> > fixed the problem. As below, the error changed, but I still can't get the
>>> > script to work.
>>> >
>>> > I suspect that there is a fundamental misunderstaning on my part related to
>>> > how the annotation package should relate to the cdf package or some naming
>>> > convention related to one or both.
>>> >
>>> > Mark
>>> >
>>> >
>>> >> require(AnnBuilder); require(makecdfenv)
>>> >> myBase <- "RaGene-1_0-st-v1.na26.rn4.transcript.probe-entrez_gene.csv"
>>> >> myBaseType <- "ll"
>>> >> mySrcUrls <- getSrcUrl("all", "Rattus norvegicus")
>>> >>
>>> >> ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
>>> >>
>>> > +           myBaseType, pkgName =
>>> > substring(cleancdfname("RaGene-1_0-st-v1"),
>>> > +                         1, (nchar(cleancdfname("RaGene-1_0-st-v1")) - 3)),
>>> > +           pkgPath = '~/R_HOME/site-library-2.8.0', organism = "Rattus
>>> > norvegicus", version = "1.0",
>>> > +           author = list(authors = "Mark W Kimpel", maintainer =
>>> > +                           "mkimpel at iupui.edu"), fromWeb = TRUE)
>>> > Read 1 item
>>> > Read 1 item
>>> > [1] "4450 2 2"
>>> > Error in lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook) :
>>> >   cannot open file
>>> > '~/R_HOME/site-library-2.8.0/ragene10stv1/data/Rdata.rdb': No such file or
>>> > directory
>>> >
>>> > Enter a frame number, or 0 to exit
>>> >
>>> > 1: ABPkgBuilder(baseName = myBase, srcUrls = mySrcUrls, baseMapType =
>>> > myBaseTy
>>> > 2: makeLLDB(file.path(pkgPath, pkgName), compress = TRUE)
>>> > 3: tools:::makeLazyLoadDB(dataEnv, dbbase, compress = compress)
>>> > 4: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,
>>> > envhook)
>>> > 5: function (e)
>>> > 6: lazyLoadDBinsertValue(data, datafile, ascii, compress, envhook)
>>> >
>>> > Selection: 1
>>> > Browse[1]> annotation[1:2,]
>>> >      ENTREZID    PROBE      ACCNUM GO
>>> > [1,] "100008565" "10766774" NA     NA
>>> > [2,] "100034253" "10937540" NA     "GO:0005525 at IEA;GO:0005622 at IEA"
>>> >      PMID                SYMBOL
>>> > [1,] "16804107;17292978" "Slc39a4l"
>>> > [2,] "8889548"           "Gnl3l"
>>> >      GENENAME                                                     CHR
>>> > [1,] "solute carrier family 39 (zinc transporter), member 4-like" NA
>>> > [2,] "guanine nucleotide binding protein-like 3 (nucleolar)-like" "X"
>>> >      MAP        REFSEQ                      UNIGENE     CHRLOC        PATH
>>> > [1,] NA         "NM_001101021;NP_001094491" "Rn.10120"  "NA"          NA
>>> > [2,] "Xq14-q21" "NM_001081958;NP_001075427" "Rn.164274" "-40301218 at X" NA
>>> >      ENZYME PFAM                  PROSITE
>>> > [1,] NA     "NA at IPI00817057"      "NA at IPI00817057"
>>> > [2,] NA     "PF01926 at IPI00362844" "NA at IPI00362844"
>>> > ------------------------------------------------------------
>>> > Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
>>> > Indiana University School of Medicine
>>> >
>>> > 15032 Hunter Court, Westfield, IN  46074
>>> >
>>> > (317) 490-5129 Work, & Mobile & VoiceMail
>>> > (317) 399-1219  Home
>>> > Skype:  mkimpel
>>> >
>>> > "The real problem is not whether machines think but whether men do." -- B.
>>> > F. Skinner
>>> > ******************************************************************
>>> >
>>> >       [[alternative HTML version deleted]]
>>> >
>>> > _______________________________________________
>>> > Bioconductor mailing list
>>> > Bioconductor at stat.math.ethz.ch
>>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> >
>>> >
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>

------------------------------

Message: 17
Date: Wed, 22 Oct 2008 09:49:20 -0700
From: Robert Gentleman <rgentlem at fhcrc.org>
Subject: Re: [BioC] GOstats and org.EcK12.eg.db
To: Robert Castelo <robert.castelo at upf.edu>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <48FF5990.20501 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Robert,
   Yet another one that I will need to write some specialized code for. 
  I should get it done by the end of the week, and will push it to both 
release and devel.  I will post an email when it is done,

  best wishes
    Robert

Robert Castelo wrote:
> dear list,
> 
> I cannot get to work GOstats with the annotation for E. coli in
> org.EcK12.eg.db. Please find below the code that reproduces the problem
> including the error message, and my sessionInfo() at the end of this
> email. I have included the same exercise with the human annotation
> package org.Hs.eg.db which runs fine in my system. Any help with this
> will be very much appreciated.
> 
> thanks!!
> robert.
> ==========CODE STARTS HERE===========
> 
> library(org.Hs.eg.db)
> library(org.EcK12.eg.db)
> library(GOstats)
> 
> geneuniverse <- mappedkeys(org.EcK12.egSYMBOL)
> set.seed(12345)
> geneset <- sample(geneuniverse, size=100, replace=FALSE)
> 
> 
> goHypGparams <- new("GOHyperGParams",
>                     geneIds=geneset,
>                     universeGeneIds=geneuniverse,
>                     annotation="org.EcK12.eg.db", ontology="BP",
>                     pvalueCutoff=1.0, conditional=TRUE,
>                     testDirection="over")
> 
> goHypGcond <- hyperGTest(goHypGparams)
> 
> Error in get(mapName, envir = pkgEnv, inherits = FALSE) : 
>   variable "org.EcK12.egENTREZID" was not found
> Error in mget(probes, ID2EntrezID(datPkg)) : 
>   error in evaluating the argument 'envir' in selecting a method for
> function 'mget'
> 
> geneuniverse <- mappedkeys(org.Hs.egSYMBOL)
> set.seed(12345)
> geneset <- sample(geneuniverse, size=100, replace=FALSE)
> 
> 
> goHypGparams <- new("GOHyperGParams",
>                     geneIds=geneset,
>                     universeGeneIds=geneuniverse,
>                     annotation="org.Hs.eg.db", ontology="BP",
>                     pvalueCutoff=1.0, conditional=TRUE,
>                     testDirection="over")
> 
> goHypGcond <- hyperGTest(goHypGparams)
> 
> 
> sessionInfo()
> R version 2.8.0 beta (2008-10-05 r46601) 
> x86_64-unknown-linux-gnu 
> 
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] splines   tools     stats     graphics  grDevices utils
> datasets 
> [8] methods   base     
> 
> other attached packages:
>  [1] org.Hs.eg.db_2.2.6    GOstats_2.7.0         Category_2.7.6       
>  [4] genefilter_1.21.5     survival_2.34-1       RBGL_1.17.2          
>  [7] annotate_1.19.2       xtable_1.5-4          GO.db_2.2.5          
> [10] graph_1.19.6          org.EcK12.eg.db_2.2.6 AnnotationDbi_1.3.12 
> [13] RSQLite_0.7-0         DBI_0.2-4             Biobase_2.1.7        
> 
> loaded via a namespace (and not attached):
> [1] cluster_1.11.11 GSEABase_1.3.6  XML_1.98-1
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org

------------------------------

Message: 18
Date: Thu, 23 Oct 2008 01:14:09 +0800
From: Leon Yee <yee.leon at gmail.com>
Subject: Re: [BioC] quality assessment and preprocessing for tiling
	array-based CGH data
To: Sean Davis <sdavis2 at mail.nih.gov>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <48FF5F61.5060906 at gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

Sean Davis wrote:
>> Hi, Sean
>>
>>   Thanks for your advice. However, I have still several questions:
>>
>>   1. The input of dlrs is the log ratios, the log ration extracted from the
>> text file produced by Feature Extraction? or calculated from RGlist -->
>> MAlist ?  I have searched the mailist and seen a post of you mentioned the
>> difference of log ration from Feature Extraction and the default M value
>> from read.maimages.
> 
> You can read the Agilent FE manual for more details, but you can
> probably use either and come to very similar conclusions.  If you use
> the MAlist version, make sure to use only median centering or none for
> normalization.
> 
>>   2. I can get the log ratios of all features including control type of -1
>> and 1, but these features don't have chromosome positions, does this mean I
>> don't need all of them for quality assessment?
> 
> We have not routinely used these probes, no.  If an array fails
> miserably, then these control probes might be useful for determining
> the reason for the failure, though.
> 
>>   3. Some probes with the name of "chr2_random:xxxxx-yyyyyy" will not get a
>> proper mapping on the chromosome, so I should remove these values from the
>> input of dlrs. Is it so?
> 
> You can either remove them or treat chr2_random as a separate chromosome.
> 
>>   4. How could I handle those 1000 probes repeating 3 times?  They will be
>> mapped on the same chromosome position by three per group.
> 
> You could choose one at random or use a mean or median of them.  My
> guess is that they agree very closely with one another so the choice
> should not affect the results much.

Hi, Sean

     Thank you very much for your detailed reply and help.

     Where can I get the references or official documentations about 
dlrs method?

     In addition, we have design our array with dye-swap [test-cy3 vs 
ref-cy5, and test-cy5 vs ref-cy3]. Is there any method for utilizing the 
information here for quality assessment?

Best wishes!
Leon

------------------------------

Message: 19
Date: Wed, 22 Oct 2008 17:13:10 +0000 (UTC)
From: Shinichiro Wachi <swachi at ucdavis.edu>
Subject: [BioC] Bioconductor installation problem: unable to access
	repository
To: bioconductor at stat.math.ethz.ch
Message-ID: <loom.20081022T170334-485 at post.gmane.org>
Content-Type: text/plain; charset=us-ascii

I am installing Bioconductor on a new machine (Intel Mac running OSX 10.5.5), R
version is 2.8.0.

sudo R is used for this session.

These are the error messages I get:

source("http://bioconductor.org/biocLite.R")
biocinstall()

Running biocinstall version 2.3.8 with R version 2.8.0 (under development)
Your version of R requires version 2.3 of Bioconductor.
Warning: unable to access index for repository
http://bioconductor.org/packages/2.3/bioc/bin/macosx//contrib/2.8
Warning: unable to access index for repository
http://bioconductor.org/packages/2.3/data/annotation/bin/macosx//contrib/2.8
Warning: unable to access index for repository
http://bioconductor.org/packages/2.3/data/experiment/bin/macosx//contrib/2.8
Warning: unable to access index for repository
http://bioconductor.org/packages/2.3/extra/bin/macosx//contrib/2.8
Warning: unable to access index for repository
http://cran.fhcrc.org/bin/macosx//contrib/2.8
Warning message:
package 'Biobase' is not available 

http://bioconductor.org/packages/2.3/bioc/src/contrib/PACKAGES
will produce a web page displaying packages. (I read a post that asked for this.
I am assuming this was a quick server diagnostics).

Is there a problem with the server, or is this a temporary glitch in the
installer?  Is there a workaround?

Much thanks.

Shin

------------------------------

Message: 20
Date: Wed, 22 Oct 2008 13:29:58 -0400
From: "Sean Davis" <sdavis2 at mail.nih.gov>
Subject: Re: [BioC] quality assessment and preprocessing for tiling
	array-based CGH data
To: "Leon Yee" <yee.leon at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Message-ID:
	<264855a00810221029i18a08bc3nb1a34d786a6b30a6 at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

On Wed, Oct 22, 2008 at 1:14 PM, Leon Yee <yee.leon at gmail.com> wrote:
> Sean Davis wrote:
>>>
>>> Hi, Sean
>>>
>>>  Thanks for your advice. However, I have still several questions:
>>>
>>>  1. The input of dlrs is the log ratios, the log ration extracted from
>>> the
>>> text file produced by Feature Extraction? or calculated from RGlist -->
>>> MAlist ?  I have searched the mailist and seen a post of you mentioned
>>> the
>>> difference of log ration from Feature Extraction and the default M value
>>> from read.maimages.
>>
>> You can read the Agilent FE manual for more details, but you can
>> probably use either and come to very similar conclusions.  If you use
>> the MAlist version, make sure to use only median centering or none for
>> normalization.
>>
>>>  2. I can get the log ratios of all features including control type of -1
>>> and 1, but these features don't have chromosome positions, does this mean
>>> I
>>> don't need all of them for quality assessment?
>>
>> We have not routinely used these probes, no.  If an array fails
>> miserably, then these control probes might be useful for determining
>> the reason for the failure, though.
>>
>>>  3. Some probes with the name of "chr2_random:xxxxx-yyyyyy" will not get
>>> a
>>> proper mapping on the chromosome, so I should remove these values from
>>> the
>>> input of dlrs. Is it so?
>>
>> You can either remove them or treat chr2_random as a separate chromosome.
>>
>>>  4. How could I handle those 1000 probes repeating 3 times?  They will be
>>> mapped on the same chromosome position by three per group.
>>
>> You could choose one at random or use a mean or median of them.  My
>> guess is that they agree very closely with one another so the choice
>> should not affect the results much.
>
> Hi, Sean
>
>    Thank you very much for your detailed reply and help.
>
>    Where can I get the references or official documentations about dlrs
> method?

It is a standard robust estimator of the variance and is not specific
to microarrays.  If you look at the code, it simply subtracts the
difference between adjacent probes and then normalizes the result.  If
the array is "noisy", the dlrs will be high.  This assumes that the
contribution due to large copy number changes is negligible which is
likely true since even the most abnormal cancer samples have fewer
than 1000 breaks.

>    In addition, we have design our array with dye-swap [test-cy3 vs ref-cy5,
> and test-cy5 vs ref-cy3]. Is there any method for utilizing the information
> here for quality assessment?

Not that I know of, but you could certainly look at correlations
between replicates, etc.  Our experience with Agilent CGH arrays is
that the contribution due to dye bias is small compared to changes due
to copy number.

Sean

Sean

------------------------------

Message: 21
Date: Wed, 22 Oct 2008 13:40:01 -0400
From: "James W. MacDonald" <jmacdon at med.umich.edu>
Subject: Re: [BioC] GOstat: listing genes from hyperGTest

Cc: bioc <bioconductor at stat.math.ethz.ch>
Message-ID: <48FF6571.5090806 at med.umich.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Tim,

Yeah, probeSetSummary() is probably not what you want, if you are not 
starting with an Affy chip. There are some gymnastics required to map 
things back to the original Affy chip that you won't need to do. In 
addition, if you are not using a conditional hypergeometric analysis, it 
should be pretty simple to get what you want without even needing to 
parse things out of the GOHyperGResult object. An example:

## fake up some data

 > geneIds <- Lkeys(org.Hs.egGO)[sample(1:5000, 500)]
 > univ <- Lkeys(org.Hs.egGO)
 > param <- new("GOHyperGParams", geneIds = geneIds, 
universeGeneIds=univ, annotation="org.Hs.eg.db", ontology="BP")
 > hyp <- hyperGTest(param)
 > summary(hyp, categorySize=10)
       GOBPID      Pvalue OddsRatio   ExpCount Count Size 
   Term
1 GO:0007338 0.002723500  29.25101 0.07808304     2   54 single 
fertilization
2 GO:0009566 0.002925855  28.16374 0.08097501     2   56 
fertilization

So we have two terms of interest. Getting the Entrez Gene IDs from the 
input set that map to these terms is easy:

 > geneIds[geneIds %in% get("GO:0007338", revmap(org.Hs.egGO))]
[1] "100131137" "10007"

Now you might also want to know which 54 Entrez Gene IDs map to that 
particular GO term. Since you are not conditioning, this includes that 
particular GO term and all its offspring.

 > offspring <- get("GO:0007338", GOBPOFFSPRING)
 > egids <- unique(unlist(mget(c("GO:0007338", offspring), 
revmap(org.Hs.egGO), ifnotfound=NA), use.names=FALSE))
 > egids[!is.na(egids)]
  [1] "1047"      "4179"      "4240"      "4486"      "4809"      "5016" 

  [7] "6674"      "7783"      "7784"      "7802"      "7993"      "8747" 

[13] "8748"      "8852"      "9082"      "10007"     "10361"     "22917"
[19] "26476"     "53340"     "57055"     "57829"     "64100"     "93185"
[25] "158062"    "442868"    "100131137" "49"        "410"       "2683"
[31] "3010"      "4184"      "6677"      "7142"      "7455"      "8857"
[37] "11055"     "124626"    "2054"      "2741"      "10343"     "10566"
[43] "27297"     "152015"    "3074"      "167"       "928"       "2515"
[49] "5104"      "23553"     "284359"    "164684"    "7141"      "79400"

Best,

Jim

Tim Smith wrote:
> Thanks James. If I can tweak that function, I'll get exactly what I want. 
> 
> I tried what you suggested and got the following error:
> 
> ---------------------------
>  ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs 
>  
>   paramsGO <- new("GOHyperGParams", geneIds = genes1,
>            universeGeneIds = allGenes, annotation = "org.Hs.eg.db", 
>            ontology = "BP", pvalueCutoff = 1, conditional = FALSE, 
>            testDirection = "over")
>  
>  GO <- hyperGTest(paramsGO)
>  ps <- probeSetSummary(GO)
> 
> Error in get(mapName, envir = pkgEnv, inherits = FALSE) : 
>   variable "org.Hs.egENTREZID" was not found
> --------------------------------
> 
> I guess the function would return the probe ids if I was using them, but I have Entrez IDs as input.
> 
> Or am I doing something wrong?
> 
> thanks!
> 
> 
> 
> 
> 
> ----- Original Message ----
> From: James W. MacDonald <jmacdon at med.umich.edu>
> 
> Cc: bioc <bioconductor at stat.math.ethz.ch>
> Sent: Wednesday, October 22, 2008 9:10:39 AM
> Subject: Re: [BioC] GOstat: listing genes from hyperGTest
> 
> Hi Tim,
> 
> Does probeSetSummary() do what you want?
> 
> Best,
> 
> Jim
> 
> 
> 
> Tim Smith wrote:
>>  
>> Hi,
>>
>> I
>> was performing a hyperGTest for genes in homo-sapiens. For a set of
>> input genes, this function returns some 'significant' GO terms. What I
>> wanted to now do was to co-relate each significant GO term (returned by
>> this function) with genes (from my set of input genes) associated with
>> that GO term. However, I think that I may be using the wrong
>> package/function to get the releveant set of genes.
>>
>> Currently, what I'm doing is finding the significant GO terms by using the following code:
>>
>> -----------------------
>> ### 'genes1' are the Entrez IDs of my genes of interest, and 'allGenes' is the universe of Entrez IDs 
>>
>>  paramsGO <- new("GOHyperGParams", geneIds = genes1,
>>           universeGeneIds = allGenes, annotation = "org.Hs.eg.db", 
>>           ontology = "BP", pvalueCutoff = 1, conditional = FALSE, 
>>           testDirection = "over")
>>
>> GO <- hyperGTest(paramsGO)
>> --------------------------
>> This
>> gives me a set of significant GO terms. Now, I would like to find which
>> subset of genes in 'genes1' is associated with each of the significant
>> GO term. To do this I map all GO terms to their Entrez IDs using the
>> 'org.Hs.eg.db' package using the following:
>>
>> xx <- as.list(org.Hs.egGO2EG)
>>
>> to
>> get a mapping of GO terms to Entrez IDs. I get 6,756 GO terms (isn't
>> this number small?) that map to at least one Entrez ID. So, from here I
>> look up which Entrez IDs are associated with my GO term of interest.
>>
>> My
>> problem is that often, the GO term from hyperGTest is not associated
>> with any Entrez ID (using xx <- as.list(org.Hs.egGO2EG) described
>> above ), i.e. the GO term/ID is not in the list obtained from
>> 'org.Hs.egGO2EG'). For example, the term 'GO:0043284' is thrown up by
>> hyperGTest, but does not appear to be associated with any Entrez IDs in
>> the org.Hs.eg.db package. Where could I be going wrong?
>>
> [[elided Yahoo spam]]
>> Thanks for any comments/suggestions. I realize that I'm probably doing something really stupid here....
>>
>> My sessionInfo() is:
>> --------------------------------
>> R version 2.7.2 (2008-08-25) 
>> i386-pc-mingw32 
>>
>> locale:
>> LC_COLLATE=English_United
>> States.1252;LC_CTYPE=English_United
>> States.1252;LC_MONETARY=English_United
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>>  [1] grid      splines   tools     stats     graphics  grDevices utils     datasets  methods   base    
>>
>> other attached packages:
>>  [1]
>> gplots_2.6.0         gmodels_2.14.1       gtools_2.4.0        
>> gdata_2.4.1          Rgraphviz_1.18.1     GOstats_2.6.0      
>> Category_2.6.0      
>>  [8] RBGL_1.16.0          annotate_1.18.0    
>> xtable_1.5-2         graph_1.18.0         PFAM.db_2.2.0      
>> GO.db_2.2.0          KEGG.db_2.2.0      
>> [15] org.Hs.eg.db_2.2.0   AnnotationDbi_1.2.0  RSQLite_0.6-8        DBI_0.2-4            genefilter_1.20.0    survival_2.34-1      affy_1.18.0        
>> [22] preprocessCore_1.2.0 affyio_1.8.0         Biobase_2.0.0      
>>
>> loaded via a namespace (and not attached):
>> [1] cluster_1.11.11 MASS_7.2-44    
>>
>>
>> ---------------------------------
>>
>>
>>      
>>     [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662

------------------------------

Message: 22
Date: Wed, 22 Oct 2008 10:48:02 -0700
From: Patrick Aboyoun <paboyoun at fhcrc.org>
Subject: Re: [BioC] Bioconductor installation problem: unable to
	access	repository
To: Shinichiro Wachi <swachi at ucdavis.edu>
Cc: bioconductor at stat.math.ethz.ch
Message-ID: <48FF6752.1020901 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Shin,
I appears that R 2.8 has changed the way it regulates Mac OS X binary packages for users running Mac OS X 10.5 (Leopard). We have just become aware of this change and will be adjusting the Bioconductor Mac OS X repositories accordingly over the next few days to adjust to these changes. The good news is that R 2.8 supports binary packages for both Mac OS X 10.4 (Tiger) and Mac OS X 10.5 (Leopard). I'll send out an e-mail to this group when the Mac OS X 10.5 (Leopard) packages are available for BioC 2.3.

Patrick

Shinichiro Wachi wrote:
> I am installing Bioconductor on a new machine (Intel Mac running OSX 10.5.5), R
> version is 2.8.0.
>
> sudo R is used for this session.
>
> These are the error messages I get:
>
> source("http://bioconductor.org/biocLite.R")
> biocinstall()
>
> Running biocinstall version 2.3.8 with R version 2.8.0 (under development)
> Your version of R requires version 2.3 of Bioconductor.
> Warning: unable to access index for repository
> http://bioconductor.org/packages/2.3/bioc/bin/macosx//contrib/2.8
> Warning: unable to access index for repository
> http://bioconductor.org/packages/2.3/data/annotation/bin/macosx//contrib/2.8
> Warning: unable to access index for repository
> http://bioconductor.org/packages/2.3/data/experiment/bin/macosx//contrib/2.8
> Warning: unable to access index for repository
> http://bioconductor.org/packages/2.3/extra/bin/macosx//contrib/2.8
> Warning: unable to access index for repository
> http://cran.fhcrc.org/bin/macosx//contrib/2.8
> Warning message:
> package 'Biobase' is not available 
>
> http://bioconductor.org/packages/2.3/bioc/src/contrib/PACKAGES
> will produce a web page displaying packages. (I read a post that asked for this.
> I am assuming this was a quick server diagnostics).
>
> Is there a problem with the server, or is this a temporary glitch in the
> installer?  Is there a workaround?
>
> Much thanks.
>
> Shin
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

------------------------------

Message: 23
Date: Wed, 22 Oct 2008 13:36:30 -0700
From: Patrick Aboyoun <paboyoun at fhcrc.org>
Subject: [BioC] Bioconductor 2.3 is released
To: BioC list <bioconductor at stat.math.ethz.ch>
Message-ID: <48FF8ECE.4060201 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Bioconductors:

We are pleased to announce the release of Bioconductor 2.3. This release 
includes 36 new software packages, and many changes to existing 
packages. Bioconductor 2.3 is comprised of 294 software packages and is 
compatible with the recently released R 2.8.0.

Please visit http://bioconductor.org for details and downloads.

IMPORTANT NOTE FOR MAC USERS:  R 2.8.0 is using a new Mac OS X binary 
package distribution system and the CRAN and BioC repositories need to 
catch up with this change. If you are using Mac OS X, please refrain 
from migrating to R-2.8.0 until these new binary package repositories 
are put in place, or use 'type="source"' when installing packages using 
biocLite.

Contents
========

  o Release Highlight
  o Getting Started with Bioconductor 2.3
  o New Software Packages
  o Software Packages in 2.2 that didn't make it to 2.3

Release Highlight
=================

This release contains a septet of packages (BSgenome, Biostrings, 
ShortRead, IRanges, HilbertVis, HilbertVisGUI, and rtracklayer) that are 
suited to analyze 'next generation' high-throughput DNA sequence data. 
The BSgenome package provides the backbone for representing genome 
sequences from many different organisms including human, mouse, rat, 
dog, chimp, chicken, cow, fruit fly, honey bee, yeast, E. coli, C. 
elegans, and arabidopsis. The Biostrings package performs fast or 
flexible alignments between reads and genomes. The ShortRead package 
provides tools for importation/exportation and quality assurance of 
common data formats. The IRanges package offers an emerging 
infrastructure for managing very large data objects and for range-based 
data representation. The packages HilbertVis and HilbertVisGUI display 
data with space-filling (Hilbert) curves that maintain the spatial 
information implied by the linearity of chromosomes. The rtracklayer 
package provides an interface to genome browsers and their annotation 
tracks.

Getting Started with Bioconductor 2.3
=====================================

IMPORTANT: MAC USERS: see the important note above.

To install Bioconductor 2.3

1. Install R 2.8.0.  Bioconductor 2.3 has been designed expressly for 
this version of R.

2. Follow the instructions here:

   http://bioconductor.org/docs/install

Please visit http://bioconductor.org for details and downloads.

New Packages
============

The following packages are new in this release of Bioconductor; visit

  http://bioconductor.org/packages/release/Software.html

for links to all package descriptions.

affyContam
 Structured corruption of cel file data to demonstrate QA effectiveness

Agi4x44PreProcess
 Preprocesses Agilent 4x44 array data

ArrayExpress
 Accesses the ArrayExpress microarray database at EBI

arrayMvout
 Analyzes AffyBatch instances

ArrayTools
 Quality assessment and differentially gene expression detection for 
Affymetrix GeneChips

BicARE
 Biclustering Analysis and Results Exploration

CGHbase
 Base functions and classes for arrayCGH data analysis

CGHregions
 Dimension reduction for arrayCGH data with minimal information loss

ChemmineR
 Compound Data Mining Framework

CMA
 Synthesis of microarray-based classification

DFP
 Supervised technique for identifying differentially expressed genes 
using Fuzzy Patterns (FPs).

domainsignatures
 Finds significantly enriched gene classifications based on their 
InterPro domain structure

dualKS
 Training and classifying gene expression data sets using a 
Kolmogorov-Smirnov rank-sum based algorithm

edgeR
 Estimates and tests for differential expression in multiple digital 
gene expression libraries

HELP
 Pipeline for analyzing HELP microarray data that includes graphical and 
mathematical tools

HilbertVis
 Functions to visualize long vectors of integer data by means of Hilbert 
curves

HilbertVisGUI
 An interactive tool to visualize long vectors of integer data by means 
of Hilbert curves

IRanges
 Infrastructure for managing large data objects and range-based data 
representations

ITALICS
 Normalizes Affymetrix GeneChip Human Mapping 100K and 500K set

iterativeBMA
 Bayesian Model Averaging (BMA) of classification models of 2-class 
microarray samples

iterativeBMAsurv
 Uses Bayesian Model Averaging (BMA) of survival analysis models of 
microarray data

KCsmart
 Multi-sample aCGH analysis package using kernel convolution

logitT
 Implements the Logit-t algorithm

LPEadj
 Extends the LPE algorithm

MEDME
 Determines absolute and relative DNA methylation scores from MeDIP 
enrichment measurements

miRNApath
 Provides pathway enrichment techniques for miRNA expression data

microRNA
 Accesses different data resources for microRNAs

minet
 Implements methods for inferring mutual information networks from data.

multiscan
 Estimates gene expressions from several laser scans of the same microarray

parody
 Provides routines for univariate and multivariate outlier detection

PLPE
 Performs tests for paired high-throughput data

RNAither
 Analyzes cell-based RNAi screens

RpsiXML
 Queries, data structure and interface to visualization of interaction 
datasets

SIM
 Finds associations between DNA copy number and gene expression

ShortRead
 Representation of high-throughput, short-read sequencing data

xmapbridge
 Plots graphs in the X:Map genome browser

Software Packages in 2.2 that didn't make it to 2.3
===================================================

1. SemSim
2. widgetInvoke

[[elided Yahoo spam]]

        The Biocore Team

------------------------------

Message: 24
Date: Wed, 22 Oct 2008 16:04:53 -0500
From: Jenny Drnevich <drnevich at illinois.edu>
Subject: Re: [BioC] How to save result from limma
To: Gordon K Smyth <smyth at wehi.EDU.AU>
Cc: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
Message-ID: <200810222104.m9ML4ss3012169 at expredir5.cites.uiuc.edu>
Content-Type: text/plain; charset="us-ascii"; format=flowed

Hi Gordon,

I just downloaded the new R 2.8.0 release and limma 2.16.0. I was 
checking out the new sort="none" option in topTable and I found an 
error in the help page for ?topTable. It doesn't list sort.by="none" 
as a possibility for topTable or toptable, but does list it for 
topTableF. However, it's actually the reverse; sort.by="none" works 
when using topTable with only 1 coef but not with more than 1 coef. 
Just thought I'd let you and the archives know...

Thanks again!
Jenny

At 07:15 PM 8/19/2008, Gordon K Smyth wrote:

>On Tue, 19 Aug 2008, Jenny Drnevich wrote:
>
>>At 06:14 PM 8/13/2008, Gordon K Smyth wrote:
>>>OK, I've added sort="none" to the possibilities.
>>>Best wishes
>>>Gordon
>>
>>Hi Gordon,
>>
>>Should this change to topTable be up on BioC by now? I just updated 
>>my packages on R 2.7.1 and the latest limma_2.14.5 does not have 
>>it. Neither does the developmental version limma_2.15.10 on R 2.8.0 
>>dev. Usually your changes appear very quickly...
>>
[[elided Yahoo spam]]
>>Jenny
>
>Not committed to BioC yet.  I'm getting older and slower.  Also, 
>there will be a number of code additions in my next commit to BioC, 
>which I'm still finalising.
>
>Best wishes
>Gordon

Jenny Drnevich, Ph.D.

Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign

330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA

ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at illinois.edu

------------------------------

Message: 25
Date: Wed, 22 Oct 2008 17:39:24 -0400
From: "Hui-Yi Chu" <huiyi.chu at gmail.com>
Subject: [BioC] scale questions
To: bioconductor at stat.math.ethz.ch
Message-ID:
	<aaeddd3f0810221439r19fe9f56rc06ffc812fa8405e at mail.gmail.com>
Content-Type: text/plain

Dear List,

I think this may be a simple question for you but I wanna make it sure for
further steps.
I have already done some of data pre-processing procedures for my affymetrix
yeast2 arrays. My next step is to get *ratios* from various conditions in wt
and mutant following by fold-change comparison. So my question is which step
I should scale my dataframe for comparison?

Here are parts of my codes (codes with underline are the questions):

wt.pt.f1 <- exprs(esetsub[, 1])- exprs(esetsub[, 17])
wt.pt.f2 <- exprs(esetsub[, 2])- exprs(esetsub[, 18])
wt.pt.f <- cbind(wt.pt.f1, wt.pt.f2)
wt.pt.f <- new("ExpressionSet", exprs= as.matrix(wt.pt.f))
*wt.pt.f <- scale(exprs(wt.pt.f))      ### not sure

*mut.pt.f1 <- exprs(esetsub[, 9])- exprs(esetsub[, 21])
mut.pt.f2 <- exprs(esetsub[, 10])- exprs(esetsub[, 22])
mut.pt.f <- cbind(mut.pt.f1, mut.pt.f2)
mut.pt.f <- new("ExpressionSet", exprs= as.matrix(mut.pt.f))
*mut.pt.f <- scale(exprs(mut.pt.f))    **### not sure*

gg      <- cbind(wt.pt.f, mut.pt.f)
*gg      <- scale(gg)*              *### not sure*
pt.f1   <- gg[,3]-gg[,1]
pt.f2   <- gg[,4]-gg[,2]
gg1     <- cbind(gg, pt.f1, pt.f2)
gg2     <- new("ExpressionSet", exprs=as.matrix(gg))
....... followed by further steps

As seen above, where should I scale the ratios? Scale wt and mut separately
or together before getting pt.f1 and pt.f2 ratios??
[[elided Yahoo spam]]

Hui-Yi

	[[alternative HTML version deleted]]

------------------------------

Message: 26
Date: Wed, 22 Oct 2008 15:09:08 -0700
From: Florian Hahne <fhahne at fhcrc.org>
Subject: Re: [BioC] [Fwd: batch info for cellHTS]
To: Yan Zhou <Yan.Zhou at fccc.edu>
Cc: Bioconductor_help <bioconductor at stat.math.ethz.ch>
Message-ID: <48FFA484.7050405 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Yan,
the idea was that the user constructs the batch array manually and 
assigns it to the batch slot using the accessor method, e.g.
bt <- array(rep(1:2, each=5, dim=c(5,2,1))
batch(x) <- bt

I agree that it would be useful to have that functionality directly in 
the import functions. The only natural place where the batch information 
could go is in the platelist file. In the plateconf file, we don't have 
the notion of  plate replicates or samples any more.
Attached you find a slightly modified readPlateList function which 
evaluates an (optional) "Batch" column in the platelist file.
Hope that is the functionality you where looking for. Please let me know 
if you have further input.
Thanks,
Florian

PS: I forwarded this conversation to the mailing list. Others might 
benefit from it...

readPlateList <- function(filename,
                          path=dirname(filename),
                          name,
                          importFun,
                          verbose=interactive())
{
    file <- basename(filename)
    dfiles <- dir(path)

    if(!(is.character(path)&&length(path)==1))
        stop("'path' must be character of length 1")

    pd <- read.table(file.path(path, file), sep="\t", header=TRUE, 
as.is=TRUE)

    checkColumns(pd, file, mandatory=c("Filename", "Plate", "Replicate"),
                 numeric=c("Plate", "Replicate", "Channel", "Batch"))

    ## consistency check for "importFun"
    if (!missing(importFun)) {
        if (!is(importFun, "function"))
            stop("'importFun' should be a function to use to read the 
raw data files")
    } else {
        ## default function (compatible with the file format of the 
plate reader)
        importFun <- function(f) {
            txt <- readLines(f, warn=FALSE)
            sp <- strsplit(txt, "\t")
            well <- sapply(sp, "[", 2)
            val <- sapply(sp, "[", 3)
            out <- list(data.frame(well=I(well), val=as.numeric(val)), 
txt=I(txt))
            return(out)
        }
    }

    ## check if the data files are in the given directory
    a <- unlist(sapply(pd$Filename, function(z) grep(z, dfiles, 
ignore.case=TRUE)))
    if (length(a)==0)
        stop(sprintf("None of the files were found in the given 'path': 
%s", path))

    f <- file.path(path, dfiles[a])

    ## check if 'importFun' gives the output in the desired form
    aux <- importFun(f[1])
    if (which(unlist(lapply(aux, is, "data.frame"))) != 1 |
        !all(c("val", "well") %in% names(aux[[1]])) | length(aux)!=2)
        stop("The output of 'importFun' must be a list with 2 
components;\n",
             "the first component should be a 'data.frame' with slots 
'well' and 'val'.")

    ## auto-determine the plate format
    well <- as.character(importFun(f[1])[[1]]$well)
    let <- substr(well, 1, 1)
    num <- substr(well, 2, 3)
    let <- match(let, LETTERS)
    num <- as.integer(num)
    if(any(is.na(let))||any(is.na(num)))
        stop(sprintf("Malformated column 'well' in input file %s", f[1]))

    dimPlate <- c(nrow=max(let), ncol=max(num))
    nrWell <- prod(dimPlate)

    if(verbose)
        cat(sprintf("%s: found data in %d x %d (%d well) format.\n", name,
                    dimPlate[1], dimPlate[2], nrWell))

    ## Should we check whether these are true?
    ##     "96"  = c(nrow=8, ncol=12),
    ##     "384" = c(nrow=16, ncol=24),

    nrRep <- max(pd$Replicate)
    nrPlate <- max(pd$Plate)

    combo <- paste(pd$Plate, pd$Replicate)

    ## Channel: if not given, this implies that there is just one
    if("Channel" %in% colnames(pd)) {
        nrChannel <- max(pd$Channel)
        channel <- pd$Channel
        combo <- paste(combo, pd$Channel)
    } else {
        nrChannel <- 1L
        channel <- rep(1L, nrow(pd))
        pd$Channel <- channel   
    }

    whDup <- which(duplicated(combo))
    if(length(whDup)>0L) {
        idx <- whDup[1:min(5L, length(whDup))]
        msg <- paste("The following rows are duplicated in the plateList 
table:\n",
                     "\tPlate Replicate Channel\n", "\t",
                     paste(idx, combo[idx], sep="\t", collapse="\n\t"),
                     if(length(whDup)>5) sprintf("\n\t...and %d more.\n",
                                                 length(whDup)-5), "\n", 
sep="")
        stop(msg)
    }

    xraw <- array(NA_real_, dim=c(nrWell, nrPlate, nrRep, nrChannel))
    intensityFiles <- vector(mode="list", length=nrow(pd))
    names(intensityFiles) <- pd[, "Filename"]

    status <- character(nrow(pd))

    for(i in seq_len(nrow(pd))) {
        if(verbose)
            cat("\rReading ", i, ": ", pd$Filename[i], sep="")

        ff <- grep(pd[i, "Filename"], dfiles, ignore.case=TRUE)

        if (length(ff)!=1) {
            f <- file.path(path, pd[i, "Filename"])
            status[i] <- sprintf("File not found: %s", f)
        } else {
            f <- file.path(path, dfiles[ff])
            names(intensityFiles)[i] <- dfiles[ff]
            status[i] <- tryCatch({
                out <- importFun(f)
                pos <- convertWellCoordinates(out[[1]]$well, dimPlate)$num
                intensityFiles[[i]] <- out[[2]]
                xraw[pos, pd$Plate[i], pd$Replicate[i], channel[i]] <- 
out[[1]]$val
                "OK"
            }, warning=function(e) paste(class(e)[1], e$message, sep=": "),
                                  error=function(e)   paste(class(e)[1], 
e$message, sep=": ")
                                  ) ## tryCatch

        } ## else
    } ## for

    if(verbose)
        cat("\rRead", nrow(pd), "plates.             \n\n")

    ## ----  Store the data as a "cellHTS" object ----
    ## arrange the assayData slot:
    dat <- lapply(seq_len(nrChannel), function(ch)
                  matrix(xraw[,,,ch], ncol=nrRep, nrow=nrWell*nrPlate))
    names(dat) <- paste("ch", seq_len(nrChannel), sep="")

    adata <- do.call("assayDataNew", c(storage.mode="lockedEnvironment", 
dat))

    ## arrange the phenoData slot:
    pdata <- new("AnnotatedDataFrame",
                 data <- data.frame(replicate=seq_len(nrRep),
                                    assay=rep(name, nrRep),
                                    stringsAsFactors=FALSE),
                 varMetadata=data.frame(labelDescription=c("Replicate 
number",
                                                           "Biological 
assay"),
                                        channel=factor(rep("_ALL_", 2L),

levels=c(names(dat), "_ALL_")),
                                        row.names=c("replicate", "assay"),
                                        stringsAsFactors=FALSE))

    ## arrange the featureData slot:
    well <- convertWellCoordinates(seq_len(nrWell), pdim=dimPlate)$letnum
    fdata <- new("AnnotatedDataFrame",
                 data <- data.frame(plate=rep(seq_len(nrPlate), 
each=nrWell),
                                    well=rep(well, nrPlate),
                                    controlStatus=factor(rep("unknown", 
nrWell*nrPlate)),
                                    stringsAsFactors=FALSE),
                 varMetadata=data.frame(labelDescription=c("Plate 
number", "Well ID",
                                                           "Well 
annotation"),
                                        row.names=c("plate", "well", 
"controlStatus"),
                                        stringsAsFactors=FALSE))
    res <- new("cellHTS",
               assayData=adata,
               phenoData=pdata,
               featureData=fdata,
               plateList=cbind(pd[,1L,drop=FALSE], status=I(status), 
pd[,-1L,drop=FALSE]),
               intensityFiles=intensityFiles)

    ## if there is a batch column in the platelist file we want to import it
    if("Batch" %in% colnames(pd)){
        bat <- pd$Batch[order(pd$Replicate, pd$Channel)]
        dim(bat) <- c(max(plate(res)), ncol(res), length(channelNames(res)))
        res at batch <- bat
    }

    ## output the possible errors that were encountered along the way:
    whHadProbs <- which(status!="OK")
    if(length(whHadProbs)>0 & verbose) {
        idx <- whHadProbs[1:min(5, length(whHadProbs))]
        msg <- paste("Please check the following problems encountered 
while reading the data:\n",
                     "\tFilename \t Error\n", "\t",
                     paste(plateList(res)$Filename[idx], status[idx], 
sep="\t", collapse="\n\t"),
                     if(length(whHadProbs)>5) sprintf("\n\t...and %d 
more.\n",

length(whHadProbs)-5), "\n", sep="")
        stop(msg)
    }

    return(res)
}

Yan Zhou wrote:
> Dear Florian,
>
> I understand the different meaning of batch from the 2 cellHTS 
> packages. I just don't know how to add the "batch" slot.(we have a 
> large screen with screen done on different day,we want to use the 
> variance adjustment by batch function). I tried to add a "batch" 
> column in the plate configuration file, but doesn't seem to be taken 
> into cellHTS2 object. then I tried to add "batch" column in the plate 
> list file , still didn't do anything. In another word, I couldn't 
> figure out how to add the batch slot to the cellHTS object. please 
[[elided Yahoo spam]]
>
> yan
>
> Florian Hahne wrote:
>
>> Hi Yan,
>> Ligia forwarded your mail to me.
>>
>> The batch concept is a little bit different in cellHTS2. Basically, 
>> we separated the ability to included several plate configurations and 
>> the batch-specific parameter estimation (e.g. in the normalizePlate 
>> function). For the former, you can use a regular expression syntax in 
>> your plate configuration file, while the latter has to be added 
>> manually into the new 'batch' slot. All of this is explained in much 
>> more detail in the package vignette: cellHTS2 - Main vignette 
>> (complete version): End-to-end analysis of cell-based screens
>>
>> Let me know if this doesn't get you anywhere.
>>
>> Florian
>>
>> ligia at ebi.ac.uk wrote:
>>
>>> Hi Florian,
>>>
>>> Hope you're doing fine.
>>> Could you take care of this for me?
>>> Cheers,
>>> Ligia
>>>
>>>
>>> ------------------------- Original Message ----------------------------
>>> Subject: batch info for cellHTS
>>> From:    "Yan Zhou" <Yan.Zhou at fccc.edu>
>>> Date:    Fri, October 17, 2008 19:14
>>> To:      ligia at ebi.ac.uk
>>> -------------------------------------------------------------------------- 
>>>
>>>
>>> Dear Ligia,
>>>
>>> I'm using the cellHTS2 package for HTS data analysis. For the old
>>> cellHTS package, I knew how to incorporate the batch information. But
>>> for the new cellHTS2 package, I couldn't have it done right. I attached
>>> my plate configuration file and also plate list file with this email. I
>>> was wondering whether you would have time to help me take a look, and
>>> point me to the right directions. Thanks a lot for your time and 
>>> kind help.
>>>
>>> yan
>>>   
>>
>>
>>
>

-- 
Florian Hahne, PhD
Computational Biology Program
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-3148
fhahne at fhcrc.org

------------------------------

Message: 27
Date: Wed, 22 Oct 2008 19:19:26 -0400
From: "Mark Kimpel" <mwkimpel at gmail.com>
Subject: [BioC] problem with Category package and custom annotationDbi
To: "Bioconductor Newsgroup" <bioconductor at stat.math.ethz.ch>
Message-ID:
	<6b93d1830810221619x7ad5a565hf08bf287839f74f0 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Today I believe I successfuly built a annotation package for my Affy
Rat Gene ST data using annotationDbi, at least I got no errors during
the build and it loads properly. I get the following error output,
however, when I try to run hyperGTest, package Category, on a vector
of Entrez Gene IDs and a vector of the gene universe of the chip. I
suspect I did something wrong when building the annotation package,
but I have no clue what that could be. I've used this same code with
chipsets whose annotation packages are built by the BioConductor team
without issue.

> params <- new("GOHyperGParams", geneIds = myEGs,
+                   universeGeneIds = myGeneUniverse,
+               annotation = annotation(AOP$eSet),
+                   ontology = "BP", pvalueCutoff = 0.05, conditional
= TRUE, testDirection = "over")
> params
A GOHyperGParams instance
  category: GO
annotation: ragene10stv1
> hyperGTest(params)
Error in getUniverseHelper(probes, datPkg, entrezIds) :
  No Entrez Gene ids left in universe

Enter a frame number, or 0 to exit

1: hyperGTest(params)
2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", "hyperGT
3: is(object, Cl)
4: is(object, Cl)
5: universeBuilder(p)
6: universeBuilder(p)
7: getUniverseViaGo(p)
8: getUniverseHelper(probes, datPkg, entrezIds)

Selection: 8
Called from: eval(expr, envir, enclos)
Browse[1]> ls()
[1] "datPkg"    "entrezIds" "probes"    "univ"
Browse[1]> datPkg
An object of class "ArabadopsisDatPkg"
Slot "name":
[1] "ragene10stv1"

Browse[1]> entrezIds[1:5]
[1] "65049"  "60444"  "313914" "140941" "306868"
Browse[1]> probes[1:5]
[1] "10701636" "10701643" "10701654" "10701663" "10701679"
Browse[1]> univ
character(0)
Browse[1]>

------------------------------------------------------------
Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219  Home
Skype:  mkimpel

"The real problem is not whether machines think but whether men do."
-- B. F. Skinner

------------------------------

Message: 28
Date: Wed, 22 Oct 2008 16:41:38 -0700
From: Marc Carlson <mcarlson at fhcrc.org>
Subject: Re: [BioC] problem with Category package and custom
	annotationDbi
To: Mark Kimpel <mwkimpel at gmail.com>
Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
Message-ID: <48FFBA32.3030406 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1

Hi Mark,

If the package you built and installed were called "MarkPackage.db",
then what would be the output of "MarkPackage()" (right after you loaded
it)?  Also, what is the output of sessionInfo()?

  Marc

Mark Kimpel wrote:
> Today I believe I successfuly built a annotation package for my Affy
> Rat Gene ST data using annotationDbi, at least I got no errors during
> the build and it loads properly. I get the following error output,
> however, when I try to run hyperGTest, package Category, on a vector
> of Entrez Gene IDs and a vector of the gene universe of the chip. I
> suspect I did something wrong when building the annotation package,
> but I have no clue what that could be. I've used this same code with
> chipsets whose annotation packages are built by the BioConductor team
> without issue.
>
>   
>> params <- new("GOHyperGParams", geneIds = myEGs,
>>     
> +                   universeGeneIds = myGeneUniverse,
> +               annotation = annotation(AOP$eSet),
> +                   ontology = "BP", pvalueCutoff = 0.05, conditional
> = TRUE, testDirection = "over")
>   
>> params
>>     
> A GOHyperGParams instance
>   category: GO
> annotation: ragene10stv1
>   
>> hyperGTest(params)
>>     
> Error in getUniverseHelper(probes, datPkg, entrezIds) :
>   No Entrez Gene ids left in universe
>
> Enter a frame number, or 0 to exit
>
> 1: hyperGTest(params)
> 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", "hyperGT
> 3: is(object, Cl)
> 4: is(object, Cl)
> 5: universeBuilder(p)
> 6: universeBuilder(p)
> 7: getUniverseViaGo(p)
> 8: getUniverseHelper(probes, datPkg, entrezIds)
>
> Selection: 8
> Called from: eval(expr, envir, enclos)
> Browse[1]> ls()
> [1] "datPkg"    "entrezIds" "probes"    "univ"
> Browse[1]> datPkg
> An object of class "ArabadopsisDatPkg"
> Slot "name":
> [1] "ragene10stv1"
>
> Browse[1]> entrezIds[1:5]
> [1] "65049"  "60444"  "313914" "140941" "306868"
> Browse[1]> probes[1:5]
> [1] "10701636" "10701643" "10701654" "10701663" "10701679"
> Browse[1]> univ
> character(0)
> Browse[1]>
>
> ------------------------------------------------------------
> Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
> Indiana University School of Medicine
>
> 15032 Hunter Court, Westfield, IN  46074
>
> (317) 490-5129 Work, & Mobile & VoiceMail
> (317) 399-1219  Home
> Skype:  mkimpel
>
> "The real problem is not whether machines think but whether men do."
> -- B. F. Skinner
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>

------------------------------

Message: 29
Date: Wed, 22 Oct 2008 19:51:17 -0400
From: "Sean Davis" <sdavis2 at mail.nih.gov>
Subject: Re: [BioC] scale questions
To: "Hui-Yi Chu" <huiyi.chu at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Message-ID:
	<264855a00810221651n656b5a8ak226daa013f282fab at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

On Wed, Oct 22, 2008 at 5:39 PM, Hui-Yi Chu <huiyi.chu at gmail.com> wrote:
> Dear List,
>
> I think this may be a simple question for you but I wanna make it sure for
> further steps.
> I have already done some of data pre-processing procedures for my affymetrix
> yeast2 arrays. My next step is to get *ratios* from various conditions in wt
> and mutant following by fold-change comparison. So my question is which step
> I should scale my dataframe for comparison?
>
> Here are parts of my codes (codes with underline are the questions):
>
> wt.pt.f1 <- exprs(esetsub[, 1])- exprs(esetsub[, 17])
> wt.pt.f2 <- exprs(esetsub[, 2])- exprs(esetsub[, 18])
> wt.pt.f <- cbind(wt.pt.f1, wt.pt.f2)
> wt.pt.f <- new("ExpressionSet", exprs= as.matrix(wt.pt.f))
> *wt.pt.f <- scale(exprs(wt.pt.f))      ### not sure
>
> *mut.pt.f1 <- exprs(esetsub[, 9])- exprs(esetsub[, 21])
> mut.pt.f2 <- exprs(esetsub[, 10])- exprs(esetsub[, 22])
> mut.pt.f <- cbind(mut.pt.f1, mut.pt.f2)
> mut.pt.f <- new("ExpressionSet", exprs= as.matrix(mut.pt.f))
> *mut.pt.f <- scale(exprs(mut.pt.f))    **### not sure*
>
> gg      <- cbind(wt.pt.f, mut.pt.f)
> *gg      <- scale(gg)*              *### not sure*
> pt.f1   <- gg[,3]-gg[,1]
> pt.f2   <- gg[,4]-gg[,2]
> gg1     <- cbind(gg, pt.f1, pt.f2)
> gg2     <- new("ExpressionSet", exprs=as.matrix(gg))
> ....... followed by further steps
>
>
> As seen above, where should I scale the ratios? Scale wt and mut separately
> or together before getting pt.f1 and pt.f2 ratios??
[[elided Yahoo spam]]

I may be misunderstanding, but why do you want to scale the log
ratios?  Generally, you would not scale them at all.  And I really
cannot tell why you are forming ratios in the first place?  Do you
have replicates of any kind?

Sean

------------------------------

Message: 30
Date: Wed, 22 Oct 2008 20:31:33 -0400
From: "Sean Davis" <sdavis2 at mail.nih.gov>
Subject: Re: [BioC] scale questions
To: "Hui-Yi Chu" <huiyi.chu at gmail.com>
Cc: bioconductor at stat.math.ethz.ch
Message-ID:
	<264855a00810221731p1abc243cj1a59dc6726ebc7e0 at mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

On Wed, Oct 22, 2008 at 8:16 PM, Hui-Yi Chu <huiyi.chu at gmail.com> wrote:
> Hi Sean,
>
> Yes, they are replicate in ratios. In other words, these ratios are from
> untreatment and treatment of wt1, wt2, mut1,mut2, so total is 8 ratios as
> below.
>
> untreatmen wt1, wt2, mut1, mut2
> treatment   wt1, wt2, mut1, mut2
>
> And I wanna get the second values like:
> r1: untreatment  mut1/wt1
> r2: untreatment  mut2/wt2
> r3: treatment     mut1/wt1
> r4: treatment     mut2/wt2
>
> Now we have 4 ratios. And then I will compare r3 vs r1, r4 vs r2 to get most
> fold changes genes. (I know I need three replicates, but  I cannot convince
> my adviser, therefore, this is the strategy I can use so far. ) So the
> question is should I scale the 8 ratios from wt and mut separately (twice)
> or together(once) before I get the four values?

Hi, Hui-Yi.

You should not do any scaling.  You can use the log fold changes
directly.  You seem to understand that this is not an optimal design,
so I won't belabor that point.

Sean

>
> On Wed, Oct 22, 2008 at 7:51 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>>
>> On Wed, Oct 22, 2008 at 5:39 PM, Hui-Yi Chu <huiyi.chu at gmail.com> wrote:
>> > Dear List,
>> >
>> > I think this may be a simple question for you but I wanna make it sure
>> > for
>> > further steps.
>> > I have already done some of data pre-processing procedures for my
>> > affymetrix
>> > yeast2 arrays. My next step is to get *ratios* from various conditions
>> > in wt
>> > and mutant following by fold-change comparison. So my question is which
>> > step
>> > I should scale my dataframe for comparison?
>> >
>> > Here are parts of my codes (codes with underline are the questions):
>> >
>> > wt.pt.f1 <- exprs(esetsub[, 1])- exprs(esetsub[, 17])
>> > wt.pt.f2 <- exprs(esetsub[, 2])- exprs(esetsub[, 18])
>> > wt.pt.f <- cbind(wt.pt.f1, wt.pt.f2)
>> > wt.pt.f <- new("ExpressionSet", exprs= as.matrix(wt.pt.f))
>> > *wt.pt.f <- scale(exprs(wt.pt.f))      ### not sure
>> >
>> > *mut.pt.f1 <- exprs(esetsub[, 9])- exprs(esetsub[, 21])
>> > mut.pt.f2 <- exprs(esetsub[, 10])- exprs(esetsub[, 22])
>> > mut.pt.f <- cbind(mut.pt.f1, mut.pt.f2)
>> > mut.pt.f <- new("ExpressionSet", exprs= as.matrix(mut.pt.f))
>> > *mut.pt.f <- scale(exprs(mut.pt.f))    **### not sure*
>> >
>> > gg      <- cbind(wt.pt.f, mut.pt.f)
>> > *gg      <- scale(gg)*              *### not sure*
>> > pt.f1   <- gg[,3]-gg[,1]
>> > pt.f2   <- gg[,4]-gg[,2]
>> > gg1     <- cbind(gg, pt.f1, pt.f2)
>> > gg2     <- new("ExpressionSet", exprs=as.matrix(gg))
>> > ....... followed by further steps
>> >
>> >
>> > As seen above, where should I scale the ratios? Scale wt and mut
>> > separately
>> > or together before getting pt.f1 and pt.f2 ratios??
[[elided Yahoo spam]]
>>
>> I may be misunderstanding, but why do you want to scale the log
>> ratios?  Generally, you would not scale them at all.  And I really
>> cannot tell why you are forming ratios in the first place?  Do you
>> have replicates of any kind?
>>
>> Sean
>
>

------------------------------

Message: 31
Date: Wed, 22 Oct 2008 21:26:16 -0400
From: "Mark Kimpel" <mwkimpel at gmail.com>
Subject: Re: [BioC] problem with Category package and custom
	annotationDbi
To: "Marc Carlson" <mcarlson at fhcrc.org>
Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
Message-ID:
	<6b93d1830810221826y71050c5ckb14dbca4644c9bf7 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

> ragene10stv1()
Quality control information for ragene10stv1:

This package has the following mappings:

ragene10stv1ACCNUM has 0 mapped keys (of 29214 keys)
ragene10stv1ALIAS2PROBE has 31410 mapped keys (of 31410 keys)
ragene10stv1CHR has 20998 mapped keys (of 29214 keys)
ragene10stv1CHRLENGTHS has 23 mapped keys (of 23 keys)
ragene10stv1CHRLOC has 13092 mapped keys (of 29214 keys)
ragene10stv1CHRLOCEND has 13092 mapped keys (of 29214 keys)
ragene10stv1ENSEMBL has 15812 mapped keys (of 29214 keys)
ragene10stv1ENSEMBL2PROBE has 14570 mapped keys (of 14570 keys)
ragene10stv1ENTREZID has 21170 mapped keys (of 29214 keys)
ragene10stv1ENZYME has 1586 mapped keys (of 29214 keys)
ragene10stv1ENZYME2PROBE has 705 mapped keys (of 705 keys)
ragene10stv1GENENAME has 21170 mapped keys (of 29214 keys)
ragene10stv1GO has 13813 mapped keys (of 29214 keys)
ragene10stv1GO2ALLPROBES has 9850 mapped keys (of 9850 keys)
ragene10stv1GO2PROBE has 7430 mapped keys (of 7430 keys)
ragene10stv1MAP has 20351 mapped keys (of 29214 keys)
ragene10stv1PATH has 4357 mapped keys (of 29214 keys)
ragene10stv1PATH2PROBE has 206 mapped keys (of 206 keys)
ragene10stv1PFAM has 17187 mapped keys (of 29214 keys)
ragene10stv1PMID has 12198 mapped keys (of 29214 keys)
ragene10stv1PMID2PROBE has 36641 mapped keys (of 36641 keys)
ragene10stv1PROSITE has 17187 mapped keys (of 29214 keys)
ragene10stv1REFSEQ has 18683 mapped keys (of 29214 keys)
ragene10stv1SYMBOL has 21170 mapped keys (of 29214 keys)
ragene10stv1UNIGENE has 17585 mapped keys (of 29214 keys)
ragene10stv1UNIPROT has 9826 mapped keys (of 29214 keys)

Additional Information about this package:

DB schema: RATCHIP_DB
DB schema version: 1.0
Organism: Rattus norvegicus
Date for NCBI data: 2008-Sep2
Date for GO data: 200808
Date for KEGG data: 2008-Sep2
Date for Golden Path data: 2006-Jun20
Date for IPI data: 2008-Sep02
Date for Ensembl data: 2008-Jul23
> sessionInfo()
R version 2.8.0 (2008-10-20)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] ragene10stv1.db_1.0.0 RSQLite_0.7-0         DBI_0.2-4
[4] AnnotationDbi_1.4.0   Biobase_2.2.0         graph_1.20.0

loaded via a namespace (and not attached):
[1] cluster_1.11.11
>
------------------------------------------------------------
Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 399-1219  Home
Skype:  mkimpel

"The real problem is not whether machines think but whether men do."
-- B. F. Skinner
******************************************************************

On Wed, Oct 22, 2008 at 7:41 PM, Marc Carlson <mcarlson at fhcrc.org> wrote:
> Hi Mark,
>
> If the package you built and installed were called "MarkPackage.db",
> then what would be the output of "MarkPackage()" (right after you loaded
> it)?  Also, what is the output of sessionInfo()?
>
>  Marc
>
>
>
>
> Mark Kimpel wrote:
>> Today I believe I successfuly built a annotation package for my Affy
>> Rat Gene ST data using annotationDbi, at least I got no errors during
>> the build and it loads properly. I get the following error output,
>> however, when I try to run hyperGTest, package Category, on a vector
>> of Entrez Gene IDs and a vector of the gene universe of the chip. I
>> suspect I did something wrong when building the annotation package,
>> but I have no clue what that could be. I've used this same code with
>> chipsets whose annotation packages are built by the BioConductor team
>> without issue.
>>
>>
>>> params <- new("GOHyperGParams", geneIds = myEGs,
>>>
>> +                   universeGeneIds = myGeneUniverse,
>> +               annotation = annotation(AOP$eSet),
>> +                   ontology = "BP", pvalueCutoff = 0.05, conditional
>> = TRUE, testDirection = "over")
>>
>>> params
>>>
>> A GOHyperGParams instance
>>   category: GO
>> annotation: ragene10stv1
>>
>>> hyperGTest(params)
>>>
>> Error in getUniverseHelper(probes, datPkg, entrezIds) :
>>   No Entrez Gene ids left in universe
>>
>> Enter a frame number, or 0 to exit
>>
>> 1: hyperGTest(params)
>> 2: .valueClassTest(standardGeneric("hyperGTest"), "HyperGResultBase", "hyperGT
>> 3: is(object, Cl)
>> 4: is(object, Cl)
>> 5: universeBuilder(p)
>> 6: universeBuilder(p)
>> 7: getUniverseViaGo(p)
>> 8: getUniverseHelper(probes, datPkg, entrezIds)
>>
>> Selection: 8
>> Called from: eval(expr, envir, enclos)
>> Browse[1]> ls()
>> [1] "datPkg"    "entrezIds" "probes"    "univ"
>> Browse[1]> datPkg
>> An object of class "ArabadopsisDatPkg"
>> Slot "name":
>> [1] "ragene10stv1"
>>
>> Browse[1]> entrezIds[1:5]
>> [1] "65049"  "60444"  "313914" "140941" "306868"
>> Browse[1]> probes[1:5]
>> [1] "10701636" "10701643" "10701654" "10701663" "10701679"
>> Browse[1]> univ
>> character(0)
>> Browse[1]>
>>
>> ------------------------------------------------------------
>> Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
>> Indiana University School of Medicine
>>
>> 15032 Hunter Court, Westfield, IN  46074
>>
>> (317) 490-5129 Work, & Mobile & VoiceMail
>> (317) 399-1219  Home
>> Skype:  mkimpel
>>
>> "The real problem is not whether machines think but whether men do."
>> -- B. F. Skinner
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>
>

------------------------------

Message: 32
Date: Thu, 23 Oct 2008 14:07:48 +1100 (AUS Eastern Daylight Time)
From: Gordon K Smyth <smyth at wehi.EDU.AU>
Subject: Re: [BioC] How to save result from limma
To: Jenny Drnevich <drnevich at illinois.edu>
Cc: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
Message-ID: <Pine.WNT.4.64.0810231405491.2800 at PC602.alpha.wehi.edu.au>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

Dear Jenny,

Thanks for the heads-up.  I believe I have fixed it entirely now in limma 
2.16.2.  Try it out and see if you can break it.

Best wishes
Gordon

On Wed, 22 Oct 2008, Jenny Drnevich wrote:

> Hi Gordon,
>
> I just downloaded the new R 2.8.0 release and limma 2.16.0. I was checking 
> out the new sort="none" option in topTable and I found an error in the help 
> page for ?topTable. It doesn't list sort.by="none" as a possibility for 
> topTable or toptable, but does list it for topTableF. However, it's actually 
> the reverse; sort.by="none" works when using topTable with only 1 coef but 
> not with more than 1 coef. Just thought I'd let you and the archives know...
>
> Thanks again!
> Jenny
>
> At 07:15 PM 8/19/2008, Gordon K Smyth wrote:
>
>
>> On Tue, 19 Aug 2008, Jenny Drnevich wrote:
>> 
>>> At 06:14 PM 8/13/2008, Gordon K Smyth wrote:
>>>> OK, I've added sort="none" to the possibilities.
>>>> Best wishes
>>>> Gordon
>>> 
>>> Hi Gordon,
>>> 
>>> Should this change to topTable be up on BioC by now? I just updated my 
>>> packages on R 2.7.1 and the latest limma_2.14.5 does not have it. Neither 
>>> does the developmental version limma_2.15.10 on R 2.8.0 dev. Usually your 
>>> changes appear very quickly...
>>> 
[[elided Yahoo spam]]
>>> Jenny
>> 
>> Not committed to BioC yet.  I'm getting older and slower.  Also, there will 
>> be a number of code additions in my next commit to BioC, which I'm still 
>> finalising.
>> 
>> Best wishes
>> Gordon
>
> Jenny Drnevich, Ph.D.
>
> Functional Genomics Bioinformatics Specialist
> W.M. Keck Center for Comparative and Functional Genomics
> Roy J. Carver Biotechnology Center
> University of Illinois, Urbana-Champaign
>
> 330 ERML
> 1201 W. Gregory Dr.
> Urbana, IL 61801
> USA
>
> ph: 217-244-7355
> fax: 217-265-5066
> e-mail: drnevich at illinois.edu 
>

------------------------------

Message: 33
Date: Wed, 22 Oct 2008 22:28:59 -0500
From: "Wei,Caimiao" <caiwei at mdanderson.org>
Subject: [BioC] Package "xps" "import.expr.scheme" error
To: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
Message-ID:
	<19D18D962A716B4EA5D26CB24F14DC26339D0DE943 at DCPWVMBXC1VS3.mdanderson.edu>

Content-Type: text/plain

I am importing chip definition and annotation files to create a ROOT scheme and get this error:

Error in import.expr.scheme(filename = "Scheme_HGU133p2_na26", filedir = scmdir,  :
  error in function 'ImportExprSchemes'

> libdir <- "/mypath/Affy/libraryfiles"
> anndir <- "/mypath/Affy/Annotation"
> scmdir <- "/mypath/CRAN/Workspaces/Schemes"
>
> scheme.hgu133p2.na26 <- import.expr.scheme(filename="Scheme_HGU133p2_na26",
+ filedir=scmdir,schemefile=paste(libdir,"HG-U133_Plus_2.cdf",sep="/"),
+ probefile=paste(libdir,"HG-U133_Plus_2.probe.tab",sep="/"),
+  annotfile=paste(anndir,"HG-U133_Plus_2.na26.annot.csv",sep="/"))

Error in import.expr.scheme(filename = "Scheme_HGU133p2_na26", filedir = scmdir,  :
  error in function 'ImportExprSchemes'

> sessionInfo()
R version 2.7.2 (2008-08-25)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] xps_1.0.2
>

Thanks for any help!

Caimiao

	[[alternative HTML version deleted]]

------------------------------

Message: 34
Date: Thu, 23 Oct 2008 02:49:47 -0400
From: Leon Peshkin <pesha at hms.harvard.edu>
Subject: Re: [BioC] Lumi and Beadstudio 1.5.13
To: <bioconductor at stat.math.ethz.ch>
Message-ID: <9EF526FD-9EA0-4489-83D1-6F3E9B93FAE9 at hms.harvard.edu>
Content-Type: text/plain; charset="US-ASCII"; format=flowed; delsp=yes

Hi Pan,
    I was wondering if you could help me resolve the issue with lumi  
package,
I am able to load Illumina data with lumiR, but then when I try  
background adjustment
it fails:

  A0 <- lumiR("killme.txt", convertNuID =FALSE, inputAnnotation =FALSE)
 > B0 <- lumiB(A0,method='bgAdjust')
Error in `[.data.frame`(control, , sampleNames(lumiBatch)) :
   undefined columns selected

-Leon

------------------------------

Message: 35
Date: Thu, 23 Oct 2008 10:43:18 +0200 (CEST)
From: Clara de Dessous Ch?ri <email at dessouscheri.emv1.com>
Subject: [BioC] Offre exceptionnelle suite au probl?me technique
To: <bioconductor at stat.math.ethz.ch>
Message-ID: <20276360648.3897244.1224751398367 at schr1>
Content-Type: text/plain

   Pour etre sur de recevoir tous nos emails, nous vous conseillons d'ajouter
             email at dessouscheri.emv1.com a votre carnet d'adresses.
Si cet email ne s'affiche pas correctement, vous pouvez le visualiser grace a ce
                                      lien

   Si vous ne souhaitez plus recevoir la newsletter de Dessous Cheri,utilisez
   le lien de desabonnement

	[[alternative HTML version deleted]]

------------------------------

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor

End of Bioconductor Digest, Vol 68, Issue 23