[BioC] Bioconductor Digest, Vol 121, Issue 13

Thornton, Matthew Matthew.Thornton at med.usc.edu
Wed Mar 13 15:38:46 CET 2013



"bioconductor-request at r-project.org" <bioconductor-request at r-project.org> wrote:


Send Bioconductor mailing list submissions to
        bioconductor at r-project.org

To subscribe or unsubscribe via the World Wide Web, visit
        https://stat.ethz.ch/mailman/listinfo/bioconductor
or, via email, send a message with subject or body 'help' to
        bioconductor-request at r-project.org

You can reach the person managing the list at
        bioconductor-owner at r-project.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Bioconductor digest..."


Today's Topics:

   1. edgeR: paired samples DGE and GLM (Alessandra [guest])
   2. Re: methyAnalysis - changes to GenoSet (Kasper Daniel Hansen)
   3. QuasiSeq vs DSS (Richard Friedman)
   4. Re: methyAnalysis - changes to GenoSet (Tim Triche, Jr.)
   5. displaying the image with EBImage (Nima [guest])
   6. Help on Illumina HumanHT12 v4 chromosome filtering (Kemal Akat)
   7. Re: QuasiSeq vs DSS (Ryan C. Thompson)
   8. Re: QuasiSeq vs DSS (Richard Friedman)
   9. Re: QuasiSeq vs DSS (Ryan C. Thompson)
  10. Re: edgeR: paired samples DGE and GLM (Ryan C. Thompson)
  11. Re: edgeR: new defaults of estimateTagwiseDisp and exactTest
      (Zhuzhu Zhang)
  12. Subsampling of BAM/SAM alignments (Julian Gehring)
  13. DESeq2 (Wolfgang Huber)
  14. Re: displaying the image with EBImage (Wolfgang Huber)
  15. Re: displaying the image with EBImage (Nima Ahmadian)
  16. Re: displaying the image with EBImage (Andrzej Ole?)
  17. Re: edgeR: new defaults of estimateTagwiseDisp and exactTest
      (Gordon K Smyth)
  18. negative correlation from limma's duplicateCorrelation
      (Gordon K Smyth)
  19. anova like test in edgeR (Gordon K Smyth)
  20. Re: displaying the image with EBImage (Nima Ahmadian)
  21. Re: Negative parameter error when using goseq (Nadia Davidson)
  22. Re: anova like test in edgeR (Gordon K Smyth)
  23. Re: Subsampling of BAM/SAM alignments (Martin Morgan)
  24. ArrayExpress Question (Amy Vandiver)
  25. Error in getGEO command with GSEMatrix=TRUE
      (Marwaha, Shruti (marwahsi))
  26. biomaRt access for NCBIM37 build of mouse genome (chris_utah)
  27. Re: anova like test in edgeR (Ryan C. Thompson)
  28. Re: DESeq2 (Ryan C. Thompson)
  29. Re: biomaRt access for NCBIM37 build of mouse genome
      (Hans-Rudolf Hotz)
  30. Re: DESeq2 (Michael Love)
  31. Re: Error in getGEO command with GSEMatrix=TRUE (Sean Davis)


----------------------------------------------------------------------

Message: 1
Date: Tue, 12 Mar 2013 04:20:08 -0700 (PDT)
From: "Alessandra [guest]" <guest at bioconductor.org>
To: bioconductor at r-project.org, alessandra.breschi at crg.es
Subject: [BioC] edgeR: paired samples DGE and GLM
Message-ID: <20130312112008.0529C1434F2 at mamba.fhcrc.org>


Hi,

I am trying to use edgeR to compute differential gene expression.

I have a quite simple experimental design:
labExpId Patient sex    Treatment
sample.4  1   F            A
sample.3  2   M           A
sample.5  2   M           B
sample.7  3   F            A
sample.0  4   M           A
sample.8  3   F            B
sample.1  4   M           B
sample.2  1   F            B


I am comparing the effect of treatment across all patients, and it is fine when I apply exactTest. Then I wanted to include also the Patient in my model and I try using GLM, but I found several problems. So, I tried to apply GLM only including the Treatment just to troubleshoot. I follow the instructions on page 22 from the user's guide revised in December 2012.

My counts are stored in a matrix called m.

>colnames(m)
[1] "sample.2" "sample.3" "sample.4" "sample.5" "sample.7" sample.8"
[7] "sample.0" "sample.1"

>design=model.matrix(~0+Treatment)
> design
          TreatmentA          TreatmentB
sample.4             1               0
sample.3             1               0
sample.5             0               1
sample.7             1               0
sample.0             1               0
sample.8             0               1
sample.1             0               1
sample.2             0               1

>M = DGEList(counts=na.omit(m))

>cpm.m <- cpm(M)
>M <- M[rowSums(cpm.m >1) >=1,]
>M <- calcNormFactors(M)

>M <- estimateGLMCommonDisp(M, design, verbose=T)
> names(M)
[1] "counts"            "samples"           "abundance"
[4] "logCPM"            "common.dispersion"

First thing I notice, I don't see pseudo.counts and other attributes I saw when calling estimateCommonDisp().

> M <- estimateGLMTrendedDisp(M, design)
> M <- estimateGLMTagwiseDisp(M, design)

> fit <- glmFit(M, design)
> lrt <- glmLRT(fit, contrast=c(-1,1))
> res = topTags(lrt, n=nrow(lrt))$table

> head(res)
                     logFC    logCPM       LR       PValue          FDR
ENSG00000244115  15.512665  2.266789 35.30878 2.813602e-09 3.482958e-05
ENSG00000256329 -17.760869  4.514903 32.89653 9.719681e-09 6.015996e-05
ENSG00000223638  11.777617 -1.467638 18.46673 1.728967e-05 7.134295e-02
ENSG00000134321  -5.328058  5.959204 16.76318 4.234703e-05 1.310535e-01
ENSG00000196861   7.776972 -1.722111 15.83143 6.924259e-05 1.494412e-01
ENSG00000229292  11.368590 -1.876601 15.74621 7.243290e-05 1.494412e-01

> m[rownames(head(res)),]
                sample.2 sample.3 sample.4 sample.5 sample.7 sample.8
ENSG00000244115         0         0         0         0         0      6412
ENSG00000256329         0         0         0     32070     0         0
ENSG00000223638         0         0         2         0         0        0
ENSG00000134321      1178       590   890     83130   754   1116
ENSG00000196861         0         3       363       0         0        53
ENSG00000229292         0         0         0         0         0         0

                sample.0 sample.1
ENSG00000244115         0      4415
ENSG00000256329         0         0
ENSG00000223638       434       550
ENSG00000134321       852      1092
ENSG00000196861       500        57
ENSG00000229292       344       406

I realized that it may be a problem of the ordering of the design matrix rows

> colnames(m)
[1] "sample.2" "sample.3" "sample.4" "sample.5" "sample.7" "sample.8"
[7] "sample.0" "sample.1"
> rownames(design)
[1] "sample.4" "sample.3" "sample.5" "sample.7" "sample.0" "sample.8"
[7] "sample.1" "sample.2"

so I redo it forcing the row names of my design matrix to be the same order as the column names in the matrix count.
Now my design matrix looks like:

> design
          TreatmentA TreatmentB
sample.2             0               1
sample.3             1               0
sample.4             1               0
sample.5             0               1
sample.7             1               0
sample.8             0               1
sample.0             1               0
sample.1             0               1

I execute exactly the same commands as before.
Again I cannot see the pseudocounts.
> names(M)
[1] "counts"            "samples"           "abundance"
[4] "logCPM"            "common.dispersion"

> head(res)
                    logFC     logCPM       LR PValue FDR
ENSG00000074855 -32.17749  0.2759965 2121.104      0   0
ENSG00000076351 -32.17749 -0.4306713 1549.345      0   0
ENSG00000105552 -32.17749 -0.4864164 1502.658      0   0
ENSG00000106991 -32.17749  0.4622641 2144.947      0   0
ENSG00000123009 -32.17749 -0.2422835 1719.625      0   0
ENSG00000133216 -32.17749  0.2206127 2127.575      0   0

I get very different results still from the exactTest with no GLM.
I attribute this to the missing pseudocounts in the DGEList object?

> m[rownames(head(res)),]
                sample.2 sample.3 sample.4 sample.5 sample.7 sample.8
ENSG00000074855         0       948       902         0       676         0
ENSG00000076351         0       494       522         0       538         0
ENSG00000105552         0       490       454         0       486         0
ENSG00000106991         0      1094       798        0       698         0
ENSG00000123009         0       556       520         0       566         0
ENSG00000133216         0       808       762         0       730         0
                sample.0 sample.1
ENSG00000074855      1166        0
ENSG00000076351       698         0
ENSG00000105552       764         0
ENSG00000106991      1733        0
ENSG00000123009       980         0
ENSG00000133216      1304        0

I see that these genes have all zero counts for TreatmentB and this explains the low logFC, but I don't get this when I don't use GLM at all (exactTest).

So in summary, my questions are:

1) Does the ordering of rows in the design matrix have to be the same as the column order in the count matrix,
or the sample name is taken into account so the order should not matter?

2) Do you have any idea why I don't get pseudocounts after estimating the dispersion, and is it really this that causes such low logFC?

I am looking forward to hearing from you.

Many thanks in advance,
Ale


 -- output of sessionInfo():

R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] plyr_1.7.1      optparse_1.0.1  ggplot2_0.9.3.1 reshape2_1.2.1
[5] edgeR_3.0.8     limma_3.14.4

loaded via a namespace (and not attached):
 [1] MASS_7.3-17        RColorBrewer_1.0-5 colorspace_1.1-1   dichromat_1.2-4
 [5] digest_0.5.2       getopt_1.19.1      grid_2.15.0        gtable_0.1.2
 [9] labeling_0.1       munsell_0.3        proto_0.3-9.2      scales_0.2.3
[13] stringr_0.6        tools_2.15.0

--
Sent via the guest posting facility at bioconductor.org.



------------------------------

Message: 2
Date: Tue, 12 Mar 2013 09:57:52 -0400
From: Kasper Daniel Hansen <kasperdanielhansen at gmail.com>
To: Lavinia Gordon <lavinia.gordon at mcri.edu.au>
Cc: Zilbauer Matthias <mz304 at medschl.cam.ac.uk>,        "dupan at gmal.com"
        <dupan at gmal.com>,       "bioconductor at r-project.org"
        <bioconductor at r-project.org>
Subject: Re: [BioC] methyAnalysis - changes to GenoSet
Message-ID:
        <CAC2h7usdEoP-zsi_Oxp3ASNEQkg5msOEuYs62hH96W9qHQUpFA at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Lavina,

I don't really have time to really look into all of this, but note the
function mapToGenome which associates a MethylSet with genomic
coordinates. So you can do

gmSet = mapToGenome(mSet)

and now you have

granges(gmSet) # locations
getMeth(gmSet) # methylation channel

etc.  Note that the transformation is non-invertible because a few
CpGs have no associated genomic locations and they are dropped (well,
you can use drop=FALSE).

Kasper

On Mon, Mar 11, 2013 at 5:13 PM, Lavinia Gordon
<lavinia.gordon at mcri.edu.au> wrote:
> Argh!, thanks Martin, sorry about that.
>
> So in that case I think I've worked out a (rather ugly) way to force minfi data into a MethyGenoSet object for use within the package methyAnalysis:
>
> Library(minfi)
> # mset is minfi object, e.g. used in dmpFinder code
> myexprs <- getM(mSet)
> mymeth <- getMeth(mSet)
> myunmeth <- getUnmeth(mSet)
>
> # using Tim Triches helpful code, thanks!
> library(IlluminaHumanMethylation450kprobe)
> data(IlluminaHumanMethylation450kprobe)
> library(GenomicRanges)
> chs = levels(IlluminaHumanMethylation450kprobe$chr)
> names(chs) = paste('chr',chs,sep='')
> CpGs.450k = with(IlluminaHumanMethylation450kprobe,
> GRanges(paste('chr',chr,sep=''),
> IRanges(start=site, width=2, names=Probe_ID),
> strand=strand))
>
> CpGs.450k.df <- as.data.frame(CpGs.450k)
>
> mylocData <- RangedData(ranges=IRanges(start=CpGs.450k.df$start, end=CpGs.450k.df$end, names=row.names(CpGs.450k.df)), space=CpGs.450k.df$seqnames)
>
> mymethy.obj <- new("MethyGenoSet", locData=mylocData, assayData=assayDataNew(exprs=new("matrix"), methylated=new("matrix"), unmethylated=new("matrix"), detection=new("matrix")))
>
> # make sure the CpG ids in mylocData agree with what is in CpGs.450k.df
>
> exprs(mymethy.obj) <- myexprs
> methylated(mymethy.obj) <- mymeth
> unmethylated(mymethy.obj) <- myunmeth
> # had to add this afterwards
>
> methyGenoSet.sm <- smoothMethyData(mymethy.obj, winSize = 250)
>
> I haven't run through all of the MethyAnalysis functions so the MethyGenoSet object may need more information added to it, in other slots, for it to work properly.
>
>
> Lavinia Gordon
> Senior Research Officer
> Quantitative Sciences Core, Bioinformatics
>
> Murdoch Childrens Research Institute
> The Royal Children's Hospital
> Flemington Road Parkville Victoria 3052 Australia
> T 03 8341 6221
> www.mcri.edu.au
>
>
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Sent: Saturday, 9 March 2013 1:49 AM
> To: Lavinia Gordon
> Cc: bioconductor at r-project.org; dupan at gmal.com
> Subject: Re: [BioC] methyAnalysis - changes to GenoSet
>
> On 03/07/2013 08:23 PM, Lavinia Gordon wrote:
>> Dear Bioc users,
>> I have just tried the example code supplied with the package methyAnalysis and am getting an error:
>>
>>> ###################################################
>>> ### code chunk number 3: Slide-window smoothing of the DNA
>>> methylation data ###################################################
>>> methyGenoSet.sm <- smoothMethyData(exampleMethyGenoSet, winSize =
>>> 250)
>> Smoothing Chromosome 21 ...
>>
>> Note: Method with signature ?GenoSet#character#ANY#ANY? chosen for
>> function ?[?, target signature ?MethyGenoSet#character#character#missing?.
>> "MethyGenoSet#ANY#ANY#ANY" would also be valid Warning message:
>> The ranges method on a GenoSet is depricated. Please use space(locData(x)) or seqnames(locData(x)) as appropriate for RangedData or GRanges.
>>> methyGenoset.sm
>> Error: object 'methyGenoset.sm' not found
>
> typo -- methyGenoSet.sm
>
> Martin
>
>>> sessionInfo()
>> R version 2.15.2 (2012-10-26)
>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8
>>   [2] LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8
>>   [4] LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8
>>   [6] LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=C
>>   [8] LC_NAME=C
>>   [9] LC_ADDRESS=C
>> [10] LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8
>> [12] LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] grid      stats     graphics  grDevices
>> [5] utils     datasets  methods   base
>>
>> other attached packages:
>> [1] methyAnalysis_1.0.0  org.Hs.eg.db_2.8.0
>> [3] RSQLite_0.11.2       DBI_0.2-5
>> [5] AnnotationDbi_1.20.5 Biobase_2.18.0
>> [7] IRanges_1.16.6       BiocGenerics_0.4.0
>>
>> loaded via a namespace (and not attached):
>> [1] affy_1.36.1            affyio_1.26.0
>>   [3] annotate_1.36.0        BiocInstaller_1.8.3
>>   [5] biomaRt_2.14.0         Biostrings_2.26.3
>>   [7] biovizBase_1.6.2       bitops_1.0-5
>>   [9] BSgenome_1.26.1        cluster_1.14.3
>> [11] colorspace_1.2-1       dichromat_2.0-0
>> [13] genefilter_1.40.0      GenomicFeatures_1.10.2
>> [15] GenomicRanges_1.10.7   genoset_1.10.1
>> [17] Gviz_1.2.1             Hmisc_3.10-1
>> [19] KernSmooth_2.23-9      labeling_0.1
>> [21] lattice_0.20-13        lumi_2.10.0
>> [23] MASS_7.3-23            Matrix_1.0-11
>> [25] methylumi_2.4.0        mgcv_1.7-22
>> [27] munsell_0.4            nleqslv_2.0
>> [29] nlme_3.1-108           parallel_2.15.2
>> [31] plyr_1.8               preprocessCore_1.20.0
>> [33] RColorBrewer_1.0-5     RCurl_1.95-4.1
>> [35] Rsamtools_1.10.2       rtracklayer_1.18.2
>> [37] scales_0.2.3           splines_2.15.2
>> [39] stats4_2.15.2          stringr_0.6.2
>> [41] survival_2.37-4        tools_2.15.2
>> [43] XML_3.95-0.2           xtable_1.7-1
>> [45] zlibbioc_1.4.0
>>>
>>
>> With regards,
>>
>> Lavinia Gordon
>> Senior Research Officer
>> Quantitative Sciences Core, Bioinformatics
>>
>> Murdoch Childrens Research Institute
>> The Royal Children's Hospital
>> Flemington Road Parkville Victoria 3052 Australia T 03 8341 6221
>> www.mcri.edu.au<http://www.mcri.edu.au/>
>>
>>
>> ______________________________________________________________________
>> This email has been scanned by the Symantec Email Security.cloud service.
>> For more information please visit http://www.symanteccloud.com
>> ______________________________________________________________________
>>       [[alternative HTML version deleted]]
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
> ______________________________________________________________________
> This email has been scanned by the Symantec Email Security.cloud service.
> For more information please visit http://www.symanteccloud.com
>
> If you have any question, please contact MCRI IT Helpdesk for further assistance.
> ______________________________________________________________________
>
> ______________________________________________________________________
> This email has been scanned by the Symantec Email Security.cloud service.
> For more information please visit http://www.symanteccloud.com
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



------------------------------

Message: 3
Date: Tue, 12 Mar 2013 10:05:02 -0400
From: Richard Friedman <friedman at cancercenter.columbia.edu>
To: Bioconductor mailing list <bioconductor at r-project.org>
Subject: [BioC] QuasiSeq vs DSS
Message-ID:
        <E41FFAA1-4082-41D3-88B1-034F5586683E at cancercenter.columbia.edu>
Content-Type: text/plain; charset=us-ascii

Dear List.

        The papers on DSS (included in Bioconductor):

Wu H, Wang C, Wu Z. A new shrinkage estimator for dispersion improves
differential expression detection in RNA-seq data. Biostatistics. 2013
Apr;14(2):232-43.

and  QuasiSeq (included in CRAN):

 Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting differential expression
in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates.
Stat Appl Genet Mol Biol. 2012

both give evidence of superior performance to edgeR (if I understand them correctly).

Have the two methods been compared?
Can the 2 methods been combined (with DSS estimating the dispersion used in
the quasi-negative bionomial disribution used in QuasiSeq)?

I would appreciate any insight with respect to what is the overall best
method for differential expression in RNASeq available at present.

Thanks and best wishes,
Rich


Richard A. Friedman, PhD
Associate Research Scientist,
Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Lecturer,
Department of Biomedical Informatics (DBMI)
Educational Coordinator,
Center for Computational Biology and Bioinformatics (C2B2)/
National Center for Multiscale Analysis of Genomic Networks (MAGNet)/
Columbia Initiative in Systems Biology
Room 824
Irving Cancer Research Center
Columbia University
1130 St. Nicholas Ave
New York, NY 10032
(212)851-4765 (voice)
friedman at cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/

Fritz Lang. Didn't he do "Star Trek".
-Rose Friedman, age 16




------------------------------

Message: 4
Date: Tue, 12 Mar 2013 08:32:19 -0700
From: "Tim Triche, Jr." <tim.triche at gmail.com>
To: Kasper Daniel Hansen <kasperdanielhansen at gmail.com>
Cc: Zilbauer Matthias <mz304 at medschl.cam.ac.uk>,        Lavinia Gordon
        <lavinia.gordon at mcri.edu.au>,   "bioconductor at r-project.org"
        <bioconductor at r-project.org>,   "dupan at gmal.com" <dupan at gmal.com>
Subject: Re: [BioC] methyAnalysis - changes to GenoSet
Message-ID: <FA98BC57-3830-4B9C-8C77-DF5B0CC9A177 at gmail.com>
Content-Type: text/plain;       charset=utf-8

Right, my original intent was simply to coerce the SummarizedExperiment-based containers from methylumi and minfi into a genoset for Pan's package.

That reminds me, I need to update the documentation for 450kProbe to note that it is essentially obsolete, since both methylumi and minfi can emit SummarizedExperiments. The only mop-up now is for me to send in a few minfi patches for processing. The innards of minfi are tidier than... Others.

--t

On Mar 12, 2013, at 6:57 AM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:

> Lavina,
>
> I don't really have time to really look into all of this, but note the
> function mapToGenome which associates a MethylSet with genomic
> coordinates. So you can do
>
> gmSet = mapToGenome(mSet)
>
> and now you have
>
> granges(gmSet) # locations
> getMeth(gmSet) # methylation channel
>
> etc.  Note that the transformation is non-invertible because a few
> CpGs have no associated genomic locations and they are dropped (well,
> you can use drop=FALSE).
>
> Kasper
>
> On Mon, Mar 11, 2013 at 5:13 PM, Lavinia Gordon
> <lavinia.gordon at mcri.edu.au> wrote:
>> Argh!, thanks Martin, sorry about that.
>>
>> So in that case I think I've worked out a (rather ugly) way to force minfi data into a MethyGenoSet object for use within the package methyAnalysis:
>>
>> Library(minfi)
>> # mset is minfi object, e.g. used in dmpFinder code
>> myexprs <- getM(mSet)
>> mymeth <- getMeth(mSet)
>> myunmeth <- getUnmeth(mSet)
>>
>> # using Tim Triches helpful code, thanks!
>> library(IlluminaHumanMethylation450kprobe)
>> data(IlluminaHumanMethylation450kprobe)
>> library(GenomicRanges)
>> chs = levels(IlluminaHumanMethylation450kprobe$chr)
>> names(chs) = paste('chr',chs,sep='')
>> CpGs.450k = with(IlluminaHumanMethylation450kprobe,
>> GRanges(paste('chr',chr,sep=''),
>> IRanges(start=site, width=2, names=Probe_ID),
>> strand=strand))
>>
>> CpGs.450k.df <- as.data.frame(CpGs.450k)
>>
>> mylocData <- RangedData(ranges=IRanges(start=CpGs.450k.df$start, end=CpGs.450k.df$end, names=row.names(CpGs.450k.df)), space=CpGs.450k.df$seqnames)
>>
>> mymethy.obj <- new("MethyGenoSet", locData=mylocData, assayData=assayDataNew(exprs=new("matrix"), methylated=new("matrix"), unmethylated=new("matrix"), detection=new("matrix")))
>>
>> # make sure the CpG ids in mylocData agree with what is in CpGs.450k.df
>>
>> exprs(mymethy.obj) <- myexprs
>> methylated(mymethy.obj) <- mymeth
>> unmethylated(mymethy.obj) <- myunmeth
>> # had to add this afterwards
>>
>> methyGenoSet.sm <- smoothMethyData(mymethy.obj, winSize = 250)
>>
>> I haven't run through all of the MethyAnalysis functions so the MethyGenoSet object may need more information added to it, in other slots, for it to work properly.
>>
>>
>> Lavinia Gordon
>> Senior Research Officer
>> Quantitative Sciences Core, Bioinformatics
>>
>> Murdoch Childrens Research Institute
>> The Royal Children's Hospital
>> Flemington Road Parkville Victoria 3052 Australia
>> T 03 8341 6221
>> www.mcri.edu.au
>>
>>
>> -----Original Message-----
>> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
>> Sent: Saturday, 9 March 2013 1:49 AM
>> To: Lavinia Gordon
>> Cc: bioconductor at r-project.org; dupan at gmal.com
>> Subject: Re: [BioC] methyAnalysis - changes to GenoSet
>>
>> On 03/07/2013 08:23 PM, Lavinia Gordon wrote:
>>> Dear Bioc users,
>>> I have just tried the example code supplied with the package methyAnalysis and am getting an error:
>>>
>>>> ###################################################
>>>> ### code chunk number 3: Slide-window smoothing of the DNA
>>>> methylation data ###################################################
>>>> methyGenoSet.sm <- smoothMethyData(exampleMethyGenoSet, winSize =
>>>> 250)
>>> Smoothing Chromosome 21 ...
>>>
>>> Note: Method with signature ?GenoSet#character#ANY#ANY? chosen for
>>> function ?[?, target signature ?MethyGenoSet#character#character#missing?.
>>> "MethyGenoSet#ANY#ANY#ANY" would also be valid Warning message:
>>> The ranges method on a GenoSet is depricated. Please use space(locData(x)) or seqnames(locData(x)) as appropriate for RangedData or GRanges.
>>>> methyGenoset.sm
>>> Error: object 'methyGenoset.sm' not found
>>
>> typo -- methyGenoSet.sm
>>
>> Martin
>>
>>>> sessionInfo()
>>> R version 2.15.2 (2012-10-26)
>>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8
>>>  [2] LC_NUMERIC=C
>>>  [3] LC_TIME=en_US.UTF-8
>>>  [4] LC_COLLATE=en_US.UTF-8
>>>  [5] LC_MONETARY=en_US.UTF-8
>>>  [6] LC_MESSAGES=en_US.UTF-8
>>>  [7] LC_PAPER=C
>>>  [8] LC_NAME=C
>>>  [9] LC_ADDRESS=C
>>> [10] LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8
>>> [12] LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] grid      stats     graphics  grDevices
>>> [5] utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] methyAnalysis_1.0.0  org.Hs.eg.db_2.8.0
>>> [3] RSQLite_0.11.2       DBI_0.2-5
>>> [5] AnnotationDbi_1.20.5 Biobase_2.18.0
>>> [7] IRanges_1.16.6       BiocGenerics_0.4.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affy_1.36.1            affyio_1.26.0
>>>  [3] annotate_1.36.0        BiocInstaller_1.8.3
>>>  [5] biomaRt_2.14.0         Biostrings_2.26.3
>>>  [7] biovizBase_1.6.2       bitops_1.0-5
>>>  [9] BSgenome_1.26.1        cluster_1.14.3
>>> [11] colorspace_1.2-1       dichromat_2.0-0
>>> [13] genefilter_1.40.0      GenomicFeatures_1.10.2
>>> [15] GenomicRanges_1.10.7   genoset_1.10.1
>>> [17] Gviz_1.2.1             Hmisc_3.10-1
>>> [19] KernSmooth_2.23-9      labeling_0.1
>>> [21] lattice_0.20-13        lumi_2.10.0
>>> [23] MASS_7.3-23            Matrix_1.0-11
>>> [25] methylumi_2.4.0        mgcv_1.7-22
>>> [27] munsell_0.4            nleqslv_2.0
>>> [29] nlme_3.1-108           parallel_2.15.2
>>> [31] plyr_1.8               preprocessCore_1.20.0
>>> [33] RColorBrewer_1.0-5     RCurl_1.95-4.1
>>> [35] Rsamtools_1.10.2       rtracklayer_1.18.2
>>> [37] scales_0.2.3           splines_2.15.2
>>> [39] stats4_2.15.2          stringr_0.6.2
>>> [41] survival_2.37-4        tools_2.15.2
>>> [43] XML_3.95-0.2           xtable_1.7-1
>>> [45] zlibbioc_1.4.0
>>>
>>> With regards,
>>>
>>> Lavinia Gordon
>>> Senior Research Officer
>>> Quantitative Sciences Core, Bioinformatics
>>>
>>> Murdoch Childrens Research Institute
>>> The Royal Children's Hospital
>>> Flemington Road Parkville Victoria 3052 Australia T 03 8341 6221
>>> www.mcri.edu.au<http://www.mcri.edu.au/>
>>>
>>>
>>> ______________________________________________________________________
>>> This email has been scanned by the Symantec Email Security.cloud service.
>>> For more information please visit http://www.symanteccloud.com
>>> ______________________________________________________________________
>>>      [[alternative HTML version deleted]]
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>> --
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>
>> ______________________________________________________________________
>> This email has been scanned by the Symantec Email Security.cloud service.
>> For more information please visit http://www.symanteccloud.com
>>
>> If you have any question, please contact MCRI IT Helpdesk for further assistance.
>> ______________________________________________________________________
>>
>> ______________________________________________________________________
>> This email has been scanned by the Symantec Email Security.cloud service.
>> For more information please visit http://www.symanteccloud.com
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



------------------------------

Message: 5
Date: Tue, 12 Mar 2013 09:30:32 -0700 (PDT)
From: "Nima [guest]" <guest at bioconductor.org>
To: bioconductor at r-project.org, ahmadian.n at gmail.com
Subject: [BioC] displaying the image with EBImage
Message-ID: <20130312163032.69E5714352C at mamba.fhcrc.org>


Dear All

I have problem with displaying my images in R, all functions are working except display
my OS is windows 7 but Mozilla Firefox gives this message:
 Oops the page you are looking for could not be found

would you please help me in this regard??

Regards
Nima

 -- output of sessionInfo():

R version 2.15.3 (2013-03-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] EBImage_4.0.0

loaded via a namespace (and not attached):
[1] abind_1.4-0 jpeg_0.1-2  png_0.1-4   tiff_0.1-4

--
Sent via the guest posting facility at bioconductor.org.



------------------------------

Message: 6
Date: Tue, 12 Mar 2013 16:35:10 +0000
From: Kemal Akat <kakat at mail.rockefeller.edu>
To: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: [BioC] Help on Illumina HumanHT12 v4 chromosome filtering
Message-ID: <25FA2F2F-5525-4206-9515-C45CF82AE54F at rockefeller.edu>
Content-Type: text/plain; charset="us-ascii"

Hi all,

I am running into problems when I want to remove probes targeting chromosome Y.

I can reproduce this behavior with the data from the beadarrayExampleData package:


library(beadarray)
library(illuminaHumanv4.db)
library(beadarrayExampleData)
data(exampleSummaryData)
## filter for probe quality
ids = as.character(featureNames(exampleSummaryData))
qual = unlist(mget(ids, illuminaHumanv4PROBEQUALITY, ifnotfound = NA))
rem = qual == "No match" | qual == "Bad" | is.na(qual)
exampleSummaryData_filt = exampleSummaryData[!rem]
## get chromosome location for remaining probes
ids = as.character(featureNames(exampleSummaryData_filt))
chr = unlist(mget(ids, illuminaHumanv4CHR, ifnotfound = NA))
## filter out probes targeting the Y chromosome
rem_chr= chr == "Y"
exampleSummaryData_filt = exampleSummaryData_filt[!rem_chr]
Error in obj[i, , ..., drop = drop] :
  (subscript) logical subscript too long
Calls: [ ... [ -> callNextMethod -> .nextMethod -> lapply -> FUN

The eSet (Illumina) object starts with

R> dim(exampleSummaryData)
Features  Samples Channels
   49576       12        2

probes. After quality filtering

R> dim(exampleSummaryData_filt) ## after filtering for probe quality
Features  Samples Channels
   30084       12        2
R>

Now, for the second filtering (ids = as.character(featureNames(exampleSummaryData_filt) etc.):

R> length(ids)
[1] 30084
R> length(chr)
[1] 30109

Where are the extra probes coming from? I tried to match only the ones in exampleSummaryData_filt),

idx = match(ids, names(chr))
chr = chr[idx]

but this led to other problems.

Maybe I am missing something obvious?
Thank you for any hints or help!

Kemal

R> sessionInfo()
R Under development (unstable) (2013-02-06 r61857)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] beadarrayExampleData_1.0.5 illuminaHumanv4.db_1.16.0  org.Hs.eg.db_2.9.0         RSQLite_0.11.2
 [5] DBI_0.2-5                  AnnotationDbi_1.21.13      beadarray_2.9.2            ggplot2_0.9.3
 [9] Biobase_2.19.3             BiocGenerics_0.5.6         setwidth_1.0-1

loaded via a namespace (and not attached):
 [1] AnnotationForge_1.1.10 BeadDataPackR_1.11.0   colorspace_1.2-0       dichromat_1.2-4        digest_0.6.0
 [6] grid_3.0.0             gtable_0.1.2           IRanges_1.17.36        labeling_0.1           limma_3.15.15
[11] MASS_7.3-23            munsell_0.4            plyr_1.8               proto_0.3-10           RColorBrewer_1.0-5
[16] reshape2_1.2.2         scales_0.2.3           stats4_3.0.0           stringr_0.6.2          tools_3.0.0
R>



------------------------------

Message: 7
Date: Tue, 12 Mar 2013 09:59:41 -0700
From: "Ryan C. Thompson" <rct at thompsonclan.org>
To: Richard Friedman <friedman at cancercenter.columbia.edu>
Cc: Bioconductor mailing list <bioconductor at r-project.org>
Subject: Re: [BioC] QuasiSeq vs DSS
Message-ID: <513F5EFD.3060606 at thompsonclan.org>
Content-Type: text/plain

Dear Rich,

 From what I can tell, it should be possible. The development version of
DESeq2 implements the DSS "squeezing" method combined with edgeR's
Cox-Reid dispersion estimation. You could use DESeq2 to estimate
dispersions, and then copy those dispersion values into an edgeR DGEList
object. Then you can use edgeR::glmQLFTest, which implements
(approximately) the QuasiSeq method.

I have not had time yet to investigate putting these packages together
in this way, but it is something I plan to look at. I'm certain that the
combination is technically possible, and I'm reasonably sure that the
result would be statistically meaningful.

-Ryan Thompson

On Mar 12, 2013 7:06 AM, "Richard Friedman"
<friedman at cancercenter.columbia.edu
<mailto:friedman at cancercenter.columbia.edu>> wrote:

    Dear List.

             The papers on DSS (included in Bioconductor):

    Wu H, Wang C, Wu Z. A new shrinkage estimator for dispersion improves
    differential expression detection in RNA-seq data. Biostatistics. 2013
    Apr;14(2):232-43.

    and  QuasiSeq (included in CRAN):

      Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting
    differential expression
    in RNA-sequence data using quasi-likelihood with shrunken dispersion
    estimates.
    Stat Appl Genet Mol Biol. 2012

    both give evidence of superior performance to edgeR (if I understand
    them correctly).

    Have the two methods been compared?
    Can the 2 methods been combined (with DSS estimating the dispersion
    used in
    the quasi-negative bionomial disribution used in QuasiSeq)?

    I would appreciate any insight with respect to what is the overall best
    method for differential expression in RNASeq available at present.

    Thanks and best wishes,
    Rich


    Richard A. Friedman, PhD
    Associate Research Scientist,
    Biomedical Informatics Shared Resource
    Herbert Irving Comprehensive Cancer Center (HICCC)
    Lecturer,
    Department of Biomedical Informatics (DBMI)
    Educational Coordinator,
    Center for Computational Biology and Bioinformatics (C2B2)/
    National Center for Multiscale Analysis of Genomic Networks (MAGNet)/
    Columbia Initiative in Systems Biology
    Room 824
    Irving Cancer Research Center
    Columbia University
    1130 St. Nicholas Ave
    New York, NY 10032
    (212)851-4765 (voice)
    friedman at cancercenter.columbia.edu
    <mailto:friedman at cancercenter.columbia.edu>
    http://cancercenter.columbia.edu/~friedman/
    <http://cancercenter.columbia.edu/%7Efriedman/>

    Fritz Lang. Didn't he do "Star Trek".
    -Rose Friedman, age 16


    _______________________________________________
    Bioconductor mailing list
    Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
    https://stat.ethz.ch/mailman/listinfo/bioconductor
    Search the archives:
    http://news.gmane.org/gmane.science.biology.informatics.conductor


        [[alternative HTML version deleted]]



------------------------------

Message: 8
Date: Tue, 12 Mar 2013 13:11:25 -0400
From: Richard Friedman <friedman at cancercenter.columbia.edu>
To: "Ryan C. Thompson" <rct at thompsonclan.org>
Cc: Bioconductor mailing list <bioconductor at r-project.org>
Subject: Re: [BioC] QuasiSeq vs DSS
Message-ID:
        <961B56D4-1114-47E9-B6E0-935BA09E9EDF at cancercenter.columbia.edu>
Content-Type: text/plain; charset=us-ascii

Dear Ryan,

        Thank you for your response.
3 questions:
1. If I had just a simple pairwise comparison is it known DSS or QuasiSeq better?
2. I was unaware that an approximate implementation of QuasiSeq was available in
edgeR. If so, is it known hor  it compare to the ordinairy EdgeR on the one hand and the
full QuasiSeq on the other.
3. And I guess that the third question is for Gordon - Is using DSS and QuasiSeq (or EdgeR) together
desireable and if so, are there plans to incorporate DSS into QuasiSeq (EdgeR).

My note was planning ahead. I will still be in the microarray world for a more few weeks
before I return to learning RNASeq. I wanted to know what the best practice is.
If you (or anybody out there) develops a script to meld the two methods, I am sure that
it would be interesting to the list.

Best wishes,
Rich





On Mar 12, 2013, at 12:59 PM, Ryan C. Thompson wrote:

> Dear Rich,
> From what I can tell, it should be possible. The development version of DESeq2 implements the DSS "squeezing" method combined with edgeR's Cox-Reid dispersion estimation. You could use DESeq2 to estimate dispersions, and then copy those dispersion values into an edgeR DGEList object. Then you can use edgeR::glmQLFTest,       which implements (approximately) the QuasiSeq method.
> I have not had time yet to investigate putting these packages together in this way, but it is something I plan to look at. I'm certain that the combination is technically possible, and I'm reasonably sure that the result would be statistically meaningful.
> -Ryan Thompson
> On Mar 12, 2013 7:06 AM, "Richard Friedman" <friedman at cancercenter.columbia.edu> wrote:
> Dear List.
>
>         The papers on DSS (included in Bioconductor):
>
> Wu H, Wang C, Wu Z. A new shrinkage estimator for dispersion improves
> differential expression detection in RNA-seq data. Biostatistics. 2013
> Apr;14(2):232-43.
>
> and  QuasiSeq (included in CRAN):
>
>  Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting differential expression
> in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates.
> Stat Appl Genet Mol Biol. 2012
>
> both give evidence of superior performance to edgeR (if I understand them correctly).
>
> Have the two methods been compared?
> Can the 2 methods been combined (with DSS estimating the dispersion used in
> the quasi-negative bionomial disribution used in QuasiSeq)?
>
> I would appreciate any insight with respect to what is the overall best
> method for differential expression in RNASeq available at present.
>
> Thanks and best wishes,
> Rich
>
>
> Richard A. Friedman, PhD
> Associate Research Scientist,
> Biomedical Informatics Shared Resource
> Herbert Irving Comprehensive Cancer Center (HICCC)
> Lecturer,
> Department of Biomedical Informatics (DBMI)
> Educational Coordinator,
> Center for Computational Biology and Bioinformatics (C2B2)/
> National Center for Multiscale Analysis of Genomic Networks (MAGNet)/
> Columbia Initiative in Systems Biology
> Room 824
> Irving Cancer Research Center
> Columbia University
> 1130 St. Nicholas Ave
> New York, NY 10032
> (212)851-4765 (voice)
> friedman at cancercenter.columbia.edu
> http://cancercenter.columbia.edu/~friedman/
>
> Fritz Lang. Didn't he do "Star Trek".
> -Rose Friedman, age 16
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



------------------------------

Message: 9
Date: Tue, 12 Mar 2013 12:15:25 -0700
From: "Ryan C. Thompson" <rct at thompsonclan.org>
To: Richard Friedman <friedman at cancercenter.columbia.edu>
Cc: Bioconductor mailing list <bioconductor at r-project.org>
Subject: Re: [BioC] QuasiSeq vs DSS
Message-ID: <513F7ECD.8090402 at thompsonclan.org>
Content-Type: text/plain; charset=UTF-8; format=flowed

Here is a quote from Gordon Smyth a few months ago in response to a
question of mine, which I think neatly summarizes the relationship
between QuasiSeq and edgeR::glmQLFTest:

> glmQLFTest() and QuasiSeq were developed independently with the same
> idea, hence we got together to write the paper. If you use common
> dispersion to fit the linear model, then glmQLFTest() is like
> NegBinQLShrink. If you use trended dispersion to fit the linear model
> (recommended), then glmQLFTest() is like NegBinQLSpline. They are not
> quite identical however. glmQLTest() leverages the functionality of
> the edgeR and limma packages, whereas QuasiSeq has used its own
> implementations of everything. The latter are described in the paper.
(Gordon, I hope you don't mind me posting this publically. Hopefully it
saves you the trouble of rewriting it.)

-Ryan



On Tue 12 Mar 2013 10:11:25 AM PDT, Richard Friedman wrote:
>
> Dear Ryan,
>
> Thank you for your response.
> 3 questions:
> 1. If I had just a simple pairwise comparison is it known DSS or
> QuasiSeq better?
> 2. I was unaware that an approximate implementation of QuasiSeq was
> available in
> edgeR. If so, is it known hor it compare to the ordinairy EdgeR on the
> one hand and the
> full QuasiSeq on the other.
> 3. And I guess that the third question is for Gordon - Is using DSS
> and QuasiSeq (or EdgeR) together
> desireable and if so, are there plans to incorporate DSS into QuasiSeq
> (EdgeR).
>
> My note was planning ahead. I will still be in the microarray world
> for a more few weeks
> before I return to learning RNASeq. I wanted to know what the best
> practice is.
> If you (or anybody out there) develops a script to meld the two
> methods, I am sure that
> it would be interesting to the list.
>
> Best wishes,
> Rich
>
>
>
>
>
> On Mar 12, 2013, at 12:59 PM, Ryan C. Thompson wrote:
>
>>
>> Dear Rich,
>> From what I can tell, it should be possible. The development version
>> of DESeq2 implements the DSS "squeezing" method combined with edgeR's
>> Cox-Reid dispersion estimation. You could use DESeq2 to estimate
>> dispersions, and then copy those dispersion values into an edgeR
>> DGEList object. Then you can use edgeR::glmQLFTest, which implements
>> (approximately) the QuasiSeq method.
>> I have not had time yet to investigate putting these packages
>> together in this way, but it is something I plan to look at. I'm
>> certain that the combination is technically possible, and I'm
>> reasonably sure that the result would be statistically meaningful.
>> -Ryan Thompson
>> On Mar 12, 2013 7:06 AM, "Richard Friedman"
>> <friedman at cancercenter.columbia.edu> wrote:
>> Dear List.
>>
>> The papers on DSS (included in Bioconductor):
>>
>> Wu H, Wang C, Wu Z. A new shrinkage estimator for dispersion improves
>> differential expression detection in RNA-seq data. Biostatistics. 2013
>> Apr;14(2):232-43.
>>
>> and QuasiSeq (included in CRAN):
>>
>> Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting differential
>> expression
>> in RNA-sequence data using quasi-likelihood with shrunken dispersion
>> estimates.
>> Stat Appl Genet Mol Biol. 2012
>>
>> both give evidence of superior performance to edgeR (if I understand
>> them correctly).
>>
>> Have the two methods been compared?
>> Can the 2 methods been combined (with DSS estimating the dispersion
>> used in
>> the quasi-negative bionomial disribution used in QuasiSeq)?
>>
>> I would appreciate any insight with respect to what is the overall best
>> method for differential expression in RNASeq available at present.
>>
>> Thanks and best wishes,
>> Rich
>>
>>
>> Richard A. Friedman, PhD
>> Associate Research Scientist,
>> Biomedical Informatics Shared Resource
>> Herbert Irving Comprehensive Cancer Center (HICCC)
>> Lecturer,
>> Department of Biomedical Informatics (DBMI)
>> Educational Coordinator,
>> Center for Computational Biology and Bioinformatics (C2B2)/
>> National Center for Multiscale Analysis of Genomic Networks (MAGNet)/
>> Columbia Initiative in Systems Biology
>> Room 824
>> Irving Cancer Research Center
>> Columbia University
>> 1130 St. Nicholas Ave
>> New York, NY 10032
>> (212)851-4765 (voice)
>> friedman at cancercenter.columbia.edu
>> http://cancercenter.columbia.edu/~friedman/
>>
>> Fritz Lang. Didn't he do "Star Trek".
>> -Rose Friedman, age 16
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>



------------------------------

Message: 10
Date: Tue, 12 Mar 2013 12:26:31 -0700
From: "Ryan C. Thompson" <rct at thompsonclan.org>
To: "Alessandra [guest]" <guest at bioconductor.org>
Cc: bioconductor at r-project.org, alessandra.breschi at crg.es
Subject: Re: [BioC] edgeR: paired samples DGE and GLM
Message-ID: <513F8167.8090900 at thompsonclan.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi,

See this previous discussion for a discussion of the pseudocounts:
https://stat.ethz.ch/pipermail/bioconductor/2013-March/051182.html

Only the non-GLM method makes use of pseudocounts, so it's not important
that they are missing from the GLM analysis. I believe the design rows
and count columns *do* need to match up in the same order. To do a
paired test, you can just use model.matrix(~0+Treatment+Patient) and
proceed as you already have. The GLM method does give non-identical
results, and in general it seems that people are assuming that the GLM
method is better overall, though as always, proof is scarce when
sequencing is so expensive.

Hope this answers your questions.

-Ryan Thompson


On 03/12/2013 04:20 AM, Alessandra [guest] wrote:
> Hi,
>
> I am trying to use edgeR to compute differential gene expression.
>
> I have a quite simple experimental design:
> labExpId Patient sex    Treatment
> sample.4  1   F            A
> sample.3  2   M           A
> sample.5  2   M           B
> sample.7  3   F            A
> sample.0  4   M           A
> sample.8  3   F            B
> sample.1  4   M           B
> sample.2  1   F            B
>
>
> I am comparing the effect of treatment across all patients, and it is fine when I apply exactTest. Then I wanted to include also the Patient in my model and I try using GLM, but I found several problems. So, I tried to apply GLM only including the Treatment just to troubleshoot. I follow the instructions on page 22 from the user's guide revised in December 2012.
>
> My counts are stored in a matrix called m.
>
>> colnames(m)
> [1] "sample.2" "sample.3" "sample.4" "sample.5" "sample.7" sample.8"
> [7] "sample.0" "sample.1"
>
>> design=model.matrix(~0+Treatment)
>> design
>            TreatmentA          TreatmentB
> sample.4             1               0
> sample.3             1               0
> sample.5             0               1
> sample.7             1               0
> sample.0             1               0
> sample.8             0               1
> sample.1             0               1
> sample.2             0               1
>
>> M = DGEList(counts=na.omit(m))
>> cpm.m <- cpm(M)
>> M <- M[rowSums(cpm.m >1) >=1,]
>> M <- calcNormFactors(M)
>> M <- estimateGLMCommonDisp(M, design, verbose=T)
>> names(M)
> [1] "counts"            "samples"           "abundance"
> [4] "logCPM"            "common.dispersion"
>
> First thing I notice, I don't see pseudo.counts and other attributes I saw when calling estimateCommonDisp().
>
>> M <- estimateGLMTrendedDisp(M, design)
>> M <- estimateGLMTagwiseDisp(M, design)
>> fit <- glmFit(M, design)
>> lrt <- glmLRT(fit, contrast=c(-1,1))
>> res = topTags(lrt, n=nrow(lrt))$table
>> head(res)
>                       logFC    logCPM       LR       PValue          FDR
> ENSG00000244115  15.512665  2.266789 35.30878 2.813602e-09 3.482958e-05
> ENSG00000256329 -17.760869  4.514903 32.89653 9.719681e-09 6.015996e-05
> ENSG00000223638  11.777617 -1.467638 18.46673 1.728967e-05 7.134295e-02
> ENSG00000134321  -5.328058  5.959204 16.76318 4.234703e-05 1.310535e-01
> ENSG00000196861   7.776972 -1.722111 15.83143 6.924259e-05 1.494412e-01
> ENSG00000229292  11.368590 -1.876601 15.74621 7.243290e-05 1.494412e-01
>
>> m[rownames(head(res)),]
>                  sample.2 sample.3 sample.4 sample.5 sample.7 sample.8
> ENSG00000244115         0         0         0         0         0      6412
> ENSG00000256329         0         0         0     32070     0         0
> ENSG00000223638         0         0         2         0         0        0
> ENSG00000134321      1178       590   890     83130   754   1116
> ENSG00000196861         0         3       363       0         0        53
> ENSG00000229292         0         0         0         0         0         0
>
>                  sample.0 sample.1
> ENSG00000244115         0      4415
> ENSG00000256329         0         0
> ENSG00000223638       434       550
> ENSG00000134321       852      1092
> ENSG00000196861       500        57
> ENSG00000229292       344       406
>
> I realized that it may be a problem of the ordering of the design matrix rows
>
>> colnames(m)
> [1] "sample.2" "sample.3" "sample.4" "sample.5" "sample.7" "sample.8"
> [7] "sample.0" "sample.1"
>> rownames(design)
> [1] "sample.4" "sample.3" "sample.5" "sample.7" "sample.0" "sample.8"
> [7] "sample.1" "sample.2"
>
> so I redo it forcing the row names of my design matrix to be the same order as the column names in the matrix count.
> Now my design matrix looks like:
>
>> design
>            TreatmentA TreatmentB
> sample.2             0               1
> sample.3             1               0
> sample.4             1               0
> sample.5             0               1
> sample.7             1               0
> sample.8             0               1
> sample.0             1               0
> sample.1             0               1
>
> I execute exactly the same commands as before.
> Again I cannot see the pseudocounts.
>> names(M)
> [1] "counts"            "samples"           "abundance"
> [4] "logCPM"            "common.dispersion"
>
>> head(res)
>                      logFC     logCPM       LR PValue FDR
> ENSG00000074855 -32.17749  0.2759965 2121.104      0   0
> ENSG00000076351 -32.17749 -0.4306713 1549.345      0   0
> ENSG00000105552 -32.17749 -0.4864164 1502.658      0   0
> ENSG00000106991 -32.17749  0.4622641 2144.947      0   0
> ENSG00000123009 -32.17749 -0.2422835 1719.625      0   0
> ENSG00000133216 -32.17749  0.2206127 2127.575      0   0
>
> I get very different results still from the exactTest with no GLM.
> I attribute this to the missing pseudocounts in the DGEList object?
>
>> m[rownames(head(res)),]
>                  sample.2 sample.3 sample.4 sample.5 sample.7 sample.8
> ENSG00000074855         0       948       902         0       676         0
> ENSG00000076351         0       494       522         0       538         0
> ENSG00000105552         0       490       454         0       486         0
> ENSG00000106991         0      1094       798        0       698         0
> ENSG00000123009         0       556       520         0       566         0
> ENSG00000133216         0       808       762         0       730         0
>                  sample.0 sample.1
> ENSG00000074855      1166        0
> ENSG00000076351       698         0
> ENSG00000105552       764         0
> ENSG00000106991      1733        0
> ENSG00000123009       980         0
> ENSG00000133216      1304        0
>
> I see that these genes have all zero counts for TreatmentB and this explains the low logFC, but I don't get this when I don't use GLM at all (exactTest).
>
> So in summary, my questions are:
>
> 1) Does the ordering of rows in the design matrix have to be the same as the column order in the count matrix,
> or the sample name is taken into account so the order should not matter?
>
> 2) Do you have any idea why I don't get pseudocounts after estimating the dispersion, and is it really this that causes such low logFC?
>
> I am looking forward to hearing from you.
>
> Many thanks in advance,
> Ale
>
>
>   -- output of sessionInfo():
>
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] splines   stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] plyr_1.7.1      optparse_1.0.1  ggplot2_0.9.3.1 reshape2_1.2.1
> [5] edgeR_3.0.8     limma_3.14.4
>
> loaded via a namespace (and not attached):
>   [1] MASS_7.3-17        RColorBrewer_1.0-5 colorspace_1.1-1   dichromat_1.2-4
>   [5] digest_0.5.2       getopt_1.19.1      grid_2.15.0        gtable_0.1.2
>   [9] labeling_0.1       munsell_0.3        proto_0.3-9.2      scales_0.2.3
> [13] stringr_0.6        tools_2.15.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



------------------------------

Message: 11
Date: Tue, 12 Mar 2013 16:14:58 -0400
From: Zhuzhu Zhang <zhuzhuz at email.unc.edu>
To: Gordon K Smyth <smyth at wehi.EDU.AU>
Cc: Bioconductor mailing list <bioconductor at r-project.org>
Subject: Re: [BioC] edgeR: new defaults of estimateTagwiseDisp and
        exactTest
Message-ID: <513F8CC2.5070300 at email.unc.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dear Gordon,

Thank you very much for your reply. That was greatly helpful.

For prion.df, would you recommend that the user use the default value in
the current version, or choose it differently for different datasets
since it varies as the data size changes? If the latter, would the same
principle of choosing prior.n be applied?

Thanks,
Zhuzhu




On 3/8/13 2:06 AM, Gordon K Smyth wrote:
> Dear Zhuzhu,
>
> The edgeR User's guide gives case studies of how we intend edgeR to be
> used.  You will see that we expect users to generally use the default
> parameters.  Although there are a great many parameters that can be
> varied in principle in the edgeR functions, we not expect them to be
> changed in most analyses.
>
> One exception was the prior.n argument to estimateTagwiseDisp.  It was
> never our intention that a fixed value for prior.n would be used for
> datasets of different sizes, so previous versions of the User's Guide
> used to explain how to set this parameter.  Several versions ago, we
> eliminated the prior.n argument and replaced it with prior.df so that
> the default behavior was what we intended.
>
> Over time we have reduced the default value for prior.df somewhat.
> Larger values of prior.df give more priority to genes with large fold
> changes between groups.  Smaller values of prior.df give more priority
> to genes with small dispersion values, i.e., to genes that are
> consistent between replicates.
>
> You ask about exactTest().  To learn more about the different
> rejection regions, you should start by reading the help page for
> exactTest(), but I doubt that this is causing you a problem.  The
> theory behind the original exact negative binomial test of Robinson
> and Smyth (2008) presupposed that the conditional distibution of the
> counts given the genewise total was unimodal (like a binomial
> distribution).  This is true for typical dispersion values, but is not
> true for very large dispersion values. Hence the original exactTest
> can give totally innappropriate results when the dispersion is very,
> very large. The new rejection region is similar to the original when
> the dispersion is smallish, but gives sensible results in any
> situation.  Hence it should be used.  The old rejection region is
> preserved as an option only for backward compatability.
>
> Best wishes
> Gordon
>
>
>> Date: Wed, 06 Mar 2013 19:22:06 -0500
>> From: Zhuzhu Zhang <zhuzhuz at email.unc.edu>
>> To: bioconductor at r-project.org
>> Subject: [BioC] edgeR: new defaults of estimateTagwiseDisp and
>>     exactTest
>>
>> Dear All,
>>
>> I'm running edgeR 3.0.8 and notice that the results are considerably
>> different than those from older versions. I realized that it was likely
>> due to the different prior.df I used in Function estimateTagwiseDisp,
>> thanks to an earlier discussion on the list-
>> https://stat.ethz.ch/pipermail/bioconductor/2012-December/049644.html
>>
>> I have a few more questions:
>>
>> 1. How to choose a proper prior.df (or prior.n)?
>>
>> 2. How is the new default method of exactText different from the old
>> default (rejection.region = "smallp")? How does it improve the
>> performance?
>>
>> 3. In general, what parameters should I tune for different datasets,
>> when using function estimateTagwiseDisp and exactTest? I used the
>> defaults but realized that they may not be most appropriate.
>>
>> Thank you for your time and attention. Any suggestions and comments
>> would be extremely helpful and appreciated.
>>
>> Thanks,
>> Zhuzhu
>>
>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:6}}



------------------------------

Message: 12
Date: Tue, 12 Mar 2013 21:32:03 +0100
From: Julian Gehring <julian.gehring at embl.de>
To: bioconductor at r-project.org
Subject: [BioC] Subsampling of BAM/SAM alignments
Message-ID: <513F90C3.3020301 at embl.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi,

Is there an efficient function to randomly subsample alignments from a
BAM/SAM within R?  In the best case, an equivalent to 'FastqSampler' in
'ShortRead'.  If not, what would be a clever way of doing this with the
bioc framework, without having to read the whole file first?

Best wishes
Julian



------------------------------

Message: 13
Date: Tue, 12 Mar 2013 21:51:39 +0100
From: Wolfgang Huber <whuber at embl.de>
To: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: [BioC] DESeq2
Message-ID: <592E98F1-5FEF-47F0-B9F8-FA485F4489B0 at embl.de>
Content-Type: text/plain; charset=us-ascii

Dear DESeq users,

Mike Love, Simon Anders and I have been updating the DESeq package. This resulted in the package DESeq2, which is available from the development branch, and scheduled for the next release: http://www.bioconductor.org/packages/devel/bioc/html/DESeq2.html

For several release cycles, the original package (DESeq) will be maintained at its current functionality, in order to not disrupt the workflows of DESeq users. For new projects, we recommend using DESeq2. Major innovations are:

* Base class: SummarizedExperiment is used as the superclass for storing the data, rather than eSet.  This allows closer integration with upstream workflows involving GRanges and summarizeOverlaps, and facilitates downstream analyses of the genomic regions of interest.

* Simplified workflow: the wrapper function DESeq() performs all steps for a differential expression analysis. The individual steps are of course also accessible.

* More powerful statistics: incorporation of prior distributions into the estimation of dispersions and fold changes (empirical-Bayes shrinkage). The dispersion shrinkage improves power compared to the old DESeq. The fold changes shrinkage help moderate the otherwise large spread in log fold changes for genes with low counts, while it has negligible effect on genes with high counts; it may be particularly useful for visualisation, clustering, classification, ordination (PCA, MDS), similar to the variance-stabilizing transformation in the old DESeq. A Wald test for significance is provided as the default inference method, with the chi-squared test of the previous version is also available. A manuscript is in preparation.

* Normalization: it is possible to provide a matrix of sample- *and* gene-specific normalization factors, which allows the use of normalisation factors from Bioconductor packages such as cqn and EDASeq.

Examples of usage are provided in the vignette, and more details are available in the manual pages (specifically, the DESeq function and estimateDispersions function).

Enjoy -

        Mike, Simon, Wolfgang.



------------------------------

Message: 14
Date: Tue, 12 Mar 2013 21:56:17 +0100
From: Wolfgang Huber <whuber at embl.de>
To: "Nima [guest]" <guest at bioconductor.org>
Cc: ahmadian.n at gmail.com, bioconductor at r-project.org
Subject: Re: [BioC] displaying the image with EBImage
Message-ID: <9F4C0F37-22AE-49BF-99CE-51F2BE3CCB4A at embl.de>
Content-Type: text/plain; charset=us-ascii


Nima

Thank you. Can you provide a (minimal) transcript of the commands you issued, and the exact error message you get (e.g. what is the URL looked for but not found by firefox?)

        Best wishes
        Wolfgang

Il giorno Mar 12, 2013, alle ore 5:30 PM, Nima [guest] <guest at bioconductor.org> ha scritto:

>
> Dear All
>
> I have problem with displaying my images in R, all functions are working except display
> my OS is windows 7 but Mozilla Firefox gives this message:
> Oops the page you are looking for could not be found
>
> would you please help me in this regard??
>
> Regards
> Nima
>
> -- output of sessionInfo():
>
> R version 2.15.3 (2013-03-01)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] EBImage_4.0.0
>
> loaded via a namespace (and not attached):
> [1] abind_1.4-0 jpeg_0.1-2  png_0.1-4   tiff_0.1-4
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



------------------------------

Message: 15
Date: Tue, 12 Mar 2013 22:27:58 +0100
From: Nima Ahmadian <ahmadian.n at gmail.com>
To: Wolfgang Huber <whuber at embl.de>
Cc: bioconductor at r-project.org
Subject: Re: [BioC] displaying the image with EBImage
Message-ID:
        <CAN+VrQnYqC1cUU=4XML_P1q_SXNmnquicy1S=bb7fzoQjc7kLg at mail.gmail.com>
Content-Type: text/plain

^Dear Wolfgang

Thank you so much but I solved the issue somehow, there is no any error
message, but the image doesn't show automatically with my browser, I have
to go to my Temp folder and open it manually so if you have any hint, I
would be really grateful

Cheers
Nima

On Tue, Mar 12, 2013 at 9:56 PM, Wolfgang Huber <whuber at embl.de> wrote:

>
> Nima
>
> Thank you. Can you provide a (minimal) transcript of the commands you
> issued, and the exact error message you get (e.g. what is the URL looked
> for but not found by firefox?)
>
>         Best wishes
>         Wolfgang
>
> Il giorno Mar 12, 2013, alle ore 5:30 PM, Nima [guest] <
> guest at bioconductor.org> ha scritto:
>
> >
> > Dear All
> >
> > I have problem with displaying my images in R, all functions are working
> except display
> > my OS is windows 7 but Mozilla Firefox gives this message:
> > Oops the page you are looking for could not be found
> >
> > would you please help me in this regard??
> >
> > Regards
> > Nima
> >
> > -- output of sessionInfo():
> >
> > R version 2.15.3 (2013-03-01)
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> >
> > locale:
> > [1] LC_COLLATE=English_United States.1252
> > [2] LC_CTYPE=English_United States.1252
> > [3] LC_MONETARY=English_United States.1252
> > [4] LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > other attached packages:
> > [1] EBImage_4.0.0
> >
> > loaded via a namespace (and not attached):
> > [1] abind_1.4-0 jpeg_0.1-2  png_0.1-4   tiff_0.1-4
> >
> > --
> > Sent via the guest posting facility at bioconductor.org.
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>

        [[alternative HTML version deleted]]



------------------------------

Message: 16
Date: Tue, 12 Mar 2013 22:53:39 +0100
From: Andrzej Ole? <andrzej.oles at gmail.com>
To: Nima Ahmadian <ahmadian.n at gmail.com>
Cc: bioconductor at r-project.org
Subject: Re: [BioC] displaying the image with EBImage
Message-ID:
        <CAH7mKkj-Pzkd1A6vqkiu8tsyBh5o5Kc67FbaxZb3JhGcfVytrg at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Dear Nima,

many thanks for reporting this issue! Could you please check whether
the example from the documentation of the 'display' function results
in the same error? Try running the following commands:

f = system.file("images", "lena-color.png", package="EBImage")
x = readImage(f)
display(x)

If you still keep getting the error please copy the contents of the
address bar of your browser and send it to me. Additionally, could you
provide the complete path (i.e., beginning with c:\ ... or similar) to
the Temp folder containing the generated images and the listing of its
contents?

I look forward to hearing from you soon.

Best,
Andrzej


On Tue, Mar 12, 2013 at 10:27 PM, Nima Ahmadian <ahmadian.n at gmail.com> wrote:
> ^Dear Wolfgang
>
> Thank you so much but I solved the issue somehow, there is no any error
> message, but the image doesn't show automatically with my browser, I have
> to go to my Temp folder and open it manually so if you have any hint, I
> would be really grateful
>
> Cheers
> Nima
>
> On Tue, Mar 12, 2013 at 9:56 PM, Wolfgang Huber <whuber at embl.de> wrote:
>
>>
>> Nima
>>
>> Thank you. Can you provide a (minimal) transcript of the commands you
>> issued, and the exact error message you get (e.g. what is the URL looked
>> for but not found by firefox?)
>>
>>         Best wishes
>>         Wolfgang
>>
>> Il giorno Mar 12, 2013, alle ore 5:30 PM, Nima [guest] <
>> guest at bioconductor.org> ha scritto:
>>
>> >
>> > Dear All
>> >
>> > I have problem with displaying my images in R, all functions are working
>> except display
>> > my OS is windows 7 but Mozilla Firefox gives this message:
>> > Oops the page you are looking for could not be found
>> >
>> > would you please help me in this regard??
>> >
>> > Regards
>> > Nima
>> >
>> > -- output of sessionInfo():
>> >
>> > R version 2.15.3 (2013-03-01)
>> > Platform: x86_64-w64-mingw32/x64 (64-bit)
>> >
>> > locale:
>> > [1] LC_COLLATE=English_United States.1252
>> > [2] LC_CTYPE=English_United States.1252
>> > [3] LC_MONETARY=English_United States.1252
>> > [4] LC_NUMERIC=C
>> > [5] LC_TIME=English_United States.1252
>> >
>> > attached base packages:
>> > [1] stats     graphics  grDevices utils     datasets  methods   base
>> >
>> > other attached packages:
>> > [1] EBImage_4.0.0
>> >
>> > loaded via a namespace (and not attached):
>> > [1] abind_1.4-0 jpeg_0.1-2  png_0.1-4   tiff_0.1-4
>> >
>> > --
>> > Sent via the guest posting facility at bioconductor.org.
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at r-project.org
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



------------------------------

Message: 17
Date: Wed, 13 Mar 2013 10:37:03 +1100 (AUS Eastern Daylight Time)
From: Gordon K Smyth <smyth at wehi.EDU.AU>
To: Zhuzhu Zhang <zhuzhuz at email.unc.edu>
Cc: Bioconductor mailing list <bioconductor at r-project.org>
Subject: Re: [BioC] edgeR: new defaults of estimateTagwiseDisp and
        exactTest
Message-ID: <Pine.WNT.4.64.1303131033340.6032 at PC765.wehi.edu.au>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Tue, 12 Mar 2013, Zhuzhu Zhang wrote:

> Dear Gordon,
>
> Thank you very much for your reply. That was greatly helpful.
>
> For prion.df, would you recommend that the user use the default value in the
> current version, or choose it differently for different datasets since it
> varies as the data size changes?

Use the default value.  As I tried to explain, the whole point of using
prior.df instead of prior.n is that the optimal value for prior.df does
not depend on data size.

> If the latter, would the same principle of
> choosing prior.n be applied?

prior.n is a function of prior.df.  See ?getPriorN.

Gordon

> Thanks,
> Zhuzhu
>
>
>
>
> On 3/8/13 2:06 AM, Gordon K Smyth wrote:
>> Dear Zhuzhu,
>>
>> The edgeR User's guide gives case studies of how we intend edgeR to be
>> used.  You will see that we expect users to generally use the default
>> parameters.  Although there are a great many parameters that can be
>> varied in principle in the edgeR functions, we not expect them to be
>> changed in most analyses.
>>
>> One exception was the prior.n argument to estimateTagwiseDisp.  It was
>> never our intention that a fixed value for prior.n would be used for
>> datasets of different sizes, so previous versions of the User's Guide
>> used to explain how to set this parameter.  Several versions ago, we
>> eliminated the prior.n argument and replaced it with prior.df so that
>> the default behavior was what we intended.
>>
>> Over time we have reduced the default value for prior.df somewhat.
>> Larger values of prior.df give more priority to genes with large fold
>> changes between groups.  Smaller values of prior.df give more priority
>> to genes with small dispersion values, i.e., to genes that are
>> consistent between replicates.
>>
>> You ask about exactTest().  To learn more about the different rejection
>> regions, you should start by reading the help page for exactTest(), but
>> I doubt that this is causing you a problem.  The theory behind the
>> original exact negative binomial test of Robinson and Smyth (2008)
>> presupposed that the conditional distibution of the counts given the
>> genewise total was unimodal (like a binomial distribution).  This is
>> true for typical dispersion values, but is not true for very large
>> dispersion values. Hence the original exactTest can give totally
>> innappropriate results when the dispersion is very, very large. The new
>> rejection region is similar to the original when the dispersion is
>> smallish, but gives sensible results in any situation.  Hence it should
>> be used.  The old rejection region is preserved as an option only for
>> backward compatability.
>>
>> Best wishes
>> Gordon
>>
>>
>>> Date: Wed, 06 Mar 2013 19:22:06 -0500
>>> From: Zhuzhu Zhang <zhuzhuz at email.unc.edu>
>>> To: bioconductor at r-project.org
>>> Subject: [BioC] edgeR: new defaults of estimateTagwiseDisp and
>>>     exactTest
>>>
>>> Dear All,
>>>
>>> I'm running edgeR 3.0.8 and notice that the results are considerably
>>> different than those from older versions. I realized that it was
>>> likely due to the different prior.df I used in Function
>>> estimateTagwiseDisp, thanks to an earlier discussion on the list-
>>> https://stat.ethz.ch/pipermail/bioconductor/2012-December/049644.html
>>>
>>> I have a few more questions:
>>>
>>> 1. How to choose a proper prior.df (or prior.n)?
>>>
>>> 2. How is the new default method of exactText different from the old
>>> default (rejection.region = "smallp")? How does it improve the
>>> performance?
>>>
>>> 3. In general, what parameters should I tune for different datasets,
>>> when using function estimateTagwiseDisp and exactTest? I used the
>>> defaults but realized that they may not be most appropriate.
>>>
>>> Thank you for your time and attention. Any suggestions and comments
>>> would be extremely helpful and appreciated.
>>>
>>> Thanks,
>>> Zhuzhu

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



------------------------------

Message: 18
Date: Wed, 13 Mar 2013 11:17:30 +1100 (AUS Eastern Daylight Time)
From: Gordon K Smyth <smyth at wehi.EDU.AU>
To: Ian Sudbery <ian.sudbery at dpag.ox.ac.uk>
Cc: Bioconductor mailing list <bioconductor at r-project.org>
Subject: [BioC] negative correlation from limma's duplicateCorrelation
Message-ID: <Pine.WNT.4.64.1303131037280.6032 at PC765.wehi.edu.au>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

Dear Ian,

The help page for duplicateCorrelation() says

"For this function to return statistically useful results, there must be
at least two more arrays than the number of coefficients to be estimated,
i.e., two more than the column rank of design."

You have just two arrays, only one more than the number of columns in the
design matrix.  So there is not enough meaningful information from which
to estimate the duplicate correlation.

The correlations that you are computing using cor() are not comparable to
the quantity estimated by duplicateCorrelation.  You are computing
correlations between pairs of spots for the same gene across different
genes.  This correlation arises from the fact that different genes have
different expression levels.  You would get a positive value for this
correlation even if there were no spot duplicates.  The
duplicateCorrelation() function on the other hand computes a correlation
across technical replicates for the same gene.  These two correlations are
not related.

Best wishes
Gordon

> Date: Mon, 11 Mar 2013 18:27:25 +0000
> From: Ian Sudbery <ian.sudbery at dpag.ox.ac.uk>
> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] negative correlation from limma's duplicateCorrelation
>
> Hi all,
>
> I might be misunderstanding something here, but I think I'm having
> trouble with Limma's duplicateCorrelation function.
>
> I am analysing a set of custom spotted two color microarrays. Each
> comparison contains two arrays. Each array contains two copies of each
> spot. The design table for each comparison looks something like this:
>
>> targets
>                     SlideNumber             FileName   Cy3 Cy5       Date                 Name
> yeast_wt_v_dd_rep1           823  s823_bwp17y+ddy.txt   wty ddy 17/12/2012   yeast_wt_v_dd_rep1
> yeast_wt_v_dd_rep2           828     s828_wty+ddy.txt   wty ddy 18/01/2013   yeast_wt_v_dd_rep2
>
>
> After background correction and normalisation, I run
> duplicateCorrelation before fitting the model:
>
>> corfit <- duplicateCorrelation(MA_norm, design = c(1,1), ndups=2, spacing = 8324)
>
> When I look at the results, I find that the consensus correlation is
> negative:
>
>> corfit$consensus.correlation
> [1] -0.4163954
>
> Surely this is wrong? Examining the correlation for duplicate spots on
> each slide manually suggests a good correlation, particularly if one
> ignores flagged spots:
>
>> good_spots <- MA_norm$weights[1:8324,] == 1 & MA_norm$weights[8325:16648,] ==1
>> cor(MA_norm$M[1:8324,1][good_spots[,1]],MA_norm$M[8325:16648,1][good_spots[,1]])
> [1] 0.8769604
>
>> cor(MA_norm$M[1:8324,2][good_spots[,2]],MA_norm$M[8325:16648,2][good_spots[,2]])
> [1] 0.858891
>
> Even without removing the flagged spots, the correlation is still positive:
>> cor(MA_norm$M[1:8324,1],MA_norm$M[8325:16648,1])
> [1] 0.2669379
>
>> cor(MA_norm$M[1:8324,2],MA_norm$M[8325:16648,2])
> [1] 0.1843577
>
> I know that this isn't the way that duplicateCorrelation calculates rho,
> but one would expect that a good correlation in this way might indicate
> at least a positive correlation from duplicateCorrelation? Or am I
> interpreting the consensus correlation incorrectly?
>
> Here is the rest of my analysis script (I've left out some diagnostic plot generation):
>
>> targets <- readTargets(row.names="Name")
>> RG <- read.maimages(targets$FileName, source = "genepix", wt.fun=wtflags(weight=0, cutoff=-50))
>> spottypes <- readSpotTypes()
>> RG$genes$Status <- controlStatus(spottypes,RG)
>> anno <- read.delim("anno.txt", comment.char="!", header = T)
>> RG_background <- backgroundCorrect(RG,method = "normexp", offset =10)
>>
>> #flag out spots where intentity isn't much above background in both channels
>> RG_background$weights[RG_background$R < 50 & RG_background$G < 50] = 0
>> MA <- normalizeWithinArrays(RG_background)
>> MA_norm <- MA[MA$genes$Status == "cDNA",]
>> MA_norm$genes$Status <- NULL
>> corfit <- duplicateCorrelation(MA_norm, design = c(1,1), ndups=2, spacing = 8324)
>> fit <- lmFit(MA_norm, design = design, ndups = 2, correlation=corfit$consensus.correlation,spacing = 8324)
>> fit<- eBayes(fit)
>
>> sesssionInfo()
> R version 2.15.1 (2012-06-22)
>
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] dichromat_2.0-0   gplots_2.11.0     MASS_7.3-18       KernSmooth_2.23-7 caTools_1.14      gdata_2.12.0      gtools_2.7.0
> [8] reshape2_1.2.2    ggplot2_0.9.3     statmod_1.4.16    limma_3.14.4
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-5       colorspace_1.2-1   digest_0.6.2       gtable_0.1.2       labeling_0.1       munsell_0.4        plyr_1.8
> [8] proto_0.3-10       RColorBrewer_1.0-5 scales_0.2.3       stringr_0.6.2      tools_2.15.1
>
> Any advice appreciated
>
> Ian
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



------------------------------

Message: 19
Date: Wed, 13 Mar 2013 11:27:50 +1100 (AUS Eastern Daylight Time)
From: Gordon K Smyth <smyth at wehi.EDU.AU>
To: Luis alberto Martinez lopez <lewisluis2013 at yahoo.com>
Cc: Bioconductor mailing list <bioconductor at r-project.org>
Subject: [BioC] anova like test in edgeR
Message-ID: <Pine.WNT.4.64.1303131122400.6032 at PC765.wehi.edu.au>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed


> Date: Mon, 11 Mar 2013 17:18:09 -0700 (PDT)
> From: Luis alberto Martinez lopez <lewisluis2013 at yahoo.com>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] anova like test in edgeR
>
> Hello everyone:
>
> a wish to know if there is one way of doing an anova like test in edgeR,
> comparing all the possibilities between the groups defined in the
> experimental design.
>
> in the users guide the authors explain one way anova test, but the
> comparison is done only against the intercept group, is there a way of
> doing a general test comparing all the posibilities and not only against
> the intercept?

The example in the User's Guide is a general test comparing all
possibilities.

Best wishes
Gordon

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



------------------------------

Message: 20
Date: Wed, 13 Mar 2013 01:38:44 +0100
From: Nima Ahmadian <ahmadian.n at gmail.com>
To: Andrzej Ole? <andrzej.oles at gmail.com>
Cc: bioconductor at r-project.org
Subject: Re: [BioC] displaying the image with EBImage
Message-ID:
        <CAN+VrQmL+9Hs0iq+khoociSgxw7NNCXK1ZBNXxv_gk5pgKCs0Q at mail.gmail.com>
Content-Type: text/plain

Dear Andrzej

I solved the issue, but I came to know that if you have searchqu toolbar
installed on your browser it prevents the images to get open, so I removed
it and it worked out

cheers
Nima

On Tue, Mar 12, 2013 at 10:53 PM, Andrzej Ole?? <andrzej.oles at gmail.com>wrote:

> Dear Nima,
>
> many thanks for reporting this issue! Could you please check whether
> the example from the documentation of the 'display' function results
> in the same error? Try running the following commands:
>
> f = system.file("images", "lena-color.png", package="EBImage")
> x = readImage(f)
> display(x)
>
> If you still keep getting the error please copy the contents of the
> address bar of your browser and send it to me. Additionally, could you
> provide the complete path (i.e., beginning with c:\ ... or similar) to
> the Temp folder containing the generated images and the listing of its
> contents?
>
> I look forward to hearing from you soon.
>
> Best,
> Andrzej
>
>
> On Tue, Mar 12, 2013 at 10:27 PM, Nima Ahmadian <ahmadian.n at gmail.com>
> wrote:
> > ^Dear Wolfgang
> >
> > Thank you so much but I solved the issue somehow, there is no any error
> > message, but the image doesn't show automatically with my browser, I have
> > to go to my Temp folder and open it manually so if you have any hint, I
> > would be really grateful
> >
> > Cheers
> > Nima
> >
> > On Tue, Mar 12, 2013 at 9:56 PM, Wolfgang Huber <whuber at embl.de> wrote:
> >
> >>
> >> Nima
> >>
> >> Thank you. Can you provide a (minimal) transcript of the commands you
> >> issued, and the exact error message you get (e.g. what is the URL looked
> >> for but not found by firefox?)
> >>
> >>         Best wishes
> >>         Wolfgang
> >>
> >> Il giorno Mar 12, 2013, alle ore 5:30 PM, Nima [guest] <
> >> guest at bioconductor.org> ha scritto:
> >>
> >> >
> >> > Dear All
> >> >
> >> > I have problem with displaying my images in R, all functions are
> working
> >> except display
> >> > my OS is windows 7 but Mozilla Firefox gives this message:
> >> > Oops the page you are looking for could not be found
> >> >
> >> > would you please help me in this regard??
> >> >
> >> > Regards
> >> > Nima
> >> >
> >> > -- output of sessionInfo():
> >> >
> >> > R version 2.15.3 (2013-03-01)
> >> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> >> >
> >> > locale:
> >> > [1] LC_COLLATE=English_United States.1252
> >> > [2] LC_CTYPE=English_United States.1252
> >> > [3] LC_MONETARY=English_United States.1252
> >> > [4] LC_NUMERIC=C
> >> > [5] LC_TIME=English_United States.1252
> >> >
> >> > attached base packages:
> >> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >> >
> >> > other attached packages:
> >> > [1] EBImage_4.0.0
> >> >
> >> > loaded via a namespace (and not attached):
> >> > [1] abind_1.4-0 jpeg_0.1-2  png_0.1-4   tiff_0.1-4
> >> >
> >> > --
> >> > Sent via the guest posting facility at bioconductor.org.
> >> >
> >> > _______________________________________________
> >> > Bioconductor mailing list
> >> > Bioconductor at r-project.org
> >> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> > Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >>
> >
> >         [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>

        [[alternative HTML version deleted]]



------------------------------

Message: 21
Date: Wed, 13 Mar 2013 01:21:49 +0000
From: Nadia Davidson <nadia.davidson at mcri.edu.au>
To: <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] Negative parameter error when using goseq
Message-ID: <loom.20130313T015334-216 at post.gmane.org>
Content-Type: text/plain; charset="us-ascii"


Tarca, Adi <atarca at ...> writes:
>
> Dear Matthew,
> I am getting the following error when using goseq. Any idea what may cause it?
> Thanks,
> Adi
>
> > pwf=nullp(genes,bias.data=cntbias)
> > gocats=as.list(org.Hs.egGO2ALLEGS)
> > GOr=goseq(pwf,gene2cat=gocats)
> Using manually entered categories.
> Calculating the p-values...
> Error in dWNCHypergeo(num_de_incat, num_incat, num_genes - num_incat,  :
>   Negative parameter
> > head(pwf)
>           DEgenes bias.data        pwf
> 642819          0      3764 0.05775376
> 414235          0       124 0.02107342
> 57107           0     14702 0.06023892
> 649159          1       248 0.02542837
> 3127            0     44633 0.06357525
> 100507412       1      1337 0.05268580
> > head(gocats[3:5])
> $`GO:0000012`
>         IDA         IDA         IEA         IMP         IDA         IDA
>      "3981"      "7141"      "7515"     "23411"     "54840"     "55775"
>         IMP         IMP         IEA
>     "55775"    "200558" "100133315"
>

> Adi Laurentiu TARCA, Ph.D.

Dear Adi,

I think the trouble is that "org.Hs.egGO2ALLEGS" lists some GO groups
twice for the same gene. This is making goseq overestimate the number
of genes in a particular GO category, to the extent that the number of
genes in the category is more than the total number of genes.

It should be simple to fix with:
gocats<-sapply(gocats,function(x){unique(x)})
before the "goseq" command.

I'll update the goseq code so it checks for this scenario in future.

Cheers,
Nadia.



------------------------------

Message: 22
Date: Wed, 13 Mar 2013 12:57:47 +1100 (AUS Eastern Daylight Time)
From: Gordon K Smyth <smyth at wehi.EDU.AU>
To: Luis alberto Martinez lopez <lewisluis2013 at yahoo.com>
Cc: Bioconductor mailing list <bioconductor at r-project.org>
Subject: Re: [BioC] anova like test in edgeR
Message-ID: <Pine.WNT.4.64.1303131254070.6032 at PC765.wehi.edu.au>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

The example is in Section 3.2.5, which is titled "An ANOVA-like test for
any differences".

Gordon

> Date: Mon, 11 Mar 2013 17:18:09 -0700 (PDT)
> From: Luis alberto Martinez lopez <lewisluis2013 at yahoo.com>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] anova like test in edgeR
>
> Hello everyone:
>
> a wish to know if there is one way of doing an anova like test in edgeR,
> comparing all the possibilities between the groups defined in the
> experimental design.
>
> in the users guide the authors explain one way anova test, but the
> comparison is done only against the intercept group, is there a way of
> doing a general test comparing all the posibilities and not only against
> the intercept?

The example in the User's Guide is a general test comparing all
possibilities.

Best wishes
Gordon

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



------------------------------

Message: 23
Date: Tue, 12 Mar 2013 19:45:01 -0700
From: Martin Morgan <mtmorgan at fhcrc.org>
To: Julian Gehring <julian.gehring at embl.de>
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Subsampling of BAM/SAM alignments
Message-ID: <513FE82D.6050104 at fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 03/12/2013 01:32 PM, Julian Gehring wrote:
> Hi,
>
> Is there an efficient function to randomly subsample alignments from a BAM/SAM
> within R?  In the best case, an equivalent to 'FastqSampler' in 'ShortRead'.  If
> not, what would be a clever way of doing this with the bioc framework, without
> having to read the whole file first?

Hi Julian -- there isn't anything at the moment; below is a function that will
have at most yieldSize * 2 elements in memory at one time. All the elements in
the bam file (satisfying ScanBamParam) will eventually pass through memory, so
not super efficient.

sampleBam <-
     function(file, index=file, ..., yieldSize, verbose=FALSE,
              param=ScanBamParam(what=scanBamWhat()))
{
     qunlist <- Rsamtools:::.quickUnlist
     tot <- sampleSize <- yieldSize
     bf <- open(BamFile(file, index, yieldSize=yieldSize))
     smpl <- qunlist(scanBam(bf, param=param))

     repeat {
         yld <- qunlist(scanBam(bf, param=param))
         yld_n <- length(yld[[1]])
         if (length(yld[[1]]) == 0L)
             break
         tot <- tot + yld_n
         keep <- rbinom(1L, yld_n, yld_n/ tot)
         if (verbose)
             message(sprintf("sampling %d of %d", keep, tot))
         if (keep == 0L)
             next

         i <- sample(sampleSize, keep)
         j <- sample(yld_n, keep)
         smpl <- Map(function(x, y, i, j) {
             x[i] <- y[j]
             x
         }, smpl, yld, MoreArgs=list(i=i, j=j))
     }

     close(bf)
     smpl
}


with

   fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
   param <- ScanBamParam(
       flag=scanBamFlag(isUnmappedQuery=FALSE),
       what=c("rname", "pos", "cigar", "strand"))

   lst <- sampleBam(fl, yieldSize=1000, param=param, verbose=TRUE)

The end result could be fed to a more friendly structure

   names(lst)[names(lst) == "rname"] = "seqname"
   do.call(GappedAlignments, lst)

Lightly tested, especially when ScanBamParam() is non-trivial.

Martin

>
> Best wishes
> Julian
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



------------------------------

Message: 24
Date: Tue, 12 Mar 2013 17:28:57 -0400
From: Amy Vandiver <amyruthvandiver at gmail.com>
To: <bioconductor at stat.math.ethz.ch>
Subject: [BioC] ArrayExpress Question
Message-ID:
        <CAF4AQNHqPOhKEpGhenUknjE+Cx2e3d8TS-VLp=wu_oKi3fk7tA at mail.gmail.com>
Content-Type: text/plain

Hello,
I've been trying to access Array Expression data using the Array Expression
Package, however I am running into an error, seemingly due to the format of
the SDRF file. I have been trying to run through the ae2bioc function line
by line (ae2bioc) to determine if I can work around this error, however
many of the called functions (readPhenoData, getSDRFcolumn, etc.) can not
be found.  Am I missing a specific package containing these functions? Is
there a way to work around the original error?
Thank you,
Amy

*My script:*

source("http://bioconductor.org/biocLite.R")

biocLite("ArrayExpress")

library("ArrayExpress")

AEset=ArrayExpress("E-MTAB-202")

*Error:*

Error in .subset2(x, i, exact = exact) :

  attempt to select less than one element

In addition: Warning message:

In readPhenoData(sdrf, path) :

  ArrayExpress: Cannot find 'Label' column in SDRF. Object might not be
created correctly.

*SessionInfo:*

R Under development (unstable) (2013-03-10 r62201)

Platform: x86_64-unknown-linux-gnu (64-bit)


locale:

 [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C

 [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915

 [5] LC_MONETARY=en_US.iso885915    LC_MESSAGES=en_US.iso885915

 [7] LC_PAPER=C                     LC_NAME=C

 [9] LC_ADDRESS=C                   LC_TELEPHONE=C

[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C


attached base packages:

[1] parallel  stats     graphics  grDevices datasets  utils     methods

[8] base


other attached packages:

 [1] minfiLocal_0.1.20

 [2] minfiData_0.3.1

 [3] IlluminaHumanMethylation450kannotation.ilmn.v1.2_0.1.3

 [4] IlluminaHumanMethylation450kmanifest_0.4.0

 [5] bumphunter_0.99.33

 [6] iterators_1.0.6

 [7] foreach_1.4.0

 [8] minfi_1.5.3

 [9] Biostrings_2.27.11

[10] GenomicRanges_1.11.35

[11] IRanges_1.17.35

[12] reshape_0.8.4

[13] plyr_1.8

[14] lattice_0.20-13

[15] limma_3.15.14

[16] ArrayExpress_1.19.0

[17] Biobase_2.19.3

[18] BiocGenerics_0.5.6

[19] BiocInstaller_1.9.6


loaded via a namespace (and not attached):

 [1] affy_1.37.4           affyio_1.27.1         beanplot_1.1

 [4] codetools_0.2-8       DNAcopy_1.33.1        doRNG_1.5

 [7] grid_3.1.0            illuminaio_0.1.5      itertools_0.1-1

[10] MASS_7.3-24           matrixStats_0.6.2     mclust_4.0

[13] multtest_2.15.0       nor1mix_1.1-3         preprocessCore_1.21.1

[16] RColorBrewer_1.0-5    R.methodsS3_1.4.2     siggenes_1.33.1

[19] splines_3.1.0         stats4_3.1.0          survival_2.37-4

[22] tools_3.1.0           XML_3.95-0.2          zlibbioc_1.5.0




--
Amy Ruth Vandiver
MSII, GSIII
Johns Hopkins School of Medicine, MD-PhD Program

        [[alternative HTML version deleted]]



------------------------------

Message: 25
Date: Wed, 13 Mar 2013 01:55:49 +0000
From: "Marwaha, Shruti (marwahsi)" <marwahsi at mail.uc.edu>
To: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: [BioC] Error in getGEO command with GSEMatrix=TRUE
Message-ID:
        <39BA056F90A9A14C8067B39997BFEDCB5867150A at SN2PRD0102MB117.prod.exchangelabs.com>

Content-Type: text/plain; charset="us-ascii"

Dear Developers,

I am trying to  get an ExpressionSet from a GSEfile using getGEO command with GSEMatrix=TRUE. It sometimes gives an error (Connection was reset) if I use GSEMatrix=TRUE however sometimes it works smooth. I have the latest version of R, Bioconductor and GEOquery. Please advise if I am missing some library or need to change some settings. Thanks in advance.

My Code
library(Biobase)
library(GEOquery)
library(limma)
gset <- getGEO("GSE27347", GSEMatrix =TRUE)

Error:
Error in function (type, msg, asError = TRUE)  :
  Send failure: Connection was reset

Session Info

> sessionInfo()

R version 2.15.2 (2012-10-26)

Platform: x86_64-w64-mingw32/x64 (64-bit)



locale:

[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252

[3] LC_MONETARY=English_India.1252 LC_NUMERIC=C

[5] LC_TIME=English_India.1252



attached base packages:

[1] stats     graphics  grDevices utils     datasets  methods   base



other attached packages:

[1] limma_3.14.4       GEOquery_2.24.1    Biobase_2.18.0     BiocGenerics_0.4.0



loaded via a namespace (and not attached):

[1] RCurl_1.95-4.1 tools_2.15.2   XML_3.95-0.2

System Properties
OS: Windows 7
RAM: 4GB
Processor: 2GHz

Methods already attempted

*         I have tried it on both RStudio and GUI

*         I have also tried using the command "setInternet2(use=FALSE)" but it doesn't help.

*         I have tried this command with other GSEfiles also and get the same error.

*         GEOquery works fine and downloads the GSEfiles if I give GSEMatrix=FALSE

*         I have also tried downloading the file and provide the file path of the file to getGEO with GSEMatrix=TRUE but it does not give an ExpressionSet as output.

Thanks & Regards,
Shruti Marwaha

Graduate Student
Systems Biology and Physiology
University of Cincinnati
Medical Science Building,
231 Albert Sabin Way
Cincinnati, OH, USA 45267



------------------------------

Message: 26
Date: Tue, 12 Mar 2013 21:35:26 -0600
From: chris_utah <chris.gregg at neuro.utah.edu>
To: Bioconductor mailing list <bioconductor at r-project.org>
Subject: [BioC] biomaRt access for NCBIM37 build of mouse genome
Message-ID: <6D52F2F6-3564-4DA7-9101-2AA9B1D15BF8 at neuro.utah.edu>
Content-Type: text/plain

Hello,

I am using biomaRt.  I want to retrieve transcript start and stop info based on the NCBIM37 (mm9) build using getBM.  I can't find this annotation in listMarts(archive =T).   Is anyone aware of a simple solution?

Thank you.

best wishes,
Chris


> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgeR_3.0.8          limma_3.14.1         rtracklayer_1.18.0   biomaRt_2.14.0       GenomicRanges_1.10.5 IRanges_1.16.4       BiocGenerics_0.4.0
[8] Gviz_1.2.0

loaded via a namespace (and not attached):
 [1] annotate_1.36.0        AnnotationDbi_1.20.2   Biobase_2.18.0         Biostrings_2.26.2      biovizBase_1.6.0       bitops_1.0-4.2
 [7] BSgenome_1.26.1        cluster_1.14.3         colorspace_1.2-0       DBI_0.2-5              DESeq_1.10.1           dichromat_1.2-4
[13] genefilter_1.40.0      geneplotter_1.36.0     GenomicFeatures_1.10.0 Hmisc_3.10-1           labeling_0.1           lattice_0.20-10
[19] munsell_0.4            parallel_2.15.2        plyr_1.7.1             RColorBrewer_1.0-5     RCurl_1.95-3           Rsamtools_1.10.1
[25] RSQLite_0.11.2         scales_0.2.2           splines_2.15.2         stats4_2.15.2          stringr_0.6.1          survival_2.36-14
[31] tools_2.15.2           XML_3.95-0.1           xtable_1.7-0           zlibbioc_1.4.0




        [[alternative HTML version deleted]]



------------------------------

Message: 27
Date: Tue, 12 Mar 2013 23:39:06 -0700
From: "Ryan C. Thompson" <rct at thompsonclan.org>
To: Luis alberto Martinez lopez <lewisluis2013 at yahoo.com>
Cc: Bioconductor mailing list <bioconductor at r-project.org>
Subject: Re: [BioC] anova like test in edgeR
Message-ID: <51401F0A.6070803 at thompsonclan.org>
Content-Type: text/plain

Luis,

Your design has 4 groups,which means that the inter-group ANOVA-like
test has 4 - 1 = 3 degrees of freedom. Hence, the ANOVA-like test is
performed by testing 3 coefficients/contrasts equal to zero. The
procedures I have described will yield will test where the null
hypothesis is all group means being equal, and the alternative
hypothesis is that any two or more group means differ significantly from
each other. It is true that between 4 groups, there are 6 possible
pairwise comparisons to be made. If you want individual significance
rankings for each of the 6 possible contrasts, you can do that too, by
making 6 separate calls to glmLRT, each with a single coefficient or
contrast. Doing so will give you an appropriate rank order of
significance for each contrast, but the exact FDR values will probably
be too liberal, since there is no correction for the fact that you are
performing 6 tests on each gene instead of just 1. Ideally, you would
use a post-hoc testing method to assess individual contrasts within
genes that are called significant by their FDR in the ANOVA-like test.
However, I'm not sure how to perform such tests with the negative
binomial distribution. You might consider using the limma-voom method
instead of edgeR, since that will perform an actual ANOVA based on the
normal distribution, so standard ANOVA post-hoc analysis is possible. If
anyone has a take on how to do rigorous post-hoc testing on negative
binomial GLMs, I'd be interested in hearing about it.

If you just want the logFC values for the missing 3 contrasts, then
simply note that they can all be written in terms of the X vs 10
contrasts. For example, the 20 vs 40 contrast would be the 10 vs 40
contrast minus the 10 vs 20 contrast.

-Ryan

On 03/12/2013 06:42 PM, Luis alberto Martinez lopez wrote:
> thank you Ryan for your clear answer:
> but in your answer again only considered the comparison versus the 10
> group(the first)
> i dont't have a control group so i'm looking for a way of doing an
> anova like test in edgeR that compares 10 vs 20 ,20 vs 40, 40 vs 60,10
> vs 40, 10 vs 60,20 vs 60 ie all the posibilities at the same time and
> that give me a list with the genes more differential in any condition,
> thanks in advance,
> Luis
> *De:* Ryan C. Thompson <rct at thompsonclan.org>
> *Para:* Luis alberto Martinez lopez <lewisluis2013 at yahoo.com>
> *CC:* "bioconductor at r-project.org" <bioconductor at r-project.org>
> *Enviado:* Lunes, 11 de marzo, 2013 22:46:19
> *Asunto:* Re: [BioC] anova like test in edgeR
>
> Luis,
>
> Note that your experimental design is the same whether or not you
> include an intercept in the design matrix. If you include an
> intercept, then the intercept represents the level in the D10 group,
> and the other three coefficients represent the differences between
> D20/40/60 and D10. So with an intercept, you would specify coef=2:4 to
> do an ANOVA-like test for any differences among all four groups.
>
> If you choose not to use an intercept term, then the design matrix is
> just a different parametrization of the same design. Instead of
> representing differences, each coefficient simply represents the
> expression level in a group. So you can still perform the same test,
> you simply have to specify the contrasts (differences) explicitly:
>
> > anova.contrasts <- makeContrasts(contrasts=c("grpD20-grpD10",
> "grpD40-grpD10", "grpD60-grpD10"), levels=design)
> > print(anovalike.contrasts)
>         Contrasts
> Levels   grpD20-grpD10 grpD40-grpD10 grpD60-grpD10
>   grpD10            -1            -1            -1
>   grpD20             1             0             0
>   grpD40             0             1             0
>   grpD60             0             0             1
> > glmLRT(fit2, contrast=anova.contrasts)
>
> Hope this helps,
>
> -Ryan Thompson
>
>
> On 03/11/2013 05:18 PM, Luis alberto Martinez lopez wrote:
>> Hello everyone:
>>
>> a wish to know if there is one way of doing an anova like test in edgeR, comparing all the possibilities between the groups defined in the experimental design.
>>
>> in the users guide the authors explain one way anova test, but the comparison is done only against the intercept group, is there  a way of doing a general test comparing all the posibilities and not only against the intercept?
>>
>> also i'd like to know what does the following differential expression analysis does because the output is different when i include an intercept, i can't deduce what comparison are being doing....
>>
>>> grp<-c("D10","D10","D20","D20","D40","D40","D60","D60")
>>>    dge = DGEList(counts=general.m, group=grp)
>>>    dge = calcNormFactors(dge)
>>> design <- model.matrix(~0+grp, data=dge$samples)
>> Warning message:
>> In model.matrix.default(~0 + grp, data = dge$samples) :
>>    variable 'grp' converted to a factor
>>>    design
>>         grpD10 grpD20 grpD40 grpD60
>> i10re1      1      0      0      0
>> i10re2      1      0      0      0
>> i20re1      0      1      0      0
>> i20re2      0      1      0      0
>> i40re1      0      0      1      0
>> i40re2      0      0      1      0
>> i60re1      0      0      0      1
>> i60re2      0      0      0      1
>> attr(,"assign")
>> [1] 1 1 1 1
>> attr(,"contrasts")
>> attr(,"contrasts")$grp
>> [1] "contr.treatment"
>>
>>> dge <- estimateGLMCommonDisp(dge,design)
>>> dge <- estimateGLMTrendedDisp(dge,design)
>> Loading required package: splines
>>>    dge <- estimateGLMTagwiseDisp(dge,design)
>>>    fit2 <- glmFit(dge, design)
>>> lrt.anova_like <- glmLRT(fit2, coef=2:4)
>>> lrt.anova_like
>> An object of class "DGELRT"
>> $coefficients
>>                grpD10    grpD20    grpD40    grpD60
>> comp101_c0 -12.55010 -16.12974 -13.27081 -16.12974
>> comp103_c0 -13.85721 -13.35290 -12.61155 -13.96824
>> comp120_c0 -12.33347 -13.98877 -16.12974 -16.12974
>> comp179_c0 -12.55135 -13.98566 -16.12974 -13.96827
>> comp192_c0 -13.85721 -13.98569 -13.27548 -12.67228
>> 23484 more rows ...
>>
>> $fitted.values
>>                i10re1    i10re2    i20re1    i20re2   i40re1   i40re2    i60re1
>> comp101_c0 2.0108643 1.9941877 0.0000000 0.0000000 0.841309 1.168177 0.0000000
>> comp103_c0 0.5020816 0.4979177 1.1140306 0.8859780 1.674705 2.325366 0.5414430
>> comp120_c0 2.5112747 2.4904480 0.5552870 0.4416145 0.000000 0.000000 0.0000000
>> comp179_c0 2.0083306 1.9916749 0.5570245 0.4429963 0.000000 0.000000 0.5414282
>> comp192_c0 0.5020816 0.4979177 0.5570036 0.4429797 0.837345 1.162673 2.1657497
>>                i60re2
>> comp101_c0 0.0000000
>> comp103_c0 0.4585719
>> comp120_c0 0.0000000
>> comp179_c0 0.4585593
>> comp192_c0 1.8342685
>> 23484 more rows ...
>>
>> $deviance
>> comp101_c0 comp103_c0 comp120_c0 comp179_c0 comp192_c0
>>    4.386443   3.070288   2.848548   3.917710   2.629182
>> 23484 more elements ...
>>
>> $df.residual
>> [1] 4 4 4 4 4
>> 23484 more elements ...
>>
>> $abundance
>> [1] -13.63381 -13.35715 -13.64240 -13.64482 -13.35716
>> 23484 more elements ...
>>
>> $design
>>         grpD10 grpD20 grpD40 grpD60
>> i10re1      1      0      0      0
>> i10re2      1      0      0      0
>> i20re1      0      1      0      0
>> i20re2      0      1      0      0
>> i40re1      0      0      1      0
>> i40re2      0      0      1      0
>> i60re1      0      0      0      1
>> i60re2      0      0      0      1
>> attr(,"assign")
>> [1] 1 1 1 1
>> attr(,"contrasts")
>> attr(,"contrasts")$grp
>> [1] "contr.treatment"
>>
>>
>> $offset
>>           [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
>> [1,] 13.27698 13.26865 13.52514 13.29609 13.15729 13.48553 13.47707 13.31095
>> [2,] 13.27698 13.26865 13.52514 13.29609 13.15729 13.48553 13.47707 13.31095
>> [3,] 13.27698 13.26865 13.52514 13.29609 13.15729 13.48553 13.47707 13.31095
>> [4,] 13.27698 13.26865 13.52514 13.29609 13.15729 13.48553 13.47707 13.31095
>> [5,] 13.27698 13.26865 13.52514 13.29609 13.15729 13.48553 13.47707 13.31095
>> 23484 more rows ...
>>
>> $dispersion
>> [1] 0.2161256308 0.0003318302 0.0634519684 0.0003270568 0.0003318016
>> 23484 more elements ...
>>
>> $method
>> [1] "oneway"
>>
>> $samples
>>         group lib.size norm.factors
>> i10re1   D10   594988    0.9808651
>> i10re2   D10   590850    0.9795431
>> i20re1   D20   723182    1.0343009
>> i20re2   D20   567524    1.0481805
>> i40re1   D40   452242    1.1449003
>> i40re2   D40   671656    1.0703965
>> i60re1   D60   801682    0.8892299
>> i60re2   D60   685351    0.8809633
>>
>> $table
>>             logFC.grpD20 logFC.grpD40 logFC.grpD60    logCPM          LR
>> comp101_c0    -23.27030    -19.14573    -23.27030 0.2621324    647.0115
>> comp103_c0    -19.26416    -18.19462    -20.15192 0.6612798 194043.3654
>> comp120_c0    -20.18153    -23.27030    -23.27030 0.2497509   1999.3276
>> comp179_c0    -20.17704    -23.27030    -20.15196 0.2462508 196431.3868
>> comp192_c0    -20.17709    -19.15248    -18.28224 0.6612611 194057.4372
>>                    PValue
>> comp101_c0 6.475824e-140
>> comp103_c0  0.000000e+00
>> comp120_c0  0.000000e+00
>> comp179_c0  0.000000e+00
>> comp192_c0  0.000000e+00
>> 23484 more rows ...
>>
>> $comparison
>> [1] "grpD20" "grpD40" "grpD60"
>>
>> $df.test
>> [1] 3 3 3 3 3
>> 23484 more elements ...
>>
>>
>>> top<-topTags(lrt.anova_like)
>>> top
>> Coefficient:  grpD20 grpD40 grpD60
>>              logFC.grpD20 logFC.grpD40 logFC.grpD60    logCPM         LR PValue
>> comp628_c0      -23.2703    -23.27030    -19.23749 0.2462499 196434.924      0
>> comp2607_c0     -23.2703    -20.07173    -18.68223 0.4686227 195146.814      0
>> comp2858_c0     -23.2703    -23.27030    -16.85200 1.1206869 192585.615      0
>> comp2885_c0     -23.2703    -18.60303    -17.96733 0.6593429   4326.878      0
>> comp2904_c0     -23.2703    -23.27030    -16.21482 1.8318474   2065.059      0
>> comp2912_c0     -23.2703    -17.88139    -20.15196 1.1207038 192646.480      0
>> comp2933_c0     -23.2703    -18.59551    -18.68219 0.4686259 195111.455      0
>> comp2950_c0     -23.2703    -19.16586    -17.97899 0.4607410   1715.721      0
>> comp2999_c0     -23.2703    -23.27030    -23.27030 1.2524134   3853.539      0
>> comp3028_c0     -23.2703    -19.15248    -17.30519 0.9831770 192775.649      0
>>              FDR
>> comp628_c0    0
>> comp2607_c0   0
>> comp2858_c0   0
>> comp2885_c0   0
>> comp2904_c0   0
>> comp2912_c0   0
>> comp2933_c0   0
>> comp2950_c0   0
>> comp2999_c0   0
>> comp3028_c0   0
>>> lrt.anova_like <- glmLRT(fit2, coef=1:4)
>> Error in mglmLevenberg(y, design = design, dispersion = dispersion, offset = offset,  :
>>    BLAS/LAPACK routine 'DGEMM ' gave error code -13
>>
>>
>>
>>
>>
>> The DGELRT object says that is comparing
>>
>> $comparison
>> [1] "grpD20" "grpD40" "grpD60"
>>
>> but what exactly this means?
>>
>> Luis
>>      [[alternative HTML version deleted]]
>>
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org  <mailto:Bioconductor at r-project.org>
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>


        [[alternative HTML version deleted]]



------------------------------

Message: 28
Date: Wed, 13 Mar 2013 00:59:20 -0700
From: "Ryan C. Thompson" <rct at thompsonclan.org>
To: Wolfgang Huber <whuber at embl.de>
Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: Re: [BioC] DESeq2
Message-ID: <514031D8.6010006 at thompsonclan.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hello,

I noticed the addition of the DESeq2 package a few days ago and had a
look at the new additions. Overall, it looks like an excellent package
(and it runs a lot faster than DESeq 1, too). I have a few questions for
clarification of exactly what methods DESeq2 is using. Specifically, I
notice that the fold change shrinkage is performed in nbinomWaldTest but
not in nbinomChisqTest. Is this just for reasons of
backward-compatibility of results, or is the Chi-squared test logically
incompatible with shrunken GLM coefficients? Secondly, is there any plan
to extend the Wald test to testing contrasts of multiple coefficients or
testing multiple coefficients/contrasts at once in an ANOVA-like test?

Thanks,

-Ryan Thompson

On 03/12/2013 01:51 PM, Wolfgang Huber wrote:
> Dear DESeq users,
>
> Mike Love, Simon Anders and I have been updating the DESeq package. This resulted in the package DESeq2, which is available from the development branch, and scheduled for the next release: http://www.bioconductor.org/packages/devel/bioc/html/DESeq2.html
>
> For several release cycles, the original package (DESeq) will be maintained at its current functionality, in order to not disrupt the workflows of DESeq users. For new projects, we recommend using DESeq2. Major innovations are:
>
> * Base class: SummarizedExperiment is used as the superclass for storing the data, rather than eSet.  This allows closer integration with upstream workflows involving GRanges and summarizeOverlaps, and facilitates downstream analyses of the genomic regions of interest.
>
> * Simplified workflow: the wrapper function DESeq() performs all steps for a differential expression analysis. The individual steps are of course also accessible.
>
> * More powerful statistics: incorporation of prior distributions into the estimation of dispersions and fold changes (empirical-Bayes shrinkage). The dispersion shrinkage improves power compared to the old DESeq. The fold changes shrinkage help moderate the otherwise large spread in log fold changes for genes with low counts, while it has negligible effect on genes with high counts; it may be particularly useful for visualisation, clustering, classification, ordination (PCA, MDS), similar to the variance-stabilizing transformation in the old DESeq. A Wald test for significance is provided as the default inference method, with the chi-squared test of the previous version is also available. A manuscript is in preparation.
>
> * Normalization: it is possible to provide a matrix of sample- *and* gene-specific normalization factors, which allows the use of normalisation factors from Bioconductor packages such as cqn and EDASeq.
>
> Examples of usage are provided in the vignette, and more details are available in the manual pages (specifically, the DESeq function and estimateDispersions function).
>
> Enjoy -
>
>       Mike, Simon, Wolfgang.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



------------------------------

Message: 29
Date: Wed, 13 Mar 2013 09:40:11 +0100
From: Hans-Rudolf Hotz <hrh at fmi.ch>
To: chris_utah <chris.gregg at neuro.utah.edu>
Cc: Bioconductor mailing list <bioconductor at r-project.org>
Subject: Re: [BioC] biomaRt access for NCBIM37 build of mouse genome
Message-ID: <51403B6B.9020700 at fmi.ch>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed

Hi Chris

use one of the archives. 'ensembl67' ("may2012.archive.ensembl.org") was
the last ensembl release containing NCBIM37 (if I remember correctly).

So you can try:

 > library(biomaRt)

 > ensembl67=useMart(host='may2012.archive.ensembl.org',
biomart='ENSEMBL_MART_ENSEMBL')

 > listDatasets(ensembl67)[grep("Mus",
listDatasets(ensembl67)$description),]
                   dataset                  description version
48 mmusculus_gene_ensembl Mus musculus genes (NCBIM37) NCBIM37
 >



Regards, Hans-Rudolf



On 03/13/2013 04:35 AM, chris_utah wrote:
> Hello,
>
> I am using biomaRt.  I want to retrieve transcript start and stop info based on the NCBIM37 (mm9) build using getBM.  I can't find this annotation in listMarts(archive =T).   Is anyone aware of a simple solution?
>
> Thank you.
>
> best wishes,
> Chris
>
>
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] edgeR_3.0.8          limma_3.14.1         rtracklayer_1.18.0   biomaRt_2.14.0       GenomicRanges_1.10.5 IRanges_1.16.4       BiocGenerics_0.4.0
> [8] Gviz_1.2.0
>
> loaded via a namespace (and not attached):
>   [1] annotate_1.36.0        AnnotationDbi_1.20.2   Biobase_2.18.0         Biostrings_2.26.2      biovizBase_1.6.0       bitops_1.0-4.2
>   [7] BSgenome_1.26.1        cluster_1.14.3         colorspace_1.2-0       DBI_0.2-5              DESeq_1.10.1           dichromat_1.2-4
> [13] genefilter_1.40.0      geneplotter_1.36.0     GenomicFeatures_1.10.0 Hmisc_3.10-1           labeling_0.1           lattice_0.20-10
> [19] munsell_0.4            parallel_2.15.2        plyr_1.7.1             RColorBrewer_1.0-5     RCurl_1.95-3           Rsamtools_1.10.1
> [25] RSQLite_0.11.2         scales_0.2.2           splines_2.15.2         stats4_2.15.2          stringr_0.6.1          survival_2.36-14
> [31] tools_2.15.2           XML_3.95-0.1           xtable_1.7-0           zlibbioc_1.4.0
>
>
>
>
>       [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>



------------------------------

Message: 30
Date: Wed, 13 Mar 2013 09:45:43 +0100
From: Michael Love <michaelisaiahlove at gmail.com>
To: rct at thompsonclan.org, Wolfgang Huber <whuber at embl.de>
Cc: bioconductor at r-project.org
Subject: Re: [BioC] DESeq2
Message-ID: <51403CB7.7080002 at gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

hi Ryan,

Thanks for the comments.

On 03/13/13 08:59, Ryan C. Thompson wrote:
> Hello,
>
> I noticed the addition of the DESeq2 package a few days ago and had a
> look at the new additions. Overall, it looks like an excellent package
> (and it runs a lot faster than DESeq 1, too). I have a few questions
> for clarification of exactly what methods DESeq2 is using.
> Specifically, I notice that the fold change shrinkage is performed in
> nbinomWaldTest but not in nbinomChisqTest. Is this just for reasons of
> backward-compatibility of results, or is the Chi-squared test
> logically incompatible with shrunken GLM coefficients?
The latter was our reason.  With shrunken coefficients, the differences
in deviances are no longer distributed as a chi-square. For example,
with simulated null data (non-intercept betas equal to zero), the
differences in deviances will pile up near zero and the resulting
p-values will not be uniformly distributed.

> Secondly, is there any plan to extend the Wald test to testing
> contrasts of multiple coefficients or testing multiple
> coefficients/contrasts at once in an ANOVA-like test?
>
Yes, we are looking into a convenient interface for this.

best,

Mike

> Thanks,
>
> -Ryan Thompson
>
> On 03/12/2013 01:51 PM, Wolfgang Huber wrote:
>> Dear DESeq users,
>>
>> Mike Love, Simon Anders and I have been updating the DESeq package.
>> This resulted in the package DESeq2, which is available from the
>> development branch, and scheduled for the next release:
>> http://www.bioconductor.org/packages/devel/bioc/html/DESeq2.html
>>
>> For several release cycles, the original package (DESeq) will be
>> maintained at its current functionality, in order to not disrupt the
>> workflows of DESeq users. For new projects, we recommend using
>> DESeq2. Major innovations are:
>>
>> * Base class: SummarizedExperiment is used as the superclass for
>> storing the data, rather than eSet.  This allows closer integration
>> with upstream workflows involving GRanges and summarizeOverlaps, and
>> facilitates downstream analyses of the genomic regions of interest.
>>
>> * Simplified workflow: the wrapper function DESeq() performs all
>> steps for a differential expression analysis. The individual steps
>> are of course also accessible.
>>
>> * More powerful statistics: incorporation of prior distributions into
>> the estimation of dispersions and fold changes (empirical-Bayes
>> shrinkage). The dispersion shrinkage improves power compared to the
>> old DESeq. The fold changes shrinkage help moderate the otherwise
>> large spread in log fold changes for genes with low counts, while it
>> has negligible effect on genes with high counts; it may be
>> particularly useful for visualisation, clustering, classification,
>> ordination (PCA, MDS), similar to the variance-stabilizing
>> transformation in the old DESeq. A Wald test for significance is
>> provided as the default inference method, with the chi-squared test
>> of the previous version is also available. A manuscript is in
>> preparation.
>>
>> * Normalization: it is possible to provide a matrix of sample- *and*
>> gene-specific normalization factors, which allows the use of
>> normalisation factors from Bioconductor packages such as cqn and EDASeq.
>>
>> Examples of usage are provided in the vignette, and more details are
>> available in the manual pages (specifically, the DESeq function and
>> estimateDispersions function).
>>
>> Enjoy -
>>
>>     Mike, Simon, Wolfgang.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor



------------------------------

Message: 31
Date: Wed, 13 Mar 2013 06:26:46 -0400
From: Sean Davis <sdavis2 at mail.nih.gov>
To: "Marwaha, Shruti (marwahsi)" <marwahsi at mail.uc.edu>
Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>
Subject: Re: [BioC] Error in getGEO command with GSEMatrix=TRUE
Message-ID:
        <CANeAVBn94wf=W_+r-qY+49e=fwk11kcFwzA_WAKz-MyOHbwtYA at mail.gmail.com>
Content-Type: text/plain

On Tue, Mar 12, 2013 at 9:55 PM, Marwaha, Shruti (marwahsi) <
marwahsi at mail.uc.edu> wrote:

> Dear Developers,
>
> I am trying to  get an ExpressionSet from a GSEfile using getGEO command
> with GSEMatrix=TRUE. It sometimes gives an error (Connection was reset) if
> I use GSEMatrix=TRUE however sometimes it works smooth. I have the latest
> version of R, Bioconductor and GEOquery. Please advise if I am missing some
> library or need to change some settings. Thanks in advance.
>
> My Code
> library(Biobase)
> library(GEOquery)
> library(limma)
> gset <- getGEO("GSE27347", GSEMatrix =TRUE)
>
> Error:
> Error in function (type, msg, asError = TRUE)  :
>   Send failure: Connection was reset
>
>
Hi, Marwaha.

While I cannot be sure, I suspect that these are intermittent connection
problems between you and NCBI.  You could try wrapping your getGEO call in
something like this to catch this condition :

gse = NULL
while(!inherits(gse,'list')) {gse=tryCatch(
  getGEO('GSE27347'),
  error=function(e) {Sys.sleep(5); message('retrying'); return(e)})}

This will continuously retry the getGEO call every 5 seconds until the
outcome is a list, which is what we expect getGEO to return in this case.
 Hopefully, this will make your code a bit more robust.

Sean



> Session Info
>
> > sessionInfo()
>
> R version 2.15.2 (2012-10-26)
>
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>
>
> locale:
>
> [1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252
>
> [3] LC_MONETARY=English_India.1252 LC_NUMERIC=C
>
> [5] LC_TIME=English_India.1252
>
>
>
> attached base packages:
>
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
>
>
> other attached packages:
>
> [1] limma_3.14.4       GEOquery_2.24.1    Biobase_2.18.0
> BiocGenerics_0.4.0
>
>
>
> loaded via a namespace (and not attached):
>
> [1] RCurl_1.95-4.1 tools_2.15.2   XML_3.95-0.2
>
> System Properties
> OS: Windows 7
> RAM: 4GB
> Processor: 2GHz
>
> Methods already attempted
>
> *         I have tried it on both RStudio and GUI
>
> *         I have also tried using the command "setInternet2(use=FALSE)"
> but it doesn't help.
>
> *         I have tried this command with other GSEfiles also and get the
> same error.
>
> *         GEOquery works fine and downloads the GSEfiles if I give
> GSEMatrix=FALSE
>
> *         I have also tried downloading the file and provide the file path
> of the file to getGEO with GSEMatrix=TRUE but it does not give an
> ExpressionSet as output.
>
> Thanks & Regards,
> Shruti Marwaha
>
> Graduate Student
> Systems Biology and Physiology
> University of Cincinnati
> Medical Science Building,
> 231 Albert Sabin Way
> Cincinnati, OH, USA 45267
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>

        [[alternative HTML version deleted]]



------------------------------

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor


End of Bioconductor Digest, Vol 121, Issue 13



More information about the Bioconductor mailing list