[BioC] phyloseq/DESeq gives negative transformed values

Sophie Josephine Weiss Sophie.Weiss at colorado.edu
Wed Apr 23 23:44:16 CEST 2014


Thanks Michael,
The values in the entire transformed dataset (code and .biom attached) are
negative. There was an "out of vertex space" error, as described here
<http://seqanswers.com/forums/showthread.php?p=18620>,
so I tried setting maxk=300 as suggested.
Commands are below.
Thanks again!
Sophie

source("http://bioconductor.org/biocLite.R")
biocLite("phyloseq")
biocLite("DESeq")

library("phyloseq")
library("DESeq")
library("biom")

file = "~/Downloads/study_449_closed_reference_otu_table.biom"
x = import_biom(file)
source("~/Downloads/deseq_varstab.R")
DESeq_data = deseq_varstab(x, method = "blind", sharingMode = "maximum",
                           fitType = "local",
                           locfit_extra_args = list(maxk = 300))
write_biom(make_biom(DESeq_data@otu_table),
           "~/Desktop/449_Costello_DESeq.biom.tsv")
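
For anyone following the thread, here is a minimal base-R sketch of the two
points discussed in the quoted replies below: why a log2-like transform of
counts on the common scale can go negative even when every raw count is at
least 1, and how thresholding at 0 makes the matrix usable for Bray-Curtis.
The counts and size factors are made up for illustration, plain log2() is
only a stand-in for the DESeq variance stabilizing transformation, and the
bray_curtis() helper is a hypothetical name, not a phyloseq or DESeq
function.

```r
# Toy counts: rows are taxa, columns are samples (made-up numbers, all >= 1).
counts <- matrix(c(1, 2, 10, 200,
                   1, 4, 30, 400),
                 nrow = 4,
                 dimnames = list(paste0("OTU", 1:4), c("S1", "S2")))

# Hypothetical size factors: S2 is assumed to be sequenced twice as deeply.
size_factors <- c(S1 = 1, S2 = 2)

# Divide each column by its size factor to put counts on the common scale.
common <- sweep(counts, 2, size_factors, "/")

# Stand-in for a log2-like transform (NOT the DESeq VST itself).
# A count of 1 with size factor 2 becomes 1/2, and log2(1/2) is negative.
log2_like <- log2(common)

# Threshold at 0, same idiom as the m[m < 0] <- 0 example quoted below.
thresholded <- log2_like
thresholded[thresholded < 0] <- 0

# Bray-Curtis dissimilarity between two non-negative vectors.
bray_curtis <- function(a, b) sum(abs(a - b)) / sum(a + b)
bc <- bray_curtis(thresholded[, "S1"], thresholded[, "S2"])
```

Whether thresholding followed by Bray-Curtis is a sensible choice for a
given dataset is a separate question that this sketch does not address.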


On Sat, Apr 19, 2014 at 11:29 AM, Michael Love
<michaelisaiahlove at gmail.com> wrote:

> hi Sophie,
>
> You are getting negative values from the transformation for the
> reason I mentioned earlier: the transformation is log2-like.
>
> If you want to do something downstream of our software which requires
> non-negative values, below is some example code of how to threshold
> negative values for a matrix in R.
>
> The question of what is the best distance to use for taxa counts, or
> whether ANOVA on variance stabilized data is a good idea for taxa
> counts, depends on the properties of the data, and this is an area of
> active research. As I don't have experience analyzing this kind of
> data, I don't want to make any guesses.
>
> > m <- matrix(-2:5, ncol=2)
> > m
>      [,1] [,2]
> [1,]   -2    2
> [2,]   -1    3
> [3,]    0    4
> [4,]    1    5
> > m[m < 0] <- 0
> > m
>      [,1] [,2]
> [1,]    0    2
> [2,]    0    3
> [3,]    0    4
> [4,]    1    5
>
> On Fri, Apr 18, 2014 at 3:32 PM, Sophie Josephine Weiss
> <Sophie.Weiss at colorado.edu> wrote:
> > Hi Mike,
> > Could you please check whether I am running this correctly?  I have
> > double checked all the parameters, but for some reason, I am getting
> > negatives using the R script on the attached .biom dataset.  There are
> > no replicates in this microbial dataset.
> > Thanks for your advice,
> > Sophie
> >
> >
> > On Wed, Apr 16, 2014 at 4:02 PM, Sophie Josephine Weiss
> > <Sophie.Weiss at colorado.edu> wrote:
> >>
> >> Thanks Mike, that is what I thought.  What if we wanted to perform a
> >> Kruskal-Wallis test, or is it possible to perform ANOVA on the
> >> variance-stabilized matrix?
> >>
> >>
> >> On Wed, Apr 16, 2014 at 2:29 PM, Michael Love
> >> <michaelisaiahlove at gmail.com> wrote:
> >>>
> >>> hi Sophie,
> >>>
> >>> We recommend using the standard DESeq() function for differential
> >>> expression.
> >>>
> >>> This is mentioned in the first line of the vignette section on
> >>> transformations:
> >>>
> >>> "In order to test for differential expression, we operate on raw
> >>> counts and use discrete distributions as described in the previous
> >>> section"
> >>>
> >>> Also, in the McMurdie and Holmes paper, they are using the DESeq()
> >>> function, as shown in their supplemental material:
> >>>
> >>> http://joey711.github.io/waste-not-supplemental/simulation-differential-abundance/simulation-differential-abundance-server.html
> >>>
> >>> On Wed, Apr 16, 2014 at 3:22 PM, Sophie Josephine Weiss
> >>> <Sophie.Weiss at colorado.edu> wrote:
> >>> > Please help with this?  Thanks again.
> >>> >
> >>> >
> >>> > On Mon, Apr 14, 2014 at 6:02 PM, Sophie Josephine Weiss
> >>> > <Sophie.Weiss at colorado.edu> wrote:
> >>> >>
> >>> >> Thanks again Mike - would it be ok to do chi-squared and other
> >>> >> significance tests on the DESeq transformed datasets using
> >>> >> independent code, or is it necessary to do the differential
> >>> >> expression tests strictly within DESeq2?
> >>> >>
> >>> >> Sophie
> >>> >>
> >>> >>
> >>> >> On Mon, Apr 14, 2014 at 5:41 PM, Michael Love
> >>> >> <michaelisaiahlove at gmail.com> wrote:
> >>> >>>
> >>> >>> hi Sophie,
> >>> >>>
> >>> >>> The VST code is the same in DESeq and DESeq2. The estimation of
> >>> >>> dispersion is slightly different (details are in the vignette
> >>> >>> "Changes from DESeq to DESeq2"), but the fitted line (which is
> >>> >>> used by the VST) should be very similar.
> >>> >>>
> >>> >>> Mike
> >>> >>>
> >>> >>> On Mon, Apr 14, 2014 at 6:27 PM, Sophie Josephine Weiss
> >>> >>> <Sophie.Weiss at colorado.edu> wrote:
> >>> >>> > Hi Mike,
> >>> >>> > The McMurdie and Holmes paper uses DESeq for matrix
> >>> >>> > normalization - do you think that is ok, or would it be better
> >>> >>> > to use DESeq2?
> >>> >>> > Thanks again,
> >>> >>> > Sophie
> >>> >>> >
> >>> >>> >
> >>> >>> > On Mon, Apr 14, 2014 at 3:40 PM, Michael Love
> >>> >>> > <michaelisaiahlove at gmail.com> wrote:
> >>> >>> >>
> >>> >>> >> hi Sophie,
> >>> >>> >>
> >>> >>> >>
> >>> >>> >> On Mon, Apr 14, 2014 at 1:15 PM, Sophie Josephine Weiss
> >>> >>> >> <Sophie.Weiss at colorado.edu> wrote:
> >>> >>> >> >
> >>> >>> >> > Hi Mike,
> >>> >>> >> > Thanks for the references.  By "threshold at 0" do you mean
> >>> >>> >> > set any negative values equal to 0?
> >>> >>> >>
> >>> >>> >>
> >>> >>> >> yes.
> >>> >>> >>
> >>> >>> >>
> >>> >>> >> >
> >>> >>> >> > Do you think this is the best approach?
> >>> >>> >>
> >>> >>> >>
> >>> >>> >> I haven't explored this area, and would defer to the McMurdie
> >>> >>> >> and Holmes paper for the best combinations of distance and
> >>> >>> >> transformation.
> >>> >>> >>
> >>> >>> >>
> >>> >>> >> >
> >>> >>> >> > Thanks again,
> >>> >>> >> > Sophie
> >>> >>> >> >
> >>> >>> >> >
> >>> >>> >> > On Mon, Apr 14, 2014 at 11:01 AM, Michael Love
> >>> >>> >> > <michaelisaiahlove at gmail.com> wrote:
> >>> >>> >> >>
> >>> >>> >> >> I tried poking around here
> >>> >>> >> >> http://joey711.github.io/phyloseq/distance
> >>> >>> >> >> but couldn't see if the authors did anything for distances
> >>> >>> >> >> requiring non-negative data. It appears
> >>> >>> >> >> http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003531
> >>> >>> >> >> that VST was tested with Bray-Curtis distance. I think the
> >>> >>> >> >> distance is designed for counts, but you could always
> >>> >>> >> >> threshold at 0 to insist that the log2-like quantity act
> >>> >>> >> >> more like a count.
> >>> >>> >> >>
> >>> >>> >> >>
> >>> >>> >> >>
> >>> >>> >> >> On Mon, Apr 14, 2014 at 12:23 PM, Sophie Josephine Weiss
> >>> >>> >> >> <Sophie.Weiss at colorado.edu> wrote:
> >>> >>> >> >>>
> >>> >>> >> >>> Hi Mike,
> >>> >>> >> >>> Thanks for explaining more.  I am used to working with
> >>> >>> >> >>> rarefied microbial datasets, that is why.  Instead of
> >>> >>> >> >>> rarefying I would like to use the DESeq method.
> >>> >>> >> >>>
> >>> >>> >> >>> How would you then suggest going about calculating
> >>> >>> >> >>> Bray-Curtis distance, or summarized taxa diagrams, with
> >>> >>> >> >>> these new transformed matrices with negative values?
> >>> >>> >> >>> Thanks again,
> >>> >>> >> >>> Sophie
> >>> >>> >> >>>
> >>> >>> >> >>>
> >>> >>> >> >>> On Mon, Apr 14, 2014 at 7:17 AM, Michael Love
> >>> >>> >> >>> <michaelisaiahlove at gmail.com> wrote:
> >>> >>> >> >>>>
> >>> >>> >> >>>> hi Sophie,
> >>> >>> >> >>>>
> >>> >>> >> >>>> Can you explain why you don't want negative values in the
> >>> >>> >> >>>> transformed values?  Adding one to the raw counts is not
> >>> >>> >> >>>> sufficient. I should have said in my previous email, "the
> >>> >>> >> >>>> expected counts on the common scale". If the size factor
> >>> >>> >> >>>> for a sample is 2, then an expected count of 1 leads to an
> >>> >>> >> >>>> expected count of 1/2 on the common scale (after
> >>> >>> >> >>>> accounting for size factors).
> >>> >>> >> >>>>
> >>> >>> >> >>>>
> >>> >>> >> >>>> On Sun, Apr 13, 2014 at 11:50 PM, Sophie Josephine Weiss
> >>> >>> >> >>>> <Sophie.Weiss at colorado.edu> wrote:
> >>> >>> >> >>>>>
> >>> >>> >> >>>>> Hi Mike,
> >>> >>> >> >>>>> Thanks for your reply!  Ok, makes sense, but I added 1 to
> >>> >>> >> >>>>> all my matrix values, so the lowest value in the matrix
> >>> >>> >> >>>>> is 1 - there are still negatives?
> >>> >>> >> >>>>> Thanks again,
> >>> >>> >> >>>>> Sophie
> >>> >>> >> >>>>>
> >>> >>> >> >>>>>
> >>> >>> >> >>>>> On Sun, Apr 13, 2014 at 9:01 PM, Michael Love
> >>> >>> >> >>>>> <michaelisaiahlove at gmail.com> wrote:
> >>> >>> >> >>>>>>
> >>> >>> >> >>>>>> hi Sophie,
> >>> >>> >> >>>>>>
> >>> >>> >> >>>>>> The transformations in DESeq and DESeq2 are log2-like
> >>> >>> >> >>>>>> transformations. If the expected count is between 0 and
> >>> >>> >> >>>>>> 1, the values can be negative; this does not indicate a
> >>> >>> >> >>>>>> problem.
> >>> >>> >> >>>>>>
> >>> >>> >> >>>>>> Mike
> >>> >>> >> >>>>>>
> >>> >>> >> >>>>>>
> >>> >>> >> >>>>>> On Sun, Apr 13, 2014 at 5:17 PM, Sophie Josephine Weiss
> >>> >>> >> >>>>>> <Sophie.Weiss at colorado.edu> wrote:
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>> Hello,
> >>> >>> >> >>>>>>> I have microbiome data with no replicates, from
> >>> >>> >> >>>>>>> different conditions.  I am trying to transform the
> >>> >>> >> >>>>>>> data using the DESeq method, as described in McMurdie
> >>> >>> >> >>>>>>> and Holmes 2014.
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>> The attached file is the definition I am using, as per
> >>> >>> >> >>>>>>> the supplemental info in McMurdie and Holmes 2014, and
> >>> >>> >> >>>>>>> the .biom file I am using.
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>> Thank you for your help,
> >>> >>> >> >>>>>>> Sophie
> >>> >>> >> >>>>>>>
> >>> >>> >> >>>>>>> _______________________________________________
> >>> >>> >> >>>>>>> Bioconductor mailing list
> >>> >>> >> >>>>>>> Bioconductor at r-project.org
> >>> >>> >> >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>> >>> >> >>>>>>> Search the archives:
> >>> >>> >> >>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor

