[BioC] phyloseq/DESeq gives negative transformed values

Michael Love michaelisaiahlove at gmail.com
Thu Apr 24 01:02:50 CEST 2014


Hi Sophie,

On Wed, Apr 23, 2014 at 5:44 PM, Sophie Josephine Weiss
<Sophie.Weiss at colorado.edu> wrote:
>
> Thanks Michael,
> The entire dataset (attached code and .biom) is negatives


I don't see that the entire dataset is all negatives. I get the same
percent of negatives as you had zeros in the original counts:

z <- otu_table(x)
zz <- otu_table(DESeq_data)

> table(as.vector(z) > 0) / prod(dim(z))

     FALSE       TRUE
0.98022416 0.01977584

> table(as.vector(zz) > 0) / prod(dim(zz))

     FALSE       TRUE
0.98022416 0.01977584

> summary(as.vector(zz))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -1.200  -1.197  -1.197  -1.126  -1.197 280.300

> summary(as.vector(zz)[as.vector(zz) > 0])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.304   1.304   1.304   2.409   2.476 280.300
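
To illustrate why a log2-like transformation gives negative values for
small expected counts, here is a rough sketch using the size factor example
from earlier in this thread. Plain log2 is only a stand-in here (the actual
variance stabilizing transformation is not plain log2), and the variable
names are made up for illustration:

```r
# Plain log2 is used only as a stand-in for the log2-like transformation
size_factor <- 2      # size factor of a hypothetical sample
expected_count <- 1   # expected count for that sample

# Expected count on the common scale, after accounting for the size factor
common_scale <- expected_count / size_factor

# Any value between 0 and 1 has a negative log2
log2(common_scale)
```

Here the common-scale value is 1/2, and log2(1/2) is -1.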

If you want to do something downstream which requires non-negative values,
set all the negative values to 0, as I wrote previously. Or you can add
the absolute value of the smallest value: if the smallest value is
-1.200, just add 1.2 to the matrix. I don't have any recommendations,
though, for what is a good idea here.
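
A minimal sketch of the two options above, on a small made-up matrix (the
object `m` and its values are illustrative, not taken from the attached
dataset, and the shift assumes the smallest value is negative):

```r
# Toy stand-in for a transformed matrix; values are made up for illustration
m <- matrix(c(-1.2, -1.197, 1.304, 2.476), ncol = 2)

# Option 1: threshold, setting every negative value to 0
m_thresh <- m
m_thresh[m_thresh < 0] <- 0

# Option 2: shift, adding the absolute value of the smallest value
# (assumes min(m) is negative)
m_shift <- m + abs(min(m))

min(m_thresh)
min(m_shift)
```

Both results have a minimum of 0. Thresholding leaves the positive values
unchanged and collapses all negatives to the same value, while shifting
changes every value but keeps the differences between them.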

Mike



>
> - there was an error of "out of vertex space" as described here, so I tried setting maxk=300 as suggested.
> Commands are below.
> Thanks again!
> Sophie
>
> source("http://bioconductor.org/biocLite.R")
> biocLite("phyloseq")
> biocLite("DESeq")
>
> library("phyloseq")
> library("DESeq")
> library("biom")
>
> file = "~/Downloads/study_449_closed_reference_otu_table.biom"
> x = import_biom(file)
> source("~/Downloads/deseq_varstab.R")
> DESeq_data = deseq_varstab(x, method = "blind", sharingMode = "maximum", fitType = "local", locfit_extra_args=list(maxk=300))
> write_biom(make_biom(DESeq_data@otu_table), "~/Desktop/449_Costello_DESeq.biom.tsv")
>
>
> On Sat, Apr 19, 2014 at 11:29 AM, Michael Love <michaelisaiahlove at gmail.com> wrote:
>>
>> hi Sophie,
>>
>> You are getting negative values from the transformation for the
>> reasons I mentioned earlier: the transformation is log2-like.
>>
>> If you want to do something downstream of our software which requires
>> non-negative values, below is some example code of how to threshold
>> negative values for a matrix in R.
>>
>> The question of what is the best distance to use for taxa counts, or
>> whether ANOVA on variance stabilized data is a good idea for taxa
>> counts, depends on the properties of the data, and this is an area of
>> active research. As I don't have experience analyzing this kind of
>> data, I don't want to make any guesses.
>>
>> > m <- matrix(-2:5, ncol=2)
>> > m
>>      [,1] [,2]
>> [1,]   -2    2
>> [2,]   -1    3
>> [3,]    0    4
>> [4,]    1    5
>> > m[m < 0] <- 0
>> > m
>>      [,1] [,2]
>> [1,]    0    2
>> [2,]    0    3
>> [3,]    0    4
>> [4,]    1    5
>>
>> On Fri, Apr 18, 2014 at 3:32 PM, Sophie Josephine Weiss
>> <Sophie.Weiss at colorado.edu> wrote:
>> > Hi Mike,
>> > Could you please check whether I am running this correctly?  I have double
>> > checked all the parameters, but for some reason, I am getting negatives
>> > using the R script on the attached .biom dataset.  There are no replicates
>> > in this microbial dataset.
>> > Thanks for your advice,
>> > Sophie
>> >
>> >
>> > On Wed, Apr 16, 2014 at 4:02 PM, Sophie Josephine Weiss
>> > <Sophie.Weiss at colorado.edu> wrote:
>> >>
>> >> Thanks Mike, that is what I thought.  What if we wanted to perform a
>> >> Kruskal-Wallis test, or is it possible to perform ANOVA on the
>> >> variance-stabilized matrix?
>> >>
>> >>
>> >> On Wed, Apr 16, 2014 at 2:29 PM, Michael Love
>> >> <michaelisaiahlove at gmail.com> wrote:
>> >>>
>> >>> hi Sophie,
>> >>>
>> >>> We recommend using the standard DESeq() function for differential
>> >>> expression.
>> >>>
>> >>> This is mentioned in the first line of the vignette section on
>> >>> transformations:
>> >>>
>> >>> "In order to test for differential expression, we operate on raw
>> >>> counts and use discrete distributions as described in the previous
>> >>> section"
>> >>>
>> >>> Also, in the McMurdie and Holmes paper, they use the DESeq() function,
>> >>> as shown in their supplemental material:
>> >>>
>> >>>
>> >>> http://joey711.github.io/waste-not-supplemental/simulation-differential-abundance/simulation-differential-abundance-server.html
>> >>>
>> >>> On Wed, Apr 16, 2014 at 3:22 PM, Sophie Josephine Weiss
>> >>> <Sophie.Weiss at colorado.edu> wrote:
>> >>> > Please help with this?  Thanks again.
>> >>> >
>> >>> >
>> >>> > On Mon, Apr 14, 2014 at 6:02 PM, Sophie Josephine Weiss
>> >>> > <Sophie.Weiss at colorado.edu> wrote:
>> >>> >>
>> >>> >> Thanks again Mike - would it be ok to do chi-2 and other significance
>> >>> >> tests on the DESeq transformed datasets using independent code, or is
>> >>> >> it necessary to do the differential expression tests strictly within
>> >>> >> DESeq2?
>> >>> >>
>> >>> >> Sophie
>> >>> >>
>> >>> >>
>> >>> >> On Mon, Apr 14, 2014 at 5:41 PM, Michael Love
>> >>> >> <michaelisaiahlove at gmail.com> wrote:
>> >>> >>>
>> >>> >>> hi Sophie,
>> >>> >>>
>> >>> >>> The VST code is the same in DESeq and DESeq2. The estimation of
>> >>> >>> dispersion is slightly different (details are in the vignette
>> >>> >>> "Changes from DESeq to DESeq2"), but the fitted line (which is
>> >>> >>> used by the VST) should be very similar.
>> >>> >>>
>> >>> >>> Mike
>> >>> >>>
>> >>> >>> On Mon, Apr 14, 2014 at 6:27 PM, Sophie Josephine Weiss
>> >>> >>> <Sophie.Weiss at colorado.edu> wrote:
>> >>> >>> > Hi Mike,
>> >>> >>> > The McMurdie and Holmes paper uses DESeq for matrix normalization -
>> >>> >>> > do you think that is ok, or would it be better to use DESeq2?
>> >>> >>> > Thanks again,
>> >>> >>> > Sophie
>> >>> >>> >
>> >>> >>> >
>> >>> >>> > On Mon, Apr 14, 2014 at 3:40 PM, Michael Love
>> >>> >>> > <michaelisaiahlove at gmail.com>
>> >>> >>> > wrote:
>> >>> >>> >>
>> >>> >>> >> hi Sophie,
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >> On Mon, Apr 14, 2014 at 1:15 PM, Sophie Josephine Weiss
>> >>> >>> >> <Sophie.Weiss at colorado.edu> wrote:
>> >>> >>> >> >
>> >>> >>> >> > Hi Mike,
>> >>> >>> >> > Thanks for the references.  By "threshold at 0" do you mean set
>> >>> >>> >> > any negative values equal to 0?
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >> yes.
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >> >
>> >>> >>> >> > Do you think this is the best approach?
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >> I haven't explored this area, and would defer to the McMurdie and
>> >>> >>> >> Holmes paper for the best combinations of distance and
>> >>> >>> >> transformation.
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >> >
>> >>> >>> >> > Thanks again,
>> >>> >>> >> > Sophie
>> >>> >>> >> >
>> >>> >>> >> >
>> >>> >>> >> > On Mon, Apr 14, 2014 at 11:01 AM, Michael Love
>> >>> >>> >> > <michaelisaiahlove at gmail.com> wrote:
>> >>> >>> >> >>
>> >>> >>> >> >> I tried poking around here
>> >>> >>> >> >> http://joey711.github.io/phyloseq/distance
>> >>> >>> >> >> but couldn't see whether the authors did anything for distances
>> >>> >>> >> >> requiring non-negative data. It appears from
>> >>> >>> >> >> http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003531
>> >>> >>> >> >> that VST was tested with Bray-Curtis distance. I think the distance
>> >>> >>> >> >> is designed for counts, but you could always threshold at 0 to insist
>> >>> >>> >> >> that the log2-like quantity act more like a count.
>> >>> >>> >> >>
>> >>> >>> >> >>
>> >>> >>> >> >>
>> >>> >>> >> >> On Mon, Apr 14, 2014 at 12:23 PM, Sophie Josephine Weiss
>> >>> >>> >> >> <Sophie.Weiss at colorado.edu> wrote:
>> >>> >>> >> >>>
>> >>> >>> >> >>> Hi Mike,
>> >>> >>> >> >>> Thanks for explaining more.  I am used to working with rarefied
>> >>> >>> >> >>> microbial datasets, that is why.  Instead of rarefying I would
>> >>> >>> >> >>> like to use the DESeq method.
>> >>> >>> >> >>>
>> >>> >>> >> >>> How would you then suggest going about calculating Bray-Curtis
>> >>> >>> >> >>> distance, or summarized taxa diagrams, with these new transformed
>> >>> >>> >> >>> matrices with negative values?
>> >>> >>> >> >>> Thanks again,
>> >>> >>> >> >>> Sophie
>> >>> >>> >> >>>
>> >>> >>> >> >>>
>> >>> >>> >> >>> On Mon, Apr 14, 2014 at 7:17 AM, Michael Love
>> >>> >>> >> >>> <michaelisaiahlove at gmail.com> wrote:
>> >>> >>> >> >>>>
>> >>> >>> >> >>>> hi Sophie,
>> >>> >>> >> >>>>
>> >>> >>> >> >>>> Can you explain why you don't want negative values in the
>> >>> >>> >> >>>> transformed data?  Adding one to the raw counts is not sufficient.
>> >>> >>> >> >>>> I should have said in my previous email, "the expected counts on
>> >>> >>> >> >>>> the common scale". If the size factor for a sample is 2, then an
>> >>> >>> >> >>>> expected count of 1 leads to an expected count of 1/2 on the
>> >>> >>> >> >>>> common scale (after accounting for size factors).
>> >>> >>> >> >>>>
>> >>> >>> >> >>>>
>> >>> >>> >> >>>> On Sun, Apr 13, 2014 at 11:50 PM, Sophie Josephine Weiss
>> >>> >>> >> >>>> <Sophie.Weiss at colorado.edu> wrote:
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>> Hi Mike,
>> >>> >>> >> >>>>> Thanks for your reply!  Ok, makes sense, but I added 1 to all my
>> >>> >>> >> >>>>> matrix values, so the lowest value in the matrix is 1 - there
>> >>> >>> >> >>>>> are still negatives?
>> >>> >>> >> >>>>> Thanks again,
>> >>> >>> >> >>>>> Sophie
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>> On Sun, Apr 13, 2014 at 9:01 PM, Michael Love
>> >>> >>> >> >>>>> <michaelisaiahlove at gmail.com> wrote:
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> hi Sophie,
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> The transformations in DESeq and DESeq2 are log2-like
>> >>> >>> >> >>>>>> transformations. If the expected count is between 0 and 1,
>> >>> >>> >> >>>>>> the values can be negative; this does not indicate a problem.
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> Mike
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>> On Sun, Apr 13, 2014 at 5:17 PM, Sophie Josephine Weiss
>> >>> >>> >> >>>>>> <Sophie.Weiss at colorado.edu> wrote:
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> Hello,
>> >>> >>> >> >>>>>>> I have microbiome data with no replicates, from different
>> >>> >>> >> >>>>>>> conditions.  I am trying to transform the data using the DESeq
>> >>> >>> >> >>>>>>> method, as described in McMurdie and Holmes 2014.
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> The attached file is the definition I am using, as per the
>> >>> >>> >> >>>>>>> supplemental info in McMurdie and Holmes 2014, and the .biom
>> >>> >>> >> >>>>>>> file I am using.
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> Thank you for your help,
>> >>> >>> >> >>>>>>> Sophie
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> _______________________________________________
>> >>> >>> >> >>>>>>> Bioconductor mailing list
>> >>> >>> >> >>>>>>> Bioconductor at r-project.org
>> >>> >>> >> >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >>> >>> >> >>>>>>> Search the archives:
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>>
>> >>> >>> >> >>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>>
>> >>> >>> >> >>>>>
>> >>> >>> >> >>>>
>> >>> >>> >> >>>
>> >>> >>> >> >>
>> >>> >>> >> >
>> >>> >>> >
>> >>> >>> >
>> >>> >>
>> >>> >>
>> >>> >
>> >>
>> >>
>> >
>
>


