[BioC] TCC::ERROR: Need the design matrix for GLM

Koji Kadota kadota at iu.a.u-tokyo.ac.jp
Fri Apr 18 02:55:17 CEST 2014


Dear Gordon,

I am the corresponding author of the TCC paper.
What Pankaj wants to do is not the same as the default procedure in edgeR.
As explicitly described in the TCC paper, the differentially expressed gene
elimination strategy (DEGES) implemented in TCC is important for obtaining
more accurate DE results.
Please read the original paper.
http://www.biomedcentral.com/1471-2105/14/219

Koji

P.S.
Dear Sun, please send this mail to the Bioconductor mailing list again if I
could not ..., thanks in advance.
------------------------------------------
Koji Kadota, Ph.D., Associate Professor
Agricultural Bioinformatics Research Unit, 
Graduate School of Agricultural and Life Sciences,
The University of Tokyo 
1-1-1, Yayoi, Bunkyo-ku Tokyo, 113-8657, JAPAN
E-mail: kadota at iu.a.u-tokyo.ac.jp
Web: http://www.iu.a.u-tokyo.ac.jp/~kadota
------------------------------------------



> -----Original Message-----
> From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU]
> Sent: Friday, April 18, 2014 9:36 AM
> To: Pankaj Agarwal
> Cc: Bioconductor mailing list; kadota at bi.a.u-tokyo.ac.jp
> Subject: TCC::ERROR: Need the design matrix for GLM
> 
> Dear Pankaj,
> 
> It seems as if you are just using the TCC package to call methods from the
> edgeR package indirectly.
> 
> Why not use the edgeR package directly?  That would probably be easier and
> you would have a more direct understanding of the methods being used.
> Your experiment is almost identical to the oral carcinoma case study in
> the edgeR User's Guide.
> 
> Best wishes
> Gordon
> 
> 
> > Date: Tue, 15 Apr 2014 13:51:17 +0000
> > From: Pankaj Agarwal <p.agarwal at duke.edu>
> > To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> > Cc: "kadota at bi.a.u-tokyo.ac.jp" <kadota at bi.a.u-tokyo.ac.jp>
> > Subject: [BioC] TCC::ERROR: Need the design matrix for GLM.
> >
> > Hi,
> >
> > I have RNA-seq data consisting of matched tumor/normal samples from
> > two patients.  For normalization of the counts I am following the steps
> > in the TCC vignette section "3.3 Normalization of two-group count data
> > without replicates (paired)".  The output from the commands is as
> > follows:
> >
> >>  data=read.delim("count_bt2_iGenomes_Ensembl.tsv")
> >
> >> head(data)
> >                 A.sorted.bam B.sorted.bam C.sorted.bam D.sorted.bam
> > ENSG00000000003         2400         1130           12           72
> > ENSG00000000005            2            3            0            0
> > ENSG00000000419         1819          575          938         1608
> > ENSG00000000457         1317         1262          821         1469
> > ENSG00000000460          799         1743          367          800
> > ENSG00000000938          203           41        33303        16355
> >
> >> group <- c(1,1,2,2)
> >> pair <- c(1,2,1,2)
> >> c1 <- data.frame(group=group, pair=pair)
> >> colnames(data) <- c("T_BRPC13.1118", "T_BRPC_13.764", "N_DU04_PBMC",
> >>                     "N_DU06_PBMC")
> >> tcc <- new("TCC", data, c1)
> >> tcc <- calcNormFactors(tcc, norm.method="tmm", test.method="edger",
> >>                        iteration=1, FDR=0.1, floorPDEG=0.05, paired=TRUE)
> > TCC::INFO: Calculating normalization factors using DEGES
> > TCC::INFO: (iDEGES pipeline : tmm - [ edger - tmm ] X 1 )
> > Error in .testByEdger.3(design = design, coef = coef, contrast = contrast) :
> >  TCC::ERROR: Need the design matrix for GLM.
> >
> > Reading further for steps needed for edgeR without TCC I saw something
> > related to design and tried it, but got the same error:
> >
> >> design <- model.matrix(~ group + pair)
> >> tcc <- new("TCC", data, c1)
> >> tcc <- calcNormFactors(tcc, norm.method="tmm", test.method="edger",
> >>                        iteration=1, FDR=0.1, floorPDEG=0.05, paired=TRUE)
> > TCC::INFO: Calculating normalization factors using DEGES
> > TCC::INFO: (iDEGES pipeline : tmm - [ edger - tmm ] X 1 )
> > Error in .testByEdger.3(design = design, coef = coef, contrast = contrast) :
> >  TCC::ERROR: Need the design matrix for GLM.
> >
> > I would appreciate help with understanding the cause of the error.
> >
> > The output from sessionInfo() and package description is as follows:
> >
> >> sessionInfo()
> > R version 3.0.3 (2014-03-06)
> > Platform: x86_64-unknown-linux-gnu (64-bit)
> >
> > locale:
> > [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> > [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> > [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> > [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> > [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >>
> >> packageDescription("TCC")
> > Package: TCC
> > Type: Package
> > Title: TCC: Differential expression analysis for tag count data with
> >        robust normalization strategies
> > Version: 1.2.0
> > Author: Jianqiang Sun, Tomoaki Nishiyama, Kentaro Shimizu, and Koji
> >        Kadota
> > Maintainer: Jianqiang Sun <wukong at bi.a.u-tokyo.ac.jp>, Tomoaki
> >        Nishiyama <tomoakin at staff.kanazawa-u.ac.jp>
> > Description: This package provides a series of functions for performing
> >        differential expression analysis from RNA-seq count data using
> >        robust normalization strategy (called DEGES). The basic idea of
> >        DEGES is that potential differentially expressed genes or
> >        transcripts (DEGs) among compared samples should be removed
> >        before data normalization to obtain a well-ranked gene list
> >        where true DEGs are top-ranked and non-DEGs are bottom ranked.
> >        This can be done by performing a multi-step normalization
> >        strategy (called DEGES for DEG elimination strategy). A major
> >        characteristic of TCC is to provide the robust normalization
> >        methods for several kinds of count data (two-group with or
> >        without replicates, multi-group/multi-factor, and so on) by
> >        virtue of the use of combinations of functions in other
> >        sophisticated packages (especially edgeR, DESeq, and baySeq).
> > Depends: R (>= 2.15), methods, DESeq, edgeR, baySeq, ROC
> > Imports: EBSeq, samr
> > Suggests: RUnit, BiocGenerics
> > Enhances: snow
> > biocViews: HighThroughputSequencing, DifferentialExpression, RNAseq
> > License: GPL-2
> > Copyright: Authors listed above
> > Packaged: 2013-10-15 05:31:33 UTC; biocbuild
> > Built: R 3.0.3; ; 2014-03-31 20:00:49 UTC; unix
> >
> > -- File:
> > /general/installs/R/R-3.0.3/lib64/R/library/TCC/Meta/package.rds
> >
> > Thank you,
> >
> > - Pankaj
> > --------------------------------------
> > Pankaj Agarwal, M.S
> > Bioinformatician
> > Bioinformatics Shared Resource
> > Duke Cancer Institute
> > Duke University
> > 919-681-6573
> > p.agarwal at duke.edu
> 
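
The paired layout in the quoted message (the `group` and `pair` vectors)
can be written as an additive design matrix using base R only. The sketch
below is a minimal illustration, not the method of either package: it
assumes, from the column names in the message, that group 1 is tumor and
group 2 is normal, and that samples 1 and 3 come from one patient and
samples 2 and 4 from the other. The name `residual_df` is introduced here
for illustration only.

```r
# Sample layout copied from the quoted message.
# Assumption: group 1 = tumor, group 2 = normal; pair = patient.
group <- factor(c(1, 1, 2, 2))
pair  <- factor(c(1, 2, 1, 2))

# Additive model: a patient (pair) effect plus a group effect.
# Converting to factors makes each variable categorical, not numeric.
design <- model.matrix(~ pair + group)
print(colnames(design))   # "(Intercept)" "pair2" "group2"

# Four samples and three coefficients leave one residual degree of freedom.
residual_df <- nrow(design) - qr(design)$rank
print(residual_df)        # 1
```

In this parameterisation the `group2` column represents group 2 relative
to group 1 within patients. Note that the second attempt in the quoted
message builds `design`, but the `calcNormFactors()` call shown there does
not pass it on. Whether TCC 1.2.0 accepts a design matrix for paired data
is not something this thread establishes.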