[BioC] EdgeR repeated measurment

Sat Apr 5 09:09:55 CEST 2014

> Date: Fri,  4 Apr 2014 01:06:47 -0700 (PDT)
> From: "teshe [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, ttb at uin.no
> Subject: [BioC] EdgeR repeated measurment
>
>
> I have a miRNA data from an experiment, which contain repeated measurements. The experiment set up is as follows: 12 indviduals were assigned into two groups, Good and Low based on previous observations; where Good means those produce good quality and Low means those produce low quality. Samples were taken from each individual in a week interval for a month. The miRNA-seq data has three of these time points (0-day, 14-day and 28-day) for the eleven individuals, unfortunetly we lost one of the samples from Good-0-day group. Our aim is to check whether miRNA are involved in determining the quality and whether the time of sampling affect the quality through miRNAs.
> Hypotheses of test are:
>
> 1.Mean miRNA expression is the same regardless of quality at any time point
> 2.Mean miRNA expression is the same at different time point within the same quality group
>
> Below is the detail of the analysis, which I used in edgeR.
>
> My question is, am I in the write track?

I don't have time to go through your analysis, but I'd recommend that you 
used the built-in function cpm() to compute 'm' near the beginning.

> How can I check whether the assumption of repeated measurments met or 
> not?

You are not making distributional assumptions about the repeated measures, 
so there is nothing to check.

> As you can see from the head of the data table individual S4 looks an 
> outlier, which is also clear after normalization in a plot (plot is not 
> shown). My question is that should I remove it? Is there any effect in 
> case of removing or missing samples? Does sample imbalance affect the 
> model?

edgeR is able to handle imbalanced numbers of samples without any problem.

Best wishes
Gordon

> R version 3.1.0 beta (2014-03-28 r65330)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
>> y <- read.table(file="salmon",row.names="miRNA", header=TRUE)
>> head(y)
>           S2 S12 S13 S19 S20  S4 S6 S9 S15 S23 S30 S42 S47 S52 S53 S59 S60 S44
> let-7a 50  17   7  13  18 570 40 25  89 126  36  22  12  21  23  59   6  27
> let-7b  3   3   1   2   0  54  6  2  11   7   7   0   4   5   3  22   0   8
> let-7d  0   1   0   0   1  21  0  2   0   0   1   0   0   1   0   6   0   1
> let-7e  0   0   1   3   1 491  4  2  15   8  11   0   6   1   0   3   0   2
> let-7g 40   1   1   1   1 162  3  4  10  57  13   2   1   8   2   8   0   7
> let-7h  0   0   0   0   0 271  1  2   0   1   2   0   1   0   0   2   2   0
>           S46 S49 S55 S43 S50 S82 S87 S92 S93 S99 S100 S84 S85 S89 S95 S83 S90
> let-7a 110  18   5  42   1  24  14   2  11  47    5  48  28  30  26   6  18
> let-7b  15   1   4   5   1   3   0   0   2  11    0  30   5   1   1   0   3
> let-7d   1   0   0   0   0   0   2   0   0   1    0   0   2   0   0   0   0
> let-7e  12   2   2  33   0   7   5   0   3  21    3   1  18  12   2   0  11
> let-7g   4   1   0  11   2   9   6   0   0  19    5  10   7   5   4   2   1
> let-7h   0   0   1   9   0   0   0   0   2   3    1   0   0   0   0   0   3
>> time <- factor(c("0day", "0day", "0day","0day","0day","0day","0day","0day","0day","0day","0day","14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day"))
>> quality <- factor(c("Good", "Good","Good","Good","Good","Low","Low","Low","Low","Low","Low","Good","Good","Good","Good","Good","Good","Low","Low","Low","Low","Low","Low","Good", "Good","Good","Good","Good","Good","Low","Low","Low","Low","Low","Low"))
>> indv <- factor(c("F427","F437","F438","F445","F446","F429","F431","F434","F440","F447","F448","F427","F432","F437","F438","F445","F446","F429","F431","F434","F440","F447","F448","F427","F432","F437","F438","F445","F446","F429","F431","F434","F440","F447","F448"))
>> dim(y)
> [1] 140  35
>> dge <- DGEList(counts=y)
>> dge <- calcNormFactors(dge)
>> m <- 1e6 * t(t(dge$counts)/dge$samples$lib.size)
>> keep <- rowSums(m > 1) >= 3
>> dge <- dge[keep,]
>> dim(dge)
> [1] 135  35
>> indv  <- factor(c("1","2","3","4","5","1","2","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6"))
>> targets <- data.frame(quality,indv,time)
>> targets
>   quality indv  time
> 1     Good    1  0day
> 2     Good    2  0day
> 3     Good    3  0day
> 4     Good    4  0day
> 5     Good    5  0day
> 6      Low    1  0day
> 7      Low    2  0day
> 8      Low    3  0day
> 9      Low    4  0day
> 10     Low    5  0day
> 11     Low    6  0day
> 12    Good    1 14day
> 13    Good    2 14day
> 14    Good    3 14day
> 15    Good    4 14day
> 16    Good    5 14day
> 17    Good    6 14day
> 18     Low    1 14day
> 19     Low    2 14day
> 20     Low    3 14day
> 21     Low    4 14day
> 22     Low    5 14day
> 23     Low    6 14day
> 24    Good    1 28day
> 25    Good    2 28day
> 26    Good    3 28day
> 27    Good    4 28day
> 28    Good    5 28day
> 29    Good    6 28day
> 30     Low    1 28day
> 31     Low    2 28day
> 32     Low    3 28day
> 33     Low    4 28day
> 34     Low    5 28day
> 35     Low    6 28day
>> quality <- factor(targets$quality, levels=c("Good","Low"))
>> time <- factor(targets$time, levels=c("0day","14day","28day"))
>> design <- model.matrix(~quality+quality:indv+quality:time)
>> dge <- estimateGLMCommonDisp(dge,design)
>> dge <- estimateGLMTrendedDisp(dge,design)
>> dge <- estimateGLMTagwiseDisp(dge,design)
>> fit <- glmFit(dge, design)
>> lrt <- glmLRT(fit, coef="qualityGood:time14day")
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="Good_14dayvs0day.csv")
>
>> lrt <- glmLRT(fit, coef="qualityGood:time28day")
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="Good_28dayvs0day.csv")
>
>> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1,0))
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="Good_28dayvs14day.csv")
>
>> lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,1,0,0))
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="Low_14dayvs0day.csv")
>
>> lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,1))
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="Low_28dayvs0day.csv")
>
>> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1))
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="Low_28dayvs14day.csv")
>
>> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,1))
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="28day_LowvsGood.csv")
>
>> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,1,0,0))
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="14day_LowvsGood.csv")
>
>> lrt <- glmLRT(fit, coef="qualityLow")
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="0day_LowvsGood.csv")
>
> -- output of sessionInfo():
>
> R version 3.1.0 beta (2014-03-28 r65330)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] edgeR_3.4.2   limma_3.18.13
>
> loaded via a namespace (and not attached):
> [1] tools_3.1.0

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}