[BioC] EdgeR repeated measurment
teshe [guest]
guest at bioconductor.org
Fri Apr 4 10:06:47 CEST 2014
I have a miRNA data from an experiment, which contain repeated measurements. The experiment set up is as follows: 12 indviduals were assigned into two groups, Good and Low based on previous observations; where Good means those produce good quality and Low means those produce low quality. Samples were taken from each individual in a week interval for a month. The miRNA-seq data has three of these time points (0-day, 14-day and 28-day) for the eleven individuals, unfortunetly we lost one of the samples from Good-0-day group. Our aim is to check whether miRNA are involved in determining the quality and whether the time of sampling affect the quality through miRNAs.
Hypotheses of test are:
1.Mean miRNA expression is the same regardless of quality at any time point
2.Mean miRNA expression is the same at different time point within the same quality group
Below is the detail of the analysis, which I used in edgeR.
My question is, am I in the write track? How can I check whether the assumption of repeated measurments met or not?
As you can see from the head of the data table individual S4 looks an outlier, which is also clear after normalization in a plot (plot is not shown). My question is that should I remove it? Is there any effect in case of removing or missing samples? Does sample imbalance affect the model?
R version 3.1.0 beta (2014-03-28 r65330)
Platform: x86_64-pc-linux-gnu (64-bit)
> y <- read.table(file="salmon",row.names="miRNA", header=TRUE)
> head(y)
S2 S12 S13 S19 S20 S4 S6 S9 S15 S23 S30 S42 S47 S52 S53 S59 S60 S44
let-7a 50 17 7 13 18 570 40 25 89 126 36 22 12 21 23 59 6 27
let-7b 3 3 1 2 0 54 6 2 11 7 7 0 4 5 3 22 0 8
let-7d 0 1 0 0 1 21 0 2 0 0 1 0 0 1 0 6 0 1
let-7e 0 0 1 3 1 491 4 2 15 8 11 0 6 1 0 3 0 2
let-7g 40 1 1 1 1 162 3 4 10 57 13 2 1 8 2 8 0 7
let-7h 0 0 0 0 0 271 1 2 0 1 2 0 1 0 0 2 2 0
S46 S49 S55 S43 S50 S82 S87 S92 S93 S99 S100 S84 S85 S89 S95 S83 S90
let-7a 110 18 5 42 1 24 14 2 11 47 5 48 28 30 26 6 18
let-7b 15 1 4 5 1 3 0 0 2 11 0 30 5 1 1 0 3
let-7d 1 0 0 0 0 0 2 0 0 1 0 0 2 0 0 0 0
let-7e 12 2 2 33 0 7 5 0 3 21 3 1 18 12 2 0 11
let-7g 4 1 0 11 2 9 6 0 0 19 5 10 7 5 4 2 1
let-7h 0 0 1 9 0 0 0 0 2 3 1 0 0 0 0 0 3
> time <- factor(c("0day", "0day", "0day","0day","0day","0day","0day","0day","0day","0day","0day","14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day"))
> quality <- factor(c("Good", "Good","Good","Good","Good","Low","Low","Low","Low","Low","Low","Good","Good","Good","Good","Good","Good","Low","Low","Low","Low","Low","Low","Good", "Good","Good","Good","Good","Good","Low","Low","Low","Low","Low","Low"))
> indv <- factor(c("F427","F437","F438","F445","F446","F429","F431","F434","F440","F447","F448","F427","F432","F437","F438","F445","F446","F429","F431","F434","F440","F447","F448","F427","F432","F437","F438","F445","F446","F429","F431","F434","F440","F447","F448"))
> dim(y)
[1] 140 35
> dge <- DGEList(counts=y)
> dge <- calcNormFactors(dge)
> m <- 1e6 * t(t(dge$counts)/dge$samples$lib.size)
> keep <- rowSums(m > 1) >= 3
> dge <- dge[keep,]
> dim(dge)
[1] 135 35
> indv <- factor(c("1","2","3","4","5","1","2","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6"))
> targets <- data.frame(quality,indv,time)
> targets
quality indv time
1 Good 1 0day
2 Good 2 0day
3 Good 3 0day
4 Good 4 0day
5 Good 5 0day
6 Low 1 0day
7 Low 2 0day
8 Low 3 0day
9 Low 4 0day
10 Low 5 0day
11 Low 6 0day
12 Good 1 14day
13 Good 2 14day
14 Good 3 14day
15 Good 4 14day
16 Good 5 14day
17 Good 6 14day
18 Low 1 14day
19 Low 2 14day
20 Low 3 14day
21 Low 4 14day
22 Low 5 14day
23 Low 6 14day
24 Good 1 28day
25 Good 2 28day
26 Good 3 28day
27 Good 4 28day
28 Good 5 28day
29 Good 6 28day
30 Low 1 28day
31 Low 2 28day
32 Low 3 28day
33 Low 4 28day
34 Low 5 28day
35 Low 6 28day
> quality <- factor(targets$quality, levels=c("Good","Low"))
> time <- factor(targets$time, levels=c("0day","14day","28day"))
> design <- model.matrix(~quality+quality:indv+quality:time)
> dge <- estimateGLMCommonDisp(dge,design)
> dge <- estimateGLMTrendedDisp(dge,design)
> dge <- estimateGLMTagwiseDisp(dge,design)
> fit <- glmFit(dge, design)
> lrt <- glmLRT(fit, coef="qualityGood:time14day")
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Good_14dayvs0day.csv")
> lrt <- glmLRT(fit, coef="qualityGood:time28day")
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Good_28dayvs0day.csv")
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1,0))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Good_28dayvs14day.csv")
> lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,1,0,0))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Low_14dayvs0day.csv")
> lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,1))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Low_28dayvs0day.csv")
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Low_28dayvs14day.csv")
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,1))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="28day_LowvsGood.csv")
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,1,0,0))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="14day_LowvsGood.csv")
> lrt <- glmLRT(fit, coef="qualityLow")
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="0day_LowvsGood.csv")
I have a miRNA data from an experiment, which contain repeated measurements. The experiment set up is as follows: 12 indviduals were assigned into two groups, Good and Low based on previous observations; where Good means those produce good quality and Low means those produce low quality. Samples were taken from each individual in a week interval for a month. The miRNA-seq data has three of these time points (0-day, 14-day and 28-day) for the eleven individuals, unfortunetly we lost one of the samples from Good-0-day group. Our aim is to check whether miRNA are involved in determining the quality and whether the time of sampling affect the quality through miRNAs.
Hypotheses of test are:
1.Mean miRNA expression is the same regardless of quality at any time point
2.Mean miRNA expression is the same at different time point within the same quality group
Below is the detail of the analysis, which I used in edgeR.
My question is, am I in the write track? How can I check whether the assumption of repeated measurments met or not?
As you can see from the head of the data table individual S4 looks an outlier, which is also clear after normalization in a plot (plot is not shown). My question is that should I remove it? Is there any effect in case of removing or missing samples? Does sample imbalance affect the model?
R version 3.1.0 beta (2014-03-28 r65330)
Platform: x86_64-pc-linux-gnu (64-bit)
> y <- read.table(file="salmon",row.names="miRNA", header=TRUE)
> head(y)
S2 S12 S13 S19 S20 S4 S6 S9 S15 S23 S30 S42 S47 S52 S53 S59 S60 S44
let-7a 50 17 7 13 18 570 40 25 89 126 36 22 12 21 23 59 6 27
let-7b 3 3 1 2 0 54 6 2 11 7 7 0 4 5 3 22 0 8
let-7d 0 1 0 0 1 21 0 2 0 0 1 0 0 1 0 6 0 1
let-7e 0 0 1 3 1 491 4 2 15 8 11 0 6 1 0 3 0 2
let-7g 40 1 1 1 1 162 3 4 10 57 13 2 1 8 2 8 0 7
let-7h 0 0 0 0 0 271 1 2 0 1 2 0 1 0 0 2 2 0
S46 S49 S55 S43 S50 S82 S87 S92 S93 S99 S100 S84 S85 S89 S95 S83 S90
let-7a 110 18 5 42 1 24 14 2 11 47 5 48 28 30 26 6 18
let-7b 15 1 4 5 1 3 0 0 2 11 0 30 5 1 1 0 3
let-7d 1 0 0 0 0 0 2 0 0 1 0 0 2 0 0 0 0
let-7e 12 2 2 33 0 7 5 0 3 21 3 1 18 12 2 0 11
let-7g 4 1 0 11 2 9 6 0 0 19 5 10 7 5 4 2 1
let-7h 0 0 1 9 0 0 0 0 2 3 1 0 0 0 0 0 3
> time <- factor(c("0day", "0day", "0day","0day","0day","0day","0day","0day","0day","0day","0day","14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day"))
> quality <- factor(c("Good", "Good","Good","Good","Good","Low","Low","Low","Low","Low","Low","Good","Good","Good","Good","Good","Good","Low","Low","Low","Low","Low","Low","Good", "Good","Good","Good","Good","Good","Low","Low","Low","Low","Low","Low"))
> indv <- factor(c("F427","F437","F438","F445","F446","F429","F431","F434","F440","F447","F448","F427","F432","F437","F438","F445","F446","F429","F431","F434","F440","F447","F448","F427","F432","F437","F438","F445","F446","F429","F431","F434","F440","F447","F448"))
> dim(y)
[1] 140 35
> dge <- DGEList(counts=y)
> dge <- calcNormFactors(dge)
> m <- 1e6 * t(t(dge$counts)/dge$samples$lib.size)
> keep <- rowSums(m > 1) >= 3
> dge <- dge[keep,]
> dim(dge)
[1] 135 35
> indv <- factor(c("1","2","3","4","5","1","2","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6"))
> targets <- data.frame(quality,indv,time)
> targets
quality indv time
1 Good 1 0day
2 Good 2 0day
3 Good 3 0day
4 Good 4 0day
5 Good 5 0day
6 Low 1 0day
7 Low 2 0day
8 Low 3 0day
9 Low 4 0day
10 Low 5 0day
11 Low 6 0day
12 Good 1 14day
13 Good 2 14day
14 Good 3 14day
15 Good 4 14day
16 Good 5 14day
17 Good 6 14day
18 Low 1 14day
19 Low 2 14day
20 Low 3 14day
21 Low 4 14day
22 Low 5 14day
23 Low 6 14day
24 Good 1 28day
25 Good 2 28day
26 Good 3 28day
27 Good 4 28day
28 Good 5 28day
29 Good 6 28day
30 Low 1 28day
31 Low 2 28day
32 Low 3 28day
33 Low 4 28day
34 Low 5 28day
35 Low 6 28day
> quality <- factor(targets$quality, levels=c("Good","Low"))
> time <- factor(targets$time, levels=c("0day","14day","28day"))
> design <- model.matrix(~quality+quality:indv+quality:time)
> dge <- estimateGLMCommonDisp(dge,design)
> dge <- estimateGLMTrendedDisp(dge,design)
> dge <- estimateGLMTagwiseDisp(dge,design)
> fit <- glmFit(dge, design)
> lrt <- glmLRT(fit, coef="qualityGood:time14day")
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Good_14dayvs0day.csv")
> lrt <- glmLRT(fit, coef="qualityGood:time28day")
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Good_28dayvs0day.csv")
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1,0))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Good_28dayvs14day.csv")
> lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,1,0,0))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Low_14dayvs0day.csv")
> lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,1))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Low_28dayvs0day.csv")
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="Low_28dayvs14day.csv")
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,1))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="28day_LowvsGood.csv")
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,1,0,0))
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="14day_LowvsGood.csv")
> lrt <- glmLRT(fit, coef="qualityLow")
> topTags(lrt)
> tab <- topTags(lrt, n=Inf)
> write.table(tab, file="0day_LowvsGood.csv")
-- output of sessionInfo():
R version 3.1.0 beta (2014-03-28 r65330)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.4.2 limma_3.18.13
loaded via a namespace (and not attached):
[1] tools_3.1.0
--
Sent via the guest posting facility at bioconductor.org.
More information about the Bioconductor
mailing list