# [BioC] Using linear models to find differential gene expression (NGS)

```I think that you need to start by learning some elementary
statistics.  There are lots of good books out there.
I like the ones by Ott (or Ott and Longnecker) and by Devore and
Peck.  Read a few pages a day, work through the examples in the text,
and you'll be much better equipped to handle your analyses in about
3 weeks.

At 08:25 AM 9/1/2010, Johnny H wrote:
>Hi.
>I have found some R/Bioconductor/Genominator code on the web (below) that
>measures differential expression of RNA-seq short-read data using a
>generalized linear model.
>
>Can someone explain some basic questions I have?
>
>1) What is the reason for using two GLMs to measure differential
>expression?
>
>2) In the function(y), two models are fitted: one with the formula
>y ~ groups and the other with the formula y ~ 1. Why do this?
>
>3) What does the offset do?
>
>4) Why use ANOVA; is it to compare the two models?
>
>5) What can we say about the results once they are adjusted for multiple
>testing? How would you explain a significant result?
>
>6) Would an adjusted p-value of <= 0.05 be significant?
>
>Basically, I don't know much about the statistics done below and any advice
>or pointers to good literature for this would be a great help. Thank you.
>
>         mut_1_f mut_2_f wt_1_f wt_2_f
>YAL069W       0       0      0      0
>YBL049W      19      18     10      4
>
># Normalisation of RNA-seq lanes
>notZero <- which(rowSums(geneCountsUI) != 0)
>upper.quartiles <- apply(geneCountsUI[notZero, ], 2,
>   function(x) quantile(x, 0.75))
>uq.scaled <- upper.quartiles/sum(upper.quartiles) * sum(laneCounts)
>
># Calculating differential expression
>groups <- factor(rep(c("mut", "wt"), times = c(2, 2)))
>
>pvalues <- apply(geneCountsUI[notZero, ], 1,
>   function(y) {
>     fit <- glm(y ~ groups, family = poisson(), offset = log(uq.scaled))
>     fit0 <- glm(y ~ 1, family = poisson(), offset = log(uq.scaled))
>     anova(fit0, fit, test = "Chisq")[2, 5]
>})
>
>
