[BioC] Experimental design for RNA-Seq

Jakob Hedegaard Jakob.Hedegaard at agrsci.dk
Wed Jun 2 22:08:25 CEST 2010

Hi Mick,

After quantifying each library by qPCR, we prepared a 20 pM dilution in hyb-buffer. The libraries of each 12-plex set were then pooled, producing a 20 pM solution of 12-plex libraries which, according to Illumina, should be stable for 2-3 weeks. From this 20 pM stock solution we prepared a fresh 7 pM solution for each sequencing run (in our hands, 7 pM gives approx. 200K clusters/tile). We obtained approx. 1.5 M to 1.1 M clusters/sample/lane during the runs, but observed a decline in cluster counts from run to run. To compensate for this we increased the concentration to 12 pM for the fourth run. The decline was most likely caused by adherence of the DNA to the tubes (?). You will see "some" variation between runs, and also between lanes of the same sample in the same flowcell....
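
As a rough illustration of the dilution step above, the stock and buffer volumes follow from C1*V1 = C2*V2. This is only a sketch: the function name and the 1000 ul final volume are assumptions for the example, not values from our protocol.

```python
def dilution_volumes(stock_pm, target_pm, final_volume_ul):
    """Split a final volume into stock and buffer volumes using C1*V1 = C2*V2."""
    if not 0 < target_pm <= stock_pm:
        raise ValueError("target must be positive and no higher than the stock")
    stock_ul = target_pm * final_volume_ul / stock_pm
    buffer_ul = final_volume_ul - stock_ul
    return stock_ul, buffer_ul


# 20 pM stock and 7 pM target are from the text; 1000 ul is an assumed volume.
stock_ul, buffer_ul = dilution_volumes(20.0, 7.0, 1000.0)
print(f"{stock_ul:.0f} ul stock + {buffer_ul:.0f} ul hyb-buffer")
```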

To obtain higher reproducibility in future experiments, I will prepare the 20 pM solution, dilute it to e.g. 10 pM and run a flowcell. As soon as I have the cluster estimates from RTA, the concentration can be adjusted (e.g. to 12 pM) to obtain an optimal cluster count, and a number of flowcells can then be prepared within a few days and sequenced during the following weeks.
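
The correction described above could be sketched as a simple proportional rescaling. This assumes the cluster count scales linearly with the loading concentration, which is an assumption made for illustration and may not hold near saturation. The function name and both cluster figures are hypothetical placeholders.

```python
def corrected_concentration(pilot_pm, observed_clusters, target_clusters):
    """Rescale the loading concentration, assuming clusters scale linearly with pM."""
    if observed_clusters <= 0:
        raise ValueError("observed cluster count must be positive")
    return pilot_pm * target_clusters / observed_clusters


# 10 pM pilot run as in the text; the cluster counts per tile are hypothetical.
new_pm = corrected_concentration(10.0, observed_clusters=170_000, target_clusters=200_000)
print(f"load at about {new_pm:.1f} pM")
```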

Do you have some suggestions for analyzing time-series data sets?


-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk]
Sent: 30 May 2010 14:15
To: Jakob Hedegaard; bioconductor at stat.math.ethz.ch
Subject: RE: [BioC] Experimental design for RNA-Seq


An excellent answer, thank you.

You say each sample has been sequenced four times due to the 12-plexing.  What kind of variation do you see for the counts across those four times?

From: Jakob Hedegaard [Jakob.Hedegaard at agrsci.dk]
Sent: 30 May 2010 12:03
To: bioconductor at stat.math.ethz.ch
Cc: michael watson (IAH-C)
Subject: RE: [BioC] Experimental design for RNA-Seq


We have just completed the sequencing of RNA-Seq libraries from a porcine challenge experiment: two treatments (bacteria) and 5 time points after challenge (T0,T6,T12,T24,T48 hours pi) - a total of 48 samples (5-6 samples/treatmentXtime).
A single RNA-Seq library has been generated from each sample (so no true technical replication) and the 48 libraries have been sequenced as 12-plex in four flowcells (4 lanes of 12-plexed samples/flowcell, all 48 samples sequenced in each flowcell) using the Illumina index system.
In each 12-plex, the samples have been mixed to balance each treatmentXtime in each plex.
When we started the experiment, Illumina did not recommend using fewer than 12 indexes per lane. Since then, Illumina have changed their recommendation, so it is now possible to do 2-, 3-, 6- and 12-plex indexing. The experiment could hence have been conducted with 3-plexing instead (so each sample would have been sequenced once instead of four times in four runs), but I still like the idea of sequencing all samples in each run....
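
The balancing of treatment x time across the four 12-plex pools could be sketched roughly as below: sort the samples by group and deal them out in turn, so that each group is spread over the pools as evenly as possible. The sample sheet here is hypothetical (group labels, group sizes and sample names are made up to give 48 samples with 5-6 per group), and this is not necessarily how our pools were actually assigned.

```python
from collections import Counter
from itertools import product


def assign_pools(samples, n_pools=4):
    """samples: list of (sample_id, group). Returns {sample_id: pool index}."""
    # Sorting by group puts members of one group next to each other, so
    # dealing them out in turn spreads each group across the pools.
    ordered = sorted(samples, key=lambda s: (s[1], s[0]))
    return {sid: i % n_pools for i, (sid, _) in enumerate(ordered)}


# Hypothetical layout: one shared T0 group plus 2 treatments x 4 time points.
group_sizes = {("T0", 0): 6}
sizes = [5, 5, 5, 5, 5, 5, 6, 6]
for (trt, t), n in zip(product(["A", "B"], [6, 12, 24, 48]), sizes):
    group_sizes[(trt, t)] = n

samples = [(f"{trt}_{t}h_{k}", (trt, t))
           for (trt, t), n in group_sizes.items() for k in range(n)]
pools = assign_pools(samples)

print(Counter(pools.values()))  # number of samples per pool
```

With this scheme the number of samples from any one group differs by at most one between pools.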

After mapping, the counts for each library were combined across the three runs, giving more than 4 million sequences/sample.
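
Combining the counts amounts to summing, for each sample and gene, the counts from the individual runs. A minimal sketch, with made-up sample and gene names and made-up counts (the real input would come from the mapping step):

```python
from collections import defaultdict


def combine_runs(run_counts):
    """run_counts: one dict per run, mapping (sample, gene) to a read count."""
    total = defaultdict(int)
    for counts in run_counts:
        for key, n in counts.items():
            total[key] += n
    return dict(total)


# Hypothetical counts for one sample over three runs.
runs = [
    {("S01", "geneA"): 10, ("S01", "geneB"): 0},
    {("S01", "geneA"): 12, ("S01", "geneB"): 3},
    {("S01", "geneA"): 9},
]
combined = combine_runs(runs)
print(combined)
```

Keeping the per-run counts around as well makes it possible to check the run-to-run variation before summing.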

Starting the analysis, I have found that the available packages (DESeq, DEGseq and edgeR) present examples of the analysis of simple experiments (e.g. control vs challenge), but I wonder how to analyse a time-point experiment with two treatments.
Initially, I am going to compare each time point to the control (within and across treatments), but it would be nice to take the interactions into account as well.
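
To make "taking the interactions into account" concrete, here is a sketch of the design matrix for a treatment x time model with interaction, using T0 and treatment A as reference levels (the same coding R would give for the formula ~ treatment * time with default contrasts). The treatment labels and the assumption of a full 2 x 5 factorial layout are hypothetical, and the sketch only builds the matrix; fitting it to counts would still need a routine that accepts a general design.

```python
TIMES = [0, 6, 12, 24, 48]   # hours, as in the experiment
TREATMENTS = ["A", "B"]      # hypothetical labels for the two bacteria


def design_columns():
    """Column names: intercept, treatment, time and treatment:time terms."""
    cols = ["intercept", "trtB"]
    cols += [f"T{t}" for t in TIMES[1:]]
    cols += [f"trtB:T{t}" for t in TIMES[1:]]
    return cols


def design_row(treatment, time):
    """One row of the design matrix for a sample in the given group."""
    is_b = int(treatment == "B")
    time_ind = [int(time == t) for t in TIMES[1:]]
    # An interaction column is 1 only for treatment B at that time point.
    return [1, is_b] + time_ind + [is_b * x for x in time_ind]


print(design_columns())
for trt in TREATMENTS:
    for t in TIMES:
        print(trt, t, design_row(trt, t))
```

In this coding, the trtB:T columns measure how the change from T0 differs between the two treatments, which is the interaction of interest.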

Best regards,

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch [mailto:bioconductor-bounces at stat.math.ethz.ch] On behalf of michael watson (IAH-C)
Sent: 28 May 2010 18:04
To: 'Steve Lianoglou'; Naomi Altman
Cc: bioconductor
Subject: Re: [BioC] Experimental design for RNA-Seq

Great stuff, thanks Steve and Naomi.

I guess I was thinking of technical replicates simply as sequencing the same library on multiple occasions;  though creating two libraries out of one sample adds an extra layer of complexity.

What is the evidence (if any) that lane and/or library preparation can have an effect?

To adjust for lane effects, I guess one could multiplex each sample so that they're run on all lanes, and combine the counts at the end?


-----Original Message-----
From: Steve Lianoglou [mailto:mailinglist.honeypot at gmail.com]
Sent: 28 May 2010 16:01
To: Naomi Altman
Cc: michael watson (IAH-C); bioconductor
Subject: Re: [BioC] Experimental design for RNA-Seq


I just wanted to ask/make one point.

On Fri, May 28, 2010 at 9:17 AM, Naomi Altman <naomi at stat.psu.edu> wrote:
> At least from the stat theory point of view, the best design is equal
> numbers of biological samples (the more the better) for each condition and
> no technical reps.

Can you clarify a bit as to what you are referring to as a "technical
replicate" in this sense?

You could consider two lanes that are sequenced from the same library
as technical replicates, no? Or, by "technical replicate" do you mean
creating two libraries out of one sample?

If we're talking about the former, then I think there is lots of value
to be gained, and perhaps necessary(?), to running more than one lane
per library preparation -- and maybe the question would rather be "how
many lanes to run per library"?

What does the court think?


Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
