[BioC] edgeR - generating appropriate design and contrast matrices: multi-factorial experiment
    Zaki Fadlullah 
    zaki.fadlullah at carif.com.my
       
    Fri Apr  5 00:38:17 CEST 2013
    
    
  
Dear edgeR developers and kind list members,
I have an RNA-seq experiment that I would like to analyse using edgeR; I think it is a multi-factorial experiment.
After reading the excellent edgeR user manual, as well as the wealth of design-matrix-related questions on the mailing list, I am still unsure which design matrix would be appropriate for my data. I would therefore appreciate feedback from list members.
The RNA-seq data: 19 samples (16 tumours and 3 normals). All samples are from different individuals, and each sample was sequenced once (i.e. no replicates).
The aim: to find DE genes based on the sensitivity of tumour samples to drug A (and to repeat the same analysis for drug B).
The aim (reworded for clarity): to find DE genes between tumour samples that are sensitive to drug A and tumour samples that are resistant to drug A.
To integrate the previously known drug-sensitivity information, I designed my metadata as below:
targets
   files samples    drug_A    drug_B
1    T01     T01 resistant sensitive
2    T02     T02 resistant resistant
3    T03     T03 sensitive resistant
4    T04     T04    medium sensitive
5    T05     T05    medium sensitive
6    T06     T06 resistant sensitive
7    T07     T07    medium resistant
8    T08     T08    medium resistant
9    T09     T09 resistant resistant
10   T10     T10    medium sensitive
11   T11     T11 resistant resistant
12   T12     T12 sensitive resistant
13   T13     T13 resistant resistant
14   T14     T14 sensitive sensitive
15   T15     T15 sensitive resistant
16   T16     T16 sensitive sensitive
17   N01  normal   unknown   unknown
18   N02  normal   unknown   unknown
19   N03  normal   unknown   unknown
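For anyone who wants to reproduce the question, the table above can be rebuilt as an R data frame. This is a minimal sketch using only base R; the values are copied from the printed table, and note that the `samples` column holds "normal" (not N01 to N03) for the three normal samples.

```r
# Rebuild the 'targets' table above as a data frame (values copied from the table).
targets <- data.frame(
  files   = c(sprintf("T%02d", 1:16), sprintf("N%02d", 1:3)),
  samples = c(sprintf("T%02d", 1:16), rep("normal", 3)),
  drug_A  = c("resistant", "resistant", "sensitive", "medium", "medium",
              "resistant", "medium", "medium", "resistant", "medium",
              "resistant", "sensitive", "resistant", "sensitive",
              "sensitive", "sensitive", rep("unknown", 3)),
  drug_B  = c("sensitive", "resistant", "resistant", "sensitive", "sensitive",
              "sensitive", "resistant", "resistant", "resistant", "sensitive",
              "resistant", "resistant", "resistant", "sensitive",
              "resistant", "sensitive", rep("unknown", 3)),
  stringsAsFactors = FALSE
)

# Number of samples in each drug_A category
# counts: medium 5, resistant 6, sensitive 5, unknown 3
table(targets$drug_A)
```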
To clarify:
1) All RNA-seq data were from untreated samples.
2) Information on drug sensitivity was obtained from wet-lab experiments.
3) No drug-sensitivity experiments were done on the normal samples, hence "unknown".
From my current understanding, after reading the edgeR user manual and, to an extent, section 8.5 of the limma user guide, I am inclined to think that the design matrix for my data should be an interaction model (limma section 8.5.1 and edgeR section 3.31) rather than a blocking model (edgeR section 3.4.2). (I have not fully understood what a nested model is, so I am unsure whether a nested model would be the better option.)
My design is therefore:
# one group per combination of sample and drug_A status
Groups = factor(paste(targets$samples, targets$drug_A, sep="."))
design = model.matrix(~0 + Groups)
colnames(design) = levels(Groups)
From this design, I cannot see a way to specify a contrast that answers the aim of the study (to determine which genes are differentially expressed between tumour samples that are sensitive to drug A and tumour samples that are resistant to drug A).
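It may help to inspect the first design matrix directly. A minimal base-R sketch, assuming the `samples` and `drug_A` columns are exactly as in the table above (copied into plain vectors so the block runs on its own): because every tumour has a unique `samples` value, each tumour becomes a one-sample group, and the only residual degrees of freedom come from the three normals, so variability between tumours is not something this design can estimate.

```r
# Columns copied from the targets table; the three normals share samples == "normal".
samples <- c(sprintf("T%02d", 1:16), rep("normal", 3))
drug_A  <- c("resistant", "resistant", "sensitive", "medium", "medium",
             "resistant", "medium", "medium", "resistant", "medium",
             "resistant", "sensitive", "resistant", "sensitive",
             "sensitive", "sensitive", rep("unknown", 3))

# Same construction as the design above
Groups <- factor(paste(samples, drug_A, sep = "."))
design <- model.matrix(~0 + Groups)
colnames(design) <- levels(Groups)

ncol(design)                 # 17 groups for 19 samples
colSums(design)              # 1 for every tumour group, 3 for normal.unknown
nrow(design) - ncol(design)  # residual df: 2, all contributed by the normals
```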
My questions to the mailing list are therefore:
1) Is my experimental design correct for testing my aim? (My gut feeling is that it is not.)
2) What design would account for the individual variability among the tumours while addressing the aim of the experiment (sensitive tumours vs resistant tumours)? Is this possible?
Would the following metadata be the key?
targets_2
  files samples   type    drug_A    drug_B
1   T01     T01 tumour resistant sensitive
2   T02     T02 tumour resistant resistant
3   T03     T03 tumour sensitive resistant
4   T04     T04 tumour    medium sensitive
. 
.
16   T16     T16 tumour sensitive sensitive
17   N01  normal normal   unknown   unknown
18   N02  normal normal   unknown   unknown
19   N03  normal normal   unknown   unknown
Using the metadata above, I would proceed along these lines:
library(limma)  # makeContrasts() is provided by limma
# one group per combination of sample type and drug_A status
Groups = factor(paste(targets_2$type, targets_2$drug_A, sep="."))
design = model.matrix(~0 + Groups)
colnames(design) = levels(Groups)
my.contrast = makeContrasts(
    tumour.sensitiveVSresistant = tumour.sensitive - tumour.resistant,
    tumour_normal.sensitiveVSresistant = (tumour.sensitive - normal.unknown) - (tumour.resistant - normal.unknown),
    levels=design)
Would the method above be more appropriate? Will it account for the variability among the tumour samples (i.e. does the design above treat the tumours as replicates)?
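One way to check what the proposed design does is to look at its design matrix and write the two contrasts out as coefficient vectors. The sketch below uses only base R (no edgeR or limma calls); the vectors are copied from the table, and the helper `unit()` is hypothetical, defined only for this illustration. In this design, tumours that share a drug_A status form one group, so they are treated as replicates of that group, and the second contrast reduces algebraically to the first because the normal.unknown terms cancel.

```r
# Columns copied from the targets_2 table
sample_type <- c(rep("tumour", 16), rep("normal", 3))
drug_A <- c("resistant", "resistant", "sensitive", "medium", "medium",
            "resistant", "medium", "medium", "resistant", "medium",
            "resistant", "sensitive", "resistant", "sensitive",
            "sensitive", "sensitive", rep("unknown", 3))

Groups <- factor(paste(sample_type, drug_A, sep = "."))
design <- model.matrix(~0 + Groups)
colnames(design) <- levels(Groups)

colSums(design)              # samples per group: normal 3, medium 5, resistant 6, sensitive 5
nrow(design) - ncol(design)  # residual df: 19 - 4 = 15

# Hypothetical helper: coefficient 1 for group g, 0 for every other group.
# limma::makeContrasts() returns such coefficients as columns of a contrast matrix.
unit <- function(g) as.numeric(colnames(design) == g)
c1 <- unit("tumour.sensitive") - unit("tumour.resistant")
c2 <- (unit("tumour.sensitive") - unit("normal.unknown")) -
      (unit("tumour.resistant") - unit("normal.unknown"))
identical(c1, c2)            # TRUE: both contrasts test the same comparison
```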
Thank you for taking the time to read this post, and I apologise if I have included too much unnecessary information.
Zaki