[BioC] GSVA

Robert Castelo robert.castelo at upf.edu
Tue Feb 26 14:43:07 CET 2013


Dear Afsaneh,

i'm cc'ing the Bioconductor mailing list as this helps in building a 
knowledge base with questions like yours that can help others in finding 
their own way. please carbon copy this mailing list address in future 
communications.

as for your question, there are many ways in which you can explore the 
association between a phenotype of interest and gene/pathway expression 
profiles.

from the way you describe your data below, it seems like you would like 
to find gene or pathway expression patterns that correlate with 
physiological phenotypic data described in the variables below called 
BMI, FEV1_PREDICTED, OVERALL_ACQ6_NO_FEV1, etc.

GSVA can help you in obtaining pathway-level summaries of expression 
which you can use to explore associations between pathways and 
phenotypes. however, you have to decide the way in which you 
explore/calculate those associations since GSVA only calculates the 
pathway summaries of expression for you and not the associations.

the default parameters of GSVA, and particularly the default argument 
mx.diff=TRUE, will produce pathway expression values that are 
approximately normally distributed. however, you have to explore each of 
your phenotype variables to figure out what kind of data they contain 
(numerical, categorical, counts), how they are distributed, whether they 
need to be transformed (taking logs, for instance, if they have a 
long tail), how many missing values there are, etc.
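
as a rough illustration of this kind of exploration, here is a minimal 
sketch in plain Python (the choice of language is just an assumption, 
the same summaries are easy to obtain in R). it takes the 
TOTAL_IgE_IUperL values from your message, counts missing values and 
compares the skewness before and after taking logs. the function name 
describe() and the use of None for missing values are made up for this 
example.

```python
import math
import statistics

# Values copied from the TOTAL_IgE_IUperL row of the quoted message.
total_ige = [142, 435, 70.8, 11.5, 10.7, 32.5, 245]


def describe(values):
    """Return basic summaries for one numerical phenotype variable.

    None entries are treated as missing values (an assumption of this
    sketch, not something the original data format specifies).
    """
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.pstdev(observed)
    # Population skewness: clearly positive values hint at a long right tail.
    skew = sum(((v - mean) / sd) ** 3 for v in observed) / len(observed)
    return {
        "n_missing": len(values) - len(observed),
        "mean": mean,
        "median": statistics.median(observed),
        "skewness": skew,
    }


raw = describe(total_ige)
logged = describe([math.log(v) for v in total_ige])

print("raw:   ", raw)
print("logged:", logged)
```

a mean well above the median and a positive skewness on the raw scale 
would suggest a long right tail, in which case the log-transformed 
values may be easier to model.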

finally, on the basis of the type of phenotypic data you have at hand, 
you have to decide what kind of statistical model you should use to 
explore the association of each phenotype with expression.
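
just to make this concrete, one possible choice for a numerical 
phenotype (an assumption of mine here, not a recommendation for your 
data) is a rank-based Spearman correlation between one pathway's 
scores and the phenotype across samples. the sketch below is plain 
Python; the FEV1_PREDICTED values come from your message, while the 
pathway scores are hypothetical placeholders and not GSVA output.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# FEV1_PREDICTED for samples R5, R9, R14, R17, R19, R21, R29 (quoted message).
fev1_predicted = [2.36, 2.93, 3.01, 2.3, 2.59, 2.22, 2.9]
# HYPOTHETICAL scores for one pathway in the same sample order.
pathway_scores = [0.12, -0.30, 0.05, 0.25, -0.41, 0.33, -0.22]

rho = spearman(pathway_scores, fev1_predicted)
print(f"Spearman rho = {rho:.3f}")
```

with only a handful of samples, as in the excerpt you sent, any such 
coefficient is very uncertain, and categorical phenotypes or counts 
would call for a different kind of model.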

if you feel somewhat overwhelmed by the number of issues that i've 
raised in my answer, try to contact a local statistician who can help 
you analyse your data.


cheers,
robert.

On 02/26/2013 02:12 PM, Afsaneh wrote:
>   Dear Justin,
> I have a set of normalized microarray data from a group of patients, plus
> some physiological (phenotype) data like that shown below.
> I was wondering what can be done using your package:
> would I be able to calculate the association between phenotype and gene
> expression, and what about pathway analysis?
> Regards, Afsaneh
>
> SAMPLE_NAME 	R5 	R9 	R14 	R17 	R19 	R21 	R29
> BMI 	24.03440715 	28.37370242 	34.19856 	48.91212683 	29.5858 	31.21748 	24.02381
> FEV1_PREDICTED 	2.36 	2.93 	3.01 	2.3 	2.59 	2.22 	2.9
> OVERALL_ACQ6_NO_FEV1 	1.5 	1.33 	1.67 	1.83 	2 	1 	2.67
> GINA 	4 	4 	5 	5 	4 	5 	4
> EXACERB_PAST_12MONTH_REQ_RESCUE_COURSE_PREDNISOLONE_ANDOR_ANTIBIOTICS 	3 	1 	2 	1 	5 	4 	5
> SPUTUM_EOS_PERCENT 	9.5 	15 	3 	2.5 	1 	22.5 	2.75
> SPUTUM_NEUTROPHIL_PERCENT 	55.5 	50.75 	53.25 	72.75 	97.25 	7.75 	64
> SPUTUM_EPITHELIAL_CELLS_PERCENT 	0 	1.3 	0.25 	1.75 	0 	19.25 	1.25
> OVERALL_ACQ7 	1.71 	1.86 	2.14 	2.14 	1.71 	0.86 	3
> TOTAL_IgE_IUperL 	142 	435 	70.8 	11.5 	10.7 	32.5 	245
> SARP 	2 	5 	4 	5 	1 	1 	4
> FENO50 	54.4 	29.1 	68.3 	3.6 	32.6 	24.1 	27
> BLOOD_NEUTROPHILS 	2900 	3790 	3340 	8620 	6520 	3690 	3080
> BLOOD_EOSINOPHILS 	150 	470 	320 	160 	320 	820 	150
>
> --
> Dr Afsaneh Maleki-Dizaji
> Research Fellow,
> Computational Systems Biology,
> Department of Computer Science
> Kroto Research Institute,
> University of Sheffield,
> North Campus,
> Broad Lane,
> Sheffield,
> S3 7HQ.
> Email:s.maleki-dizaji at dcs.shef.ac.uk
> Phone: +44 (0) 114 2221949
>

-- 
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550
