[BioC] GSVA

Robert Castelo robert.castelo at upf.edu
Tue Feb 26 15:16:09 CET 2013


Dear Afsaneh,

please *do include* the email address bioconductor at r-project.org as 
a cc: recipient in your replies.

as for your question below, you can use the Bioconductor package limma 
on the GSVA enrichment scores to identify differential pathway 
activity, just as if they were normalized gene expression values.

in Section 4.1 of the GSVA vignette you will find such an example 
using limma. there is also extensive documentation about limma; you 
may consult the limma User's Guide by typing:

library(limma)
limmaUsersGuide()

at the R prompt.
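
to illustrate the idea, here is a minimal sketch of a control vs. treated comparison with limma. it is not the vignette example: the matrix `es` is simulated and only stands in for the gene sets x samples matrix of GSVA enrichment scores, and the names `es`, `group`, `geneset1`, etc. are hypothetical.

```r
library(limma)

set.seed(1)

# simulated stand-in for GSVA enrichment scores:
# 20 gene sets (rows) x 10 samples (columns)
es <- matrix(rnorm(20 * 10, sd = 0.3), nrow = 20,
             dimnames = list(paste0("geneset", 1:20),
                             paste0("sample", 1:10)))

# hypothetical phenotype: 5 control and 5 treated samples
group <- factor(rep(c("control", "treated"), each = 5))

# make one gene set more active in the treated samples
es["geneset1", group == "treated"] <- es["geneset1", group == "treated"] + 1

# fit one linear model per gene set and moderate the t-statistics
design <- model.matrix(~ group)
fit <- lmFit(es, design)
fit <- eBayes(fit)

# rank gene sets by evidence of differential activity (treated vs control)
tt <- topTable(fit, coef = "grouptreated", number = Inf)
head(tt)
```

with real data you would replace `es` by the scores returned by GSVA and `group` by your own sample annotation.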

cheers,
robert.

On 02/26/2013 03:02 PM, Afsaneh wrote:
> On 26/02/2013 13:43, Robert Castelo wrote:
>> Dear Afsaneh,
>>
>> i'm cc'ing the Bioconductor mailing list as this helps in building a
>> knowledge base with questions like yours that can help others in
>> finding their own way. please carbon copy this mailing list address in
>> future communications.
>>
>> as for your question, there are many ways in which you can explore the
>> association between a phenotype of interest and gene/pathway
>> expression profiles.
>>
>> from the way you describe your data below, it seems like you would
>> like to find gene or pathway expression patterns that correlate with
>> physiological phenotypic data described in the variables below called
>> BMI, FEV1_PREDICTED, OVERALL_ACQ6_NO_FEV1, etc.
>>
>> GSVA can help you in obtaining pathway-level summaries of expression
>> which you can use to explore associations between pathways and
>> phenotypes. however, you have to decide the way in which you
>> explore/calculate those associations since GSVA only calculates the
>> pathway summaries of expression for you and not the associations.
>>
>> the default parameters of GSVA, and particularly the default argument
>> mx.diff=TRUE, will produce pathway expression values that are
>> approximately normally distributed. however, you have to explore each
>> of your phenotype variables to figure out what kind of data they
>> contain (numerical, categorical, counts), how they are distributed,
>> whether they need to be transformed (taking logs, for instance, if
>> they have a long tail), how many missing values there are, etc.
>>
>> finally, on the basis of the type of phenotypic data you have at hand,
>> you have to decide what kind of statistical model you should use to
>> explore the association of each phenotype with expression.
>>
>> if you feel somewhat overwhelmed with the number of issues that i've
>> raised in my answer, try to contact a local statistician that can help
>> you out in analysing your data.
>>
>>
>> cheers,
>> robert.
>>
>> On 02/26/2013 02:12 PM, Afsaneh wrote:
>>> Dear Justin,
>>> I have a set of normalized microarray data from a group of patients,
>>> plus some physiological (phenotype) data like the table below.
>>> I was wondering what can be done using your package:
>>> would I be able to calculate the association between phenotype and
>>> gene expression, and what about pathway analysis?
>>> Regards, Afsaneh
>>>
>>> SAMPLE_NAME R5 R9 R14 R17 R19 R21 R29
>>> BMI 24.03440715 28.37370242 34.19856 48.91212683 29.5858 31.21748 24.02381
>>> FEV1_PREDICTED 2.36 2.93 3.01 2.3 2.59 2.22 2.9
>>> OVERALL_ACQ6_NO_FEV1 1.5 1.33 1.67 1.83 2 1 2.67
>>> GINA 4 4 5 5 4 5 4
>>> EXACERB_PAST_12MONTH_REQ_RESCUE_COURSE_PREDNISOLONE_ANDOR_ANTIBIOTICS 3 1 2 1 5 4 5
>>> SPUTUM_EOS_PERCENT 9.5 15 3 2.5 1 22.5 2.75
>>> SPUTUM_NEUTROPHIL_PERCENT 55.5 50.75 53.25 72.75 97.25 7.75 64
>>> SPUTUM_EPITHELIAL_CELLS_PERCENT 0 1.3 0.25 1.75 0 19.25 1.25
>>> OVERALL_ACQ7 1.71 1.86 2.14 2.14 1.71 0.86 3
>>> TOTAL_IgE_IUperL 142 435 70.8 11.5 10.7 32.5 245
>>> SARP 2 5 4 5 1 1 4
>>> FENO50 54.4 29.1 68.3 3.6 32.6 24.1 27
>>> BLOOD_NEUTROPHILS 2900 3790 3340 8620 6520 3690 3080
>>> BLOOD_EOSINOPHILS 150 470 320 160 320 820 150
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Dr Afsaneh Maleki-Dizaji
>>> Research Fellow,
>>> Computational Systems Biology,
>>> Department of Computer Science
>>> Kroto Research Institute,
>>> University of Sheffield,
>>> North Campus,
>>> Broad Lane,
>>> Sheffield,
>>> S3 7HQ.
>>> Email:s.maleki-dizaji at dcs.shef.ac.uk
>>> Phone: +44 (0) 114 2221949
>>>
>>
> Many thanks for your reply. For me it is important to see pathway
> differences between, e.g., two sets of data, control vs. treated. Which
> function is most useful for this aim?
> Regards, Afsaneh
>

-- 
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550


