[BioC] FW: URGENT Help required: Getting this error with GAGE analysis, input files attached

Martin Morgan mtmorgan at fhcrc.org
Thu Jan 26 22:22:39 CET 2012


On 01/26/2012 12:41 PM, Javerjung Sandhu wrote:
> Hi Martin,
> Thanks for the reply. I am forwarding this message to you which shows
> the error in RED. I have checked the help pages for GAGE, readExpData
> which don't give that much info. Class of "Micro_array_data" is a
> data.frame. I will also forward you the email of Mr. Luo Weijun which

The help page ?gage says that the class of the first argument should be a 
'matrix'. The error message also says that the function was expecting a 
'matrix'. As you have discovered you provided a 'data.frame'. Is a 
data.frame a matrix?
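
A data.frame is not a matrix, even when every column is numeric, so it has to be converted before it is passed on. Below is a minimal sketch of the check and the conversion. The object `expr_df` and its values are invented for illustration and only stand in for your Micro_array_data.

```r
# Invented stand-in for the expression data: genes as rows, samples as columns.
expr_df <- data.frame(
  S1 = c(5.1, 6.2, 7.3),
  S2 = c(5.0, 6.5, 7.1),
  S3 = c(4.9, 6.1, 7.4),
  row.names = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419")
)

class(expr_df)       # a data.frame
is.matrix(expr_df)   # a data.frame is not a matrix

# as.matrix() keeps the row and column names; the result is numeric only if
# every column of the data.frame is numeric.
expr_mat <- as.matrix(expr_df)
is.matrix(expr_mat)
is.numeric(expr_mat)

# Fail early, before handing the object to an analysis function.
stopifnot(is.matrix(expr_mat), is.numeric(expr_mat))
```

If is.numeric() is FALSE after the conversion, at least one column is non-numeric (often a gene identifier column that was read in as data). Moving that column into the row names before converting is probably the simplest remedy.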

Martin

> might help you i assume. In that email Mr. Luo Weijun explains what
> should be the format of input files and how should i read them. I am
> reading the files in the same way but still it shows the error.
> Thanks,
> Jung
> ------------------------------------------------------------------------
> *From:* Javerjung Sandhu
> *Sent:* Tuesday, January 24, 2012 11:16 AM
> *To:* luo_weijun at yahoo.com
> *Cc:* bioconductor at r-project.org
> *Subject:* URGENT Help required: Getting this error with GAGE analysis,
> input files attached.
>
>
> ------------------------------------------------------------------------
> *From:* Javerjung Sandhu
> *Sent:* Monday, January 23, 2012 1:27 PM
> *To:* Valerie Obenchain
> *Cc:* bioconductor at r-project.org; luo_weijun at yahoo.com
> *Subject:* Getting this error with GAGE analysis, input files attached
>
> Hi there,
> I am getting this error on R console. I have attached the input files.
> Help will be really appreciated.
>
>  > Micro_array_data <- readExpData(file = "Micro_array_dataset.txt")
>  > Gene_set <- readList("Gene_set.gmt")
>  > Reference_condition <- c(1,3,5)
>  > Target_condition <- c(2,4,6)
>  > A1_compare_un <- gage(Micro_array_data, Gene_set, ref = Reference_condition, samp = Target_condition)
> Error in saaPrep(exprs, ref = ref, samp = samp, same.dir = same.dir, compare = compare, :
>   exprs needs to be a numeric matrix or vector
>  > # Essential_member_genes <- essGene(Gene_set, Micro_array_data, ref = NULL)
>  > # Non_redundant_significant_gene_set_list <- esset.grp()
>  >
>
> Thanks,
>
> Jung
>
> ________________________________________
> From: Valerie Obenchain [vobencha at fhcrc.org]
> Sent: Monday, January 23, 2012 9:50 AM
> To: Javerjung Sandhu
> Subject: Re: [BioC] How to prepare Custom INPUT(DATA) files for GAGE
> Analysis and DO a BASIC GAGE analysis using those files
>
> Hello,
>
> On 01/22/12 16:31, Javerjung Sandhu wrote:
>  > Hi Valerie,
>  > Thanks for the information. Now i won't follow the vignette, i am
> trying to write my own code from scratch.
>  > My supervisor said i should follow the GO.GS gene dataset for now and
> we can work with others later. Actually i am an engineering science
> student who had no background in biology and i got a co-op job at bc
> cancer agency to do some analysis using perl and python. But last month
> my supervisor said that i need to work on GAGE therefore i learned the R
> from different sources and also from the R website, i got the "R intro"
> file which helped me a lot to learn R.
>  > So i have a request for you. I will send you the code which i will
> write along with the data files. So if you could please help me in
> getting rid of the errors so that i can finish the analysis asap.
> If you have problems using the gage package, they should be posted on
> the bioconductor mailing list. As you mentioned, Weijun has also
> responded to your message and is willing to help. Posting on the list
> makes it possible for more than one person to respond and for other new
> users to learn from the discussion. So, once you have your script
> written and have tried to use the functions in gage, post them to the
> mailing list. You need to provide a small working example of what you
> have tried and what errors you are seeing.
>
>  > I really appreciate all your help.
>  > I also received an email from Weijun. I will go through that email
> and ask you questions/problems.
>  > If possible can you write a script for me which can do a basic GAGE
> analysis and i can edit that to customise it according to my needs. You
> can use the GO.GS gene set and the input file which i have attached
> right now.
> No, unfortunately I can't write the script for you. The vignette in the
> package has examples of how to perform a gage analysis. Your data will
> be different but the general steps will be the same. If you run into
> trouble, post a small, reproducible example on the mailing list.
>
> Valerie
>
>  > But i am also writing my code but i am so depressed, sad and
> confused; i don't think my code will work.
>  > Thanks,
>  > Jung
>  >
>  > ________________________________________
>  > From: Valerie Obenchain [vobencha at fhcrc.org]
>  > Sent: Thursday, January 19, 2012 5:54 PM
>  > To: Javerjung Sandhu
>  > Cc: bioconductor at r-project.org; luo_weijun at yahoo.com
>  > Subject: Re: [BioC] How to prepare Custom INPUT(DATA) files for GAGE
> Analysis and DO a BASIC GAGE analysis using those files
>  >
>  > Hi Jung,
>  >
>  > Thank you for sending your files but there is no need to attach the
>  > source files from the gage package (GAGE.r, gage.pdf). I have access to
>  > those files.
>  >
>  > The package vignette is just intended to be an example. Clearly the data
>  > in the package and your data will be very different. It does not make
>  > sense to try to follow the code exactly "as is" when using your data.
>  > For example, it doesn't make sense for you to grep for 'HN', 'ADH' and
>  > 'DCIS' since they don't exist in your file. These are treatment groups
>  > included in the gage sample data and have no bearing on your analysis.
>  > This is why you see nothing (i.e., integer(0)) for these variables.
>  >
>  > > Micro_array_dataset<- read.table("Micro_array_dataset.txt")
>  > > cn=colnames(Micro_array_dataset)
>  > > hn=grep('HN',cn, ignore.case =T)
>  > > adh=grep('ADH',cn, ignore.case =T)
>  > > dcis=grep('DCIS',cn, ignore.case =T)
>  > > print(hn)
>  > integer(0)
>  > > print(dcis)
>  > integer(0)
>  >
>  >
>  > This error is due to the fact that you are subsetting a data.frame and
>  > have not specified the columns. In the vignette, the gene set is a list
>  > so this subsetting works.
>  >
>  > > lapply(Gene_set[1:3],head)
>  > Error in `[.data.frame`(Gene_set, 1:3) : undefined columns selected
>  >
>  >
>  > Next, your genes need to be grouped by pathway. The idea is to do an
>  > analysis of gene pathways so you need to provide a list of genes grouped
>  > by pathway (like the kegg.gs or go.gs example files in the vignette).
>  > Your gene file consists only of gene names,
>  >
>  > > head(rownames(Micro_array_dataset))
>  > [1] "ENSG00000000003" "ENSG00000000005" "ENSG00000000419" "ENSG00000000457"
>  > [5] "ENSG00000000460" "ENSG00000000938"
>  >
>  > In R, a list of genes grouped by pathway would look something like
>  > this,
>  > > head(kegg.gs)
>  > $`hsa00010 Glycolysis / Gluconeogenesis`
>  > [1] "10327" "124" "125" "126" "127" "128" "130" "130589"
>  > [9] "131" "160287" "1737" "1738" "2023" "2026" "2027" "217"
>  > ...
>  >
>  > $`hsa00020 Citrate cycle (TCA cycle)`
>  > [1] "1431" "1737" "1738" "1743" "2271" "283398" "3417" "3418"
>  > [9] "3419" "3420" "3421" "4190" "4191" "47" "48" "4967"
>  > ...
>  >
>  > You need to identify what pathways you are interested in and group the
>  > genes by those pathways. For identifying pathways take a look at the
>  > GO.db, KEGG.db or reactome.db. Mapping between gene identifiers can be
>  > done with the org.*.db packages.
>  >
>  > http://www.bioconductor.org/packages/release/data/annotation/
>  >
>  > Some general background on using Bioconductor annotation data is here,
>  >
>  > http://www.bioconductor.org/help/workflows/annotation-data/#annotation-resources
>  >
>  >
>  > Valerie
>  >
>  >
>  > On 01/17/12 12:51, Javerjung Sandhu wrote:
>  >> Hello Valerie,
>  >> Thanks for your help. I am sending you the data
>  >> files (Micro_array_dataset.txt** & Gene_Set.txt) which i want to use
>  >> for the analysis.
>  >> I need to know in which format the files should be saved (like
>  >> http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats
>  >> this site explains in great detail, what should be the format of the
>  >> data files required for GSEA analysis (though i am not using GSEA
>  >> analysis or these file types), same way i want to know in which format
>  >> i should save the data files required for GAGE analysis so that the
>  >> analysis is done properly)
>  >> Please tell me which information is missing from these files.
>  >> * Yes i know that "gse16873" is expression data and "kegg.gs" is a
>  >> geneset but i want to use my own, these ones are provided by the author.
>  >> 1) What i want to accomplish is: I want to do a basic gage analysis
>  >> (as given in the R script file named "GAGE.r" and pdf file "gage.pdf")
>  >> such as t-test, rank test, KS test etc.
>  >> 2) I copied the beginning code (to make sure that it loads all the files
>  >> successfully) from R script file provided by the author (which is also
>  >> attached as GAGE.r) and made some changes to it and saved as my own
>  >> script (also attached as Gage_run.r). I tried to load the data files
>  >> (Micro_array_dataset.txt & Gene_Set.txt) and got these errors (shown
>  >> in "R Console.txt" file).
>  >> 3) I run the R script file (Gage_run.r) first to see that it loads all
>  >> the input files successfully and then i can move ahead with the tests.
>  >> The output is shown in "R Console.txt" file which shows the errors and
>  >> warnings.
>  >> If you need more additional information. Please do tell me. I will be
>  >> happy to provide that.
>  >> **an expression matrix with genes as rows and samples as columns.
>  >> Thanks,
>  >> Jung
>  >> ------------------------------------------------------------------------
>  >> *From:* Valerie Obenchain [vobencha at fhcrc.org]
>  >> *Sent:* Tuesday, January 17, 2012 10:04 AM
>  >> *To:* Javerjung Sandhu
>  >> *Cc:* bioconductor at r-project.org; luo_weijun at yahoo.com
>  >> *Subject:* Re: [BioC] How to prepare Custom INPUT(DATA) files for GAGE
>  >> Analysis and DO a BASIC GAGE analysis using those files
>  >>
>  >> Hello,
>  >>
>  >> I think the vignette is clear that you need (1) a gene set and (2) a
>  >> microarray dataset to run the gage analysis. On page 4 they mention
>  >> the importance of having the same ID system for your gene set and
>  >> expression data. Once this is accomplished you can use the gage()
>  >> function.
>  >>
>  >> ## this is the expression data
>  >> gse16873
>  >>
>  >> ## this is the gene set
>  >> kegg.gs
>  >>
>  >> ## call to gage() using 'HN' as control and 'DCIS' as treatment
>  >> gse16873.kegg.p<- gage(gse16873, gsets = kegg.gs,
>  >> ref = hn, samp = dcis)
>  >>
>  >>
>  >> I believe if you have only one column of expression data the 'ref' and
>  >> 'samp' arguments should be omitted (i.e., default of NULL). Read ?gage
>  >> for details. Maybe the package author will comment on this. I've cc'd
>  >> them on this message.
>  >>
>  >> It is still not clear to me what you have tried. It would be helpful
>  >> to know the following,
>  >>
>  >> (1) what is your analysis question (what are you trying to accomplish)
>  >> (2) what have you tried (what functions have you used)
>  >> (3) what errors have you seen from #2
>  >>
>  >>
>  >> Valerie
>  >>
>  >> On 01/16/2012 04:19 PM, Javerjung Sandhu wrote:
>  >>> Hi Valerie,
>  >>> First of all thanks a lot for replying and helping me. I really
> appreciate that. I am sending you the R source code file which the GAGE
> analysis uses plus two other documents which explains what that package
> does.
>  >>> These are the data files used by the GAGE analysis:
>  >>> ----------------------------
>  >>> Data sets in package ‘gage’:
>  >>> carta.gs Common gene set data collections
>  >>> egSymb Mapping between Entrez Gene IDs and official
>  >>> symbols
>  >>> go.gs Common gene set data collections
>  >>> gse16873 GSE16873: a breast cancer microarray dataset
>  >>> kegg.gs Common gene set data collections
>  >>> -----------------------------------------------------
>  >>> I have only ONE tab delimited data file in the form of a MATRIX
> giving the gene expressions for 173 patients(as columns) and names of
> genes(as rows).
>  >>> I want to know how can i use this package and my data to do the
> GAGE analysis.
>  >>> If you need more information, please tell me. I will be ready to
> provide that.
>  >>> Thanks,
>  >>> Jung
>  >>>
>  >>> ________________________________________
>  >>> From: Valerie Obenchain [vobencha at fhcrc.org]
>  >>> Sent: Monday, January 16, 2012 3:18 PM
>  >>> To: Javerjung Sandhu
>  >>> Cc: bioconductor at r-project.org; luo_weijun at yahoo.com
>  >>> Subject: Re: [BioC] How to prepare Custom INPUT(DATA) files for
> GAGE Analysis and DO a BASIC GAGE analysis using those files
>  >>>
>  >>> Hi Jung,
>  >>>
>  >>> Please provide the code you've tried and the error you are seeing. For
>  >>> example, did you read your own data into R, then try to use gage() and
>  >>> got an error? We can better help you if we understand your inputs and
>  >>> the function you're having trouble with.
>  >>>
>  >>> Valerie
>  >>>
>  >>>
>  >>> On 01/13/12 13:10, Javerjung Sandhu wrote:
>  >>>> Dear List,
>  >>>> I will highly appreciate your help on this.
>  >>>> For the GAGE analysis package shown by the link given below:
>  >>>> http://www.bioconductor.org/packages/release/bioc/html/gage.html
>  >>>> Could you please tell me how to prepare the Custom INPUT files
> required for this analysis
>  >>>> OR
>  >>>> Send me the SAMPLE DATA files in TXT format so that i know in
> which format i need to put the data & how could i DO a BASIC GAGE
> analysis using those files. I couldn't figure it out and trying it since
> 3 weeks or more.
>  >>>> Best Regards,
>  >>>> Jung
>  >>>>
>
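
On the gene-set side of the quoted thread: the gene sets are a named list of character vectors, and their identifiers need to use the same ID system as the row names of the expression matrix. Here is a rough sketch of that check. The names `toy_sets`, `toy_expr` and `coverage`, the set names and all values are made up for illustration and are not part of gage.

```r
# Made-up gene sets: a named list, one character vector of gene IDs per set.
toy_sets <- list(
  set_A = c("ENSG00000000003", "ENSG00000000005"),
  set_B = c("ENSG00000000419", "ENSG00000000457", "ENSG99999999999")
)

# Made-up numeric expression matrix: genes as rows, samples as columns.
set.seed(1)
toy_expr <- matrix(
  rnorm(4 * 6), nrow = 4,
  dimnames = list(
    c("ENSG00000000003", "ENSG00000000005",
      "ENSG00000000419", "ENSG00000000457"),
    paste0("S", 1:6)
  )
)

# On a list, [ ] returns a sub-list, so this kind of subsetting works.
lapply(toy_sets[1:2], head)

# Fraction of each set's genes that are present among the matrix row names.
coverage <- sapply(toy_sets, function(genes) mean(genes %in% rownames(toy_expr)))
coverage
```

A coverage near zero for every set would suggest that the gene sets and the expression data use different identifier systems.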


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


