[BioC] topGO

James W. MacDonald jmacdon at uw.edu
Tue Aug 26 16:23:27 CEST 2014


Hi Steven,

One of the best ways to figure out what to do is to see what is required.
If you look on page 3 of the topGO vignette (
http://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf),
you can see that the vignette uses a 'geneList' that comes with the
package. We can inspect this object like this:

> library(topGO)
> data(geneList)
> class(geneList)  ## always a good idea to check
[1] "numeric"
> head(geneList)
1095_s_at   1130_at   1196_at 1329_s_at 1340_s_at 1342_g_at
1.0000000 1.0000000 0.6223795 0.5412240 1.0000000 1.0000000

Being a newbie, you might not recognize this data structure. It is a named
vector, where the vector itself is numeric and the names are character. In
other words, the top row above (starting with 1095_s_at) contains the
names, and the second row contains the values. In this case, the names are
the names of the probes, and the numbers are the p-values from a t-test
comparing two groups.
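
To make the structure concrete, here is a minimal sketch of building a named numeric vector by hand. The probe IDs and p-values are made up for illustration, and the object names (pvals, probes, gl) are hypothetical, not anything topGO requires:

```r
# Made-up probe IDs and p-values, for illustration only
pvals  <- c(0.003, 0.47, 1)
probes <- c("probe_a", "probe_b", "probe_c")

# Start from the numeric values, then attach the names
gl <- pvals
names(gl) <- probes

class(gl)      # still "numeric"; the names are just an attribute
names(gl)      # the character probe IDs
gl["probe_b"]  # values can be looked up by name
```

Printing gl at the prompt gives the same two-row layout as head(geneList) above: names on top, values underneath.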

Please note that you don't give us much information to go on, so it isn't
possible to give you much help. For example, what array are you using?
Do you have mappings of probe ID to GO terms? If it is a common array,
there are likely to be packages in Bioconductor that can help, or you might
need to use an organism level package. Speaking of which, what is the
species? What have you done so far? Did you analyze these data in R? If
not, what is the form of your data? You say something about a csv file; is
that how you have the data right now?

Without knowing some or all of the above, it isn't really possible to give
you anything but a general solution. So here is a general solution:

You need to read in your probe identifiers and p-values and then create a
named vector. So you need to use one of read.table(), read.csv(),
read.delim() or scan() to read these things into R. Once you have done that
(and note that you will almost surely want to set the stringsAsFactors
argument to FALSE for any of the read.xxx functions), then you can create
the geneList like this:

geneList <- {p-values go here}
names(geneList) <- {probe IDs go here}
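
As a rough sketch of those two steps, assuming a CSV file with one column of identifiers and one of p-values: the column names probe_id and pvalue, and the example rows, are assumptions made up for illustration, so substitute whatever your file actually contains. The temporary file is only there so the example runs on its own.

```r
# Write a tiny made-up CSV so the sketch is self-contained;
# in practice you would point read.csv() at your own file.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("probe_id,pvalue",
             "probe_a,0.003",
             "probe_b,0.47",
             "probe_c,1"),
           csv_file)

# stringsAsFactors = FALSE keeps the IDs as character
# (this is already the default in R 4.0.0 and later)
dat <- read.csv(csv_file, stringsAsFactors = FALSE)

# Build the named numeric vector: values first, then names
geneList <- dat$pvalue
names(geneList) <- dat$probe_id

# Quick sanity checks before handing the vector to anything else
stopifnot(is.numeric(geneList),
          !anyNA(geneList),
          anyDuplicated(names(geneList)) == 0)
str(geneList)
```

If str() reports a character vector instead of a numeric one, the p-value column probably contains something non-numeric (for example text in an empty cell), and that needs cleaning up first.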

If you answer the questions above, we can probably give more constructive
help.

Best,

Jim




On Tue, Aug 26, 2014 at 4:06 AM, Steven Stadler <steven.stadler at gmail.com>
wrote:

> Hi! I am new to Bioconductor and topGO ... My aim is to run a GO-term
> enrichment analysis on expression data with a control and two different
> infections. I managed to create my own GO term-gene mapping, but I don't know
> how to create my own geneList. I have an Excel sheet with p-values, reads
> and so on ... How can I create this geneList in R? I am also a newbie in R
> ;-)
>
> Should I create a CSV file with the gene name and its p-value? But how can I
> parse it in R/Bioconductor to use it for the creation of a topGO object? It
> would be nice if someone could tell me the parse command :-) Or an example
> of how I can create a topGO object with custom data. Thanks.
>
> Greetings Steven



-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
