[BioC] GO enrichment

Warin [guest] guest at bioconductor.org
Sun Aug 10 10:47:30 CEST 2014


Hello everyone,
I am very new to bioinformatics work, and I hope someone can give me an answer and some suggestions.

I am trying to use the goseq package to get GO enrichment for my data, which come from an organism that is not built in. After BH adjustment, no category passes the 0.05 cutoff:

> enriched.GO = unsorted_L14.15_S_GO.wall$category[p.adjust(unsorted_L14.15_S_GO.wall$over_represented_pvalue, method="BH") < 0.05]
> head(enriched.GO)
character(0)
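
An empty result here means that no BH-adjusted p-value falls below 0.05, which can happen even when a few raw p-values look small, because the adjustment scales with the number of categories tested. A minimal sketch with made-up p-values (the vector `p_raw` and the category names are hypothetical, not taken from the data above):

```r
# Hypothetical raw p-values for five categories (illustration only)
p_raw <- c(cat_a = 0.0001, cat_b = 0.008, cat_c = 0.009, cat_d = 0.4, cat_e = 0.9)

# Benjamini-Hochberg adjustment; the result depends on the number of tests
p_bh_5    <- p.adjust(p_raw, method = "BH")            # only these 5 tests
p_bh_2000 <- p.adjust(p_raw, method = "BH", n = 2000)  # as if 2000 categories had been tested

sig_5    <- names(p_raw)[p_bh_5 < 0.05]     # three categories pass
sig_2000 <- names(p_raw)[p_bh_2000 < 0.05]  # nothing passes: character(0)
sig_5
sig_2000
```

Comparing the smallest raw p-value against 0.05 divided by the number of tested categories gives a quick feel for whether anything can survive the correction.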

I prepared the data as follows:
#create LengthData
> unsorted_L14.15_S_LengthData <- unsorted_L14.15_S_gene2length
> unsorted_L14.15_S_id <- as.vector(unsorted_L14.15_S_gene2length[,1])  
> unsorted_L14.15_S_length <- as.numeric(unsorted_L14.15_S_gene2length[,2])
> unsorted_L14.15_S_LengthData <- structure(unsorted_L14.15_S_length, .names=unsorted_L14.15_S_id)
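
Since genes are matched by name in the later steps, it may be worth confirming that the length vector and the DE indicator vector cover the same genes in the same order. A sketch on toy data, assuming a two-column gene/length table; the object names `gene2length`, `length_data` and `de_genes` are hypothetical stand-ins, and the DE values are invented:

```r
# Toy stand-in for the two-column gene/length table
gene2length <- data.frame(
  gene   = c("Cucsa.000210", "Cucsa.000250", "Cucsa.000270"),
  length = c("1512", "405", "258"),
  stringsAsFactors = FALSE
)

# Named numeric vector: lengths as values, gene IDs as names
length_data <- setNames(as.numeric(gene2length$length), gene2length$gene)

# Toy 0/1 DE indicator, deliberately in a different gene order
de_genes <- c(Cucsa.000270 = 0, Cucsa.000210 = 1, Cucsa.000250 = 0)

# Reorder the lengths by name so both vectors line up gene by gene
length_data <- length_data[names(de_genes)]
stopifnot(identical(names(length_data), names(de_genes)), !anyNA(length_data))
length_data
```

Note that if the length column was read in as a factor, `as.numeric()` returns the level codes rather than the lengths; converting with `as.character()` first avoids that.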

# PWF: fit the probability weighting function
unsorted_L14.15_S_pwf = nullp(unsorted_genesL14.15_S, bias.data=unsorted_L14.15_S_length, plot.fit=TRUE)
unsorted_L14.15_S_pwf = nullp(unsorted_genesL14.15_S, bias.data=unsorted_L14.15_S_LengthData, plot.fit=TRUE)

> head(unsorted_L14.15_S_pwf)
             DEgenes bias.data       pwf
Cucsa.000210       0      1512 0.5013243
Cucsa.000250       0       405 0.5182944
Cucsa.000270       0       258 0.5205436
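
The pwf column is a fitted proportion of DE genes as a function of length, so values near 0.5 suggest that roughly half of the genes in the input vector are flagged as DE. A quick sanity check of the indicator column could look like the sketch below; `pwf_toy` is a hypothetical stand-in for the nullp() output, and the last three rows are invented:

```r
# Toy stand-in for the nullp() output (object name and extra rows are hypothetical)
pwf_toy <- data.frame(
  DEgenes   = c(0, 0, 0, 1, 1, 1),
  bias.data = c(1512, 405, 258, 2100, 950, 1800),
  row.names = c("Cucsa.000210", "Cucsa.000250", "Cucsa.000270",
                "gene_x", "gene_y", "gene_z")
)

# The indicator should contain only 0 (not DE) and 1 (DE)
stopifnot(all(pwf_toy$DEgenes %in% c(0, 1)))

# How many genes are flagged, and what fraction of the total
table(pwf_toy$DEgenes)
de_fraction <- mean(pwf_toy$DEgenes)
de_fraction
```

If the fraction is much larger than the number of DE genes expected from the gene list table, the 0/1 vector itself may not encode what was intended.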

> unsorted_L14.15_S_GO.wall <- goseq(unsorted_L14.15_S_pwf, gene2cat=unsorted_L14.15_S_gene2go, test.cats=c("GO:CC", "GO:BP", "GO:MF"), method="Wallenius", repcnt=2000, use_genes_without_cat=TRUE)
Using manually entered categories.
Calculating the p-values...
> head(unsorted_L14.15_S_GO.wall)
      category over_represented_pvalue under_represented_pvalue numDEInCat numInCat
594 GO:0043565            0.0001000255                0.9999945         17       18
177 GO:0005515            0.0079055773                0.9933285        618     1162
380 GO:0008565            0.0088243286                1.0000000          7        7


I noticed that some categories, such as GO:0043565, look like good candidates for significant enrichment: according to the result, 17 of the 18 genes assigned to this category are DE genes. However, when I went back to check my gene list table, I found only 7 DE genes out of 18 for this category. I don't know what I have done wrong. I hope someone can help me with this. Thank you so much.
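
One way to cross-check the reported counts is to recompute them by hand from the DE indicator vector and the gene-to-category table. The sketch below is a plain hand count on toy data, not goseq's internal code; the objects `de_genes`, `gene2go` and the gene IDs are hypothetical, and a two-column gene/category data frame is assumed:

```r
# Toy 0/1 DE indicator named by gene ID
de_genes <- c(g1 = 1, g2 = 0, g3 = 1, g4 = 0, g5 = 1)

# Toy gene-to-category table with one duplicated row (g1 / GO:0043565)
# and one gene (g6) that is absent from the DE vector
gene2go <- data.frame(
  gene     = c("g1", "g1", "g2", "g3", "g4", "g5", "g6"),
  category = c("GO:0043565", "GO:0043565", "GO:0043565", "GO:0043565",
               "GO:0005515", "GO:0005515", "GO:0005515"),
  stringsAsFactors = FALSE
)

# Keep each gene/category pair once, and only genes present in the DE vector
pairs <- unique(gene2go)
pairs <- pairs[pairs$gene %in% names(de_genes), ]

# Per category: number of annotated genes and number flagged as DE
num_in_cat    <- tapply(pairs$gene, pairs$category, length)
num_de_in_cat <- tapply(de_genes[pairs$gene], pairs$category, sum)
cbind(numDEInCat = num_de_in_cat, numInCat = num_in_cat)
```

If the hand count agrees with the gene list table but not with the goseq output, the vector passed to nullp() is the first thing to compare against the table.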


Regards,
warin




 -- output of sessionInfo(): 

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] GO.db_2.14.0          org.Hs.eg.db_2.14.0   RSQLite_0.11.4
 [4] DBI_0.2-7             AnnotationDbi_1.26.0  GenomeInfoDb_1.0.2
 [7] Biobase_2.24.0        BiocGenerics_0.10.0   edgeR_3.6.4
[10] limma_3.20.8          goseq_1.16.2          geneLenDataBase_1.0.0
[13] BiasedUrn_1.06.1

loaded via a namespace (and not attached):
 [1] BBmisc_1.7              BSgenome_1.32.0         BatchJobs_1.2
 [4] BiocParallel_0.6.1      Biostrings_2.32.0       GenomicAlignments_1.0.2
 [7] GenomicFeatures_1.16.2  GenomicRanges_1.16.3    IRanges_1.22.9
[10] Matrix_1.1-4            RCurl_1.95-4.1          Rcpp_0.11.2
[13] Rsamtools_1.16.1        XML_3.98-1.1            XVector_0.4.0
[16] biomaRt_2.20.0          bitops_1.0-6            brew_1.0-6
[19] checkmate_1.1           codetools_0.2-8         digest_0.6.4
[22] fail_1.2                foreach_1.4.2           grid_3.1.0
[25] iterators_1.0.7         lattice_0.20-29         mgcv_1.8-0
[28] nlme_3.1-117            plyr_1.8.1              rtracklayer_1.24.2
[31] sendmailR_1.1-2         stats4_3.1.0            stringr_0.6.2
[34] tools_3.1.0             zlibbioc_1.10.0



--
Sent via the guest posting facility at bioconductor.org.


