[BioC] RpsiXML issues with latest Biogrid release files
Sara Jane Calafell Gosline, Ms
sara.gosline at mail.mcgill.ca
Tue Dec 8 03:56:36 CET 2009
Hi Tony,
Thanks. I updated R and Bioconductor and was still able to reproduce the error on a different machine.
I sent the .xml file to David so he can reproduce it. Here is my new sessionInfo():
R version 2.10.0 (2009-10-26)
i386-apple-darwin9.8.0
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RpsiXML_1.6.0 hypergraph_1.17.0 XML_2.5-3 RBGL_1.21.2
[5] graph_1.23.3 annotate_1.24.0 AnnotationDbi_1.8.1 Biobase_2.6.1
loaded via a namespace (and not attached):
[1] DBI_0.2-4 RSQLite_0.7-1 tools_2.10.0 xtable_1.5-6
sara
On 07/12/09 11:03 AM, "Tony Chiang" <tchiang at fhcrc.org> wrote:
Hi Sara,
The current release of R is 2.10. I don't know whether this will fix the problem, but the current versions of the packages are built for the latest release of R, so the first thing to try is updating R, which will also let you install the current version of RpsiXML. I will look at your example in a bit.
Tony
On Mon, Dec 7, 2009 at 7:03 AM, Sara JC Gosline <sara.gosline at mail.mcgill.ca> wrote:
Hello again,
I have recently installed RpsiXML and used it to parse the latest XML files from IntAct successfully. However, when I run the same functions on the latest BioGRID release (to obtain assay-specific rather than experiment-specific interactions), I get a graph with a single node, "NA", and one interaction. My sessionInfo() is at the end of this email.
***Parsing xml files to graph:
I used the 'PCA' file since it is relatively short:
g <- psimi25XML2Graph('../biogrid/psiml25/BIOGRID-SYSTEM-PCA-2.0.59.psi25.xml', BIOGRID.PSIMI25, type = 'interaction', verbose = TRUE)
1 Entries found
Parsing entry 1
Parsing experiments: ...............................................
Parsing interactors:
100% ========================================>
Parsing interactions:
100% ========================================>
g
[1] "psimi25Graph"
attr(,"package")
[1] "RpsiXML"
nodes(g)
[1] "NA"
edges(g)
$`NA`
[1] "NA"
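One plausible reading of this single-node result, offered only as a guess: if node names are derived from each interactor's UniProt accession and every accession comes back missing (the parsed interactions below all show "bait UniProt: NA"), then every interactor collapses onto the same name. The toy base-R sketch below illustrates that collapse. It is NOT RpsiXML's code; the variable names are hypothetical and the assumption that missing accessions become the string "NA" is labeled in the comments.

```r
# Toy illustration only, NOT RpsiXML internals.
# Assumption: nodes are named by UniProt accession, and a missing accession
# ends up as the literal string "NA".
baitUniProt <- c(NA, NA, NA)  # every interaction reports bait UniProt: NA
preyUniProt <- c(NA, NA, NA)  # every interaction reports prey UniProt: NA

# paste() converts a missing value into the string "NA"
edgeList <- data.frame(from = paste(baitUniProt),
                       to   = paste(preyUniProt),
                       stringsAsFactors = FALSE)

nodeNames   <- unique(c(edgeList$from, edgeList$to))  # all names collapse
uniqueEdges <- unique(edgeList)                       # all edges collapse

print(nodeNames)          # a single node name: "NA"
print(nrow(uniqueEdges))  # a single (self-)edge remains
```

Three distinct interactions end up as one node and one self-edge, which matches the shape of the nodes(g) and edges(g) output above.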
***Parsing xml file without graph:
To check whether the problem lies in the parsing itself, I redid the parsing without converting the result to a graph object:
g <- parsePsimi25Interaction('../biogrid/psiml25/BIOGRID-SYSTEM-PCA-2.0.59.psi25.xml', BIOGRID.PSIMI25, verbose = TRUE)
Here is the first bit of output:
g
==================================
interaction entry ( 2009-11-25 ):
==================================
[ organism ]: Arabidopsis thaliana Saccharomyces cerevisiae Schizosaccharomyces pombe
[ taxonomy ID ]: 3702 4932 4896
[ interactors ]: there are 1214 interactors in total, here are the first few ones:
sourceDb sourceId shortLabel uniprotId organismName taxId
<NA> "" "1" "BZR1" NA "Arabidopsis thaliana" "3702"
<NA> "" "2" "GRF6" NA "Arabidopsis thaliana" "3702"
<NA> "" "3" "FUN14" NA "Saccharomyces cerevisiae" "4932"
<NA> "" "4" "UIP4" NA "Saccharomyces cerevisiae" "4932"
<NA> "" "5" "ALO1" NA "Saccharomyces cerevisiae" "4932"
<NA> "" "6" "SPO7" NA "Saccharomyces cerevisiae" "4932"
...
[ interactions ]: there are 2736 interactions in total, here are the first few ones:
[[1]]
interaction ( NA ):
---------------------------------
[ source database ]:
[ source experiment ID ]: 1
[ interaction type ]: protein complementation assay
[ experiment ]: pubmed 17681130
[ participant ]: NA NA
[ bait ]: 1
[ bait UniProt ]: NA
[ prey ]: 2
[ prey UniProt ]: NA
So the interactors and interactions appear to be parsed correctly, but their identifiers are not being retrieved properly. When I look at the attributes of an interaction, most of them are NA:
attributes(g@interactions[[1]])
$sourceDb
[1] ""
$sourceId
[1] NA
$interactionType
[1] "protein complementation assay"
$expPubMed
[1] "17681130"
$expSourceId
[1] "1"
$confidenceValue
[1] NA
$participant
<NA> <NA>
NA NA
$bait
[1] "1"
$baitUniProt
[1] NA
$prey
[1] "2"
$preyUniProt
[1] NA
$inhibitor
[1] NA
$neutralComponent
[1] NA
$class
[1] "psimi25Interaction"
attr(,"package")
[1] "RpsiXML"
***Conclusion:
Is there an easy workaround for this? Perhaps a way to look up the identifiers manually?
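A minimal sketch of such a manual lookup, assuming the bait and prey references stored in each interaction ("1" and "2" above) are the interactors' sourceId values, which is what the printed tables suggest (sourceId "1" is BZR1, "2" is GRF6). The interactor table here is typed in by hand from the first rows printed above; extracting the full table from the parsed entry depends on RpsiXML accessors not shown in this thread, and lookupLabel is a hypothetical helper, not part of RpsiXML.

```r
# Interactor table typed in by hand from the printout above
# (first rows only; the real entry has 1214 interactors).
interactors <- data.frame(
  sourceId     = c("1", "2", "3", "4", "5", "6"),
  shortLabel   = c("BZR1", "GRF6", "FUN14", "UIP4", "ALO1", "SPO7"),
  organismName = c(rep("Arabidopsis thaliana", 2),
                   rep("Saccharomyces cerevisiae", 4)),
  stringsAsFactors = FALSE
)

# Bait/prey references as stored in the first interaction ($bait, $prey)
interactions <- data.frame(bait = "1", prey = "2", stringsAsFactors = FALSE)

# Hypothetical helper: map interactor references to short labels.
# match() returns NA for references missing from the table.
lookupLabel <- function(ref, table) {
  table$shortLabel[match(ref, table$sourceId)]
}

interactions$baitLabel <- lookupLabel(interactions$bait, interactors)
interactions$preyLabel <- lookupLabel(interactions$prey, interactors)
print(interactions)
```

The resulting label pairs could then serve as an edge list for a graph constructor such as ftM2graphNEL() from the graph package, sidestepping the missing UniProt IDs.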
Thanks,
sara
***SessionInfo:
sessionInfo()
R version 2.8.1 (2008-12-22)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] grid splines tools stats graphics grDevices utils
[8] datasets methods base
other attached packages:
[1] gtools_2.5.0-1 multicore_0.1-3 ppiStats_1.8.0
[4] RColorBrewer_1.0-2 lattice_0.17-17 ScISI_1.14.0
[7] apComplex_2.8.0 ppiData_0.1.13 Rgraphviz_1.20.4
[10] org.Sc.sgd.db_2.2.6 GOstats_2.8.0 Category_2.8.4
[13] genefilter_1.22.0 survival_2.34-1 GO.db_2.2.5
[16] RSQLite_0.7-1 DBI_0.2-4 RpsiXML_1.0.0
[19] RBGL_1.20.0 hypergraph_1.14.0 graph_1.20.0
[22] XML_2.3-0 annotate_1.20.1 xtable_1.5-6
[25] AnnotationDbi_1.4.3 Biobase_2.2.2
loaded via a namespace (and not attached):
[1] cluster_1.11.11 GSEABase_1.4.0
_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor