[BioC] RpsiXML issues with latest Biogrid release files

Sara Jane Calafell Gosline, Ms sara.gosline at mail.mcgill.ca
Tue Dec 8 03:56:36 CET 2009


Hi Tony,

Thanks. After updating R and Bioconductor, I was still able to reproduce the error on a different machine.

I sent the .xml file to David so that he can reproduce the problem. Here is my new sessionInfo():

R version 2.10.0 (2009-10-26)
i386-apple-darwin9.8.0

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] RpsiXML_1.6.0       hypergraph_1.17.0   XML_2.5-3           RBGL_1.21.2
[5] graph_1.23.3        annotate_1.24.0     AnnotationDbi_1.8.1 Biobase_2.6.1

loaded via a namespace (and not attached):
[1] DBI_0.2-4     RSQLite_0.7-1 tools_2.10.0  xtable_1.5-6


sara


On 07/12/09 11:03 AM, "Tony Chiang" <tchiang at fhcrc.org> wrote:

Hi Sara,

The current release of R is 2.10. I don't know whether this will fix the problem, but the current package versions are built for the latest release of R, so the first thing to try is updating R, which will in turn update RpsiXML. I will look at your example shortly.

Tony

On Mon, Dec 7, 2009 at 7:03 AM, Sara JC Gosline <sara.gosline at mail.mcgill.ca> wrote:
Hello again,

I recently installed RpsiXML and used it to successfully parse the latest XML files from IntAct. However, when I run the same functions on the latest BioGRID release (to obtain assay-specific rather than experiment-specific interactions), I get a graph with a single node, "NA", and one interaction. My sessionInfo() is at the end of this email.

***Parsing XML files to a graph:
I used the 'PCA' file since it is relatively short:

> g <- psimi25XML2Graph('../biogrid/psiml25/BIOGRID-SYSTEM-PCA-2.0.59.psi25.xml', BIOGRID.PSIMI25, type = 'interaction', verbose = TRUE)
1 Entries found
Parsing entry 1
Parsing experiments: ...............................................
Parsing interactors:
100% ========================================>
Parsing interactions:
100% ========================================>
> g
[1] "psimi25Graph"
attr(,"package")
[1] "RpsiXML"
> nodes(g)
[1] "NA"
> edges(g)
$`NA`
[1] "NA"

***Parsing the XML file without building a graph:
To determine whether the problem lies in the parsing itself, I reran the parsing without converting the result to a graph object:

> g <- parsePsimi25Interaction('../biogrid/psiml25/BIOGRID-SYSTEM-PCA-2.0.59.psi25.xml', BIOGRID.PSIMI25, verbose = TRUE)

Here is the first part of the output:
> g
==================================
interaction entry ( 2009-11-25 ):
==================================
[ organism ]: Arabidopsis thaliana Saccharomyces cerevisiae Schizosaccharomyces pombe
[ taxonomy ID ]: 3702 4932 4896
[ interactors ]: there are 1214 interactors in total, here are the first few ones:
sourceDb sourceId shortLabel uniprotId organismName taxId
<NA> "" "1" "BZR1" NA "Arabidopsis thaliana" "3702"
<NA> "" "2" "GRF6" NA "Arabidopsis thaliana" "3702"
<NA> "" "3" "FUN14" NA "Saccharomyces cerevisiae" "4932"
<NA> "" "4" "UIP4" NA "Saccharomyces cerevisiae" "4932"
<NA> "" "5" "ALO1" NA "Saccharomyces cerevisiae" "4932"
<NA> "" "6" "SPO7" NA "Saccharomyces cerevisiae" "4932"
...
[ interactions ]: there are 2736 interactions in total, here are the first few ones:
[[1]]
interaction ( NA ):
---------------------------------
[ source database ]:
[ source experiment ID ]: 1
[ interaction type ]: protein complementation assay
[ experiment ]: pubmed 17681130
[ participant ]: NA NA
[ bait ]: 1
[ bait UniProt ]: NA
[ prey ]: 2
[ prey UniProt ]: NA

So the interactors and interactions are being parsed correctly, but their identifiers are not being retrieved properly. When I look at the attributes of each interaction, I get mostly NAs:
> attributes(g@interactions[[1]])
$sourceDb
[1] ""

$sourceId
[1] NA

$interactionType
[1] "protein complementation assay"

$expPubMed
[1] "17681130"

$expSourceId
[1] "1"

$confidenceValue
[1] NA

$participant
<NA> <NA>
NA NA

$bait
[1] "1"

$baitUniProt
[1] NA

$prey
[1] "2"

$preyUniProt
[1] NA

$inhibitor
[1] NA

$neutralComponent
[1] NA

$class
[1] "psimi25Interaction"
attr(,"package")
[1] "RpsiXML"
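The single "NA" node with one self-edge is consistent with every interaction's participant identifiers coming back as NA, as $participant does here: once all 2736 edges share the same endpoint name, they collapse into one node and one edge. The base-R illustration below uses made-up data and assumes the identifiers are coerced to character node names; it is a plausible explanation of the symptom, not RpsiXML's actual graph-building code.

```r
# Illustration only: hypothetical data, not RpsiXML internals.
# Assumption: each interaction becomes one edge between its two participant IDs,
# and every participant ID is NA (as in $participant above).
nInteractions <- 2736
from <- rep(NA_character_, nInteractions)
to   <- rep(NA_character_, nInteractions)

# paste() turns NA into the string "NA", so every endpoint gets the same name
nodeNames <- unique(c(paste(from), paste(to)))
edgeKeys  <- unique(paste(from, to, sep = "~"))

length(nodeNames)  # 1 node, named "NA"
length(edgeKeys)   # 1 edge, "NA~NA" (a self-loop)
```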



***Conclusion:
Is there an easy workaround for this, for example a way to look up the identifiers manually?
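One possible manual lookup, sketched in base R under stated assumptions: the bait and prey fields still hold internal interactor IDs ("1", "2"), and these appear to match the sourceId column of the interactor table printed above, so the IDs can be translated to gene labels with match(). The data frames below are copied from the printed output, and lookupLabel() is a hypothetical helper, not part of RpsiXML.

```r
# Hypothetical workaround sketch; the data layout and helper name are assumptions.
# Interactor table as printed above: sourceId -> shortLabel
interactors <- data.frame(
  sourceId   = c("1", "2", "3", "4", "5", "6"),
  shortLabel = c("BZR1", "GRF6", "FUN14", "UIP4", "ALO1", "SPO7"),
  stringsAsFactors = FALSE)

# Bait/prey IDs of the first interaction shown above
interactions <- data.frame(bait = "1", prey = "2", stringsAsFactors = FALSE)

# Hypothetical helper: translate internal IDs to gene labels via match()
lookupLabel <- function(ids, table) {
  table$shortLabel[match(ids, table$sourceId)]
}

edgeList <- data.frame(
  bait = lookupLabel(interactions$bait, interactors),
  prey = lookupLabel(interactions$prey, interactors),
  stringsAsFactors = FALSE)
edgeList
```

With the real parse result, the ID vectors could presumably be filled with something like sapply(g@interactions, function(x) x@bait), assuming each element exposes the bait and prey slots listed by attributes() above; the interactor table would have to be assembled in a similar way.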

Thanks,
sara


***SessionInfo:

> sessionInfo()
R version 2.8.1 (2008-12-22)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] grid splines tools stats graphics grDevices utils
[8] datasets methods base

other attached packages:
[1] gtools_2.5.0-1 multicore_0.1-3 ppiStats_1.8.0
[4] RColorBrewer_1.0-2 lattice_0.17-17 ScISI_1.14.0
[7] apComplex_2.8.0 ppiData_0.1.13 Rgraphviz_1.20.4
[10] org.Sc.sgd.db_2.2.6 GOstats_2.8.0 Category_2.8.4
[13] genefilter_1.22.0 survival_2.34-1 GO.db_2.2.5
[16] RSQLite_0.7-1 DBI_0.2-4 RpsiXML_1.0.0
[19] RBGL_1.20.0 hypergraph_1.14.0 graph_1.20.0
[22] XML_2.3-0 annotate_1.20.1 xtable_1.5-6
[25] AnnotationDbi_1.4.3 Biobase_2.2.2

loaded via a namespace (and not attached):
[1] cluster_1.11.11 GSEABase_1.4.0

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
