[BioC] RpsiXML issues with latest Biogrid release files

Sara JC Gosline sara.gosline at mail.mcgill.ca
Mon Dec 7 16:03:26 CET 2009


Hello again,

I recently installed RpsiXML and used it to successfully parse the 
latest XML files from IntAct. However, when I try the same functions 
with the latest release of BioGRID (to obtain assay-specific 
interactions instead of experiment-specific ones), I get a graph with a 
single node "NA" and 1 interaction. SessionInfo is at the end of the email.

***Parsing xml files to graph:
I used the ‘PCA’ file since it is relatively short:
> g <- psimi25XML2Graph('../biogrid/psiml25/BIOGRID-SYSTEM-PCA-2.0.59.psi25.xml', BIOGRID.PSIMI25, type='interaction', verbose=T)
1 Entries found
Parsing entry 1
Parsing experiments: ...............................................
Parsing interactors:
100% ========================================>
Parsing interactions:
100% ========================================>
>  g
[1] "psimi25Graph"
attr(,"package")
[1] "RpsiXML"
>  nodes(g)
[1] "NA"
>  edges(g)
$`NA`
[1] "NA"

***Parsing xml file without graph:
To check whether the problem lies in the parsing itself, I repeated the 
parsing without converting the result to a graph object:
> g <- parsePsimi25Interaction('../biogrid/psiml25/BIOGRID-SYSTEM-PCA-2.0.59.psi25.xml', BIOGRID.PSIMI25, verbose=T)

Here is the first bit of output:
>  g
==================================
interaction entry ( 2009-11-25 ):
==================================
[ organism ]: Arabidopsis thaliana Saccharomyces cerevisiae 
Schizosaccharomyces pombe
[ taxonomy ID ]: 3702 4932 4896
[ interactors ]: there are 1214 interactors in total, here are the first 
few ones:
     sourceDb sourceId shortLabel uniprotId organismName               taxId
<NA> ""       "1"      "BZR1"     NA        "Arabidopsis thaliana"     "3702"
<NA> ""       "2"      "GRF6"     NA        "Arabidopsis thaliana"     "3702"
<NA> ""       "3"      "FUN14"    NA        "Saccharomyces cerevisiae" "4932"
<NA> ""       "4"      "UIP4"     NA        "Saccharomyces cerevisiae" "4932"
<NA> ""       "5"      "ALO1"     NA        "Saccharomyces cerevisiae" "4932"
<NA> ""       "6"      "SPO7"     NA        "Saccharomyces cerevisiae" "4932"
...
[ interactions ]: there are 2736 interactions in total, here are the 
first few ones:
[[1]]
interaction ( NA ):
---------------------------------
[ source database ]:
[ source experiment ID ]: 1
[ interaction type ]: protein complementation assay
[ experiment ]: pubmed 17681130
[ participant ]: NA NA
[ bait ]: 1
[ bait UniProt ]: NA
[ prey ]: 2
[ prey UniProt ]: NA

So the interactors and interactions appear to be parsed, but their 
identifiers are not being retrieved properly: sourceDb is empty and the 
UniProt IDs are NA. When I look at the attributes of the first 
interaction, I get mostly NAs:
> attributes(g@interactions[[1]])
$sourceDb
[1] ""

$sourceId
[1] NA

$interactionType
[1] "protein complementation assay"

$expPubMed
[1] "17681130"

$expSourceId
[1] "1"

$confidenceValue
[1] NA

$participant
<NA> <NA>
NA NA

$bait
[1] "1"

$baitUniProt
[1] NA

$prey
[1] "2"

$preyUniProt
[1] NA

$inhibitor
[1] NA

$neutralComponent
[1] NA

$class
[1] "psimi25Interaction"
attr(,"package")
[1] "RpsiXML"



***Conclusion:
Is there an easy workaround for this, perhaps one where I can look up 
the identifiers manually?
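
To make the idea concrete, here is a rough sketch of the kind of manual 
lookup I have in mind. It runs on mock data shaped like the output above 
(the first three interactors, plus two bait/prey pairs, the second of 
which is made up for illustration). It assumes that the bait and prey 
values are the internal interactor ids found in the sourceId column, 
which is how the output reads to me but which I have not confirmed.

```r
# Mock interactor table, copied from the first rows of the printed output.
# In a real session this would presumably come from the parsed entry
# (assumption: a slot such as g@interactors holds this matrix).
interactors <- matrix(
  c("", "1", "BZR1",  NA, "Arabidopsis thaliana",     "3702",
    "", "2", "GRF6",  NA, "Arabidopsis thaliana",     "3702",
    "", "3", "FUN14", NA, "Saccharomyces cerevisiae", "4932"),
  ncol = 6, byrow = TRUE,
  dimnames = list(NULL, c("sourceDb", "sourceId", "shortLabel",
                          "uniprotId", "organismName", "taxId"))
)

# Mock bait/prey pairs. The first matches interaction 1 above; the second
# is hypothetical and only there to show more than one row.
pairs <- list(c(bait = "1", prey = "2"),
              c(bait = "3", prey = "1"))

# Lookup table: internal interactor id -> short label.
id2label <- setNames(interactors[, "shortLabel"], interactors[, "sourceId"])

# Translate each bait/prey pair into a from/to row of short labels.
edge_matrix <- t(sapply(pairs, function(p) {
  c(from = unname(id2label[p[["bait"]]]),
    to   = unname(id2label[p[["prey"]]]))
}))

print(edge_matrix)
```

With the real object, the pairs could presumably be collected with 
something like lapply(g@interactions, function(x) c(bait = x@bait, prey = x@prey)), 
going by the slot names shown in the attributes() output, and the 
resulting two-column matrix passed to ftM2graphNEL() from the graph 
package to build a graph labelled by gene name. I have not tested this 
against the BioGRID file.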

Thanks,
sara


***SessionInfo:

>  sessionInfo()
R version 2.8.1 (2008-12-22)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] grid splines tools stats graphics grDevices utils
[8] datasets methods base

other attached packages:
[1] gtools_2.5.0-1 multicore_0.1-3 ppiStats_1.8.0
[4] RColorBrewer_1.0-2 lattice_0.17-17 ScISI_1.14.0
[7] apComplex_2.8.0 ppiData_0.1.13 Rgraphviz_1.20.4
[10] org.Sc.sgd.db_2.2.6 GOstats_2.8.0 Category_2.8.4
[13] genefilter_1.22.0 survival_2.34-1 GO.db_2.2.5
[16] RSQLite_0.7-1 DBI_0.2-4 RpsiXML_1.0.0
[19] RBGL_1.20.0 hypergraph_1.14.0 graph_1.20.0
[22] XML_2.3-0 annotate_1.20.1 xtable_1.5-6
[25] AnnotationDbi_1.4.3 Biobase_2.2.2

loaded via a namespace (and not attached):
[1] cluster_1.11.11 GSEABase_1.4.0


