[BioC] [devteam-bioc] loading/accessing older GO.db and org.Hs.eg.db data

Marc Carlson mcarlson at fhcrc.org
Sat May 31 00:09:34 CEST 2014


Hi Jonathan,

There are plenty of very good reasons why mixing different versions of 
packages is a bad idea, and why it is preferable to use an older 
version of R/Bioconductor when repeating an older analysis.  For one 
thing, such combinations of old and new packages are untested and may 
produce unpredictable results.  For another, if you are really 
interested in reproducing older results, you should do so by keeping 
all of those variables the same as they were the first time.

But if you have read all about the risks and are still hell-bent on 
doing it anyway, the most direct approach would be to swap the database 
files between the source tarballs.

You can find older versions of packages at the links here (look for the 
box labeled 'Previous Versions' in the lower right-hand side of the 
screen):

http://www.bioconductor.org/install/

And since the database schemas have not changed all *that* much, you 
might be in luck.  That is, for any pair of packages it is possible 
that you could take the .sqlite file from an older tarball and drop it 
into the inst/extdata directory of a newer source tarball.  This kind 
of hack could work, assuming that the schema has not changed too much.  
But the further back you go, the more problems you are likely to have, 
because of additions that were made to the metadata table etc.  I 
actually tried this for the bioc 2.10 release (putting it into the 
2.14 release) and this kind of 'brain transplant' seemed to mostly work 
OK (except that the GO queries were messed up - more on that below).
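
Before attempting a swap like this, it may help to see how far apart the 
two schemas actually are.  The following is only a rough sketch, not 
part of any Bioconductor tooling: it uses Python's standard sqlite3 
module, and the function names and file paths are hypothetical.  It 
reads the sqlite_master table of each file and reports which tables, 
views and indexes exist in only one of them or are defined differently.

```python
import sqlite3


def schema_objects(db_path):
    """Return {(type, name): sql} for the tables, views and indexes in a file.

    Note: sqlite3.connect() creates an empty file if db_path does not exist,
    so check the path first when pointing this at real data.
    """
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT type, name, sql FROM sqlite_master "
            "WHERE name NOT LIKE 'sqlite_%'"  # skip SQLite's internal objects
        ).fetchall()
    finally:
        con.close()
    return {(obj_type, name): sql for obj_type, name, sql in rows}


def schema_diff(old_path, new_path):
    """Compare two SQLite files by the objects recorded in sqlite_master.

    The comparison is on the stored CREATE text, so a definition that differs
    only in whitespace is still reported as changed.
    """
    old = schema_objects(old_path)
    new = schema_objects(new_path)
    shared = set(old) & set(new)
    return {
        "only_in_old": sorted(set(old) - set(new)),
        "only_in_new": sorted(set(new) - set(old)),
        "changed": sorted(key for key in shared if old[key] != new[key]),
    }
```

For example, schema_diff("old/GO.sqlite", "new/GO.sqlite") (both paths 
are placeholders) returns a dict whose "only_in_new" entry lists the 
objects the older file lacks.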

But this is still not recommended.  Not only will you have missing 
data etc., but there is also data in the GO.db package and the 
org.Hs.eg.db package that needs to line up (GOIDs).  Without the 
assurance that this data will match up, some functions that you want to 
use may simply not work properly.  Also, if the schema has changed, you 
may find that the swap I described above requires you to make 
modifications to the table structure of the older DB.  For example, in 
the case I tested above, the GO terms would not work with the 
extractors.  Why?  Because the newer DB adds several views to the data 
in order to get a performance boost.  If I really wanted this old data 
to work with my new package software, I would also have to update its 
DB to contain those newer views.  You should be able to do that by just 
looking at the newer DB and calling .schema (in the sqlite3 shell) on 
the relevant views.
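
As a rough illustration of that last step, here is a sketch that copies 
view definitions from the newer file into the older one.  It is an 
assumption-laden example rather than a tested recipe for these 
packages: it uses Python's standard sqlite3 module instead of the 
sqlite3 shell, the function name and paths are hypothetical, and it 
reads the same CREATE VIEW text from sqlite_master that .schema would 
print.  Run it on a copy of the older .sqlite file, since it modifies 
that file in place.

```python
import sqlite3


def copy_missing_views(new_db, old_db):
    """Create in old_db every view that new_db defines and old_db lacks.

    Returns (created, broken). 'broken' lists (name, error) pairs for views
    that were created but cannot be queried, e.g. because they refer to a
    table or column that the older schema does not have.
    """
    src = sqlite3.connect(new_db)
    try:
        views = src.execute(
            "SELECT name, sql FROM sqlite_master WHERE type = 'view'"
        ).fetchall()
    finally:
        src.close()

    dst = sqlite3.connect(old_db)
    created, broken = [], []
    try:
        existing = {
            row[0]
            for row in dst.execute(
                "SELECT name FROM sqlite_master WHERE type = 'view'"
            )
        }
        # First pass: create every missing view. SQLite does not check the
        # tables a view refers to until the view is actually queried.
        for name, sql in views:
            if name not in existing:
                dst.execute(sql)
                created.append(name)
        dst.commit()
        # Second pass: query each new view once so that dangling references
        # show up here instead of later inside the annotation package.
        for name in created:
            try:
                dst.execute(f'SELECT * FROM "{name}" LIMIT 0')
            except sqlite3.Error as exc:
                broken.append((name, str(exc)))
    finally:
        dst.close()
    return created, broken
```

Any view reported as broken points at a deeper schema difference 
(a missing table or column) that copying views alone will not fix.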


   Marc






On 05/29/2014 12:20 PM, Maintainer wrote:
> Hello,
>
> I am interested in accessing old versions of GO.db and org.Hs.eg.db data. I would like them to potentially be different versions, so entirely downgrading bioconductor and/or R (the standard response) does not seem to make sense. Also, I recognize that identifiers and mappings may be missing or not match -- That is ok.
>
> After reading through the AnnotationDbi documentation, it seems to me that annotation packages come from SQLite databases. If so, is it possible to create new annotation objects using older SQLite databases? Are there archived SQLite databases of these data?
>
> If not, what is a reasonable method in bioconductor to get older versions of the data?
>
> There are a few posts here that seem to indicate that different versioning is not a good idea. But, any direction would be much appreciated!
>
> Thanks,
> Jonathan Mortensen
>
> PhD Candidate
> Stanford Center for Biomedical Informatics Research
> Stanford, CA
> jonathanmortensen.com
> 513-225-1935
>
>   -- output of sessionInfo():
>
> N/A
>
> --
> Sent via the guest posting facility at bioconductor.org.


