[R] Needing a better solution to a lookup problem.

Mikhail Titov mlt at gmx.us
Wed Mar 14 20:48:29 CET 2012


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Davis, Brian
> Sent: Wednesday, March 14, 2012 2:28 PM
> To: r-help at R-project.org
> Subject: [R] Needing a better solution to a lookup problem.
> 
> I have a solution (actually a few) to this problem, but none are
> computationally efficient enough to be useful.  I'm hoping someone can
> enlighten me to a better solution.
> ...
> I have a solution that works reasonably well on small sets, but my current
> data set is ~100K snp entries, and my regions table has ~200K entries. I
> have ~1500 files to go through
> 
> I haven't found a good way to efficiently solve this problem.  I've tried
> various versions of mapply/lapply, for loops, etc., which get the answer
> for small sets but take hours (per file) on my real data.  Bioconductor
> seemed like the obvious place to look, but my GoogleFu must not be that
> great.  I never found anything relevant.
> 
> Any ideas or pointers in the right direction would be greatly appreciated.

Consider using a database. PostgreSQL, for instance, can easily handle large
amounts of data and can restrict the data set to just the rows you need.
It requires some database and SQL knowledge, but it will pay off. You can
then query your data straight from the database in R using RODBC or a
similar package. Solve the lookup in the database and use R for the further
analysis.
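
As a rough sketch of what doing the lookup in the database could look like:
the example below uses Python's built-in sqlite3 module as a stand-in for
PostgreSQL, only so that it runs without a server. The table layout (snp
with chrom/pos, region with chrom/start_pos/end_pos), the column names, and
the function name snps_in_regions are all assumptions for illustration, since
the original post does not show the data. The idea is simply a range join
backed by an index on the regions table.

```python
import sqlite3


def snps_in_regions(snps, regions):
    """Return (snp_id, region_id) pairs where the SNP position lies in the region.

    snps:    iterable of (snp_id, chrom, pos)                  -- assumed layout
    regions: iterable of (region_id, chrom, start_pos, end_pos) -- assumed layout
    """
    conn = sqlite3.connect(":memory:")  # stand-in for a real PostgreSQL database
    try:
        conn.executescript(
            """
            CREATE TABLE snp (snp_id TEXT, chrom TEXT, pos INTEGER);
            CREATE TABLE region (
                region_id TEXT, chrom TEXT, start_pos INTEGER, end_pos INTEGER
            );
            -- The index lets the database narrow candidates by chromosome and
            -- start position instead of comparing every SNP to every region.
            CREATE INDEX region_chrom_start ON region (chrom, start_pos, end_pos);
            """
        )
        conn.executemany("INSERT INTO snp VALUES (?, ?, ?)", snps)
        conn.executemany("INSERT INTO region VALUES (?, ?, ?, ?)", regions)
        # Range join: same chromosome, position between region start and end.
        rows = conn.execute(
            """
            SELECT s.snp_id, r.region_id
            FROM snp AS s
            JOIN region AS r
              ON r.chrom = s.chrom
             AND s.pos BETWEEN r.start_pos AND r.end_pos
            ORDER BY s.snp_id, r.region_id
            """
        ).fetchall()
        return rows
    finally:
        conn.close()


if __name__ == "__main__":
    # Made-up sample data, not taken from the original post.
    sample_snps = [("rs1", "chr1", 150), ("rs2", "chr1", 500), ("rs3", "chr2", 150)]
    sample_regions = [
        ("geneA", "chr1", 100, 200),
        ("geneB", "chr2", 100, 300),
        ("geneC", "chr1", 140, 160),
    ]
    for snp_id, region_id in snps_in_regions(sample_snps, sample_regions):
        print(snp_id, region_id)
```

The same SELECT should carry over to PostgreSQL more or less unchanged; from
R it could then be sent through RODBC, with only the matched rows coming back
for further analysis.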

Mikhail


