Needing a better solution to a lookup problem.
Mikhail Titov
mlt at gmx.us
Wed Mar 14 20:48:29 CET 2012
>
> I have a solution (actually a few) to this problem, but none are
> computationally efficient enough to be useful. I'm hoping someone can
> enlighten me to a better solution.
> ...
> I have a solution that works reasonably well on small sets, but my current
> data set is ~100K snp entries, and my regions table has ~200K entries. I
> have ~1500 files to go through.
>
> I haven't found a good way to efficiently solve this problem. I've tried
> various versions of mapply/lapply, for loops, etc., which get the answer for
> small sets but take hours (per file) on my real data. Bioconductor seemed
> like the obvious place to look, but my GoogleFu must not be that great. I
> never found anything relevant.
>
> Any ideas or pointers in the right direction would be greatly appreciated.
Consider using a database. For instance, PostgreSQL can easily handle large
amounts of data and can restrict a data set to only the records that fall
within given ranges (here, SNP positions that lie inside regions). While it
requires some database and SQL knowledge, it will pay off. You can then query
your data directly from R using RODBC or a similar package. Solve this problem
in the database and use R for further analysis.
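A minimal sketch of that approach, using Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs without a server. The table and column names, the toy data, and the snps_in_regions helper are all hypothetical; it assumes each SNP has a chromosome and an integer position, and each region has a chromosome plus inclusive start/end coordinates. The point is that the "which regions contain this SNP" test becomes a single indexed join instead of R loops over every SNP/region pair.

```python
import sqlite3

# Hypothetical toy data: SNPs as (id, chrom, pos), regions as
# (id, chrom, start_pos, end_pos) with inclusive coordinates.
snps = [("rs1", "chr1", 150), ("rs2", "chr1", 999), ("rs3", "chr2", 42)]
regions = [
    ("r1", "chr1", 100, 200),
    ("r2", "chr1", 140, 160),
    ("r3", "chr2", 500, 600),
]


def snps_in_regions(snps, regions):
    """Return (snp_id, region_id) pairs where the SNP lies inside the region."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE snp (id TEXT, chrom TEXT, pos INTEGER)")
    con.execute(
        "CREATE TABLE region (id TEXT, chrom TEXT, start_pos INTEGER, end_pos INTEGER)"
    )
    con.executemany("INSERT INTO snp VALUES (?, ?, ?)", snps)
    con.executemany("INSERT INTO region VALUES (?, ?, ?, ?)", regions)
    # An index on snp(chrom, pos) lets the query planner answer each region
    # with an index range scan over the SNPs it contains, rather than
    # comparing every SNP against every region.
    con.execute("CREATE INDEX snp_chrom_pos ON snp (chrom, pos)")
    rows = con.execute(
        """
        SELECT s.id, r.id
        FROM region AS r
        JOIN snp AS s
          ON s.chrom = r.chrom
         AND s.pos BETWEEN r.start_pos AND r.end_pos
        ORDER BY s.id, r.id
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    # rs1 falls inside both r1 and r2; rs2 and rs3 fall inside no region.
    print(snps_in_regions(snps, regions))
```

The same SELECT should work essentially unchanged in PostgreSQL; its range types with a GiST index are another option for interval containment. From R, the query could be sent over an existing connection with RODBC's sqlQuery(), returning the matches as a data frame for further analysis.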
Mikhail
