[BioC] pulling functional information for SNPs

Seth Falcon sfalcon at fhcrc.org
Tue May 4 21:58:34 CEST 2010


Hi Jim,

On 5/4/10 11:56 AM, James W. MacDonald wrote:
> Getting a query for 700K things will likely take a long time. Had you
> mentioned that you were using the Affy 6.0 chip, we could have gone in a
> different direction.
>
> biocLite("pd.genomewidesnp.6") ## this will take a while
> library(pd.genomewidesnp.6)
> con <- pd.genomewidesnp.6@getdb()
> ## there might be a better way of setting up the connection...
> ## If so, Benilton will correct me very soon ;-P
>
> ## check things out
>  > dbListTables(con)
> [1] "featureSet" "featureSetCNV" "pmfeature" "pmfeatureCNV"
> [5] "sequence" "sequenceCNV" "sqlite_stat1" "table_info"
>  > dbListFields(con, "featureSet")
> [1] "fsetid" "man_fsetid" "affy_snp_id" "dbsnp_rs_id"
> [5] "chrom" "physical_pos" "strand" "cytoband"
> [9] "allele_a" "allele_b" "gene_assoc" "fragment_length"
> [13] "fragment_length2" "dbsnp" "cnv"
>
> ## try a 70K query - I won't show how I made the snp vector...
>
>  > length(snps)
> [1] 70000
>  > system.time(dbGetQuery(con, paste("select dbsnp_rs_id, chrom,
> physical_pos from featureSet where dbsnp_rs_id in ('", paste(snps,
> collapse = "','"), "');", sep = "")))
>    user  system elapsed
>    4.89    1.09  119.09
>
> So about 2 min for 70K query. Not bad.

I was curious about this and tried the above on my laptop, a MacBook Pro
running at 2.53GHz with 4GB RAM (pretty sure it has a 5400 rpm disk). I
get a result for the above in ~4 sec, and I can retrieve 800K in ~30 sec.

Did you use a particularly slow system (NFS, perhaps)?
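
One rough way to see how much of the time is storage rather than the
query itself is to run the same style of IN (...) query against an
in-memory SQLite database. The sketch below is only an illustration
under stated assumptions: the table is a made-up stand-in for
featureSet with random rows (it is not the pd.genomewidesnp.6
annotation), and it assumes the DBI and RSQLite packages are installed.
If this runs quickly on the slow machine, disk or network I/O is a
plausible suspect.

```r
library(DBI)
library(RSQLite)

## Toy stand-in for the featureSet table: made-up rows, in memory only
con <- dbConnect(SQLite(), ":memory:")
n <- 100000
toy <- data.frame(
    dbsnp_rs_id  = paste0("rs", seq_len(n)),
    chrom        = sample(c(1:22, "X", "Y"), n, replace = TRUE),
    physical_pos = sample.int(1e8, n),
    stringsAsFactors = FALSE
)
dbWriteTable(con, "featureSet", toy)

## Build the query the same way as in the quoted message
snps <- sample(toy$dbsnp_rs_id, 7000)
sql <- paste("select dbsnp_rs_id, chrom, physical_pos from featureSet ",
             "where dbsnp_rs_id in ('",
             paste(snps, collapse = "','"), "');", sep = "")

## Time the query; no disk access is involved here
timing <- system.time(res <- dbGetQuery(con, sql))
print(timing)
print(nrow(res))

dbDisconnect(con)
```

Timings from a toy table like this will not match the real package
database, so treat them only as a baseline for comparison.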

+ seth



-- 
Seth Falcon
Bioconductor Core Team | FHCRC


