[BioC] pd.mapping250k.sty package: featureSet:fragment_length
James W. MacDonald
jmacdon at med.umich.edu
Fri Sep 17 18:58:42 CEST 2010
On 9/17/2010 9:53 AM, Zhu, Julie wrote:
> Could someone please tell me whether the fragment_length in the featureSet
> of pd.mapping250k.sty is the fragment_length of the sample? Is there
> documentation available for looking up the meaning of each field?
The fragment_length is the length of the restriction fragment, not a
property of your sample. You could have worked this out by comparing the
fragment lengths to the data on the NetAffx site. Unfortunately, it looks
like the current version of the pd.mapping250k.sty package is out of date
compared to what NetAffx has, as the fragment length data for the two
probesets in your example don't agree.
This is not true of the pd.genomewidesnp.6 package, which is what I have
installed. So for instance,
> dbGetQuery(con, "select fragment_length, fragment_length2, man_fsetid
from featureSet limit 10;")
   fragment_length fragment_length2    man_fsetid
1              395              217 SNP_A-2131660
2               NA              702 SNP_A-1967418
3              633              883 SNP_A-1969580
4              831              399 SNP_A-4263484
5              970              611 SNP_A-1978185
6             1508              711 SNP_A-4264431
7               NA              921 SNP_A-1980898
8               NA              243 SNP_A-1983139
9               NA              194 SNP_A-4265735
10             420              858 SNP_A-1995832
the fragment_length and fragment_length2 data here do agree (well, at
least the two I checked agree ;-P) with netaffx.
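If you want to try this kind of lookup without installing the annotation package, here is a minimal sketch. It assumes DBI and RSQLite are installed, and it builds an in-memory SQLite table from the ten rows printed above as a stand-in for the real featureSet table. With the package loaded you would get the connection from db(pd.genomewidesnp.6) instead of dbConnect().

```r
library(DBI)
library(RSQLite)

# Stand-in for the featureSet table: the ten rows shown in the output above.
feature_set <- data.frame(
  fragment_length  = c(395, NA, 633, 831, 970, 1508, NA, NA, NA, 420),
  fragment_length2 = c(217, 702, 883, 399, 611, 711, 921, 243, 194, 858),
  man_fsetid = c("SNP_A-2131660", "SNP_A-1967418", "SNP_A-1969580",
                 "SNP_A-4263484", "SNP_A-1978185", "SNP_A-4264431",
                 "SNP_A-1980898", "SNP_A-1983139", "SNP_A-4265735",
                 "SNP_A-1995832"),
  stringsAsFactors = FALSE
)

# Assumption: an in-memory database replaces the package's own SQLite file.
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "featureSet", feature_set)

# Pull both fragment lengths for a single probeset, ready to compare by hand
# against the values listed for that SNP on NetAffx.
one <- dbGetQuery(con, "select man_fsetid, fragment_length, fragment_length2
                        from featureSet
                        where man_fsetid = 'SNP_A-2131660'")
print(one)

dbDisconnect(con)
```

Querying by man_fsetid rather than using "limit 10" makes it easy to check specific probesets one at a time.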
As for the other field names, most seem clear to me. Is there one in
particular that is not clear?
> Some rows have NAs for most of the fields even though the allele information is
> known. Is this expected?
It is expected, depending on when the package was built. We are simply
taking data from Affymetrix and re-packaging into an object that is
easier to use, so we are dependent on the data we get from Affy. Since
annotation of genetic data is a moving target, things are always changing.
We only build these packages on a semi-annual basis, so we end up out of
date quite quickly. This is a tradeoff between having the most
up-to-date data, and having stable data packages that people can rely on.
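To see how much of a given build is missing, you can count the NAs per column. The following is only a sketch: it assumes DBI and RSQLite are installed, and it uses an in-memory table holding just the two rows from the original question as a stand-in for the real featureSet table (the gene_assoc and cnv columns are left out because their values were not shown). Against the real package you would take the connection from db(pd.mapping250k.sty).

```r
library(DBI)
library(RSQLite)

# Stand-in table: the two rows printed in the original question.
feature_set <- data.frame(
  fsetid          = c(238378L, 238377L),
  man_fsetid      = c("SNP_A-4301986", "SNP_A-2291495"),
  dbsnp_rs_id     = c("rs6989223", "rs11644392"),
  chrom           = c("8", NA),
  physical_pos    = c(5214036L, NA),
  strand          = c("-", NA),
  cytoband        = c("p23.2", NA),
  allele_a        = c("A", "A"),
  allele_b        = c("G", "G"),
  fragment_length = c(1667L, NA),
  dbsnp           = c(0L, NA),
  stringsAsFactors = FALSE
)

# Assumption: an in-memory database replaces the package's own SQLite file.
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "featureSet", feature_set)

# Count missing values in every column after pulling the table into R.
all_rows  <- dbGetQuery(con, "select * from featureSet")
na_counts <- colSums(is.na(all_rows))
print(na_counts)

# The same count for one column, done in SQL (R's NA is stored as NULL).
n_missing <- dbGetQuery(con, "select count(*) as n from featureSet
                              where fragment_length is null")$n
print(n_missing)

dbDisconnect(con)
```

On the full table, the per-column counts give a quick picture of which annotation fields were incomplete when the package was built.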
We do provide the functionality to build your own, so if you desire the
most up-to-date package, you can build a personal package using the
> Thanks so much for your help!
> con = db(pd.mapping250k.sty)
> dbListFields(con, "featureSet")
>  "fsetid" "man_fsetid" "dbsnp_rs_id" "chrom"
>  "physical_pos" "strand" "cytoband" "allele_a"
>  "allele_b" "gene_assoc" "fragment_length" "dbsnp"
>  "cnv"
> dbGetQuery(con, "select * from featureSet order by fsetid desc limit 2")
> fsetid man_fsetid dbsnp_rs_id chrom physical_pos strand cytoband
> allele_a allele_b
> 1 238378 SNP_A-4301986 rs6989223 8 5214036 - p23.2
> A G
> 2 238377 SNP_A-2291495 rs11644392 <NA> NA <NA> <NA>
> A G
> fragment_length dbsnp
> 1 1667 0
> 2 NA NA
> Best regards,
> R version 2.11.1 (2010-05-31)
>  en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> attached base packages:
>  stats graphics grDevices utils datasets methods base
> other attached packages:
>  pd.mapping250k.sty_1.0.0 RSQLite_0.9-2 DBI_0.2-5
>  oligo_1.12.2 oligoClasses_1.10.0 Biobase_2.8.0
>  affxparser_1.20.0
> loaded via a namespace (and not attached):
>  affyio_1.16.0 Biostrings_2.16.9 IRanges_1.6.11
>  splines_2.11.1 tools_2.11.1
James W. MacDonald, M.S.
University of Michigan
Department of Human Genetics
1241 E. Catherine St.
Ann Arbor MI 48109-5618