[BioC] smlSet and snpMatrix / copy number analysis

Vincent Carey 525-2265 stvjc at channing.harvard.edu
Tue Jul 29 19:30:02 CEST 2008

> Hi all,
> I am currently looking at the GGtools package, specifically the smlSet. I am
> trying to integrate some snp data which I am using for studying CNV and the
> related gene expression information and I believe this may be the class that
> I need to use. However due to the design of the snp.matrix it cannot store
> these log2ratios and associated Copy number. Is there an available
> package/class out there that has been created? I have spent some time
> looking at the bioconductor list and can't seem to find one. Could I in
> theory alter the smlSet class to store my own version of snp.matrix
> instead?

in its current form snp.matrix is tailored to discrete genotype data
represented as raw bytes.  the raw representation gives us space and
speed advantages, but special code must be written in C to use
this representation; all of that code is in the snpMatrix package.
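
as a rough illustration of why a one-byte-per-call layout suits genotypes
but not log2 ratios, here is a minimal sketch in plain Python.  it is not
the snpMatrix implementation: the code table, the helper name
encode_genotypes, and the example values are all assumptions made for
illustration.

```python
from array import array

# Assumed code table for illustration only (not taken from snpMatrix source):
# 0 = missing call, 1 = AA, 2 = AB, 3 = BB
GENOTYPE_CODES = {None: 0, "AA": 1, "AB": 2, "BB": 3}


def encode_genotypes(calls):
    """Pack a sequence of discrete genotype calls into one byte per call."""
    return bytes(GENOTYPE_CODES[call] for call in calls)


# Four made-up calls for one sample; None stands for a missing call.
calls = ["AA", "AB", None, "BB"]
packed = encode_genotypes(calls)

# Continuous log2 ratios have no small fixed code table, so they need a
# floating-point container (C doubles here) rather than raw bytes.
log2_ratios = array("d", [0.02, -0.97, 0.58, 0.0])

bytes_per_genotype = len(packed) // len(calls)
bytes_per_ratio = log2_ratios.itemsize
```

the point of the sketch is only the contrast in storage: a small set of
discrete states fits in a byte, while a real-valued measurement per snp
needs a numeric type and different code to operate on it.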

i am interested in supporting CNV-related information in a similar
integrative structure but at the moment i do not have such data.

i believe smlSet is a reasonable starting point for designing such
an integrative structure, but a few points are in order:

1) smlSet stands for "snp.matrix list"; it caters for an application
with 4 million snps per sample distributed over 24 list elements, one per
chromosome, because i am dealing with the full HapMap phase II snp set.
for 500k-type assays it would not be necessary to decompose into
chromosomes, and some simplifications would follow

2) a nontrivial component of the smlSet infrastructure deals with
managing snp location data, again for 4 million snps factored into
chromosomes.  this is not done in an optimal way and needs to be redesigned.
managing 4 million locations is not pleasant on standard hardware; currently
SQLite is used; data frames and netCDF were examined and found wanting in
various respects for the applications targeted thus far.
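
the two points above can be sketched together: genotype blocks kept in a
list keyed by chromosome, with snp locations held in SQLite and used to
pick columns out of a block.  this is a minimal sketch in Python under
stated assumptions, not the GGtools code; the table name snp_loc, its
columns, the helper genotypes_in_region, and every sample id, rs id and
position are hypothetical.

```python
import sqlite3

# Hypothetical layout: one genotype block per chromosome, keyed by name.
# Each block maps a sample id to one raw byte per SNP (codes are illustrative).
snp_blocks = {
    "chr1": {"NA0001": bytes([1, 2, 3]), "NA0002": bytes([1, 1, 0])},
    "chr2": {"NA0001": bytes([3, 3]), "NA0002": bytes([2, 1])},
}

# Hypothetical location table; 'col' is the SNP's column within its block.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snp_loc ("
    "rsid TEXT PRIMARY KEY, chrom TEXT, col INTEGER, pos INTEGER)"
)
conn.executemany(
    "INSERT INTO snp_loc VALUES (?, ?, ?, ?)",
    [
        ("rs0001", "chr1", 0, 10500),
        ("rs0002", "chr1", 1, 20750),
        ("rs0003", "chr1", 2, 91200),
        ("rs0004", "chr2", 0, 5300),
        ("rs0005", "chr2", 1, 88000),
    ],
)
# An index on (chrom, pos) lets region queries avoid a full table scan.
conn.execute("CREATE INDEX idx_chrom_pos ON snp_loc (chrom, pos)")


def genotypes_in_region(chrom, start, end):
    """Return {rsid: {sample: code}} for SNPs with start <= pos <= end."""
    rows = conn.execute(
        "SELECT rsid, col FROM snp_loc "
        "WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        (chrom, start, end),
    ).fetchall()
    block = snp_blocks[chrom]
    return {
        rsid: {sample: codes[col] for sample, codes in block.items()}
        for rsid, col in rows
    }


result = genotypes_in_region("chr1", 10000, 30000)
```

with a single 500k-type block the dictionary of chromosomes would collapse
to one element, which is the kind of simplification mentioned in point 1.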

bottom line: i'd be happy to hear more about your requirements, possibly
off the list, and we could discuss design steps for the relevant container
structure and methods.  we could introduce more tools in GGtools in short
order.

could you please indicate your affiliation?

> Many thanks in advance,
> Nathan
