[BioC] Creating a new instance of oligoSnpSet

Wed Nov 26 22:55:58 CET 2008

Hi Martin,

Thanks for your feedback.

I've put the requested info inline below.
I'll work on a reproducible example with
manageable data.

Best

Steve McKinney

> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Sent: Wednesday, November 26, 2008 1:23 PM
> To: Steven McKinney
> Cc: Bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Creating a new instance of oligoSnpSet
> 
> Hi Steven --
> 
> Steven McKinney wrote:
> > Hi all,
> >
> > Thanks to Robert Scharpf for a quick and detailed
> > off-line response.  For anyone else that may encounter
> > this issue:  my problem was that my featureData object's
> > 'data' slot data frame did not have names "chromosome"
> > and "position" .
> >
> > I originally defined my featureData object as
> >
> >> cclfd <-
> > +   new("AnnotatedDataFrame",
> > +       data = data.frame(position = pData(featureData(ccld)[, "MapInfo"]),
> > +         chromosome = pData(featureData(ccld)[, "CHR"]),
> > +         stringsAsFactors = FALSE),
> > +       varMetadata = data.frame(labelDescription = c("position", "chromosome")))
> >
> > extracting directly from my ccld object (a SnpSetIllumina object
> > from beadarraySNP command read.SnpSetIllumina()
> >  ccld <- read.SnpSetIllumina(samplesheet = "ccl_CNV370SampleSheet_8samples.csv",
> >                              reportfile = "ccl_FinalReport_2.txt")
> > )
> >
> >
> > This yielded an AnnotatedDataFrame object with slot 'data'
> > containing a data frame whose names were not those I had
> > put in the data.frame() code above (namely "position"
> > and "chromosome").
> >
> >> str(cclfd)
> > Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
> >   ..@ varMetadata      :'data.frame':	2 obs. of  1 variable:
> >   .. ..$ labelDescription: chr [1:2] "position" "chromosome"
> >   ..@ data             :'data.frame':	373397 obs. of  2 variables:
> >   .. ..$ MapInfo: num [1:373397] 1.64e+08 1.66e+08 1.66e+08 1.66e+08 1.67e+08 ...
> >   .. ..$ CHR    : Factor w/ 25 levels "1","10","11",..: 18 18 18 18 18 18 18 18 18 18 ...
> >   .. .. ..- attr(*, "names")= chr [1:373397] "cnvi0000001" "cnvi0000002" "cnvi0000003" "cnvi0000004" ...
> >   ..@ dimLabels        : chr [1:2] "rowNames" "columnNames"
> >   ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
> >   .. .. ..@ .Data:List of 1
> >   .. .. .. ..$ : int [1:3] 1 1 0
> >
> > So that's my R lesson for today - names specified in a
> > data.frame() call don't necessarily stick!
> 
> Hmm, I'm not sure that's the right lesson -- you don't have to be that
> suspicious of data.frame.

 > foo <- data.frame(fooCol1 = 1:3)
 > bar <- data.frame(barCol1 = 11:13)
 > baz <- data.frame(a = foo, b = bar)
 > baz
   fooCol1 barCol1
 1       1      11
 2       2      12
 3       3      13
 >

I thought the column names 'a' and 'b' specified in the
baz data frame construction would be used, but
this is not the case.  Not suspicious, just surprised.
This behaviour is neither indicated nor contra-indicated
in the data.frame() documentation, so it's just one
of those lessons learned by trial.
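
A minimal base-R sketch of what seems to be going on, reusing foo and
bar from above (the baz2 variant is an illustration added here, not
something taken from the vignette).  When an argument to data.frame()
is itself a one-column data frame, the inner column name is kept and
the tag is dropped; when the argument is a plain vector, the tag is
used as the column name.

```r
foo <- data.frame(fooCol1 = 1:3)
bar <- data.frame(barCol1 = 11:13)

# One-column data frames as arguments: the tags 'a' and 'b' are dropped
# and the inner column names survive.
baz <- data.frame(a = foo, b = bar)
names(baz)

# Plain vectors as arguments: the tags 'a' and 'b' become the column names.
baz2 <- data.frame(a = foo[["fooCol1"]], b = bar[["barCol1"]])
names(baz2)
```

So pulling each column out as a vector before calling data.frame()
should be enough to make the requested names stick.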

> 
> It might be AnnotatedDataFrame or oligoSnpSet, though. I wonder what
> your sessionInfo() is? Also what does str(featureData(ccld)) say? An
> unusual thing is the 'names' attribute of cclfd. Any chance of creating
> a reproducible example (i.e., without access to your files, maybe by
> referencing help pages [using the 'example()' function] or making a
> version with just a few features and using dput)?

> str(featureData(ccld))
Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
  ..@ varMetadata      :'data.frame':	4 obs. of  1 variable:
  .. ..$ labelDescription: chr [1:4] "CHR" "MapInfo" "GTS" "OPA"
  ..@ data             :'data.frame':	373397 obs. of  4 variables:
  .. ..$ CHR    : Factor w/ 25 levels "1","10","11",..: 18 18 18 18 18 18 18 18 18 18 ...
  .. .. ..- attr(*, "names")= chr [1:373397] "cnvi0000001" "cnvi0000002" "cnvi0000003" "cnvi0000004" ...
  .. ..$ MapInfo: num [1:373397] 1.64e+08 1.66e+08 1.66e+08 1.66e+08 1.67e+08 ...
  .. ..$ GTS    : num [1:373397] 0 0 0 0 0 0 0 0 0 0 ...
  .. ..$ OPA    : Factor w/ 1 level "HumanCNV370-Quadv3_C": 1 1 1 1 1 1 1 1 1 1 ...
  ..@ dimLabels        : chr [1:2] "featureNames" "featureColumns"
  ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
  .. .. ..@ .Data:List of 1
  .. .. .. ..$ : int [1:3] 1 1 0

> sessionInfo()
R version 2.8.0 Patched (2008-11-06 r46845) 
powerpc-apple-darwin9.5.0 

locale:
C

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base     

other attached packages:
[1] RSQLite_0.7-1      DBI_0.2-4          VanillaICE_1.4.0   SNPchip_1.6.0     
[5] oligoClasses_1.4.0 Biobase_2.2.1     

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.4.1            SparseM_0.78                  
 [3] affy_1.20.0                    affyio_1.10.1                 
 [5] annotate_1.20.1                beadarraySNP_1.8.0            
 [7] grid_2.8.0                     illuminaHumanv3.db_1.1.2      
 [9] illuminaHumanv3BeadID.db_1.1.2 limma_2.16.3                  
[11] lumi_1.8.3                     lumiHumanAll.db_1.4.0         
[13] lumiHumanIDMapping_1.0.1       mgcv_1.4-1                    
[15] preprocessCore_1.4.0           quantsmooth_1.8.0             
[17] xtable_1.5-4                  
>

> 
> A couple of short-cuts / tips. fData(obj) gives you direct access to
> pData(featureData(obj)). 'extract-then-subset' -- fData(obj)[, "cols"] --
> will usually be more efficient than subset-then-extract; there's also a
> subtle difference that might be causing problems here (as you do it, you
> end up with a 1-column data frame for 'chromosome', whereas
> extract-then-subset results in a vector). '[[' pulls out a single column
> with featureData(obj)[["cols"]] (also [[<- can be useful for defining a
> single column and creating a labelDescription; obj[["cols"]] gives
> direct access to pData(obj)[["cols"]]).

I will work these ideas into my script.  Thanks much for the pointers.
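
For the record, the subtle difference Martin describes can be seen
with a small stand-in data frame.  The object fd below is hypothetical
and only plays the role of fData(obj), which returns an ordinary data
frame; the values are made up for illustration.

```r
# Hypothetical stand-in for fData(obj); names and values are illustrative only.
fd <- data.frame(CHR = factor(c("1", "2", "X")),
                 MapInfo = c(1.64e8, 1.66e8, 1.67e8),
                 row.names = c("snp1", "snp2", "snp3"))

one_col_df <- fd["CHR"]     # list-style subsetting keeps a 1-column data frame
chr_vec    <- fd[, "CHR"]   # matrix-style subsetting drops to the column (a factor)
chr_vec2   <- fd[["CHR"]]   # '[[' also returns the column itself

# Built from vectors, the new data frame takes the names given in the call,
# and the chromosome column can be converted to character on the way in.
new_fd <- data.frame(position = fd[["MapInfo"]],
                     chromosome = as.character(chr_vec),
                     row.names = rownames(fd),
                     stringsAsFactors = FALSE)
str(new_fd)
```

With the real object, the same idea would presumably apply with
fData(ccld) in place of fd.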

> 
> Martin
> 
> > Explicitly forcing column names and
> > mode "character" for the chromosome column
> > solves the problem
> >
> >  ccld.position <- pData(featureData(ccld)[, "MapInfo"])
> >  names(ccld.position) <- "position"
> >  ccld.chromosome <- pData(featureData(ccld)[, "CHR"])
> >  names(ccld.chromosome) <- "chromosome"
> >  ccld.chromosome$chromosome <- as.character(ccld.chromosome$chromosome)
> >
> >  cclfd <-
> >    new("AnnotatedDataFrame",
> >        data = data.frame(position = ccld.position,
> >          chromosome = ccld.chromosome,
> >          stringsAsFactors = FALSE),
> >        varMetadata = data.frame(labelDescription = c("position", "chromosome")))
> >
> > and I can create the oligoSnpSet object successfully.
> >
> >> cclss <-
> > +   new("oligoSnpSet", copyNumber = logR, calls = gt,
> > +       phenoData = annotatedDataFrameFrom(logR, byrow = FALSE),
> > +       featureData = cclfd, annotation = "HumanCNV370-Quad")
> >> str(cclss)
> > Formal class 'oligoSnpSet' [package "oligoClasses"] with 6 slots
> >
> >
> > So it was the absence of columns named "chromosome" and "position"
> > in the 'data' slot of the featureData object that caused internal
> > code to attempt to acquire chromosome positional information from
> > an annotation source.
> >
> > With the featureData@data data frame having the correct column
> > labels "chromosome" and "position", the annotation argument
> > is not processed further (it is just added to the SnpSet
> > object's 'annotation' slot).
> >
> > Thanks again to Robert Scharpf.
> >
> > Best
> >
> > Steve McKinney
> >
> >
> >
> >
> > -----Original Message-----
> > From: bioconductor-bounces at stat.math.ethz.ch on behalf of Steven McKinney
> > Sent: Tue 11/25/2008 9:56 PM
> > To: Bioconductor at stat.math.ethz.ch
> > Subject: [BioC] Creating a new instance of oligoSnpSet
> >
> > Hello All,
> >
> > I am trying to get some Illumina HumanCNV370-Quad
> > data into VanillaICE to do some copy number analysis.
> >
> > In attempting to create an object of class "oligoSnpSet"
> > I cannot seem to specify an annotation that works.
> >
> > e.g. as specified in a vignette
> >
> >> cclss <-
> > +   new("oligoSnpSet", copyNumber = logR, calls = gt,
> > +       phenoData = annotatedDataFrameFrom(logR, byrow = FALSE),
> > +       featureData = cclfd, annotation = "Illumina550k")
> > Loading required package: Illumina550k
> > Error in db(object) : Illumina550k package not available
> > In addition: Warning message:
> > In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
> >   there is no package called 'Illumina550k'
> > Error in dbGetQuery(db(object), sql) :
> >   error in evaluating the argument 'conn' in selecting a method for function 'dbGetQuery'
> >
> > or even if I specify some annotation that does exist
> >
> >> cclss <-
> > +   new("oligoSnpSet", copyNumber = logR, calls = gt,
> > +       phenoData = annotatedDataFrameFrom(logR, byrow = FALSE),
> > +       featureData = cclfd, annotation = "hgu133plus2cdf")
> > Loading required package: hgu133plus2cdf
> > Error in db(object) :
> >   trying to get slot "getdb" from an object of a basic class ("environment") with no slots
> > Error in dbGetQuery(db(object), sql) :
> >   error in evaluating the argument 'conn' in selecting a method for function 'dbGetQuery'
> >
> >
> > Is there a way to work around this annotation bit of building
> > an eSet object?
> >
> > I can't figure out from the documentation, from reading source code, or
> > from experimenting what will work for this annotation argument.
> >
> > I'm a bit hooped as there does not yet appear to be annotation
> > for the Illumina HumanCNV370-Quad, but I have annotation
> > information from other files from Illumina etc.
> >
> > Can I put some dummy object as an argument for annotation
> > and patch it up with my known info?
> >
> > Any ideas?
> >
> >
> > Steven McKinney
> >
> > Statistician
> > Molecular Oncology and Breast Cancer Program
> > British Columbia Cancer Research Centre
> >
> > email: smckinney +at+ bccrc +dot+ ca
> >
> > tel: 604-675-8000 x7561
> >
> > BCCRC
> > Molecular Oncology
> > 675 West 10th Ave, Floor 4
> > Vancouver B.C.
> > V5Z 1L3
> > Canada
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> 
> 
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
> 
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793