[BioC] Creating a new instance of oligoSnpSet

Wed Nov 26 23:34:40 CET 2008

Thanks Steve for persisting... cutting to the chase

Steven McKinney wrote:
> Hi Martin,
> 
> Thanks for your feedback.
> 
> I've put the requested info inline below.
> I'll work on a reproducible example with
> manageable data.
> 

>>> So that's my R lesson for today - names specified in a
>>> data.frame() call don't necessarily stick!
>> Hmm, I'm not sure that's the right lesson -- you don't have to be that
>> suspicious of data.frame.
> 
>  > foo <- data.frame(fooCol1 = 1:3)
>  > bar <- data.frame(barCol1 = 11:13)
>  > baz <- data.frame(a = foo, b = bar)
>  > baz
>    fooCol1 barCol1
>  1       1      11
>  2       2      12
>  3       3      13

Sorry, my mistake, I guess this is the essence of your problem. I 
thought I'd convinced myself that this was not the case, but I got it 
wrong -- when the arguments are one-column data frames, the columns 
really are named after the columns of foo and bar rather than after 
the argument names.
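
A minimal sketch of that naming behaviour, reusing the toy foo and bar
from the quoted example. The extra objects (baz_vec, two, baz_two,
baz_renamed) are made up for illustration and are not part of the
original script.

```r
# Same toy inputs as in the quoted example.
foo <- data.frame(fooCol1 = 1:3)
bar <- data.frame(barCol1 = 11:13)

# One-column data frames as arguments: the inner column names win,
# so 'a' and 'b' are silently dropped.
baz <- data.frame(a = foo, b = bar)
names(baz)      # "fooCol1" "barCol1"

# Plain vectors as arguments: the argument names are used.
baz_vec <- data.frame(a = foo[["fooCol1"]], b = bar[["barCol1"]])
names(baz_vec)  # "a" "b"

# A data frame argument with more than one column: the argument name
# becomes a prefix on each inner column name.
two <- data.frame(x = 1:3, y = 4:6)
baz_two <- data.frame(a = two)
names(baz_two)  # "a.x" "a.y"

# Renaming after construction also works when the inputs have to stay
# data frames.
baz_renamed <- setNames(data.frame(foo, bar), c("a", "b"))
names(baz_renamed)  # "a" "b"
```

Passing vectors (via '[[') rather than one-column data frames is
probably the least surprising way to make the requested names stick.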

Martin

> 
> I thought the column names 'a' and 'b' specified in the
> baz data frame construction would be used, but
> this is not the case.  Not suspicious, just surprised.
> This behaviour is not indicated nor contra-indicated
> in the data.frame() documentation, so it's just one
> of those lessons learned by trial.
> 
>> It might be AnnotatedDataFrame or oligoSnpSet, though. I wonder what
>> your sessionInfo() is? Also what does str(featureData(ccld)) say? An
>> unusual thing is the 'names' attribute of cclfd. Any chance of creating
>> a reproducible example (i.e., without access to your files, maybe by
>> referencing help pages [using the 'example()' function] or making a
>> version with just a few features and using dput)?
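
One way the "few features plus dput" suggestion might look, as a
sketch only. The data frame feature_df and its values are made-up
stand-ins (the real feature data is not available here); only the
column names CHR and MapInfo and the feature-name pattern are taken
from the thread.

```r
# Hypothetical stand-in for the real feature data; values are invented.
feature_df <- data.frame(
  CHR = factor(c("1", "1", "2", "X")),
  MapInfo = c(164000000, 166000000, 167000000, 1500000),
  row.names = c("cnvi0000001", "cnvi0000002", "cnvi0000003", "cnvi0000004")
)

# Keep just a few features and drop factor levels that are no longer used,
# so the dput() output stays short enough to paste into an email.
small <- droplevels(head(feature_df, 3))
dput(small)

# Check that the pasted text really rebuilds the same object.
txt <- paste(capture.output(dput(small)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
all.equal(rebuilt, small)
```

Anyone on the list can then paste the dput() output into their own
session and work with the same object.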
> 
>  > str(featureData(ccld))
> Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
>   ..@ varMetadata      :'data.frame':	4 obs. of  1 variable:
>   .. ..$ labelDescription: chr [1:4] "CHR" "MapInfo" "GTS" "OPA"
>   ..@ data             :'data.frame':	373397 obs. of  4 variables:
>   .. ..$ CHR    : Factor w/ 25 levels "1","10","11",..: 18 18 18 18 18 18 18 18 18 18 ...
>   .. .. ..- attr(*, "names")= chr [1:373397] "cnvi0000001" "cnvi0000002" "cnvi0000003" "cnvi0000004" ...
>   .. ..$ MapInfo: num [1:373397] 1.64e+08 1.66e+08 1.66e+08 1.66e+08 1.67e+08 ...
>   .. ..$ GTS    : num [1:373397] 0 0 0 0 0 0 0 0 0 0 ...
>   .. ..$ OPA    : Factor w/ 1 level "HumanCNV370-Quadv3_C": 1 1 1 1 1 1 1 1 1 1 ...
>   ..@ dimLabels        : chr [1:2] "featureNames" "featureColumns"
>   ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
>   .. .. ..@ .Data:List of 1
>   .. .. .. ..$ : int [1:3] 1 1 0
> 
> 
>  > sessionInfo()
> R version 2.8.0 Patched (2008-11-06 r46845) 
> powerpc-apple-darwin9.5.0 
> 
> locale:
> C
> 
> attached base packages:
> [1] tools     stats     graphics  grDevices utils     datasets  methods
> [8] base     
> 
> other attached packages:
> [1] RSQLite_0.7-1      DBI_0.2-4          VanillaICE_1.4.0   SNPchip_1.6.0     
> [5] oligoClasses_1.4.0 Biobase_2.2.1     
> 
> loaded via a namespace (and not attached):
>  [1] AnnotationDbi_1.4.1            SparseM_0.78                  
>  [3] affy_1.20.0                    affyio_1.10.1                 
>  [5] annotate_1.20.1                beadarraySNP_1.8.0            
>  [7] grid_2.8.0                     illuminaHumanv3.db_1.1.2      
>  [9] illuminaHumanv3BeadID.db_1.1.2 limma_2.16.3                  
> [11] lumi_1.8.3                     lumiHumanAll.db_1.4.0         
> [13] lumiHumanIDMapping_1.0.1       mgcv_1.4-1                    
> [15] preprocessCore_1.4.0           quantsmooth_1.8.0             
> [17] xtable_1.5-4                  
> 
>> A couple of short-cuts / tips. fData(obj) gives you direct access to
>> pData(featureData(obj)). 'extract-then-subset' -- fData(obj)[, "cols"] --
>> will usually be more efficient than subset-then-extract; there's also a
>> subtle difference that might be causing problems here (as you do it, you
>> end up with a 1-column data frame for 'chromosome', whereas
>> extract-then-subset results in a vector). '[[' pulls out a single column
>> with featureData(obj)[["cols"]] (also, [[<- can be useful for defining a
>> single column and creating a labelDescription; obj[["cols"]] gives
>> direct access to pData(obj)[["cols"]]).
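
The "1-column data frame versus vector" difference can be seen with a
plain data frame, without Biobase. This is only a base-R analogue of
the tip above: fd is a made-up stand-in for pData(featureData(obj)),
and 'drop = FALSE' is used here to mimic what subsetting the
AnnotatedDataFrame first and extracting afterwards returns.

```r
# Hypothetical stand-in for pData(featureData(obj)); values are invented.
fd <- data.frame(
  CHR = factor(c("1", "2", "X")),
  MapInfo = c(100, 200, 300)
)

# Analogue of subset-then-extract: the result is still a data frame,
# with a single column that keeps its original name "CHR".
one_col <- fd[, "CHR", drop = FALSE]
class(one_col)   # "data.frame"

# Analogue of extract-then-subset: the result is the column itself.
as_vector <- fd[, "CHR"]
class(as_vector) # "factor"

# '[[' is the most direct way to pull out a single column.
same_vector <- fd[["CHR"]]

# Converting to character at this point avoids carrying factor codes along.
chromosome <- as.character(fd[["CHR"]])
```

Working with the vector form means a later data.frame(chromosome = ...)
call takes its column name from the argument name.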
> 
> I will work these ideas into my script.  Thanks much for the pointers.
> 
>> Martin
>>
>>> Explicitly forcing column names and
>>> mode "character" for the chromosome column
>>> solves the problem
>>>
>>>  ccld.position <- pData(featureData(ccld)[, "MapInfo"])
>>>  names(ccld.position) <- "position"
>>>  ccld.chromosome <- pData(featureData(ccld)[, "CHR"])
>>>  names(ccld.chromosome) <- "chromosome"
>>>  ccld.chromosome$chromosome <-
>>>    as.character(ccld.chromosome$chromosome)
>>>  cclfd <-
>>>    new("AnnotatedDataFrame",
>>>        data = data.frame(position = ccld.position,
>>>          chromosome = ccld.chromosome,
>>>          stringsAsFactors = FALSE),
>>>        varMetadata = data.frame(labelDescription = c("position",
>>>          "chromosome")))
>>>
>>> and I can create the oligoSnpSet object successfully.
>>>
>>> > cclss <-
>>> +   new("oligoSnpSet", copyNumber = logR, calls = gt,
>>> +       phenoData = annotatedDataFrameFrom(logR, byrow = FALSE),
>>> +       featureData = cclfd, annotation = "HumanCNV370-Quad")
>>> > str(cclss)
>>> Formal class 'oligoSnpSet' [package "oligoClasses"] with 6 slots
>>>
>>>
>>> So it was the absence of columns named "chromosome" and "position"
>>> in the 'data' slot of the featureData object that caused internal
>>> code to attempt to acquire chromosome positional information from
>>> an annotation source.
>>>
>>> With the featureData@data data frame having the correct column
>>> labels "chromosome" and "position", the annotation argument
>>> is not processed further (it is just added to the SnpSet
>>> object's 'annotation' slot).
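
A sketch of how the column requirement described above could be
checked before building the AnnotatedDataFrame. The helper
check_feature_columns() is hypothetical (it is not part of Biobase or
oligoClasses), and raw is a made-up stand-in for the real feature
data; only the required names "chromosome" and "position" come from
the thread.

```r
# Hypothetical stand-in for the Illumina feature annotation; values invented.
raw <- data.frame(
  CHR = factor(c("1", "2", "X")),
  MapInfo = c(100, 200, 300),
  row.names = c("snp1", "snp2", "snp3")
)

# Build the feature table from plain vectors so the requested names stick,
# and store chromosome as character rather than factor.
feature_df <- data.frame(
  position = raw[["MapInfo"]],
  chromosome = as.character(raw[["CHR"]]),
  row.names = rownames(raw),
  stringsAsFactors = FALSE
)

# Hypothetical helper: stop early if a required column is absent.
check_feature_columns <- function(df, required = c("chromosome", "position")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0L) {
    stop("missing feature columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

check_feature_columns(feature_df)  # passes silently

# The untouched table (columns CHR and MapInfo) fails the check.
bad <- tryCatch(check_feature_columns(raw),
                error = function(e) conditionMessage(e))
bad
```

A data frame that passes the check could then be supplied as the
'data' argument of new("AnnotatedDataFrame", ...) as in the script
quoted above.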
>>>
>>> Thanks again to Robert Scharpf.
>>>
>>> Best
>>>
>>> Steve McKinney
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: bioconductor-bounces at stat.math.ethz.ch on behalf of Steven McKinney
>>> Sent: Tue 11/25/2008 9:56 PM
>>> To: Bioconductor at stat.math.ethz.ch
>>> Subject: [BioC] Creating a new instance of oligoSnpSet
>>>
>>> Hello All,
>>>
>>> I am trying to get some Illumina HumanCNV370-Quad
>>> data into VanillaICE to do some copy number analysis.
>>>
>>> In attempting to create an object of class "oligoSnpSet"
>>> I cannot seem to specify an annotation that works.
>>>
>>> e.g. as specified in a vignette
>>>
>>> > cclss <-
>>> +   new("oligoSnpSet", copyNumber = logR, calls = gt,
>>> +       phenoData = annotatedDataFrameFrom(logR, byrow = FALSE),
>>> +       featureData = cclfd, annotation = "Illumina550k")
>>> Loading required package: Illumina550k
>>> Error in db(object) : Illumina550k package not available
>>> In addition: Warning message:
>>> In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
>>>   there is no package called 'Illumina550k'
>>> Error in dbGetQuery(db(object), sql) :
>>>   error in evaluating the argument 'conn' in selecting a method for function 'dbGetQuery'
>>>
>>> or even if I specify some annotation that does exist
>>>
>>> > cclss <-
>>> +   new("oligoSnpSet", copyNumber = logR, calls = gt,
>>> +       phenoData = annotatedDataFrameFrom(logR, byrow = FALSE),
>>> +       featureData = cclfd, annotation = "hgu133plus2cdf")
>>> Loading required package: hgu133plus2cdf
>>> Error in db(object) :
>>>   trying to get slot "getdb" from an object of a basic class ("environment") with no slots
>>> Error in dbGetQuery(db(object), sql) :
>>>   error in evaluating the argument 'conn' in selecting a method for function 'dbGetQuery'
>>>
>>> Is there a way to work around this annotation bit of building
>>> an eSet object?
>>>
>>> I can't figure out from the documentation, from reading source code,
>>> or from experimenting what will work for this annotation argument.
>>>
>>> I'm a bit hooped as there does not yet appear to be annotation
>>> for the Illumina HumanCNV370-Quad, but I have annotation
>>> information from other files from Illumina etc.
>>>
>>> Can I put some dummy object as an argument for annotation
>>> and patch it up with my known info?
>>>
>>> Any ideas?
>>>
>>>
>>> Steven McKinney
>>>
>>> Statistician
>>> Molecular Oncology and Breast Cancer Program
>>> British Columbia Cancer Research Centre
>>>
>>> email: smckinney +at+ bccrc +dot+ ca
>>>
>>> tel: 604-675-8000 x7561
>>>
>>> BCCRC
>>> Molecular Oncology
>>> 675 West 10th Ave, Floor 4
>>> Vancouver B.C.
>>> V5Z 1L3
>>> Canada
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793