[R] A problem subsetting a data frame

David Winsemius dwinsemius at comcast.net
Tue Nov 27 07:10:45 CET 2012


On Nov 26, 2012, at 3:05 PM, Aki Hoji wrote:

> Hi all,
>
> I have this large microarray data set (ALL) from which I
> would like to subset or extract a set of data based on a factor
> ($mol.biol). I looked up some examples of subsetting, picked up
> two commands and tried both, but I got error messages as follows:
>
>> testset <- subset(ALL, ALL$mol.biol %in% c("BCR/ABL","ALL1/AF4"))
>
>>> Error in c("BCR/ABL", "ALL1/AF4") : unused argument(s) ("ALL1/AF4")
>
>
>> testset <- ALL[ALL$mol.biol %in% c("BCR/ABL,NEG"), ]
>>> Error in ALL[ALL$mol.biol %in% c(BCR/ABL, NEG), ] :

Looking down below you see mostly "@" signs, not the "$" signs that
you would expect to see if you were hoping to use the "$" function.
You need to learn to deal with S4 objects. Does an ExpressionSet
object have extractor functions? Is `subset` one of those?
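To make the S4 point concrete, here is a minimal sketch using hypothetical toy classes (`ToyPheno`, `ToyESet` and the accessor `toyPData` are invented for illustration; they are not Biobase classes) that mimic the nesting in the str() output quoted below. It shows slot access with `@` and a small accessor generic. The toy class defines no `$` method, so `$` fails on it. Biobase does supply accessors such as pData() and phenoData(), and also a `$` method for ExpressionSet that reads a phenoData column, so that particular failure does not carry over to ALL itself.

```r
library(methods)

## Hypothetical toy classes mimicking the nesting seen in str(ALL):
## an object whose 'phenoData' slot holds an object with a 'data' slot.
setClass("ToyPheno", slots = c(data = "data.frame"))
setClass("ToyESet",  slots = c(phenoData = "ToyPheno"))

## Hypothetical accessor generic, in the spirit of Biobase::pData()
setGeneric("toyPData", function(object) standardGeneric("toyPData"))
setMethod("toyPData", "ToyESet", function(object) object@phenoData@data)

toy <- new("ToyESet",
           phenoData = new("ToyPheno",
                           data = data.frame(mol.biol = factor(c("BCR/ABL", "NEG")))))

isS4(toy)               # slots are reached with @ or slot(), not $
slotNames(toy)          # lists the top-level slots
toy@phenoData@data      # the "hackish" slot path
toyPData(toy)$mol.biol  # the accessor returns a plain data.frame, where $ works

## No `$` method is defined for ToyESet, so `$` on the S4 object is an error
res <- tryCatch(toy$mol.biol, error = function(e) conditionMessage(e))
res
```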

I'm guessing there might be a more approved way of extracting the
'mol.biol' component of the 'data' data frame of the 'phenoData'
component, but this would be the hackish way of approaching it:

ALLpdat <- ALL@phenoData@data

ALLpdat[ ALLpdat$mol.biol %in% c("BCR/ABL"), ]

I picked a factor level that I could tell should work based on the str  
output below. If you wanted to see the entire set of legal factor  
levels you would type:

levels( ALLpdat$mol.biol)
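A self-contained illustration of the same data-frame subsetting, using a small made-up data frame as a stand-in for the phenoData table (the `cod` and `mol.biol` values below are hypothetical, not taken from ALL). It checks the legal levels first, then passes one string per level inside c() to %in%. droplevels() is optional; it only removes levels that no longer occur in the subset.

```r
## Hypothetical stand-in for the phenoData data frame; values are made up.
pdat <- data.frame(
  cod      = c("1005", "1010", "3002", "4006", "4007"),
  mol.biol = factor(c("BCR/ABL", "NEG", "BCR/ABL", "ALL1/AF4", "NEG")),
  stringsAsFactors = FALSE
)

levels(pdat$mol.biol)  # the legal values of the factor
table(pdat$mol.biol)   # how many rows fall in each level

## One string per level; a single string such as "BCR/ABL,NEG" matches no level.
wanted <- c("BCR/ABL", "ALL1/AF4")
stopifnot(all(wanted %in% levels(pdat$mol.biol)))  # guard against typos in level names

## Logical row index in the first position, all columns kept
sub <- pdat[pdat$mol.biol %in% wanted, ]
sub$mol.biol <- droplevels(sub$mol.biol)  # forget levels absent from the subset
sub
```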

I really think you need to do quite a bit more self-study, since you
seem not to understand some fairly basic issues about Bioconductor
objects, which are often S4 formal classes. You should probably
get your hands on some vignettes that use these sorts of data
structures.
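A minimal sketch of the accessor-based route mentioned above as "a more approved way", assuming the Bioconductor packages Biobase and ALL are installed. pData() returns the same data frame as the slot path, and Biobase's `$` method for ExpressionSet reads a phenoData column. Because an ExpressionSet stores features (probe sets) as rows and samples as columns, a filter on samples belongs in the second index, ALL[, keep], rather than the first. The find("c") line is only a diagnostic suggestion: an "unused argument(s)" error raised by c() itself would be consistent with a one-argument function named `c` having been defined in the workspace, but the thread does not confirm that.

```r
## Assumes Bioconductor packages Biobase and ALL are installed.
suppressPackageStartupMessages({
  library(Biobase)
  library(ALL)
})
data(ALL)  # the ExpressionSet discussed in this thread

## Accessor instead of slot path: same data.frame as ALL@phenoData@data
ALLpdat <- pData(ALL)

## Biobase defines `$` for ExpressionSet; it reads a phenoData column
table(ALL$mol.biol)

## Rows are features, columns are samples: filter samples in the column index
keep <- ALL$mol.biol %in% c("BCR/ABL", "ALL1/AF4")
testset <- ALL[, keep]
dim(testset)  # all features, only the selected samples

## Optional: drop factor levels that no longer occur in the subset
testset$mol.biol <- droplevels(testset$mol.biol)

## Diagnostic only: normally returns just "package:base"; a ".GlobalEnv"
## entry would mean a user-defined c() is masking the base function.
find("c")
```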

-- 
David.

>>
>>>  error in evaluating the argument 'i' in selecting a method for  
>>> function '[': Error in c(BCR/ABL, NEG) : unused argument(s) (NEG)
>
> At this point, I would really appreciate any input to help me move forward.
>
>> str(ALL)
>> Formal class 'ExpressionSet' [package "Biobase"] with 7 slots
>>  ..@ experimentData   :Formal class 'MIAME' [package "Biobase"] with 13 slots
>>  .. .. ..@ name             : chr "Chiaretti et al."
>>  .. .. ..@ lab              : chr "Department of Medical Oncology, Dana-Farber Cancer Institute, Department of Medicine, Brigham and Women's Hospital, Harvard Med"| __truncated__
>>  .. .. ..@ contact          : chr ""
>>  .. .. ..@ title            : chr "Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different respo"| __truncated__
>>  .. .. ..@ abstract         : chr "Gene expression profiles were examined in 33 adult patients with T-cell acute lymphocytic leukemia (T-ALL). Nonspecific filteri"| __truncated__
>>  .. .. ..@ url              : chr ""
>>  .. .. ..@ pubMedIds        : chr [1:2] "14684422" "16243790"
>>  .. .. ..@ samples          : list()
>>  .. .. ..@ hybridizations   : list()
>>  .. .. ..@ normControls     : list()
>>  .. .. ..@ preprocessing    : list()
>>  .. .. ..@ other            : list()
>>  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
>>  .. .. .. .. ..@ .Data:List of 1
>>  .. .. .. .. .. ..$ : int [1:3] 1 0 0
>>  ..@ assayData        :<environment: 0x1078636e8>
>>  ..@ phenoData        :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
>>  .. .. ..@ varMetadata      :'data.frame':	21 obs. of  1 variable:
>>  .. .. .. ..$ labelDescription: chr [1:21] " Patient ID" " Date of diagnosis" " Gender of the patient" " Age of the patient at entry" ...
>>  .. .. ..@ data             :'data.frame':	128 obs. of  21 variables:
>>  .. .. .. ..$ cod           : chr [1:128] "1005" "1010" "3002" "4006" ...
>>  .. .. .. ..$ diagnosis     : chr [1:128] "5/21/1997" "3/29/2000" "6/24/1998" "7/17/1997" ...
>>  .. .. .. ..$ sex           : Factor w/ 2 levels "F","M": 2 2 1 2 2 2 1 2 2 2 ...
>>  .. .. .. ..$ age           : int [1:128] 53 19 52 38 57 17 18 16 15 40 ...
>>  .. .. .. ..$ BT            : Factor w/ 10 levels "B","B1","B2",..: 3 3 5 2 3 2 2 2 3 3 ...
>>  .. .. .. ..$ remission     : Factor w/ 2 levels "CR","REF": 1 1 1 1 1 1 1 1 1 1 ...
>>  .. .. .. ..$ CR            : chr [1:128] "CR" "CR" "CR" "CR" ...
>>  .. .. .. ..$ date.cr       : chr [1:128] "8/6/1997" "6/27/2000" "8/17/1998" "9/8/1997" ...
>>  .. .. .. ..$ t(4;11)       : logi [1:128] FALSE FALSE NA TRUE FALSE FALSE ...
>>  .. .. .. ..$ t(9;22)       : logi [1:128] TRUE FALSE NA FALSE FALSE FALSE ...
>>  .. .. .. ..$ cyto.normal   : logi [1:128] FALSE FALSE NA FALSE FALSE FALSE ...
>>  .. .. .. ..$ citog         : chr [1:128] "t(9;22)" "simple alt." NA "t(4;11)" ...
>>  .. .. .. ..$ mol.biol      : Factor w/ 6 levels "ALL1/AF4","BCR/ABL",..: 2 4 2 1 4 4 4 4 4 2 ...
>>  .. .. .. ..$ fusion protein: Factor w/ 3 levels "p190","p190/p210",..: 3 NA 1 NA NA NA NA NA NA 1 ...
>>  .. .. .. ..$ mdr           : Factor w/ 2 levels "NEG","POS": 1 2 1 1 1 1 2 1 1 1 ...
>>  .. .. .. ..$ kinet         : Factor w/ 2 levels "dyploid","hyperd.": 1 1 1 1 1 2 2 1 1 NA ...
>>  .. .. .. ..$ ccr           : logi [1:128] FALSE FALSE FALSE FALSE FALSE FALSE ...
>>  .. .. .. ..$ relapse       : logi [1:128] FALSE TRUE TRUE TRUE TRUE TRUE ...
>>  .. .. .. ..$ transplant    : logi [1:128] TRUE FALSE FALSE FALSE FALSE FALSE ...
>>  .. .. .. ..$ f.u           : chr [1:128] "BMT / DEATH IN CR" "REL" "REL" "REL" ...
>>  .. .. .. ..$ date last seen: chr [1:128] NA "8/28/2000" "10/15/1999" "1/23/1998" ...
>>  .. .. ..@ dimLabels        : chr [1:2] "sampleNames" "sampleColumns"
>>  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
>>  .. .. .. .. ..@ .Data:List of 1
>>  .. .. .. .. .. ..$ : int [1:3] 1 1 0
>>  ..@ featureData      :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
>>  .. .. ..@ varMetadata      :'data.frame':	0 obs. of  1 variable:
>>  .. .. .. ..$ labelDescription: logi(0)
>>  .. .. ..@ data             :'data.frame':	12625 obs. of  0 variables
>>  .. .. ..@ dimLabels        : chr [1:2] "featureNames" "featureColumns"
>>  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
>>  .. .. .. .. ..@ .Data:List of 1
>>  .. .. .. .. .. ..$ : int [1:3] 1 1 0
>>  ..@ annotation       : chr "hgu95av2"
>>  ..@ protocolData     :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
>>  .. .. ..@ varMetadata      :'data.frame':	0 obs. of  1 variable:
>>  .. .. .. ..$ labelDescription: chr(0)
>>  .. .. ..@ data             :'data.frame':	128 obs. of  0 variables
>>  .. .. ..@ dimLabels        : chr [1:2] "sampleNames" "sampleColumns"
>>  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
>>  .. .. .. .. ..@ .Data:List of 1
>>  .. .. .. .. .. ..$ : int [1:3] 1 1 0
>>  ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
>>  .. .. ..@ .Data:List of 4
>>  .. .. .. ..$ : int [1:3] 2 10 0
>>  .. .. .. ..$ : int [1:3] 2 5 5
>>  .. .. .. ..$ : int [1:3] 1 3 0
>>  .. .. .. ..$ : int [1:3] 1 0 0
>
>
>
> Aki Hoji, Ph.D
> Dept. Infectious Diseases & Microbiology
> University of Pittsburgh
> Rm427 Parran Hall, GSPH-IDM
> 130 Desoto St., Pittsburgh, PA 15261

David Winsemius, MD
Alameda, CA, USA



