[R] A problem subsetting a data frame

David Winsemius dwinsemius at comcast.net
Tue Nov 27 07:50:51 CET 2012


On Nov 26, 2012, at 11:10 PM, David Winsemius wrote:

>
> On Nov 26, 2012, at 3:05 PM, Aki Hoji wrote:
>
>> Hi all,
>>
>> I have this large microarray data set (ALL) from which
>> I would like to subset or extract a set of data based on a factor
>> ($mol.biol). I looked up some examples of subsetting, picked
>> up two commands and tried both, but I got error messages as follows
>>
>>> testset <- subset(ALL, ALL$mol.biol %in% c("BCR/ABL","ALL1/AF4"))
>>
>>>> Error in c("BCR/ABL", "ALL1/AF4") : unused argument(s) ("ALL1/AF4")
>>
>>
>>> testset <- ALL[ALL$mol.biol %in% c("BCR/ABL,NEG"), ]
>>>> Error in ALL[ALL$mol.biol %in% c(BCR/ABL, NEG), ] :
>
> Looking down below, you see mostly "@" signs, not the "$" signs that
> you would expect to see if you were hoping to use the "$" function.
> You need to learn to deal with S4 objects. Does an ExpressionSet
> object have extractor functions? Is `subset` one of those?
>
> I'm guessing there might be a more approved way of extracting the  
> 'mol.biol' component of the 'data' dataframe of the 'phenoData'  
> component, but this would be the hackish way of approaching it:
>
> ALLpdat <- ALL@phenoData@data
>
> ALLpdat[ ALLpdat$mol.biol %in% c("BCR/ABL"), ]
>
> I picked a factor level that I could tell should work based on the  
> str output below. If you wanted to see the entire set of legal  
> factor levels you would type:
>
> levels( ALLpdat$mol.biol)
>
> I really think you need to do quite a bit more self-study, since you
> seem not to understand some fairly basic issues about the sorts of
> objects Bioconductor uses, which are often S4 formal classes. You
> should probably get your hands on some vignettes that use these sorts
> of data structures.
>

Following that advice one finds many such tutorials with the Google  
search: { ALL expressionset } and there _is_ an extractor function for  
that component of an ExpressionSet object.
See: http://bcb.dfci.harvard.edu/~aedin/courses/BiocDec2011/Slides2.ppt
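
As a rough sketch of what using that extractor might look like (not taken from the linked slides): pData() from Biobase returns the per-sample data frame, and an ExpressionSet is subset with samples in the column position. The object names pd, keep and testset are only illustrative, and the code assumes the Bioconductor packages Biobase and ALL are installed.

```r
# Sketch only: runs the example if Biobase and ALL are available.
if (requireNamespace("Biobase", quietly = TRUE) &&
    requireNamespace("ALL", quietly = TRUE)) {
  library(Biobase)
  data("ALL", package = "ALL")

  # pData() returns the per-sample data frame (the 'data' slot of phenoData)
  pd <- pData(ALL)
  print(levels(pd$mol.biol))

  # Samples are the columns of an ExpressionSet, so the logical index
  # goes in the second position of `[`
  keep <- pd$mol.biol %in% c("BCR/ABL", "ALL1/AF4")
  testset <- ALL[, keep]
  print(dim(testset))
}
```

Subsetting the ExpressionSet itself, rather than only the extracted data frame, keeps the expression values and the sample annotation in step with each other.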

There is (or was) a tutorial at your own institution:

www.biostat.pitt.edu/biost2055/11/110202_W5_Lab2.doc

(But it will not load with my browser.)

Notice how similar the output below is to the 'data' portion of your
str(ALL) listing (after one installs and loads the ALL package, which
it would have been courteous of you to have mentioned).


str(phenoData(ALL))

Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
   ..@ varMetadata      :'data.frame':	21 obs. of  1 variable:
   .. ..$ labelDescription: chr [1:21] " Patient ID" " Date of  
diagnosis" " Gender of the patient" " Age of the patient at entry" ...
   ..@ data             :'data.frame':	128 obs. of  21 variables:
   .. ..$ cod           : chr [1:128] "1005" "1010" "3002" "4006" ...
   .. ..$ diagnosis     : chr [1:128] "5/21/1997" "3/29/2000"  
"6/24/1998" "7/17/1997" ...
   .. ..$ sex           : Factor w/ 2 levels "F","M": 2 2 1 2 2 2 1 2  
2 2 ...
   .. ..$ age           : int [1:128] 53 19 52 38 57 17 18 16 15 40 ...
   .. ..$ BT            : Factor w/ 10 levels "B","B1","B2",..: 3 3 5  
2 3 2 2 2 3 3 ...
   .. ..$ remission     : Factor w/ 2 levels "CR","REF": 1 1 1 1 1 1 1  
1 1 1 ...
   .. ..$ CR            : chr [1:128] "CR" "CR" "CR" "CR" ...
   .. ..$ date.cr       : chr [1:128] "8/6/1997" "6/27/2000"  
"8/17/1998" "9/8/1997" ...
   .. ..$ t(4;11)       : logi [1:128] FALSE FALSE NA TRUE FALSE  
FALSE ...
   .. ..$ t(9;22)       : logi [1:128] TRUE FALSE NA FALSE FALSE  
FALSE ...
   .. ..$ cyto.normal   : logi [1:128] FALSE FALSE NA FALSE FALSE  
FALSE ...
--------snipped further output-----------

So do some searching and self-study.
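
To make the "@" versus "$" distinction in the str() output concrete, here is a minimal toy sketch in base R. The classes ToyAnnotated and ToySet and the accessor toyPData are hypothetical names made up for illustration. They only mimic the nesting shown above and are not the Biobase definitions.

```r
library(methods)

# Hypothetical toy classes that mimic the nesting seen in str();
# these are NOT the Biobase class definitions.
setClass("ToyAnnotated", slots = c(data = "data.frame"))
setClass("ToySet", slots = c(phenoData = "ToyAnnotated"))

# An accessor generic, so callers need not know the slot layout
setGeneric("toyPData", function(object) standardGeneric("toyPData"))
setMethod("toyPData", "ToySet", function(object) object@phenoData@data)

pd <- data.frame(
  cod = c("1005", "1010", "3002", "4006"),
  mol.biol = factor(c("BCR/ABL", "NEG", "BCR/ABL", "ALL1/AF4")),
  stringsAsFactors = FALSE
)
toy <- new("ToySet", phenoData = new("ToyAnnotated", data = pd))

# Slots are reached with '@'; columns of the data frame inside with '$'
via_slots <- toy@phenoData@data
via_accessor <- toyPData(toy)
print(identical(via_slots, via_accessor))

# Ordinary data-frame subsetting then works on the extracted table
hits <- via_accessor[via_accessor$mol.biol %in% c("BCR/ABL", "ALL1/AF4"), ]
print(hits$cod)
```

Going through an accessor rather than the slots directly is usually the safer habit, since the slot layout is an implementation detail of the class.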

> -- 
> David.
>
>>>
>>>> error in evaluating the argument 'i' in selecting a method for  
>>>> function '[': Error in c(BCR/ABL, NEG) : unused argument(s) (NEG)
>>
>> At this point, I really appreciate any inputs to move forward. ….
>>
>>> str(ALL)
>>> Formal class 'ExpressionSet' [package "Biobase"] with 7 slots
>>> ..@ experimentData   :Formal class 'MIAME' [package "Biobase"]  
>>> with 13 slots
>>> .. .. ..@ name             : chr "Chiaretti et al."
>>> .. .. ..@ lab              : chr "Department of Medical Oncology,  
>>> Dana-Farber Cancer Institute, Department of Medicine, Brigham and  
>>> Women's Hospital, Harvard Med"| __truncated__
>>> .. .. ..@ contact          : chr ""
>>> .. .. ..@ title            : chr "Gene expression profile of adult  
>>> T-cell acute lymphocytic leukemia identifies distinct subsets of  
>>> patients with different respo"| __truncated__
>>> .. .. ..@ abstract         : chr "Gene expression profiles were  
>>> examined in 33 adult patients with T-cell acute lymphocytic  
>>> leukemia (T-ALL). Nonspecific filteri"| __truncated__
>>> .. .. ..@ url              : chr ""
>>> .. .. ..@ pubMedIds        : chr [1:2] "14684422" "16243790"
>>> .. .. ..@ samples          : list()
>>> .. .. ..@ hybridizations   : list()
>>> .. .. ..@ normControls     : list()
>>> .. .. ..@ preprocessing    : list()
>>> .. .. ..@ other            : list()
>>> .. .. ..@ .__classVersion__:Formal class 'Versions' [package  
>>> "Biobase"] with 1 slots
>>> .. .. .. .. ..@ .Data:List of 1
>>> .. .. .. .. .. ..$ : int [1:3] 1 0 0
>>> ..@ assayData        :<environment: 0x1078636e8>
>>> ..@ phenoData        :Formal class 'AnnotatedDataFrame' [package  
>>> "Biobase"] with 4 slots
>>> .. .. ..@ varMetadata      :'data.frame':	21 obs. of  1 variable:
>>> .. .. .. ..$ labelDescription: chr [1:21] " Patient ID" " Date of  
>>> diagnosis" " Gender of the patient" " Age of the patient at  
>>> entry" ...
>>> .. .. ..@ data             :'data.frame':	128 obs. of  21 variables:
>>> .. .. .. ..$ cod           : chr [1:128] "1005" "1010" "3002"  
>>> "4006" ...
>>> .. .. .. ..$ diagnosis     : chr [1:128] "5/21/1997" "3/29/2000"  
>>> "6/24/1998" "7/17/1997" ...
>>> .. .. .. ..$ sex           : Factor w/ 2 levels "F","M": 2 2 1 2 2  
>>> 2 1 2 2 2 ...
>>> .. .. .. ..$ age           : int [1:128] 53 19 52 38 57 17 18 16  
>>> 15 40 ...
>>> .. .. .. ..$ BT            : Factor w/ 10 levels "B","B1","B2",..:  
>>> 3 3 5 2 3 2 2 2 3 3 ...
>>> .. .. .. ..$ remission     : Factor w/ 2 levels "CR","REF": 1 1 1  
>>> 1 1 1 1 1 1 1 ...
>>> .. .. .. ..$ CR            : chr [1:128] "CR" "CR" "CR" "CR" ...
>>> .. .. .. ..$ date.cr       : chr [1:128] "8/6/1997" "6/27/2000"  
>>> "8/17/1998" "9/8/1997" ...
>>> .. .. .. ..$ t(4;11)       : logi [1:128] FALSE FALSE NA TRUE  
>>> FALSE FALSE ...
>>> .. .. .. ..$ t(9;22)       : logi [1:128] TRUE FALSE NA FALSE  
>>> FALSE FALSE ...
>>> .. .. .. ..$ cyto.normal   : logi [1:128] FALSE FALSE NA FALSE  
>>> FALSE FALSE ...
>>> .. .. .. ..$ citog         : chr [1:128] "t(9;22)" "simple alt."  
>>> NA "t(4;11)" ...
>>> .. .. .. ..$ mol.biol      : Factor w/ 6 levels "ALL1/AF4","BCR/ 
>>> ABL",..: 2 4 2 1 4 4 4 4 4 2 ...
>> snipped
>>
>> Aki Hoji, Ph.D
>> Dept. Infectious Diseases & Microbiology
>> University of Pittsburgh
>> Rm427 Parran Hall, GSPH-IDM
>> 130 Desoto St., Pittsburgh, PA 15261
>>
>


David Winsemius, MD
Alameda, CA, USA



