[R] custom subset method / handling columns selection as logic in '...' parameter

Martin Morgan mtmorgan at fhcrc.org
Thu Dec 20 15:46:58 CET 2007


Eric --

Please don't cross-post.

Please simplify your example so that others do not have to work hard
to understand what you are asking.

See additional response on the Bioconductor mailing list.

Martin

"Eric Lecoutre" <ericlecoutre at gmail.com> writes:

> Dear R-helpers & Bioconductor,
>
>
> Sorry for cross-posting; this concerns R programming applied in a
> Bioconductor context.
> Sorry also for this long message; I have tried to make my request complete.
>
> I am trying to write a subset method for a specific class (ExpressionSet
> from Bioconductor) that allows more flexible selection than the "[" method.
>
> The scheme I have in mind is the following:
>
> subset.ExpressionSet <- function(x,subset,...){
>
> }
>
> I will use the subset argument for rows (genes), as in the default method.
>
> Now I would like to allow selecting columns (samples) based on the
> phenotypic data, which provides detailed information about the columns.
>
> The first function I have written allows the following:
>
>> sub1 <- subset(ExpressionSetObject, subset=NULL, V1=value1, V2=value2)
> # subset=NULL takes all rows
>
> Here there are two conditions on two variables of the phenotypic
> data.frame encapsulated in ExpressionSetObject. (In general there may be
> conditions on more than two variables, since they are applied to the
> phenotypic data.frame, which holds all of them.)
> Simplifying a little, this would return roughly:
> ExpressionSetObject[, V1==value1 & V2==value2]
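In plain R terms, the selection described above is logical indexing of the columns by conditions evaluated on the phenotype table. A minimal base-R sketch, using the first six rows of the pData table shown further down as an ordinary data.frame (the name `pd` is illustrative, not part of Biobase):

```r
# First six rows of pData(expressionSet) from the example, as a plain data.frame
pd <- data.frame(
  sex   = c("Female", "Male", "Male", "Male", "Female", "Male"),
  type  = c("Control", "Case", "Control", "Case", "Case", "Control"),
  score = c(0.75, 0.40, 0.73, 0.42, 0.93, 0.22),
  row.names = c("A", "B", "C", "D", "E", "F")
)

# One logical vector per condition, combined with &: TRUE marks a sample to keep
keep <- pd$sex == "Male" & pd$type == "Control"
rownames(pd)[keep]   # "C" "F"
```

With an ExpressionSet, the same logical vector would be used as the column index, e.g. `ExpressionSetObject[, keep]`.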
>
> This works nicely: thanks to '...' I can already handle any number of
> conditions on variable values. The first step is
> conditions <- list(...), and the conditions are then handled later in the code.
>
> However, those conditions are basic (a single value per variable).
>
> I would like to handle arbitrary conditions, such as V1 %in% c(value1,
> value2). Simpler conditions would then be written V2==value2 instead of
> V2=value2.
>
> My real problem is that I don't know how to turn '...' into an object
> holding those conditions, so that they can be used later.
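The obstacle is that `list(...)` forces every argument: R evaluates `sex == "Male"` in the caller's environment, where `sex` is normally not defined, before the method ever sees the phenotype data. A small illustration, assuming a fresh session with no global `sex` (the function name `eager` is hypothetical):

```r
# list(...) evaluates each argument immediately, in the caller's environment
eager <- function(...) list(...)

# Catch the error instead of stopping, to show what happens
res <- tryCatch(eager(sex == "Male"),
                error = function(e) conditionMessage(e))
res   # the error message reports that object 'sex' was not found
```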
>
> My closest attempt is:
>
>> foo <- function(...){
>> as.expression(substitute(list(...)))
>> }
>>foo(x==1,y%in%1:2)
> expression(list(x == 1, y %in% 1:2))
>
> whereas I would like to have something like
> list(expression(x==1), expression(y %in% 1:2)),
> with those expressions being evaluated later on in the context of my
> specific object.
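One way to get such a list is to keep the attempt above but split the captured call instead of wrapping it: `substitute(list(...))` returns the unevaluated call `list(x == 1, y %in% 1:2)`, and dropping its first element (the function name `list`) leaves one language object per condition. Each can later be evaluated with `eval()` using a data.frame as the environment, which is the mechanism base R's `subset.data.frame` uses for its own `subset` argument. A minimal sketch; `capture_conditions` and `eval_conditions` are hypothetical helper names:

```r
# Return the arguments in '...' as a list of unevaluated calls
capture_conditions <- function(...) {
  as.list(substitute(list(...)))[-1L]
}

# Evaluate each captured condition with the columns of 'data' visible as
# variables (other names are looked up in 'env'); AND the results, NA -> FALSE
eval_conditions <- function(conds, data, env = parent.frame()) {
  keep <- rep(TRUE, nrow(data))
  for (cond in conds) {
    res <- eval(cond, data, env)
    keep <- keep & !is.na(res) & res
  }
  keep
}

conds <- capture_conditions(x == 1, y %in% 1:2)
conds[[1]]          # x == 1
class(conds[[2]])   # "call"

df <- data.frame(x = c(1, 1, 2), y = c(2, 5, 1))
eval_conditions(conds, df)   # TRUE FALSE FALSE
```

The elements are calls rather than `expression` objects, but `eval()` accepts both. `match.call(expand.dots = FALSE)$...` is another way to obtain the same unevaluated arguments.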
>
>
> Is there an existing function that already handles '...' this way, so
> that I can mimic it?
>
> Thanks for any insight.
>
>
> Eric
>
> ---
>
> For those who have Biobase available, here is my current subset function
> and a demo case that illustrates it.
>
>
> library(Biobase)
> example(ExpressionSet) # create sample object
> print(expressionSet)
>
> # now my subset function as it is
>
> subset.ExpressionSet <- function(x, subset=NULL, verbose=TRUE, ...){
>   # subset is used to subset on rows
>   # ... is used to make multiple conditions on columns based on pData
>   # list of conditions is handled in ...
>     stopifnot(is(x, "ExpressionSet"))
>     phenoData <- pData(x)
>     listCriteria <- list(...)
>     if (is.null(subset)) subset <- rep(TRUE, nrow(exprs(x)))
>     subset <- subset & !is.na(subset)
>     retainedCriteria <- list()
>     for (critname in names(listCriteria)) {
>       if (!critname %in% colnames(phenoData)) {
>         if (verbose) cat("\n*** subsetCompounds: Dropped criteria:", critname,
>                          "not in phenoData of object\n")
>       } else {
>         values <- listCriteria[[critname]]
>         # a NULL criterion (e.g. sex=NULL) keeps all values of that variable
>         if (is.null(values)) values <- unique(phenoData[, critname])
>         retainedCriteria[[critname]] <- phenoData[, critname] %in% values
>       }
>     }
>     # keep the columns (samples) that meet every retained criterion
>     if (length(retainedCriteria) == 0) {
>       selectedColumns <- rownames(phenoData)
>     } else {
>       criteriaValues <- do.call("cbind", retainedCriteria)
>       selectedColumns <- rownames(phenoData)[apply(criteriaValues, 1, all)]
>     }
>     out <- x[subset, selectedColumns]
>     if (verbose) cat('\n', length(selectedColumns), ' columns selected (',
>                      paste(selectedColumns, collapse=' '), ')\n', sep='')
>     invisible(out)
>   }
>
> # looking at phenotypic data associated with the sample expressionSet
>> pData(expressionSet)
>      sex    type score
> A Female Control  0.75
> B   Male    Case  0.40
> C   Male Control  0.73
> D   Male    Case  0.42
> E Female    Case  0.93
> F   Male Control  0.22
> G   Male    Case  0.96
> H   Male    Case  0.79
> I Female    Case  0.37
> J   Male Control  0.63
> K   Male    Case  0.26
> L Female Control  0.36
> M   Male    Case  0.41
> N   Male    Case  0.80
> O Female    Case  0.10
> P Female Control  0.41
> Q Female    Case  0.16
> R   Male Control  0.72
> S   Male    Case  0.17
> T Female    Case  0.74
> U   Male Control  0.35
> V Female Control  0.77
> W   Male Control  0.27
> X   Male Control  0.98
> Y Female    Case  0.94
> Z Female    Case  0.32
>
>
> # now a sample call
>> (subset1 =subset(expressionSet,sex="Male",type="Control"))
> 7 columns selected (C F J R U W X)
> ExpressionSet (storageMode: lockedEnvironment)
> assayData: 500 features, 7 samples
>   element names: exprs, se.exprs
> phenoData
>   sampleNames: C, F, ..., X  (7 total)
>   varLabels and varMetadata description:
>     sex: Female/Male
>     type: Case/Control
>     score: Testing Score
> featureData
>   featureNames: AFFX-MurIL2_at, AFFX-MurIL10_at, ..., 31739_at  (500 total)
>   fvarLabels and fvarMetadata description: none
> experimentData: use 'experimentData(object)'
> Annotation: hgu95av2
>
>
> # what I would like to allow:
> (subset2 = subset(expressionSet, sex=="Male", score > 0.75)) # note the == instead of =
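Two things stand in the way of the desired call. First, with the signature function(x, subset=NULL, verbose=TRUE, ...), unnamed arguments are matched by position, so sex=="Male" would be bound to `subset` and score > 0.75 to `verbose`; placing `...` directly after `x` sends every unnamed argument into `...`, while `subset` and `verbose` must then be given by name. Second, the conditions have to be captured unevaluated and evaluated against pData(x). The sketch below combines both ideas on a plain expression matrix plus phenotype data.frame so that it runs without Biobase; the function name `subset_samples` and its arguments are hypothetical, and for an ExpressionSet one would take `exprs(x)`/`pData(x)` and return `x[subset, keep]` instead. The matrix values are placeholders.

```r
# Hypothetical stand-in for subset.ExpressionSet: 'mat' plays the role of
# exprs(x) (rows = features, columns = samples) and 'pheno' of pData(x).
# '...' comes right after 'mat' so that unnamed conditions land in it;
# 'subset' (rows) and 'verbose' must be passed by name.
subset_samples <- function(mat, pheno, ..., subset = NULL, verbose = TRUE) {
  stopifnot(identical(rownames(pheno), colnames(mat)))
  env <- parent.frame()                         # caller's environment
  conds <- as.list(substitute(list(...)))[-1L]  # unevaluated conditions
  if (is.null(subset)) subset <- rep(TRUE, nrow(mat))
  subset <- subset & !is.na(subset)
  keep <- rep(TRUE, ncol(mat))
  for (cond in conds) {
    # columns of 'pheno' are visible as variables; other names come from 'env'
    res <- eval(cond, pheno, env)
    keep <- keep & !is.na(res) & res
  }
  if (verbose)
    cat(sum(keep), " columns selected (",
        paste(colnames(mat)[keep], collapse = " "), ")\n", sep = "")
  mat[subset, keep, drop = FALSE]
}

# First six rows of the pData table above; matrix values are placeholders
pheno <- data.frame(
  sex   = c("Female", "Male", "Male", "Male", "Female", "Male"),
  type  = c("Control", "Case", "Control", "Case", "Case", "Control"),
  score = c(0.75, 0.40, 0.73, 0.42, 0.93, 0.22),
  row.names = c("A", "B", "C", "D", "E", "F")
)
mat <- matrix(seq_len(2 * 6), nrow = 2,
              dimnames = list(c("g1", "g2"), rownames(pheno)))

res <- subset_samples(mat, pheno, sex == "Male", score > 0.3)
# prints: 3 columns selected (B C D)
```

Using the caller's environment as the enclosure mirrors `subset.data.frame`, so a condition can also refer to ordinary variables, such as a threshold defined in the calling code.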
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


