[R] Fwd: Conditional inclusion of an element in an R object

MacQueen, Don macqueen1 at llnl.gov
Sat Jan 11 01:17:43 CET 2014


At the risk of being annoying ...

Your original question was,
 "Is there a way to dynamically include columns in a dataframe?"

The answer is yes. One way, and I think the simplest, is to calculate the
names of the columns you want to keep, and then use an expression like I
suggested, that is, one like
  a1[ , names.to.keep]
instead of using the select argument of subset().

Or calculate the names you do not want to keep, if that is easier, and
use for example
  a1[, setdiff(names(a1),names.to.drop) ]
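
As a minimal sketch of both forms, using the a1 data frame from the
original post (quoted below): the vectors wanted and names.to.drop are
made up for illustration, and intersect() is just one way to make sure
that only existing columns are requested.

```r
# Example data frame from the original post.
a1 <- data.frame(P = rep(1, 10), Qr = LETTERS[1:10], b = letters[1:10],
                 R = rep(c("A", "B"), each = 5))

# Hypothetical wish list; "Ra" is not a column of a1.
wanted <- c("P", "b", "Ra")

# Keep only the wanted names that actually exist in a1.
names.to.keep <- intersect(wanted, names(a1))
kept <- a1[, names.to.keep, drop = FALSE]

# Or drop by name; setdiff() ignores names that are not in a1.
names.to.drop <- c("Qr", "Ra")
dropped <- a1[, setdiff(names(a1), names.to.drop), drop = FALSE]
```

Using drop = FALSE keeps the result a data frame even if only one
column survives.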

Based on your most recent email it seems like calculating which columns to
keep or drop may be difficult. But I would still suggest a better approach
would be to focus on calculating a character vector of column names. Maybe
you can convert your lc1 and lc2 objects to vectors of column names.
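
One possible way to do that conversion, offered only as a sketch: the
candidates list and the pick_first() helper are hypothetical names, not
something from your code. Each slot lists acceptable column names in
order of preference, and the first one present in the data is used.

```r
# Example data frame from the original post.
a1 <- data.frame(P = rep(1, 10), Qr = LETTERS[1:10], b = letters[1:10],
                 R = rep(c("A", "B"), each = 5))

# Hypothetical stand-in for lc2: candidate names per slot, best first.
candidates <- list(C1 = "P", C2 = c("Q", "b"), C3 = "Ra")

# Return the first candidate that is a column of the data, or NULL if none is.
pick_first <- function(cands, available) {
  hits <- cands[cands %in% available]
  if (length(hits) > 0) hits[1] else NULL
}

# unlist() drops the NULL slots, leaving a plain character vector of names.
names.to.keep <- unname(unlist(lapply(candidates, pick_first,
                                      available = names(a1))))
a3 <- a1[, names.to.keep, drop = FALSE]
```

Here the C3 slot finds no match and is simply left out, so the indexing
step never sees a name that is missing from a1.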

Speaking personally, nested ifelse() expressions make me want to get up
and run away. So I'm very reluctant to put any effort into trying to
figure out what they produce. But that's just my preference; others may
feel differently.
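
For what it's worth, the error quoted below appears to come from
ifelse() itself rather than from the nesting: ifelse() is vectorized
and returns something the same length as its test, so a zero-length
NULL cannot be used as one of its values. A plain if/else can return
NULL. A sketch of lc2 written that way, under the assumption that a
missing column should just be left out:

```r
# Example data frame from the original post.
a1 <- data.frame(P = rep(1, 10), Qr = LETTERS[1:10], b = letters[1:10],
                 R = rep(c("A", "B"), each = 5))

cols <- names(a1)

# if/else evaluates a single condition and may return NULL.
lc2 <- list(
  C1 = "P",
  C2 = if ("Q" %in% cols) "Q" else if ("b" %in% cols) "b" else NULL,
  C3 = if ("Ra" %in% cols) "Ra" else NULL
)

# unlist() discards the NULL entries before the names are used as an index.
a3 <- a1[, unlist(lc2, use.names = FALSE), drop = FALSE]
```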

-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 1/10/14 3:24 PM, "Santosh" <santosh2005 at gmail.com> wrote:

>I don't think apropos or indexing would help. I am open to your
>suggestions/tips.
>
>I usually get multiple versions of a dataset (even with the same column
>names). In the source data, I occasionally notice inconsistencies:
>formatting issues, column naming issues, etc.
>
>As shown in the "a1" example, the values that are supposed to be in
>column "Qr" are sometimes in column "b". Such differences between versions
>crop up for various unknown reasons, e.g. when different programmers
>prepare the data set or when the existing practices/processes change.
>
>Likewise, the formats of certain date-time columns (not shown in the
>example) also vary; the date-time format may be "%m/%d/%Y %H:%M",
>"%d %b %Y %H:%M" or "%d%b%Y %H:%M".
>
>So, I would like to use programming methods to pick the right one if it
>is available, or not pick one at all.
>
>Besides, is there an R equivalent of the "%m[/][.]%d[/][,]%y
>[%H[:%M[:%S[.%N]]][%p][[(]%3Z[)]]]" format available in Splus
>(?class.timeDate) for tackling date-time format inconsistencies like
>those mentioned above?
>
>Thanks,
>Santosh
>
>
>On Fri, Jan 10, 2014 at 2:32 PM, Bert Gunter <gunter.berton at gene.com>
>wrote:
>
>> Don's response seems apropos to me. Do you understand indexing,
>> i.e. the "[" operator? If not, you should read An Introduction to R or
>> another tutorial (there are many good ones on the web). If that is not
>> the issue, you need to explain more clearly why his answer does not
>> suffice.
>>
>> Cheers,
>> Bert
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>> (650) 467-7374
>>
>> "Data is not information. Information is not knowledge. And knowledge
>> is certainly not wisdom."
>> H. Gilbert Welch
>>
>>
>>
>>
>> On Fri, Jan 10, 2014 at 1:59 PM, Santosh <santosh2005 at gmail.com> wrote:
>> > My intention is to include certain columns if they meet certain
>> > criteria. For example, if "b" is one of the columns of a1, then keep
>> > it; otherwise don't.
>> >
>> > HTH..
>> > santosh
>> >
>> >
>> > On Fri, Jan 10, 2014 at 1:01 PM, MacQueen, Don <macqueen1 at llnl.gov> wrote:
>> >
>> >> Apologies, but all that ifelse() stuff is too hard to follow.
>> >>
>> >> What I would do is compute a character vector of column names to
>> >> keep, then do
>> >>
>> >>   a1[ , names.to.keep]
>> >>
>> >> -Don
>> >>
>> >> --
>> >> Don MacQueen
>> >>
>> >> Lawrence Livermore National Laboratory
>> >> 7000 East Ave., L-627
>> >> Livermore, CA 94550
>> >> 925-423-1062
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On 1/10/14 12:53 PM, "Santosh" <santosh2005 at gmail.com> wrote:
>> >>
>> >> >Dear Rxperts...
>> >> >
>> >> >I would like to conditionally include an element (as a column) in a
>> >> >dataframe. Please see the  sample code below:
>> >> >
>> >> >There is a correction to the earlier post.. my apologies...
>> >> >a1 <- data.frame(P=rep(1,10),Qr=LETTERS[1:10],b=letters[1:10],
>> >> >R=rep(c("A","B"),each=5))
>> >> >
>> >> > lc1 <- list(C1 = "P",
>> >> >   C2 = ifelse(is.element("Q",names(a1)),"Q",
>> >> >     ifelse(is.element("b",names(a1)),"b",NULL)),
>> >> >   C3="R")
>> >> > lc2 <- list(C1 = "P",
>> >> >   C2 = ifelse(is.element("Q",names(a1)),"Q",
>> >> >     ifelse(is.element("b",names(a1)),"b",NULL)),
>> >> >   C3=ifelse(is.element("Ra",names(a1)),"Ra",NULL))
>> >> >*The error for the above:*
>> >> >Error in ifelse(is.element("Ra", names(a1)), "Ra", NULL) :
>> >> >  replacement has length zero
>> >> >In addition: Warning message:
>> >> >In rep(no, length.out = length(ans)) :
>> >> >  'x' is NULL so the result will be NULL
>> >> >
>> >> >a2 <- subset(a1, sel=unlist(lc1)) # this works
>> >> >a3 <- subset(a1, sel=unlist(lc2)) # this doesn't work
>> >> >
>> >> >Is there a way to dynamically include columns in a dataframe?
>> >> >
>> >> >Regards,
>> >> >santosh
>> >> >
>> >> >
>> >> >On Fri, Jan 10, 2014 at 12:45 PM, Santosh <santosh2005 at gmail.com> wrote:
>> >> >
>> >> >> Dear Rxperts...
>> >> >>
>> >> >> I would like to conditionally include an element (as a column) in a
>> >> >> dataframe. Please see the sample code below:
>> >> >>
>> >> >> a1 <- data.frame(P=rep(1,10),Qr=LETTERS[1:10],b=letters[1:10],
>> >> >>   R=rep(c("A","B"),each=5))
>> >> >>
>> >> >> lc1 <- list(C1 = "P",
>> >> >>   C2 = ifelse(is.element("Q",names(a1)),"Q",
>> >> >>     ifelse(is.element("b",names(a1)),"b",NULL)),
>> >> >>   C3="R")
>> >> >> lc2 <- list(C1 = "P",
>> >> >>   C2 = ifelse(is.element("Q",names(a1)),"Q",
>> >> >>     ifelse(is.element("b",names(a1)),"b",NULL)),
>> >> >>   C3="Ra")
>> >> >>
>> >> >> a2 <- subset(a1, sel=unlist(lc1)) # this works
>> >> >> a3 <- subset(a1, sel=unlist(lc2)) # this doesn't
>> >> >>
>> >> >> Is there a way to dynamically include columns in a dataframe?
>> >> >>
>> >> >> Regards,
>> >> >> santosh
>> >> >>
>> >> >
>> >> >______________________________________________
>> >> >R-help at r-project.org mailing list
>> >> >https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >PLEASE do read the posting guide
>> >> >http://www.R-project.org/posting-guide.html
>> >> >and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >>
>> >
>>
>



