[R] data.frame transformation

Bill.Venables at csiro.au
Tue Mar 15 00:16:49 CET 2011


It is possible to do it with numeric comparisons as well, but to make life comfortable you need to turn off the warning system temporarily, because converting entries such as "check" to numeric produces a coercion warning.

df <- data.frame(q1 = c(0,0,33.33,"check"),
                 q2 = c(0,33.33,"check",9.156),
                 q3 = c("check","check",25,100),
                 q4 = c(7.123,35,100,"check"))

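conv <- function(x, cutoff) {
	oldOpt <- options(warn = -1)   # silence "NAs introduced by coercion"
	on.exit(options(oldOpt))       # restore the previous warning level on exit
	x <- as.factor(x)
	lev <- as.numeric(levels(x))   # non-numeric levels such as "check" become NA
	levels(x)[!is.na(lev) & lev < cutoff] <- "."   # relabel small numeric levels as "."
	x
}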

Check:
> (df1 <- data.frame(lapply(df, conv, cutoff = 10)))
     q1    q2    q3    q4
1     .     . check     .
2     . 33.33 check    35
3 33.33 check    25   100
4 check     .   100 check
> 
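
A variant of the same idea, as a rough sketch only: it assumes the columns can be kept as character vectors rather than factors, and it uses suppressWarnings() around the conversion instead of changing options(). The name mask_below and its mark argument are made up for illustration.

```r
# Hypothetical helper: replace numeric-looking entries below `cutoff` with `mark`.
mask_below <- function(x, cutoff, mark = ".") {
  x <- as.character(x)
  # Non-numeric strings such as "check" become NA; silence only that warning.
  num <- suppressWarnings(as.numeric(x))
  x[!is.na(num) & num < cutoff] <- mark
  x
}

df <- data.frame(q1 = c(0, 0, 33.33, "check"),
                 q2 = c(0, 33.33, "check", 9.156),
                 q3 = c("check", "check", 25, 100),
                 q4 = c(7.123, 35, 100, "check"),
                 stringsAsFactors = FALSE)

df2 <- df
df2[] <- lapply(df, mask_below, cutoff = 10)  # assigning into df2[] keeps the data frame
print(df2)
```

Assigning into df2[] keeps the data frame structure, so nothing has to be coerced back from a matrix afterwards.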

Bill Venables. 

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius
Sent: Tuesday, 15 March 2011 6:29 AM
To: andrija djurovic
Cc: r-help at r-project.org
Subject: Re: [R] data.frame transformation


On Mar 14, 2011, at 3:51 PM, andrija djurovic wrote:

> I would like to hide cells with values less than 10%, so "." or just
> "" makes no difference to me. Also, I used apply combined with
> as.character:
>
> apply(df, 2, function(x)  ifelse(as.character(x) < 10,".",x))
>
> This is probably not a good solution, but it works, except that I
> lose the row names, and because of that I was wondering if there is
> some other way to do this.
>
> Anyway, thank you both; I will try to do this before combining numbers
> and strings.

I saw your later assertion that it didn't work, which surprised me. My
version of your data followed my advice not to use factors, and your
effort did succeed when the columns were character rather than factor.
I put back the row numbers by coercing back to a data.frame, since
`apply` returns a matrix.

 > df<-data.frame(q1=c(0,0,33.33,"check"),q2=c(0,33.33," check",9.156),
 + q3=c("check","check",25,100),q4=c(7.123,35,100,"check"),
 + stringsAsFactors=FALSE)
 > as.data.frame(apply(df, 2, function(x)  ifelse(as.character(x) < 10,".",x)))
      q1    q2    q3    q4
1     .     . check 7.123
2     . 33.33 check    35
3 33.33     .    25   100
4 check 9.156   100 check
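
To see the point about `apply` returning a matrix, here is a minimal sketch on a made-up two-column data frame. The data and the names m and back are illustrative only, not taken from the thread.

```r
# Toy character data frame (illustrative data only).
toy <- data.frame(q1 = c("0", "33.33"), q2 = c("check", "25"),
                  stringsAsFactors = FALSE)

# An identity function is enough to inspect what apply() hands back.
m <- apply(toy, 2, function(x) x)
print(is.matrix(m))          # apply() over a data frame yields a matrix

# Coercing restores a data frame, as done above.
back <- as.data.frame(m, stringsAsFactors = FALSE)
print(is.data.frame(back))
```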

There is a danger in relying on character collation: any string whose
leading character sorts below "1", such as a <blank> or other
punctuation, will get "dotted".

 > "," < "1"
[1] TRUE
 > "." < "1"
[1] TRUE
 > "-" < "1"
[1] TRUE

And "1.check" would also get "dotted":

 > "1.check" < 10
[1] TRUE
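
The same coercion rule also cuts the other way, which is worth checking against the output above. When one side of `<` is a character string, the number is converted to a string and the two are compared as text, so a value such as "9.156" is not treated as less than 10. A small sketch, with the numeric conversion shown for contrast (the variable names are only for illustration):

```r
# With a character operand, 10 is converted to "10" and compared as text.
chr_cmp <- "9.156" < 10               # FALSE: "9" sorts after "1"
num_cmp <- as.numeric("9.156") < 10   # TRUE: compared as numbers
print(c(character = chr_cmp, numeric = num_cmp))
```

This would explain why 7.123 and 9.156 are still shown in the result above, and it is one more reason to prefer an explicit numeric conversion.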

>
> Andrija
>
> On Mon, Mar 14, 2011 at 8:11 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Mar 14, 2011, at 2:52 PM, andrija djurovic wrote:
>
> Hi R users,
>
> I have the following data frame
>
> df<-data.frame(q1=c(0,0,33.33,"check"),q2=c(0,33.33,"check",9.156),
> q3=c("check","check",25,100),q4=c(7.123,35,100,"check"))
>
> and I would like to replace every element that is less than 10
> with . (dot)
> in order to obtain this:
>
>    q1    q2    q3    q4
> 1     .     . check     .
> 2     . 33.33 check    35
> 3 33.33 check    25   100
> 4 check     .   100 check
>
> I had a lot of difficulties because each variable is a factor.
>
> Right, so comparisons with "<" will not work: on a factor they give
> NA with a warning.  I would sidestep the factor problem with
> stringsAsFactors=FALSE in the data.frame call. You might want to
> reconsider the "." as a missing value. If you are coming from a SAS
> background, you should try to get comfortable with NA or
> NA_character_ as a value.
>
>
> df<-data.frame(q1=c(0,0,33.33,"check"),q2=c(0,33.33,"check",9.156),
>  q3=c("check","check",25,100),q4=c(7.123,35,100,"check"),  
> stringsAsFactors=FALSE)
>
> is.na(df) <- t(apply(df, 1, function(x)  as.numeric(x) < 10))
>
> Warning messages:
> 1: In FUN(newX[, i], ...) : NAs introduced by coercion
> 2: In FUN(newX[, i], ...) : NAs introduced by coercion
> 3: In FUN(newX[, i], ...) : NAs introduced by coercion
> 4: In FUN(newX[, i], ...) : NAs introduced by coercion
> > df
>     q1    q2    q3    q4
> 1  <NA>  <NA> check  <NA>
> 2  <NA> 33.33 check    35
> 3 33.33 check    25   100
> 4 check  <NA>   100 check
>
>
> Could someone help me with this?
>
> Thanks in advance for any help.
>
> Andrija
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
>

David Winsemius, MD
West Hartford, CT
