[R] data.frame transformation

David Winsemius dwinsemius at comcast.net
Mon Mar 14 21:28:54 CET 2011


On Mar 14, 2011, at 3:51 PM, andrija djurovic wrote:

> I would like to hide cells with values less than 10%, so "." or just
> "" makes no difference to me. I also used apply combined with
> as.character:
>
> apply(df, 2, function(x)  ifelse(as.character(x) < 10,".",x))
>
> This is probably not a good solution, but it works, except that I
> lose the row names, so I was wondering if there is some
> other way to do this.
>
> Anyway, thank you both. I will try to do this before combining numbers
> and strings.

I saw your later assertion that it didn't work, which surprised me. My
version of your data followed my advice not to use factors, and your
approach did succeed when the columns were character rather than factor.
Because `apply` returns a matrix, I put the row numbers back by coercing
the result to a data.frame.
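For columns that have already become factors, here is a minimal sketch of what goes wrong and one way to convert them. The one-column data frame `f` is made up for illustration, and `stringsAsFactors = TRUE` is spelled out because R 4.0.0 changed the default to FALSE.

```r
# Illustrative data: stringsAsFactors = TRUE reproduces the pre-R 4.0.0 default
f <- data.frame(q1 = c(0, 0, 33.33, "check"), stringsAsFactors = TRUE)

# '<' is not meaningful for factors: R warns and returns NA for every element
suppressWarnings(f$q1 < 10)

# as.numeric() on a factor gives the internal level codes, not the labels
as.numeric(f$q1)                                   # 1 1 2 3

# Going through as.character() recovers the values the labels show
suppressWarnings(as.numeric(as.character(f$q1)))   # values 0, 0, 33.33, NA

# Convert every factor column to character, keeping the data.frame
f[] <- lapply(f, function(x) if (is.factor(x)) as.character(x) else x)
str(f)
```

The `as.character()` step matters: converting a factor straight to numeric silently returns level codes, which is a common source of wrong results.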

 > df<-data.frame(q1=c(0,0,33.33,"check"),q2=c(0,33.33," check",9.156),
+ q3=c("check","check",25,100),q4=c(7.123,35,100,"check"),
+ stringsAsFactors=FALSE)
 > as.data.frame(apply(df, 2, function(x)  ifelse(as.character(x) < 10,".",x)))
      q1    q2    q3    q4
1     .     . check 7.123
2     . 33.33 check    35
3 33.33     .    25   100
4 check 9.156   100 check

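Relying on character collation is risky: any string whose leading
character sorts below "1", such as a <blank> or other punctuation, will
get "dotted". That is why " check" (with its leading blank) in q2 was
replaced above. Conversely, "7.123" and "9.156" were not dotted, because
as strings they sort after "10" ("7" and "9" come after "1").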

 > "," < "1"
[1] TRUE
 > "." < "1"
[1] TRUE
 > "-" < "1"
[1] TRUE

And "1.check" would also get "dotted":

 > "1.check" < 10
[1] TRUE
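One way around the collation problem, sketched here as an alternative rather than as the approach from this thread, is to compare numbers instead of strings: convert each column with `as.numeric()`, dot only the entries that parse as numbers below 10, and assign through `df[]` so the result stays a data.frame with its row names (unlike `apply`, which returns a matrix). `dot_small` is a hypothetical helper name; the cutoff of 10 comes from the question.

```r
# David's version of the data, including the " check" with a leading blank
df <- data.frame(q1 = c(0, 0, 33.33, "check"),
                 q2 = c(0, 33.33, " check", 9.156),
                 q3 = c("check", "check", 25, 100),
                 q4 = c(7.123, 35, 100, "check"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: replace entries that parse as numbers below `cutoff`
dot_small <- function(x, cutoff = 10) {
  num <- suppressWarnings(as.numeric(x))   # "check", " check" become NA
  x[!is.na(num) & num < cutoff] <- "."     # non-numeric text is left alone
  x
}

# Assigning into df[] keeps the data.frame class and its row names
df[] <- lapply(df, dot_small)
df
```

With this data, 0, 7.123 and 9.156 are dotted while " check" and "1.check"-style text are kept, because they do not parse as numbers.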

>
> Andrija
>
> On Mon, Mar 14, 2011 at 8:11 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Mar 14, 2011, at 2:52 PM, andrija djurovic wrote:
>
> Hi R users,
>
> I have the following data frame
>
> df<-data.frame(q1=c(0,0,33.33,"check"),q2=c(0,33.33,"check",9.156),
> q3=c("check","check",25,100),q4=c(7.123,35,100,"check"))
>
> and I would like to replace every element that is less than 10
> with . (dot) in order to obtain this:
>
>    q1    q2    q3    q4
> 1     .     . check     .
> 2     . 33.33 check    35
> 3 33.33 check    25   100
> 4 check     .   100 check
>
> I had a lot of difficulties because each variable is a factor.
>
> Right, and comparisons with "<" are not meaningful for factors: R
> warns and returns NA. I would sidestep the factor problem with
> stringsAsFactors=FALSE in the data.frame call. You might also want to
> reconsider using "." as a missing value. If you are coming from a SAS
> background, you should try to get comfortable with NA or NA_character_
> as a value.
>
>
> df<-data.frame(q1=c(0,0,33.33,"check"),q2=c(0,33.33,"check",9.156),
>  q3=c("check","check",25,100),q4=c(7.123,35,100,"check"),  
> stringsAsFactors=FALSE)
>
> is.na(df) <- t(apply(df, 1, function(x)  as.numeric(x) < 10))
>
> Warning messages:
> 1: In FUN(newX[, i], ...) : NAs introduced by coercion
> 2: In FUN(newX[, i], ...) : NAs introduced by coercion
> 3: In FUN(newX[, i], ...) : NAs introduced by coercion
> 4: In FUN(newX[, i], ...) : NAs introduced by coercion
> > df
>     q1    q2    q3    q4
> 1  <NA>  <NA> check  <NA>
> 2  <NA> 33.33 check    35
> 3 33.33 check    25   100
> 4 check  <NA>   100 check
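A variant of the NA approach, sketched under the same data: building the logical mask column by column with `sapply()` gives a matrix already shaped like the data frame, so no `t()` is needed, and `suppressWarnings()` silences the expected coercion warnings for "check". Treating non-numeric entries as "not below 10" (the `!is.na(num)` term) is a choice made in this sketch, not something the thread specifies.

```r
df <- data.frame(q1 = c(0, 0, 33.33, "check"),
                 q2 = c(0, 33.33, "check", 9.156),
                 q3 = c("check", "check", 25, 100),
                 q4 = c(7.123, 35, 100, "check"),
                 stringsAsFactors = FALSE)

# Logical matrix with the same shape as df:
# TRUE where an entry parses as a number below 10
mask <- sapply(df, function(x) {
  num <- suppressWarnings(as.numeric(x))  # non-numeric text becomes NA
  !is.na(num) & num < 10                  # NA entries count as FALSE
})

is.na(df) <- mask   # set the flagged cells to NA
df
```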
>
>
> Could someone help me with this?
>
> Thanks in advance for any help.
>
> Andrija
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
>

David Winsemius, MD
West Hartford, CT


