[R] creating a scale (factor) based on a continuous variable nested within levels of factor

hind lazrak hindstata at gmail.com
Sun Nov 7 17:29:50 CET 2010


Hello Dennis and r-helpers

Thank you very much for your reply.

The problem is solved now, even though I don't see why the command that I
had posted as an alternative solution did not work:
hDatPretty$liking <- by(hDatPretty$rating, hDatPretty$songId, function(z) {
   cut(hDatPretty$z, c(-10, -4, 4, 10),
   labels = c('dislike', 'neutral', 'like'))})
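
A likely reason, for the record: inside the anonymous function the ratings
for one song arrive as the argument z, but the call uses hDatPretty$z. The
data frame has no column named z, so that expression is NULL, and cut() stops
with "'x' must be numeric". Note also that by() returns one result per song
rather than one value per row, so it would not fit into a 1665-row column
even with z used correctly. A minimal sketch on a made-up data frame (toy and
its values are hypothetical stand-ins for hDatPretty):

```r
# Hypothetical stand-in for hDatPretty: 3 songs, 4 rows each
toy <- data.frame(
  songId = factor(rep(1:3, each = 4)),
  rating = rep(c(4, -9, 7), each = 4)
)

# There is no column called "z", so `$` returns NULL
z_column <- toy$z
is.null(z_column)

# cut() refuses non-numeric input, which reproduces the posted error
attempt <- try(cut(z_column, c(-10, -4, 4, 10)), silent = TRUE)
inherits(attempt, "try-error")

# cut() is vectorised, so it can be applied to the whole column at once
toy$liking <- cut(toy$rating, breaks = c(-11, -4, 4, 11),
                  labels = c("dislike", "neutral", "like"))
table(toy$songId, toy$liking)
```

Because the rating is constant within a song, classifying row by row already
gives one consistent label per song, so no grouping by songId is needed.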

Hind

On Sun, Nov 7, 2010 at 1:45 AM, Dennis Murphy <djmuser at gmail.com> wrote:
> Hi:
>
> If I get your meaning, the cut() function would appear to be your friend in
> this problem.
>
> hDatPretty$liking <- cut(hDatPretty$rating, breaks = c(-11, -4, 4, 11),
>                          labels = c('dislike', 'neutral', 'like'))
>
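
A small check of how these breaks behave, using made-up ratings (the vector
names below are hypothetical). By default cut() builds intervals that are
open on the left and closed on the right, so -4 falls in 'dislike' and 4 in
'neutral', as requested. Starting the breaks at -11 rather than -10 matters
because a rating equal to the lowest break becomes NA unless
include.lowest = TRUE is set:

```r
ratings <- c(-10, -4, -3, 4, 5, 10)   # made-up values on and near the boundaries
labs <- c("dislike", "neutral", "like")

# Breaks that extend beyond the data range: every rating gets a label
wide <- cut(ratings, breaks = c(-11, -4, 4, 11), labels = labs)

# Breaks that start exactly at the minimum: -10 is not in (-10, -4], so it is NA
tight <- cut(ratings, breaks = c(-10, -4, 4, 10), labels = labs)

# include.lowest = TRUE closes the first interval on the left as well
tight_incl <- cut(ratings, breaks = c(-10, -4, 4, 10), labels = labs,
                  include.lowest = TRUE)

data.frame(ratings, wide, tight, tight_incl)
```
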
> HTH,
> Dennis
>
> On Sat, Nov 6, 2010 at 11:15 PM, hind lazrak <hindstata at gmail.com> wrote:
>>
>> Hello R-helpers
>>
>>
>> I hope that my subject line is not deterring anyone from helping me out :)
>> I have been stuck for a few hours now, and I can't seem to pinpoint
>> where the problem is.
>>
>>
>> I have a data.frame which is structured as follows:
>> str(hDatPretty)
>> 'data.frame': 1665 obs. of  8 variables:
>> $ time    : num  0 1.02 2.05 3.07 4.09 ...
>> $ hr      : num  62.4 63.6 64.6 65.5 66.2 ...
>> $ emg     : num  3.3 3.42 3.52 3.57 3.6 ...
>> $ respRate: num  50.4 50.6 50.7 50.8 50.9 ...
>> $ scr     : num  1.7 1.72 1.73 1.74 1.75 ...
>> $ skinTemp: num  28.1 28.2 28.2 28.2 28.2 ...
>> $ rating  : num  4 4 4 4 4 4 4 4 4 4 ...
>> $ songId  : Factor w/ 37 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
>>
>> It consists of ratings ($rating) given by people (here the id variable
>> is not indicated as this is a subset with only one person) for each of
>> the 37 songs ($songId) they listen to.
>> While they are listening we measure physiological responses (emg,
>> hr,...) every second over a period of 45 seconds.
>> Here's a quick peek at the data
>> head(hDatPretty)
>>
>>        time       hr      emg respRate      scr skinTemp rating songId
>> 1.1 0.000000 62.42135 3.300562 50.40538 1.703105 28.14489      4      1
>> 1.2 1.022727 63.59057 3.424884 50.59292 1.718110 28.16189      4      1
>> 1.3 2.045455 64.59840 3.515219 50.73523 1.730594 28.17836      4      1
>> 1.4 3.068182 65.47707 3.573151 50.83909 1.740594 28.19422      4      1
>> 1.5 4.090909 66.22192 3.597183 50.90466 1.748086 28.20948      4      1
>> 1.6 5.113636 66.89209 3.588530 50.91911 1.753385 28.22414      4      1
>>
>> So, every study participant gives one rating (from -10 to 10) for each
>> song.
>> If we tabulate the data, this is what we have (for the first 10 songs):
>> table(hDatPretty$songId, hDatPretty$rating)
>>
>>
>>     -10 -9 -7 -3  0  1  3  4  5  7  8  9 10
>>  1    0  0  0  0  0  0  0 45  0  0  0  0  0  # song 1 gets a score of 4
>>  2    0  0  0  0  0  0 45  0  0  0  0  0  0  # song 2 gets a score of 3
>>  3    0  0 45  0  0  0  0  0  0  0  0  0  0  #.
>>  4    0 45  0  0  0  0  0  0  0  0  0  0  0
>>  5    0  0  0  0  0  0  0  0  0 45  0  0  0
>>  6    0  0  0  0  0  0  0  0  0  0  0  0 45
>>  7    0  0  0  0  0  0  0  0  0  0 45  0  0  #song 7 gets a score of 8
>>  8    0  0  0 45  0  0  0  0  0  0  0  0  0
>>  9    0  0  0  0  0  0  0 45  0  0  0  0  0
>>  10   0  0  0  0  0 45  0  0  0  0  0  0  0
>>
>> What I would like to do is to create another scale (a factor) based
>> on the ratings, with the following levels:
>> [-10, -4] == dislike (-4 included)
>> (-4, 4]   == neutral (-4 excluded, 4 included)
>> (4, 10]   == like    (4 excluded)
>>
>> My code to obtain this new variable
>>
>> liking <- numeric(length(hDatPretty$rating))
>> liking[hDatPretty$rating <= -4] <- 'dislike'
>> liking[hDatPretty$rating > -4 & hDatPretty$rating <= 4] <- 'neutral'
>> liking[hDatPretty$rating > 4] <- 'like'
>>
>> hDatPretty['liking']<- factor(liking)
>>
>> The problem is that, for some reason, it assigns different values to
>> the same rating for some songs but not all (?)
>> See for example
>>
>>   dislike like neutral
>> 1        0    8      37   ## problem: song 1 gets two 'liking' scores
>>                           ## although its rating is constant
>> 2        0    0      45
>> 3       45    0       0
>> 4       45    0       0
>> 5        0   45       0
>> 6        0   45       0
>> 7        0   45       0
>> 8        0    0      45
>> 9        0   10      35  ## here is a similar problem
>>
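
The thread does not say what caused this. One possibility, offered only as a
guess: both affected songs (1 and 9) have a rating of 4, which sits exactly
on the neutral/like boundary. If some stored ratings are a hair above 4 (for
instance after interpolation or resampling), they print as 4 but still
satisfy rating > 4. A sketch with made-up numbers (x and the perturbation
1e-12 are assumptions, not values taken from the data):

```r
# Made-up ratings: the middle value is very slightly above 4
x <- c(4, 4 + 1e-12, 4)

print(x)                    # default printing uses 7 significant digits
above <- x > 4              # the perturbed value compares as greater than 4
not_whole <- x != round(x)  # flags values that are not exact integers

# Rounding before classifying gives the same label for all three values
liking <- cut(round(x), breaks = c(-11, -4, 4, 11),
              labels = c("dislike", "neutral", "like"))
table(liking)
```

If sum(hDatPretty$rating != round(hDatPretty$rating)) is greater than zero on
the real data, rounding the ratings before classifying them may be worth
trying.
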
>> Could you PLEASE help me with the proper code to obtain my 'liking'
>> variable for each of the songs, based on the rating each song gets?
>>
>> Many thanks.
>>
>>
>> Hind
>> P.S.: I have also tried cut(), as in the code below, unsuccessfully:
>>
>> hDatPretty$liking <- by(hDatPretty$rating, hDatPretty$songId,
>>    function (z) { cut(hDatPretty$z, c(-10, -4,4,10),
>>    labels=c('dislike', 'neutral', 'like'))})
>>
>> Error in cut.default(hDatPretty$z, c(-10, -4, 4, 10), labels =
>> c("dislike",  :
>>  'x' must be numeric
>>
>> again thank you.
>>
>
>


