[R] generate ordered categorical variable in R

Bert Gunter bgunter.4567 at gmail.com
Wed Sep 16 22:40:34 CEST 2015


Nope. Take it back. I stand uncorrected.

> system.time(z <- sample(1:10, 1e6, replace = TRUE))
   user  system elapsed
  0.045   0.001   0.047

> system.time(z <- sample.int(10, 1e6, replace = TRUE))
   user  system elapsed
  0.012   0.000   0.013


When x is a vector such as 1:10, sample() has to subscript x with the
result of sample.int(length(x), ...); calling sample.int() directly
skips that step.

But I would agree that the difference is likely almost always unnoticeable.
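
A minimal sketch of why the two calls differ, assuming the sample() source
quoted further down in this thread is what your R version uses: with a vector
argument such as 1:10, sample() takes the subscripting branch, whereas a
single number such as 1e10 is passed straight to sample.int(), which would
explain why the scalar benchmark quoted below showed no difference. The
names a, b and d are only illustrative.

```r
# Same seed, three routes to the same draw.
set.seed(1)
a <- sample(1:10, 20, replace = TRUE)            # vector argument: subscripting branch

set.seed(1)
b <- (1:10)[sample.int(10, 20, replace = TRUE)]  # what that branch does internally

set.seed(1)
d <- sample.int(10, 20, replace = TRUE)          # direct call, no subscripting step

identical(a, b)
identical(a, d)

# Rough timing comparison; the numbers vary by machine and R version.
system.time(sample(1:10, 1e6, replace = TRUE))
system.time(sample.int(10, 1e6, replace = TRUE))
```

Since the draws agree, the choice between the two is only about the extra
subscripting cost, not about the values returned.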


-- Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Wed, Sep 16, 2015 at 1:34 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> Yes. Thanks Marc. I stand corrected.
>
> -- Bert
> Bert Gunter
>
>
> On Wed, Sep 16, 2015 at 1:28 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
>>
>>> On Sep 16, 2015, at 1:06 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>>
>>> Yikes! The uniform distribution is a **continuous** distribution over
>>> an interval. You seem to want to sample over a discrete distribution.
>>> See ?sample for that, as in:
>>>
>>> sample(1:4, 100, replace = TRUE)
>>>
>>> ## or for this special case and faster
>>>
>>> sample.int(4, size = 100, replace = TRUE)
>>
>>
>> Bert,
>>
>> I am not sure that it is really faster, since internally, sample() calls sample.int():
>>
>>> sample
>> function (x, size, replace = FALSE, prob = NULL)
>> {
>>     if (length(x) == 1L && is.numeric(x) && x >= 1) {
>>         if (missing(size))
>>             size <- x
>>         sample.int(x, size, replace, prob)
>>     }
>>     else {
>>         if (missing(size))
>>             size <- length(x)
>>         x[sample.int(length(x), size, replace, prob)]
>>     }
>> }
>>
>>
>> set.seed(1)
>>
>>> system.time(x1 <- sample(1e10, 1e8, replace = TRUE))
>>    user  system elapsed
>>   2.755   0.170   2.925
>>
>>
>> set.seed(1)
>>> system.time(x2 <- sample.int(1e10, 1e8, replace = TRUE))
>>    user  system elapsed
>>   2.767   0.183   2.951
>>
>>
>>> all(x1 == x2)
>> [1] TRUE
>>
>>
>> Regards,
>>
>> Marc
>>
>>
>>>
>>> Cheers,
>>> Bert
>>>
>>> Bert Gunter
>>>
>>>
>>> On Wed, Sep 16, 2015 at 10:11 AM, thanoon younis
>>> <thanoon.younis80 at gmail.com> wrote:
>>>> Dear R users,
>>>>
>>>> I want to generate an ordered categorical variable as a 200x1 vector
>>>> with categories 1 to 4. I tried this code:
>>>>
>>>> Q1=runif(200,1,4)
>>>>
>>>> but the results are not just 1, 2, 3, 4; they have decimals, like
>>>> 1.244, 2.342, 4.321 and so on. How can I generate a vector, and also
>>>> a matrix, of ordered categorical values without decimals, just
>>>> 1, 2, 3, 4, 1, 2, 3, 4, ...?
>>>>
>>>> Many thanks in advance
>>
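
For the vector-and-matrix part of the original question, one possible sketch
follows. It assumes equal probability for each of the four categories; the
seed, the number of matrix columns (p = 5) and all variable names are
assumptions for illustration, not anything stated in the thread.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

n <- 200  # number of observations, from the question
k <- 4    # number of categories, from the question

# Integer codes 1..k, each equally likely
q1 <- sample.int(k, size = n, replace = TRUE)

# The same values as an ordered factor (1 < 2 < 3 < 4)
q1_ord <- factor(q1, levels = seq_len(k), ordered = TRUE)

# An n x p matrix of category codes; p = 5 is an assumed column count
p <- 5
Q <- matrix(sample.int(k, size = n * p, replace = TRUE), nrow = n, ncol = p)

table(q1_ord)  # counts per category
dim(Q)
```

If unequal category frequencies are wanted, sample.int() also accepts a prob
argument giving one weight per category.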


