[R] duplicated() with long vectors

Stephen Politzer-Ahles politzerahless at gmail.com
Wed Dec 5 23:04:12 CET 2012


Hi Sarah,

Thanks a lot for your explanation. I was mistakenly under the
impression that duplicated() only looked at the immediately preceding
element, not at all preceding elements.
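
For illustration, here is a minimal sketch of that behaviour. The vector
x is a made-up toy example, not the data from this thread:

```r
# Hypothetical toy vector (not the 'verylong' data discussed in this thread)
x <- c(1, 1, 2, 1, 3, 3, 2)

# duplicated() marks an element TRUE if the same value occurs ANYWHERE
# earlier in the vector, not only in the position right before it
dup <- duplicated(x)
print(dup)
```

Here the fourth element (1) and the last element (2) are flagged even
though each differs from its immediate predecessor, because both values
already appeared earlier in x.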

What I was trying to do was get a vector saying, for each item,
whether that item is the same as the preceding item. Now that I think
of it, I could do this easily by copying the vector, shifting it over
by one (removing the first element and adding something to the end),
and then comparing the elements of the two vectors directly.
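
A rough sketch of that shifted comparison, again on a made-up vector x.
It assumes x is non-empty and contains no NA values. Rather than padding
the end, this version drops one element from each copy and puts FALSE
in front, since the first item has no predecessor. The rle()-based
line is only a cross-check along the lines Sarah suggested:

```r
# Hypothetical toy vector (not the 'verylong' data discussed in this thread)
x <- c(1, 1, 2, 1, 3, 3, 2)
n <- length(x)

# Compare elements 2..n with elements 1..(n-1); the first element has
# no predecessor, so it is marked FALSE by hand
same_as_prev <- c(FALSE, x[-1] == x[-n])
print(same_as_prev)

# Cross-check with rle(): within each run of equal values, every
# position after the first repeats its predecessor
same_as_prev_rle <- sequence(rle(x)$lengths) > 1
print(identical(same_as_prev, same_as_prev_rle))
```

Both approaches are vectorised, so they should avoid the slow
element-by-element for loop.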

Best,
Steve

On Wed, Dec 5, 2012 at 3:08 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> Hi,
>
> duplicated() doesn't just look at consecutive values, but anywhere in
> the object. Your 12320-element vector has only 48 distinct values,
> and all of them occur before the last 30 elements, so duplicated()
> returns TRUE for every one of those last 30.
>
> You might be looking for something involving rle(). What are you
> trying to accomplish?
>
> Sarah
>
> On Wed, Dec 5, 2012 at 3:53 PM, Stephen Politzer-Ahles
> <politzerahless at gmail.com> wrote:
>> Hello,
>>
>> duplicated() does not seem to work for a long vector. For example, if
>> you download the data from
>> https://docs.google.com/open?id=0B6-m45Jvl3ZmNmpaSlJWMXo5bmc (a vector
>> with about 12,000 numbers) and then run the following code which does
>> duplicated() over the whole vector but just shows the last 30
>> elements:
>>
>> data.frame( tail(verylong, 30), tail(duplicated(verylong), 30) )
>>
>> you'll see that at the end of the very long vector everything is
>> listed as a duplicate of the preceding element (even though it
>> shouldn't be). On the other hand, if you run the following code which
>> just takes out the last 30 elements of the vector and does duplicated
>> on them:
>>
>> data.frame( tail(verylong, 30), duplicated(tail(verylong, 30)) )
>>
>> you get the correct results (FALSE shows up wherever the value in the
>> first column changes). Does anyone know why this happens, and if
>> there's a fix? I notice the documentation for duplicated() says: "Long
>> vectors are supported for the default method of duplicated, but may
>> only be usable if nmax is supplied."  But I've tried running this with
>> a high value of nmax given, and it still gives me the same problem.
>>
>> So far the only way I've figured out to get this duplicated()-like
>> vector is to use a for loop going through one item at a time, but that
>> takes about a minute to run.
>>
>> Best,
>> Steve Politzer-Ahles
>>
>
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org



-- 
Stephen Politzer-Ahles
University of Kansas
Linguistics Department
http://people.ku.edu/~sjpa/



