[R] [FORGED] Re: remove

Bert Gunter bgunter.4567 at gmail.com
Sun Feb 12 17:19:25 CET 2017


My understanding was that the discordant names had already been identified. So
in the example the OP gave, removing rows with first = "Alex" is done
by:

df[df$first !="Alex",]

If that is not the case, as others have pointed out, various forms of
tapply() (by, ave, etc.) can be used. I agree that that is not so
"basic," so I apologize if my understanding was incorrect.
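
For the case where the discordant names are not known in advance, here
is a minimal sketch of the ave() route. It assumes the toy data from the
original post; the names "dat", "n_last" and "keep" are chosen here for
illustration only, and stringsAsFactors = FALSE is set so that the name
columns are plain character vectors.

```r
# Toy data from the original post; "dat" is an illustrative name.
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 Jack
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West")

# For every row, count the distinct last names recorded for that row's
# first name. ave() applies FUN within each group of dat$first and
# returns a vector aligned with the rows of dat.
n_last <- ave(seq_len(nrow(dat)), dat$first,
              FUN = function(i) length(unique(dat$last[i])))

# Keep only people with exactly one last name, then tidy the result.
keep <- dat[n_last == 1, ]
keep <- keep[order(keep$first, keep$week), ]
rownames(keep) <- NULL
print(keep)
```

Passing row indices to ave(), rather than the last names themselves,
keeps the counts numeric; handing it a character column would coerce
the counts to strings.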

Cheers,
Bert




Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sat, Feb 11, 2017 at 10:04 PM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>
> On 12/02/17 18:36, Bert Gunter wrote:
>>
>> Basic stuff!
>>
>> Either subscripting or ?subset.
>>
>> There are many good R tutorials on the web. You should spend some
>> (more?) time with some.
>
>
> Uh, Bert, perhaps I'm being obtuse (a common occurrence) but it doesn't seem
> basic to me.  The only way that I can see how to go at it is via
> a for loop:
>
> rdln <- function(X) {
> # Remove discordant last names.
>     ok <- logical(nrow(X))
>     for(nm in unique(X$first)) {
>         xxx <- unique(X$last[X$first==nm])
>         if(length(xxx)==1) ok[X$first==nm] <- TRUE
>     }
>     Y <- X[ok,]
>     Y <- Y[order(Y$first),]
>     rownames(Y) <- seq_len(nrow(Y))  # seq_len() is safe when no rows remain
>     Y
> }
>
> Calling the toy data frame "melvin" rather than "df" (since "df" is the name
> of the built-in F density function, it is bad form to use it as the name of
> another object) I get:
>
>> rdln(melvin)
>   first week last
> 1   Bob    1 John
> 2   Bob    2 John
> 3   Bob    3 John
> 4  Cory    1 Jack
> 5  Cory    2 Jack
>
> which is the desired output.  If there is a "basic stuff" way to do this
> I'd like to see it.  Perhaps I will then be toadally embarrassed, but they
> say that this is good for one.
>
> cheers,
>
> Rolf
>
> --
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>
>> On Sat, Feb 11, 2017 at 9:02 PM, Val <valkremk at gmail.com> wrote:
>>>
>>> Hi all,
>>> I have a big data set and want to remove rows conditionally.
>>> In my data file each person was recorded for several weeks. Somehow
>>> during the recording periods, their last name was misreported. For
>>> each person, the last name should be the same; otherwise the person
>>> should be removed from the data. For example, in the following data
>>> set, Alex was found to have two last names.
>>>
>>> Alex   West
>>> Alex   Joseph
>>>
>>> Alex should be removed from the data. If this happens, I want to
>>> remove all rows with Alex. Here is my data set:
>>>
>>> df <- read.table(header=TRUE, text='first  week last
>>> Alex    1  West
>>> Bob     1  John
>>> Cory    1  Jack
>>> Cory    2  Jack
>>> Bob     2  John
>>> Bob     3  John
>>> Alex    2  Joseph
>>> Alex    3  West
>>> Alex    4  West ')
>>>
>>> Desired output
>>>
>>>       first  week last
>>> 1     Bob     1   John
>>> 2     Bob     2   John
>>> 3     Bob     3   John
>>> 4     Cory    1   Jack
>>> 5     Cory    2   Jack


