[R] Delete the first instances of the unique values of a vector in R

Tunga Kantarcı tungakantarci at gmail.com
Tue Jan 10 22:39:48 CET 2017


Consider a data frame named rwrdatafile. It holds several variables
stored in columns, with 1000 observations (rows) per variable. The
interest lies in the second column, rwrdatafile[,2]. I want to delete
each row of the data frame that holds the first instance of a unique
value in rwrdatafile[,2]. For example, suppose the values stored in
rwrdatafile[,2] look like

1
4
4
4
4
4
4
6
6

and the routine should delete the row containing 1, the row containing
the first 4, and the row containing the first 6. I searched online and
found similar examples, but none helped with what I am trying to
achieve, because my routine has to use a for loop. I have written a
routine that does not use a for loop, and it works fine (see
"Vector-oriented coding in R" below). I need a for loop that
accomplishes the same task. I have written one, but it has a problem
(see "Scalar-oriented coding in R" below).

The data in rwrdatafile[,2] has three unique values: 1, 4, and 6
(there are more, but that does not matter for the example). My for
loop first determines the number of unique values with
length(unique(rwrdatafile[,2])) and uses that number as the upper
bound of the loop sequence. The length is 3, so the sequence is 1:3.
But there is a catch. Once the row holding 1 is deleted, the value 1
no longer appears in the column, so the number of unique values drops
to 2. The loop still attempts a third iteration, and the result at the
end of the loop is NULL. As a workaround I subtract 1 from the length,
but this is not good coding. It took me a while to work out where the
NULL came from, and I might never have found the cause. The loop is
unreliable because the user has to know in advance whether each unique
value occurs more than once (here, 1 occurs only once). How can I fix
the problem?

The restriction is that I need to keep the for loop, and it should
resemble the for loop I have written for MATLAB (pasted below). The
aim is to translate the MATLAB routine into R as closely as possible.
I do not want to deviate (much) from the MATLAB version, because
otherwise I cannot compare the two routines when teaching this. That
is, inside the R for loop I need a function that is as close as
possible to MATLAB's find function with the 'first' option.

# Scalar-oriented coding in R
length(unique(rwrdatafile[,2]))
for (i in 1:(.Last.value-1)){
  rwrdatafile = rwrdatafile[-(which(rwrdatafile[,2] ==
    unique(rwrdatafile[,2])[i])[1]),]
}
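
A small reproduction may help to see what the scalar-oriented R loop
does. This is only a sketch: the toy data frame and its column names
(id, value) are made up for illustration, and n stands in for
.Last.value so that the snippet also runs non-interactively.

```r
# Toy stand-in for rwrdatafile; the column names are hypothetical.
rwrdatafile <- data.frame(id = 1:9, value = c(1, 4, 4, 4, 4, 4, 4, 6, 6))

n <- length(unique(rwrdatafile[, 2]))  # 3 unique values: 1, 4, 6
for (i in 1:(n - 1)) {
  # unique() is recomputed from the already shortened data frame
  target <- unique(rwrdatafile[, 2])[i]
  rwrdatafile <- rwrdatafile[-(which(rwrdatafile[, 2] == target)[1]), ]
}
rwrdatafile$id  # 2 3 4 5 6 7 9
```

On this toy data the second pass picks 6 rather than 4, because
unique() is recomputed after the row holding 1 has gone. The first 4
(id 2) therefore survives, so subtracting 1 from the length hides the
symptom without deleting the intended rows.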

# Vector-oriented coding in R
unique(rwrdatafile[,2])
tag = match(.Last.value,rwrdatafile[,2])
rwrdatafile = rwrdatafile[!row.names(rwrdatafile) %in% tag,]

# Scalar-oriented coding in MATLAB
unique(mwmatfile.data(:,2));
for i = ans'
    mwmatfile.data(find(mwmatfile.data(:,2) == i,1,'first'),:) = [];
end
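
One possible fix, sketched under the assumption that the second column
is numeric and contains no NA values: compute the unique values once,
before any row is removed, and loop over the values themselves, as the
MATLAB loop does with ans'. The toy data and the names vals, v, and
first are illustrative and not part of the original routines.

```r
# Toy stand-in for rwrdatafile; the column names are hypothetical.
rwrdatafile <- data.frame(id = 1:9, value = c(1, 4, 4, 4, 4, 4, 4, 6, 6))

# Fix the set of values up front so that deleting rows cannot change it.
vals <- unique(rwrdatafile[, 2])
for (v in vals) {
  # which(...)[1] plays the role of find(..., 1, 'first') in MATLAB
  first <- which(rwrdatafile[, 2] == v)[1]
  rwrdatafile <- rwrdatafile[-first, ]
}
rwrdatafile$id  # 3 4 5 6 7 9
```

As a quick cross-check, the same rows are what
rwrdatafile[duplicated(rwrdatafile[,2]), ] keeps when applied to the
original data, since duplicated() marks every instance of a value
except the first.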


