[R] Selecting names with regard to visit frequency

arun smartpink111 at yahoo.com
Wed Jul 24 00:39:00 CEST 2013

```
Hi Michael,
The error could be due to some extra spaces in the pasted text. With read.table(..., fill=TRUE) it should read, but then there would be missing values. Using dput() (see ?dput) is better:

dput(df1)
structure(list(x = c(2L, 5L, 4L, 6L, 24L, 7L, 12L, 3L, 5L)), .Names = "x", class = "data.frame", row.names = c("A1",
"A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9"))
Now, try the code after assigning that output:
df1<- structure(list(x.....
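
# A minimal sketch of the dput() round trip, assuming df1 holds the visit
# counts shown above. dput() prints R code that rebuilds the object exactly,
# so it can be pasted without the whitespace problems of read.table(text = ...).
df1 <- data.frame(x = c(2L, 5L, 4L, 6L, 24L, 7L, 12L, 3L, 5L),
                  row.names = paste0("A", 1:9))
txt <- paste(capture.output(dput(df1)), collapse = "\n")  # dput() output as text
df1_copy <- eval(parse(text = txt))                       # rebuild from that text
identical(df1, df1_copy)
#[1] TRUE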

It won't work with decimals, because 3:5 is only the integer sequence:
3:5
#[1] 3 4 5   # %in% only matches values exactly equal to 3, 4, or 5
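
# Sketch: %in% checks exact equality against each element of the table,
# so 4.4 lies inside the range 3 to 5 but is not "in" 3:5.
c(4, 4.4, 5) %in% 3:5
#[1]  TRUE FALSE  TRUE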

Trying this on another dataset that contains decimals:

df2<- structure(list(x = c(2, 5, 4.4, 6, 24, 7, 12, 3.6, 5)), .Names = "x", class = "data.frame", row.names = c("A1",
"A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9"))
vec2<- unlist(df2)
names(vec2)<- row.names(df2)
vec2
#  A1   A2   A3   A4   A5   A6   A7   A8   A9
# 2.0  5.0  4.4  6.0 24.0  7.0 12.0  3.6  5.0
names(vec2)[vec2 %in% 3:5]  # incorrect: misses A3 (4.4) and A8 (3.6)
#[1] "A2" "A9"

names(vec2)[vec2%in% seq(3,5,by=0.1)]
#[1] "A2" "A3" "A8" "A9"
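
# Caveat (sketch): matching against seq(..., by = 0.1) still relies on exact
# floating-point equality. Here the 4th element of the sequence is computed
# as 0 + 3 * 0.1, which is not the same double as the literal 0.3:
0.3 %in% seq(0, 1, by = 0.1)
#[1] FALSE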

# If I change A3 to 4.46, which is not on the 0.1 grid:
vec2[3]<- 4.46
names(vec2)[vec2%in% seq(3,5,by=0.1)]
#[1] "A2" "A8" "A9"
names(vec2)[round(vec2, 1) %in% seq(3, 5, by = 0.1)]  # rounding first matches again (4.46 becomes 4.5)
#[1] "A2" "A3" "A8" "A9"
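
# An alternative sketch to rounding: match with a numeric tolerance.
# in_tol() is a hypothetical helper, not a base R function, and
# tol = 1e-8 is an assumed default. Unlike round(), it does not snap
# 4.46 onto the grid; only values within tol of a grid point match.
in_tol <- function(x, table, tol = 1e-8) {
  # TRUE where x is within tol of any element of table
  vapply(x, function(v) any(abs(v - table) < tol), logical(1))
}
vec3 <- c(A1 = 2, A2 = 5, A3 = 4.4, A8 = 3.6)
names(vec3)[in_tol(vec3, seq(3, 5, by = 0.1))]
#[1] "A2" "A3" "A8"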

names(vec2)[vec2 >= 3 & vec2 <= 5]  # comparisons avoid exact matching, so this is better here
#[1] "A2" "A3" "A8" "A9"
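
# A minimal sketch that works on the data frame directly, so no
# unlist()/names() step is needed. names_in_range() is a hypothetical
# helper; it assumes the values are in a column called "x" and the
# labels are the row names.
names_in_range <- function(df, lo, hi, col = "x") {
  v <- df[[col]]
  # !is.na() drops rows left empty by read.table(..., fill = TRUE)
  row.names(df)[!is.na(v) & v >= lo & v <= hi]
}
visits <- data.frame(x = c(2, 5, 4.46, 6, 24, 7, 12, 3.6, 5),
                     row.names = paste0("A", 1:9))
names_in_range(visits, 3, 5)
#[1] "A2" "A3" "A8" "A9"
pct <- data.frame(x = c(0.2, 0.5, 0.35), row.names = c("A1", "A2", "A3"))
names_in_range(pct, 0.2, 0.4)  # decimal bounds work the same way
#[1] "A1" "A3"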

It is also worth reading R FAQ 7.31, "Why doesn't R think these numbers are equal?".
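
# The FAQ's point in brief: == tests exact equality of doubles, while
# all.equal() compares with a small tolerance (about 1.5e-8 by default).
0.1 + 0.2 == 0.3
#[1] FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))
#[1] TRUE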

A.K.

Hi Arun,
Perhaps these are data frames I am working with, and I mistook
them for vectors (I am still very new at this and learning the data
structures).

I tried to read the text in as you have it here (copied and pasted), but it did not work:
Error in read.table(text = " \n\"\",\"x\" \n\"A1\",2 \n\"A2\",5
\n\"A3\",4 \n\"A4\",6 \n\"A5\",24 \n\"A6\",7 \n\"A7\",12 \n\"A8\",3
\n\"A9\",5 \n",  :
more columns than column names
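
# A sketch of reading the pasted text, assuming it is comma-separated with
# a header line and the first column holds the row labels (A1, A2, ...).
# strip.white = TRUE tolerates stray spaces around unquoted fields.
txt <- '"","x"
"A1",2
"A2",5
"A3",4
"A4",6
"A5",24
"A6",7
"A7",12
"A8",3
"A9",5'
df1 <- read.csv(text = txt, row.names = 1, strip.white = TRUE)
df1["A5", "x"]
#[1] 24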

I retried both of these:
names(vec1)[vec1%in% 3:5]
names(vec1)[!is.na(match(vec1,3:5))]

before and after converting my current data frame to a vector, but
both return NULL. I also get NULL if I unlist the data frame
and run:
names(vec1)[vec1>=3 & vec1<=5]

All three do work if I keep the data frame in its original form instead of running:
vec1<-unlist(df1)
names(vec1)<- row.names(df1)
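
# A sketch of building the named vector in one step, assuming df1 has a
# single column x and the labels are its row names. Note that
# names(vec1)[...] returns NULL whenever names(vec1) itself is NULL,
# so checking names(vec1) first shows whether the names were attached.
df1 <- data.frame(x = c(2L, 5L, 4L, 6L, 24L, 7L, 12L, 3L, 5L),
                  row.names = paste0("A", 1:9))
vec1 <- setNames(df1$x, row.names(df1))
is.null(names(vec1))
#[1] FALSE
names(vec1)[vec1 >= 3 & vec1 <= 5]
#[1] "A2" "A3" "A8" "A9"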

I discovered another issue, however. I am working with a couple of
datasets; one has whole numbers and the other has percentages in
place of visit counts, such as:
"A1",0.2
"A2",0.5
...
The two options

names(vec1)[vec1%in% 3:5]
names(vec1)[!is.na(match(vec1,3:5))]

do not seem to work with ranges given in decimals (and that is
probably what I originally tested them on), but they are fine with
whole numbers.

Thanks,
steele

```