[R] subsetting problem with multiple criteria: Works in some but not all cases.

jim holtman jholtman at gmail.com
Thu Nov 1 21:27:50 CET 2007


I think the problem is with your use of "==" instead of "%in%"; try

matching <- subset(mydata[,c(j+1,j+7)], mydata[,j+1] %in% lone.word)
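
To see why "==" fails for the cc/dd comparison while "%in%" works, here is a
minimal sketch using just those two columns from the post. It uses plain
character vectors rather than the factors a data frame would have created in
R 2.6.0, and the small data frame demo.df is a hypothetical stand-in for
mydata, introduced only for illustration.

```r
# Columns cc and dd from the original post, as plain character vectors
# (assumption: character vectors instead of factors, to keep the demo simple)
cc <- c("emu", "rat", "crow", "cow")
dd <- c("cow", "camel", "manatee", "parrot")

# Words that appear in dd but not in cc
lone.word <- setdiff(dd, cc)

# "==" compares position by position and recycles the shorter vector, so dd
# is compared against camel, manatee, parrot, camel. R also warns that the
# lengths are not multiples of each other; the warning is suppressed here.
eq.match <- suppressWarnings(dd == lone.word)

# "%in%" asks, for each element of dd, whether it occurs anywhere in lone.word
in.match <- dd %in% lone.word

dd[eq.match]   # nothing selected
dd[in.match]   # the three words unique to dd

# The same idea with subset(); demo.df is a hypothetical two-column frame
# holding dd and the dnn values from the post
demo.df <- data.frame(dd = dd, dnn = c(13, 14, 15, 16),
                      stringsAsFactors = FALSE)
matching <- subset(demo.df, dd %in% lone.word)
matching
```

The other comparisons in the post appear to work with "==" only because the
recycled words happen to line up with the right rows (for example, bb against
mouse, pigeon, mouse, pigeon), so "%in%" is the safer test in every iteration.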



On 11/1/07, John Kane <jrkrideau at yahoo.ca> wrote:
> I am trying to compare some word lists which have an
> associated set of numbers. I want to compare word list
> aa with bb and find only those words which are
> unique to bb, then compare bb with cc, etc.
>
> I thought that I should be able to do this by using
> setdiff to get the unique words and then subset the
> data frame to get the unique names and corresponding
> numbers but I am misunderstanding something.
>
> When I run the code below a) I get lots of warnings and
> b) I get the correct results for 4 of the 5
> comparisons. However the comparison of three with
> four (cc,dd) gives me an empty subset.
>
> Can anyone point out my error or suggest a better way
> to do this?
> Thanks
>
> ============================================================================
>
> mydata  = data.frame(aa = Cs(cat, dog, horse, cow),
> bb = c("mouse", "dog", "cow", "pigeon"),
> cc  =c("emu", "rat", "crow", "cow"),
> dd = c("cow", "camel", "manatee", "parrot") ,
> ee = c( "coat", "hat", "dog", "camel") ,
> ff = c("knife","dog", "cow", "pigeon"),
> ann = c(1,2,3,4),
> bnn = c(5,6,7,8),
> cnn = c(9,10,11,12),
> dnn = c(13,14,15,16),
> enn = c(17,18,19,20),
> fnn = c(21,22,23,24))
>
> wordnames <- c("word", "number")
> word.list  <- rep(vector("list", 1), 5)
>
> for(j in 1:5) {
>   lone.word <- setdiff(mydata[,j+1], mydata[,j])
>   lone.word
>   matching <- subset(mydata[,c(j+1,j+7)], mydata[,j+1]==lone.word)
>   matching
>   word.list[[j]] <- matching
>   names(word.list[[j]]) <- wordnames
> }
> word.list
>
> =============================================================================
> R version 2.6.0 (2007-10-03)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_Canada.1252;LC_CTYPE=English_Canada.1252;LC_MONETARY=English_Canada.1252;LC_NUMERIC=C;LC_TIME=English_Canada.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets
> methods   base
>
> other attached packages:
> [1] Hmisc_3.4-2 gdata_2.3.1
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.9 grid_2.6.0     gtools_2.4.0
> lattice_0.17-1
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


