[R] how to Subset based on partial matching of columns?

David L Carlson dcarlson at tamu.edu
Thu Apr 9 21:56:29 CEST 2015


From Sarah's data frame you can get what you want directly with the table() function, which creates a table object, mydf.tbl. If you want a data frame, convert the table with as.data.frame.matrix() to make mydf.df. Finally, combine the two data frames to make mydf.all. This last step only lines up row for row if your x column consists of unique values in ascending order, because table() sorts its rows by the values of x.
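
The transcript that follows uses mydf, which is defined in Sarah's message quoted further down. For convenience, here is a minimal sketch that rebuilds it, assuming her corrected code vector and x <- c(1:4) are taken as given:

```r
# Rebuild the example data from Sarah's corrected code in the quoted message.
code <- c("MY GM+", "LGTY", "RS", "TY")
x <- c(1:4)
# stringsAsFactors = FALSE keeps code as character rather than factor.
mydf <- data.frame(x, code, stringsAsFactors = FALSE)
str(mydf)
```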

> mydf.tbl <- table(mydf$x, mydf$code)
> mydf.tbl
   
    LGTY MY GM+ RS TY
  1    0      1  0  0
  2    1      0  0  0
  3    0      0  1  0
  4    0      0  0  1
> mydf.df <- as.data.frame.matrix(mydf.tbl)
> mydf.df
  LGTY MY GM+ RS TY
1    0      1  0  0
2    1      0  0  0
3    0      0  1  0
4    0      0  0  1
> mydf.all <- data.frame(mydf, mydf.df)
> mydf.all
  x   code LGTY MY.GM. RS TY
1 1 MY GM+    0      1  0  0
2 2   LGTY    1      0  0  0
3 3     RS    0      0  1  0
4 4     TY    0      0  0  1
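
Note that table() makes one column per distinct value of code, so "LGTY" becomes a single column rather than setting both LG and TY. If the partial matching from the original question is what is needed, something along these lines may be closer. It is only a sketch: the name patterns is my own choice (the question called the vector list, which masks a base R function), and fixed = TRUE is an assumption that the entries should be matched literally, since as a regular expression "GM+" would mean "G followed by one or more M".

```r
# Patterns from the original question (called `list` there; renamed here
# because list() is a base R function).
patterns <- c("MY", "GM+", "TY", "RS", "LG")

# Sarah's corrected data from the quoted message.
code <- c("MY GM+", "LGTY", "RS", "TY")
mydf <- data.frame(x = 1:4, code = code, stringsAsFactors = FALSE)

# One column per pattern: 1 where the pattern occurs anywhere in code, else 0.
# fixed = TRUE treats "GM+" as a literal string, not a regular expression.
flags <- sapply(patterns, function(p) {
  as.integer(grepl(p, mydf$code, fixed = TRUE))
})

# check.names = FALSE keeps the column name "GM+" instead of mangling it.
mydf.flags <- data.frame(mydf, flags,
                         stringsAsFactors = FALSE, check.names = FALSE)
mydf.flags
```

With this, the "LGTY" row gets a 1 in both the TY and LG columns, which is what the desired output in the question shows for that code.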


-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of samarvir singh
Sent: Thursday, April 9, 2015 8:50 AM
To: Sarah Goslee
Cc: r-help
Subject: Re: [R] how to Subset based on partial matching of columns?

Thank you, Sarah Goslee. I am rather new to R, so people like you
are a great support. I really appreciate you taking the time to correct my
mistakes. Thanks

On Thu 9 Apr, 2015 6:54 pm Sarah Goslee <sarah.goslee at gmail.com> wrote:

> Hi,
>
> Please don't put quotes around your code. It makes it hard to copy and
> paste. Alternatively, don't post in HTML, because it screws up your
> code.
>
> On Wed, Apr 8, 2015 at 8:57 PM, samarvir singh <samarvir1996 at gmail.com>
> wrote:
> > So I have a list that contains certain characters as shown below
> >
> > `list <- c("MY","GM+" ,"TY","RS","LG")`
>
> That's a character vector, not a list. A list is a specific type of object
> in R.
>
> > And I have a variable named "CODE" in the data frame as follows
> >
> > `code <- c("MY GM+", ,"LGTY", "RS","TY")`
>
> That doesn't work, and I have no idea what you expect to have there,
> so I'm deleting the extra comma. Also, your vector is named code, not
> CODE.
>
> code <- c("MY GM+", "LGTY", "RS","TY")
> x <- c(1:4)
>
> > 'x <- c(1:5)
> > `df <- data.frame(x,code)`
>
> You probably actually want
> mydf <- data.frame(x, code, stringsAsFactors=FALSE)
>
> Note I changed the name, because df() is a base R function.
>
>
> > Now I want to create 5 new variables named "MY","GM+","TY","RS","LG"
> >
> > Which takes binary value, 1 if there's a match case in the CODE variable
> >
> >     df
> >      x  code    MY  GM+  TY  RS  LG
> >      1  MY GM+   1    1   0   0   0
> >      2           0    0   0   0   0
> >      3  LGTY     0    0   1   0   1
> >      4  RS       0    0   0   1   0
> >      5  TY       0    0   1   0   0
>
> grepl() will give you a logical match
>
> data.frame(mydf, sapply(code, function(x)grepl(x, mydf$code)),
> stringsAsFactors=FALSE, check.names=FALSE)
>
> Sarah
>
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>
