[R] HOW TO FILTER DATA

MacQueen, Don macqueen1 at llnl.gov
Thu Jan 4 17:41:36 CET 2018


Just a couple of minor comments:

> help.search('read_delim')
No vignettes or demos or help files found with alias or concept or
title matching 'read_delim' using regular expression matching.

read_delim is not part of base R; it must come from some unnamed non-base package. I'd recommend using base R as much as possible for someone who is new to R, as I suspect the original poster is.
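
For illustration only, here is a minimal sketch of a base R route to the same data frame, using read.table. It assumes the file is plain pipe-delimited text with a header row, as in the sample from the original post. The temporary file and the few rows written to it are stand-ins so the snippet runs on its own; the real file path would go in their place.

```r
# Stand-in for the real file: a few rows copied from the original post,
# written to a temporary file so the example is self-contained.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Appln_id|Prio_Year|App_year|IPC",
  "1|1999|2000|H04Q007/32",
  "1|1999|2000|H04M001/02",
  "2|1991|1992|C07K016/26"
), tmp)

# Base R reader: sep = "|" sets the delimiter, strip.white trims blanks
# around fields, and stringsAsFactors = FALSE keeps IPC as character.
df <- read.table(tmp, header = TRUE, sep = "|",
                 strip.white = TRUE, stringsAsFactors = FALSE)

str(df)
```

With 11 million rows this may be slow; whether that matters depends on the machine, so treat it as a starting point rather than a tuned solution.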

The call to subset would be better written as

  df_new <- subset(df, IPC == 'H04M001/02' | IPC == 'C07K016/26')

instead of

  df_new <- subset(df, df$IPC == 'H04M001/02' | df$IPC == 'C07K016/26')

IPC is a variable within the data frame, and subset evaluates its logical expression with the data frame's columns in scope, so it is unnecessary to include the data frame's name in that expression.
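
When more than a couple of codes are wanted, chaining | gets unwieldy. A minimal sketch of one alternative, using %in% from base R, follows. The small data frame is built inline from rows in the original post, and `wanted` is a hypothetical name for the vector of codes to keep.

```r
# A few rows from the original post, built inline so the example stands alone.
df <- data.frame(
  Appln_id = c(1, 1, 2, 2),
  IPC = c("H04Q007/32", "H04M001/02", "C12N015/62", "C07K016/26"),
  stringsAsFactors = FALSE
)

# 'wanted' is a hypothetical name: the IPC codes whose rows should be kept.
wanted <- c("H04M001/02", "C07K016/26")

# Keeps the same rows as IPC == 'H04M001/02' | IPC == 'C07K016/26',
# but the list of codes can grow without lengthening the expression.
df_new <- subset(df, IPC %in% wanted)

# The same filter without subset(), using plain logical indexing.
df_new2 <- df[df$IPC %in% wanted, ]

print(df_new)
```

Both forms drop every row whose IPC is not in `wanted`; which one to use is mostly a matter of taste.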

-Don


--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509
 
 

On 1/3/18, 12:54 PM, "R-help on behalf of Leilei Ruan" <r-help-bounces at r-project.org on behalf of ruanleilei at gmail.com> wrote:

    Try the code below:
    
    
    df <- read_delim("C:/Users/lruan1/Desktop/1112.csv", "|",
                     escape_double = FALSE, trim_ws = TRUE)
    
    df_new <- subset(df,df$IPC == 'H04M001/02'| df$IPC == 'C07K016/26' )
    
    You can add more conditions with "|" in the subset function. Good luck!
    
    On Wed, Jan 3, 2018 at 2:53 PM, Saptorshee Kanto Chakraborty <
    chkstr at unife.it> wrote:
    
    > Hello,
    >
    > I have patent data from the OECD in delimited text format, with IPC as
    > one column. I want to filter the data by keeping only certain IPC codes in
    > that column and deleting the rows that do not have the IPCs I need. Can
    > anybody please guide me in doing this? The IPC codes are string variables.
    >
    > The data looks somewhat like the sample below, but it's a huge dataset
    > containing more than 11 million rows.
    >
    >
    > Appln_id|Prio_Year|App_year|IPC
    > 1|1999|2000|H04Q007/32
    > 1|1999|2000|G06K019/077
    > 1|1999|2000|H01R012/18
    > 1|1999|2000|G06K017/00
    > 1|1999|2000|H04M001/2745
    > 1|1999|2000|G06K007/00
    > 1|1999|2000|H04M001/02
    > 1|1999|2000|H04M001/275
    > 2|1991|1992|C12N015/62
    > 2|1991|1992|C12N015/09
    > 2|1991|1992|C07K019/00
    > 2|1991|1992|C07K016/26
    >
    >
    >
    > Thanking You
    >
    > ______________________________________________
    > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
    > https://stat.ethz.ch/mailman/listinfo/r-help
    > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    > and provide commented, minimal, self-contained, reproducible code.
    >
    


