[R] Can I improve the efficiency of my scan() command?

Liaw, Andy andy_liaw at merck.com
Sat Apr 12 00:28:22 CEST 2003


> From: Pierre Kleiber [mailto:pkleiber at honlab.nmfs.hawaii.edu]
> 
> Ko-Kang Kevin Wang wrote:
[snipped]
> > 
> > It worked all right, but I'm just wondering if there is a more efficient
> > way (it takes about 10 minutes to run the above scripts, for my
> > 300,000 x 25 CSV file)?
> > 
> > For example, the CSV file has 25 columns but I don't need 3 of them (6, 7,
> > and 22).  What I have done is to scan them in anyway, convert the list
> > into a data frame then remove the 3 columns.  Just wonder if it is
> > possible to simply ignore them in scan() to make the process faster?
> > 
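
On the direct question: scan() can skip fields itself, since a NULL component in the `what` list is read past and discarded. A minimal sketch, assuming a small made-up 5-column file in place of the real 25-column one (the file contents and column names here are hypothetical):

```r
# Hypothetical 5-column file standing in for the 25-column CSV
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c,d,e",
             "x,1,2.5,y,10",
             "z,2,3.5,w,20"), tmp)

# One entry per column; a NULL entry means "skip this field"
what <- list(a = "", b = 0, c = NULL, d = NULL, e = 0)
res <- scan(tmp, what = what, sep = ",", skip = 1, quiet = TRUE)

# The skipped fields come back as NULL placeholders; drop them
# before building the data frame
keep <- !vapply(res, is.null, logical(1))
dat <- as.data.frame(res[keep], stringsAsFactors = FALSE)
unlink(tmp)
```

For the real file the same idea would mean a 25-element `what` list with NULL in positions 6, 7, and 22. Whether this saves much time when only 3 of 25 columns are dropped is something to measure rather than assume.
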
> 
> 
> It might not make a lot of difference in your case where you are
> reading many fields and want to ignore a few, but if you want to read
> a few out of many, it would help to preprocess the input file using,
> for example, awk as in the following, which would pick up fields 1, 3,
> and 4:
> 
>  > con <- pipe("awk -F , '{print $1,$3,$4}' ../Data/Rating.csv")
>  > rating <- scan(con, what = list(
> +                  usage = "",
> +                  mileage = 0,
> +                  excess = "")
> +            , quiet = TRUE, skip = 1)
>  > close(con)

Or even pipe("cut -d, -f1,3-4 ..."), in which case scan() needs
sep = "," because cut keeps the comma delimiter in its output.
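
A rough sketch of the cut variant, assuming a Unix-like system with cut on the PATH and a small made-up file in place of ../Data/Rating.csv (the column names and values are hypothetical):

```r
# Hypothetical stand-in for ../Data/Rating.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("usage,extra,mileage,excess",
             "private,a,12000,yes",
             "business,b,30000,no"), tmp)

# cut keeps the comma delimiter, so scan() is told sep = ","
con <- pipe(paste("cut -d, -f1,3-4", shQuote(tmp)))
rating <- scan(con,
               what = list(usage = "", mileage = 0, excess = ""),
               sep = ",", skip = 1, quiet = TRUE)
close(con)
unlink(tmp)
```

Unlike the awk version, which prints space-separated output, this one still handles fields that contain spaces.
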

Andy

> 
> I do this sort of thing a lot using various utilities; so I've defined
> the following function to take care of opening and closing the
> connection:
> 
> scanpipe <- function(x, ...) {
>    con <- pipe(x)
>    on.exit(close(con))  # close the pipe even if scan() fails
>    scan(con, ...)
> }
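
For illustration, a call to such a wrapper might look like the sketch below. It assumes a Unix-like system with awk available; the file and its columns are made up, and the wrapper is redefined here only so the example is self-contained:

```r
# Wrapper along the lines of the one above; on.exit() closes the
# pipe even if scan() stops with an error
scanpipe <- function(x, ...) {
  con <- pipe(x)
  on.exit(close(con))
  scan(con, ...)
}

# Hypothetical input file
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,ann,3.5", "2,bob,4.0"), tmp)

# Keep fields 1 and 3; extra arguments pass straight through to scan()
cmd <- paste("awk -F, '{print $1, $3}'", shQuote(tmp))
out <- scanpipe(cmd, what = list(id = 0, score = 0),
                skip = 1, quiet = TRUE)
unlink(tmp)
```
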
> 
> 
> -- 
> -----------------------------------------------------------------
> Pierre Kleiber             Email: pkleiber at honlab.nmfs.hawaii.edu
> Fishery Biologist                     Tel: 808 983-5399/737-7544
> NOAA FISHERIES - Honolulu Laboratory         Fax: 808 983-2902
> 2570 Dole St., Honolulu, HI 96822-2396
> -----------------------------------------------------------------
> 
