[R] select part of files from a list.files

Mon May 21 17:42:32 CEST 2012

Hi Jeff,

Does this work okay for you?

ST <- list(data.frame(a=1:10),
  data.frame(b=c(NA,NA,NA,NA,NA,6:10)),
  data.frame(c=c(1,NA,NA,4:10)),
  data.frame(d=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA)),
  data.frame(e=c(1,2,3,4,NA,NA,7:9,NA)))

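## keep a dataset only if the chosen rows contain at least
## 'minpresent' non-missing values; otherwise return NULL
doit <- function(data, rows, minpresent) {
  if (sum(!is.na(data[rows, ])) >= minpresent) {
    data
  } else {
    NULL
  }
}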

results <- lapply(ST, doit, rows = 1:5, minpresent = 2)
## print results
results

In your actual case, you would change to rows = 1:9000 and
minpresent = 1000.  The result is a list in which each element is a
dataset; if a dataset does not meet the requirement, its element is NULL.
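
If you then want to drop the NULL elements so that only the qualifying
datasets remain, one option is Filter(). This is only a minimal sketch
that repeats the toy data so it runs on its own; the name "kept" is just
for illustration.

```r
## toy data and helper repeated from above so this runs standalone
ST <- list(data.frame(a=1:10),
  data.frame(b=c(NA,NA,NA,NA,NA,6:10)),
  data.frame(c=c(1,NA,NA,4:10)),
  data.frame(d=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA)),
  data.frame(e=c(1,2,3,4,NA,NA,7:9,NA)))

doit <- function(data, rows, minpresent) {
  if (sum(!is.na(data[rows, ])) >= minpresent) data else NULL
}

results <- lapply(ST, doit, rows = 1:5, minpresent = 2)

## keep only the elements that are not NULL
kept <- Filter(Negate(is.null), results)

## number of datasets that passed the check
length(kept)
```

If the data start out as files, the list could presumably be built first
with something like lapply(list.files("mydir", full.names = TRUE),
read.table, header = TRUE), where the directory name and the read.table
arguments are placeholders that depend on how your files are laid out.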

Hope this helps,

Josh

On Mon, May 21, 2012 at 8:32 AM, jeff6868
<geoffrey_klein at etu.u-bourgogne.fr> wrote:
> Hi everyone.
>
> I'm working on a list of files (about 50 files), which I obtained with
> the function list.files.
> Each of my files contains 35000 lines of data. These files may also contain
> missing values (NA), sometimes up to 10 000 consecutive NAs.
> The aim is to compute correlation matrices between these files (I already
> have the script). But as I often have missing values, the script doesn't
> yet work for all my files.
>
> In this topic, I would like to select part of the data in these files
> before the correlation.
> In the list of files I've created, I would like to select only the first 9000
> lines of each file: myfiles[1:9000,1], and then keep in my list only
> the files that contain at least 1000 non-NA (i.e. numeric) values in
> those 9000 lines.
>
> I would then like to apply my script to this list of files that contain at
> least 1000 numeric values in the first 9000 lines of the data.
>
> I've created simple data.frames for the example, in case someone could explain
> how to do this easily (for these fake data.frames, say, at least 2 non-NA
> values in the first 5 lines).
> Thank you very much!
>
> ST1 <- data.frame(a=1:10)
> ST2 <- data.frame(b=c(NA,NA,NA,NA,NA,6:10))
> ST3 <- data.frame(c=c(1,NA,NA,4:10))
> ST4 <- data.frame(d=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
> ST5 <- data.frame(e=c(1,2,3,4,NA,NA,7:9,NA))
>
> (In this example, the aim is to keep only ST1, ST3 and ST5 in the list,
> because they each contain at least 2 non-NA values in the first 5 lines,
> and to remove ST2 and ST4 from the list because they both contain
> too many NAs in the first 5 lines.) Hope you've understood! Thanks again!

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/