[R] selecting a subset of files to be processed

(Ted Harding) Ted.Harding at wlandres.net
Sat Jul 28 20:32:29 CEST 2012

And, in addition to the tip from Rui (and similar from Joshua) below,
I would advise that there is one good reason not to try doing it
in "pure Linux".

The only source (that I know of) in Linux itself for random numbers
can be tapped by something like

  cat /dev/random > filename

/dev/random stores noise generated by the timings of system events
(keyboard presses, mouse-clicks, disk accesses, interrupts, etc.)
after subjecting them to a high-entropy stirring process. See:

  man random

It yields them in the form of random bytes (each of 8 random 0/1 bits)
and you would have to devise some means of converting those into a
form suitable for accessing a directory listing at random. Not a
pretty task!
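
To give an idea of what that conversion involves, here is a rough
sketch in C. It is only an illustration under my own assumptions: the
function names bytes_to_u32 and random_index are made up, and the
caller is assumed to have opened the byte source (e.g. /dev/urandom,
which unlike /dev/random does not block) as a FILE. Four bytes are
glued into a 32-bit integer, and values that would make some indices
more likely than others are thrown away and redrawn.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Combine 4 raw bytes into one unsigned 32-bit integer (big-endian). */
static uint32_t bytes_to_u32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Draw an index in [0, n) from a stream of random bytes.
   Returns 0 on success and -1 if the stream could not be read. */
static int random_index(FILE *src, uint32_t n, uint32_t *idx)
{
    const uint64_t range = 4294967296ULL;        /* 2^32 possible values */
    uint64_t limit;
    unsigned char b[4];
    uint32_t r;

    assert(src != NULL && idx != NULL && n > 0);

    /* Largest multiple of n that is <= 2^32. Accepting only r < limit
       gives every index the same number of chances (no modulo bias). */
    limit = range - (range % n);

    do {
        if (fread(b, 1, sizeof b, src) != sizeof b)
            return -1;
        r = bytes_to_u32(b);
    } while ((uint64_t)r >= limit);

    *idx = r % n;
    return 0;
}
```

With a listing of 3000 files one would call random_index(src, 3000, &i)
and take line i (counting from 0) of the listing. Drawing 45 distinct
files still needs extra bookkeeping to skip repeats, which is part of
why I call it not a pretty task.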

There is also the command 'rand' available in the OpenSSL toolkit
(run as 'openssl rand'), but by default that still outputs raw bytes,
in the same format as /dev/random.

If you really want to do this outside R, then I would suggest writing
a little C program (to be run from the Linux command line). C can
do its own random number generation, with results that can be
returned as real (double), and you can then apply these to select at
random from the contents of a file generated by something like

  ls filesdir > filelist.txt

and output the random selection.
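
A minimal sketch of the core of such a program follows. It is not
tested against any real directory and rests on assumptions of mine:
the names unif, sample_lines and MAX_NAME are hypothetical, each line
of the list is assumed to hold one file name shorter than MAX_NAME
characters, and the standard library's rand() is taken to be good
enough for the purpose. It reads the list, then uses a partial
Fisher-Yates shuffle so that no file is chosen twice.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAME 4096   /* assumed upper bound on the length of one line */

/* Uniform double in [0, 1), built from the C library's rand(). */
static double unif(void)
{
    return rand() / ((double)RAND_MAX + 1.0);
}

/* Read file names (one per line) from `in`, choose k of them at random
   without replacement, and write them to `out`, one per line.
   Returns the number of names written: k, or fewer if the list is shorter. */
static size_t sample_lines(FILE *in, size_t k, FILE *out)
{
    char buf[MAX_NAME];
    char **names = NULL;
    size_t n = 0, cap = 0, i;

    assert(in != NULL && out != NULL);

    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t len = strcspn(buf, "\n");
        buf[len] = '\0';                 /* strip the trailing newline */
        if (len == 0)
            continue;                    /* skip blank lines */
        if (n == cap) {                  /* grow the array of names */
            size_t newcap = cap ? 2 * cap : 256;
            char **tmp = realloc(names, newcap * sizeof *tmp);
            if (tmp == NULL) { perror("realloc"); exit(EXIT_FAILURE); }
            names = tmp;
            cap = newcap;
        }
        names[n] = malloc(len + 1);
        if (names[n] == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
        memcpy(names[n], buf, len + 1);
        n++;
    }
    if (k > n)
        k = n;

    /* Partial Fisher-Yates: swap a random not-yet-chosen name into slot i. */
    for (i = 0; i < k; i++) {
        size_t j = i + (size_t)(unif() * (double)(n - i));
        char *tmp = names[i];
        names[i] = names[j];
        names[j] = tmp;
        fprintf(out, "%s\n", names[i]);
    }

    for (i = 0; i < n; i++)
        free(names[i]);
    free(names);
    return k;
}
```

A main() wrapped around this would call srand((unsigned) time(NULL))
once (with time.h included), fopen("filelist.txt", "r"), and then
sample_lines(fp, 45, stdout). Without the srand() call every run
would output the same selection.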


On 28-Jul-2012 18:00:38 Rui Barradas wrote:
> Hello,
> If the files are to be processed in R, select a random sample in R.
> Using list.files() you can assign a character vector with the filenames 
> of interest and then sample from that vector.
> ?list.files
> filenames <- list.files(path, pattern)
> rand.sampl <- sample(filenames, 45)
> Hope this helps,
> Rui Barradas
> On 28-07-2012 18:49, Erin Hodgess wrote:
>> Dear R People:
>> I am using a Linux system in which I have about 3000 files.
>> I would like to randomly select about 45 of those files to be processed in
>> R.
>> Could I make the selection in R or should I do it in Linux, please?
>> This is with R-2.15.1.
>> Thanks,
>> erin

E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 28-Jul-2012  Time: 19:32:26
