[R] reading long matrix

Fri Dec 23 03:15:50 CET 2005

One correction.  I had hard-coded the last statement for testing
with the data provided.  Change it to this for generality:

result <- array(nums, c(nr, nc, n), list(NULL, NULL, L[breaks]))
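
For anyone who wants to check the array layout without the clipboard,
here is a minimal self-contained sketch using the sample data from the
original post.  Note that it swaps read.fwf for strsplit, which is not
what the code quoted below does; the inline txt string and the helper
names rows, digits, vals and arr are assumptions made for illustration.

```r
# Sample data from the original post, supplied inline instead of via the clipboard.
txt <- "
 SPECIES1
999001099
900110109
011101000
901100101
110100019
901110019

 SPECIES2
999000099
900110119
011101100
901010101
110000019
900000019

 SPECIES3
999001099
900100109
011100010
901100100
110100019
901110019
"

# scan() splits on whitespace, so blank lines and leading spaces disappear.
L <- scan(text = txt, what = "", quiet = TRUE)
breaks <- grep("^[[:alpha:]]", L)      # positions of the species names
nr <- breaks[2] - breaks[1] - 1        # rows per species
nc <- nchar(L[2])                      # columns per species
n  <- length(breaks)                   # number of species

# Split every data row into single characters (assumed alternative to read.fwf).
rows   <- L[-breaks]
digits <- unlist(strsplit(rows, ""))
vals   <- as.integer(digits)
vals[vals == 9L] <- NA                 # 9 encodes a missing value

# The digits arrive row by row, so fill as (column, row, species)
# and then swap the first two dimensions.
arr    <- array(vals, c(nc, nr, n))
result <- aperm(arr, c(2, 1, 3))
dimnames(result) <- list(NULL, NULL, L[breaks])
```

With this, result[, , "SPECIES2"] extracts the 6 x 9 map for one species.
I have not benchmarked it against the read.fwf version on a large file.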

On 12/22/05, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> One way to do this is to use read.fwf.  I have borrowed Jim's
> use of scan and use a similar calculation to get the indexes
> of the species-name lines, stored in breaks.  We then determine
> the common number of rows and columns for each species.
>
> The second group of statements replaces all 9's with spaces
> so that upon parsing them as numbers they will be NAs and then sets
> up a text connection to the resulting character vector.  These are then
> read in by read.fwf, nr rows at a time and the result is
> unlist'ed to a numeric vector, nums.  The last statement
> reshapes it into an array and adds the species names as
> the last dimension names.
>
> # read data in
> L <- scan("clipboard", what = "")
> breaks <- grep("^[[:alpha:]]", L)
> nr <- breaks[2] - breaks[1] - 1; nc <- nchar(L[2])
>
> # parse numbers
> n <- length(L[-breaks]) / nr
> con <- textConnection(gsub("9", " ", L[-breaks]))
> nums <- unlist(replicate(n, read.fwf(con, widths = rep(1, nc), n = nr)))
> result <- array(nums, c(6,9,3), list(NULL, NULL, L[breaks]))
>
>
> On 12/22/05, Colin Beale <c.beale at macaulay.ac.uk> wrote:
> > Hi,
> >
> > I'm needing some help finding a function to read a large text file into an array in R. The data are essentially presence / absence / na data for many species and come as a grid with each species name (after two spaces) at the beginning of the matrix defining the map for that species. An excerpt could therefore be:
> >
> >  SPECIES1
> > 999001099
> > 900110109
> > 011101000
> > 901100101
> > 110100019
> > 901110019
> >
> >  SPECIES2
> > 999000099
> > 900110119
> > 011101100
> > 901010101
> > 110000019
> > 900000019
> >
> >  SPECIES3
> > 999001099
> > 900100109
> > 011100010
> > 901100100
> > 110100019
> > 901110019
> >
> > where 9 is actually NA, 0 is absence and 1 presence. The final array I want to create should have dimensions that are the x and y coordinates and the number of species (known in advance). (In this example dim = c(9,6,3)). It would be sort of neat if the code could also read the species name into the appropriate names attribute, but this is a refinement that I could probably do if someone can help me read the data into R and into an array in the first place. I'm currently thinking a line by line approach using readLines might be the best option, but I've got a very long file, well over 100 species, each a matrix of 70 x 100 data points, making this option rather time consuming, I expect, especially as the next dataset has 1300 species and a much larger grid...
> >
> > Any hints would be gratefully received.
> >
> > Colin Beale
> > Macaulay Land Use Research Institute