[R] Read a dataset with different lengths

Gabor Grothendieck ggrothendieck at myway.com
Mon Mar 21 19:28:09 CET 2005


Xiyan Lon <xiyanlon <at> gmail.com> writes:

: 
: Dear useR again,
: How can I read a dataset if its lines do not all have the same
: number of elements (different lengths)? For example:
: 
: 1    2,  4, 16,  1,  1,  3,  1,  1, 15,  5,  1,  1, 14,  1,  1
: 2    2, 13,  5,  1,  1,  3,  1,  1, 15,  5,  1,  1, 14,  1,  1
: 3    4,  5, 11,  1,  1,  6,  1,  1,  5, 14,  1,  1, 15,  1,  1
: 4    2,  5,  9,  1,  1, 14,  1,  1,  8, 16,  1,  1, 13,  1,  1
: 5    3,  7, 14,  1,  1, 14,  1,  1,  5, 21,  1,  1,  8,  1,  1
: 6            6,  3,  1, 12,  1,  1,  5,  8,  1,  1, 15,  1,  1
: 7            6,  3,  1, 11,  1,  1, 10,  7,  1,  1, 21,  1,  1
: 8           21, 20,  9,  1,  1,  6,  1,  1, 13, 10,  1,  1,  1
: 9    5,  7, 21,  1,  1, 13,  1,  1, 14,  2,  1,  1,  6,  1,  1
: 10   8, 14, 10,  1,  1,  5,  1,  1, 10,  5,  1,  1,  5,  1,  1
: 11   5, 20, 17,  1,  1, 19,  1,  1, 14,  7,  1,  1,  6,  1,  1
: 12   7,  4, 11,  1,  1,  2,  1,  1,  5, 13,  1,  1, 14,  1,  1
: 13   7, 14, 13,  1,  1,  6,  1,  1, 13, 16,  1,  1, 17,  1,  1
: 14   7, 14,  5,  1,  1,  5,  1,  1,  5, 17,  1,  1, 17,  1,  1
: 15           3,  9, 12,  1,  1, 18,  1,  1,  6,  1,  4,  1,  1
: 16   7, 10,  5,  1,  1, 12,  1,  1,  5, 17,  1,  1, 13,  1,  1
: 17  12,  8, 16,  1,  1,  5,  1,  1,  8, 10,  1,  1, 14,  1,  1
: 18   5, 11,  7,  1,  1,  5,  1,  1, 18, 13,  1,  1, 17,  1,  1
: 19   7, 13,  8,  1,  1, 14,  1,  1,  5, 17,  1,  1, 13,  1,  1
: 20   7, 18, 21,  1,  1, 16,  1,  1,  5, 17,  1,  1, 13,  1,  1
: 
: I know that the BioC package rmutil has a function (read.list) to
: handle sets of lines with different lengths, but it did not work.
: > library(rmutil)
: Error in library(rmutil) : 'rmutil' is not a valid package -- installed < 2.0.0?
: > 

rmutil can be found here:
 http://popgen.unimaas.nl/~jlindsey/rcode.html

: 
: Are there any other functions to handle this?



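nf <- count.fields(myfile, sep = ",")
z <- read.table(myfile, sep = ",", fill = TRUE,
                col.names = paste("V", seq_len(max(nf)), sep = ""),
                colClasses = "numeric")

read.table determines the number of columns from the first five lines
of input, so col.names is given max(nf) names to force enough columns.
If one of the first five lines is the longest you can omit the
col.names argument and the nf computation.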
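
As a minimal sketch of the count.fields / read.table approach, the
following writes a small hypothetical file (the file contents and the
V1, V2, ... column names are made up for illustration) in which the
longest line is the sixth, and passes max(nf) column names to
read.table so that enough columns are allocated:

```r
# Hypothetical six-line input; the longest line is the sixth one,
# i.e. outside the first five lines that read.table inspects.
myfile <- tempfile()
writeLines(c("2, 4, 16",
             "6, 3",
             "5, 7, 21",
             "8, 14",
             "7, 4, 11",
             "3, 9, 12, 1, 1"), myfile)

# Number of comma-separated fields on each line
nf <- count.fields(myfile, sep = ",")

# max(nf) column names force enough columns; fill = TRUE pads the
# shorter lines with NA; colClasses is recycled over all columns.
z <- read.table(myfile, sep = ",", fill = TRUE,
                col.names = paste("V", seq_len(max(nf)), sep = ""),
                colClasses = "numeric")
print(nf)
print(z)
```

Without col.names, read.table would size the table from the first five
lines only, so the fields of the longer sixth line might not end up in
a single row.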

The above returns a data frame with one row per input line, padded with
NAs at the end wherever a line is shorter than the longest one.  If you
need a list of rows without the NAs:

lapply(as.data.frame(t(data.matrix(z))), na.omit)
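
To illustrate what that expression produces, here is a small sketch on
a hypothetical NA-padded data frame (the values and the name rows are
made up for illustration).  The extra as.vector() step is an optional
addition, not part of the line above: it removes the "na.action"
attribute that na.omit attaches to shortened vectors.

```r
# Hypothetical padded data frame of the kind read.table(fill = TRUE) returns
z <- data.frame(V1 = c(2, 6, 5),
                V2 = c(4, 3, 7),
                V3 = c(16, NA, 21),
                V4 = c(NA, NA, 1))

# Transpose so each original row becomes a column, then drop the NAs
rows <- lapply(as.data.frame(t(data.matrix(z))), na.omit)

# Optional: strip the "na.action" attribute to get plain numeric vectors
rows <- lapply(rows, as.vector)
print(rows)
```

Each list element then holds the values of one input line, with the
lengths differing from line to line.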



