[R] How might I work with a data.frame where each physical row represents several logical rows?

David Wolfskill r at catwhisker.org
Thu Jan 5 05:51:02 CET 2017


On Wed, Jan 04, 2017 at 08:33:46PM -0800, David Winsemius wrote:
>  ...
> Perhaps something like this:
> 
> # function to read the values in 'values':
>  parse_values <- function(x) {scan(text= gsub( "\\[|\\]","",x), sep=",") }
> 
> # the apply function reads line-by-line
>  new_dat <- apply(test_data, 1, function(d) data.frame( as.list(d[!names(d)  %in% "values"]), nvals <- parse_values(d['values']) ) )

Hmmm.... OK; that looks a lot better than the stuff that was coming to
my mind -- thanks! :-)

> ...
> # Could suppress the report from scan by adding quiet = TRUE
> # now take this list of 4 line data.frames and "rbind" them
> # If you wanted these to remain character you would use stringsAsFactors=FALSE in the data.frame call
> > new_df <- do.call("rbind", new_dat)

Aye.

> ...
> > (I will also end up collecting all of the records for a given timestamp
> > and hostname, and creating one very wide record with all of the data
> > from the set of records thus found.  I already have (yes, Perl) code to
> > do this -- though if there's a reasonable way to avoid that, I'm
> > interested.)
> 
> I thought you wanted the data in long form.

Sorry; I'm not understanding what you mean: My background is a lot more
toward systems administration than statistical analysis.

The repository I'm using has a rather large number of individual metrics
from a given server -- each provided on a separate row.  (That's why one
of the columns is called "name" -- it provides the (base) "name" of the
metric that corresponds to the "values" on the given row.)  I'll plan to
assemble the rows for a given server & timestamp into a single row --
thuse, I would have the tcp_connection_count for the "last ACK" state
and for the "fin_wait_2" state, as well as CpuSystem, CpuUser, CpuIdle,
... for the given server & timestamp on a single row (eventually).

> ...

Thanks again!

Peace,
david
-- 
David H. Wolfskill				r at catwhisker.org
Epistemology for post-truthers: How do we select parts of reality to ignore?

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 603 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20170104/98740ec4/attachment.bin>


More information about the R-help mailing list