[R] Producing multiple analyses (histograms/kernel densities) of network timings between groups

Wed Aug 14 17:23:21 CEST 2013

(This is a repost from a little while ago. I assume my mail got silently bounced because I used some rather strange email routing. If it did get through, and I simply haven't seen it or a response, then please accept my apologies.)
Hi,

I'm new to R, and new to statistics. I'm *trying* to learn R, but I'm struggling with the R-intro, mainly (I think) because I have no background in stats and some of the language is unfamiliar to me (I started with C and Perl, mainly), so I might use the wrong terms. I think the "R in Action" book might help, but recommendations are welcome.

I have a whole bunch of network timings (ICMP echos) between different groups of nodes using two different networks. I want to compare the timings between the groups and across networks, as I /believe/ that one network has much greater variability than the other. I want to prove this, one way or the other, and I think a graphical view of the ~20000 results would help. The initial histograms/kernel densities I've produced so far support that theory (i.e. they look a bit like the Normal distribution, but one network is much more "stretched out" and "bumpy"), but I've resorted to pre-processing that data in Perl in order to produce the graphs. I think R can be used to do all of this in one.
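To make the "greater variability" idea concrete, here is a minimal sketch of how the spread of the two networks might be compared numerically. The timings are made up with rnorm(), and the network labels "eth" and "other" are placeholders of mine, not real data.

```r
# Made-up timings for two hypothetical networks, for illustration only
set.seed(42)
eth   <- data.frame(Network = "eth",   Time = rnorm(1000, mean = 0.2, sd = 0.02))
other <- data.frame(Network = "other", Time = rnorm(1000, mean = 0.2, sd = 0.08))
both  <- rbind(eth, other)

# Spread of the timings per network: standard deviation and interquartile range
spread <- sapply(split(both$Time, both$Network),
                 function(t) c(sd = sd(t), IQR = IQR(t)))
print(spread)
```

A larger sd or IQR for one network would be consistent with its histogram looking more "stretched out", though it would not say anything about the bumps.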

For each network, I have files like this:

===
RoomA RoomB 0.34
RoomC RoomA 0.12
RoomB RoomA 0.12
===

The columns are: From, To, and Time taken. There are 4 rooms in total.
The data's unsorted, and each pairing appears in both directions (i.e. I haven't de-duplicated the pairings via the handshake algorithm, I just pinged everything from everything). There will also be multiple entries for each pairing.

The graphs I think I want to produce are:

For "From RoomA", overlay each timing graph for every other room. That means there will be 4 kernel densities (well actually I'd take a histogram plotted as a line, as I think that's more appropriate, and I don't know what a kernel density is) on one graph.
I'd also like to do the above for "From RoomB", "From RoomC", and "From RoomD", so I'd end up with 4 graphs (all with the same xlim/ylim), each with 4 lines plotted. I'd eventually like those produced as vector Postscript for inclusion in a report, but I think I can handle that with ?postscript() and ?layout().
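A rough sketch of the kind of output described above, under several assumptions: the timings are simulated with rlnorm() rather than read from the real files, density() stands in for the "histogram plotted as a line", and the output file name is a placeholder.

```r
# Made-up timings purely for illustration; real data would come from read.table()
set.seed(1)
rooms <- c("RoomA", "RoomB", "RoomC", "RoomD")
pings <- expand.grid(From = rooms, To = rooms, rep = 1:50)
pings$Time <- rlnorm(nrow(pings), meanlog = -1.5, sdlog = 0.3)

# One density estimate per (From, To) pairing; list names look like "RoomA.RoomB"
dens <- lapply(split(pings$Time, list(pings$From, pings$To), drop = TRUE), density)

# Common axis limits so the four panels are directly comparable
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- c(0, max(sapply(dens, function(d) max(d$y))))

# Hypothetical output file; paper = "special" with onefile = FALSE gives EPS
out_file <- file.path(tempdir(), "ping_densities.eps")
postscript(out_file, horizontal = FALSE, onefile = FALSE, paper = "special",
           width = 8, height = 8)
layout(matrix(1:4, nrow = 2, byrow = TRUE))   # 2 x 2 grid, one panel per "From" room
for (from in rooms) {
  # Empty plot with the shared limits, then one line per destination room
  plot(0, 0, type = "n", xlim = xlim, ylim = ylim,
       main = paste("From", from), xlab = "Time", ylab = "Density")
  for (i in seq_along(rooms)) {
    lines(dens[[paste(from, rooms[i], sep = ".")]], col = i, lty = i)
  }
  legend("topright", legend = rooms, col = seq_along(rooms),
         lty = seq_along(rooms))
}
dev.off()
```

Computing the limits from all pairings before plotting is what keeps xlim/ylim identical across the four panels.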

I've got as far as importing the data with
Foo <- read.table("eth_ping_timings.dat", col.names=c("From", "To", "Time"))
Then I can do "standard" simple operations on Foo$Time. "Factoring" (if that is indeed the term) is where I fall down. I simply don't know how to break out the pairings.
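One way the pairings might be broken out, as a minimal sketch: split() and tapply() both group a vector by one or more factors. The four rows below are sample data in the same layout as the files above (the fourth row is invented so that one pairing has two entries), and the variable names are only illustrative.

```r
# Sample rows in the same three-column layout as the data files
rows <- c("RoomA RoomB 0.34",
          "RoomC RoomA 0.12",
          "RoomB RoomA 0.12",
          "RoomA RoomB 0.30")   # invented extra row for illustration
Foo <- read.table(text = rows, col.names = c("From", "To", "Time"),
                  stringsAsFactors = TRUE)

# split() returns one vector of timings per (From, To) pairing,
# named like "RoomA.RoomB"; drop = TRUE omits pairings with no data
by_pair <- split(Foo$Time, list(Foo$From, Foo$To), drop = TRUE)

# tapply() applies a summary function per pairing and returns a From x To table
# (NA where a pairing has no entries)
mean_time <- tapply(Foo$Time, list(Foo$From, Foo$To), mean)

print(by_pair)
print(mean_time)
```

Each element of by_pair could then be passed to hist() or density() on its own.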

Is R actually the way to go for this? I feel pretty confident I could cobble together some Perl which produces Postscript to describe the curves, but I suspect that once I produce these graphs, I will immediately think of other questions to ask, and R sounds like it's the proper tool to ask those questions.

cheers
jack
