[R] help with line graphs - rather lengthy to explain need

David L Carlson dcarlson using tamu.edu
Fri Nov 30 19:46:45 CET 2018


Reformatting helps because your spreadsheet as currently designed is not R-friendly or tidy. R data structures include vectors, matrices, data.frames, and lists. If you try to create your own structure you are just creating problems for yourself.
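
To make "tidy" concrete, here is a rough sketch of the long layout built by hand in base R. The tiny data set and the object names (wavelength, spectra, site, long) are made up for illustration and are not your actual file.

```r
# Made-up stand-in data: 2 wavelengths (rows) by 3 samples (columns)
wavelength <- c(200, 201)
spectra <- matrix(c(300.5, 301.1, 783.9, 785.2, 101.3, 102.0),
                  nrow = 2,
                  dimnames = list(NULL, c("34900", "34901", "34902")))
site <- c("Tp", "Tc", "Cr")   # one site label per sample (column)

# Long format: one row per reading, with sample, site, and wavelength
# repeated so that every reading carries its own labels
long <- data.frame(
  sample     = rep(colnames(spectra), each = nrow(spectra)),
  site       = rep(site, each = nrow(spectra)),
  wavelength = rep(wavelength, times = ncol(spectra)),
  reading    = as.vector(spectra)   # columns stacked on top of one another
)
long
```

If the ggplot2 package is installed, a call along the lines of ggplot(long, aes(wavelength, reading, group = sample, colour = site)) + geom_line() should then draw one line per sample, colored by site.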

Your numeric data are a matrix - all numbers (but they could as well be all character data). They could also be viewed as a vector if you stacked all of the columns on top of one another. Your first two rows are vectors, but location is not numeric (and it is not clear if sample number is to be treated as numeric or character, e.g. sample 0056 would have to be treated as character whereas 56 could be numeric). 
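
A small illustration of those points, using invented numbers and hypothetical object names:

```r
# Invented readings: 3 wavelengths (rows) by 2 samples (columns)
readings <- matrix(c(300.5, 310.2, 295.8,
                     783.9, 790.1, 770.4), nrow = 3)
is.numeric(readings)            # a matrix holds a single type, here numeric

# The same numbers viewed as one vector, columns stacked in order
stacked <- as.vector(readings)
length(stacked)

# Sample numbers with leading zeros only survive as character
ids_chr <- c("0056", "0057")
ids_num <- as.numeric(ids_chr)  # the leading zeros are dropped
ids_num
```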

A data.frame is a collection of vectors with headings (column names). Different columns can be different types, e.g. some numeric and some character, BUT all of the values in a column must be the same type. You could make this work if you combined location and sample number into a single row, but if you want to keep them separate, your spreadsheet cannot be converted into a data frame. If you try to read your data into R Commander, it will probably treat the first row (sample numbers) as column names. The second row is characters, so R will convert all of your measurements to character strings (and then probably to factors). That is the mess you are complaining about.
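
Here is a minimal sketch of that problem and one possible way around it. It assumes the sheet has been saved as .csv with the layout you described; the inline text and the object names (csv_text, hdr, dat, spectra) are stand-ins I made up, not your real file.

```r
# Stand-in for the exported .csv: sample row, site row, then readings
csv_text <- "Sample,34900,34901,34902
Site,Tp,Tc,Cr
200,300.5,783.9,101.3
201,301.1,785.2,102.0"

# Naive import: the site row is read as data, so no column stays numeric
naive <- read.csv(text = csv_text, check.names = FALSE)
sapply(naive, class)

# Alternative: read the two label rows and the measurements separately
hdr <- read.csv(text = csv_text, header = FALSE, nrows = 2,
                colClasses = "character")
dat <- read.csv(text = csv_text, header = FALSE, skip = 2)

sample_id  <- unlist(hdr[1, -1], use.names = FALSE)  # kept as character
site       <- unlist(hdr[2, -1], use.names = FALSE)
wavelength <- dat[[1]]
spectra    <- as.matrix(dat[, -1])                   # numeric matrix
colnames(spectra) <- sample_id
str(spectra)
```

With a real file you would replace text = csv_text with the file name in each read.csv() call.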

R Commander is helpful and useful, but it only helps if you use the data structures that R provides. You can also type commands into the script window in R Commander if you need to do something that is not available on the menus. If you want to use R, you are really going to have to invest a bit of time understanding how the program works. There are many free resources to help you learn more about R. 
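
As an example of the kind of commands you could type into the script window, here is a sketch of matplot() coloring the lines by site. The data, the two site codes, and the colors are invented for illustration; with your data you would need one color per site.

```r
# Invented stand-in data: 4 wavelengths (rows) by 4 samples (columns)
wavelength <- 200:203
spectra <- cbind(c(300.5, 302.1, 305.0, 307.4),
                 c(298.7, 301.5, 304.2, 306.9),
                 c(783.9, 785.2, 790.0, 792.3),
                 c(780.1, 784.8, 788.6, 791.0))
site <- c("Tp", "Tp", "Tc", "Tc")   # one site label per column

# Assumed color for each site, looked up by name for every sample
site_cols <- c(Tc = "red", Tp = "blue")
line_col  <- unname(site_cols[site])

# One line per column of spectra, colored by that sample's site
matplot(wavelength, spectra, type = "l", lty = 1, col = line_col,
        xlab = "Wavelength (nm)", ylab = "Reading")
legend("topright", legend = names(site_cols), col = site_cols, lty = 1)
```

To plot only part of the spectrum, subset the rows of wavelength and spectra with the same logical index before calling matplot().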

David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: Robert D. Bowers M.A. [mailto:rdbowers using mail.usf.edu] 
Sent: Friday, November 30, 2018 10:48 AM
To: David L Carlson <dcarlson using tamu.edu>
Subject: Re: [R] help with line graphs - rather lengthy to explain need

I'm not really sure how re-formatting it like that would help - IMO that 
doesn't make sense - but then, I also have to admit that I learned 
programming LONG before I learned statistics (1974 vs 2006/2008) and 
tend to think in terms of arrays (and spreadsheets) when working with 
data - and really don't understand the difference between ggplot and the 
"standard" plotting found in Rcmdr (which can't handle more than a few 
cases - a few samples).

I've been using Rcmdr in this because it simplifies a lot of the steps, 
and is closer to the formal statistics software I studied in school.

Part of my problem is the learning curve - and I really don't have the 
time to try to re-learn a lot of the things I studied a few years ago 
(when I first experienced R and studied it on my own).  I've not done 
much statistical stuff in the last couple of years... I've been working 
on other aspects of my research (including gathering samples and 
generating data).

Matplot is a new one for me - thanks for mentioning it.  Maybe that will 
do what I want.  I'll look at it and see what it can do (and how to get 
the data properly into it - a problem I've encountered because I think 
so 'old-fashioned').

Bob

On 11/29/18 8:37 PM, David L Carlson wrote:
> I'm not sure we have enough details to answer your question, but you may need to think about organizing your spreadsheet differently. Perhaps one sheet that has just the data and a second sheet that has the sample number and the location. Import those separately into R.
>
> Your data are in wide format so matplot() would work for what you want to do, but ggplot may be easier if you organize them in long format - one long column of readings and one column of sample numbers (repeated for each of the 2048 measurements from a single sample), and the same for the location column.
>
> If this doesn't put you on the right track, give us a .csv file of a subset of the data (e.g. 10 columns and 20 rows) to play with. You can just copy/paste it into your message. If you save it as an attachment, rename the extension to .txt so the list processor does not strip it out.
>
> David L. Carlson
> Department of Anthropology
> Texas A&M University
>
> -----Original Message-----
> From: R-help [mailto:r-help-bounces using r-project.org] On Behalf Of Robert D. Bowers M.A.
> Sent: Thursday, November 29, 2018 3:24 PM
> To: r-help using r-project.org
> Subject: [R] help with line graphs - rather lengthy to explain need
>
> I am trying to figure out the best way to organize and plot data
> generated by an Excel spreadsheet (one driving a sample turntable and
> collecting optical spectra).
>
> The output of the equipment and software is an Excel spreadsheet with
> sample numbers in the first row, and in the first column there is the
> wavelength in nm.  2048 individual measurements (per wavelength) - 2048
> rows plus the sample number row, and at present I've tested 250 samples,
> with a LOT more to follow.
>
> After I get the spreadsheet, I add a row (just below the sample numbers)
> containing site locations.  I've collected 50 samples per site (each
> assigned a different number), so far 5 sites.  The spreadsheet ends up
> with 2050 rows, 250 columns.
>
> What I want to do is generate a line graph of the data (which could be
> separated out into sections of the optical spectrum), with line colors
> assigned by the site name.  Once that's done, the graphs make sense
> (right now the only way I can do that is using the spreadsheet software,
> and assigning each line the color manually - a very tiresome and
> time-consuming process).
>
> So far, I've tried everything I can to get a graph out using R, without
> luck.  I'm rusty with R and programming... I've used Rcmdr (tried
> transposing data, various settings and so on) and 'played' with ggplot -
> no success.  I'm using Rcmdr to make it easier to work out the bugs,
> then will write a short program to process data.
>
> What I'd like to know is (1) what would be the best way to organize the
> data - sample numbers (cases) in the first row, or in the first column
> with the next row or column being the site name, (2) how would I get
> ggplot to plot the line graph showing all of the samples (number listing
> not important) and all (or a selection) of the different wavelengths,
> while assigning line color based on site name.  Once that's done, I can
> show the within-group vs between-group variation compared to wavelength.
>
> To give an idea of what the data look like:
>
> (name = Longwave)
>
> Sample     34900    34901    34902    34903    34904    (and so on)
> Site       Tp       Tc       Cr       Ws       Gs
> 200(nm)    300.5    783.9    101.3    623.8    1385.7
> 201....
>
> You get the idea.  (maximum measurement value is 4098, the instrument
> takes multiple scans and averages them).
>
> If I can figure this out, it will speed up my work - which I need to do
> so I can get a grant proposal off on time.
>
> Thank you,
>
> Bob
>
> Doctoral Candidate, Applied Anthropology
>
> University of South Florida
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

