[R] Working With Variables Having Different Lengths

Rich Shepard rshepard at appl-ecosys.com
Sat Oct 22 00:17:10 CEST 2011


On Fri, 21 Oct 2011, David Winsemius wrote:

> What problem are you trying to solve?

    What I need now is to compare TDS (total dissolved solids) with specific
conductivity and with the ions that normally comprise TDS. Before running any
regression models I need to look at these data from three points of view:
all data from all sites within each hydrographic drainage basin collected
during the past 30 years; average (or total) concentrations within a stream
having multiple collection sites (I have not yet decided which makes the
most ecological sense); and by site within certain streams.

   Here is the data frame structure:

   str(chemdata)
'data.frame':	47244 obs. of  6 variables:
   $ site    : Factor w/ 143 levels "BC-0.5","BC-1",..: 134 134 134 127 127
   $ sampdate: Date, format: "2006-12-06" "2006-12-06" ...
   $ param   : Factor w/ 66 levels "AGP","ANP","ANP/AGP",..: 58 66 12 24 59 66
   $ quant   : num  1.08e+04 7.95 1.80e-02 2.80e+02 1.90e+01 8.44 1.62e+03
   $ stream  : Factor w/ 24 levels "BCrk","CCrk",..: 4 4 4 21 21 21 4
   $ basin   : Factor w/ 2 levels "BasinEast","BasinWest": 1 1 1 1 1 1 1 1 1 2 ...

   The data sets used in the books I've read are simpler, and they
illustrate the analyses presented well, but I have not found guidance on
how complex data sets could (or should) be partitioned into smaller but
still related data sets to facilitate analysis, or on how to extract the
relevant rows and columns for specific analyses.
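
   One common base R pattern for this, sketched on hypothetical toy data:
subset() picks rows with a logical condition and columns with select, and
split() partitions a data frame into a named list of smaller data frames that
lapply() can then process one at a time. The object names east, per_stream
and stream_means are illustrative only.

```r
# Hypothetical toy data with the same six columns as chemdata
chemdata <- data.frame(
  site     = factor(c("BC-1", "BC-1", "BC-1", "CC-1", "CC-1", "CC-1")),
  sampdate = as.Date(c("2006-12-06", "2006-12-06", "2006-12-06",
                       "2007-01-10", "2007-01-10", "2007-01-10")),
  param    = factor(c("TDS", "Cond", "AGP", "TDS", "Cond", "AGP")),
  quant    = c(280, 410, 1.2, 190, 260, 0.9),
  stream   = factor(c("BCrk", "BCrk", "BCrk", "CCrk", "CCrk", "CCrk")),
  basin    = factor(c("BasinEast", "BasinEast", "BasinEast",
                      "BasinWest", "BasinWest", "BasinWest"))
)

# Rows: one basin and two parameters; columns: only those the analysis needs
east <- subset(chemdata,
               basin == "BasinEast" & param %in% c("TDS", "Cond"),
               select = c(site, sampdate, param, quant))

# Partition the full frame into a named list of smaller data frames,
# one per stream, each keeping all six columns
per_stream <- split(chemdata, chemdata$stream)

# Apply the same summary to every piece: mean quant for each parameter
stream_means <- lapply(per_stream, function(d) tapply(d$quant, d$param, mean))
print(stream_means)
```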

> That seems very unlikely. What we need is a clearer description of the
> values that your "param" variable can assume, and what you want to do within
> categories of those values. We also need you to stop dropping context.

   There are 66 different parameters (levels) in the param factor. However,
for the immediate effort only 7 are needed. They are coded 'TDS', 'Cond',
'Mg', 'SO4', 'Cl', 'Na', and 'Ca'.
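
   Restricting the frame to those seven codes could look like the sketch
below (the rows are hypothetical toy data; ions and chem7 are illustrative
names). droplevels() removes the now-empty factor levels so that later
tables and plots do not carry the other 59 codes.

```r
# Hypothetical toy rows using a few of the 66 parameter codes
chemdata <- data.frame(
  site  = factor(rep("BC-1", 5)),
  param = factor(c("TDS", "AGP", "Mg", "ANP", "SO4")),
  quant = c(280, 1.2, 19, 3.4, 95)
)

# The seven parameter codes needed for the TDS comparison
ions <- c("TDS", "Cond", "Mg", "SO4", "Cl", "Na", "Ca")

# Keep only those rows, then drop unused factor levels
chem7 <- droplevels(subset(chemdata, param %in% ions))
levels(chem7$param)
```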

   From the database table I know the number of non-NULL (non-NA) rows for
each parameter:

      param   non-NA rows
      TDS            2181
      Cond            820
      Mg             1120
      SO4            1980
      Cl             1971
      Na              866
      Ca             1110
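
   These counts could also be cross-checked from within R. A sketch,
assuming chemdata has one row per measurement and NA where a value is
missing (the toy data below is hypothetical; n_obs is an illustrative name):

```r
# Hypothetical toy data: one row per measurement, NA where not measured
chemdata <- data.frame(
  param = factor(c("TDS", "TDS", "Cond", "Mg", "Mg", "Mg")),
  quant = c(280, NA, 410, 19, 22, NA)
)

# Count non-missing quant values for each parameter
n_obs <- tapply(!is.na(chemdata$quant), chemdata$param, sum)
print(n_obs)
```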

   Not all of these parameters were required to be measured at every site
when monitoring began in 1981. I do not yet know how many samples have
non-NULL values for both members of each of the 6 pairs (TDS with each of
the other six parameters).
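
   One way to get those pair counts is to reshape the long data to wide
form, so that each site and sampling date becomes one row with one column per
parameter, and then count the rows where both TDS and the other parameter are
present. This is a sketch on hypothetical toy data using stats::reshape(); it
assumes at most one value per parameter per site and date (if there are
duplicates, reshape() keeps the first and warns). The quant.<param> column
names are what reshape() generates by default; wide, ion_cols and n_pairs
are illustrative names.

```r
# Hypothetical toy data in the same long format as chemdata
chemdata <- data.frame(
  site     = factor(c("BC-1", "BC-1", "BC-1", "CC-1", "CC-1")),
  sampdate = as.Date(c("2006-12-06", "2006-12-06", "2006-12-06",
                       "2007-01-10", "2007-01-10")),
  param    = c("TDS", "Cond", "Mg", "TDS", "Mg"),
  quant    = c(280, 410, 19, 190, 12)
)

# Long to wide: one row per site/date, one quant.<param> column per parameter
wide <- reshape(chemdata, idvar = c("site", "sampdate"),
                timevar = "param", direction = "wide")

# Every parameter column except TDS itself
ion_cols <- setdiff(grep("^quant\\.", names(wide), value = TRUE), "quant.TDS")

# For each parameter, count samples where it and TDS are both non-NA
n_pairs <- sapply(ion_cols, function(col)
  sum(complete.cases(wide[, c("quant.TDS", col)])))
print(n_pairs)
```

   The same wide frame is also the natural input for plotting or regressing
TDS against each ion, since paired values then sit on the same row.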

   If you need more information, I'll gladly provide it.

Thanks,

Rich


