# [R] Working with data frames

William Dunlap wdunlap at tibco.com
Thu Dec 11 17:06:44 CET 2014

Here is a reproducible example:
> str(d)
'data.frame':   3 obs. of  2 variables:
 $ Name: Factor w/ 3 levels "Adam","Bob","Xavier": 2 3 1
 $ Age : int  2 25 1

Do you get something similar?  If not, show us what you have (you
could trim it down to a few columns).
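
The call that created d is not shown above. As a minimal sketch, assuming d was built directly with data.frame() (the original may just as well have come from read.csv()), something like this reproduces the str() output and shows how the factor stores the names:

```r
# Assumed reconstruction of d; the original construction is not shown.
# stringsAsFactors=TRUE is spelled out because R >= 4.0.0 no longer
# converts strings to factors by default.
d <- data.frame(Name = c("Bob", "Xavier", "Adam"),
                Age = c(2L, 25L, 1L),
                stringsAsFactors = TRUE)
str(d)
levels(d$Name)        # the level labels, sorted alphabetically
as.integer(d$Name)    # integer codes that index into the levels
as.character(d$Name)  # the text names, recovered from codes + levels
```

The integer codes are most likely the "vector of numbers" described in the question; as.character() or levels() gives the text back.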

Let's try some plots.
> plot(d$Age)
This shows a plot of d$Age (on y axis) vs "Index", where Index is
1:length(d$Age).  The points are at (1,2), (2,25), and (3,1). You gave
plot() no information about what should be on the x axis so it gave
you the index numbers.
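
To make the "Index" behaviour concrete, here is a small sketch (the vector Age is just the example ages typed in by hand) showing that the one-argument call draws the same points as an explicit index on the x axis:

```r
# Example values copied from the str() output above.
Age <- c(2L, 25L, 1L)

plot(Age)                                  # x defaults to the index 1, 2, 3
plot(seq_along(Age), Age, xlab = "Index")  # the same points, spelled out
```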

Now ask for d$Name on the x axis and d$Age on the y.
> plot(d$Name, d$Age)
This puts the names, in alphabetical order, on the x axis.  The y axis
ranges from about 0 to 25 and neither axis is labelled.  There are
thick horizontal line segments where you expect the points to
be.  These are degenerate boxplots - when you ask to plot a
'factor' variable on the x axis and numbers on the y you get such
a plot.
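
One way to convince yourself of this is to compare against an explicit boxplot() call. This is only a sketch, again using an assumed reconstruction of d; the two plots should show the same boxes, though the axis labelling may differ:

```r
# Assumed reconstruction of d (not shown in the original post).
d <- data.frame(Name = c("Bob", "Xavier", "Adam"),
                Age = c(2L, 25L, 1L),
                stringsAsFactors = TRUE)

class(d$Name)                  # "factor", so plot() treats x as categorical
plot(d$Name, d$Age)            # factor x, numeric y: one boxplot per level
boxplot(Age ~ Name, data = d)  # the explicit equivalent

# With a single observation per level, each box collapses to its median line.
table(d$Name)
```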

Some folks suggested you avoid factors by adding stringsAsFactors=FALSE
when reading the data.  With d2 read that way, so that d2$Name is a
character vector:
> plot(d2$Name, d2$Age)
Error in plot.window(...) : need finite 'xlim' values
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
You get no plot at all.
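
The construction of d2 is not shown either. As a sketch, assuming d2 holds the same data with Name kept as character, the following shows why there is nothing to plot (the names presumably get coerced to numbers, which yields only NAs) and how converting to a factor fixes it:

```r
# Assumed reconstruction of d2: same data, Name kept as character.
d2 <- data.frame(Name = c("Bob", "Xavier", "Adam"),
                 Age = c(2L, 25L, 1L),
                 stringsAsFactors = FALSE)
class(d2$Name)   # "character"

# Coercing the names to numbers gives all NAs, so there is no finite x range.
x <- suppressWarnings(as.numeric(d2$Name))
x

# Converting the column to a factor gives plot() something it can use.
d2$Name <- factor(d2$Name)
plot(d2$Name, d2$Age)
```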

You can get closer to what I think you want with
with(d, {
  plot(as.integer(Name), Age, axes=FALSE, xlab="Name")
  axis(side=2) # draw the usual y axis
  axis(side=1, at=seq_along(levels(Name)), labels=levels(Name)) # names at x = 1, 2, 3
})
If you want the names in a different order on the x axis, then reconstruct
the factor object d$Name with a different order of levels.  E.g.,
d$Name <- factor(d$Name, levels=c("Xavier", "Bob", "Adam"))
and replot.
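
Put together, the relevel-and-replot step might look like the sketch below. The data frame is again an assumed reconstruction, and the box() call is an optional addition of mine to frame the plot:

```r
# Assumed reconstruction of d (not shown in the original post).
d <- data.frame(Name = c("Bob", "Xavier", "Adam"),
                Age = c(2L, 25L, 1L),
                stringsAsFactors = TRUE)

as.integer(d$Name)    # codes under the default alphabetical levels

# Rebuild the factor with the levels in the order wanted on the x axis.
d$Name <- factor(d$Name, levels = c("Xavier", "Bob", "Adam"))
as.integer(d$Name)    # the codes change ...
as.character(d$Name)  # ... but the names attached to each row do not

with(d, {
  plot(as.integer(Name), Age, axes = FALSE, xlab = "Name")
  axis(side = 2)                         # the usual y axis
  axis(side = 1, at = seq_along(levels(Name)), labels = levels(Name))
  box()                                  # optional frame around the plot
})
```

Only the level order changes; each row keeps its own name and age, so the points simply move to their new x positions.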

There are various plotting packages, e.g., ggplot2, that can make this
sort of thing easier, but I think the recommendation not to use factors
is wrong.  You do need to learn how to use them to your advantage.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Dec 11, 2014 at 5:00 AM, Sun Shine <phaedrusv at gmail.com> wrote:

> Hello
>
> I am struggling with data frames and would appreciate some help please.
>
> I have a data set of 13 observations and 80 variables. The first column is
> the names of different political area boundaries (e.g. MHad, LBNW, etc),
> the first row is a vector of variable names concerning various census data
> (e.g. age.T, hse.Unk, etc.). The first cell [1,1] is blank.
>
> run some analyses on this data frame. If I want to get a list of the names
> of the political areas (i.e. the first column), the result is a vector of
> numbers which appear to correlate with the factors, but I don't get the
> text names, just the corresponding number. So, if I want to plot something
> basic, like the area that uses the most gas for central heating, for
> example:
>
> > plot(data.set$ch.Gas)
>
> The result is the y-axis gives the gas usage for the areas, but the x-axis
> gives only the numbers of the areas, not the names of the areas (which is
> preferred).
>
> So, two questions:
>
> (1) have I set up my csv file correctly to be read as a data frame as the
> first row of all of the remaining columns with the values for that
> political area in the corresponding row in the column with the specific
> variable name? So far, looking through tutorials and books seems to suggest
> yes, but at this point I'm no longer sure.
>
> (2) How can I access the names of the political areas when plotting so
> that these are given on the x-axis instead of the numbers?
>
> Thanks for any help.
>
> Cheers
> Sun