[R] data.frame and subsetting problem

Don MacQueen macq at llnl.gov
Sun Jun 17 00:57:41 CEST 2007


I too have no idea what the object named "x2" is, or where it came 
from. Particularly since after your use of subset(), the new 
dataframe, y, *does* include a row where V2 = 'color'.

But I have a guess at what your problem may be.

In your original dataframe ("x") the first and second columns are 
factors, because that is the default behavior of read.delim().

Factors have levels. The second column has 5 levels. Try
   levels(x$V2)
to see.

When you use subset(), you get fewer rows, but the fact that there 
were five levels is retained.

Then, the plot function sees that there are five levels and 
includes an empty placeholder for the level(s) with no data.
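
Here is a small sketch of that behavior. It rebuilds the example data 
inline with explicit factor() calls (an assumption on my part, so that it 
runs without temp2.txt and does not depend on how read.delim() treats 
strings in your version of R), then shows that the subset still carries 
all five levels:

```r
# Example data from the original post, rebuilt by hand as factors.
x <- data.frame(
  V1 = factor(c("shirt", "shirt", "shirt", "shirt", "shoes", "shoes", "shoes")),
  V2 = factor(c("size", "color", "length", "brand", "style", "brand", "color")),
  V3 = c(40, 10, 10, 1, 5, 4, 1)
)

y <- subset(x, V1 == "shirt")

nrow(y)               # 4 rows remain
levels(y$V2)          # still five levels, including "style"
counts <- table(y$V2)
counts                # "style" is listed with a count of 0
```

The zero count for "style" is the empty slot that a plot of the factor 
would leave room for.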

Try something like
    y <- data.frame(subset(x, V1 == "shirt"))
    y$V2 <- factor(as.character(y$V2))
to force it to get rid of the now-empty factor levels.

There are other ways to do this; I just don't happen to remember any 
of them at the moment.
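
For what it's worth, a few alternatives that should work are sketched 
below. The data is again rebuilt by hand (an assumption, to keep the 
example self-contained), and droplevels() is only available in more 
recent versions of R, so treat that line as optional:

```r
# Example data from the original post, rebuilt by hand as factors.
x <- data.frame(
  V1 = factor(c("shirt", "shirt", "shirt", "shirt", "shoes", "shoes", "shoes")),
  V2 = factor(c("size", "color", "length", "brand", "style", "brand", "color")),
  V3 = c(40, 10, 10, 1, 5, 4, 1)
)
y <- subset(x, V1 == "shirt")

a <- factor(y$V2)           # re-creating the factor keeps only the levels in use
b <- y$V2[, drop = TRUE]    # subsetting a factor with drop = TRUE does the same
y2 <- droplevels(y)         # drops unused levels in every factor column of y

levels(a)                   # "brand" "color" "length" "size"
```

Either of the first two can be assigned back to y$V2 before plotting; 
the third returns a cleaned copy of the whole data frame.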

If I'm right, this is a question that comes up fairly often. It might 
even be in the FAQs.

-Don


At 12:15 PM -0700 6/16/07, Michelle Wynn wrote:
>I have read the R online help and wiki and I cannot seem to get something to
>work the way I need it to.
>
>I want to create a new data frame from a subset of an existing data frame
>which has no reference to the original superset.  If you follow this
>example, what I am trying to do may make more sense.
>
>I have a file with values like this:
>
>shirt,size,40
>shirt,color,10
>shirt,length,10
>shirt,brand, 1
>shoes,style,5
>shoes,brand,4
>shoes,color,1
>
>and I read it into a dataframe like:
>x <- data.frame(read.delim("temp2.txt", sep=",", header=FALSE))
>
>I then want to plot just a subset of this data (say shirts only)...
>y <- data.frame(subset(x, V1 == "shirt"))
>plot(x2[,2:3])
>
>when I do, the resulting plot contains an empty value for 'color' even
>though my subset has no value in column V2 that equals 'color' anymore.
>
>Is it possible to create a new data.frame that truly deletes the rows from the
>original data frame that I am excluding with the subset parameter?
>
>Thanks,
>Michelle


-- 
---------------------------------
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
macq at llnl.gov


