[R] mismatch between match and unique causing ecdf (well, approxfun) to fail

Meyners, Michael meyners.m at pg.com
Tue Jun 9 13:38:14 CEST 2015


Thanks Martin. 
Yep, I understand it is documented and my code wasn't as it should've been. The confusion comes from the fact that it worked ok for hundreds of situations that seem very much alike, but one situation breaks. I agree that you typically can't be sure about having only numerical data in the data frame, but by design I was sure I had (numeric results of simulations, so no factors or anything else), and was then sloppy in passing the rows of the data frame to ecdf. So I'm still wondering what makes this situation different from all the others I had...
Anyway, point taken and working solution found, so all fine :-)
Cheers, Michael
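
A minimal sketch of the difference discussed in the quoted message below, assuming the same toy values (the name `row_df` is made up for illustration). It does not call ecdf() on the data frame itself; it only shows why a one-row data frame is not a numeric vector: its length is the number of columns, and unique() works on rows, so it removes nothing.

```r
# Hypothetical one-row data frame with a duplicated value, as in the thread.
row_df <- data.frame(a = 3, b = 3)

is.numeric(row_df)     # FALSE: a data frame is a list of columns
length(row_df)         # 2: the number of columns, not of values
dim(unique(row_df))    # 1 2: unique() compares rows, so both columns stay

# unlist() turns the row into a plain (named) numeric vector.
x <- unlist(row_df)
is.numeric(x)          # TRUE
unique(x)              # the duplicate is now removed

# With a numeric vector, ecdf() behaves as documented in ?ecdf.
Fn <- ecdf(x)
knots(Fn)              # the single distinct value
```
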

> -----Original Message-----
> From: Martin Maechler [mailto:maechler at stat.math.ethz.ch]
> Sent: Montag, 8. Juni 2015 16:43
> To: Meyners, Michael
> Cc: r-help at r-project.org
> Subject: Re: [R] mismatch between match and unique causing ecdf (well,
> approxfun) to fail
> 
> 
> > Aehm, adding on this: I incorrectly *assumed* without testing that
> > rounding would help; it doesn't:
> > ecdf(round(test2,0)) 	# a rounding that is way too rough for my application...
> > #Error in xy.coords(x, y) : 'x' and 'y' lengths differ
> >
> > Digging deeper: The initially mentioned call to unique() is not very helpful,
> > as test2 is a data frame, so I get what I deserve, an unchanged data frame
> > with 1 row. Still, the issue remains and can even be simplified further:
> >
> > > ecdf(data.frame(a=3, b=4))
> > Empirical CDF
> > Call: ecdf(data.frame(a = 3, b = 4))
> >  x[1:2] =      3,      4
> >
> > works ok, but
> >
> > > ecdf(data.frame(a=3, b=3))
> > Error in xy.coords(x, y) : 'x' and 'y' lengths differ
> >
> > doesn't (same for a=b=1 or 2, so likely the same for any a=b).
> > Instead,
> >
> > > ecdf(c(a=3, b=3))
> > Empirical CDF
> > Call: ecdf(c(a = 3, b = 3))
> >  x[1:1] =      3
> >
> > does the trick. From ?ecdf, I get that x should be a numeric vector, so
> > apparently I misused the function by applying it to a row of a data frame
> > (i.e. a data frame with one row). That worked ok in all my other (dozens of)
> > cases, though, just not in this particular one. A simple unlist() helps:
> 
> You were lucky.   To use a one-row data frame instead of a
> numerical vector will typically *not* work unless ... well, you are lucky.
> 
> No, do *not*  pass data frame rows instead of numeric vectors.
> 
> >
> > > ecdf(unlist(data.frame(a=3, b=3)))
> > Empirical CDF
> > Call: ecdf(unlist(data.frame(a = 3, b = 3)))
> >  x[1:1] =      3
> >
> > Yet, I'm even more confused than before: in my other data, there were
> > also duplicated values in the vector (1-row-data frame), and it never caused
> > any issue. For this particular example, it does. I must be missing something
> > fundamental...
> >
> 
> well.. I'm confused about why you are confused, but if you are thinking
> about passing rows of data frames as numeric vectors, this means you are
> sure that your data frame only contains "classical numbers" (no factors, no
> 'Date's, no...).
> 
> In such a case, transform your data frame to a numerical matrix
> *once* preferably using  data.matrix(<d.fr>) instead of just
> as.matrix(<d.fr>) but in this case it should not matter.
> Then *check* the result and then work with that matrix from then on.
> 
> All other things probably will continue to leave you confused ..
> ;-)
> 
> Martin Maechler,
> ETH Zurich
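
Martin's suggestion above (convert once, check, then work with the matrix) could look roughly like this. The data frame `sim` and its values are invented for illustration and are not the original simulation results; only data.matrix(), ecdf() and knots() are taken as given.

```r
# Hypothetical simulation results: one row per scenario, numeric columns only.
sim <- data.frame(a = c(3, 3), b = c(4, 3), c = c(5, 3))

# Convert the data frame to a numeric matrix once ...
m <- data.matrix(sim)

# ... and check the result before relying on it.
stopifnot(is.matrix(m), is.numeric(m))

# m[i, ] is a plain numeric vector, which is what ?ecdf asks for,
# so rows with duplicated values are handled like any other row.
fits <- lapply(seq_len(nrow(m)), function(i) ecdf(m[i, ]))

knots(fits[[1]])   # distinct values of row 1
knots(fits[[2]])   # row 2 has only one distinct value
```

Indexing the matrix row by row keeps the check in one place; unlist() on each data frame row, as in the quoted message, gives the same vectors but repeats the conversion on every call.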


