[R] Curve Fitting/Regression with Multiple Observations

Liaw, Andy andy_liaw at merck.com
Fri Apr 30 13:52:35 CEST 2010


You may want to run 

RSiteSearch("monotone splines")

at the R prompt.  The 3rd hit looks quite promising.  However, if I 
understand your data, you have multiple y values for the same x
values.  If so, can you justify inverting the regression function?
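
For the mechanics only (whether inverting a fitted regression function
is justified is a separate question), here is a minimal base-R sketch.
It assumes a balanced design (the same number of replications at every
x), averages the replications, forces the means to be non-decreasing
with isoreg(), interpolates them with a monotone spline, and inverts
the result numerically with uniroot().  The data, the seed, and the
names ux, ybar, f and f_inv are made up for illustration; this is not
the method behind the RSiteSearch hit above.

```r
set.seed(1)

# Hypothetical simulation output: 10 x levels, 5 replications each,
# noisy observations of an increasing response.
x <- rep(1:10, each = 5)
y <- exp(0.3 * x) + rnorm(length(x), sd = 0.5)

# Collapse the replications to one mean per distinct x
# (tapply returns the means in increasing order of x).
ux   <- sort(unique(x))
ybar <- as.numeric(tapply(y, x, mean))

# Isotonic (non-decreasing) least-squares fit to the means.
iso <- isoreg(ux, ybar)

# Monotone cubic spline through the isotonic fit (Hyman filtering).
f <- splinefun(ux, iso$yf, method = "hyman")

# Numerical inverse: for each target y0, solve f(x) = y0 on the
# observed x range.  Targets must lie between f(min(ux)) and f(max(ux)).
f_inv <- function(y0) {
  vapply(y0, function(yy) {
    uniroot(function(xx) f(xx) - yy, interval = range(ux), tol = 1e-9)$root
  }, numeric(1))
}

y0 <- c(3, 5, 10)
x0 <- f_inv(y0)
cbind(y0 = y0, x0 = x0, check = f(x0))
```

If the isotonic step pools neighbouring means, the curve has a flat
stretch there and the inverse is not unique on it; uniroot() then
returns just one of the possible x values.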

The traffic on this mailing list is very high, and the signal-to-noise
ratio is rather low.  This tends to burn out those who started with
good intentions to help.

Andy 

From: Kyeong Soo (Joseph) Kim
> 
> Dear Keith,
> 
> Thanks for the suggestion and taking your time to respond to it.
> 
> But you have misunderstood something, and it seems that you have not
> read all my previous e-mails.
> For instance, can a hand-drawn curve give you an inverse function
> (analytically or numerically) so that you can find an x value given
> the y value (not just for one, but for hundreds of points)?
> 
> As for the statistical inferences, I admit that my communications were
> not very clear. My intention is to get a smoothed curve from the
> simulation data in a statistically meaningful way as much as possible
> for my intended use of the resulting curve.
> 
> As said before, I don't know all the thorough theoretical details
> behind regression and curve fitting functions available in R (know the
> basics though as one with PhD in Elec. Eng. unlike someone's
> assessment), but am doing my best to catch up reading textbooks and
> manuals, and posting this question to this list is definitely a way to
> learn from many experts and advanced users of R.
> 
> By the way, I wonder why most of the responses I've received from this
> list are so cynical (or skeptical?) and in some sense done in a quite
> arrogant way. It's very hard to imagine that one would receive such
> responses in my own areas of computer simulation and optical
> communications/networking. If a newbie posts a question to the list
> that makes little sense or is just another FAQ, it is usually ignored
> (i.e., no response) because we are all too busy to deal with it.
> Sometimes, though, a kind soul (like Gabor) takes his/her own valuable
> time and doesn't mind explaining all the details from simple basics.
> 
> Again, what I want to hear from the list is the proper use of R's
> regression/curve fitting functions for my simulation data with
> replications: should they be applied after taking means, or directly
> to the replicated data? So far I haven't heard anyone specifically
> address my question, although there were several seemingly related
> suggestions.
> 
> Regards,
> Joseph
> 
> On Fri, Apr 30, 2010 at 4:25 AM, kMan <kchamberln at gmail.com> wrote:
> > Dear Joseph,
> >
> > If you do not need to make any inferences, that is, you just want it
> > to look pretty, then drawing a curve by hand is as good a solution as
> > any. Plus, there is no reason for expert testimony to say that the
> > curve does not mean anything.
> >
> > Sincerely,
> > KeithC.
> >
> > -----Original Message-----
> > From: Kyeong Soo (Joseph) Kim [mailto:kyeongsoo.kim at gmail.com]
> > Sent: Tuesday, April 27, 2010 2:33 PM
> > To: Gabor Grothendieck
> > Cc: r-help at r-project.org
> > Subject: Re: [R] Curve Fitting/Regression with Multiple Observations
> >
> > Frankly speaking, I am not looking for such a framework.
> >
> > The system I'm studying is a communication network (like an M/M/1
> > queue, but far too complicated to analyze mathematically using
> > classical queueing theory) and the conclusion I want to draw is
> > qualitative rather than quantitative -- a high-level comparative
> > study of various network architectures based on the "equivalence
> > principle" (a concept specific to networking, not in the general
> > sense).
> >
> > What I want in this regard is a smooth, non-decreasing (hence
> > one-to-one) function built out of simulation data because, later in
> > my processing, I need an inverse function of that curve to find an
> > x value given the y value. That was, in fact, the reason I used
> > exponential (i.e., non-decreasing function) curve fitting.
> >
> > Even though I don't need a statistical inference framework for my
> > work, I want to make sure that my use of regression/curve fitting
> > techniques with my simulation data (as a tool for getting the
> > mentioned curve) is proper and a usual practice among experts like
> > you.
> >
> > To get an answer to my question, I dug a lot through the Internet
> > but have found no clear explanation so far.
> >
> > Your suggestions and examples (always!) are much appreciated, but I
> > am still not sure that using those regression procedures with the
> > kind of data I described is the right thing to do.
> >
> > Again, many thanks for your prompt and kind answers, Joseph
> >
> >
> > On Tue, Apr 27, 2010 at 8:46 PM, Gabor Grothendieck
> > <ggrothendieck at gmail.com> wrote:
> >> If you are looking for a framework for statistical inference you
> >> could look at additive models as in the mgcv package, which has a
> >> book associated with it if you need more info, e.g.
> >>
> >> library(mgcv)
> >> fm <- gam(dist ~ s(speed), data = cars)
> >> summary(fm)
> >> plot(dist ~ speed, cars, pch = 20)
> >> fm.ci <- with(predict(fm, se = TRUE),
> >>               cbind(0, -2*se.fit, 2*se.fit) + c(fit))
> >> matlines(cars$speed, fm.ci, lty = c(1, 2, 2), col = c(1, 2, 2))
> >>
> >>
> >> On Tue, Apr 27, 2010 at 3:07 PM, Kyeong Soo (Joseph) Kim
> >> <kyeongsoo.kim at gmail.com> wrote:
> >>> Hello Gabor,
> >>>
> >>> Many thanks for providing actual examples for the problem!
> >>>
> >>> In fact I know how to apply and generate plots using various R
> >>> functions including loess, lowess, and smooth.spline procedures.
> >>>
> >>> My question, however, is whether applying those procedures
> >>> directly to data with multiple observations/duplicate points(?)
> >>> rests on a sound basis or not.
> >>>
> >>> Before asking my question to the list, I checked the
> >>> smooth.spline manual page and found a mention of the "cv" option
> >>> in relation to duplicate points, but I'm not sure "duplicate
> >>> points" in the manual has the same meaning as "multiple
> >>> observations" in my case. To me, the manual seems a bit unclear
> >>> in this regard.
> >>>
> >>> Looking at the "cars" data, I found it has multiple points with
> >>> the same "speed" but different "dist", which is exactly what I
> >>> mean by multiple observations, but I am still not sure.
> >>>
> >>> Regards,
> >>> Joseph
> >>>
> >>>
> >>> On Tue, Apr 27, 2010 at 7:35 PM, Gabor Grothendieck
> >>> <ggrothendieck at gmail.com> wrote:
> >>>> This will compute a loess curve and plot it:
> >>>>
> >>>> example(loess)
> >>>> plot(dist ~ speed, cars, pch = 20)
> >>>> lines(cars$speed, fitted(cars.lo))
> >>>>
> >>>> Also this directly plots it but does not give you the values of
> >>>> the curve separately:
> >>>>
> >>>> library(lattice)
> >>>> xyplot(dist ~ speed, cars, type = c("p", "smooth"))
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Apr 27, 2010 at 1:30 PM, Kyeong Soo (Joseph) Kim
> >>>> <kyeongsoo.kim at gmail.com> wrote:
> >>>>> I recently came to realize the true power of R for statistical
> >>>>> analysis -- mainly for post-processing of data from large-scale
> >>>>> simulations -- and have been converting many existing
> >>>>> Python (SciPy) scripts to ones based on R and/or Perl.
> >>>>>
> >>>>> In the middle of this conversion, I revisited the problem of
> >>>>> curve fitting for simulation data with multiple observations
> >>>>> resulting from repetitions.
> >>>>>
> >>>>> In the past, I first processed the simulation data (i.e.,
> >>>>> multiple y's from repetitions) to get a mean with a confidence
> >>>>> interval for each value of x (the independent variable) and then
> >>>>> applied a spline procedure to those mean values only (i.e.,
> >>>>> unique pairs (x_i, y_i) for i=1, 2, ...) to get a smoothed
> >>>>> curve. Because of rather large confidence intervals, however,
> >>>>> the resulting curves were hardly smooth enough for my purpose,
> >>>>> so I had to fix the functional form to an exponential and use
> >>>>> least-squares methods to fit its parameters to the data.
> >>>>>
> >>>>> From a plot with confidence intervals, it's rather easy for one
> >>>>> to visually and manually(?) figure out a smoothed curve for it.
> >>>>> So I'm thinking right now of directly applying a spline (or
> >>>>> whatever regression procedure suits this purpose) to the
> >>>>> simulation data with repetitions rather than to the means. The
> >>>>> simulation data in this case look like this (assuming three
> >>>>> repetitions):
> >>>>>
> >>>>> # x    y
> >>>>> 1      1.2
> >>>>> 1      0.9
> >>>>> 1      1.3
> >>>>> 2      2.2
> >>>>> 2      1.7
> >>>>> 2      2.0
> >>>>> ...      ....
> >>>>>
> >>>>> So my idea is to let the spline procedure handle the
> >>>>> fluctuations in the data (i.e., in repetitions) by itself.
> >>>>> But I wonder whether this direct application of spline
> >>>>> procedures to data with multiple observations makes sense from
> >>>>> the statistical analysis (i.e., theoretical) point of view.
> >>>>>
> >>>>> It may be a stupid question and quite obvious to many, but
> >>>>> personally I don't know where to start.
> >>>>> It would be greatly appreciated if anyone could shed some light
> >>>>> on this.
> >>>>>
> >>>>> Many thanks in advance,
> >>>>> Joseph
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-help at r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>> PLEASE do read the posting guide
> >>>>> http://www.R-project.org/posting-guide.html
> >>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>>
> >>>>
> >>>
> >>
> >
> >
> >
> >
> 