[R] Random Forests: Question about R^2

Dimitri Liakhovitski ld7631 at gmail.com
Mon Apr 20 19:45:12 CEST 2009


I would like to summarize what I have learned. Could you please confirm that
my summary is correct? Thank you very much!

Determining R^2 in Random Forests (for a Regression Forest):

1. For each individual case, compute the OOB (out-of-bag) prediction: the
mean of the predictions of y from all trees for which that case is OOB;
2. For each individual case, calculate a residual: residual = observed
y - mean predicted y (from step 1);
3. Calculate the mean squared residual: MSE = sum of all squared individual
residuals (from step 2) / n;
4. Because MSE/var(y) estimates the proportion of the variance of y that is
left unexplained (i.e., due to error), R^2 = 1 - MSE/var(y).
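The four steps can be sketched in Python as follows. This is a minimal illustration, not the randomForest package's code: the names oob_predictions and oob_r2 are hypothetical, and the sketch assumes each tree's predictions for all n training cases are available together with a boolean in-bag mask per tree. It also assumes var(y) uses divisor n (R's var() divides by n - 1, a small difference for large n) and that a case which was never OOB is left out of the MSE.

```python
from statistics import fmean, pvariance


def oob_predictions(tree_preds, in_bag):
    """Step 1: for each case, average the predictions of the trees where it was OOB.

    tree_preds[t][i] is tree t's prediction for case i, and in_bag[t][i] is True
    if case i was drawn into tree t's bootstrap sample. A case that was never
    out-of-bag gets None.
    """
    n_cases = len(tree_preds[0])
    preds = []
    for i in range(n_cases):
        oob = [p[i] for p, bag in zip(tree_preds, in_bag) if not bag[i]]
        preds.append(fmean(oob) if oob else None)
    return preds


def oob_r2(y, tree_preds, in_bag):
    """Steps 2-4: OOB residuals, their mean square, and pseudo R^2."""
    preds = oob_predictions(tree_preds, in_bag)
    # Step 2: residual = observed y - OOB prediction (never-OOB cases skipped).
    residuals = [yi - pi for yi, pi in zip(y, preds) if pi is not None]
    # Step 3: MSE = sum(residual^2) / n, n = number of cases with an OOB prediction.
    mse = sum(r * r for r in residuals) / len(residuals)
    # Step 4: pseudo R^2 = 1 - MSE / var(y); var(y) here divides by n (assumption).
    return 1.0 - mse / pvariance(y)


# Toy example: 4 cases, 2 trees (made-up numbers).
y = [1.0, 2.0, 3.0, 4.0]
tree_preds = [[1.5, 2.0, 2.0, 3.5], [1.0, 2.5, 2.5, 4.0]]
in_bag = [[False, True, False, True], [True, False, True, False]]
print(round(oob_r2(y, tree_preds, in_bag), 6))  # 0.7
```

Here every case is OOB for exactly one tree, so each OOB prediction is a single tree's prediction; with more trees, step 1 averages several.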

If this is correct, my last question is:
I get as many R^2 values as there are trees because, each time a tree is
added, the residuals are recalculated using all trees built so far. Correct?
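In randomForest, mse and rsq for a regression forest are vectors with one entry per tree. The loop below is an illustrative Python sketch of the cumulative reading described above, not the package's implementation: running_oob_r2 is a hypothetical name, the data layout (per-tree predictions plus boolean in-bag masks) is assumed, var(y) uses divisor n, and cases not yet OOB are simply skipped at that step.

```python
from statistics import pvariance


def running_oob_r2(y, tree_preds, in_bag):
    """Return one pseudo R^2 per tree: entry k uses only the first k + 1 trees.

    Each OOB prediction is a running mean over the trees grown so far for which
    the case was out-of-bag; cases not yet OOB are left out of that step's MSE.
    """
    n = len(y)
    var_y = pvariance(y)  # divisor n (assumption; R's var() uses n - 1)
    pred_sum = [0.0] * n  # running sum of OOB predictions per case
    oob_count = [0] * n   # number of trees so far for which the case was OOB
    rsq = []
    for preds, bag in zip(tree_preds, in_bag):
        for i in range(n):
            if not bag[i]:
                pred_sum[i] += preds[i]
                oob_count[i] += 1
        # Recompute every residual with the OOB means over all trees so far.
        sq = [(y[i] - pred_sum[i] / oob_count[i]) ** 2
              for i in range(n) if oob_count[i] > 0]
        mse = sum(sq) / len(sq) if sq else float("nan")
        rsq.append(1.0 - mse / var_y)
    return rsq


y = [1.0, 2.0, 3.0, 4.0]
tree_preds = [[1.5, 2.0, 2.0, 3.5], [1.0, 2.5, 2.5, 4.0]]
in_bag = [[False, True, False, True], [True, False, True, False]]
print([round(r, 6) for r in running_oob_r2(y, tree_preds, in_bag)])  # [0.5, 0.7]
```

The last entry equals the OOB R^2 of the full forest; earlier entries use fewer trees and, early on, fewer cases.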

Thank you very much!
Dimitri


On Mon, Apr 13, 2009 at 6:22 PM, Liaw, Andy <andy_liaw at merck.com> wrote:
> Apologies: that should have been sum(residual^2)!
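A tiny numeric check of why the square matters, using made-up residuals of mixed sign: sum(residuals) / n lets positive and negative errors cancel, while sum(residual^2) / n measures their size.

```python
# Made-up residuals with mixed signs.
residuals = [2.0, -2.0, 1.0, -1.0]
n = len(residuals)

mean_residual = sum(residuals) / n         # positive and negative cancel
mse = sum(r ** 2 for r in residuals) / n   # squared first, so no cancellation

print(mean_residual, mse)  # 0.0 2.5
```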
>
>> -----Original Message-----
>> From: Dimitri Liakhovitski [mailto:ld7631 at gmail.com]
>> Sent: Monday, April 13, 2009 4:35 PM
>> To: Liaw, Andy
>> Cc: R-Help List
>> Subject: Re: [R] Random Forests: Question about R^2
>>
>> Andy,
>> thank you very much!
>> One clarification question:
>>
>> If MSE = sum(residuals) / n, then
>> in the formula (1 - mse / Var(y)) - shouldn't one square mse before
>> dividing by variance?
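One way to see why mse should not be squared again: with the residuals squared (the corrected definition), MSE already has the units of y squared, as Var(y) does, so their ratio is dimensionless. Written out, with the variance given divisor n here for simplicity (R's var() uses n - 1):

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i^{\mathrm{OOB}}\bigr)^2,
\qquad
\operatorname{Var}(y) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \bar{y}\bigr)^2,
\qquad
R^2 = 1 - \frac{\mathrm{MSE}}{\operatorname{Var}(y)}.
```

Squaring MSE before dividing would leave units of y^2 in the ratio, so the result would no longer be a proportion of variance.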
>>
>> Dimitri
>>
>>
>> On Mon, Apr 13, 2009 at 10:52 AM, Liaw, Andy <andy_liaw at merck.com> wrote:
>> > MSE is the mean of the squared residuals.  For the training data, the
>> > OOB estimate is used (i.e., residual = data - OOB prediction, MSE =
>> > sum(residuals) / n, OOB prediction is the mean of predictions from all
>> > trees for which the case is OOB).  It is _not_ the average OOB MSE of
>> > trees in the forest.
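The distinction between the forest's OOB MSE and the average OOB MSE of individual trees can be illustrated with the two hypothetical functions below (same assumed data layout: per-tree predictions plus boolean in-bag masks; made-up numbers). Averaging predictions before squaring lets errors of opposite sign cancel, so the forest's OOB MSE can be smaller than the mean per-tree OOB MSE.

```python
from statistics import fmean


def forest_oob_mse(y, tree_preds, in_bag):
    """OOB MSE of the forest: average each case's OOB predictions, then square."""
    sq = []
    for i, yi in enumerate(y):
        oob = [p[i] for p, bag in zip(tree_preds, in_bag) if not bag[i]]
        if oob:  # cases never OOB are skipped
            sq.append((yi - fmean(oob)) ** 2)
    return fmean(sq)


def mean_tree_oob_mse(y, tree_preds, in_bag):
    """Average over trees of each tree's own OOB MSE (not the forest's OOB MSE)."""
    per_tree = []
    for preds, bag in zip(tree_preds, in_bag):
        sq = [(yi - p) ** 2 for yi, p, b in zip(y, preds, bag) if not b]
        if sq:
            per_tree.append(fmean(sq))
    return fmean(per_tree)


# Made-up example: trees 0 and 1 err in opposite directions on cases 0 and 1.
y = [1.0, 2.0, 3.0]
tree_preds = [[2.0, 1.0, 9.0], [0.0, 3.0, 9.0], [9.0, 9.0, 3.0]]
in_bag = [[False, False, True], [False, False, True], [True, True, False]]
print(forest_oob_mse(y, tree_preds, in_bag))               # 0.0
print(round(mean_tree_oob_mse(y, tree_preds, in_bag), 4))  # 0.6667
```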
>> >
>> > I hope there's no question about how the pseudo R^2 is computed on a
>> > test set?  If you understand how that's done, I assume the confusion is
>> > only about how the OOB MSE is formed.
>> >
>> > Best,
>> > Andy
>> >
>> > From: Dimitri Liakhovitski
>> >>
>> >> Dear Random Forests gurus,
>> >>
>> >> I have a question about the R^2 provided by randomForest (for
>> >> regression). I haven't been able to find this information.
>> >>
>> >> In the help file for randomForest under "Value" it says:
>> >>
>> >> rsq: (regression only) "pseudo R-squared": 1 - mse / Var(y).
>> >>
>> >> Could someone please explain in somewhat more detail how exactly R^2
>> >> is calculated?
>> >> Is "mse" mean squared error for prediction?
>> >> Is "mse" an average of mse's for all trees run on out-of-bag
>> >> holdout samples?
>> >> In other words - is this R^2 based on out-of-bag samples?
>> >>
>> >> Thank you very much for clarification!
>> >>
>> >> --
>> >> Dimitri Liakhovitski
>> >> MarketTools, Inc.
>> >> Dimitri.Liakhovitski at markettools.com
>> >>
>> >> ______________________________________________
>> >> R-help at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> > Notice:  This e-mail message, together with any attachments, contains
>> > information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
>> > New Jersey, USA 08889), and/or its affiliates (which may be known
>> > outside the United States as Merck Frosst, Merck Sharp & Dohme or
>> > MSD and in Japan, as Banyu - direct contact information for affiliates is
>> > available at http://www.merck.com/contact/contacts.html) that may be
>> > confidential, proprietary copyrighted and/or legally privileged. It is
>> > intended solely for the use of the individual or entity named on this
>> > message. If you are not the intended recipient, and have received this
>> > message in error, please notify us immediately by reply e-mail and
>> > then delete it from your system.
>> >
>> >






