[R] Regression model for predicting ranks of the dependent variable

David Winsemius dwinsemius at comcast.net
Mon Sep 16 21:33:31 CEST 2013


On Sep 16, 2013, at 10:53 AM, Saumya Gupta wrote:

> I have a training dataset which contains statistics of football players for the year 2009, and their ranks for the year 2010. For example:
> 

R-help is not the place to ask for help on homework or Kaggle challenges.

Read:

> https://stat.ethz.ch/mailman/listinfo/r-help
> 
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

-- 
David Winsemius.

> 
>  Player  No. of goals  No. of matches  Age  Rank (in 2010)
>  A       5             1               35   1
> 
> 
> The above ranks have been calculated on the basis of a 'score' (which is unknown) given to each player, which is a function of the 3 variables. It could be something like:
>
>     Score = (No. of goals / No. of matches) - Age^(1/2)
>
> After you arrange the scores in descending order, you get the ranks. The scores are not known, and neither is the formula for calculating them. Only the ranks are given.
>
> Now, I have statistics for football players for the year 2013, and I have to predict their ranks for the year 2014. The predicted ranks should match what the formula used in 2010 would produce. Calculating the scores is not necessary, and even finding out the formula is not the objective. The objective is just to predict the ranks. But finding the exact formula for calculating the scores would be a bonus.
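
To make the setup described above concrete, here is a minimal sketch in Python of how ranks would follow from a score. It uses the example formula from the post, which the poster says is only a guess at the unknown real one. Player A comes from the post; players B and C and the helper names are made up for illustration.

```python
import math

# Hypothetical player statistics: (name, goals, matches, age).
# Only player A appears in the post; B and C are invented for illustration.
players = [
    ("A", 5, 1, 35),
    ("B", 10, 20, 25),
    ("C", 3, 4, 30),
]


def score(goals, matches, age):
    """Example formula from the post; the real scoring formula is unknown."""
    return goals / matches - math.sqrt(age)


def ranks_from_scores(scores):
    """Assign rank 1 to the highest score (descending order, no ties assumed)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


scores = [score(g, m, a) for _, g, m, a in players]
ranks = ranks_from_scores(scores)
for (name, *_), s, r in zip(players, scores, ranks):
    print(f"{name}: score={s:.3f} rank={r}")
```

Any strictly increasing transformation of the score gives the same ranks, so the ranks alone cannot pin down one exact formula.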
> Date: Mon, 16 Sep 2013 10:20:08 -0600
> Subject: Re: [R] Regression model for predicting ranks of the dependent variable
> From: 538280 at gmail.com
> To: saumya.gupta at outlook.com
> CC: r-help at r-project.org
> 
> What question (or questions) are you trying to answer?  Any advice we may give will depend on what you are trying to accomplish.
> 
> On Sat, Sep 14, 2013 at 2:12 PM, Saumya Gupta <saumya.gupta at outlook.com> wrote:
> 
> I have a dataset which has several predictor variables and a dependent variable, "score" (which is numeric). The score for each row is calculated using a formula which uses some of the predictor variables. But the "score" figures are not explicitly given in the dataset. The scores are only sorted in descending order, and their ranks are given (1, 2, 3, 4, etc.; rank 1 means that the row had the highest score, rank 2 the second highest, and so on). So, if the data has 100 rows, the output has ranks from 1 to 100.
> 
> 
> I don't think it would be proper to treat the output column as a numeric one, since it is an ordinal variable, and the distance (difference in scores) between ranks 1 and 2 may not be the same as that between ranks 2 and 3. However, most R regression models for ordinal regression are made for output such as (high, medium, low), where each level of the output does not necessarily correspond to a unique row. In my case, each output (rank) corresponds to a unique row.
> 
> 
> So please suggest what models I could use for this problem. Will treating the output as numeric instead of ordinal be a reasonable approximation? Or will the usual models for ordinal regression work on this dataset as well?
> 
> 
> 
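
Since the stated objective is only the ordering, one possible way to compare candidate models (numeric or ordinal) is to convert each model's predictions to ranks and compare them with the known ranks using Spearman's rank correlation. The sketch below, in Python, assumes tie-free rankings; the predicted values and function names are hypothetical and do not come from any model in this thread.

```python
def to_ranks(values):
    """Assign rank 1 to the largest value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman(ranks_a, ranks_b):
    """Spearman rank correlation for two tie-free rankings of equal length."""
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


true_ranks = [1, 2, 3, 4, 5]
predicted_scores = [9.0, 7.5, 8.0, 2.0, 1.0]  # hypothetical model output
predicted_ranks = to_ranks(predicted_scores)
rho = spearman(true_ranks, predicted_ranks)
print(predicted_ranks, rho)
```

A value of 1 means the predicted ordering matches the true ordering exactly. The measure ignores the spacing between scores, which addresses the concern about unequal distances between adjacent ranks.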

David Winsemius
Alameda, CA, USA


