[R] Calculate NAs from known data: how to?

Brian G. Peterson brian at braverock.com
Tue Oct 17 13:48:53 CEST 2006


Torleif Markussen Lunde wrote:
> In a dataset I have length and age for cod. The age, however, is only
> given for 40-100% of the fish. What I need to do is to fill in the NAs
> in a correct way, so that age has a value for each length. This is to be
> done for each sample separately (there are 324 samples), meaning the NAs
> for sampleno 1 shall be calculated from the known values from sampleno 1.
> 
> As, for example, length 55 cm can be both 4 and 5 years, I guess a fish
> with NA age and length 55 cm should be given a "random" age given a
> probability, for example "55 cm = 4 years has p=75%, while 55 cm = 5
> years has p=25%". Those "p-values" should be calculated from the real
> data.
> 
> How can this be done in R, and what is the right way to do it?

Given the size of your sample, wouldn't it be more statistically valid to
set the age of the NA records to the mean age of records of matching
length?  I suppose you could also use resampling or a bootstrap, but I'm
not sure that adding randomization will give results that are any more 
statistically valid than using the mean.
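
As a rough sketch of both options (not tested on your data), something
like the following should work. The column names sampleno, length and
age, the helper impute_age() and the toy data frame are my own
assumptions for illustration. NAs are filled only from fish of the same
length within the same sample; with method = "sample" the observed
frequencies serve as the empirical probabilities you describe.

```r
# Sketch only: assumes a data frame with columns sampleno, length and
# age (hypothetical names), where age is NA when it was not determined.
impute_age <- function(d, method = c("mean", "sample")) {
  method <- match.arg(method)
  # One group per (sample, length) combination that occurs in the data.
  grp <- interaction(d$sampleno, d$length, drop = TRUE)
  d$age <- ave(as.numeric(d$age), grp, FUN = function(a) {
    miss  <- is.na(a)
    known <- a[!miss]
    # Nothing to fill, or no known ages to fill from: leave unchanged.
    if (!any(miss) || length(known) == 0) return(a)
    if (method == "mean") {
      a[miss] <- mean(known)
    } else {
      # Draw with replacement from the known ages in this group, so an
      # age seen 3 times out of 4 is drawn with probability 0.75.
      idx <- sample.int(length(known), sum(miss), replace = TRUE)
      a[miss] <- known[idx]
    }
    a
  })
  d
}

# Toy data: in sample 1, length 55 has known ages 4, 4, 4, 5 and one NA.
cod <- data.frame(
  sampleno = c(1, 1, 1, 1, 1, 2, 2),
  length   = c(55, 55, 55, 55, 55, 55, 60),
  age      = c(4, 4, 4, 5, NA, 6, NA)
)

by_mean   <- impute_age(cod, "mean")
set.seed(1)  # only to make the random draw repeatable
by_sample <- impute_age(cod, "sample")
```

In the toy data the mean method gives the missing 55 cm fish in sample 1
an age of 4.25, which you may want to round, while the sampling method
gives it either 4 or 5. The 60 cm fish in sample 2 stays NA because that
sample has no fish of that length with a known age; such cases need a
separate rule (for example, borrowing from neighbouring lengths).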

Regards,

   - Brian


