[R] R-help question

R. Michael Weylandt michael.weylandt at gmail.com
Tue Aug 14 07:11:28 CEST 2012


On Sun, Aug 12, 2012 at 10:58 PM, Louise Cowpertwait
<louisecowpertwait at gmail.com> wrote:
> Hi there,
>
> I have subscribed to R-help but am not sure how to view or post questions? I think this is the right way.

Indeed!

>
> I am planning on doing a multivariate regression investigating the relationship between depression (a continuous variable) and social support variables (mostly continuous, some categorical) among older people. I have a number of demographic and health-related variables that I am including as control variables. I have a large dataset from nearly 4,000 individuals.
>
> I need to check whether my data is 1) Missing at Random (MAR) and 2) Missing Completely At Random (MCAR).
>
> Here are three questions that I have related to this:
>
>
> 1) To check whether the data is MAR, I dichotomised a variable into missing and not missing, and checked for any significant differences in means (for continuous) or proportions (for categorical) of the other variables. I did this for each of the variables in my analysis. Is this correct?

Something like a classic chi-squared test, or a small-sample analogue
such as Fisher's exact test? That seems reasonable, but with one test
per variable you will need to correct for multiple comparisons. See
?p.adjust for the available methods and some references.
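
A minimal sketch of that workflow in base R, in case it helps. The data
frame `dat` and its columns (depression, support, age, sex) are simulated
and purely hypothetical; they are not the poster's data. The code builds a
missingness indicator for one variable, compares the other variables
across it, and adjusts the p-values with p.adjust().

```r
set.seed(1)
n <- 500

# Hypothetical data, simulated only for illustration
dat <- data.frame(
  depression = rnorm(n, mean = 10, sd = 3),
  support    = rnorm(n, mean = 20, sd = 5),
  age        = round(runif(n, 65, 95)),
  sex        = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Assumption: 'support' is more likely to be missing for older people
p_miss <- plogis(-4 + 0.04 * dat$age)
dat$support[runif(n) < p_miss] <- NA

# Dichotomise: TRUE where 'support' is missing
miss <- is.na(dat$support)

# One test per remaining variable: t-tests for the continuous ones,
# a chi-squared test for the categorical one
p_raw <- c(
  depression = t.test(dat$depression ~ miss)$p.value,
  age        = t.test(dat$age ~ miss)$p.value,
  sex        = chisq.test(table(miss, dat$sex))$p.value
)

# Holm adjustment for multiple testing (one of several methods in ?p.adjust)
p_adj <- p.adjust(p_raw, method = "holm")
print(round(cbind(raw = p_raw, holm = p_adj), 4))
```

Holm is used here only as one reasonable default; the other methods
listed in ?p.adjust plug into the same call.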

>
> 2) Because of the size of my dataset, relationships for my MAR analysis are coming up as significant when, practically, the differences in means or proportions are not meaningful. Is it acceptable for me to argue as such, and say that the data is effectively MAR despite statistical significance?

I'm not sure there is a purely "statistical" answer to that: it will
depend much more on the nature of your data set. Let your
subject-matter knowledge of why values are missing guide you here.
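
One way to put a number on "not practically meaningful" is to report an
effect size next to the p-value. The sketch below uses a standardized
mean difference (Cohen's d with a pooled SD). The helper smd() and the
simulated variables are hypothetical, and what counts as a negligible
difference is a judgment call, not something the code decides.

```r
# Hypothetical helper: standardized mean difference (Cohen's d, pooled SD)
# x: numeric variable, g: logical missingness indicator for another variable
smd <- function(x, g) {
  x1 <- x[g & !is.na(x)]
  x0 <- x[!g & !is.na(x)]
  s_pooled <- sqrt(((length(x1) - 1) * var(x1) + (length(x0) - 1) * var(x0)) /
                     (length(x1) + length(x0) - 2))
  (mean(x1) - mean(x0)) / s_pooled
}

set.seed(2)
n <- 4000                      # roughly the sample size in the question
miss <- runif(n) < 0.3         # simulated missingness indicator
# Assumption: a true difference of 1 year in mean age, against an SD of 7
age <- rnorm(n, mean = 75 + 1 * miss, sd = 7)

p_val <- t.test(age ~ miss)$p.value  # with n this large, often "significant"
d     <- smd(age, miss)              # yet the standardized difference is small

print(c(p.value = p_val, cohens.d = d))
```

Reporting both lets the reader see that a small p-value in a sample of
this size need not correspond to a large difference.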

>
> Sorry this is not a question specifically to R (more of a stats question) so no problem if no-one can help, though it would be greatly appreciated.
>
> 3) I have no idea how to check whether the data is Missing Completely At Random in R. I think this involves seeing whether those who had missing data for one variable were more likely to have missing data in other variables? If so, I don't know how to do this. Or, I need to do an overall test like Little's test of missing completely at random. I have spent ages looking online and at packages and can't find anything.

You might want to check the rms package and the accompanying book
(Regression Modeling Strategies) by Frank Harrell. It has the best
coverage of MAR/MCAR/Imputation/etc that I've read on a "practical"
basis.
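
For the narrower question of whether missingness in one variable goes
along with missingness in others, here is a base-R sketch. It is not
Little's test and does not use the rms API; the data frame and the
"skip" mechanism are simulated assumptions. Harrell's Hmisc package
also has functions such as naclus() for this kind of exploration.

```r
set.seed(3)
n <- 1000

# Hypothetical data, simulated only for illustration
dat <- data.frame(
  depression = rnorm(n),
  support    = rnorm(n),
  income     = rnorm(n)
)

# Assumption: some respondents skip a whole block of questions, so
# 'support' and 'income' tend to be missing together
skip <- runif(n) < 0.15
dat$support[skip | runif(n) < 0.05] <- NA
dat$income[skip | runif(n) < 0.05]  <- NA
dat$depression[runif(n) < 0.05]     <- NA

# 0/1 matrix: 1 where a value is missing
na_ind <- 1 * is.na(dat)

# Proportion missing per variable
print(round(colMeans(na_ind), 3))

# Correlation between missingness indicators: large off-diagonal
# values mean two variables tend to be missing for the same people
na_cor <- cor(na_ind)
print(round(na_cor, 2))

# Frequency of each missingness pattern (e.g. "011" = support and income missing)
patterns <- table(apply(na_ind, 1, paste, collapse = ""))
print(sort(patterns, decreasing = TRUE))
```

Strongly correlated indicators are evidence against MCAR, though
uncorrelated indicators do not establish it.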

Cheers,
Michael

>
> Please help! I don't want to use SPSS!
>
> Cheers,
>
> Louise
>
>
>
>
>
> Louise Cowpertwait
> louisecowpertwait at gmail.com
> 021 258 9795
> Auckland, NZ
>
