[R] Maintaining data order in factanal with missing data

PIKAL Petr petr.pikal at precheza.cz
Fri Jul 26 16:06:18 CEST 2013


Hi

There are probably better options but

merge(data.frame(x=1:154), data.frame(x=as.integer(names(ab.1.fa[[1]])), y=ab.1.fa[[1]]), all.x=TRUE)

gives you a data frame with one row per original case, with NA in y for every case that was dropped because of missing values.
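
A minimal, self-contained sketch of the same idea on made-up data. The names toy, ok and scores are hypothetical, and the scores are a simple stand-in rather than real factanal output; the only assumption carried over is that the scores are named by the row names of the complete cases, as in the str() output quoted below.

```r
set.seed(1)
toy <- data.frame(dmid = 101:110, v1 = rnorm(10), v2 = rnorm(10))
toy$v1[c(3, 7)] <- NA                      # two incomplete cases

ok <- complete.cases(toy[-1])
# Stand-in for factor scores: one value per complete case,
# named by the original row names ("1", "2", "4", ...)
scores <- setNames(toy$v1[ok] + toy$v2[ok], rownames(toy)[ok])

# Left join on the row index; cases without a score get NA
aligned <- merge(data.frame(x = seq_len(nrow(toy))),
                 data.frame(x = as.integer(names(scores)), y = scores),
                 all.x = TRUE)

# merge() sorts on x by default, so y lines up with the rows of toy
toy$score <- aligned$y
```

Rows 3 and 7 of toy end up with NA in score, and the remaining rows keep their original order.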

You can probably automate the process a bit by using nrow() instead of hard-coding the number of cases.
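
One possible way to automate it, as a sketch only: realign_scores() is a hypothetical helper, not part of any package, and it swaps merge() for plain indexing by name. It assumes the scores carry the row names of the data frame they were computed from. The toy data and fake_scores are made up for illustration.

```r
# Hypothetical helper: pad a named score vector back to nrow(data),
# matching on row names; dropped cases stay NA.
realign_scores <- function(scores, data) {
  full <- setNames(rep(NA_real_, nrow(data)), rownames(data))
  full[names(scores)] <- scores
  unname(full)
}

set.seed(42)
wave <- data.frame(dmid = 201:208, q1 = rnorm(8), q2 = rnorm(8))
wave$q2[c(2, 5)] <- NA
keep <- complete.cases(wave[-1])
# Stand-in for factor scores, named by the rows that were kept
fake_scores <- setNames(wave$q1[keep] - wave$q2[keep], rownames(wave)[keep])

wave$score <- realign_scores(fake_scores, wave)

# With the scores back in the wave data frame, the unique ID can be
# used to merge them into a master data frame.
master <- data.frame(dmid = 201:208)
master <- merge(master, wave[c("dmid", "score")], by = "dmid", all.x = TRUE)
```

Once the score sits next to dmid in the wave data frame, merging on dmid no longer depends on row order.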

Regards
Petr



> -----Original Message-----
> From: Justin Delahunty [mailto:ACU at genius.net.au]
> Sent: Friday, July 26, 2013 3:34 PM
> To: PIKAL Petr; 'Justin Delahunty'; 'Justin Delahunty'; r-help at r-project.org
> Subject: RE: [R] Maintaining data order in factanal with missing data
> 
> Hi Petr,
> 
> So sorry, I accidentally attached the complete data set rather than the
> one with missing values. I've attached the correct file to this email.
> RE: init.dfs() being local, I hadn't even thought of that. I've been
> away from OOP for close to 15 years now, so it might be time to revise!
> 
> The problem I have is that with missing values the list of factor
> scores returned (ab.w1.fa$factor.scores) does not map onto the
> originating data frame (ab.w1.df) as it no longer includes the cases
> which had missing values. So while the original data set for ab.w1.df
> contains 154 ordered cases, the factor analysis contains only 150.
> 
> I am seeking a way to map the values derived from the factor analysis
> (ab.w1.fa$factor.scores) back to their original ordered position, so
> that these factor score variables may be merged back into the master
> data frame (ab.df). A unique ID for each case is available ($dmid)
> which I had thought to use when merging the new variables, however I
> don't know how to implement this.
> 
> 
> Thanks for your help,
> 
> Justin
> 
> 
> -----Original Message-----
> From: PIKAL Petr [mailto:petr.pikal at precheza.cz]
> Sent: Friday, 26 July 2013 10:59 PM
> To: Justin Delahunty; Justin Delahunty; r-help at r-project.org
> Subject: RE: [R] Maintaining data order in factanal with missing data
> 
> Hi
> 
> Well, the function init.dfs does nothing: the data frames created
> inside it do not propagate to the global environment, and the function
> returns nothing.
> 
> The last line (when used outside a function) gives warnings, but there
> is no sign of an error.
> 
> When
> 
> > head(ab.1.df)
>   dmid   g5oab2      g53      g54      g55   g5ovb1
> 1    1 1.418932 1.805227 2.791152 3.624116 3.425586
> 2    2 2.293907 1.187830 1.611237 1.748526 3.816533
> 3    3 2.836536 2.679523 1.279639 2.674986 2.452395
> 4    4 1.872259 3.278359 1.785872 2.458315 1.146480
> 5    5 1.467195 1.180747 3.564127 3.007682 2.109506
> 6    6 3.098512 3.151974 3.969379 3.750571 1.497358
> > head(ab.2.df)
>   dmid   w2oab3      w22      w23      w24   w2ovb1
> 1    1 4.831362 5.522764 7.809366 6.969172 7.398385
> 2    2 6.706346 4.101742 1.434697 5.266775 5.357641
> 3    3 3.653806 2.666885 1.209326 5.125556 4.963374
> 4    4 7.221255 7.649152 6.540398 6.648506 2.576081
> 5    5 1.848023 5.044314 2.761881 3.307220 1.454234
> 6    6 7.606429 4.911766 2.034813 2.638573 2.818834
> > head(ab.3.df)
>   dmid   w3oab3   w3oab4   w3oab7   w3oab8   w3ovb1
> 1    1 5.835609 6.108220 6.587721 2.451461 2.785467
> 2    2 4.973198 1.196815 6.388056 1.110877 4.226463
> 3    3 3.800367 6.697287 5.235345 6.666829 6.319073
> 4    4 1.093141 1.477773 2.269252 3.194978 4.916342
> 5    5 1.975060 7.204516 4.825435 1.775874 3.484027
> 6    6 3.273361 2.243805 5.326547 5.720892 6.118723
> >
> 
> > str(ab.1.fa)
> List of 2
>  $ rescaled.scores: Named num [1:154] 3.43 3.83 2.43 1.1 2.08 ...
>   ..- attr(*, "names")= chr [1:154] "1" "2" "3" "4" ...
>  $ factor.loadings: Named num [1:5] -0.0106 -0.0227 -0.1093 -0.0912 0.9975
>   ..- attr(*, "names")= chr [1:5] "g5oab2" "g53" "g54" "g55" ...
> > str(ab.2.fa)
> List of 2
>  $ rescaled.scores: Named num [1:154] 6.34 5.24 5.3 1.91 2.16 ...
>   ..- attr(*, "names")= chr [1:154] "1" "2" "3" "4" ...
>  $ factor.loadings: Named num [1:5] -0.2042 0.0063 -0.2287 -0.0119 0.7138
>   ..- attr(*, "names")= chr [1:5] "w2oab3" "w22" "w23" "w24" ...
> > str(ab.3.fa)
> List of 2
>  $ rescaled.scores: Named num [1:154] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ...
>   ..- attr(*, "names")= chr [1:154] "1" "2" "3" "4" ...
>  $ factor.loadings: Named num [1:5] -0.1172 0.0128 -0.0968 0.106 0.9975
>   ..- attr(*, "names")= chr [1:5] "w3oab3" "w3oab4" "w3oab7" "w3oab8" ...
> 
> Anyway, I have no idea what you consider wrong.
> 
> Regards
> Petr
> 
> 
> 
> > -----Original Message-----
> > From: Justin Delahunty [mailto:ACU at genius.net.au]
> > Sent: Friday, July 26, 2013 2:22 PM
> > To: PIKAL Petr; 'Justin Delahunty'; r-help at r-project.org
> > Subject: RE: [R] Maintaining data order in factanal with missing data
> >
> > Hi Petr,
> >
> > Thanks for the quick response. Unfortunately I cannot share the data
> > I am working with, however please find attached a suitable R
> > workspace with generated data. It has the appropriate variable names,
> > only the data has been changed.
> >
> > The last function in the list (init.dfs()) I call to subset the
> > overall data set into the three waves, then conduct the factor
> > analysis on each (1 factor CFA); it's just in a function to ease
> > re-typing in a new workspace.
> >
> >
> > Thanks,
> >
> > Justin
> >
> > -----Original Message-----
> > From: PIKAL Petr [mailto:petr.pikal at precheza.cz]
> > Sent: Friday, 26 July 2013 7:35 PM
> > To: Justin Delahunty; r-help at r-project.org
> > Subject: RE: [R] Maintaining data order in factanal with missing data
> >
> > Hi
> >
> > You provided functions, so far so good. But without data it would be
> > quite difficult to understand what the functions do and where the
> > issue could be.
> >
> > I suspect a combination of complete-cases selection together with
> > subset and factor behaviour. But I may be completely off target too.
> >
> > Petr
> >
> > > -----Original Message-----
> > > > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of s00123776 at myacu.edu.au
> > > Sent: Friday, July 26, 2013 9:35 AM
> > > To: r-help at r-project.org
> > > Subject: [R] Maintaining data order in factanal with missing data
> > >
> > > Hi,
> > >
> > >
> > >
> > > I'm new to R, so sorry if this is a simple answer. I'm currently
> > > trying to collapse some ordinal variables into a composite; the
> > > program ideally should take a data frame as input, perform a factor
> > > analysis, compute factor scores, sds, etc., and return the rescaled
> > > scores and loadings. The difficulty I'm having is that my data set
> > > contains a number of NA, which I am excluding from the analysis
> > > > using complete.cases(), and thus the incomplete cases are "skipped".
> > > > These functions are for a longitudinal data set with repeated waves
> > > > of data, so the final rescaled scores from each wave need to be
> > > > saved as variables grouped by a unique ID (DMID). The functions I'm
> > > > trying to implement are as follows:
> > >
> > >
> > >
> > > > weighted.sd<-function(x,w){
> > > >     sum.w<-sum(w)
> > > >     sum.w2<-sum(w^2)
> > > >     mean.w<-sum(x*w)/sum(w)
> > > >     x.sd.w<-sqrt((sum.w/(sum.w^2-sum.w2))*sum(w*(x-mean.w)^2))
> > > >     return(x.sd.w)
> > > > }
> > >
> > >
> > >
> > > > re.scale<-function(f.scores, raw.data, loadings){
> > > >     fz.scores<-(f.scores+mean(f.scores))/(sd(f.scores))
> > > >     means<-apply(raw.data,1,weighted.mean,w=loadings)
> > > >     sds<-apply(raw.data,1,weighted.sd,w=loadings)
> > > >     grand.mean<-mean(means)
> > > >     grand.sd<-mean(sds)
> > > >     final.scores<-((fz.scores*grand.sd)+grand.mean)
> > > >     return(final.scores)
> > > > }
> > >
> > >
> > >
> > > > get.scores<-function(data){
> > > >     fact<-factanal(data[complete.cases(data),],factors=1,scores="regression")
> > > >     f.scores<-fact$scores[,1]
> > > >     f.loads<-fact$loadings[,1]
> > > >     rescaled.scores<-re.scale(f.scores, data[complete.cases(data),], f.loads)
> > > >     output.list<-list(rescaled.scores, f.loads)
> > > >     names(output.list)<-c("rescaled.scores", "factor.loadings")
> > > >     return(output.list)
> > > > }
> > >
> > >
> > >
> > > > init.dfs<-function(){
> > > >     ab.1.df<-subset(ab.df,,select=c(dmid,g5oab2:g5ovb1))
> > > >     ab.2.df<-subset(ab.df,,select=c(dmid,w2oab3:w2ovb1))
> > > >     ab.3.df<-subset(ab.df,,select=c(dmid, w3oab3, w3oab4, w3oab7, w3oab8, w3ovb1))
> > > >
> > > >     ab.1.fa<-get.scores(ab.1.df[-1])
> > > >     ab.2.fa<-get.scores(ab.2.df[-1])
> > > >     ab.3.fa<-get.scores(ab.3.df[-1])
> > > > }
> > >
> > >
> > >
> > > Thanks for your help,
> > >
> > >
> > >
> > > Justin
> > >
> > >
> > > 	[[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > > http://www.R-project.org/posting-guide.html and provide commented,
> > > > minimal, self-contained, reproducible code.
> >
> 
> 


