[R] finding birth position

Deepankar Basu basu.15 at osu.edu
Fri Oct 26 17:16:41 CEST 2007


Thanks a lot for all the comments and suggestions. They have helped me
solve the problem. I found the "wide" to "long" transformation of the
data especially helpful. I had used this in Stata but was not aware that I
could do the same in R.

Deepankar 
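The long format also makes the boy_1, ..., boy_6 variables from the original question easy to build. What follows is a minimal sketch, not necessarily what was done in the end. It re-creates the example data from the original post and reshapes it as in Douglas Bates's reply. It then numbers the boys within each family with ave() and fills a zero-initialised matrix. The names boys, k and boy are hypothetical and exist only for this illustration.

```r
# Example data from the original post: 1 = male, 2 = female, NA = no birth
birth <- data.frame(b1 = c(1, 2, 1, 2, 1, 2),
                    b2 = c(2, 2, 2, 1, NA, 1),
                    b3 = c(1, NA, 1, NA, NA, 2),
                    b4 = c(2, NA, 1, NA, NA, 1),
                    b5 = c(NA, NA, 1, NA, NA, NA),
                    b6 = rep(NA_real_, 6))

# Wide -> long: one row per (family, birth order); reshape() creates 'family' as 1..6
bl <- reshape(birth, varying = list(1:6), v.names = "sex", timevar = "ord",
              idvar = "family", direction = "long")

# Keep male births only (subset() drops the NA rows), sorted by family, then birth order
boys <- subset(bl, sex == 1)
boys <- boys[order(boys$family, boys$ord), ]

# k = 1 for a family's first boy, 2 for its second boy, and so on
boys$k <- ave(boys$ord, boys$family, FUN = seq_along)

# boy[i, k] = birth position of the k-th boy in family i, or 0 if there is no k-th boy
boy <- matrix(0L, nrow = nrow(birth), ncol = 6,
              dimnames = list(NULL, paste0("boy_", 1:6)))
boy[cbind(boys$family, boys$k)] <- boys$ord
boy
```

For the example data, family 3 comes out as 1 3 4 5 0 0, and family 2, which has no boys, is all zeros.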


On Fri, 2007-10-26 at 08:44 -0500, Douglas Bates wrote:
> Another approach is to convert the data frame that you have in what is
> sometimes called the "wide" format to the "long" format.  See ?reshape
> for details on this transformation.
> 
> In the process of doing the conversion I would also convert the sex of
> the child to a factor with meaningful levels and the family number to
> a factor.
> 
> > birth   # data in the original, "wide" format
>   b1 b2 b3 b4 b5 b6
> 1  1  2  1  2 NA NA
> 2  2  2 NA NA NA NA
> 3  1  2  1  1  1 NA
> 4  2  1 NA NA NA NA
> 5  1 NA NA NA NA NA
> 6  2  1  2  1 NA NA
> > bl <- reshape(birth, varying = list(1:6),
>                 v.names = "sex", timevar = "ord",
>                 idvar = "family", direction = "long")
> > head(bl, n = 8)  # a data frame with 3 columns
>     ord sex family
> 1.1   1   1      1
> 2.1   1   2      2
> 3.1   1   1      3
> 4.1   1   2      4
> 5.1   1   1      5
> 6.1   1   2      6
> 1.2   2   2      1
> 2.2   2   2      2
> > bl$sex <- factor(bl$sex, labels = c("M", "F")) # use a factor with meaningful labels
> > bl <- subset(bl, !is.na(sex))  # remove records of births that did not occur
> > bl$family <- factor(bl$family) # convert family to a factor
> > str(bl)         # resulting structure has only 18 rows
> 'data.frame':	18 obs. of  3 variables:
>  $ ord   : int  1 1 1 1 1 1 2 2 2 2 ...
>  $ sex   : Factor w/ 2 levels "M","F": 1 2 1 2 1 2 2 2 2 1 ...
>  $ family: Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6 1 2 3 4 ...
> > bl
>     ord sex family
> 1.1   1   M      1
> 2.1   1   F      2
> 3.1   1   M      3
> 4.1   1   F      4
> 5.1   1   M      5
> 6.1   1   F      6
> 1.2   2   F      1
> 2.2   2   F      2
> 3.2   2   F      3
> 4.2   2   M      4
> 6.2   2   M      6
> 1.3   3   M      1
> 3.3   3   M      3
> 6.3   3   F      6
> 1.4   4   F      1
> 3.4   4   M      3
> 6.4   4   M      6
> 3.5   5   M      3
> > subset(bl, sex == "M")  # these are the births of males only
>     ord sex family
> 1.1   1   M      1
> 3.1   1   M      3
> 5.1   1   M      5
> 4.2   2   M      4
> 6.2   2   M      6
> 1.3   3   M      1
> 3.3   3   M      3
> 3.4   4   M      3
> 6.4   4   M      6
> 3.5   5   M      3
> > with(subset(bl, sex == "M"), tapply(ord, family, min)) # first male birth in family
>  1  2  3  4  5  6
>  1 NA  1  2  1  2
> 
> The wide format may seem a natural representation for such data but
> frequently it is inefficient and awkward.  The long format is much
> easier to manipulate in R.
> 
> On 10/25/07, jim holtman <jholtman at gmail.com> wrote:
> > You might want to consider another representation, but it would depend
> > on how you want to use it.  Here is a 'list' that records for each row
> > the position of the boys; does this start to give you the type of data
> > that you want?  These are the numeric values of where the boys occur.
> >
> > > x.m
> >      b1 b2 b3 b4 b5 b6
> > [1,]  1  2  1  2 NA NA
> > [2,]  2  2 NA NA NA NA
> > [3,]  1  2  1  1  1 NA
> > [4,]  2  1 NA NA NA NA
> > [5,]  1 NA NA NA NA NA
> > [6,]  2  1  2  1 NA NA
> > > apply(x.m, 1, function(a)which(a == 1))
> > [[1]]
> > b1 b3
> >  1  3
> >
> > [[2]]
> > named integer(0)
> >
> > [[3]]
> > b1 b3 b4 b5
> >  1  3  4  5
> >
> > [[4]]
> > b2
> >  2
> >
> > [[5]]
> > b1
> >  1
> >
> > [[6]]
> > b2 b4
> >  2  4
> >
> > >
> >
> >
> > On 10/25/07, Deepankar Basu <basu.15 at osu.edu> wrote:
> > > Hi All,
> > >
> > > I have data on the sequence of births for families with completed
> > > fertility cycles (in a data frame); the relevant variables are called b1,
> > > b2, b3, b4, b5, b6 and record the sex of the first, second, ..., sixth
> > > child. So,
> > > b1=1 if the first birth is male,
> > > b1=2 if the first birth is female,
> > > and b1=NA if the family did not record any first birth.
> > >
> > > Similarly for b2, b3, b4, b5 and b6.
> > >
> > > I want to record the positions of the male children within their
> > > family's birth history. So, I was thinking of creating six variables
> > > boy_1, boy_2, ..., boy_6. boy_1 would record the position of the first
> > > boy, boy_2 would record the position of the second boy, and so on up to
> > > boy_6. I want to assign a value of zero to boy_i if the family in
> > > question did not have an i-th boy.
> > >
> > > I am not sure how best to do this (i.e., whether to create variables as
> > > I have suggested or do something else) and would appreciate any
> > > suggestions. Later, I want to use the information on the position of the
> > > male births to compute a likelihood function and do an MLE.
> > >
> > > Here is how my data frame would look:
> > >
> > > b1 b2 b3 b4 b5 b6
> > > 1 2 1 2 NA NA
> > > 2 2 NA NA NA NA
> > > 1 2 1 1 1 NA
> > > 2 1 NA NA NA NA
> > > 1 NA NA NA NA NA
> > > 2 1 2 1 NA NA
> > >
> > > Thanks in advance.
> > >
> > > Deepankar
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
> > --
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 646 9390
> >
> > What is the problem you are trying to solve?
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
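Jim Holtman's which() approach works directly on the wide matrix. Below is a minimal sketch, assuming the same 1 = male coding. It pads each family's boy positions with zeros to give the fixed-width boy_1, ..., boy_6 layout asked for in the original question. It uses lapply() over row indices rather than apply(). apply() only returns a list when the per-row results differ in length. If every family had, say, exactly two boys, it would return a matrix instead. The names pos, n and boy are hypothetical.

```r
# The example data as a matrix (1 = male, 2 = female, NA = no birth)
x.m <- rbind(c(1,  2,  1,  2, NA, NA),
             c(2,  2, NA, NA, NA, NA),
             c(1,  2,  1,  1,  1, NA),
             c(2,  1, NA, NA, NA, NA),
             c(1, NA, NA, NA, NA, NA),
             c(2,  1,  2,  1, NA, NA))
colnames(x.m) <- paste0("b", 1:6)

# Birth positions of the boys in each row; which() skips the NAs.
# lapply() always returns a list, unlike apply(), which simplifies equal-length results.
pos <- lapply(seq_len(nrow(x.m)), function(i) which(x.m[i, ] == 1))

# Pad each row's positions with zeros to a fixed width of ncol(x.m)
n <- ncol(x.m)
boy <- t(vapply(pos, function(p) c(unname(p), rep(0L, n - length(p))), integer(n)))
colnames(boy) <- paste0("boy_", seq_len(n))
boy
```

vapply() is used for the padding step because it checks that every row yields exactly n integers. A ragged or mistyped result would stop with an error rather than be silently reshaped.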


