[R] finding birth position

Douglas Bates bates at stat.wisc.edu
Fri Oct 26 15:44:22 CEST 2007


Another approach is to convert your data frame from what is sometimes
called the "wide" format to the "long" format.  See ?reshape for
details on this transformation.

In the process of doing the conversion I would also convert the sex of
the child to a factor with meaningful levels and the family number to
a factor.
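
The transcript that follows assumes a data frame named birth.  As a
sketch, it can be rebuilt from the values in the original post like
this (storing the codes as integers is an assumption; the post does
not say how the columns were stored):

```r
# Rebuild the example data row by row, one row per family.
# 1 = male, 2 = female, NA = no birth recorded at that position.
birth <- as.data.frame(matrix(
  c(1L, 2L, 1L, 2L, NA, NA,
    2L, 2L, NA, NA, NA, NA,
    1L, 2L, 1L, 1L, 1L, NA,
    2L, 1L, NA, NA, NA, NA,
    1L, NA, NA, NA, NA, NA,
    2L, 1L, 2L, 1L, NA, NA),
  ncol = 6, byrow = TRUE,
  dimnames = list(NULL, paste0("b", 1:6))
))
```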

> birth   # data in the original, "wide" format
  b1 b2 b3 b4 b5 b6
1  1  2  1  2 NA NA
2  2  2 NA NA NA NA
3  1  2  1  1  1 NA
4  2  1 NA NA NA NA
5  1 NA NA NA NA NA
6  2  1  2  1 NA NA
> bl <- reshape(birth, varying = list(1:6),
                v.names = "sex", timevar = "ord",
                idvar = "family", direction = "long")
> head(bl, n = 8)  # a data frame with 3 columns
    ord sex family
1.1   1   1      1
2.1   1   2      2
3.1   1   1      3
4.1   1   2      4
5.1   1   1      5
6.1   1   2      6
1.2   2   2      1
2.2   2   2      2
> bl$sex <- factor(bl$sex, labels = c("M", "F")) # use a factor with meaningful labels
> bl <- subset(bl, !is.na(sex))  # remove records of births that did not occur
> bl$family <- factor(bl$family) # convert family to a factor
> str(bl)         # resulting structure has only 18 rows
'data.frame':	18 obs. of  3 variables:
 $ ord   : int  1 1 1 1 1 1 2 2 2 2 ...
 $ sex   : Factor w/ 2 levels "M","F": 1 2 1 2 1 2 2 2 2 1 ...
 $ family: Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6 1 2 3 4 ...
> bl
    ord sex family
1.1   1   M      1
2.1   1   F      2
3.1   1   M      3
4.1   1   F      4
5.1   1   M      5
6.1   1   F      6
1.2   2   F      1
2.2   2   F      2
3.2   2   F      3
4.2   2   M      4
6.2   2   M      6
1.3   3   M      1
3.3   3   M      3
6.3   3   F      6
1.4   4   F      1
3.4   4   M      3
6.4   4   M      6
3.5   5   M      3
> subset(bl, sex == "M")  # these are the births of males only
    ord sex family
1.1   1   M      1
3.1   1   M      3
5.1   1   M      5
4.2   2   M      4
6.2   2   M      6
1.3   3   M      1
3.3   3   M      3
3.4   4   M      3
6.4   4   M      6
3.5   5   M      3
> with(subset(bl, sex == "M"), tapply(ord, family, min)) # first male birth in family
 1  2  3  4  5  6
 1 NA  1  2  1  2

The wide format may seem a natural representation for such data, but
it is frequently inefficient and awkward.  The long format is much
easier to manipulate in R.
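
As one illustration of that, the boy_1, ..., boy_6 variables asked for
in the original post can be derived from the long format in a few
lines.  This is only a sketch: the object names males, pos and boys
are made up here, the data are retyped from the post, and varying is
given by column name rather than by index.

```r
# Example data from the post: 1 = male, 2 = female, NA = no birth.
birth <- as.data.frame(matrix(
  c(1L, 2L, 1L, 2L, NA, NA,
    2L, 2L, NA, NA, NA, NA,
    1L, 2L, 1L, 1L, 1L, NA,
    2L, 1L, NA, NA, NA, NA,
    1L, NA, NA, NA, NA, NA,
    2L, 1L, 2L, 1L, NA, NA),
  ncol = 6, byrow = TRUE,
  dimnames = list(NULL, paste0("b", 1:6))
))

# Wide to long, then the same clean-up steps as in the transcript.
bl <- reshape(birth, varying = list(names(birth)),
              v.names = "sex", timevar = "ord",
              idvar = "family", direction = "long")
bl$sex <- factor(bl$sex, labels = c("M", "F"))
bl <- subset(bl, !is.na(sex))
bl$family <- factor(bl$family)

# Birth positions of the boys, one list element per family.
# Families with no boys keep an empty element because family is a factor.
males <- subset(bl, sex == "M")
pos <- split(males$ord, males$family)

# Pad each family's positions with zeros to length 6, one row per family.
boys <- t(sapply(pos, function(p) c(sort(p), rep(0L, 6 - length(p)))))
colnames(boys) <- paste0("boy_", 1:6)
boys
```

Each row of boys is a family; for example, family 3 has boys at
positions 1, 3, 4 and 5, and family 2 has none, so its row is all
zeros.  Whether zero or NA is the better filler depends on how the
likelihood is to be computed.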

On 10/25/07, jim holtman <jholtman at gmail.com> wrote:
> You might want to consider another representation, but it would depend
> on how you want to use it.  Here is a 'list' that records for each row
> the position of the boys; does this start to give you the type of data
> that you want?  These are the numeric values of where the boys occur.
>
> > x.m
>      b1 b2 b3 b4 b5 b6
> [1,]  1  2  1  2 NA NA
> [2,]  2  2 NA NA NA NA
> [3,]  1  2  1  1  1 NA
> [4,]  2  1 NA NA NA NA
> [5,]  1 NA NA NA NA NA
> [6,]  2  1  2  1 NA NA
> > apply(x.m, 1, function(a)which(a == 1))
> [[1]]
> b1 b3
>  1  3
>
> [[2]]
> named integer(0)
>
> [[3]]
> b1 b3 b4 b5
>  1  3  4  5
>
> [[4]]
> b2
>  2
>
> [[5]]
> b1
>  1
>
> [[6]]
> b2 b4
>  2  4
>
> >
>
>
> On 10/25/07, Deepankar Basu <basu.15 at osu.edu> wrote:
> > Hi All,
> >
> > I have data on the sequence of births for families with completed
> > fertility cycle (in a data frame); the relevant variables are called b1,
> > b2, b3, b4, b5, b6 and record the birth of the first, second, ..., sixth
> > child. So,
> > b1=1 if the first birth is male,
> > b1=2 if the first birth is female,
> > and b1=NA if the family did not record any first birth.
> >
> > Similarly for b2, b3, b4, b5 and b6.
> >
> > I want to record the positions of the male children within their
> > family's birth history. So, I was thinking of creating six variables
> > boy_1, boy_2, ..., boy_6. boy_1 would record the position of the first
> > boy, boy_2 would record the position of the second boy and so on till
> > boy_6. I want to assign a value of zero to boy_i if the family in
> > question did not have the i_th boy.
> >
> > I am not sure how best to do this (i.e., whether to create variables as
> > I have suggested or do something else) and would appreciate any
> > suggestions. Later, I want to use the information on the position of the
> > male births to compute a likelihood function and do an MLE.
> >
> > Here is how my data frame would look:
> >
> > b1 b2 b3 b4 b5 b6
> > 1 2 1 2 NA NA
> > 2 2 NA NA NA NA
> > 1 2 1 1 1 NA
> > 2 1 NA NA NA NA
> > 1 NA NA NA NA NA
> > 2 1 2 1 NA NA
> >
> > Thanks in advance.
> >
> > Deepankar
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>


