[R] Replacing NAs in long format

arun smartpink111 at yahoo.com
Sun Nov 4 01:01:09 CET 2012


Hi,
Maybe this helps:
dat2 <- read.table(text = "
idr schyear year
1    4   -1
1    5    0
1    6    1
1    7    2
2    9    0
2   10    1
2   11    2
", sep = "", header = TRUE)

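 # For each idr, set flag to 1 on every row if any row of that idr has
 # schyear <= 5 and year == 0; otherwise set it to 0 on every row.
 # unlist(split(...)) returns the groups in sorted idr order, so this
 # assumes dat2 is already sorted by idr.
 dat2$flag <- unlist(lapply(split(dat2, dat2$idr), function(x)
     rep(ifelse(any(x$schyear <= 5 & x$year == 0), 1, 0), nrow(x))),
     use.names = FALSE)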
 dat2
#  idr schyear year flag
#1   1       4   -1    1
#2   1       5    0    1
#3   1       6    1    1
#4   1       7    2    1
#5   2       9    0    0
#6   2      10    1    0
#7   2      11    2    0
A.K.
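The one-liner above writes the flags back in the order in which split() returns the groups (sorted by idr), so it relies on dat2 already being sorted by idr. A minimal sketch that avoids that assumption: base R's tapply() computes one TRUE/FALSE per idr, and each row then looks up its own idr by name. The names cond and by_idr are illustrative only, not from the thread.

```r
dat2 <- read.table(text = "
idr schyear year
1    4   -1
1    5    0
1    6    1
1    7    2
2    9    0
2   10    1
2   11    2
", header = TRUE)

# Row-level condition: TRUE where schyear <= 5 and year == 0
cond <- with(dat2, schyear <= 5 & year == 0)

# One value per idr: does any row of that idr satisfy the condition?
by_idr <- tapply(cond, dat2$idr, any)

# Look up each row's idr by name, so the row order does not matter
dat2$flag <- as.numeric(by_idr[as.character(dat2$idr)])
dat2
```

Because the lookup is by idr name rather than by position, shuffling the rows of dat2 beforehand gives each row the same flag.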




----- Original Message -----
From: Christopher Desjardins <cddesjardins at gmail.com>
To: jim holtman <jholtman at gmail.com>
Cc: r-help at r-project.org
Sent: Saturday, November 3, 2012 7:09 PM
Subject: Re: [R] Replacing NAs in long format

I have a similar follow-up question, and I bet I could reuse some of this
code, but I'm not sure how.

Let's say I want to create a flag that equals 1 on every row of a given idr
if that idr has a row with schyear <= 5 and year == 0, and 0 otherwise. For example:

> dat

idr schyear year
  1       4   -1
  1       5    0
  1       6    1
  1       7    2
  2       9    0
  2      10    1
  2      11    2

How could I make the data look like this?

idr schyear year flag
  1       4   -1    1
  1       5    0    1
  1       6    1    1
  1       7    2    1
  2       9    0    0
  2      10    1    0
  2      11    2    0


I am not sure how to avoid getting both 0s and 1s in the 'flag'
variable within a single idr. For example,

dat$flag <- with(dat, ifelse(schyear <= 5 & year == 0, 1, 0))

does not work, because it tests each row on its own and so creates:

idr schyear year flag
  1       4   -1    0
  1       5    0    1
  1       6    1    0
  1       7    2    0
  2       9    0    0
  2      10    1    0
  2      11    2    0

As a result, the flag changes within an idr, which it shouldn't.

Thanks,
Chris
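The row-wise ifelse() above is one step short of a group-wise flag. A minimal base R sketch, assuming the data frame is called dat as in the question: compute the row-wise 0/1 indicator first, then let ave() take its maximum within each idr, so a single qualifying row sets the flag to 1 on every row of that idr. The name row_flag is illustrative only.

```r
dat <- read.table(text = "
idr schyear year
1    4   -1
1    5    0
1    6    1
1    7    2
2    9    0
2   10    1
2   11    2
", header = TRUE)

# Row-wise indicator: 1 where schyear <= 5 and year == 0, else 0
row_flag <- with(dat, as.numeric(schyear <= 5 & year == 0))

# ave() applies FUN within each idr and recycles the result over that
# group's rows, so max() spreads a single 1 to the whole idr
dat$flag <- ave(row_flag, dat$idr, FUN = max)
dat
```

ave() puts each group's result back at that group's original row positions, so this does not depend on the rows being sorted by idr.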


On Sat, Nov 3, 2012 at 5:50 PM, Christopher Desjardins <cddesjardins at gmail.com> wrote:

> Hi Jim,
> Thank you so much. That does exactly what I want.
> Chris
>
>
> On Sat, Nov 3, 2012 at 1:30 PM, jim holtman <jholtman at gmail.com> wrote:
>
>> > x <- read.table(text = "idr  schyear year
>> +  1       8    0
>> +  1       9    1
>> +  1      10   NA
>> +  2       4   NA
>> +  2       5   -1
>> +  2       6    0
>> +  2       7    1
>> +  2       8    2
>> +  2       9    3
>> +  2      10    4
>> +  2      11   NA
>> +  2      12    6
>> +  3       4   NA
>> +  3       5   -2
>> +  3       6   -1
>> +  3       7    0
>> +  3       8    1
>> +  3       9    2
>> +  3      10    3
>> +  3      11   NA", header = TRUE)
>> >  # you did not specify if there might be multiple contiguous NAs,
>> >  # so there are a lot of checks to be made
>> >  x.l <- lapply(split(x, x$idr), function(.idr){
>> +     # check for all NAs -- just return indeterminate state
>> +     if (sum(is.na(.idr$year)) == nrow(.idr)) return(.idr)
>> +     # repeat until all NAs have been fixed; takes care of contiguous ones
>> +     while (any(is.na(.idr$year))){
>> +         # find all the NAs
>> +         for (i in which(is.na(.idr$year))){
>> +             if ((i == 1L) && (!is.na(.idr$year[i + 1L]))){
>> +                 .idr$year[i] <- .idr$year[i + 1L] - 1
>> +             } else if ((i > 1L) && (!is.na(.idr$year[i - 1L]))){
>> +                 .idr$year[i] <- .idr$year[i - 1L] + 1
>> +             } else if ((i < nrow(.idr)) && (!is.na(.idr$year[i + 1L]))){
>> +                 .idr$year[i] <- .idr$year[i + 1L] -1
>> +             }
>> +         }
>> +     }
>> +     return(.idr)
>> + })
>> > do.call(rbind, x.l)
>>      idr schyear year
>> 1.1    1       8    0
>> 1.2    1       9    1
>> 1.3    1      10    2
>> 2.4    2       4   -2
>> 2.5    2       5   -1
>> 2.6    2       6    0
>> 2.7    2       7    1
>> 2.8    2       8    2
>> 2.9    2       9    3
>> 2.10   2      10    4
>> 2.11   2      11    5
>> 2.12   2      12    6
>> 3.13   3       4   -3
>> 3.14   3       5   -2
>> 3.15   3       6   -1
>> 3.16   3       7    0
>> 3.17   3       8    1
>> 3.18   3       9    2
>> 3.19   3      10    3
>> 3.20   3      11    4
>> >
>> >
>>
>>
>> On Sat, Nov 3, 2012 at 1:14 PM, Christopher Desjardins
>> <cddesjardins at gmail.com> wrote:
>> > Hi,
>> > I have the following data:
>> >
>> >> data[1:20,c(1,2,20)]
>> > idr  schyear year
>> > 1       8    0
>> > 1       9    1
>> > 1      10   NA
>> > 2       4   NA
>> > 2       5   -1
>> > 2       6    0
>> > 2       7    1
>> > 2       8    2
>> > 2       9    3
>> > 2      10    4
>> > 2      11   NA
>> > 2      12    6
>> > 3       4   NA
>> > 3       5   -2
>> > 3       6   -1
>> > 3       7    0
>> > 3       8    1
>> > 3       9    2
>> > 3      10    3
>> > 3      11   NA
>> >
>> > What I want to do is replace the NAs in the year variable with the
>> > following:
>> >
>> > idr  schyear year
>> > 1       8    0
>> > 1       9    1
>> > 1      10    2
>> > 2       4   -2
>> > 2       5   -1
>> > 2       6    0
>> > 2       7    1
>> > 2       8    2
>> > 2       9    3
>> > 2      10    4
>> > 2      11    5
>> > 2      12    6
>> > 3       4   -3
>> > 3       5   -2
>> > 3       6   -1
>> > 3       7    0
>> > 3       8    1
>> > 3       9    2
>> > 3      10    3
>> > 3      11    4
>> >
>> > I have no idea how to do this. For each subject (idr), an NA in year
>> > should become the preceding year value plus 1 or, if it comes before
>> > a known year value, that following value minus 1.
>> >
>> > Does that make sense? I could do this in Excel, but I am at a loss for how
>> > to do this in R. Please reply to me as well as the list if you respond.
>> >
>> > Thanks!
>> > Chris
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>
>
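As a hedged alternative to Jim's while/for loop, the same filling rule can be sketched without repeated passes. Within each idr, an NA that follows a known year becomes that known year plus the number of rows since it, and leading NAs count back from the first known year. fill_years is a hypothetical helper name. The sketch assumes the rows of each idr are already in schyear order, as in the example, and applies the helper per idr with base R's ave().

```r
x <- read.table(text = "idr schyear year
1  8  0
1  9  1
1 10 NA
2  4 NA
2  5 -1
2  6  0
2  7  1
2  8  2
2  9  3
2 10  4
2 11 NA
2 12  6
3  4 NA
3  5 -2
3  6 -1
3  7  0
3  8  1
3  9  2
3 10  3
3 11 NA", header = TRUE)

# Hypothetical helper: fill the NAs in one idr's year vector
fill_years <- function(y) {
  known <- which(!is.na(y))
  if (length(known) == 0) return(y)  # all NA: leave indeterminate
  pos <- seq_along(y)
  # Position of the most recent non-NA value at or before each row (0 if none)
  last <- cummax(ifelse(is.na(y), 0L, pos))
  # NAs after a known value: that value plus the distance from it
  fwd <- is.na(y) & last > 0
  y[fwd] <- y[last[fwd]] + (pos[fwd] - last[fwd])
  # Remaining NAs precede the first known value: count back from it
  lead <- is.na(y)
  y[lead] <- y[known[1]] - (known[1] - pos[lead])
  y
}

# Apply the helper separately within each idr
x$year <- ave(x$year, x$idr, FUN = fill_years)
x
```

Like the loop, this leaves an idr whose years are all NA untouched. Runs of contiguous NAs are handled in one step, because cummax() carries the last known position forward.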
