[R] Replacing columns in a data frame using a previous condition

Dimitris Rizopoulos Dimitris.Rizopoulos at med.kuleuven.be
Thu Feb 14 21:44:37 CET 2008


Try this:

GERU[6:318] <- lapply(GERU[6:318], function (x) {
     # recode 2 -> 3 only in columns with 5 or more distinct non-missing values
     if (length(unique(x[!is.na(x)])) >= 5) x[x == 2] <- 3
     x
})
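To see what this does, here is a minimal sketch on a hypothetical `GERU_toy` data frame typed in from the ten rows printed in the question, so only columns 6 to 10 exist instead of 6 to 318. Indexing a data frame with a single vector, as in `GERU[6:318]`, selects those columns as a list; `lapply()` returns a list of the same length, and assigning it back to the same columns replaces them in place. All other columns are untouched, so the full data should keep their 3468 x 318 shape.

```r
# Hypothetical stand-in for GERU: the first ten rows printed in the
# question, so only columns 6 to 10 exist instead of 6 to 318
GERU_toy <- data.frame(
  ped  = rep(c("USA5854", "USA5855", "USA5856"), times = c(4, 5, 1)),
  ind  = c(2, 3, 4, 5, 1, 2, 3, 4, 5, 1),
  par1 = c(0, 1, 1, 1, 0, 0, 1, 1, 1, 0),
  par2 = c(0, 2, 2, 2, 0, 0, 2, 2, 2, 0),
  sex  = c(2, 1, 2, 1, 1, 2, 1, 1, 1, 1),
  sta  = c(1, 1, 2, 2, 1, 2, 2, 1, 2, 1),
  rs7696470   = c(4, 4, 1, 4, 0, 1, 0, 2, 0, 3),
  rs7696470.1 = c(4, 4, 4, 2, 0, 0, 2, 0, 1, 3),
  rs1032896   = c(1, 1, 1, 2, 0, 0, 0, 2, 0, 3),
  rs1032896.1 = c(1, 1, 3, 1, 0, 0, 0, 1, 0, 3)
)

cols <- 6:10  # 6:318 in the real data
GERU_toy[cols] <- lapply(GERU_toy[cols], function(x) {
  # recode 2 -> 3 only where there are 5 or more distinct non-missing values
  if (length(unique(x[!is.na(x)])) >= 5) x[x == 2] <- 3
  x
})

GERU_toy$rs7696470   # 4 4 1 4 0 1 0 3 0 3 (5 categories: recoded)
GERU_toy$rs1032896   # 1 1 1 2 0 0 0 2 0 3 (only 4 categories: unchanged)
dim(GERU_toy)        # still 10 x 10
```

The threshold `>= 5` distinct non-missing values is the same as "more than 4 categories" in the question, and missing values are left as they are.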


I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
      http://www.student.kuleuven.be/~m0390867/dimitris.htm


Quoting Jorge Iván Vélez <jorgeivanvelez at gmail.com>:

> Dear R-list,
>
> I'm working with a data frame whose dimensions are
>
>> dim(GERU)
> [1] 3468  318
>
> and which looks like
>
>> GERU[1:10,1:10]
>        ped ind par1 par2 sex sta rs7696470 rs7696470.1 rs1032896 rs1032896.1
> 1  USA5854   2    0    0   2   1         4           4         1           1
> 2  USA5854   3    1    2   1   1         4           4         1           1
> 3  USA5854   4    1    2   2   2         1           4         1           3
> 4  USA5854   5    1    2   1   2         4           2         2           1
> 5  USA5855   1    0    0   1   1         0           0         0           0
> 6  USA5855   2    0    0   2   2         1           0         0           0
> 7  USA5855   3    1    2   1   2         0           2         0           0
> 8  USA5855   4    1    2   1   1         2           0         2           1
> 9  USA5855   5    1    2   1   2         0           1         0           0
> 10 USA5856   1    0    0   1   1         3           3         3           3
>
> What I would like to do is:
>
> 1. Identify which columns (from 6 to 318) have more than 4 categories (I
> solved that). In the GERU excerpt above, these would be rs7696470 and rs7696470.1.
> 2. In the columns found in step 1, replace the entries equal to 2 with 3. For
> example, rs7696470 would become 4,4,1,4,0,1,0,3,0,3 and so on.
> 3. Once the entries are replaced, I need to write those columns back into GERU.
>
> Here is what I've done:
>
>> # Function to identify columns with more than 4 categories
>> tx=function(x) ifelse(dim(table(x))>4,1,0)
>
>> # Identifying the columns
>> M4=apply(GUPN[,-c(1:6)],2,tx)
>> names(which(M4==1))                    # Step 1
>  [1] "rs335322"     "rs335322.1"   "rs186750"     "rs186750.1"   "rs1565901"    "rs1565901.1"  "rs1565902"
>  [8] "rs1565902.1"  "rs11131334"   "rs11131334.1" "rs1948616"    "rs1948616.1"  "rs4484334"    "rs4484334.1"
> [15] "rs1497921"    "rs1497921.1"  "rs1391320"    "rs1391320.1"  "rs1497913"    "rs1497913.1"  "rs996208"
> [22] "rs996208.1"
>> # Step 2
>> REPLACE=GUPN[,names(which(M4==1))]
>> RES=apply(REPLACE,2,function(x) ifelse(x==2,3,x))
>> RES[1:10,1:5]
>    rs335322 rs335322.1 rs186750 rs186750.1 rs1565901
> 1         1          3        3          3         3
> 2         1          1        3          3         3
> 3         3          3        1          3         3
> 4         1          3        3          3         3
> 5         0          0        0          0         0
> 6         0          0        0          0         0
> 7         0          0        0          0         0
> 8         0          0        0          0         0
> 9         0          0        0          0         0
> 10        1          3        3          3         1
>
> Now, my problem is replacing the columns in GERU with the columns in
> RES (step 3). In the end, the new data set should still be
> 3468 x 318. Any help would be greatly appreciated.
>
> Thank you so much,
>
>
> Jorge
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
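If you would rather finish your own three-step approach, step 3 can be a single assignment by column name: `RES` keeps the column names it got from `apply()`, so something like `GUPN[colnames(RES)] <- as.data.frame(RES)` should write the recoded columns back without changing the number of rows or columns. A minimal sketch, assuming a hypothetical `toy` data frame that holds columns 6 to 10 of the GERU excerpt and reusing your `tx()` helper for step 1:

```r
# Hypothetical stand-in: columns 6-10 (sta and four SNP columns) of the
# first ten rows of GERU, typed in from the printout in the question
toy <- data.frame(
  sta         = c(1, 1, 2, 2, 1, 2, 2, 1, 2, 1),
  rs7696470   = c(4, 4, 1, 4, 0, 1, 0, 2, 0, 3),
  rs7696470.1 = c(4, 4, 4, 2, 0, 0, 2, 0, 1, 3),
  rs1032896   = c(1, 1, 1, 2, 0, 0, 0, 2, 0, 3),
  rs1032896.1 = c(1, 1, 3, 1, 0, 0, 0, 1, 0, 3)
)

# Step 1: flag columns with more than 4 distinct values (same helper as in the question)
tx <- function(x) ifelse(dim(table(x)) > 4, 1, 0)
M4 <- apply(toy, 2, tx)
sel <- names(which(M4 == 1))

# Step 2: recode 2 -> 3 in those columns; apply() returns a matrix whose
# column names are the selected variable names
RES <- apply(toy[sel], 2, function(x) ifelse(x == 2, 3, x))

# Step 3: write the recoded columns back by name; all other columns are
# untouched and dim(toy) does not change
toy[sel] <- as.data.frame(RES)
```

Using `toy[sel]` rather than `toy[, sel]` keeps the selection a data frame even when only one column qualifies.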


