[R] replacing a factor value in a data frame

Dave Roberts droberts at montana.edu
Fri Oct 28 18:01:14 CEST 2005


Federico,

     There doesn't appear to be an instance of the value you want to 
change in your example, so I had to improvise.  Part of the problem may 
be that the data frame is composed of factors, and it's not possible to 
change the value of a factor to a value that's not in the set of 
possible values, given by the levels() function.  So, if you want to 
change GC to CG, but CG does not already exist in the set of possible 
values, you'll have to add it, e.g.

 > tmp <- data
 > levels(tmp[,30]) <- c(levels(data[,30]),'CG')

Then, if the problem only occurs in one column, it's an easy fix:

 > tmp[data=='GC'] <- 'CG'
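
To make the point about levels() concrete, here is a minimal sketch on a 
made-up toy factor (the name snp and its values are illustrative only, 
not taken from the poster's data).  It shows what happens when the target 
value is not a level, the add-a-level fix described above, and, as an 
alternative that is not part of the original suggestion, renaming the 
level itself.

```r
# Toy factor standing in for one SNP column (made-up values)
snp <- factor(c("AA", "GC", NA, "GC"))

# "CG" is not among levels(snp), so the assignment stores NA
# (R also issues an "invalid factor level" warning, silenced here)
bad <- snp
suppressWarnings(bad[2] <- "CG")
is.na(bad[2])

# Registering the new level first makes the same assignment legal
ok <- snp
levels(ok) <- c(levels(ok), "CG")
ok[which(ok == "GC")] <- "CG"   # which() keeps only the TRUE positions

# Alternative: rename the level, which relabels every "GC" at once
renamed <- snp
levels(renamed)[levels(renamed) == "GC"] <- "CG"
```

With the first fix the now unused level "GC" stays in levels(ok); 
renaming the level leaves only "AA" and "CG".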

If GC occurs in multiple columns, you'll either have to change the levels 
for each column, as I did for column 30, or work with a single column. 
Since you don't have 30 columns in your example, let's pretend you want 
to change all the instances of 'CC' in data$V5 to 'XX':

 > tmp <- data
 > levels(tmp$V5) <- c(levels(data$V5),'XX')
 > tmp$V5[data$V5=='CC'] <- 'XX'
 > tmp
    V4 V5 V6 V7   V8   V9 V10
1  TT GG TT AC   AG   AG  TT
2  AT XX TT AA   AA   AA  TT
3  AT XX TT AC   AA <NA>  TT
4  TT XX TT AA   AA   AA  TT
5  AT CG TT CC   AA   AA  TT
6  TT XX TT AA   AA   AA  TT
7  AT XX TT CC <NA> <NA>  TT
8  TT XX TT AC   AG   AG  TT
9  AT XX TT CC   AG <NA>  TT
10 TT XX TT CC   GG   GG  TT

Notice that the instances of 'CC' in tmp$V7 did not change.
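
The error quoted below comes from the NAs in the logical row index 
rather than from the factor levels.  The following is a rough sketch of 
one way around it, assuming three columns rebuilt from the example and a 
made-up replacement value 'GA' for 'AG' in V9 (neither the value nor the 
choice of column comes from the original question).  Wrapping the 
comparison in which() keeps only the positions that are TRUE, so the NAs 
never reach the subscript.

```r
# Three columns rebuilt from the example; V9 contains NAs
data <- data.frame(
  V5 = c("GG", "CC", "CC", "CC", "CG", "CC", "CC", "CC", "CC", "CC"),
  V7 = c("AC", "AA", "AC", "AA", "CC", "AA", "CC", "AC", "CC", "CC"),
  V9 = c("AG", "AA", NA, "AA", "AA", "AA", NA, "AG", NA, "GG"),
  stringsAsFactors = TRUE
)

tmp <- data
levels(tmp$V9) <- c(levels(tmp$V9), "GA")   # "GA" is a made-up target value

# data$V9 == "AG" is NA wherever V9 is NA, and a row index containing NA
# is rejected in data frame assignment; try() captures the error
res <- try(tmp[data$V9 == "AG", "V9"] <- "GA", silent = TRUE)

# which() returns only the TRUE positions, so this assignment goes through
tmp[which(data$V9 == "AG"), "V9"] <- "GA"
```

The NAs in V9 are left as they are, and V5 and V7 are untouched.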

HTH, Dave Roberts

Federico Calboli wrote:
> Hi All,
> 
> I have the following problem, that's driving me mad.
> 
> I have a dataframe of factors, from a genetic scan of SNPs. I DO have
> NAs in the dataframe, which would look like:
> 
>    V4 V5 V6 V7   V8   V9 V10
> 1  TT GG TT AC   AG   AG  TT
> 2  AT CC TT AA   AA   AA  TT
> 3  AT CC TT AC   AA <NA>  TT
> 4  TT CC TT AA   AA   AA  TT
> 5  AT CG TT CC   AA   AA  TT
> 6  TT CC TT AA   AA   AA  TT
> 7  AT CC TT CC <NA> <NA>  TT
> 8  TT CC TT AC   AG   AG  TT
> 9  AT CC TT CC   AG <NA>  TT
> 10 TT CC TT CC   GG   GG  TT
> 
> 
> In the dataframe I have 1 column where one factor has been erroneously
> given alternative readings: CG and GC. 
> 
> I want to change the instances of GC to CG and I use the code:
> 
> data[data[,30]=="GC", 30] = "CG"
> 
> but get the error:
> Error in "[<-.data.frame"(`*tmp*`, all[, 30] == "GC", 30
>         missing values are not allowed in subscripted as
> 
> Any hints?
> 
> Cheers,
> 
> Federico
> 


-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
David W. Roberts                                     office 406-994-4548
Professor and Head                                      FAX 406-994-3190
Department of Ecology                         email droberts at montana.edu
Montana State University
Bozeman, MT 59717-3460



