[R] problem with factor levels

Milan Bouchet-Valat nalimilan at club.fr
Tue Dec 4 10:11:41 CET 2012


On Tuesday 04 December 2012 at 00:34 -0800, Jeremy.Shearman wrote:
> Hi,
> I have a data.frame with 371,718 obs. of 12 variables (see below for
> the str output). My problem is with V1, a Factor w/ 93144 levels; there
> should actually be 93994 levels. Each entry looks like:
> comp[number]_c[number]_seq[number]
> for example
> comp215489_c0_seq40
> R is grouping as though the last number were a decimal for some reason; in
> other words, comp215489_c0_seq40 and comp215489_c0_seq4 are considered to be
> the same. They are not the same, so when I group by this factor I am
> losing 800 levels.
What format is your original data using? How do you import it?
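
To make the import question concrete, here is a minimal sketch of one way to check whether levels are lost while reading the file. The file contents, the tab separator and the read.table() arguments are assumptions for illustration only, since the original file and import call are not known.

```r
# Hypothetical stand-in for the original file: tab-separated, no header.
path <- tempfile(fileext = ".tsv")
writeLines(c(
  "comp215489_c0_seq4\tgi|1|ref|A|\t95.5",
  "comp215489_c0_seq40\tgi|2|ref|B|\t90.2",
  "comp215489_c0_seq4\tgi|3|ref|C|\t98.7"
), path)

# Count distinct IDs straight from the raw text, bypassing read.table().
raw_ids <- sub("\t.*$", "", readLines(path))
n_raw <- length(unique(raw_ids))

# Import with quoting and comment handling switched off, so that characters
# such as ' " or # inside a field cannot change how lines are split.
dat <- read.table(path, sep = "\t", header = FALSE, quote = "",
                  comment.char = "", stringsAsFactors = TRUE)
n_levels <- nlevels(dat$V1)

nrow(dat) == length(raw_ids)  # TRUE if every line became exactly one row
n_raw == n_levels             # TRUE if every distinct ID became one level
```

If either comparison is FALSE on the real file, the difference most likely arises during import rather than in the factor itself.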

Please send us an excerpt of your original file showing at least two
different values of V1 that are treated as the same once imported into
R (which sounds very unlikely to me...).
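
As a quick illustration of why this sounds unlikely, the sketch below builds a factor from the two IDs quoted in the message. The vector `ids` is made up for the example and is not taken from the real data.

```r
# Made-up vector containing the two IDs from the message, one of them twice.
ids <- c("comp215489_c0_seq4", "comp215489_c0_seq40", "comp215489_c0_seq4")

# factor() compares whole strings, so the trailing "0" gives a separate level.
f <- factor(ids)

levels(f)   # the distinct strings
nlevels(f)  # number of distinct strings
table(f)    # rows per level
```

The same kind of check on the real column, for instance length(unique(as.character(V1))), would show whether the expected count and the factor's level count actually differ.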


Regards

> Here is the str output:
> 
> 'data.frame':	371718 obs. of  12 variables:
>  $ V1 : Factor w/ 93144 levels "comp100000_c0_seq1",..: 92271 91685 29 30
> 1564 1564 1623 91700 91701 91848 ...
>  $ V2 : Factor w/ 17162 levels "gi|345842331|ref|NM_001244016.1|",..: 10119
> 10779 13210 13210 11522 8115 13079 14493 14493 15858 ...
>  $ V3 : num  95.5 90.2 98.7 99.2 81.4 ...
>  $ V4 : int  335 153 237 122 258 127 306 258 120 177 ...
>  $ V5 : int  15 15 3 1 38 19 20 23 5 9 ...
>  $ V6 : int  0 0 0 0 4 2 0 0 0 0 ...
>  $ V7 : int  1 45 1 43 1 129 1 54 1 70 ...
>  $ V8 : int  335 197 237 164 254 254 306 311 120 246 ...
>  $ V9 : int  6866 18 3172 3438 67 122 3927 42 346 195 ...
>  $ V10: int  7200 170 3408 3559 318 247 4232 299 465 19 ...
>  $ V11: num  7e-155 2e-46 4e-125 2e-61 3e-24 ...
>  $ V12: num  545 184 446 234 111 69.9 448 329 198 280 ...