[R] R ignores number only with a nine under 10000

Peter Langfelder peter.langfelder at gmail.com
Mon Nov 21 23:01:58 CET 2011


On Mon, Nov 21, 2011 at 7:42 AM, set <astareh at hotmail.com> wrote:
> Hello R users,
>
> I'm trying to replace numerical values in a datamatrix with strings. R does
> this except for numbers under 10000 starting with a 9 (eg 98, 970, 9504
> etc). This is really weird and I wondered whether someone had encountered
> such a problem or knows the solution. I'm using the next script:
>
> test_1 <- read.table("5+ref_151111clusters3.csv", header = TRUE, sep = ",",
> colClasses = "numeric")
> test_1[test_1 > 94885 & test_1 <= 113835] = "KE3926OT"
> test_1[test_1 != 0 & test_1 <= 18954] = "I8456"
> test_1[test_1 > 75944 & test_1 <= 94885] = "KE3873"
> test_1[test_1 > 56951 & test_1 <= 75944] = "KE3870"
> test_1[test_1 > 37991 & test_1 <= 56951] = "Cyprus1"
> test_1[test_1 > 18954 & test_1 <= 37991] = "ref"
> write.table(test_1, file = "test_replace7.txt", quote = FALSE, sep="\t")

I think others have already hinted at the problem, but here it is once
again more explicitly: your line
test_1[test_1 > 94885 & test_1 <= 113835] = "KE3926OT"

converts the entire test_1 to character (or at least the columns in
which a replacement happens). Once the values are character strings,
comparisons give "strange" results:

> a = "109"
> b = "9"
> a<b
[1] TRUE
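
The conversion itself can be checked on a small example. This is only a sketch: the data frame df and its values are made up for illustration and stand in for the data read from your CSV file.

```r
# Hypothetical stand-in for the data frame read from the CSV file.
df <- data.frame(x = c(98, 100000), y = c(5, 7))

# Both columns start out numeric.
sapply(df, class)

# Same kind of assignment as in the original script.
df[df > 94885 & df <= 113835] <- "KE3926OT"

# Column x received a replacement and is now character;
# column y had no matching entries and is still numeric.
sapply(df, class)
```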

Note that when one side of a comparison is numeric and the other
character, the numeric is converted to character and then they are
compared:

> b = 9
> class(a)
[1] "character"
> class(b)
[1] "numeric"
> a<b
[1] TRUE

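This is why your entries starting with 9 are "ignored": character
strings are compared character by character from the left, so "98",
"970" or "9504" sort after upper bounds such as "18954", and conditions
like test_1 <= 18954 are FALSE for them.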
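
A short sketch of the effect, using a made-up vector x that contains the values from your message plus two that do not start with 9:

```r
# Hypothetical example values; 98, 970 and 9504 are the ones from the post.
x <- c(98, 970, 9504, 120, 18000)

# Numeric comparison: every value is below the bound.
num_ok <- x <= 18954

# After conversion to character, the bound is converted too and the
# comparison is done on strings, so values starting with "9" fail it.
chr_ok <- as.character(x) <= 18954

print(num_ok)
print(chr_ok)
```

The first result is TRUE for all five values, while the second is FALSE for the three values that start with 9.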


The solution is simple: create a test_2 initialized to test_1:

test_2 = test_1

then replace elements in test_2 depending on test_1, for example

test_2[test_1 > 94885 & test_1 <= 113835] = "KE3926OT"

and likewise for the remaining ranges. This way your test_1 remains
numeric and the comparisons will work as you expect.
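
Put together, the whole replacement might look like the sketch below. The small data frame is invented so that the example runs on its own; in your script test_1 would come from read.table() as before.

```r
# Hypothetical data standing in for the contents of the CSV file.
test_1 <- data.frame(a = c(0, 98, 970, 20000),
                     b = c(9504, 40000, 60000, 100000))

# Work on a copy; all conditions are evaluated on the numeric original.
test_2 <- test_1
test_2[test_1 > 94885 & test_1 <= 113835] <- "KE3926OT"
test_2[test_1 != 0    & test_1 <= 18954]  <- "I8456"
test_2[test_1 > 75944 & test_1 <= 94885]  <- "KE3873"
test_2[test_1 > 56951 & test_1 <= 75944]  <- "KE3870"
test_2[test_1 > 37991 & test_1 <= 56951]  <- "Cyprus1"
test_2[test_1 > 18954 & test_1 <= 37991]  <- "ref"

print(test_2)
```

Here 98, 970 and 9504 all end up as "I8456", and test_1 itself is unchanged, so the order of the replacement lines no longer matters.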

HTH

Peter


