[R] Strange column shifting with read.table

David Winsemius dwinsemius at comcast.net
Mon Aug 3 01:14:17 CEST 2009


On Aug 2, 2009, at 7:02 PM, Noah Silverman wrote:

> Hi,
>
> It seems as if the problem was caused by an odd quirk of the "scale"
> function.
>
> Some of my data have NA entries.
>
> So, I substitute 0 for any NA with:
> rawdata[is.na(rawdata)] <- 0

Perhaps this would have done what you intended:

rawdata[is.na(rawdata), ] <- 0

# But this is added _only_ as a matter of coding behavior. See below.
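For reference, here is a minimal sketch of what the original substitution does when rawdata is still a data frame. The toy data frame is made up for illustration. is.na() applied to a data frame returns a logical matrix of the same shape, and `[<-` on a data frame accepts that matrix as a single index.

```r
# Hypothetical toy data frame with scattered NA values
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))

# is.na() on a data frame gives a logical matrix, one cell per value
m <- is.na(df)

# Logical-matrix indexing replaces exactly the NA cells; df stays a data frame
df[m] <- 0
df
```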

>
> I then scale the data.
>
> For some reason that I don't understand, I find some NA back in the data
> after the scale command.
> But, issuing the same 0 substitution AFTER the scale command makes
> everything work again.
> rawdata[is.na(rawdata)] <- 0

It "works" because rawdata has been converted by scale() to a matrix  
which can be accessed as a vector.
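scale() calls as.matrix() on its input, so a data frame goes in and a numeric matrix comes out. One possible source of the NA values that reappear after scaling (an assumption, not confirmed in this thread) is a column with zero variance, for example one that is all zeros after the substitution. scale() then divides by a standard deviation of 0, and 0/0 is NaN, which is.na() reports as TRUE. A minimal sketch with hypothetical toy data:

```r
# Hypothetical toy data: column 'b' is constant (all zeros)
df <- data.frame(a = c(1, 2, 3, NA), b = c(0, 0, 0, 0))
df[is.na(df)] <- 0

s <- scale(df)          # centers and scales each column
is.matrix(s)            # the result is a matrix, not a data frame
s[, "b"]                # (0 - 0) / 0 is NaN in every row of the constant column
n_na <- sum(is.na(s))   # NaN values are counted by is.na()

s[is.na(s)] <- 0        # vector-style logical indexing on the matrix
```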

>

The notion of adding zeroes for NA seems "so wrong". And the idea that
you might get the same results by doing so before scale() as after
scale() seems additionally bizarre.
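A small sketch, with a made-up vector, of why the two orderings differ. scale() ignores NA when computing each column's mean and standard deviation, so replacing NA with 0 after scaling amounts to imputing the mean of the observed values. Replacing before scaling feeds the zeros into the mean and standard deviation themselves.

```r
x <- c(1, 2, NA, 5)

# Option 1: replace NA with 0 first, then scale (the zero shifts mean and sd)
before <- x
before[is.na(before)] <- 0
z_before <- as.vector(scale(before))

# Option 2: scale first (NA is skipped for mean and sd), then replace NA with 0,
# which places the missing value exactly at the mean of the observed values
z_after <- as.vector(scale(x))
z_after[is.na(z_after)] <- 0

z_before
z_after
```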


>
> VERY strange behavior.
>

Your behavior might be seen as VERY strange by some.

-- 
D


> -N
>
> On 8/2/09 3:57 PM, J Dougherty wrote:
>> On Sunday 02 August 2009 02:34:43 pm Noah Silverman wrote:
>>
>>> The column names have to be obfuscated, but here are 10 rows of the data.
>>>
>>> label 	c0 	c1 	c2 	c3 	c4 	c5 	c6 	c7 	c8 	c9 	c10 	c11 	c12 	c13
>>> c14 	c15 	c16 	c17 	c18 	c19 	c20 	c21 	c22 	c23 	c24 	c25 	c26 	c27
>>> c28 	c29 	c30 	c31 	c32 	c33 	c34 	c35 	c36 	c37 	c38 	c39 	c40 	c41
>>> c42 	c43 	c44 	c45 	c46 	c47 	c48 	c49 	c50 	c51 	c52 	c53 	c54 	c55
>>> c56 	c57 	c58 	c59 	c60 	c61 	c62 	c63 	c64 	c65 	c66
>>> sick 	2008-12-28_1 	95.609 	5 	3.3 	1.35 	0 	1 	35 	9.6666 	0 	0
>>> 0.0833 	1 	0.0833 	1 	0.1428 	7 	3 	2.035714286 	6.5 	94.8481
>>> 53.846 	12 	-4.69 	1.25 	0.5062 	0.0522 	0.1808 	3 	0.5126 	0.0694
>>> 0.2061 	94.9288 	8.3125 	0.0247 	7.5833 	9.3 	35 	9.6666 	0 	0
>>> 0.0833 	1 	0.0833 	1 	0.1428 	7 	3 	2.035714286 	6.5 	94.8481
>>> 53.846 	12 	-4.69 	1.25 	0.5062 	0.0522 	0.1808 	3 	0.5126 	0.0694
>>> 0.2061 	94.9288 	8.3125 	0.0247 	7.5833 	9.3
>>> well 	2008-12-28_1 	95.338 	1 	11 	3.2 	3 	2 	11 	7.0277 	0.0555 	2
>>> 0.1666 	6 	0.1666 	5 	0.238 	18 	11 	2.541666667 	2.022727273 	 
>>> 94.7733
>>> 38.461 	36 	6.07 	7.5555 	0.5928 	0.0955 	0.2871 	0 	0.5434 	0.0679
>>> 0.2283 	95.9003 	5.1736 	0.0847 	7.3333 	28 	11 	7.0277 	0.0555 	2
>>> 0.1666 	6 	0.1666 	5 	0.238 	18 	11 	2.541666667 	2.022727273 	 
>>> 94.7733
>>> 38.461 	36 	6.07 	7.5555 	0.5928 	0.0955 	0.2871 	0 	0.5434 	0.0679
>>> 0.2283 	95.9003 	5.1736 	0.0847 	7.3333 	28
>>> well 	2008-12-28_1 	95.204 	2 	7.4 	2.75 	4 	1 	22 	8.4545 	0 	0
>>> 0 	0 	0 	0 	0 	6 	4 	2.791666667 	2.5625 	94.8444 	61.538 	11 	2.84
>>> 3.0909 	0.5693 	0.0641 	0.2738 	0 	0.5874 	0.1011 	0.2803 	94.9769
>>> 8.1363 	0.0467 	5.4545 	10 	22 	8.4545 	0 	0 	0 	0 	0 	0 	0 	6 	4
>>> 2.791666667 	2.5625 	94.8444 	61.538 	11 	2.84 	3.0909 	0.5693 	 
>>> 0.0641
>>> 0.2738 	0 	0.5874 	0.1011 	0.2803 	94.9769 	8.1363 	0.0467 	5.4545  
>>> 	10
>>> sick 	2008-12-28_1 	95.204 	14 	48
>>> 	0 	3 	25 	8.7045 	0.0909 	4 	0.2045 	9 	0.2045 	4 	0.2666 	11 	8
>>> 4.409090909 	0 	95.0006 	15.384 	44 	1.76 	7.409 	0.4475 	0.0285
>>> 0.1206 	0 	0.5094 	0.058 	0.1931 	92.9455 	7.2613 	0.0532 	4.5227
>>> 82 	25 	8.7045 	0.0909 	4 	0.2045 	9 	0.2045 	4 	0.2666 	11 	8
>>> 4.409090909 	0 	95.0006 	15.384 	44 	1.76 	7.409 	0.4475 	0.0285
>>> 0.1206 	0 	0.5094 	0.058 	0.1931 	92.9455 	7.2613 	0.0532 	4.5227 	 
>>> 82
>>> well 	2008-12-28_1 	95.07 	13 	26
>>> 	1 	1 	11 	8.1 	0.0666 	2 	0.1666 	5 	0.1666 	0 	0 	21 	16
>>> 2.571428571 	1.984375 	94.825 	30.769 	30 	-4.69 	-0.7999 	0.5166
>>> 0.0624 	0.2078 	0 	0.5306 	0.0792 	0.2398 	95.2282 	7.575 	0.0715
>>> 3.4333 	44 	11 	8.1 	0.0666 	2 	0.1666 	5 	0.1666 	0 	0 	21 	16
>>> 2.571428571 	1.984375 	94.825 	30.769 	30 	-4.69 	-0.7999 	0.5166
>>> 0.0624 	0.2078 	0 	0.5306 	0.0792 	0.2398 	95.2282 	7.575 	0.0715
>>> 3.4333 	44
>>> well 	2008-12-28_1 	95.07 	9 	16
>>> 	0 	4 	39 	9.4117 	0 	0 	0.0588 	1 	0.0588 	0 	0 	3 	25 	3.916666667
>>> 2.96 	94.8177 	30.769 	17 	-20.84 	-15.8234 	0.8205 	0.3333 	 
>>> 0.6666 	0
>>> 0.6054 	0.1287 	0.3292 	95.3232 	6.9117 	0.076 	2.647 	16 	39
>>> 9.4117 	0 	0 	0.0588 	1 	0.0588 	0 	0 	3 	25 	3.916666667 	2.96
>>> 94.8177 	30.769 	17 	-20.84 	-15.8234 	0.8205 	0.3333 	0.6666 	0
>>> 0.6054 	0.1287 	0.3292 	95.3232 	6.9117 	0.076 	2.647 	16
>>> sick 	2008-12-28_1 	94.936 	6 	11
>>> 	4 	1 	28 	7.725 	0.075 	3 	0.125 	5 	0.125 	0 	0 	6 	2 	4 	1.75
>>> 94.7815 	46.153 	40 	6.07 	12.5 	0.5014 	0.0621 	0.1972 	6 	0.523
>>> 0.0742 	0.2035 	95.794 	6.0625 	0.046 	7.25 	12 	28 	7.725 	0.075 	3
>>> 0.125 	5 	0.125 	0 	0 	6 	2 	4 	1.75 	94.7815 	46.153 	40 	6.07 	 
>>> 12.5
>>> 0.5014 	0.0621 	0.1972 	6 	0.523 	0.0742 	0.2035 	95.794 	6.0625
>>> 0.046 	7.25 	12
>>> well 	2008-12-28_1 	94.803 	11 	13
>>> 	0 	5 	35 	7.125 	0.0937 	3 	0.1562 	5 	0.1562 	5 	0.2 	18 	17
>>> 1.555555556 	2.794117647 	95.0398 	38.461 	32 	10.38 	8.4063 	0.5804
>>> 0.0871 	0.2627 	1 	0.558 	0.0738 	0.2324 	92.4367 	5.289 	0.0722
>>> 9.125 	16 	35 	7.125 	0.0937 	3 	0.1562 	5 	0.1562 	5 	0.2 	18 	17
>>> 1.555555556 	2.794117647 	95.0398 	38.461 	32 	10.38 	8.4063 	0.5804
>>> 0.0871 	0.2627 	1 	0.558 	0.0738 	0.2324 	92.4367 	5.289 	0.0722 	 
>>> 9.125 	16
>>> well 	2008-12-28_1 	94.67 	4 	38
>>> 	5 	1 	11 	8.9642 	0.0357 	1 	0.1428 	4 	0.1428 	4 	0.2105 	11 	13
>>> 3.772727273 	4.307692308 	94.8451 	23.076 	28 	-5.76 	-4 	0.3269 	0
>>> 0.0833 	0 	0.5222 	0.0616 	0.2079 	94.9668 	8.6696 	0.0663 	4.6428
>>> 14 	11 	8.9642 	0.0357 	1 	0.1428 	4 	0.1428 	4 	0.2105 	11 	13
>>> 3.772727273 	4.307692308 	94.8451 	23.076 	28 	-5.76 	-4 	0.3269 	0
>>> 0.0833 	0 	0.5222 	0.0616 	0.2079 	94.9668 	8.6696 	0.0663 	4.6428  
>>> 	14
>>> well 	2008-12-28_1 	94.537 	12 	39
>>> 	0 	1 	35 	9.4444 	0 	0 	0 	0 	0 	0 	0 	2 	7 	2.5 	2.892857143 	 
>>> 94.878
>>> 23.076 	9 	-12.23 	-9.6666 	0.4428 	0 	0.0857 	0 	0.5411 	0.0849 	 
>>> 0.25
>>> 94.54 	8.9166 	0.0296 	6.1111 	67 	35 	9.4444 	0 	0 	0 	0 	0 	0 	0
>>> 2 	7 	2.5 	2.892857143 	94.878 	23.076 	9 	-12.23 	-9.6666 	0.4428  
>>> 	0
>>> 0.0857 	0 	0.5411 	0.0849 	0.25 	94.54 	8.9166 	0.0296 	6.1111 	67
>>>
>>>
>>>
>> Your initial post mentions 70 columns in your data table, yet the example
>> shows 67 counting the initial "labels" term in the header. I would suggest
>> adding "row.names = NULL" to force row numbers and see how that behaves, e.g.
>>
>> rawdata <- read.table("r_work/train_data.csv", header = TRUE, sep = ",",
>>                       na.strings = 0, row.names = NULL)
>>
>> Otherwise, you might want to consult the R Manual where it states:
>>
>> header: a logical value indicating whether the file contains the names of
>>     the variables as its first line. If missing, the value is determined
>>     from the file format: header is set to TRUE if and only if the first
>>     row contains one fewer field than the number of columns.
>>
>> So, you might also want to count up your column names in the header line.
>>
>> JWDougherty
>>
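The header rule quoted above can be checked on a small inline example. This sketch uses hypothetical data whose header line has one fewer field than the data rows. count.fields() shows the mismatch; read.table() with header left missing turns the first column into row names; and row.names = NULL instead numbers the rows and keeps those labels as an extra first data column.

```r
# Hypothetical toy input: the header line has one fewer field than each data row
lines <- c("x y",
           "r1 1 2",
           "r2 3 4")

# Fields per line: 2 in the header, 3 in each data row
n_fields <- count.fields(textConnection(lines))

# header not given: read.table() sees the short first row, sets header = TRUE,
# and uses the first column as row names
d1 <- read.table(text = lines)
rownames(d1)
names(d1)

# row.names = NULL: rows are numbered, and the labels stay in the data
d2 <- read.table(text = lines, row.names = NULL)
dim(d2)
```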

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



