[R] Basic data question

David Winsemius dwinsemius at comcast.net
Thu Oct 14 06:00:26 CEST 2010


On Oct 13, 2010, at 11:52 PM, Santosh Srinivas wrote:

> I have a question about the output given below after running a few
> lines of code. Surely a 101 query!
>
> MF_Data <- read.csv("MF_Data_F.txt", header = F, sep="|")
> temp <- head(MF_Data) #Get the sample Data
> # V1, V4, V6 are the col names .. to get the relevant data
> temp1 <- subset(temp, select = c(V1, V4, V6))
> names(temp1) <- c('Ticker', 'Price','Date') #Adjusted column names
>
> Now as expected, I get:
>> temp1
>  Ticker   Price        Date
> 1 106270 10.3287 01-Apr-2008
> 2 106269 10.3287 01-Apr-2008
> 3 102767 12.6832 01-Apr-2008
> 4 102766 10.5396 01-Apr-2008
> 5 102855  9.7833 01-Apr-2008
> 6 102856 12.1485 01-Apr-2008
>
> BUT, for the below:
> temp1$Price
> [1] 10.3287 10.3287 12.6832 10.5396 9.7833  12.1485
> 439500 Levels: -101.2358 -102.622 -2171.1276 -6796.4926 -969.5193 ...
> Repurchase Price
>
> What is this line?
> "439500 Levels: -101.2358 -102.622 -2171.1276 -6796.4926 -969.5193 ...
> Repurchase Price"??
>

It tells you that the Price column was constructed as a factor. At least
one of the items in that column of the input data couldn't be coerced to
numeric (the "Repurchase Price" level suggests stray text, perhaps a
repeated header line), so the column was read as character, and the
default stringsAsFactors setting of TRUE then made it a factor rather
than numeric (or character). Your Date column is surely a factor
variable as well.
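
A minimal sketch of how to confirm this and recover the numbers. The
`price` vector below is a made-up stand-in for your Price column, not
your actual data. The point is that as.numeric() applied directly to a
factor returns the internal level codes, so the conversion has to go
through as.character() first.

```r
# Hypothetical stand-in for the Price column: numbers plus one stray text entry
price <- factor(c("10.3287", "12.6832", "Repurchase Price", "9.7833"))

class(price)     # "factor"
levels(price)    # the distinct strings found in the column

# Pitfall: this returns the integer level codes, not the prices
as.numeric(price)

# Go through character to get the real values; non-numeric entries become NA
# (suppressWarnings hides the "NAs introduced by coercion" warning)
price_num <- suppressWarnings(as.numeric(as.character(price)))

# List the original entries that could not be converted
bad <- as.character(price)[is.na(price_num)]
bad
```

Looking at `bad` on the real column should show which entries caused
the whole column to be read as text.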

You may want to look at colClasses in the read.table help page.
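
One way this could look, as a sketch only. The three-column sample and
the column names are assumptions for illustration (your file has more
columns), and `text =` is used here just so the example runs without a
file. Reading everything as character avoids factors altogether, and the
conversion to numeric is then done explicitly.

```r
# Hypothetical sample in the same "|"-separated layout, with one bad Price
lines <- c("106270|10.3287|01-Apr-2008",
           "106269|10.3287|01-Apr-2008",
           "102767|Repurchase Price|01-Apr-2008")

# Read every column as character so nothing is turned into a factor
raw <- read.csv(text = lines, header = FALSE, sep = "|",
                colClasses = "character",
                col.names = c("Ticker", "Price", "Date"))
str(raw)

# Convert Price explicitly; the stray text becomes NA
raw$Price <- suppressWarnings(as.numeric(raw$Price))
raw[is.na(raw$Price), ]    # rows that need attention

# Requesting numeric up front makes the read stop at the bad entry instead
res <- try(read.csv(text = lines, header = FALSE, sep = "|",
                    colClasses = c("character", "numeric", "character")),
           silent = TRUE)
inherits(res, "try-error")
```

Setting stringsAsFactors = FALSE in the read.csv call would also keep
the columns as character, but colClasses makes the intended type of
each column explicit.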

The read.zoo function in the zoo package may have better behavior for  
this sort of data input task.

> Many thanks for the help.
>
> Santosh

-- 
David.


