[R] data.frame with NA

David Winsemius dwinsemius at comcast.net
Wed Mar 20 15:04:13 CET 2013


On Mar 19, 2013, at 8:18 AM, David L Carlson wrote:

> Try this instead:
>
>> Foglio1[,2:ncol(Foglio1)] <- na.locf(Foglio1[,2:ncol(Foglio1)],fromLast=T)
>> str(Foglio1)
> 'data.frame':   1489 obs. of  15 variables:
> $ Date: Date, format: "2001-08-17" "2001-08-20" ...
> $ a   : num  202 201 202 201 202 ...
> $ b   : num  231 230 230 230 232 ...
> $ c   : num  177 179 181 180 182 ...
> $ d   : num  277 277 276 276 275 ...
> $ e   : num  2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 ...
> $ f   : num  275 277 279 279 279 ...
> $ g   : num  91.7 90.7 90.8 91.1 91 ...
> $ h   : num  11446 11258 11280 11396 11127 ...
> $ i   : num  388 389 393 392 393 ...
> $ l   : num  93.2 94 92.4 93.4 93.1 ...
> $ m   : num  128 127 126 129 130 ...
> $ n   : num  103 103 103 103 103 ...
> $ o   : num  133 133 133 133 133 ...
> $ p   : num  107 107 107 107 107 ...
>
> It appears that na.locf() converts the object to a matrix at some point (but
> I haven't checked the source code). The first column (the Date variable) is
> treated as character. As a result, everything gets converted to character.
> This will skip the first column, which does not have any missing values.

It happens when the argument gets converted to a zoo coredata object.  
There is only `na.locf.default` that works on zoo objects and no  
`na.locf.data.frame`. The reason there is no warning in help(na.locf)  
is that the author assumed the OP had already read help(zoo) and  
understood the data structures were different than other R objects.
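
A minimal sketch of the coercion described above, on a made-up two-column data frame (the names `df`, `a`, `m` and `num_cols` are illustrative and not the OP's data). It assumes the intermediate step behaves like `as.matrix()` on a data frame; the per-column fill is one possible workaround in the spirit of the subsetting suggested in the quoted reply, not the only way to do it.

```r
# Toy data frame (hypothetical, not the OP's Foglio1): one Date column, one numeric
df <- data.frame(Date = as.Date("2001-08-17") + 0:3,
                 a    = c(NA, 201, NA, 203))

# A matrix holds a single type, so mixing Date and numeric columns
# gives a character matrix
m <- as.matrix(df)
typeof(m)

# Filling only the numeric columns, one vector at a time, avoids the matrix step.
# na.rm = FALSE keeps each column at its original length.
num_cols <- vapply(df, is.numeric, logical(1))
if (requireNamespace("zoo", quietly = TRUE)) {
  df[num_cols] <- lapply(df[num_cols], zoo::na.locf,
                         fromLast = TRUE, na.rm = FALSE)
}
str(df)
```

Because each column is passed to na.locf as a plain vector, the Date column is never touched and keeps its class.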

-- 
David.

>> -----Original Message-----
>> From: Pietro [mailto:freerisk3 at gmail.com]
>> Sent: Tuesday, March 19, 2013 6:10 AM
>> To: dcarlson at tamu.edu; dcarlson at tamu.edu
>> Cc: r-help at stat.math.ethz.ch
>> Subject: RE: [R] data.frame with NA
>>
>> Yes, colClasses is the solution. Thank you very much.
>> However, I found a very strange thing.
>>
>> If I use:
>> Foglio1 <- read.xlsx2("mydb.xlsx", 1, colClasses=c("Date",
>> rep("numeric",14)))
>>
>> I get a numeric data frame, as you said.
>>
>> I also get NaN (and not NA).
>>
>> At this point I use the function:
>> Foglio1 = na.locf(Foglio1,fromLast=T) and it works perfectly. All NaNs
>> were replaced with the first numeric value, as expected.
>>
>> And now the enigma.
>>
>> After the na.locf function, Foglio1 becomes all chr again! It seems that
>> na.locf converts from num to chr. Even Date is converted to chr.
>> I'm reading the help for this function but I can't find any mention of
>> the possibility of this conversion.
>>
>> It seems that I can't get in any way a numeric data frame without NA or
>> NaN!
>> OK, I admit that I'm a newbie, but I'm trying every day to gain
>> confidence with R.
>>
>> Could I ask you the courtesy of running the na.locf function to see whether
>> it also converts everything to chr on your computer?
>>
>> Thank you
>>
>>
>>
>> At 21.37 18/03/2013, David L Carlson wrote:
>>> It appears that you MUST use the colClasses= argument with read.xlsx2:
>>>
>>> Foglio1 <- read.xlsx2("mydb.xlsx", 1, colClasses=c("Date", rep("numeric", 14)))
>>>
>>> However, e and n are converted to NaN, not NA, so you would need to convert
>>> those columns (at least; I didn't check for missing values in the other
>>> columns):
>>>
>>>> Foglio1$e <- ifelse(is.nan(Foglio1$e), NA, Foglio1$e)
>>>> Foglio1$n <- ifelse(is.nan(Foglio1$n), NA, Foglio1$n)
>>>> str(Foglio1)
>>> 'data.frame':   1489 obs. of  15 variables:
>>> $ Date: Date, format: "2001-08-17" "2001-08-20" ...
>>> $ a   : num  202 201 202 201 202 ...
>>> $ b   : num  231 230 230 230 232 ...
>>> $ c   : num  177 179 181 180 182 ...
>>> $ d   : num  277 277 276 276 275 ...
>>> $ e   : num  NA NA NA NA NA NA NA NA NA NA ...
>>> $ f   : num  275 277 279 279 279 ...
>>> $ g   : num  91.7 90.7 90.8 91.1 91 ...
>>> $ h   : num  11446 11258 11280 11396 11127 ...
>>> $ i   : num  388 389 393 392 393 ...
>>> $ l   : num  93.2 94 92.4 93.4 93.1 ...
>>> $ m   : num  128 127 126 129 130 ...
>>> $ n   : num  NA NA NA NA NA NA NA NA NA NA ...
>>> $ o   : num  133 133 133 133 133 ...
>>> $ p   : num  107 107 107 107 107 ...
>>>
>>> -------
>>> David
>>>
>>>
>>>> -----Original Message-----
>>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>>>> On Behalf Of David L Carlson
>>>> Sent: Monday, March 18, 2013 3:22 PM
>>>> To: 'Pietro'; 'Berend Hasselman'
>>>> Cc: r-help at stat.math.ethz.ch
>>>> Subject: Re: [R] data.frame with NA
>>>>
>>>> Try this
>>>>
>>>> Open the spreadsheet in Excel. Select all of the data and click Copy.
>>>> Don't close Excel.
>>>>
>>>> Open R and type the following command:
>>>>
>>>>> Foglio1 <- read.table("clipboard-128", header=TRUE, sep="\t")
>>>>
>>>> Now take a look at the structure of the data.frame
>>>>
>>>>> str(Foglio1)
>>>> 'data.frame':   1489 obs. of  15 variables:
>>>> $ Date: Factor w/ 1489 levels "1/10/2002","1/10/2003",..: 1275 1291 1295 1299 1304 1309 1321 1325 1329 1337 ...
>>>> $ a   : num  202 201 202 201 202 ...
>>>> $ b   : num  231 230 230 230 232 ...
>>>> $ c   : num  177 179 181 180 182 ...
>>>> $ d   : num  277 277 276 276 275 ...
>>>> $ e   : num  NA NA NA NA NA NA NA NA NA NA ...
>>>> $ f   : num  275 277 279 279 279 ...
>>>> $ g   : num  91.7 90.7 90.8 91.1 91 ...
>>>> $ h   : num  11446 11258 11280 11396 11127 ...
>>>> $ i   : num  388 389 393 392 393 ...
>>>> $ l   : num  93.2 94 92.4 93.4 93.1 ...
>>>> $ m   : num  128 127 126 129 130 ...
>>>> $ n   : num  NA NA NA NA NA NA NA NA NA NA ...
>>>> $ o   : num  133 133 133 133 133 ...
>>>> $ p   : num  107 107 107 107 107 ...
>>>>
>>>> ----------------------------------------------
>>>> David L Carlson
>>>> Associate Professor of Anthropology
>>>> Texas A&M University
>>>> College Station, TX 77843-4352
>>>>
>>>>> -----Original Message-----
>>>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>>>>> On Behalf Of Pietro
>>>>> Sent: Monday, March 18, 2013 1:57 PM
>>>>> To: Berend Hasselman
>>>>> Cc: r-help at stat.math.ethz.ch
>>>>> Subject: Re: [R] data.frame with NA
>>>>>
>>>>> Yes, it's true Berend!
>>>>>
>>>>> What I do is simply use the read.xlsx2 function:
>>>>>
>>>>> db <- read.xlsx2("c:/mydb.xlsx",1,as.data.frame=T)
>>>>>
>>>>> This is the Excel file I use:
>>>>> http://dl.dropbox.com/u/102669/mydb.xlsx
>>>>>
>>>>> I can't find a way to import it as numeric.
>>>>> My objective is to be able to work (in R) with my NAs.
>>>>>
>>>>>
>>>>> At 18.46 18/03/2013, Berend Hasselman wrote:
>>>>>
>>>>>> On 18-03-2013, at 16:49, Pete <freerisk3 at gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> I have this little data.frame
>>>>>>>
>>>>>>> http://dl.dropbox.com/u/102669/nanotna.rdata
>>>>>>>
>>>>>>> Two columns contain NA, so the best thing to do is use the na.locf
>>>>>>> function (with fromLast = T).
>>>>>>>
>>>>>>> But the na.locf function doesn't work because the NAs in my data.frame are
>>>>>>> not recognized as real NAs.
>>>>>>>
>>>>>>> Is there a way to substitute the fake NAs with real NAs? In that case the
>>>>>>> na.locf function should work.
>>>>>>>
>>>>>>
>>>>>> Your data are all characters. Do
>>>>>>
>>>>>> str(db)
>>>>>>
>>>>>> to see that. What is probably supposed to be numeric is also character.
>>>>>> Somehow you have managed to read in data that R thinks is all chr.
>>>>>> Your NA are "NA" in reality: a character string "NA".
>>>>>>
>>>>>> You will have to review the method you used to get the data into R.
>>>>>> And make sure that what you want to be numeric is indeed numeric.
>>>>>> Then you can start to think about doing something about the NA's.
>>>>>>
>>>>>> Berend
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>

David Winsemius, MD
Alameda, CA, USA


