[R] read txt file - date - no space

Thu Aug 2 09:30:53 CEST 2018

Dear all,

I have found an error in the date conversion. It now looks like this:

MyData <- read.csv(file="obs_prec.csv",header=TRUE, sep=",")
# convert the date column from text to POSIXct
MyData$date<-as.POSIXct(MyData$date, format="%m/%d/%Y %H:%M")
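
A minimal sketch of how the conversion can be checked for rows that fail to parse. The sample rows below are hypothetical stand-ins for obs_prec.csv, and tz = "UTC" is an assumption (it avoids NA values caused by daylight-saving gaps in a local timezone):

```r
# Hypothetical rows in the same layout as obs_prec.csv; the last one is
# deliberately malformed so that the check has something to report.
MyData <- read.csv(text = "date,str1,str2,str3
10/12/1998 10:00,0,0,0
10/21/1998 11:00,0.2,0,0
not a date,0,0,0", stringsAsFactors = FALSE)

# Parse month/day/year hour:minute; tz = "UTC" is an assumption.
parsed <- as.POSIXct(MyData$date, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Count the rows that could not be converted and show them.
n_failed <- sum(is.na(parsed))
MyData[is.na(parsed), ]
```

If n_failed is zero on the real file, the format string matches every row.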

After that, I applied PIKAL's suggestion:

aggregate(MyData[,-1], list(day(MyData$date)), mean)

And this is the final result:

1        1 -82.43636 -46.12437 -319.2710
2        2 -82.06105 -45.74184 -319.2696
3        3 -82.05527 -45.52650 -319.2416
4        4 -82.03535 -47.59191 -319.2275
5        5 -77.44928 -50.05953 -320.5798
...
31      31 -86.10234 -47.06247 -340.0968

However, this is not what I need, because I did not make my purpose clear.
As I told you some days ago, I have a *.csv file with hourly data from
10/21/1998 to 12/31/2016, and I would like to compute the daily means.
Basically, I would like to have the mean of the hourly data for each day
from 10/21/1998 to 12/31/2016, and not just 31 values (one per day of the
month).
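
A minimal sketch of the grouping described above, assuming that day() in the aggregate call is lubridate's day(), which returns only the day of the month (1 to 31). Grouping by the full calendar date instead gives one row per day. The sample values are hypothetical and tz = "UTC" is an assumption:

```r
# Hypothetical hourly rows spanning two calendar days.
MyData <- read.csv(text = "date,str1,str2,str3
10/1/1998 0:00,0.6,0,0
10/1/1998 1:00,0.2,0.2,0.2
10/2/1998 0:00,0.6,0.2,0.4
10/2/1998 1:00,0,0,0.6", stringsAsFactors = FALSE)

MyData$date <- as.POSIXct(MyData$date, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Keep year, month and day together, so each calendar day is its own group.
MyData$day <- as.Date(MyData$date, tz = "UTC")

# One mean per station column and per calendar day.
daily <- aggregate(MyData[, c("str1", "str2", "str3")],
                   by = list(day = MyData$day), FUN = mean)
daily
```

On the full file this should give one row for every day in the record rather than 31 rows.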

Thank you very much again,
Diego

On 2 August 2018 at 08:55, Diego Avesani <diego.avesani using gmail.com> wrote:

> Dear
>
> I have checked one of the lines that gives me problems, that is, one that
> gives NA after R processing. I think it is similar to the others:
>
> 10/12/1998 10:00,0,0,0
> 10/12/1998 11:00,0,0,0
> 10/12/1998 12:00,0,0,0
> 10/12/1998 13:00,0,0,0
> 10/12/1998 14:00,0,0,0
> 10/12/1998 15:00,0,0,0
> 10/12/1998 16:00,0,0,0
> 10/12/1998 17:00,0,0,0
>
> @jim: It seems that your suggestion focuses on reading data from the
> terminal. Is it possible to apply it to a *.csv file?
>
> @Pikal: Could it be that there is some date conversion error?
>
> Thanks again,
> Diego
>
>
> On 1 August 2018 at 17:01, jim holtman <jholtman using gmail.com> wrote:
>
>>
>> Try this:
>>
>> > library(lubridate)
>> > library(tidyverse)
>> > input <- read.csv(text = "date,str1,str2,str3
>> + 10/1/1998 0:00,0.6,0,0
>> +                   10/1/1998 1:00,0.2,0.2,0.2
>> +                   10/1/1998 2:00,0.6,0.2,0.4
>> +                   10/1/1998 3:00,0,0,0.6
>> +                   10/1/1998 4:00,0,0,0
>> +                   10/1/1998 5:00,0,0,0
>> +                   10/1/1998 6:00,0,0,0
>> +                   10/1/1998 7:00,0.2,0,0", as.is = TRUE)
>> > # convert the date and add the "day" to summarise by
>> > input <- input %>%
>> +   mutate(date = mdy_hm(date),
>> +          day = floor_date(date, unit = 'day')
>> +   )
>> >
>> > by_day <- input %>%
>> +   group_by(day) %>%
>> +   summarise(m_s1 = mean(str1),
>> +             m_s2 = mean(str2),
>> +             m_s3 = mean(str3)
>> +   )
>> >
>> > by_day
>> # A tibble: 1 x 4
>>   day                  m_s1   m_s2  m_s3
>>   <dttm>              <dbl>  <dbl> <dbl>
>> 1 1998-10-01 00:00:00 0.200 0.0500 0.150
>>
>> Jim Holtman
>> *Data Munger Guru*
>>
>>
>> *What is the problem that you are trying to solve? Tell me what you want
>> to do, not how you want to do it.*
>>
>>
>> On Tue, Jul 31, 2018 at 11:54 PM Diego Avesani <diego.avesani using gmail.com>
>> wrote:
>>
>>> Dear all,
>>> I am sorry, I caused a lot of confusion. I have to relax and start
>>> all over again in order to understand.
>>> If I could, I would like to start again, without mixing strategies, and
>>> wait for your advice.
>>>
>>> I really appreciate your help, really.
>>> Here is my new file, a *.csv file (by the way, is it possible to attach it
>>> to the mailing list?)
>>>
>>> date,str1,str2,str3
>>> 10/1/1998 0:00,0.6,0,0
>>> 10/1/1998 1:00,0.2,0.2,0.2
>>> 10/1/1998 2:00,0.6,0.2,0.4
>>> 10/1/1998 3:00,0,0,0.6
>>> 10/1/1998 4:00,0,0,0
>>> 10/1/1998 5:00,0,0,0
>>> 10/1/1998 6:00,0,0,0
>>> 10/1/1998 7:00,0.2,0,0
>>>
>>>
>>> I read it as:
>>> MyData <- read.csv(file="obs_prec.csv",header=TRUE, sep=",")
>>>
>>> at this point I would like to have the daily mean.
>>> What would you suggest?
>>>
>>> Really Really thanks,
>>> You are my lifesaver
>>>
>>> Thanks
>>>
>>>
>>>
>>> Diego
>>>
>>>
>>> On 1 August 2018 at 01:01, Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
>>> wrote:
>>>
>>> > ... and the most common source of NA values in time data is wrong
>>> > timezones. You really need to make sure the timezone that is assumed when
>>> > the character data are converted to POSIXt agrees with the data. In most
>>> > cases the easiest way to ensure this is to use
>>> >
>>> > Sys.setenv(TZ="US/Pacific")
>>> >
>>> > or whatever timezone from
>>> >
>>> > OlsonNames()
>>> >
>>> > corresponds with your data. Execute this setenv function before the
>>> > strptime or as.POSIXct() function call.
>>> >
>>> > You can use
>>> >
>>> > MyData[ is.na(MyData$datetime), ]
>>> >
>>> > to see which records are failing to convert time.
>>> >
>>> > [1] https://github.com/jdnewmil/eci298sp2016/blob/master/QuickHowto1
>>> >
>>> > On July 31, 2018 3:04:05 PM PDT, Jim Lemon <drjimlemon using gmail.com>
>>> wrote:
>>> > >Hi Diego,
>>> > >I think the error is due to NA values in your data file. If I extend
>>> > >your example and run it, I get no errors:
>>> > >
>>> > >MyData<-read.table(text="103001930 103001580 103001530
>>> > >1998-10-01 00:00:00 0.6 0 0
>>> > >1998-10-01 01:00:00 0.2 0.2 0.2
>>> > >1998-10-01 02:00:00 0.6 0.2 0.4
>>> > >1998-10-01 03:00:00 0 0 0.6
>>> > >1998-10-01 04:00:00 0 0 0
>>> > >1998-10-01 05:00:00 0 0 0
>>> > >1998-10-01 06:00:00 0 0 0
>>> > >1998-10-01 07:00:00 0.2 0 0
>>> > >1998-10-01 08:00:00 0.6 0 0
>>> > >1998-10-01 09:00:00 0.2 0.2 0.2
>>> > >1998-10-01 10:00:00 0.6 0.2 0.4
>>> > >1998-10-01 11:00:00 0 0 0.6
>>> > >1998-10-01 12:00:00 0 0 0
>>> > >1998-10-01 13:00:00 0 0 0
>>> > >1998-10-01 14:00:00 0 0 0
>>> > >1998-10-01 15:00:00 0.2 0 0
>>> > >1998-10-01 16:00:00 0.6 0 0
>>> > >1998-10-01 17:00:00 0.2 0.2 0.2
>>> > >1998-10-01 18:00:00 0.6 0.2 0.4
>>> > >1998-10-01 19:00:00 0 0 0.6
>>> > >1998-10-01 20:00:00 0 0 0
>>> > >1998-10-01 21:00:00 0 0 0
>>> > >1998-10-01 22:00:00 0 0 0
>>> > >1998-10-01 23:00:00 0.2 0 0
>>> > >1998-10-02 00:00:00 0.6 0 0
>>> > >1998-10-02 01:00:00 0.2 0.2 0.2
>>> > >1998-10-02 02:00:00 0.6 0.2 0.4
>>> > >1998-10-02 03:00:00 0 0 0.6
>>> > >1998-10-02 04:00:00 0 0 0
>>> > >1998-10-02 05:00:00 0 0 0
>>> > >1998-10-02 06:00:00 0 0 0
>>> > >1998-10-02 07:00:00 0.2 0 0
>>> > >1998-10-02 08:00:00 0.6 0 0
>>> > >1998-10-02 09:00:00 0.2 0.2 0.2
>>> > >1998-10-02 10:00:00 0.6 0.2 0.4
>>> > >1998-10-02 11:00:00 0 0 0.6
>>> > >1998-10-02 12:00:00 0 0 0
>>> > >1998-10-02 13:00:00 0 0 0
>>> > >1998-10-02 14:00:00 0 0 0
>>> > >1998-10-02 15:00:00 0.2 0 0
>>> > >1998-10-02 16:00:00 0.6 0 0
>>> > >1998-10-02 17:00:00 0.2 0.2 0.2
>>> > >1998-10-02 18:00:00 0.6 0.2 0.4
>>> > >1998-10-02 19:00:00 0 0 0.6
>>> > >1998-10-02 20:00:00 0 0 0
>>> > >1998-10-02 21:00:00 0 0 0
>>> > >1998-10-02 22:00:00 0 0 0
>>> > >1998-10-02 23:00:00 0.2 0 0",
>>> > >skip=1,stringsAsFactors=FALSE)
>>> > >names(MyData)<-c("date","time","st1","st2","st3")
>>> > >MyData$datetime<-strptime(paste(MyData$date,MyData$time),
>>> > > format="%Y-%m-%d %H:%M:%S")
>>> > >MyData$datetime
>>> > >st1_daily<-by(MyData$st1,MyData$date,mean)
>>> > >st2_daily<-by(MyData$st2,MyData$date,mean)
>>> > >st3_daily<-by(MyData$st3,MyData$date,mean)
>>> > >st1_daily
>>> > >st2_daily
>>> > >st3_daily
>>> > >
>>> > >Try adding na.rm=TRUE to the "by" calls:
>>> > >
>>> > >st1_daily<-by(MyData$st1,MyData$date,mean,na.rm=TRUE)
>>> > >st2_daily<-by(MyData$st2,MyData$date,mean,na.rm=TRUE)
>>> > >st3_daily<-by(MyData$st3,MyData$date,mean,na.rm=TRUE)
>>> > >
>>> > >Jim
>>> > >
>>> > >On Tue, Jul 31, 2018 at 11:11 PM, Diego Avesani
>>> > ><diego.avesani using gmail.com> wrote:
>>> > >> Dear all,
>>> > >>
>>> > >> I still have a problem with dates.
>>> > >> Could you please tell me how to use POSIXct?
>>> > >> Indeed, I have found this command:
>>> > >> timeAverage, but I am not able to convert MyDate to a proper date.
>>> > >>
>>> > >> Thanks a lot.
>>> > >> I hope not to bother you, at least not too much.
>>> > >>
>>> > >>
>>> > >> Diego
>>> > >>
>>> > >>
>>> > >> On 31 July 2018 at 11:12, Diego Avesani <diego.avesani using gmail.com>
>>> > >wrote:
>>> > >>>
>>> > >>> Dear Jim, Dear all,
>>> > >>>
>>> > >>> thanks a lot.
>>> > >>>
>>> > >>> Unfortunately, I get the following error:
>>> > >>>
>>> > >>>
>>> > >>>  st1_daily<-by(MyData$st1,MyData$date,mean)
>>> > >>> Error in tapply(seq_len(0L), list(`MyData$date` = c(913L, 914L, 925L,  :
>>> > >>>   arguments must have same length
>>> > >>>
>>> > >>>
>>> > >>> This is particularly strange. Indeed, if I apply
>>> > >>>
>>> > >>>
>>> > >>> mean(MyData$str1,na.rm=TRUE)
>>> > >>>
>>> > >>>
>>> > >>> it works
>>> > >>>
>>> > >>>
>>> > >>> Sorry, I have to learn a lot.
>>> > >>> You are really boosting me
>>> > >>>
>>> > >>> Diego
>>> > >>>
>>> > >>>
>>> > >>> On 31 July 2018 at 11:02, Jim Lemon <drjimlemon using gmail.com> wrote:
>>> > >>>>
>>> > >>>> Hi Diego,
>>> > >>>> One way you can get daily means is:
>>> > >>>>
>>> > >>>> st1_daily<-by(MyData$st1,MyData$date,mean)
>>> > >>>> st2_daily<-by(MyData$st2,MyData$date,mean)
>>> > >>>> st3_daily<-by(MyData$st3,MyData$date,mean)
>>> > >>>>
>>> > >>>> Jim
>>> > >>>>
>>> > >>>> On Tue, Jul 31, 2018 at 6:51 PM, Diego Avesani
>>> > ><diego.avesani using gmail.com>
>>> > >>>> wrote:
>>> > >>>> > Dear all,
>>> > >>>> > I have found the error, my fault. Sorry.
>>> > >>>> > There was an extra comma in the header line.
>>> > >>>> > Thanks again.
>>> > >>>> >
>>> > >>>> > If I may, I would like to ask you another question about the imported
>>> > >>>> > data.
>>> > >>>> > I would like to compute the daily average of the different data.
>>> > >>>> > Basically, I have hourly data and I would like to have the daily mean
>>> > >>>> > of them.
>>> > >>>> >
>>> > >>>> > Is there some special command?
>>> > >>>> >
>>> > >>>> > Thanks a lot.
>>> > >>>> >
>>> > >>>> >
>>> > >>>> > Diego
>>> > >>>> >
>>> > >>>> >
>>> > >>>> > On 31 July 2018 at 10:40, Diego Avesani <
>>> diego.avesani using gmail.com>
>>> > >>>> > wrote:
>>> > >>>> >>
>>> > >>>> >> Dear all,
>>> > >>>> >> I moved to a csv file because originally the data were in a csv file.
>>> > >>>> >> In addition, since, as you told me, read.csv is a special case of
>>> > >>>> >> read.table, I prefer to start learning from the simplest one.
>>> > >>>> >> After that, I will also try the *.txt format.
>>> > >>>> >>
>>> > >>>> >> With read.csv, something strange happened:
>>> > >>>> >>
>>> > >>>> >> This is now the file:
>>> > >>>> >>
>>> > >>>> >> date,st1,st2,st3,
>>> > >>>> >> 10/1/1998 0:00,0.6,0,0
>>> > >>>> >> 10/1/1998 1:00,0.2,0.2,0.2
>>> > >>>> >> 10/1/1998 2:00,0.6,0.2,0.4
>>> > >>>> >> 10/1/1998 3:00,0,0,0.6
>>> > >>>> >> 10/1/1998 4:00,0,0,0
>>> > >>>> >> 10/1/1998 5:00,0,0,0
>>> > >>>> >> 10/1/1998 6:00,0,0,0
>>> > >>>> >> 10/1/1998 7:00,0.2,0,0
>>> > >>>> >> 10/1/1998 8:00,0.6,0.2,0
>>> > >>>> >> 10/1/1998 9:00,0.2,0.4,0.4
>>> > >>>> >> 10/1/1998 10:00,0,0.4,0.2
>>> > >>>> >>
>>> > >>>> >> When I apply:
>>> > >>>> >> MyData <- read.csv(file="obs_prec.csv",header=TRUE, sep=",")
>>> > >>>> >>
>>> > >>>> >> this is the result:
>>> > >>>> >>
>>> > >>>> >> 1        10/1/1998 0:00    0.6    0.00    0.0 NA
>>> > >>>> >> 2        10/1/1998 1:00    0.2    0.20    0.2 NA
>>> > >>>> >> 3        10/1/1998 2:00    0.6    0.20    0.4 NA
>>> > >>>> >> 4        10/1/1998 3:00    0.0    0.00    0.6 NA
>>> > >>>> >> 5        10/1/1998 4:00    0.0    0.00    0.0 NA
>>> > >>>> >> 6        10/1/1998 5:00    0.0    0.00    0.0 NA
>>> > >>>> >> 7        10/1/1998 6:00    0.0    0.00    0.0 NA
>>> > >>>> >> 8        10/1/1998 7:00    0.2    0.00    0.0 NA
>>> > >>>> >>
>>> > >>>> >> I do not understand why.
>>> > >>>> >> Something wrong with date?
>>> > >>>> >>
>>> > >>>> >> really really thanks,
>>> > >>>> >> I appreciate a lot all your helps.
>>> > >>>> >>
>>> > >>>> >> Diego
>>> > >>>> >>
>>> > >>>> >>
>>> > >>>> >> On 31 July 2018 at 01:25, MacQueen, Don <macqueen1 using llnl.gov>
>>> > >wrote:
>>> > >>>> >>>
>>> > >>>> >>> Or, without removing the first line
>>> > >>>> >>>   dadf <- read.table("xxx.txt", stringsAsFactors=FALSE, skip=1)
>>> > >>>> >>>
>>> > >>>> >>> Another alternative,
>>> > >>>> >>>    dadf$datetime <- as.POSIXct(paste(dadf$V1,dadf$V2))
>>> > >>>> >>> since the dates appear to be in the default format.
>>> > >>>> >>> (I generally prefer to work with datetimes in POSIXct class
>>> > >rather
>>> > >>>> >>> than
>>> > >>>> >>> POSIXlt class)
>>> > >>>> >>>
>>> > >>>> >>> -Don
>>> > >>>> >>>
>>> > >>>> >>> --
>>> > >>>> >>> Don MacQueen
>>> > >>>> >>> Lawrence Livermore National Laboratory
>>> > >>>> >>> 7000 East Ave., L-627
>>> > >>>> >>> Livermore, CA 94550
>>> > >>>> >>> 925-423-1062
>>> > >>>> >>> Lab cell 925-724-7509
>>> > >>>> >>>
>>> > >>>> >>>
>>> > >>>> >>>
>>> > >>>> >>> On 7/30/18, 4:03 PM, "R-help on behalf of Jim Lemon"
>>> > >>>> >>> <r-help-bounces using r-project.org on behalf of
>>> > >drjimlemon using gmail.com>
>>> > >>>> >>> wrote:
>>> > >>>> >>>
>>> > >>>> >>>     Hi Diego,
>>> > >>>> >>>     You may have to do some conversion as you have three fields in the
>>> > >>>> >>>     first line using the default space separator and five fields in
>>> > >>>> >>>     subsequent lines. If the first line doesn't contain any important data
>>> > >>>> >>>     you can just delete it or replace it with a meaningful header line
>>> > >>>> >>>     with five fields and save the file under another name.
>>> > >>>> >>>
>>> > >>>> >>>     It looks as though you have date-time as two fields. If so, you can
>>> > >>>> >>>     just read the first field if you only want the date:
>>> > >>>> >>>
>>> > >>>> >>>     # assume you have removed the first line
>>> > >>>> >>>     dadf<-read.table("xxx.txt",stringsAsFactors=FALSE)
>>> > >>>> >>>     dadf$date<-as.Date(dadf$V1,format="%Y-%m-%d")
>>> > >>>> >>>
>>> > >>>> >>>     If you want the date/time:
>>> > >>>> >>>
>>> > >>>> >>>     dadf$datetime<-strptime(paste(dadf$V1,dadf$V2),format="%Y-%m-%d %H:%M:%S")
>>> > >>>> >>>
>>> > >>>> >>>     Jim
>>> > >>>> >>>
>>> > >>>> >>>     On Tue, Jul 31, 2018 at 12:29 AM, Diego Avesani
>>> > >>>> >>> <diego.avesani using gmail.com> wrote:
>>> > >>>> >>>     > Dear all,
>>> > >>>> >>>     >
>>> > >>>> >>>     > I am dealing with the reading of a *.txt file.
>>> > >>>> >>>     > The txt file has the following shape:
>>> > >>>> >>>     >
>>> > >>>> >>>     > 103001930 103001580 103001530
>>> > >>>> >>>     > 1998-10-01 00:00:00 0.6 0 0
>>> > >>>> >>>     > 1998-10-01 01:00:00 0.2 0.2 0.2
>>> > >>>> >>>     > 1998-10-01 02:00:00 0.6 0.2 0.4
>>> > >>>> >>>     > 1998-10-01 03:00:00 0 0 0.6
>>> > >>>> >>>     > 1998-10-01 04:00:00 0 0 0
>>> > >>>> >>>     > 1998-10-01 05:00:00 0 0 0
>>> > >>>> >>>     > 1998-10-01 06:00:00 0 0 0
>>> > >>>> >>>     > 1998-10-01 07:00:00 0.2 0 0
>>> > >>>> >>>     >
>>> > >>>> >>>     > If possible, I have a couple of questions, which will sound stupid, but
>>> > >>>> >>>     > they are important to me in order to understand how R deals with files
>>> > >>>> >>>     > or dates.
>>> > >>>> >>>     >
>>> > >>>> >>>     > 1) Do I have to convert it to a *.csv file?
>>> > >>>> >>>     > 2) Can I deal with spaces and not ","?
>>> > >>>> >>>     > 3) How can I read dates?
>>> > >>>> >>>     >
>>> > >>>> >>>     > thanks a lot to all of you,
>>> > >>>> >>>     > Thanks
>>> > >>>> >>>     >
>>> > >>>> >>>     >
>>> > >>>> >>>     > Diego
>>> > >>>> >>>     >
>>> > >>>> >>>     >         [[alternative HTML version deleted]]
>>> > >>>> >>>     >
>>> > >>>> >>>     > ______________________________________________
>>> > >>>> >>>     > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> > >>>> >>>     > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > >>>> >>>     > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> > >>>> >>>     > and provide commented, minimal, self-contained, reproducible code.
>>> > >>>> >>>
>>> > >>>> >>>
>>> > >>>> >>>
>>> > >>>> >>
>>> > >>>> >
>>> > >>>
>>> > >>>
>>> > >>
>>> > >
>>> >
>>> > --
>>> > Sent from my phone. Please excuse my brevity.
>>> >
>>>
>>>
>>
>
