[R] Reading Text files from UK Met Office into R again...

David Winsemius dw|n@em|u@ @end|ng |rom comc@@t@net
Wed Oct 12 17:02:26 CEST 2022


First one needs to remove the extraneous line-ends that you created by using an editor that inserts them (or perhaps it was your mail client that added them because you failed to post in plain text). I removed those line-ends "by hand" and then created a text "file".

txt <- "2015-01-01 00:00, 03002, WMO, SYNOP, 1, 12, 1011, 4, 7, 200, 18, 82, , , 8, , , , , 100, 450, 1005.4, 5, , 102, 4, , 129, , , , , , , , 8.7, 7.5, 8.1,1003.6, , , , , , , 1, 1, 1, , , 1, , , , , 1, 1, 1, 1, 1, 1, , 1, , 1, 1, , , , , , , , , , 1, , , , , 2014-12-31 23:53, 0, , , , , , , , , , , , K, , , , , 91.7, A, , , ,
2015-01-01 00:00, 03005, WMO, SYNOP, 1, 9, 1011, 4, 1, 210, 26, 62, 8, 6, ,8, 8, , , 8, 30, 700, 1006, 1, 8, 54, 7, 6, 105, , , , , , , , 8.6, 7.3, 8, 996.1, , 01, , , , , 1, 1, 1, 1, 1, 1, 1, , , 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, , , , , , , , 1, , , , , 2014-12-31 23:55, 0, , , , , , , , , , , , K, , , , , 91.7, A, , , 0, 1
2015-01-01 00:00, 03006, WMO, SYNOP, 1, 10, 1011, 4, 6, 210, 23, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , 1, 1, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , 2014-12-31 23:53, 0, , , , , , , , , , , , , , , , , , A, , , ,
2015-01-01 00:00, 03010, WMO, SYNOP, 1, 17, 1011, 4, 6, 230, 21, , , , , , , , , , , 1006.1, , , , , , , , , , , , , , 9.4, 6.2, 7.9, , , , , , , , 1, 1, , , , , , , , , , , 1, 1, 1, 1, , , , , , , , , , , , , , , , , , , ,"

# Then use `count.fields` (no `sep` is given here, so fields are split on white space)
count.fields(file=textConnection(txt))
[1] 104 106 105  81

# So I'm guessing you arbitrarily snipped in the middle of one of the text lines
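
To count comma-delimited fields rather than white-space-delimited ones, `count.fields` accepts a `sep` argument. A minimal sketch on made-up data (the `toy` string and the names `n` and `short` are illustrative only, not taken from the Met Office file):

```r
# Toy data: three lines with 3, 2 and 4 comma-separated fields
toy <- "a, b, c
d, e
f, g, h, i"

con <- textConnection(toy)
n <- count.fields(con, sep = ",")   # one field count per input line
close(con)

n                                   # field count for each line

# Lines with fewer fields than the widest line are candidates for
# having been truncated or wrapped
short <- which(n < max(n))
short
```

Lines flagged this way are the ones worth inspecting by eye before calling `read.table` with `fill=TRUE`.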

dat <- read.table(text=txt, sep=",", fill=TRUE, row.names=NULL, header=FALSE)
str(dat)
'data.frame':	4 obs. of  105 variables:
 $ V1  : chr  "2015-01-01 00:00" "2015-01-01 00:00" "2015-01-01 00:00" "2015-01-01 00:00"
 $ V2  : int  3002 3005 3006 3010
 $ V3  : chr  " WMO" " WMO" " WMO" " WMO"
 $ V4  : chr  " SYNOP" " SYNOP" " SYNOP" " SYNOP"
 $ V5  : int  1 1 1 1
 $ V6  : int  12 9 10 17
 $ V7  : int  1011 1011 1011 1011
 $ V8  : int  4 4 4 4
 $ V9  : int  7 1 6 6
 $ V10 : int  200 210 210 230
 $ V11 : int  18 26 23 21
 $ V12 : int  82 62 NA NA
 $ V13 : int  NA 8 NA NA
 $ V14 : int  NA 6 NA NA
 $ V15 : int  8 NA NA NA
 $ V16 : int  NA 8 NA NA
 $ V17 : int  NA 8 NA NA
 $ V18 : logi  NA NA NA NA
 $ V19 : logi  NA NA NA NA
 $ V20 : int  100 8 NA NA
 #snipped about 80 lines .......
 $ V99 : num  91.7 NA NA NA
  [list output truncated]
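
The `str` output above shows leading blanks in the character columns (" WMO", " SYNOP"), and the station identifiers lose their leading zero when read as integers. One possible way to handle both, sketched on a shortened two-line version of the data (the `toy` string, the five-column layout and the `colClasses` choice are assumptions made for illustration, not the layout of the real file):

```r
# Shortened toy version of the first five fields of two records
toy <- "2015-01-01 00:00, 03002, WMO, SYNOP, 1
2015-01-01 00:00, 03005, WMO, SYNOP, 1"

d <- read.table(text = toy, sep = ",", header = FALSE,
                strip.white = TRUE,   # trim blanks around character fields
                colClasses = c("character", "character",
                               "character", "character", "integer"))

str(d)
d$V2   # station ids kept as text, so the leading zero survives
d$V3   # no leading blank
```

Whether the station identifier should stay as text depends on how it is used later; reading it as character is just one option.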


ALWAYS use a programming editor and always post in plain-text.

-- David.

> On Oct 9, 2022, at 4:50 PM, Ivan Krylov <krylov.r00t using gmail.com> wrote:
> 
> On Sun, 9 Oct 2022 12:01:27 +0100
> Nick Wray <nickmwray using gmail.com> wrote:
> 
>> Error in read.table("midas_wxhrly_201501-201512.txt", fill = T) :
>>  duplicate 'row.names' are not allowed
> 
> Since you don't pass the `header` argument, I think that the automatic
> header detection is here at play. This is what ?read.table has to say
> about row names:
> 
>>> If there is a header and the first row contains one fewer field than
>>> the number of columns, the first column in the input is used for the
>>> row names.  Otherwise if ‘row.names’ is missing, the rows are
>>> numbered.
> 
> Perhaps the "one fewer field in the header than the number of columns"
> condition is true for files after 2010? I'm too lazy to sign up for a
> CEDA account and I'm not sure I'd be given access to hourly datasets
> anyway.
> 
> If this is the reason for the failure (first column used as rownames()
> and turns out to be non-unique), there's an easy way to fix that:
> 
>>> Using ‘row.names = NULL’ forces row numbering.
> 
> I don't see a header in your example. If there's actually no header
> containing column names, passing `header = FALSE` will both prevent the
> error and avoid eating the first line of the file.
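
The mechanism Ivan describes can be reproduced on a toy input. This is only a sketch with made-up data (the `toy` string and the names `msg`, `ok` and `raw` are illustrative), not the actual MIDAS file:

```r
# Toy input: the first line has 2 fields, the data lines have 3,
# so read.table() assumes a header and takes column 1 as row names.
toy <- "x y
2015-01-01 1 2
2015-01-01 3 4"

# Default call: the repeated dates cannot serve as row names
msg <- tryCatch(read.table(text = toy),
                error = function(e) conditionMessage(e))
msg

# row.names = NULL forces row numbering; the dates stay an ordinary column
ok <- read.table(text = toy, row.names = NULL)
dim(ok)

# header = FALSE keeps the first line as data; fill = TRUE pads the short line
raw <- read.table(text = toy, header = FALSE, fill = TRUE)
nrow(raw)
```
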
> 
> -- 
> Best regards,
> Ivan
> 


