[R] download/retain text file structure with RCurl/getURL()

David Winsemius dwinsemius at comcast.net
Mon Jan 19 20:52:13 CET 2009


The file is in a fixed-width format with irregular entries, so perhaps
something along the lines of:

read.fwf(textConnection(txtfile), skip = 8,   # skips the header
         widths = <column widths vector>,
         col.names = <colnames>,
         nrows = 48)    # drops the trailing summary text

perhaps:

         widths = c(2, -1, 1, -1, 4, -1, 3   .... the rest
                  # the negative entries drop the white-space columns
         col.names = c("year", "card", "Jan.date", "Jan.dep"   ..... the rest
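
As a rough illustration of how the negative widths behave, here is a
small self-contained sketch. The two data lines and their layout are
made up for the example (they are not the real layout of 13e19.txt),
and `demo`, `con` and `out` are just illustrative names:

```r
# Made-up layout: cols 1-2 year, col 3 blank, col 4 card,
# col 5 blank, cols 6-9 date. Not the real file layout.
demo <- paste(c("61 1 E/ST", "62 1 K/31"), collapse = "\n")

# textConnection() lets read.fwf() treat one long string with
# embedded newlines as if it were a file.
con <- textConnection(demo)
out <- read.fwf(con,
                widths = c(2, -1, 1, -1, 4),   # negative widths skip that many characters
                col.names = c("year", "card", "Jan.date"))
close(con)

print(out)
```

Only the positive widths produce columns, so col.names needs one name
per positive entry.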

The first few columns, at least, seem to come in acceptably, although
the rows that are entirely NA will need to be deleted:
> read.fwf(textConnection(txtfile), skip = 8,   # skips the header
+         widths = c(2, -1, 1, -1, 4, -1, 3),   # the negative entries drop the white-space
+         col.names = c("year", "card", "Jan.date", "Jan.dep"),
+         nrows = 48)
    year card Jan.date Jan.dep
1    61    1     E/ST      NA
2    62    1     E/ST      NA
3    63    1     K/31      15
4    64    1     K/30      12
5    NA   NA     <NA>      NA
6    65    1     E/ST      NA
7    66    1     1/07      17
8    67    1     E/ST      NA
9    68    1     K/28      12
10   69    1     K/31      22
11   NA   NA     <NA>      NA
12   70    1     K/30      16
13   71    1     K/29      28
14   72    1     K/28      32
15   73    1     1/02      16
snip
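
One possible way to drop the all-NA rows afterwards is sketched below.
The data frame `dat` is a hand-typed stand-in for the read.fwf()
result (its values are copied from rows 3 to 6 of the output above),
and `all.na` and `dat2` are just illustrative names:

```r
# Hand-typed stand-in for the read.fwf() result, using rows 3-6 shown above.
dat <- data.frame(year     = c(63, 64, NA, 65),
                  card     = c(1, 1, NA, 1),
                  Jan.date = c("K/31", "K/30", NA, "E/ST"),
                  Jan.dep  = c(15, 12, NA, NA),
                  stringsAsFactors = FALSE)

# TRUE only for rows in which every column is NA
all.na <- rowSums(is.na(dat)) == ncol(dat)

dat2 <- dat[!all.na, ]     # keep rows that have at least one non-NA value
rownames(dat2) <- NULL     # renumber the remaining rows

print(dat2)
```

Rows that are only partly missing (such as year 65, which has no
Jan.dep value) are kept, which na.omit() would not do.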
-- 
David Winsemius

On Jan 19, 2009, at 1:26 PM, zack holden wrote:

>
> Dear list,
>
> I'm trying to download a text file directly from the internet using  
> the RCurl package and the command getURL. Duncan Lang graciously  
> helped me solve the first step in this problem using the following  
> command:
>
> #################
> txtfile <- getURL('ftp://ftp.wcc.nrcs.usda.gov/data/snow/snow_course/table/history/idaho/13e19.txt',
>                   ftp.use.epsv = FALSE)
> #################
>
> This brings the text file into R as a single long character string.
> I've spent many hours now trying to get this text file into R in
> a sensible form. I've tried every variant of the commands in the
> getURL help file, as well as different strsplit() commands to try
> to break this character string into sensible rows and columns,
> to no avail.
>
> Can anyone suggest a solution for doing this? I suspect there is a  
> getURL command I'm missing. Alternatively, do I really have to break  
> this long character string into rows and columns that I can then  
> assemble into a table?
>
> I'd be grateful for any advice.
>
> Thanks in advance,
>
> Zack



