[R] Downloading a directory of text files into R

Ben Bolker bbo|ker @end|ng |rom gm@||@com
Wed Jul 26 00:13:02 CEST 2023


  Where is readtext() from?

   Some combination of scraping

http://home.brisnet.org.au/~bgreen/Data/Hanson1/

and

http://home.brisnet.org.au/~bgreen/Data/Hanson2/


to recover the required file names:

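library(rvest)  # read_html(), html_element(), html_table()
# parse the Hanson1 listing page and pull out its table of entries
read_html("http://home.brisnet.org.au/~bgreen/Data/Hanson1/") |>
  html_element("body") |> html_element("table") |>
  html_table()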

will get you most of the way there ...

then use lapply() or a for loop over those names to download all the files ...?
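A rough sketch of how the pieces could fit together. It assumes both subfolders (Hanson1 and Hanson2) are plain directory-listing pages like the one above, and that the wanted files end in ".txt" (an assumption; change `pattern` if they don't). The helper names extract_file_links() and mirror_folder() are made up for illustration. Instead of html_table(), it reads the link targets (href attributes), because some listing pages shorten long names in the visible table; that is a swap from the html_table() approach above, not the only way to do it. Untested against the actual site.

```r
library(rvest)  # read_html(), html_elements(), html_attr(); also loads xml2

# Absolute URLs of the links on a parsed listing page whose href matches
# `pattern` (".txt" is an ASSUMED extension). Sort links ("?C=N;O=D") and
# folder links drop out because they do not match `pattern`.
extract_file_links <- function(page, base_url, pattern = "\\.txt$") {
  hrefs <- html_attr(html_elements(page, "a"), "href")
  hrefs <- hrefs[!is.na(hrefs) & grepl(pattern, hrefs)]
  xml2::url_absolute(hrefs, base_url)
}

# Download every matching file listed at `folder_url` into `dest_dir`
# and return the local paths (hypothetical helper).
mirror_folder <- function(folder_url, dest_dir, pattern = "\\.txt$") {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  urls <- extract_file_links(read_html(folder_url), folder_url, pattern)
  vapply(urls, function(u) {
    # decode e.g. "%20" so the local file name is readable
    dest <- file.path(dest_dir, URLdecode(basename(u)))
    download.file(u, destfile = dest, mode = "wb", quiet = TRUE)
    dest
  }, character(1), USE.NAMES = FALSE)
}

# Usage (needs network access; the local target directory is an assumption):
# base_url <- "http://home.brisnet.org.au/~bgreen/Data/"
# files <- unlist(lapply(c("Hanson1", "Hanson2"), function(sub)
#   mirror_folder(paste0(base_url, sub, "/"), file.path(tempdir(), sub))))
# x <- readtext::readtext(files)
```

Once the files are on disk, readtext() should be happy with the local paths, which sidesteps the "Remote URL does not end in known extension" error.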



On 2023-07-25 6:06 p.m., Bob Green wrote:
> Hello,
> 
> I am seeking advice on how to download the 833 files from this
> site: "http://home.brisnet.org.au/~bgreen/Data/"
> 
> I want to be able to download them to perform a textual analysis.
> 
> If the 833 files, which are in a directory with two subfolders, were on
> my computer, I could read them with readtext. Using readtext on the URLs,
> I get these errors:
> 
>  > x = readtext("http://home.brisnet.org.au/~bgreen/Data/*")
> Error in download_remote(file, ignore_missing, cache, verbosity) :
>    Remote URL does not end in known extension. Please download the file manually.
> 
>  > x = readtext("http://home.brisnet.org.au/~bgreen/Data/Dir/()")
> Error in download_remote(file, ignore_missing, cache, verbosity) :
>    Remote URL does not end in known extension. Please download the file manually.
> 
> Any suggestions are appreciated.
> 
> Bob
> 

-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
(Acting) Graduate chair, Mathematics & Statistics
 > E-mail is sent at my convenience; I don't expect replies outside of working hours.