[R] Function gutenberg_download in the gutenbergr package

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Wed Jan 24 16:59:04 CET 2018


I have never used that package, but it seems obvious to me that you need to "reflect" on the meaning of the word "mirror". There is no reason to assume that a site hosting a mirror of the CRAN archive is also going to host a mirror of Project Gutenberg [1].

If the package still does not seem to work as designed once you know you are giving it reasonable inputs, please remember that contributed packages have maintainers [2], and not all of them subscribe to r-help.

[1] https://www.gutenberg.org/MIRRORS.ALL
[2] ?maintainer
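
To make the distinction concrete, here is a minimal sketch of passing a Project Gutenberg mirror (rather than a CRAN mirror) to gutenberg_download. The mirror URL below is an assumption for illustration only; choose a current one from the list in [1]. The call needs the gutenbergr package and network access, so it is wrapped in tryCatch and may well fail on your machine for unrelated reasons.

```r
# Assumed Project Gutenberg mirror (NOT a CRAN mirror); verify against [1].
pg_mirror <- "http://aleph.gutenberg.org"

# Sketch only: runs the download if gutenbergr is installed.
if (requireNamespace("gutenbergr", quietly = TRUE)) {
  hgwells <- tryCatch(
    gutenbergr::gutenberg_download(c(35, 36, 5230, 159), mirror = pg_mirror),
    error = function(e) {
      # Report the failure instead of stopping the session
      message("Download failed: ", conditionMessage(e))
      NULL
    }
  )

  # Whom to contact if reasonable inputs still fail [2]
  print(utils::maintainer("gutenbergr"))
}
```

If the download returns NULL here, the error message should at least show whether the problem is the connection or the mirror itself.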
-- 
Sent from my phone. Please excuse my brevity.

On January 23, 2018 11:23:06 PM PST, Patrick Connolly <p_connolly at slingshot.co.nz> wrote:
>
>I've been working through https://www.tidytextmining.com/tidytext.html
>wherein everything worked until I got to this part in section 1.5
>
>> hgwells <- gutenberg_download(c(35, 36, 5230, 159))
>Determining mirror for Project Gutenberg from
>http://www.gutenberg.org/robot/harvest
>Error in open.connection(con, "rb") : 
>  Failed to connect to www.gutenberg.org port 80: Connection timed out
>
>Which indicates the problem is at the very start:
>
>  if (is.null(mirror)) {
>    mirror <- gutenberg_get_mirror(verbose = verbose)
>  }
>
>The documentation for gutenberg_get_mirror indicates there's nothing
>different I could set.
>
>So I tried specifying my usual mirror:
>
>> hgwells <- gutenberg_download(c(1260, 768, 969, 9182, 767), mirror = "http://cran.stat.auckland.ac.nz")
>Error in read_zip_url(full_url) : could not find function "read_zip_url"
>> 
>
>Which is, indeed, strange since according to 
>
>> help.search("read_zip_url")
>Help files with alias or concept or title matching ‘read_zip_url’ using
>regular expression matching:
>
>
>gutenbergr::read_zip_url
>                        Read a file from a .zip URL
>  Aliases: read_zip_url
>
>[...]
>
>And according to 
>library(help = "gutenbergr")
>
>[...]
>Index:
>
>gutenberg_authors       Metadata about Project Gutenberg authors
>gutenberg_download      Download one or more works using a Project
>                        Gutenberg ID
>gutenberg_get_mirror    Get the recommended mirror for Gutenberg files
>gutenberg_metadata      Gutenberg metadata about each work
>gutenberg_strip         Strip header and footer content from a Project
>                        Gutenberg book
>gutenberg_subjects      Gutenberg metadata about the subject of each
>                        work
>gutenberg_works         Get a filtered table of Gutenberg work metadata
>read_zip_url            Read a file from a .zip URL
>
>[...]
>
>However, when I look at the list for that part of the search(), there
>is no read_zip_url but all the rest of that list are present.  So it's
>not surprising that it isn't found.  But it puzzles me that it is not
>there.
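
A function can have a help page and appear in the package index without being exported, in which case it will not show up in the package's entry on the search path. The sketch below is one way to check this; function_visibility is a hypothetical helper written for this illustration, not part of gutenbergr, and whether read_zip_url is exported in gutenbergr 0.1.3 is not something this thread confirms. Functions called from inside a package normally find internal helpers through the namespace, so it may also be worth checking that the gutenberg_download being called is the package's own copy and not one redefined in the workspace.

```r
# Report whether a function exists in a package's namespace and whether it is
# exported (and so visible on the search path after library()).
function_visibility <- function(fn, pkg) {
  ns <- asNamespace(pkg)  # loads the namespace; errors if pkg is not installed
  list(
    in_namespace = exists(fn, envir = ns, inherits = FALSE),
    exported     = fn %in% getNamespaceExports(pkg)
  )
}

# Demonstration with a base package so the sketch runs anywhere
str(function_visibility("median", "stats"))

# The case from this thread (only runs if gutenbergr is installed)
if (requireNamespace("gutenbergr", quietly = TRUE)) {
  str(function_visibility("read_zip_url", "gutenbergr"))
  # Should print <environment: namespace:gutenbergr> for the package's own copy
  print(environment(gutenbergr::gutenberg_download))
}
```

If in_namespace is TRUE but exported is FALSE, the function is internal and can be inspected with getAnywhere("read_zip_url").
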
>
>Ideas as to how I should proceed would be gratefully appreciated.
>
>
>> sessionInfo()
>R version 3.4.2 (2017-09-28)
>Platform: x86_64-pc-linux-gnu (64-bit)
>Running under: Ubuntu 14.04.5 LTS
>
>Matrix products: default
>BLAS: /home/hrapgc/local/R-3.4.2/lib/libRblas.so
>LAPACK: /home/hrapgc/local/R-3.4.2/lib/libRlapack.so
>
>locale:
> [1] LC_CTYPE=en_NZ.UTF-8       LC_NUMERIC=C              
> [3] LC_TIME=en_NZ.UTF-8        LC_COLLATE=en_NZ.UTF-8    
> [5] LC_MONETARY=en_NZ.UTF-8    LC_MESSAGES=en_NZ.UTF-8   
> [7] LC_PAPER=en_NZ.UTF-8       LC_NAME=C                 
> [9] LC_ADDRESS=C               LC_TELEPHONE=C            
>[11] LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C       
>
>attached base packages:
>[1] grDevices utils     stats     graphics  methods   base     
>
>other attached packages:
> [1] sos_2.0-0          brew_1.0-6         gutenbergr_0.1.3   ggplot2_2.2.1     
> [5] stringr_1.2.0      bindrcpp_0.2       dplyr_0.7.4        janeaustenr_0.1.5 
> [9] tidytext_0.1.6     FactoMineR_1.38    readxl_1.0.0       tm_0.7-3          
>[13] NLP_0.1-11         wordcloud_2.5      RColorBrewer_1.1-2 lattice_0.20-35   
>
>loaded via a namespace (and not attached):
> [1] Rcpp_0.12.13         cellranger_1.1.0     compiler_3.4.2      
> [4] plyr_1.8.4           bindr_0.1            tokenizers_0.1.4    
> [7] tools_3.4.2          gtable_0.2.0         tibble_1.3.4        
>[10] nlme_3.1-131         pkgconfig_2.0.1      rlang_0.1.2         
>[13] Matrix_1.2-11        psych_1.7.8          curl_3.0            
>[16] parallel_3.4.2       xml2_1.1.1           cluster_2.0.6       
>[19] hms_0.3              flashClust_1.01-2    grid_3.4.2          
>[22] scatterplot3d_0.3-40 glue_1.1.1           ellipse_0.3-8       
>[25] R6_2.2.2             foreign_0.8-69       readr_1.1.1         
>[28] purrr_0.2.4          tidyr_0.7.2          reshape2_1.4.2      
>[31] magrittr_1.5         scales_0.5.0         SnowballC_0.5.1     
>[34] MASS_7.3-47          leaps_3.0            assertthat_0.2.0    
>[37] mnormt_1.5-5         colorspace_1.3-2     labeling_0.3        
>[40] stringi_1.1.5        lazyeval_0.2.1       munsell_0.4.3       
>[43] slam_0.1-42          broom_0.4.2         
>> 
>
>-- 
>~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
>  
>   ___    Patrick Connolly   
> {~._.~}                   Great minds discuss ideas    
> _( Y )_  	         Average minds discuss events 
>(:_~*~_:)                  Small minds discuss people  
> (_)-(_)  	                      ..... Eleanor Roosevelt
>	  
>~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
>


