[R] read.csv fails to read a CSV file from google docs

Philipp Pagel p.pagel at wzw.tum.de
Fri Apr 29 20:27:45 CEST 2011


On Fri, Apr 29, 2011 at 06:19:24PM +0300, Tal Galili wrote:
>
> data_url <- "
> http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
> "
> read.csv(data_url)
> Error in file(file, "rt") : cannot open the connection

I get the same error (R 2.11.1, Debian LINUX) and don't have a
solution. But I did some tests and found the origin of the problem

I can download the file from google with wget but get some interesting
´information in the process:

------------------------------------------------------------
$ wget -v 'http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv'
--2011-04-29 20:07:40--  http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
Resolving spreadsheets0.google.com... 209.85.148.139, 209.85.148.113, 209.85.148.138, ...
Connecting to spreadsheets0.google.com|209.85.148.139|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv [following]
--2011-04-29 20:07:41--  https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
Connecting to spreadsheets0.google.com|209.85.148.139|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: “pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv.1”

    [ <=>                                                                                                       ] 41          --.-K/s   in 0s      

2011-04-29 20:07:42 (342 KB/s) - “pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv.1” saved [41]
------------------------------------------------------------

The message that caught my attention was the http redirection: "302 Moved
Temporarily".

If you try again with the new url you get this:

> read.csv(url("https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&g"))
Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") : unsupported URL scheme

?url told me "Note that ‘https://’ connections are not supported."
Case closed, problem unsolved...

Dirty workaround: use system() and wget or whatever command is available on
Windows for this.

cu
	Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/



More information about the R-help mailing list