[R] Journal Articles that Have Used R

(Ted Harding) Ted.Harding at manchester.ac.uk
Sun Jun 7 16:01:14 CEST 2009


I think the reason Google will not find it is that, in the Journal
website, the R files (and the names of the article directories that
might contain them, such as journal.sjdm.org/8210/ -- see below)
are not directly pointed to by any index.html or any
'<a href="...">' link in the website, as far as I can see. This would
be why 'wget' cannot find them in HTTP mode, and it would prevent
Google being led to them.

On the other hand, if one knows the name of a directory, then
a wget on that directory will assemble its list of contents into
an "index.html" file on the local machine, from which the names
of any ".R" files can be extracted with a bit of greppery.

For example,
  wget http://journal.sjdm.org/8210/
creates a local file "index.html", and then
  grep '[.]R' index.html
outputs:

<tr><td valign="top"><img src="/icons/unknown.gif"
  alt="[  ]"></td><td><a href="probs.R">probs.R</a></td><td
  align="right">09-Dec-2008 14:37
  </td><td align="right">1.0K</td></tr>

<tr><td valign="top"><img src="/icons/unknown.gif"
  alt="[  ]"></td><td><a href="test.R">test.R</a></td><td
  align="right">23-May-2008 05:46
  </td><td align="right">251 </td></tr>

thus revealing the two R files "probs.R" and "test.R" which are there.
Then a bit of seddery (or the like) could probably extract just the
filenames, by looking for *.R between > and <.
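
As a rough sketch of that seddery: the following assumes the listing
has been saved as "index.html", that the server writes its links as
href="name.R" (as in the output above), and that your grep supports
the -o option (GNU and BSD grep do). The function name list_r_files
is made up for illustration.

```shell
# list_r_files FILE: print the names of the .R files linked from a
# saved directory listing. Hypothetical helper; it assumes links of
# the form href="name.R" as in the listing shown above.
list_r_files() {
  # grep -o prints each matching href="..." attribute on its own line;
  # sed then strips the leading href=" and the closing quote.
  grep -o 'href="[^"]*\.R"' "$1" | sed -e 's/^href="//' -e 's/"$//'
}

# Only run against index.html if a previous wget has created it.
if [ -f index.html ]; then
  list_r_files index.html
fi
```

Matching on the href attribute rather than on the text between > and <
avoids picking up the date and size columns of the listing.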

However, the key to the whole thing is knowing what the numerical
directory names (such as "8210") are. The only way I've found to do
this automatically is to download the whole site (Linux commands):

  mkdir sjdm
  cd sjdm
  wget -r -k -np -nH http://journal.sjdm.org/

extract the numeric directory-names with (e.g.):

  find . -type d -name '[0-9]*[0-9]' -print

and then work through the results of this with directory-specific
wget's as before.
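
A sketch of that last step, on the assumption that the mirror sits in
the current directory and that the local directory names map straight
back onto http://journal.sjdm.org/. The function name article_urls,
the FETCH switch and the "listings" directory are illustrative only,
not anything the site or wget provides.

```shell
# article_urls BASE [ROOT]: print one directory URL per numeric
# directory found under ROOT (default "."). Hypothetical helper.
article_urls() {
  base=$1
  root=${2:-.}
  # Run find from inside ROOT so paths come out as ./8210 etc.,
  # then strip the leading "./" and turn each path into a URL.
  ( cd "$root" && find . -type d -name '[0-9]*[0-9]' -print ) |
    sed 's|^\./||' | sort |
    while IFS= read -r dir; do
      printf '%s/%s/\n' "${base%/}" "$dir"
    done
}

# Fetch each directory listing into listings/<number>.html so that
# the files can be grepped for ".R" as before. Nothing is downloaded
# unless FETCH=1 is set in the environment.
if [ "${FETCH:-0}" = 1 ]; then
  mkdir -p listings
  article_urls http://journal.sjdm.org/ . |
    while IFS= read -r url; do
      name=$(basename "$url")
      wget -q -O "listings/$name.html" "$url"
    done
fi
```

Run as, for example, "FETCH=1 sh fetch.sh" from inside the sjdm
directory (the script name is again just an example).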

This all seems to be overkill, however! Much easier if the site
would accept FTP.
Ted.


On 07-Jun-09 11:45:19, Gabor Grothendieck wrote:
> The fact that the search did find two files suggests that
> it works but the problem may be that google has just not
> indexed those other files.  Try entering the url for one of
> them into google and google still does not find it.
> http://journal.sjdm.org/8210/test.R
> 
> 
> On Sun, Jun 7, 2009 at 7:37 AM, Ted
> Harding<Ted.Harding at manchester.ac.uk> wrote:
>> On 07-Jun-09 10:56:25, Gabor Grothendieck wrote:
>>> Try this:
>>> site:journal.sjdm.org filetype:R
>>
>> When I entered that into Google, I got only the following two hits:
>>
>>   #
>>   #!/usr/bin/Rscript --vanilla # input is a pre-made list of files ...
>>   #!/usr/bin/Rscript --vanilla # input is a pre-made list of files ending
>>   in html called ../htmlist # (see below). This is easily modified. ...
>>   journal.sjdm.org/RePEc/rss/rss.R - Cached - Similar pages
>>
>>   #
>>   #!/usr/bin/Rscript --vanilla --verbose # script to convert RePEc ...
>>   #!/usr/bin/Rscript --vanilla --verbose # script to convert RePEc-style
>>   rdf files (ReDIFF) to DOAJ-type xml files # usage: oai.R [file] # where
>>   [file] is a ...
>>   journal.sjdm.org/RePEc/rss/oai.R - Cached - Similar pages
>>
>> neither of which is what Jonathan is looking for (and the "Similar pages"
>> links are a waste of time).
>>
>> In "regexp language", what he is looking for is
>>
>>   http://journal.sjdm.org/[0-9]+/.*\.R
>>
>> of which there are several instances on the site, for example
>>
>>   http://journal.sjdm.org/8210/
>>
>> shows
>>
>>   jdm8210.html    13-Dec-2008 11:18    31K
>>   jdm8210.pdf     13-Dec-2008 11:18   102K
>>   jdm8210.tex     13-Dec-2008 11:18    27K
>>   jdm8210001.gif  09-Dec-2008 14:38    11K
>>   probs.R         09-Dec-2008 14:37   1.0K
>>   test.R          23-May-2008 05:46   251
>>   ttest.csv       22-May-2008 21:31   2.6K
>>
>> so there are two ".R" files there (8210 is the number of an article
>> in the Journal). Other similar directories may or may not have
>> ".R" files -- for example
>>   http://journal.sjdm.org/8816/
>> has none.
>>
>> The problem is that utilities like wget won't work in this case,
>> since HTTP doesn't accept "wild cards", unlike FTP; but the journal
>> site doesn't accept FTP ... !!
>>
>> It's an intriguing problem, and I'm seeking advice amongst my Linux
>> acquaintances about it. I somehow doubt that there is a solution ...
>>
>> Ted.
>>> On Sat, Jun 6, 2009 at 6:39 PM, Jonathan Baron<baron at psych.upenn.edu>
>>> wrote:
>>>> I also use R to redraw figures for the journal I edit (below), when
>>>> the authors cannot produce usable graphics (about 50% of the authors
>>>> who try).
>>>>
>>>> Unfortunately, I cannot find a way to search for just the R files.
>>>> They are all http://journal.sjdm.org/*/*.R
>>>> where * is the number of the article. But Google, to my knowledge,
>>>> will not deal with wildcards like this.
>>>>
>>>> Jon
>>>> --
>>>> Jonathan Baron, Professor of Psychology, University of Pennsylvania
>>>> Home page: http://www.sas.upenn.edu/~baron
>>>> Editor: Judgment and Decision Making (http://journal.sjdm.org)
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>> --------------------------------------------------------------------
>> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
>> Fax-to-email: +44 (0)870 094 0861
>> Date: 07-Jun-09                                       Time: 12:37:34
>> ------------------------------ XFMail ------------------------------
>>
> 

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Jun-09                                       Time: 15:01:10
------------------------------ XFMail ------------------------------



