[R] publication statistics from Web of Science

baptiste auguie ba208 at exeter.ac.uk
Wed Jan 14 14:44:41 CET 2009


Dear list,

This is a bit of an off-topic question, but I'm hoping to get some
advice from more experienced people. I've used the website "Web of
Science" to manually collect yearly publication counts, going back to
the 1960s, for several search keywords.

http://apps.isiknowledge.com/RAMore.do?product=UA&search_mode=&SID=P1g9lFJp9@ejA6PJHKD&qid=1&ra_mode=more&ra_name=PublicationYear&db_id=UGB&viewType=raMore

This is a really long and error-prone process. Once the data was
collected, I rearranged it into a form R could read (see the example at
the end); this step wasn't too bad. Finally, I plotted histograms to
show the temporal trends.
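
A minimal sketch of a sanity check for this kind of hand transcription,
assuming the cumSum value stored with each search (see the data at the
end) is meant to be the total of its yearly counts. The names
recorded_total and total_ok are made up for the illustration:

```r
# One search transcribed by hand: the "plasmonics" table from this post
plasmonics <- read.table(text = "
date count
2008 129
2007 92
2006 50
2005 26
2004 15
2003 4
1972 1
2001 1
2002 1", header = TRUE)

# Total stored alongside this search (cumSum in the list at the end)
recorded_total <- 319

# Sort chronologically; the website lists years by decreasing count
plasmonics <- plasmonics[order(plasmonics$date), ]

# A year entered twice is an easy slip when copying by hand
stopifnot(anyDuplicated(plasmonics$date) == 0)

# TRUE when the yearly counts add up to the recorded total
total_ok <- sum(plasmonics$count) == recorded_total
print(total_ok)
```

The same check could be run over every element of the list with
sapply(), comparing sum(values$count) to cumSum.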

I have two questions:

- Is there a package or external tool to facilitate the collection of
data from this kind of online search tool? I could not find any public
API for this website, although some tools like EndNote clearly access
the database somehow. I'd be very grateful for any pointers.

- I feel that my choice of search terms and the way I display the
results are very arbitrary and subjective. Any general advice on how to
present this data better is most welcome. (I should mention that I'd
rather not involve any complicated statistical analysis; I only want
to make sure that the presentation is not horribly biased.)


Best regards,

baptiste


statistics <- list(list(values=read.table(textConnection("
date count
2007 600
2006 588
2008 555
2005 430
2004 418
2003 334
2002 277
2001 239
2000 226
1997 184
1999 184
1998 182
1996 129
1995 108
1994 92
1993 67
1992 53
1991 47
1990 37
1989 14
1988 11
1983 10
1987 7
1985 6
1986 6
1981 5
1984 5
1979 4
1982 4
2009 3
1971 2
1933 1
1973 1
1974 1
1977 1
1978 1
1980 1"), head=T),type=1, cumSum=4833, search="photonics"),
list(values=read.table(textConnection("
date count
2008 129
2007 92
2006 50
2005 26
2004 15
2003 4
1972 1
2001 1
2002 1"), head=T),type=1, cumSum=319, search="plasmonics"),
list(values=read.table(textConnection("
date count
2008 3207
2007 3105
2006 2666
2005 2323
2004 1910
2003 1552
2002 1372
2001 1292
2000 1095
1999 992
1998 863
1997 771
1996 643
1995 484
1993 418
1994 407
1992 345
1991 321
1990 120
1989 91
1988 82
1987 78
1981 77
1986 73
1983 72
1978 69
1979 68
1985 66
1976 63
1975 62
1980 59
1984 54
1982 52
1973 50
1977 50
1972 46
1974 43
1971 38
1969 28
1970 28
2009 26
1968 18
1967 11
1966 8
1962 5
1963 4
1900 3
1960 3
1961 3
1948 2
1912 1
1949 1
1950 1
1953 1
1954 1
1959 1
1964 1
1965 1"), head=T),type=1, cumSum=25226, search="plasmonics+ plasmon"),
list(values=read.table(textConnection("
date count
2008 2716
2007 2640
2006 2257
2005 1991
2004 1625
2003 1302
2002 1129
2001 1056
2000 862
1999 814
1998 650
1997 574
1996 427
1995 338
1994 272
1993 260
1991 187
1992 176
1990 62
1989 51
1981 41
1988 41
1987 36
1986 32
1983 30
1980 29
1982 28
1984 28
1985 27
1975 25
1976 23
2009 23
1973 22
1979 22
1972 15
1974 15
1977 13
1971 10
1978 10
1970 9
1968 7
1969 7
1966 1  "), head=T),type=2, cumSum=19883, search="surface plasmon"),
list(values=read.table(textConnection("
date count
2008 324
2007 295
2006 248
2005 220
2004 156
2003 126
2002 113
2000 86
2001 84
1996 66
1999 59
1997 53
1998 53
1993 39
1992 34
1994 29
1995 29
1991 25
1973 2
1987 2
1970 1
1972 1
1978 1
1983 1
1984 1
1989 1
2009 1"), head=T),type=2, cumSum=2050, search="localised or particle plasmon"),
list(values=read.table(textConnection("
date count
2007 196
2008 165
2005 141
2006 141
2003 112
2004 109
2002 83
2001 75
1999 62
2000 51
1998 38
1997 29
1995 13
1996 11
1993 6
1992 4
1994 4
1991 2
2009 2
1990 1"), head=T),type=2, cumSum=1245, search="SPR sensor"),
list(values=read.table(textConnection("
date count
2008 290
2007 225
2006 167
2005 138
2004 101
2003 79
2001 54
2002 51
2000 42
1998 31
1999 30
1997 27
1996 25
1992 20
1995 20
1991 15
1994 14
1993 10
1973 2
1984 2
1990 2
2009 2
1963 1
1972 1
1974 1
1977 1
1978 1
1982 1
1983 1
1988 1
1989 1"), head=T), cumSum=1356,type=1,  search="light scattering gold"))

str(statistics)

treatOne <- function(ml){
	data.frame(ml$values, search= as.character(ml$search))
}
# treatOne(statistics[[1]])

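library(plyr)
library(reshape2) # provides melt()

# statistics[-3] leaves out the third search ("plasmonics+ plasmon")
stats.list <- llply(statistics[-3], treatOne)
stats.df <- do.call(rbind, stats.list)

stats.melt <- melt(stats.df, id.vars=c("date", "search"))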
str(stats.melt)
# stats.melt <- within(stats.melt, counts=value)

library(ggplot2)

# One panel per search term, each with its own y axis
p <- ggplot(data = subset(stats.melt, date > 1960),
            mapping = aes(x = date, y = value)) +
  facet_wrap(~search, ncol = 2, scales = "free_y") +
  geom_bar(stat = "identity", colour = "grey") +
  scale_y_continuous("number of publications")
p
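
As one possible way to make the panels easier to compare, here is a
minimal sketch that drops the incomplete year 2009 (the data were
collected in mid-January 2009) and scales each search by its own peak.
It uses only a small subset of the counts above; the data frame names
and the column name relative are made up for the illustration, and this
is only one of many possible normalisations:

```r
# Subset of the counts quoted above: two searches, recent years only
counts <- data.frame(
  date   = c(2004:2009, 2004:2008),
  count  = c(418, 430, 588, 600, 555, 3,  15, 26, 50, 92, 129),
  search = rep(c("photonics", "plasmonics"), times = c(6, 5))
)

# 2009 had barely started when the data were collected, so leave it out
complete <- subset(counts, date < 2009)

# Scale each search by its own maximum, so every series peaks at 1
complete$relative <- with(complete,
                          ave(count, search, FUN = function(x) x / max(x)))

print(complete)
```

The relative column could then be mapped to y in place of value in the
ggplot call, with a single shared y axis across panels.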


_____________________________

Baptiste Auguié

School of Physics
University of Exeter
Stocker Road,
Exeter, Devon,
EX4 4QL, UK

Phone: +44 1392 264187

http://newton.ex.ac.uk/research/emag



