| Type: | Package |
| Title: | Example Datasets for a Learning Guide to R |
| Version: | 0.1.1 |
| Description: | A largish collection of example datasets, including several classics. Many of these datasets are well suited for regression, classification, and visualization. |
| Encoding: | UTF-8 |
| Depends: | R (≥ 2.10) |
| Suggests: | ggplot2 |
| LazyData: | false |
| License: | CC0 |
| RoxygenNote: | 6.1.1 |
| NeedsCompilation: | no |
| Packaged: | 2019-06-18 17:44:42 UTC; remko |
| Author: | Remko Duursma [aut, cre], Jeff Powell [ctb] |
| Maintainer: | Remko Duursma <remkoduursma@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2019-06-19 09:40:03 UTC |
Allometry
Description
This dataset contains measurements of tree dimensions and biomass. Data kindly provided by John Marshall, University of Idaho.
Usage
allometry
Format
A data frame with 63 rows and 5 variables:
species (factor): The tree species (PSME = Douglas fir, PIMO = Western white pine, PIPO = Ponderosa pine).
diameter (double): Tree diameter at 1.3 m above ground (cm).
height (double): Tree height (m).
leafarea (double): Total leaf area (m$^2$).
branchmass (double): Total (oven-dry) mass of branches (kg).
Examples
data(allometry)
with(allometry, plot(diameter, height, pch=19, col=species))
Child anthropometry
Description
Data include measurements of age, foot length, and height for 3898 children. These data are a small subset of many dozens of measurements on the same children, described in detail by Snyder (1977).
Usage
anthropometry
Format
A data frame with 3898 rows and 4 variables:
age (double): Age in years
gender (integer): "female" or "male"
foot_length (integer): Total foot length (mm)
height (double): Total height (cm)
Source
<http://mreed.umtri.umich.edu/mreed/downloads.html>.
Examples
data(anthropometry)
with(anthropometry, plot(age, foot_length, pch=16, cex=0.5, col=gender))
Cars data
Description
Fuel efficiency, weight, acceleration, and other measurements on 398 cars. Most of the
cars are American (n = 249); the rest are European (n = 70) and Japanese (n = 79).
Not to be confused with the car datasets provided by base R (see cars and mtcars).
Usage
automobiles
Format
A data frame with 398 rows and 9 variables:
car_name (character): Make and model
origin (factor): 'American', 'European' or 'Japanese'
build_year (double): Year car was built
fuel_efficiency (double): Liters / 100 km
cylinders (integer): Nr. of cylinders
engine_volume (double): Engine volume ('displacement') in liters.
horsepower (integer): Engine power (hp)
weight (double): Car weight in kg
acceleration (double): Time to accelerate to 60 mph
Source
Data originally hosted on <http://lib.stat.cmu.edu/datasets/>, also used in ISLR (as the 'Auto' dataset). Converted to metric units for use in this package.
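Because fuel_efficiency is stored in liters per 100 km, comparing it with sources that report miles per gallon requires inverting the ratio. The sketch below shows that arithmetic, assuming US gallons. The helper l100km_to_mpg and the toy values are hypothetical and not part of the package.

```r
# 1 US gallon = 3.785411784 L and 1 mile = 1.609344 km, so
# mpg = (100 * 3.785411784 / 1.609344) / (liters per 100 km)
l100km_to_mpg <- function(l_per_100km) {
  k <- 100 * 3.785411784 / 1.609344
  k / l_per_100km
}

# Toy values (made up, not taken from automobiles)
toy <- data.frame(fuel_efficiency = c(5, 10, 15))
toy$mpg <- l100km_to_mpg(toy$fuel_efficiency)
toy
```

Note that the relationship is reciprocal, so a linear model fitted on one scale will not be linear on the other.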
Berkeley admissions data, 1973
Description
A well-known dataset, often used to illustrate Simpson's Paradox. The Wikipedia page (see Source) states: "The admission figures for the fall of 1973 showed that men applying were more likely than women to be admitted, and the difference was so large that it was unlikely to be due to chance. But when examining the individual departments, it appeared that six out of 85 departments were significantly biased against men, whereas only four were significantly biased against women. In fact, the pooled and corrected data showed a 'small but statistically significant bias in favor of women.'"
Usage
berkeley
Format
A data frame with 6 rows and 5 variables:
Department (integer): University department, A-F
Admitted_Male (integer): Nr. of admitted male applicants
Denied_Male (integer): Nr. of denied male applicants
Admitted_Female (integer): Nr. of admitted female applicants
Denied_Female (integer): Nr. of denied female applicants
Source
<https://en.wikipedia.org/wiki/Simpson
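The paradox described above can be seen by comparing pooled admission rates with per-department rates. The sketch below uses UCBAdmissions, a table that ships with base R, on the assumption (not verified here) that it holds the same six-department figures as berkeley. The same calculation on berkeley would work from its Admitted_* and Denied_* columns.

```r
# UCBAdmissions (package 'datasets') is a 2 x 2 x 6 table with
# dimensions Admit, Gender and Dept.
adm <- UCBAdmissions

# Pooled admission rate per gender, ignoring department
pooled <- apply(adm, c(1, 2), sum)                  # Admit x Gender
pooled_rate <- pooled["Admitted", ] / colSums(pooled)

# Admission rate per gender within each department
by_dept <- adm["Admitted", , ] / apply(adm, c(2, 3), sum)   # Gender x Dept

round(pooled_rate, 2)
round(by_dept, 2)
```

Pooled over departments, men have the higher admission rate, yet within department A the rate for women is higher.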
A Baboon Named Brunhilda
Description
The observed responses are Geiger counter counts (times $10^{-4}$) used to measure the amount of radioactively tagged sulfate drug in the blood of a baboon named Brunhilda after an injection of the drug.
Usage
brunhild
Format
A data frame with 21 rows and 2 variables:
Hours (integer): Hours after drug injection
Sulfate (double): Tagged sulfate concentration in blood
Source
<http://www.statsci.org/data/general/brunhild.html>
Cavitation resistance for Callitris branches
Description
Measurements of so-called 'percent loss conductivity' (PLC) curves on terminal twigs of Callitris trees (a member of the Cupressaceae in Australia). Twigs are subjected to increasingly negative xylem pressure (Psi, included as a positive pressure in MPa), and the loss in conductivity (i.e. the conductivity of water transport in the xylem) is measured.
Usage
callitrishydraulic
Format
A data frame with 31 rows and 3 variables:
Rep (integer): Replicate - four branches are included.
Psi (double): Positive-valued negative xylem water pressure (MPa)
PLC (double): Percent loss conductivity (sometimes < 0)
Examples
data(callitrishydraulic)
with(callitrishydraulic, plot(Psi, PLC, pch=Rep))
Cereal nutrition data - small subset nr1
Description
Small subset nr1 of the Cereals data to practice merging,
see cereals (available are cereal1, cereal2 and cereal3).
Usage
cereal1
Format
An object of class data.frame with 10 rows and 2 columns.
Cereal nutrition data - small subset nr2
Description
Small subset nr2 of the Cereals data to practice merging,
see cereals (available are cereal1, cereal2 and cereal3).
Usage
cereal2
Format
An object of class data.frame with 8 rows and 2 columns.
Cereal nutrition data - small subset nr3
Description
Small subset nr3 of the Cereals data to practice merging,
see cereals (available are cereal1, cereal2 and cereal3).
Usage
cereal3
Format
An object of class data.frame with 6 rows and 2 columns.
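The three subsets are intended for practicing merges. The sketch below shows the three most common join types with base R's merge(). The data frames a and b, their column names, and their values are hypothetical stand-ins, since the column names of cereal1, cereal2 and cereal3 are not listed here.

```r
# Hypothetical stand-ins for two small subsets sharing a key column
a <- data.frame(name = c("Alpha", "Bravo", "Charlie"),
                calories = c(110, 90, 120))
b <- data.frame(name = c("Bravo", "Charlie", "Delta"),
                fiber = c(3, 1, 5))

inner <- merge(a, b, by = "name")                # rows present in both
left  <- merge(a, b, by = "name", all.x = TRUE)  # all rows of a
full  <- merge(a, b, by = "name", all = TRUE)    # all rows of either

nrow(inner); nrow(left); nrow(full)
```

Rows without a match in the other data frame get NA in the columns that come from it.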
Cereal nutrition data
Description
This dataset summarizes 77 different brands of breakfast cereals, including calories, proteins, fats, and so on, and gives a 'rating' that indicates the overall nutritional value of the cereal.
Usage
cereals
Format
A data frame with 77 rows and 13 variables:
Cereal.name (character): Cereal name
Manufacturer (factor): Cereal manufacturer (letter code)
Cold.or.Hot (factor): 'C' or 'H'
calories (integer)
protein (integer)
fat (integer)
sodium (integer)
fiber (double)
carbo (double)
sugars (integer)
potass (integer)
vitamins (integer)
rating (double): Health rating of the cereal (unknown calculation method).
Source
<https://dasl.datadescription.com/datafile/cereals/> (Originally at Statlib CMU).
Choat's Plant Drought Tolerance
Description
Data include a measure of plant drought tolerance (P50, more negative values indicate plant stems can tolerate lower water contents), and mean annual precipitation of the location where the sample was taken. Data are for 115 individual species (species name not included). Data from the original source were simplified for the purpose of this book.
Usage
choat_precipp50
Format
A data frame with 115 rows and 2 variables:
annualprecip (integer): Annual rainfall (mm) where the plant was sampled.
P50 (double): The negative water pressure in the xylem at which 50% of stem conductivity is lost. More negative indicates higher tolerance to drought.
Source
Choat B. et al., 2012, Global convergence in the vulnerability of forests to drought, Nature 491, pages 752–755 <https://www.nature.com/articles/nature11688>.
Coweeta tree data
Description
Tree measurements in the Coweeta LTER.
Usage
coweeta
Format
A data frame with 87 rows and 9 variables:
species (integer): One of 10 tree species
site (integer): Site abbreviation
elev (integer): Elevation (m asl)
age (integer): Tree age (yr)
DBH (double): Diameter at breast height (cm)
height (double): Tree height (m)
folmass (double): Foliage mass (kg)
SLA (double): Specific leaf area (index of leaf thinness) (cm$^2$ g$^{-1}$)
biomass (double): Total tree biomass
Source
Martin J.G., et al., 1998, Aboveground biomass and nitrogen allocation of ten deciduous southern Appalachian tree species, Canadian Journal of Forest Research 28, 1648-1659.
Dutch election data
Description
Polls for the 12 leading political parties in the Netherlands, leading up to the general election on 12 Sept. 2012. Data are in 'wide' format, with a column for each party. Values are in percentages.
Usage
dutchelection
Format
A data frame with 22 rows and 12 variables:
Date (factor): Date of poll (NOTE: has not been converted to Date class)
VVD (double): Vote for this party in percentage.
PvdA (double): Vote for this party in percentage.
PVV (double): Vote for this party in percentage.
CDA (double): Vote for this party in percentage.
SP (double): Vote for this party in percentage.
D66 (double): Vote for this party in percentage.
GL (double): Vote for this party in percentage.
CU (double): Vote for this party in percentage.
SGP (double): Vote for this party in percentage.
PvdD (double): Vote for this party in percentage.
FiftyPlus (double): Vote for this party in percentage.
Source
<http://en.wikipedia.org/wiki/Dutch_general_election,_2012>
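Two preparation steps follow from the format above: converting Date from a factor to the Date class, and reshaping from 'wide' (one column per party) to 'long' (one row per date and party). The sketch below illustrates both on a toy data frame with made-up values. The ISO date format used here is an assumption, so check the actual format in dutchelection before reusing the conversion.

```r
# Toy wide-format data (values and date format are invented)
toy <- data.frame(Date = factor(c("2012-08-22", "2012-08-29")),
                  VVD = c(34, 35), PvdA = c(20, 22), SP = c(30, 27))

# Factor -> character -> Date (assumes ISO yyyy-mm-dd)
toy$Date <- as.Date(as.character(toy$Date))

# Wide -> long: stack the party columns on top of each other
parties <- setdiff(names(toy), "Date")
long <- data.frame(
  Date  = rep(toy$Date, times = length(parties)),
  party = rep(parties, each = nrow(toy)),
  poll  = unlist(toy[parties], use.names = FALSE)
)
long
```

The long format is what ggplot2 expects when mapping party to colour.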
Leaf gas exchange at the EucFACE
Description
Measurements of leaf net photosynthesis at the EucFACE experiment, on leaves of different trees growing in ambient and elevated CO$_2$ concentrations. Measurements were repeated four times during 2013 (labelled as Date=A,B,C,D).
Usage
eucface_gasexchange
Format
A data frame with 84 rows and 7 variables:
Date (factor): Date label (A-D)
CO2 (integer): CO2 treatment, Amb=ambient, Ele=elevated
Ring (integer): One of six plots ('rings') where treatment was applied
Tree (integer): Tree number
Photo (double): Rate of leaf photosynthesis ($\mu$mol m$^{-2}$ s$^{-1}$)
Trmmol (double): Rate of leaf transpiration (mmol m$^{-2}$ s$^{-1}$)
VpdL (double): Vapour pressure deficit (kPa)
Source
Gimeno T.E., 2015, Conserved stomatal behaviour under elevated CO2 and varying water availability in a mature woodland. Functional Ecology <https://doi.org/10.1111/1365-2435.12532>
EucFACE ground cover data
Description
This file contains estimates of plant and litter cover within the rings of the EucFACE experiment, evaluating forest ecosystem responses to elevated CO$_2$, on two dates. Within each ring are four plots and within each plot are four 1m by 1m subplots. Values represent counts along a grid of 16 points within each subplot.
Usage
eucfacegc
Format
A data frame with 192 rows and 8 variables:
Date (integer): Date of measurement (d/m/y, not yet converted to Date class)
Ring (integer): The identity of the EucFACE Ring, the level at which the experimental treatment is applied.
Plot (integer): A total of four plots, nested within each level of Ring.
Sub (integer): A total of four subplots, nested within each level of Plot.
Forbes (integer): Number of points where dicot plants are observed.
Grass (integer): Number of points where grass is observed.
Litter (integer): Number of points where leaf litter is observed.
Trt (integer): The experimental treatment: 'ctrl' for ambient levels of atmospheric carbon dioxide, 'elev' for ambient plus 150 ppm.
Source
Jeff Powell
Fluxtower data
Description
This dataset contains measurements of CO$_2$ and H$_2$O fluxes (and related variables) over a pine forest in Quintos de Mora, Spain. The site is a mixture of Pinus pinaster and Pinus pinea, and was planted in the 1960s.
Data need to be cleaned to some extent (the purpose of this example dataset).
Usage
fluxtower
Format
A data frame with 244 rows and 8 variables:
TIMESTAMP (factor): Date and time
FCO2 (double): Canopy CO2 flux ($\mu$mol m$^{-2}$ s$^{-1}$)
FH2O (double): Canopy H2O flux (mmol m$^{-2}$ s$^{-1}$)
ustar (double): Friction velocity (m s$^{-1}$)
Tair (double): Air temperature (degrees C)
RH (double): Relative humidity (%)
Tsoil (double): Soil temperature (degrees C)
Rain (integer): Rainfall (mm half hour$^{-1}$)
Source
Data kindly provided by Victor Resco de Dios (in 2011), and simplified somewhat.
Seed germination as affected by fire
Description
Two datasets on the germination success of seeds of four Melaleuca species, when subjected to temperature, fire cue, and dehydration treatments. Seeds were collected from a number of sites and subjected to 6 temperature treatments and fire cues (in the fire germination data), or to a range of dehydration levels (in the water germination data).
This dataset contains the fire treatment data.
Usage
germination_fire
Format
A data frame with 576 rows and 7 variables:
species (factor): One of four Melaleuca species
temp (integer): Temperature treatment (C)
fire.cues (integer): Fire cue treatment (yes or no)
site (integer): Coding for the site where the seed was collected
cabinet (integer): ID for the cabinet where seeds were treated
germ (integer): Number of germinated seeds
n (integer): Number of seeds tested (20 for all rows)
Source
Data are from Hewitt et al. 2015 (Austral Ecology 40(6):661-671), shared by Charles Morris, and simplified for the purpose of this book.
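Since each row records germ successes out of n seeds, one natural analysis is a binomial GLM with a two-column response. The sketch below fits such a model to simulated data that borrow the column names temp, fire.cues, germ and n. The simulated effect sizes are arbitrary, and this is only one possible approach, not necessarily the analysis used in the source publication.

```r
set.seed(1)

# Simulated stand-in for germination_fire (values are invented)
toy <- expand.grid(temp = c(10, 20, 30),
                   fire.cues = c("no", "yes"),
                   rep = 1:4)
toy$n <- 20
p <- plogis(-2 + 0.08 * toy$temp + 0.5 * (toy$fire.cues == "yes"))
toy$germ <- rbinom(nrow(toy), size = toy$n, prob = p)

# Binomial GLM: the response is cbind(successes, failures)
fit <- glm(cbind(germ, n - germ) ~ temp + fire.cues,
           family = binomial, data = toy)
coef(fit)
```

Coefficients are on the logit scale; plogis() converts a linear predictor back to a germination probability.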
Seed germination as affected by water
Description
Two datasets on the germination success of seeds of four Melaleuca species, when subjected to temperature, fire cue, and dehydration treatments. Seeds were collected from a number of sites and subjected to 6 temperature treatments and fire cues (in the fire germination data), or to a range of dehydration levels (in the water germination data).
This dataset contains the water treatment data.
Usage
germination_water
Format
A data frame with 352 rows and 5 variables:
species (factor): One of four Melaleuca species
site (integer): Coding for the site where the seed was collected
water.potential (double): Water potential of the seed (MPa) after incubation (lower values are drier)
germ (integer): Number of germinated seeds
n (integer): Number of seeds tested (25 for all rows)
Source
Data are from Hewitt et al. 2015 (Austral Ecology 40(6):661-671), shared by Charles Morris, and simplified for the purpose of this package.
Examples
data(germination_water)
with(germination_water,
plot(jitter(water.potential), germ/n,
pch=21, bg=terrain.colors(4)[species])
)
I x F at the HFE - tree observations
Description
Heights and stem diameters of trees growing in a fertilization x irrigation experiment in Richmond, New South Wales, Australia, as part of the Hawkesbury Forest Experiment (HFE). A total of 16 plots, each with 72 Eucalyptus saligna trees, was remeasured 17 times between 2008 and 2012. Treatments to the plots were either control (C), applied with fertilizer (F), irrigation (I), or irrigation+fertilization (IF).
This dataset contains the tree-level observations, see hfeifplotmeans for
averaged data.
Usage
hfeifbytree
Format
A data frame with 9592 rows and 6 variables:
ID (integer): A unique identifier for each tree.
plotnr (integer): A total of sixteen plots (four treatments).
treat (integer): One of four treatments (I - irrigated, F - dry fertilized, IL - Liquid fertilizer plus irrigation, C - control)
Date (factor): The date of measurement (YYYY-MM-DD)
height (double): Mean height for the sample trees (m).
diameter (double): Mean diameter for the sample trees (cm).
Source
Data courtesy of Craig Barton and Burhan Amiji, from Western Sydney University.
Examples
# Variable sample sizes over time. On many occasions, subsamples were measured.
data(hfeifbytree)
ftable(xtabs(~Date+treat, data=hfeifbytree))
I x F at the HFE - plot-level observations
Description
Heights and stem diameters of trees growing in a fertilization x irrigation experiment in Richmond, New South Wales, Australia, as part of the Hawkesbury Forest Experiment (HFE). A total of 16 plots, each with 72 Eucalyptus saligna trees, was remeasured 17 times between 2008 and 2012. Treatments to the plots were either control (C), applied with fertilizer (F), irrigation (I), or irrigation+fertilization (IF).
This dataset contains the plot-level means, see hfeifbytree for tree-level
measurements.
Usage
hfeifplotmeans
Format
A data frame with 320 rows and 5 variables:
plotnr (integer): A total of sixteen plots (four treatments).
Date (factor): The date of measurement (YYYY-MM-DD)
diameter (double): Mean diameter for the sample trees (cm).
height (double): Mean height for the sample trees (m).
treat (integer): One of four treatments (I - irrigated, F - dry fertilized, IL - Liquid fertilizer plus irrigation, C - control)
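Plot-level means like these can be derived from tree-level records with aggregate(). The sketch below shows the idea on a toy data frame that borrows the column names of hfeifbytree. The values are made up, and simple averaging per plot and date is an assumption about how the plot means were computed.

```r
# Toy tree-level data: two plots, three trees each, one date (invented values)
toy <- data.frame(
  plotnr   = c(1, 1, 1, 2, 2, 2),
  treat    = c("C", "C", "C", "I", "I", "I"),
  Date     = "2008-01-01",
  height   = c(2.0, 2.2, 2.4, 3.0, 3.1, 3.2),
  diameter = c(1.0, 1.2, 1.4, 2.0, 2.2, 2.4)
)

# One row per plot x treatment x date, averaging both measurements
plot_means <- aggregate(cbind(height, diameter) ~ plotnr + treat + Date,
                        data = toy, FUN = mean)
plot_means
```

With the formula interface, rows with missing values in any listed variable are dropped before averaging, which matters when only subsamples were measured.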
Weather data at the Hawkesbury Forest Experiment
Description
Data for the weather station at the Hawkesbury Forest Experiment (HFE) for the year 2008. The HFE is in Richmond, New South Wales (in western Sydney), Australia.
Data are at a 30-minute timestep.
Usage
hfemet2008
Format
A data frame with 17568 rows and 9 variables:
DateTime (integer): Date Time (half-hourly steps)
Tair (double): Air temperature (degrees C)
AirPress (double): Air pressure (kPa)
RH (double): Relative humidity (%)
VPD (double): Vapour pressure deficit (kPa)
PAR (double): Photosynthetically active radiation ($\mu$mol m$^{-2}$ s$^{-1}$)
Rain (double): Precipitation (mm)
wind (double): Wind speed (m s$^{-1}$)
winddirection (double): Wind direction (degrees)
Source
Data courtesy of Craig Barton at Western Sydney University.
Howell height, age and weight data
Description
These data were also used by McElreath (2016, "Statistical Rethinking", CRC Press). Data include measurements of height, age and weight on Khoisan people.
Usage
howell
Format
A data frame with 783 rows and 4 variables:
sex (factor): male or female
age (double): Age (years)
weight (double): Body weight (kg)
height (double): Total height (cm)
Source
<https://tspace.library.utoronto.ca/handle/1807/17996>, subsetted for non-missing data and one outlier removed.
Examples
data(howell)
with(howell, plot(age, height, pch=19, col=sex))
Hydro dam storage data
Description
This dataset describes the storage of the hydro dam on the Derwent river in Tasmania (Lake King William & Lake St. Clair), in equivalent of energy stored.
Usage
hydro
Format
A data frame with 314 rows and 2 variables:
Date (factor): The date of the bi-weekly reading (d/m/yyyy)
storage (integer): Total water stored, in energy equivalent (GWh).
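Because Date is stored as a factor in d/m/yyyy form, it has to be converted before any time-series work. The sketch below shows the conversion on a toy data frame with made-up readings. Going through as.character() first avoids converting the factor's internal integer codes by mistake.

```r
# Toy readings in the documented d/m/yyyy layout (values are invented)
toy <- data.frame(Date = factor(c("8/11/2004", "22/11/2004", "6/12/2004")),
                  storage = c(8100, 8050, 7990))

# Factor -> character -> Date, with an explicit day/month/year format
toy$Date <- as.Date(as.character(toy$Date), format = "%d/%m/%Y")

# Interval between consecutive readings, in days
diff(toy$Date)
```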
Icecream sales and temperature
Description
A synthetic dataset on weekly ice cream sales in two locations in Amsterdam, along with air temperature. The idea is that the ice cream salesman first sold icecream in 'Oosterpark', and decided to move shop to the 'Dappermarkt' the year after. Did sales improve? This dataset can be used to show that naive conclusions from simple linear model fits can be misleading, and that the use of covariates (here, air temperature) can change conclusions about effects.
Usage
icecream
Format
A data frame with 40 rows and 3 variables:
temperature (double): Air temperature (C)
sales (double): Ice cream sales per week (in local currency)
location (factor): Either 'Dappermarkt' or 'Oosterpark'
Examples
data(icecream)
# Linear model, temperature as covariate
fit_ice <- lm(sales ~ temperature*location, data=icecream)
# Try to guess from coefficients where the sales were higher:
summary(fit_ice)
# What about now?
with(icecream, plot(temperature, sales, pch=19, col=location))
legend("topleft", levels(icecream$location), fill=palette())
Genetically modified soybean litter decomposition
Description
Soybean litter decomposition as a function of time (date), type of
litter (variety), herbicides applied (herbicide), and where in the soil
profile it is placed (profile). masslost refers to the proportion of the
litter that was lost from the bag (decomposed) relative to the start of the experiment.
Herbicide treatments were applied at the level of whole plots, with both treatments
represented within each of four blocks. Both levels of variety and profile were each
represented within each plot, with six replicates of each treatment added to each plot.
Usage
masslost
Format
A data frame with 246 rows and 8 variables:
plot (integer): A total of eight plots.
block (integer): A total of four blocks.
variety (integer): Soybean variety is genetically modified ('gm') or not ('nongm'); manipulated at the subplot level.
herbicide (integer): Herbicide applied is glyphosate ('gly') or conventional program ('conv'); manipulated at plot level.
profile (integer): Whether litter was 'buried' in the soil or placed at the soil 'surface'; manipulated at the subplot level.
date (integer): Date at which litter bags were recovered.
sample (integer): Factor representing timing of sampling ('incrop1', 'incrop2', 'postharvest').
masslost (double): The proportion of the initial mass that was lost from each litter bag during field incubation. Some values are lower than zero due to insufficient washing of dirt and biota from litter prior to weighing.
Source
Jeff Powell
Memory of words dataset
Description
A dataset on the number of words remembered from a list, for various learning techniques, and in two age groups.
Usage
memory
Format
A data frame with 100 rows and 3 variables:
Age (integer): Age of person tested (yr)
Process (factor): One of five methods used to memorize the words.
Words (double): Number of words recalled.
Details
Description taken from source: "Why do older people often seem not to remember things as well as younger people? Do they not pay attention? Do they just not process the material as thoroughly? One theory regarding memory is that verbal material is remembered as a function of the degree to which is was processed when it was initially presented. Eysenck (1974) randomly assigned 50 younger subjects and 50 older (between 55 and 65 years old) to one of five learning groups. The Counting group was asked to read through a list of words and count the number of letters in each word. This involved the lowest level of processing. The Rhyming group was asked to read each word and think of a word that rhymed with it. The Adjective group was asked to give an adjective that could reasonably be used to modify each word in the list. The Imagery group was instructed to form vivid images of each word, and this was assumed to require the deepest level of processing. None of these four groups was told they would later be asked to recall the items. Finally, the Intentional group was asked to memorize the words for later recall. After the subjects had gone through the list of 27 items three times they were asked to write down all the words they could remember."
Source
<http://www.statsci.org/data/general/eysenck.html>.
Crude oil production
Description
Crude oil production for the top 8 oil-producing countries (minus Russia, for which understandably no data were available pre-1990), for the period 1971-2017.
Usage
oil
Format
A data frame with 376 rows and 3 variables:
country (factor): Country code
year (integer): 1971 - 2017
production (double): Annual crude oil production in TOE.
Pulse Rates before and after Exercise
Description
Pulse rates measured on 110 participating students. Half of the students ran in place for one minute, before their pulse rate was measured again.
Usage
pulse
Format
A data frame with 110 rows and 11 variables:
Height (integer): Height (cm)
Weight (double): Weight (kg)
Age (integer): Age (years)
Gender (integer): Sex (1 = male, 2 = female)
Smokes (integer): Regular smoker? (1 = yes, 2 = no)
Alcohol (integer): Regular drinker? (1 = yes, 2 = no)
Exercise (integer): Frequency of exercise (1 = high, 2 = moderate, 3 = low)
Ran (integer): Whether the student ran or sat between the first and second pulse measurements (1 = ran, 2 = sat)
Pulse1 (integer): First pulse measurement (rate per minute)
Pulse2 (integer): Second pulse measurement (rate per minute)
Year (integer): Year of class (93 - 98)
Details
Description taken from source: "Students in an introductory statistics class (MS212 taught by Professor John Eccleston and Dr Richard Wilson at The University of Queensland) participated in a simple experiment. The students took their own pulse rate. They were then asked to flip a coin. If the coin came up heads, they were to run in place for one minute. Otherwise they sat for one minute. Then everyone took their pulse again. The pulse rates and other physiological and lifestyle data are given in the data. Five class groups between 1993 and 1998 participated in the experiment. The lecturer, Richard Wilson, was concerned that some students would choose the less strenuous option of sitting rather than running even if their coin came up heads, so in the years 1995-1998 a different method of random assignment was used. In these years, data forms were handed out to the class before the experiment. The forms were pre-assigned to either running or non-running and there were an equal number of each. In 1995 and 1998 not all of the forms were returned so the numbers running and sitting was still not entirely controlled."
Source
<http://www.statsci.org/data/oz/ms212.html>
Examples
data(pulse)
with(pulse, plot(Weight, Pulse2-Pulse1,
pch=19, col=c("red2", "dimgrey")[Ran]))
abline(h=0, lty=5)
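Several columns in pulse are integer codes, which are easier to read in tables and plot legends once converted to labelled factors. The sketch below applies the codings listed under Format to a toy data frame. The rows are made up, and the same calls should carry over to the real columns.

```r
# Toy rows using the documented integer codings (values are invented)
toy <- data.frame(Gender = c(1, 2, 2), Ran = c(1, 2, 1), Exercise = c(3, 1, 2))

# Attach the documented labels to each code
toy$Gender   <- factor(toy$Gender, levels = 1:2, labels = c("male", "female"))
toy$Ran      <- factor(toy$Ran, levels = 1:2, labels = c("ran", "sat"))
toy$Exercise <- factor(toy$Exercise, levels = 1:3,
                       labels = c("high", "moderate", "low"))

table(toy$Gender, toy$Ran)
```

Setting levels explicitly keeps the label order fixed even if a code happens to be absent from the data.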
Pupae data
Description
This dataset is from an experiment where larvae were left to feed on Eucalyptus leaves, in a glasshouse that was controlled at two different levels of temperature and CO$_2$ concentration. After the larvae pupated (that is, turned into pupae), the body weight was measured, as well as the cumulative 'frass' (larvae excrement) over the entire time it took to pupate.
Usage
pupae
Format
A data frame with 84 rows and 5 variables:
T_treatment (integer): Temperature treatments ('ambient' and 'elevated')
CO2_treatment (integer): CO$_2$ treatment (280 or 400 ppm).
Gender (integer): The gender of the pupae: 0 (male), 1 (female)
PupalWeight (double): Weight of the pupae (g)
Frass (double): Frass produced (g)
Source
Data courtesy of Tara Murray, and simplified for the purpose of this package.
Rain data
Description
This dataset contains ten years (1995-2006) of daily rainfall amounts as measured at the Richmond RAAF base.
Usage
rain
Format
A data frame with 3653 rows and 3 variables:
Year (integer): Year
DOY (integer): Day of year (1-366)
Rain (double): Daily rainfall amount (mm)
Source
<http://www.bom.gov.au/climate/data/>, simplified and adjusted for this package.
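The data store time as a Year and DOY pair, so building a proper Date column is a common first step. The sketch below shows one way to do it on a toy data frame with made-up rainfall values: day 1 of a year is 1 January, so the date is 1 January plus DOY - 1 days.

```r
# Toy rows in the documented Year / DOY layout (rainfall values are invented)
toy <- data.frame(Year = c(1996, 1996, 1997),
                  DOY  = c(1, 366, 59),
                  Rain = c(0, 1.2, 0))

# 1 January of each year, shifted by (DOY - 1) days
toy$Date <- as.Date(paste0(toy$Year, "-01-01")) + (toy$DOY - 1)
toy
```

Leap years are handled automatically, since the offset is added to a real calendar date.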
Sydney to Hobart winning times
Description
Winning times for the Sydney to Hobart Yacht Race, an annual sailing yacht race over 1170 km from Sydney Harbour to Hobart in Tasmania. The race is infamous for the rough conditions, long distance, and large number of dropouts in some years. The data include the winning time, the number of starting yachts, and the number of yachts reaching the finish.
Usage
sydney_hobart_times
Format
A data frame with 72 rows and 5 variables:
Year (integer): Year race was held
Time (double): Total time (days)
fleet_start (integer): Number of yachts at start
fleet_finish (integer): Number of yachts at finish
Time_record (double): Record race time up to this year
Source
<https://en.wikipedia.org/wiki/Sydney_to_Hobart_Yacht_Race>
Examples
data(sydney_hobart_times)
with(sydney_hobart_times, {
plot(Year, Time)
lines(Year, Time_record, type='s', col="red")
})
Passengers on the Titanic
Description
Survival status of passengers on the Titanic,
together with their names, age, sex and passenger class. Not to be confused with
the dataset Titanic, provided with R, which lists only tables of passengers.
This dataset on the other hand provides one row per passenger.
Usage
titanic
Format
A data frame with 1313 rows and 5 variables:
Name (integer): Recorded name of passenger
PClass (integer): Passenger class: 1st, 2nd or 3rd
Age (double): Age in years (many missing)
Sex (integer): male or female
Survived (integer): 1 = Yes, 0 = No
Source
<http://www.statsci.org/data/general/titanic.html>
Tree canopy gradients in the Priest River Experimental Forest (PREF)
Description
Leaves of two pine species (35 trees in total) were sampled throughout their canopy; usually 8 samples were taken at various heights. The height is expressed as the 'distance from top', i.e. the distance to the apex of the tree. Leaves (conifer needles) were analysed for nitrogen content (narea), and an index of leaf thickness, the 'leaf mass per area' (LMA). The data show the usual pattern of higher leaf thickness (higher LMA) toward the top of the trees, but individual trees show a lot of variation in LMA.
Usage
treecanopy
Format
A data frame with 249 rows and 7 variables:
ID (integer): ID of the individual tree
species (integer): Pinus ponderosa or Pinus monticola
dfromtop (double): Distance from top of tree (where leaf sample was taken) (m)
totheight (double): Total height of the tree (m)
height (double): Height from the ground (where sample was taken) (m)
LMA (double): Leaf mass per area (g m$^{-2}$)
narea (double): Nitrogen per area (gN m$^{-2}$)
Source
Marshall, J.D., Monserud, R.A. 2003. Foliage height influences specific leaf area of three conifer species. Can J For Res 33:164-170
Examples
data(treecanopy)
if(require(ggplot2)){
ggplot(treecanopy, aes(dfromtop,LMA,group=ID,col=species)) +
geom_point() +
stat_smooth(method="lm",se=FALSE) +
theme_minimal()
}
Xylem vessel diameters
Description
Measurements of diameters of xylem (wood) vessels on a single Eucalyptus saligna tree grown at the Hawkesbury Forest Experiment.
Usage
vessel
Format
A data frame with 550 rows and 3 variables:
position (integer): Either 'base' or 'apex': the tree was sampled at stem base and near the top of the tree.
imagenr (integer): At the stem base, six images were analyzed (and all vessels measured in that image). At apex, three images.
vesseldiam (double): Diameter of individual water-conducting vessels ($\mu$m).
Source
Sebastian Pfautsch
Weight loss data
Description
This dataset contains weight measurements of Jeremy Zawodny over a period of about 3 months while he was trying to lose weight. This is an example of an irregular timeseries dataset (intervals between measurements vary).
Usage
weightloss
Format
A data frame with 67 rows and 2 variables:
Date (factor): Date, d/m/yy
Weight (double): Weight, in pounds
Source
<http://jeremy.zawodny.com/blog/archives/006851.html>
Mouse metabolism
Description
Wild mice were placed in a device where the metabolic rate (energy used by the animal) can be measured directly and continuously. Measurements were made at varying temperature (15, 20 and 31C); mice were provided with food or not, and were able to exercise (with a treadmill) or not.
Usage
wildmousemetabolism
Format
A data frame with 864 rows and 9 variables:
id (integer): Individual number
run (integer): The experiment was repeated three times (run = 1,2,3)
day (integer): Day of experiment (1-6)
temp (integer): Temperature (deg C)
food (integer): Whether food was provided ('Yes') or not ('No')
bm (double): Body mass (g)
wheel (integer): Whether the mouse could use an exercise wheel ('Yes') or not ('No')
rmr (double): Resting metabolic rate (minimum rate of a running average over 12 min) (kC hour$^{-1}$)
sex (integer): Male or Female
Source
Christopher Turbill