[R] possible interesting R projects for undergrads

Greg Snow Greg.Snow at imail.org
Wed Sep 24 21:43:13 CEST 2008


Here are some of the ideas I have used in the past teaching a class like this:

Give them a paragraph of text describing data values and have them create a data frame from the data (using prose means there is no obvious table structure to start from).  Something like:

Patient number 1 (male) had blood pressure of 120/80 before the treatment and 110/70 after the treatment, patient number 2 lowered her systolic value from 130 to 120 with the treatment but her diastolic value stayed at 80, ...

I only used about 6 rows of final data, but this forces the students to think about how they want to structure the data (should systolic before and after be separate columns? Or 1 column with another column indicating before/after?)
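
A minimal sketch of the two layouts a student might arrive at, using only the two patients described above. The column names are made up for illustration, and patient 2's sex is inferred from the word "her".

```r
# Wide layout: one row per patient, before/after in separate columns.
# Column names are illustrative, not prescribed by the exercise.
bp_wide <- data.frame(
  patient    = c(1, 2),
  sex        = factor(c("M", "F")),
  sys_before = c(120, 130),
  dia_before = c(80, 80),
  sys_after  = c(110, 120),
  dia_after  = c(70, 80)
)

# Long layout: one row per patient and time point,
# with a column indicating before/after.
bp_long <- data.frame(
  patient   = rep(bp_wide$patient, times = 2),
  sex       = rep(bp_wide$sex, times = 2),
  time      = factor(rep(c("before", "after"), each = nrow(bp_wide)),
                     levels = c("before", "after")),
  systolic  = c(bp_wide$sys_before, bp_wide$sys_after),
  diastolic = c(bp_wide$dia_before, bp_wide$dia_after)
)

bp_wide
bp_long
```

The wide form is convenient for paired differences (sys_after - sys_before); the long form is usually easier for plotting and modelling by time.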

Now have the students do some basic analyses on a sample dataset: t-tests, summaries, basic regression, diagnostic plots.

Have them compute regression coefficients the hard way (doing the matrix multiplications and/or minimizing the sum of squared residuals using optim); this may help them appreciate the lm function.
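
One possible shape for this exercise, as a sketch on simulated data (the true intercept 2, slope 3, and sample size are arbitrary choices for illustration): solve the normal equations directly, minimize the residual sum of squares with optim, and compare both to lm.

```r
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 3 * x + rnorm(50)   # simulated data, coefficients chosen arbitrarily

# Matrix approach: solve (X'X) b = X'y for b
X <- cbind(1, x)
beta_mat <- solve(crossprod(X), crossprod(X, y))

# Optimization approach: minimize the sum of squared residuals
rss <- function(b) sum((y - X %*% b)^2)
beta_opt <- optim(c(0, 0), rss, method = "BFGS")$par

# Reference answer from lm
beta_lm <- coef(lm(y ~ x))

rbind(matrix = as.vector(beta_mat), optim = beta_opt, lm = beta_lm)
```

The optim result agrees with lm only up to the optimizer's tolerance, which is itself a useful point for discussion.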

Generate a population of random data and compute the mean and standard deviation, then take 100 or 1,000 samples from this population and compute the mean of each sample.  Compare the mean and standard deviation of the means to the mean and standard deviation of the population, create a histogram of the means and show summaries of the means and the population as reference lines on the plot (this helps cement the central limit theorem).
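
A sketch of this simulation, assuming an exponential population and samples of size 30 (both arbitrary choices; any skewed population makes the point):

```r
set.seed(42)
population <- rexp(100000, rate = 1)   # skewed population, chosen for illustration
n <- 30                                # sample size, also an arbitrary choice

sample_means <- replicate(1000, mean(sample(population, n)))

pop_mean <- mean(population)
pop_sd   <- sd(population)

# The mean of the means should be close to the population mean, and their
# standard deviation close to pop_sd / sqrt(n)
c(population = pop_mean, means = mean(sample_means))
c(expected_se = pop_sd / sqrt(n), sd_of_means = sd(sample_means))

hist(sample_means, breaks = 30, main = "Means of 1,000 samples of size 30")
abline(v = pop_mean, col = "red", lwd = 2)             # population mean
abline(v = mean(sample_means), col = "blue", lty = 2)  # mean of sample means
```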

Write a function to do the classic number guessing game where the function will choose an integer between 1 and 100 then prompt the user for a guess, then tell them if their guess is too high, too low, or correct (not interesting statistically, but gives some good basic use of programming logic).
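
A sketch of one way to write the game. The ask argument is an addition for illustration (not part of the exercise as stated): it defaults to readline for interactive play, but lets the function be driven by a script.

```r
guess_number <- function(low = 1, high = 100, ask = NULL) {
  # ask is a hypothetical hook: any function taking a prompt and returning text
  if (is.null(ask)) ask <- function(prompt) readline(prompt)
  target <- sample(low:high, 1)
  tries <- 0
  repeat {
    answer <- ask(sprintf("Guess a number from %d to %d: ", low, high))
    guess <- suppressWarnings(as.integer(answer))
    if (is.na(guess)) {
      cat("Please enter a whole number.\n")
      next
    }
    tries <- tries + 1
    if (guess < target) {
      cat("Too low.\n")
    } else if (guess > target) {
      cat("Too high.\n")
    } else {
      cat("Correct, in", tries, "tries.\n")
      return(invisible(tries))
    }
  }
}

# Interactive use: guess_number()
```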

Write a function that will compute the arithmetic, geometric, harmonic, and self weighting means.  The function needs the same optional arguments as mean.  Optionally have it plot a histogram of the data with reference lines at each of the means.
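
A partial sketch. Two assumptions to flag: the self-weighting mean is taken here to be sum(x^2)/sum(x) (each value weighted by itself), which may not be the definition intended; and only the na.rm argument of mean is supported, so trim is left for the student.

```r
all_means <- function(x, na.rm = FALSE, plot = FALSE) {
  stopifnot(is.numeric(x))
  if (na.rm) x <- x[!is.na(x)]
  if (any(x <= 0, na.rm = TRUE)) {
    stop("geometric and harmonic means need strictly positive values")
  }
  m <- c(
    arithmetic    = mean(x),
    geometric     = exp(mean(log(x))),
    harmonic      = 1 / mean(1 / x),
    self_weighted = sum(x^2) / sum(x)   # assumed definition, see note above
  )
  if (plot) {
    hist(x, main = "Data with reference lines at each mean")
    abline(v = m, col = 1:4, lty = 1:4)
    legend("topright", legend = names(m), col = 1:4, lty = 1:4)
  }
  m
}

all_means(c(1, 2, 4))
```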

Use regexpr and related functions to extract information from date(), or from the rownames of a dataset (I often get data with id values like M1, M2, F1, F2, ... and no column of sex info, so need to extract that from the id).
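
A short sketch of both tasks. The id vector is made up, and a fixed string in the format that date() returns is used so the result is reproducible.

```r
ids <- c("M1", "M2", "F1", "F2", "F10")   # made-up ids for illustration

# Leading letter gives the sex, the remaining digits give the number
sex <- factor(sub("^([MF]).*$", "\\1", ids), levels = c("M", "F"))
num <- as.integer(sub("^[MF]", "", ids))

# The same digits via regexpr and regmatches
digits <- regmatches(ids, regexpr("[0-9]+", ids))

data.frame(id = ids, sex = sex, num = num)

# A string in the format returned by date()
d <- "Wed Sep 24 21:43:13 2008"
year <- as.integer(sub(".* ([0-9]{4})$", "\\1", d))
time <- regmatches(d, regexpr("[0-9]{2}:[0-9]{2}:[0-9]{2}", d))
```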

Generate data from the distribution f(x) = x/2 for 0<x<2.  Generate bivariate data from the joint distribution f(x,y) = 2x+2y-4xy, 0<x<1, 0<y<1.  Plot the data to see if it looks like it comes from the theoretical distribution.
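
One possible approach, sketched under the assumption that inverse-CDF sampling is used for the first density and rejection sampling for the second; other methods would do equally well. For f(x) = x/2 the CDF is F(x) = x^2/4, so x = 2*sqrt(u) with u uniform on (0,1). The joint density is bilinear, so its maximum on the unit square is at a corner and equals 2, which serves as the rejection bound.

```r
set.seed(7)
n <- 10000

# Inverse CDF: F(x) = x^2 / 4 on (0, 2), so x = 2 * sqrt(u)
x1 <- 2 * sqrt(runif(n))
hist(x1, freq = FALSE, breaks = 40, main = "f(x) = x/2")
curve(x / 2, from = 0, to = 2, add = TRUE, col = "red")

# Rejection sampling from uniform candidates on the unit square;
# the density is bounded above by 2
f_xy <- function(x, y) 2 * x + 2 * y - 4 * x * y
n_cand <- 4 * n
cand_x <- runif(n_cand)
cand_y <- runif(n_cand)
keep <- 2 * runif(n_cand) <= f_xy(cand_x, cand_y)
xy <- cbind(x = cand_x[keep], y = cand_y[keep])

# Points should concentrate near (0,1) and (1,0) and thin out near (0,0) and (1,1)
plot(xy, pch = ".", main = "f(x,y) = 2x + 2y - 4xy")
```

Both marginals of the joint density are uniform on (0,1) and the theoretical correlation is -1/3, which gives students a numeric check in addition to the plot.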

Various simulations:
Recreate the t-table (generate samples of normals, compute t, find the quantile at which 5% of tests would be more extreme).
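
A sketch of the t-table simulation, assuming one-sample t statistics and a two-sided 5% level; the degrees of freedom and the number of replications are arbitrary choices.

```r
set.seed(2008)

# Simulated two-sided critical value for samples of size n (df = n - 1)
sim_t_crit <- function(n, reps = 20000, alpha = 0.05) {
  tstat <- replicate(reps, {
    x <- rnorm(n)
    mean(x) / (sd(x) / sqrt(n))
  })
  quantile(abs(tstat), 1 - alpha, names = FALSE)
}

dfs <- c(2, 5, 10, 30)
t_table <- data.frame(
  df        = dfs,
  simulated = sapply(dfs + 1, sim_t_crit),
  exact     = qt(0.975, dfs)
)
t_table
```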

Generate data for a 2-sample t test, but decide whether to pool the variances based on a test of the variances.  Simulate under various conditions to see if you get a different error rate than you should.
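
A sketch of the pretest simulation, assuming var.test (the F test) is the test of variances and a 5% level throughout; the sample sizes and standard deviations are illustrative.

```r
set.seed(99)

# One simulated data set under the null of equal means:
# pool the variances only if the F test does not reject equality
pretest_p <- function(n1, n2, sd1, sd2) {
  x <- rnorm(n1, sd = sd1)
  y <- rnorm(n2, sd = sd2)
  pooled <- var.test(x, y)$p.value > 0.05
  t.test(x, y, var.equal = pooled)$p.value
}

# Estimated type I error rate of the two-stage procedure
type1 <- function(n1, n2, sd1, sd2, reps = 2000) {
  mean(replicate(reps, pretest_p(n1, n2, sd1, sd2)) < 0.05)
}

rate_equal   <- type1(20, 20, 1, 1)   # equal sizes, equal variances
rate_unequal <- type1(10, 40, 3, 1)   # small group has the larger variance
c(equal = rate_equal, unequal = rate_unequal)
```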

Do simulations to calculate power for different scenarios.
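
A sketch of a power simulation for one scenario, assuming a two-sample pooled t test with equal group sizes, so that the answer can be checked against power.t.test. The effect size and sample size are arbitrary.

```r
set.seed(123)

# Proportion of simulated experiments in which the test rejects
sim_power <- function(n, delta, sd = 1, alpha = 0.05, reps = 2000) {
  mean(replicate(reps, {
    x <- rnorm(n, mean = 0, sd = sd)
    y <- rnorm(n, mean = delta, sd = sd)
    t.test(x, y, var.equal = TRUE)$p.value < alpha
  }))
}

simulated   <- sim_power(n = 20, delta = 1)
theoretical <- power.t.test(n = 20, delta = 1, sd = 1)$power
c(simulated = simulated, theoretical = theoretical)
```

Once the function matches theory for a simple case, students can swap in scenarios with no closed form (skewed data, unequal variances, a different test).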


As part of the final I would usually have them write a function to do Hotelling's multivariate T-test.
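
A minimal sketch of the one-sample version, which is an assumption since the message does not say whether the one- or two-sample test was assigned. It uses T2 = n (xbar - mu0)' S^-1 (xbar - mu0) and the fact that (n - p) / (p (n - 1)) * T2 follows an F distribution with p and n - p degrees of freedom under the null. The function name is made up.

```r
hotelling_one <- function(X, mu0 = rep(0, ncol(X))) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  d <- colMeans(X) - mu0
  # T2 = n * d' S^-1 d, using solve(S, d) rather than an explicit inverse
  T2 <- n * drop(t(d) %*% solve(cov(X), d))
  Fstat <- (n - p) / (p * (n - 1)) * T2
  list(T2 = T2, F = Fstat, df = c(p, n - p),
       p.value = pf(Fstat, p, n - p, lower.tail = FALSE))
}

# Example on the built-in iris data; the hypothesized means are arbitrary
hotelling_one(iris[1:50, 1:4], mu0 = c(5, 3.4, 1.5, 0.2))
```

With a single column the statistic reduces to the square of the ordinary one-sample t statistic, which makes a convenient check.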

Hope this helps,





--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Erin Hodgess
> Sent: Wednesday, September 24, 2008 11:39 AM
> To: r-help at r-project.org
> Subject: [R] possible interesting R projects for undergrads
>
> Dear R People:
>
> I finally (Yay!) got R installed in a classroom!
>
> Anyhow, I have a respectful request, please:  could anyone recommend
> some nice undergrad projects in R, please?
>
> This is in a statistical computation class; first time being run.
>
> Thanks,
> Erin
>
>
> --
> Erin Hodgess
> Associate Professor
> Department of Computer and Mathematical Sciences
> University of Houston - Downtown
> mailto: erinm.hodgess at gmail.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


