[R] Subtracting Data Frame With a Different Number of Rows
William Michels
wjm1 @end|ng |rom c@@@co|umb|@@edu
Wed Apr 22 00:29:17 CEST 2020
Hi Phillip,
You have two choices here:

1. Manually add the missing rows to your individual.df using rbind(),
   then cbind() the overall.df and individual.df data frames together
   (assuming the rows line up properly), or
2. Use merge() to perform an SQL-like "left join", then copy values
   from the "overall" columns to fill in the missing values in the
   "indiv" columns (imputation).

Below is code, starting from .tsv files, showing the second (merge)
method. Note: I've only included the first 4 rows of output after the
merge command (there are 24 rows total):
> overall <- read.delim("overall.R", sep="\t")
> indiv <- read.delim("individual.R", sep="\t")
> merge(overall, indiv, all.x=TRUE, by.x=c("RunnerCode", "Outs"), by.y=c("RunnerCode", "Outs"))
   RunnerCode Outs X.x MeanRuns.x X.y MeanRuns.y
1  BasesEmpty    0   1  0.5137615   1  0.4262295
2  BasesEmpty    1   9  0.3963801   8  0.5238095
3  BasesEmpty    2  17  0.4191011  15  0.3469388
4 BasesLoaded    0   8  3.2173913  NA         NA
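
To finish the job after the merge, the imputation and the subtraction
could be sketched roughly as follows. This is only a minimal sketch on
a three-row subset of the posted data, built inline so it runs without
the .tsv files. The column suffixes, the "imputed" flag and the "Diff"
column are names picked here for illustration, not anything from the
original data.

```r
# Three-row subset of the posted tables; the player has no
# "Bases Loaded, 0 outs" row.
overall <- data.frame(
  RunnerCode = c("Bases Empty", "Runner:1st", "Bases Loaded"),
  Outs       = c(0L, 0L, 0L),
  MeanRuns   = c(0.5137615, 0.8967391, 3.2173913)
)
indiv <- data.frame(
  RunnerCode = c("Bases Empty", "Runner:1st"),
  Outs       = c(0L, 0L),
  MeanRuns   = c(0.4262295, 1.3200000)
)

# Left join: keep every row of overall; unmatched player rows become NA.
both <- merge(overall, indiv, by = c("RunnerCode", "Outs"),
              all.x = TRUE, suffixes = c(".overall", ".indiv"))

# Record which rows are about to be imputed (illustrative column name).
both$imputed <- is.na(both$MeanRuns.indiv)

# Copy the team value into the missing player cells.
both$MeanRuns.indiv[both$imputed] <- both$MeanRuns.overall[both$imputed]

# Team minus player, as asked in the original post; imputed rows give 0.
both$Diff <- both$MeanRuns.overall - both$MeanRuns.indiv

print(both)
```

Keeping the "imputed" flag makes it easy to exclude the filled-in rows
later, since their difference is zero by construction and says nothing
about the player.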
HTH, Bill.
W. Michels, Ph.D.
On Tue, Apr 21, 2020 at 1:47 PM Phillip Heinrich <herd_dog using cox.net> wrote:
>
> I have two small data frames of baseball data. The first one is the mean
> number of runs that will score in each half inning for the 2018 Arizona
> Diamondbacks. The second data frame is the same information but for only
> one player. As you will see, there are two situations in which the
> individual player did not come up to bat at any time during the season:
>   - with the bases loaded and no outs
>   - with runners on first and third and one out
>
> Overall
>
>           RunnerCode Outs  MeanRuns
>  1       Bases Empty    0 0.5137615
>  2        Runner:1st    0 0.8967391
>  3        Runner:2nd    0 1.3018868
>  4 Runners:1st & 2nd    0 1.6551724
>  5        Runner:3rd    0 1.9545455
>  6 Runners:1st & 3rd    0 2.0571429
>  7 Runners:2nd & 3rd    0 2.1578947
>  8      Bases Loaded    0 3.2173913
>  9       Bases Empty    1 0.3963801
> 10        Runner:1st    1 0.6952596
> 11        Runner:2nd    1 0.9580838
> 12 Runners:1st & 2nd    1 1.4397163
> 13        Runner:3rd    1 1.5352113
> 14 Runners:1st & 3rd    1 1.5882353
> 15 Runners:2nd & 3rd    1 1.9215686
> 16      Bases Loaded    1 1.9193548
> 17       Bases Empty    2 0.4191011
> 18        Runner:1st    2 0.5531915
> 19        Runner:2nd    2 0.8777293
> 20 Runners:1st & 2nd    2 0.9553073
> 21        Runner:3rd    2 1.2783505
> 22 Runners:1st & 3rd    2 1.5851064
> 23 Runners:2nd & 3rd    2 1.2794118
> 24      Bases Loaded    2 1.388235
>
> Individual Player
>
>           RunnerCode Outs  MeanRuns
>  1       Bases Empty    0 0.4262295
>  2        Runner:1st    0 1.3200000
>  3        Runner:2nd    0 1.2857143
>  4 Runners:1st & 2nd    0 0.5714286
>  5        Runner:3rd    0 2.0000000
>  6 Runners:1st & 3rd    0 3.5000000
>  7 Runners:2nd & 3rd    0 1.0000000
>  8       Bases Empty    1 0.5238095
>  9        Runner:1st    1 0.6578947
> 10        Runner:2nd    1 0.3750000
> 11 Runners:1st & 2nd    1 1.4285714
> 12        Runner:3rd    1 1.4285714
> 13 Runners:2nd & 3rd    1 0.6666667
> 14      Bases Loaded    1 3.0000000
> 15       Bases Empty    2 0.3469388
> 16        Runner:1st    2 0.1363636
> 17        Runner:2nd    2 0.7142857
> 18 Runners:1st & 2nd    2 1.6666667
> 19        Runner:3rd    2 1.2500000
> 20 Runners:1st & 3rd    2 2.1428571
> 21 Runners:2nd & 3rd    2 1.5000000
> 22      Bases Loaded    2 2.2000000
>
> RunnerCode is a factor
> Outs are integers
> MeanRuns is numerical data
>
> I would like to subtract the second from the first as a way to evaluate the
> player's ability to produce runs. As part of this analysis I would like to
> input the mean number of runs from the overall data frame into the two
> missing cells for the individual player: bases loaded with no outs, and
> 1st and 3rd with one out.
>
> Can anyone give me some advice?
>