[R] How to run prop.test on 3-level factors?

Luigi Marongiu m@rong|u@|u|g| @end|ng |rom gm@||@com
Tue Nov 16 09:32:24 CET 2021


Hello,
I have a large database with a column containing a factor:
```
> str(df)
'data.frame': 5000000 obs. of  4 variables:
$ MR   : num  0.000809 0.001236 0.001663 0.002089 0.002516 ...
$ FCN  : num  2 2 2 2 2 2 2 2 2 2 ...
$ Class: Factor w/ 3 levels "negative","positive",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Set  : int  1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "out.attrs")=List of 2
..$ dim     : Named int [1:2] 1000 1000
.. ..- attr(*, "names")= chr [1:2] "X1" "X2"
..$ dimnames:List of 2
.. ..$ X1: chr [1:1000] "X1=0.0008094667" "X1=0.0012360955" "X1=0.0016627243" "X1=0.0020893531" ...
.. ..$ X2: chr [1:1000] "X2= 2.000000" "X2= 2.048048" "X2= 2.096096" "X2= 2.144144" ...
```
I would like to run prop.test on df$Class, but the call fails:
```
> prop.test(x=point$Class, n=length(point$Class),
+ conf.level=.95, correct=FALSE)
Error in prop.test(x = point$Class, n = length(point$Class), conf.level = 0.95,  :
  'x' and 'n' must have the same length
```
The documentation says that `x` is "a vector of counts of successes, a
one-dimensional table with two entries, or a two-dimensional table (or
matrix) with 2 columns, giving the counts of successes and failures,
respectively", so I passed point$Class as `x` and the total number of
observations, length(point$Class), as `n`.
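For reference, the quoted definition asks for counts, whereas a factor holds one value per observation, which would explain the length mismatch. A minimal sketch of passing counts instead, assuming the proportion of interest is "most common level versus everything else"; the toy vector `Class` and the names `counts`, `top` and `res` are hypothetical, not taken from the real data:
```r
# Toy stand-in for the real column (hypothetical values, not the actual data)
set.seed(1)
Class <- factor(sample(c("negative", "positive", "uncertain"),
                       size = 200, replace = TRUE,
                       prob = c(0.6, 0.3, 0.1)))

counts <- table(Class)                    # one count per level
top    <- names(counts)[which.max(counts)]  # label of the most common level

# One-sample test: count of the most common level out of all observations
res <- prop.test(x = max(counts), n = sum(counts),
                 conf.level = 0.95, correct = FALSE)

top            # most common level
res$estimate   # observed proportion of that level
res$conf.int   # 95% confidence interval for that proportion
```
Here `x` and `n` are both single numbers, so they have the same length. Collapsing three levels into "top level vs the rest" is an assumption about what the test should measure.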
There are three levels:
```
> unique(df$Class)
[1] negative  positive  uncertain
Levels: negative positive uncertain
```
I tried to remove the levels to check whether they were interfering
with the test:
```
> df$Class = levels(droplevels(df$Class))
Error in `$<-.data.frame`(`*tmp*`, Class, value = c("negative", "positive",  :
replacement has 3 rows, data has 5000000
```
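The error above is consistent with levels() returning only the distinct labels (3 here) rather than one value per row. A small sketch of the difference, assuming the goal was a plain character column; the toy factor `f` is hypothetical:
```r
# Toy factor (hypothetical values, not the actual data)
f <- factor(c("negative", "positive", "negative", "uncertain"))

levels(f)        # the distinct labels only: length 3
as.character(f)  # one string per observation: length 4, same as f

# as.character() keeps the length, so it can replace a data frame column
toy <- data.frame(Class = f)
toy$Class <- as.character(toy$Class)
str(toy)
```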
What would be the correct syntax for this test? The idea is to find the
most common Class value within each unique(df$Set), and to use prop.test
to get the 95% CI for that proportion as well.
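
One possible shape for this, as a sketch only: split Class by Set, count the levels in each group, and call prop.test on the largest count. The toy data frame `toy` and the helper `most_common_ci` are hypothetical names introduced for illustration, and treating the most common level against all other levels is an assumption:
```r
# Toy data with the same two columns (hypothetical values, not the actual data)
set.seed(42)
toy <- data.frame(
  Set   = rep(1:3, each = 100),
  Class = factor(sample(c("negative", "positive", "uncertain"),
                        size = 300, replace = TRUE,
                        prob = c(0.6, 0.3, 0.1)))
)

# Hypothetical helper: most common level of one group plus its 95% CI
most_common_ci <- function(cls, conf.level = 0.95) {
  counts <- table(cls)
  top    <- names(counts)[which.max(counts)]  # ties go to the first level
  res    <- prop.test(x = max(counts), n = sum(counts),
                      conf.level = conf.level, correct = FALSE)
  data.frame(most_common = top,
             n           = sum(counts),
             proportion  = unname(res$estimate),
             lower       = res$conf.int[1],
             upper       = res$conf.int[2])
}

# Apply the helper to each Set and stack the results into one data frame
by_set <- do.call(rbind, lapply(split(toy$Class, toy$Set), most_common_ci))
by_set <- cbind(Set = rownames(by_set), by_set)
by_set
```
With 5 million rows, split() plus lapply() should still be workable, though that has not been checked on data of that size.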
Thanks


