Don MacQueen
macq at llnl.gov
Wed Jul 3 22:21:37 CEST 2002
Here is one idea:
Use try(), as in try(lda(...)). Then, when try() indicates an error,
use a function (which you will have to write) to search through all
the variables in RESULTVARS, find out which one(s) are offending, and
call lda() again omitting those.
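As a rough illustration of the try() step, here is a minimal sketch on made-up data. The names x, grp and varies are hypothetical and not from the original question, and the default tol is used rather than tol=0; the exact error message you get from lda() may differ from the one reported.

```r
library(MASS)  # provides lda()

# Toy data, invented for illustration: two varying columns and one
# constant column (like d[,105] in the question).
set.seed(1)
grp <- factor(rep(c("A", "B", "C"), each = 10))
x <- data.frame(v1 = rnorm(30) + as.integer(grp),
                v2 = rnorm(30),
                v3 = rep(3, 30))

# TRUE when a column has more than one distinct value.
varies <- function(col) length(unique(col)) > 1

# First attempt on all columns; try() captures the error instead of stopping.
fit <- try(lda(x, grp), silent = TRUE)

if (inherits(fit, "try-error")) {
  # Fall back: drop the constant columns and fit again.
  ok  <- which(sapply(x, varies))
  fit <- lda(x[, ok, drop = FALSE], grp)
}
```

Checking inherits(fit, "try-error") is the usual way to tell whether the call inside try() failed.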
Here's an untested sketch of the search function (some debugging
probably needed):
findgood <- function(df, cols = seq_len(ncol(df))) {
  ok <- numeric()
  # keep column j only if it has more than one distinct value
  for (j in cols) if (sum(!duplicated(df[, j])) > 1) ok <- c(ok, j)
  ok
}
It is supposed to return a vector of the column numbers that are ok
to use, i.e. columns with more than one distinct value. Use it like this:
ok <- findgood(RESULTVARS,1:750)
then
lda(RESULTVARS[,ok] , GROUPVAR)
sapply() instead of for() might be faster, though perhaps not easier
to understand.
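For what it is worth, a vectorised variant might look like the sketch below. It uses vapply(), a stricter relative of sapply(); the name findgood2 and the toy data frame are made up for illustration, and it is only lightly tested.

```r
# Same idea as findgood(), without the explicit loop.
findgood2 <- function(df, cols = seq_len(ncol(df))) {
  # One TRUE/FALSE per column: does it have more than one distinct value?
  keep <- vapply(cols,
                 function(j) length(unique(df[, j])) > 1,
                 logical(1))
  cols[keep]
}

# Toy check: column b is constant and should be dropped.
toy <- data.frame(a = c(1, 2, 3), b = c(3, 3, 3), c = c("x", "y", "x"))
good <- findgood2(toy)
```

Here good should hold the column numbers 1 and 3. With 750 columns the speed difference from the for() version is likely to be small either way.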
-Don
At 7:12 PM +0100 7/3/02, Rishabh Gupta wrote:
>Hi all,
> I am using the lda function from the MASS library to measure the
>discriminance of different variables with respect to different
>grouping variables by using
>
> lda( RESULTVARS[, 1:750] , GROUPVAR , tol=0 ) where
>RESULTVARS contains some 750 different variables.
>
>Occasionally there is a variable within RESULTVARS that has the same
>value for all values of GROUPVAR, i.e. no variance, so I get the
>error:
>
>Error in svd(X, nu = 0) : NA/NaN/Inf in foreign function call (arg 1)
>
>As I understand it, this is due to a division by zero in the svd
>function that is used by lda. The nature of my results is such
>that every now and then I will get a case where all the values for a
>RESULTVARS variable are constant. Is there a way of getting
>past this problem? For example, by using the tol=0 parameter I can
>avoid problems when the variables are the same within a
>particular group. As far as I am concerned, a variable that is
>constant across all groups simply has zero discriminance.
>Example values of the grouping variable are:
>> d$subject
> [1] E D C B A H G K F I J E D C B A H G K F I J E D C B A H G K
>F I J E D C B A G H K F I J E D C B A H G K F I J E D C B A H G
>K F I J E D C B A H G K F I J
> Levels: A B C D E F G H I J K
>
>Example values of the results variables with no errors are:
>> d[,104]
> [1] 2.312308 2.957263 2.979431 2.764650 2.877694 3.078302 3.112324
>2.906696 3.045316 1.995411 2.488661 2.976581 2.917944 3.089677
>2.850058 2.758467
>[17] 2.898870 2.966295 3.123338 3.130935 2.729223 2.831621 2.222380
>2.461088 2.539655 2.267584 2.599100 2.575934 2.858999 2.311193
>2.515690 2.490992
>[33] 2.230635 2.846939 3.091381 3.072407 3.097286 2.878738 3.097788
>3.155828 3.250491 3.095101 2.956129 3.157974 3.093765 2.682200
>3.072632 2.931168
>[49] 2.469290 2.909947 2.682943 2.985903 2.738458 2.828025 2.860262
>3.112574 2.890100 2.813462 2.694520 3.058201 2.761940 2.835700
>2.829152 2.834158
>[65] 3.029300 2.870694 3.024452 2.909192 2.926210 2.530717 2.875842
>2.798146 2.576489 2.690214 2.865670 2.499521 2.900491
>
>and finally, example of the results variables WITH ERRORS are:
>> d[,105]
> [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
>3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
>3 3 3 3 3 3 3 3 3 3 3 3 3
>
>Can anyone suggest a workaround for this problem? Your help would be
>greatly appreciated.
>
>Many Thanks For Your Help
>
>Rishabh
>
