[R] NAs introduced by coercion warning?

Jonathan Greenberg greenberg at ucdavis.edu
Wed Feb 18 23:46:20 CET 2004


It's hard for me to pinpoint where this is happening, since I'm working on an
image that's about 10000 x 20000 pixels and 12 bands deep, and I'm using a
set of for-next loops to pull out subsections of data.  I can guarantee the
input values are all floating point values.

To be more specific, I have created a classification tree, and I want to
apply it to that large floating point image (all the band names match up)
and write the prediction (probability) values to a file.  What happens if a
decision tree tries to classify a set of input values that are completely
outside of the range of the input tree?

Here's the code I was using.  I should mention that this worked on a small
subset (400 x 400 pixels) that wouldn't have any "weird" values (negative or
zero).  The output file from this is turning out to be slightly smaller than
it should be given the samples, lines, bands and number type, which is why I'm
wondering if the tree is simply dropping those "bad" values rather than
giving them some value (e.g. 0):

## Creating the tree
library(tree)
bands=12
bandnames<-paste(c("B"),1:bands,sep="")
treetraindata=read.csv("classtrainshad040205.csv",header=TRUE)
names(treetraindata)[2:6]<-bandnames[1:5]
names(treetraindata)[8:14]<-bandnames[6:12]
treetraindata$Class_Name<-as.factor(treetraindata$Class_Name)

## Create an overfit tree
treetrain<-tree(Class_Name ~ B1 + B2 + B3 +
B4+B5+B6+B7+B8+B9+B10+B11+B12,treetraindata,mincut=1,minsize=2,mindev=0)

## Extracts a slice of data out of an ENVI BSQ file
envigetslice<-function(fileconnection,samples,lines,bands,interleave,datatype,maxpixels) {
    currentloc=seek(fileconnection,where=NA,origin="current")
    ## If data is integer
    if(datatype==3) {
        numbersize=2
        ## Shrink the last slice so it does not run past the end of the band
        if ((samples*lines)-(currentloc/numbersize) < maxpixels)
            maxpixels=(samples*lines)-(currentloc/numbersize)
        envislice <- readBin(fileconnection,integer(),maxpixels,size=numbersize)
        newloc=seek(fileconnection,where=NA,origin="current")
        if (bands > 1) {
            ## Read the same pixel range from each remaining band
            for (i in 1:(bands-1)) {
                seek(fileconnection,where=currentloc+(samples*lines*numbersize*i),origin="start")
                currentslice <- readBin(fileconnection,integer(),maxpixels,size=numbersize)
                envislice=data.frame(envislice,currentslice)
            }
        }
    }
    ## If data is floating point
    if(datatype==4) {
        numbersize=4
        if ((samples*lines)-(currentloc/numbersize) < maxpixels)
            maxpixels=(samples*lines)-(currentloc/numbersize)
        envislice <- readBin(fileconnection,double(),maxpixels,size=numbersize)
        newloc=seek(fileconnection,where=NA,origin="current")
        if (bands > 1) {
            for (i in 1:(bands-1)) {
                seek(fileconnection,where=currentloc+(samples*lines*numbersize*i),origin="start")
                currentslice <- readBin(fileconnection,double(),maxpixels,size=numbersize)
                envislice=data.frame(envislice,currentslice)
            }
        }
    }
    seek(fileconnection,where=newloc,origin="start")
    envislice
}

## Read ENVI files in subsets
## interleave: 1=bsq
## datatype: (follows ENVI format):
##    3: long integer
##    4: floating point


## Apply the classifier
imageclasstree<-function(infile,outfile,dectree,samples,lines,bands,interleave,datatype,maxpixels) {

fileconnection<-file(infile,open="rb")
outfileconnection=file(outfile,open="wb")

numpixels = samples * lines
## Number of slices needed to cover one band, including a final partial slice
numslices=ceiling(numpixels/maxpixels)

bandnames<-paste(c("B"),1:bands,sep="")

## Loop for processing images
for(j in 1:numslices) {
    ## Progress in percent
    print((j/numslices)*100)
    
    envislice<-envigetslice(fileconnection,samples,lines,bands,interleave,datatype,maxpixels)
    names(envislice)<-bandnames
    ## Predict class probabilities with the tree passed in as dectree
    predictslice<-predict(dectree,envislice,type=c("vector"))
    ## Scale the probabilities to integers, pixel by pixel across classes
    predictslice<-as.integer(round(as.vector(t(predictslice*10000)),digits=0))
    writeBin(predictslice,outfileconnection,size=2)
}
close(fileconnection)
close(outfileconnection)
}

imageclasstree("flt4aall","flt4adt", treetrain,11216,18173,12,1,4,25000)
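
One way to narrow down where the NAs come from might be to check each slice
just before it is written.  The following is only a sketch: scale_to_int16 is
a hypothetical helper that is not part of the code above, and writing 0 for
values that cannot be represented is an assumption about what the output
should contain, not something the tree package does.

```r
## Hypothetical helper (an assumption, not part of the original code):
## scale probabilities to 2-byte integers and report any value that is
## missing, non-finite, or outside the signed 16-bit range.
scale_to_int16 <- function(x, scale = 10000) {
    scaled <- round(as.vector(x) * scale)
    ## NA, NaN and Inf all fail is.finite(); large values would overflow
    bad <- !is.finite(scaled) | abs(scaled) > 32767
    if (any(bad)) {
        message(sum(bad), " value(s) not representable; writing 0 instead")
    }
    ## Assumption: substitute 0 so every pixel still gets a value
    scaled[bad] <- 0
    as.integer(scaled)
}

## Small example: one valid probability, one NA, one out-of-range value
scale_to_int16(c(0.25, NA, 1e6))   # 2500 0 0
```

In the loop, the as.integer(...) line could call
scale_to_int16(t(predictslice)) instead.  Assuming one probability per class
per pixel, comparing file.info(outfile)$size with samples * lines * (number
of classes) * 2 bytes would then show whether any values are still missing
from the output.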

On 2/18/04 2:25 PM, "Sundar Dorai-Raj" <sundar.dorai-raj at PDF.COM> wrote:

> 
> 
> Jonathan Greenberg wrote:
> 
>> I'm running a decision tree on a large dataset, and I'm getting multiple
>> instances of "NAs introduced by coercion" (> 50).  What does this mean?
>> 
>> --j
>> 
> 
> My guess would be you're trying to convert from character to numeric and
> are unable to do so. As in,
> 
>> as.numeric("A")
> [1] NA
> Warning message:
> NAs introduced by coercion
>> as.numeric("1")
> [1] 1
>> 
> 
> But without more information from you it's impossible to tell.
> 
> See the posting guide at
> 
> http://www.R-project.org/posting-guide.html
> 
> Regards,
> Sundar
> 


-- 
Jonathan Greenberg
Graduate Group in Ecology, U.C. Davis
http://www.cstars.ucdavis.edu/~jongreen
http://www.cstars.ucdavis.edu
AIM: jgrn307 or jgrn3007
MSN: jgrn307 at msn.com or jgrn3007 at msn.com



