[R] Probability of exceedance function question

Roger Bivand Roger.Bivand at nhh.no
Mon Oct 9 09:52:11 CEST 2006


On Sun, 8 Oct 2006, Thomas P. Colson wrote:

> I am able to convert the flow accumulation grid into an area (for each
> pixel) grid, then import this into R as an ASCII file. The plot(ecdf)
> function in R seems to plot the opposite: curve starts at probability 0, for
> drainage area 0, should be the other way? 

In your initial posting, you asked for P(A > A*), plot(ecdf()) is giving 
P(A <= A*), so:

plot(t1, pch=".", axes=FALSE)
axis(1) 
box()
axis(2, at=axTicks(2), labels=rev(axTicks(2)))

lets you reverse the vertical axis labels. There is a section on ECDF in 
Jacoby's Statistical graphics for univariate and bivariate data (Sage, 
1997). ECDF are also the hypsometric integral, etc., they just keep 
getting re-invented. See also a helpful reply on the hypsometric interval 
from a couple of years ago:

http://tolstoy.newcastle.edu.au/R/help/03b/2746.html

> 
> About 150,000 data points in these sets, ecdf curve plots in about 15
> seconds. 

But are there 150K drainage basins? Mocking this up in GRASS with 
spearfish:

r.mapcalc 'MASK=!isnull(elevation.dem)'
r.watershed elev=elevation.dem threshold=100 stream=stream.map1 basin=basin.map1

and reading the output into R, I have a SpatialGridDataFrame with 302418 
raster cells. To get basin area:

basins <- split(basin.map$basin.map1, basin.map$basin.map1)
cell_count <- sapply(basins, length)

(just 1422 basins) and multiply by 30*30 (resolution) to get area. I think
your Arc ASCII grid contains basin ID numbers, so you'll need to split on
the IDs first, then count the lengths of the cells in each basin and
multiply by cell area (20ft by 20ft).

By the way, you can read an Arc ASCII grid into a SpatialGridDataFrame 
with function readAsciiGrid() in package maptools, so for you:

basin.map <- readAsciiGrid("c:/temp/area.asc", colname="basinID")
basins <- split(basin.map$basinID, basin.map$basinID)
length(basins) # sanity check
cell_count <- sapply(basins, length)
basin_area <- cell_count * 20 * 20
t1 <- ecdf(basin_area)
plot(t1, pch=".", axes=FALSE) # and set xlab=, ylab=, main=
axis(1)
box()
grid()
axis(2, at=axTicks(2), labels=rev(axTicks(2)))

probably gets you there. There is a list (R-sig-geo) for these kinds of 
questions that you might consider using, there are a fair number of 
people using R with GIS.

Roger

> 
> 
> Could the problem be, how I'm importing the data from ascii grid? Cellsize
> is 20 ft and z is the drainage area, for each cell (flow weighted)
> 
>  area <- read.table(file = "c:/temp/area.asc", sep = " ", na.strings =
> "-9999", skip = 6) 
> area <- area[,-ncol(area)] 
> xLLcorner <- 1985649.0700408898
> yLLcorner <- 841301.04004059616
> cellsize <-20
> xURcorner <- xLLcorner + (cellsize * (ncol(area) - 1)) 
> xLRcorner <- xURcorner 
> xULcorner <- xLLcorner
> yULcorner <- yLLcorner + (cellsize * (nrow(area) - 1)) 
> yURcorner <- yULcorner 
> yLRcorner <- yLLcorner 
> coordsa <- expand.grid(y = seq(yULcorner, yLRcorner, by = -20),x =
> seq(xULcorner, xLRcorner, by = 20))
> area<- data.frame(coordsa, tmin = as.vector(c(area,recursive = T))) 
> names(area)<-c("x","y","z")
> Plot(ecdf(area$z))
> 
> 
> -----Original Message-----
> From: Roger Bivand [mailto:Roger.Bivand at nhh.no] 
> Sent: Sunday, October 08, 2006 4:37 PM
> To: Thomas P. Colson
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Probability of exceedance function question
> 
> On Sun, 8 Oct 2006, Thomas P. Colson wrote:
> 
> > I'm trying to calculate a cumulative area distribution (graph) of 
> > drainage areas. This is defined as P(A > A*). Simple in principle. I 
> > can do this in excel, with "COUNTIF", which will count the number of 
> > cells in the row "area" that have area A, then determine, for each 
> > cell in the row "area, how many cells exceede that area, then dividing 
> > that number by the total number of cells, which gives me the 
> > probability that drainage area A exceeds drainage area A*.
> 
> Is this ecdf() of the vector or its suitable subset? If so, it runs very
> fast even for large data sets. For plotting, bear in mind that you are
> generating a lot of output, though:
> 
> > t0 <- runif(100000)
> > system.time(t1 <- ecdf(t0))
> [1] 0.222 0.022 0.248 0.000 0.000
> > system.time(plot(t1, pch="."))
> [1] 1.089 0.079 1.186 0.000 0.000
> 
> isn't at all bad!
> 
> > 
> > E.g, drainage area of 6 sq meters (One DEM grid cell) has a high 
> > probability of exceedance(.99), while a drainage area of 100,000 
> > square meters has a low probability of exceedance (.001).
> > 
> > I wish to plot this relationship, and we all know that excel is not 
> > the tool of choice when working with hundreds of thousands of records. 
> > I'd like to port the CAD into a few R functions that I've already 
> > developed for other tests as well.
> > 
> > So my challenge, in R, is to
> > (1)count the number of rows in column "Area" that have AREA(*),
> > 
> > (2) determine, by row, how many rows have an area greater than the 
> > area given in that one row
> > 
> > (3) divide step 2 by number of rows (how can I do a row count and port 
> > that to a variable, as I have to do this on 10 datasets?)
> > 
> > Thanks for any advice you can offer to this endevour
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> > 
> 
> --
> Roger Bivand
> Economic Geography Section, Department of Economics, Norwegian School of
> Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: Roger.Bivand at nhh.no
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



More information about the R-help mailing list