[R] expanding a presence only dataset into presence/absence

arun smartpink111 at yahoo.com
Mon Apr 29 20:05:03 CEST 2013




I am sorry.  I forgot to update the code:

dat1<- read.table(text="
Species Site Date
a 1 1
b 1 1
b 1 2
c 1 3
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat1$Present<- 1
dat2<-expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))
 colnames(dat2)<- colnames(dat1)[-4] #changed here
res<-merge(dat1,dat2,by=c("Species","Site","Date"),all=TRUE)
res[is.na(res)]<- 0
 res<-res[order(res$Date),]

row.names(res)<- 1:nrow(res)
res
#  Species Site Date Present
#1       a    1    1       1
#2       b    1    1       1
#3       c    1    1       0
#4       a    1    2       0
#5       b    1    2       1
#6       c    1    2       0
#7       a    1    3       0
#8       b    1    3       0
#9       c    1    3       1
A.K.
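
Note: the code above crosses every Species with every Site and every Date. With several sites, that also creates rows for Site/Date pairs that were never sampled. Below is a minimal sketch of one way to restrict the grid to visits that actually occur in the data. It assumes that an absence should only be recorded for a Site/Date combination that appears at least once in the input. The second site in the example data and the names `visits` and `grid` are made up for illustration and are not from the original post.

```r
# Hypothetical example data: two sites, sampled on different dates
dat1 <- data.frame(
  Species = c("a", "b", "b", "c", "a"),
  Site    = c(1, 1, 1, 1, 2),
  Date    = c(1, 1, 2, 3, 1),
  stringsAsFactors = FALSE
)
dat1$Present <- 1

# Site/Date combinations that were actually sampled (assumption: a visit
# counts as sampled if at least one species was recorded on it)
visits <- unique(dat1[, c("Site", "Date")])

# Cross every sampled visit with every species; merge() with by = NULL
# returns the Cartesian product of the two data frames
grid <- merge(visits,
              data.frame(Species = unique(dat1$Species),
                         stringsAsFactors = FALSE),
              by = NULL)

# Keep every row of the grid; combinations not observed get NA, then 0
res <- merge(grid, dat1, by = c("Species", "Site", "Date"), all.x = TRUE)
res$Present[is.na(res$Present)] <- 0
res <- res[order(res$Site, res$Date, res$Species), ]
row.names(res) <- NULL
res
```

Here Site 2 only gets rows for Date 1, because that is the only date on which it appears in the example input.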


________________________________
From: Matthew Venesky <mvenesky at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Monday, April 29, 2013 1:58 PM
Subject: Re: [R] expanding a presence only dataset into presence/absence



The output that you prepared (for Site 1) looks good... however, I can't get that code to work. I get the following error:

> dat2<-expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))colnames(dat2)<- colnames(dat1)
Error: unexpected symbol in "dat2<-expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))colnames"
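
Note: "unexpected symbol" here most likely means that two statements ended up on the same line when the code was pasted, so R could not tell where the first one stopped. A small sketch with made-up statements (the helper `parse_ok` is hypothetical, written only for this illustration) shows the difference:

```r
# Helper for this illustration only: TRUE if the code string parses
parse_ok <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

# Two statements fused on one line, as in the error above
fused <- "x <- c(1, 2, 3)names(x) <- c('a', 'b', 'c')"
parse_ok(fused)   # FALSE: R stops at the unexpected symbol 'names'

# The same two statements separated by a semicolon (a newline also works)
fixed <- "x <- c(1, 2, 3); names(x) <- c('a', 'b', 'c')"
parse_ok(fixed)   # TRUE
```

Putting each statement on its own line removes the parse error. The column-name assignment then still needs `colnames(dat1)[-4]`, because `dat1` has four columns once `Present` is added, while `dat2` has three.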






--
Matthew D. Venesky, Ph.D.


Postdoctoral Research Associate,
Department of Integrative Biology,
The University of South Florida,
Tampa, FL 33620

Website: http://mvenesky.myweb.usf.edu/


On Mon, Apr 29, 2013 at 1:44 PM, arun <smartpink111 at yahoo.com> wrote:

>Hi Matthew,
>
>So, do you think the output I gave is different from what you expected?
>Thanks,
>Arun
>
>
>
>
>
>
>________________________________
>From: Matthew Venesky <mvenesky at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Sent: Monday, April 29, 2013 1:15 PM
>Subject: Re: [R] expanding a presence only dataset into presence/absence
>
>
>
>
>I see what you are confused about. 
>
>I'm sorry. I gave extra sites as examples in my table called "Desired Data" such that there are 3 sites in the "Desired Data" and only 1 site in the "My current data". Ignore sites 2 and 3; you should see what I am trying to do using only site 1.
>
>
>
>
>
>
>On Mon, Apr 29, 2013 at 1:11 PM, Matthew Venesky <mvenesky at gmail.com> wrote:
>
>>That is part of the difficulty. If Species C was present only on Date 3, we need to have the code manually add Species C as absent (i.e., assign it a value of 0) at that site on the previous sampling dates. 
>>
>>
>>Or, is there something else that is confusing you that I am not explaining?
>>
>>
>>
>>
>>
>>
>>On Mon, Apr 29, 2013 at 12:47 PM, arun <smartpink111 at yahoo.com> wrote:
>>
>>>Hi,
>>>
>>>Your output dataset is a bit confusing as it contains Sites that were not in the input.
>>>Using your input dataset, I am getting this:
>>>
>>>
>>>dat1<- read.table(text="
>>>
>>>Species Site Date
>>>a 1 1
>>>b 1 1
>>>b 1 2
>>>c 1 3
>>>",sep="",header=TRUE,stringsAsFactors=FALSE)
>>>dat1$Present<- 1
>>>dat2<-expand.grid(unique(dat1$Species),unique(dat1$Site),unique(dat1$Date))
>>> colnames(dat2)<- colnames(dat1)
>>>res<-merge(dat1,dat2,by=c("Species","Site","Date"),all=TRUE)
>>>res[is.na(res)]<- 0
>>> res<-res[order(res$Date),]
>>> res
>>>#  Species Site Date Present
>>>#1       a    1    1       1
>>>#4       b    1    1       1
>>>#7       c    1    1       0
>>>#2       a    1    2       0
>>>#5       b    1    2       1
>>>#8       c    1    2       0
>>>#3       a    1    3       0
>>>#6       b    1    3       0
>>>#9       c    1    3       1
>>>A.K.
>>>
>>>
>>>
>>>
>>>
>>>
>>>----- Original Message -----
>>>From: Matthew Venesky <mvenesky at gmail.com>
>>>To: r-help at r-project.org
>>>Cc:
>>>Sent: Monday, April 29, 2013 11:12 AM
>>>Subject: [R] expanding a presence only dataset into presence/absence
>>>
>>>Hello,
>>>
>>>I'm working with a very large dataset (250,000+ lines in its current form)
>>>that includes presence only data on various species (which is nested within
>>>different sites and sampling dates). I need to convert this into a dataset
>>>with presence/absence for each species. For example, I would like to expand
>>>"My current data" to "Desired data":
>>>
>>>My current data
>>>
>>>Species Site Date
>>>a 1 1
>>>b 1 1
>>>b 1 2
>>>c 1 3
>>>
>>>Desired data
>>>
>>>Species Present Site Date
>>>a 1 1 1
>>>b 1 1 1
>>>c 0 1 1
>>>a 0 2 2
>>>b 1 2 2
>>>c 0 2 2
>>>a 0 3 3
>>>b 0 3 3
>>>c 1 3 3
>>>
>>>I've scoured the web, including Rseek and haven't found a resolution (and
>>>note that a similar question was asked sometime in 2011 without an answer).
>>>Does anyone have any thoughts? Thank you in advance.
>>>
>>>
>>>
>>>______________________________________________
>>>R-help at r-project.org mailing list
>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>> 
> 


