[R] [newbie] aggregating table() results and simplifying code with loop

Sun Sep 16 20:24:40 CEST 2012

Hi Davide,

I had some time this afternoon and I wonder if this approach is likely to get the results you want. As before, it is not complete, but I think it holds promise.

On the other hand, Rui is a much better programmer than I am, so he may have a much cleaner solution. My way still looks labour-intensive at the moment.

I am using the plyr package, which you will probably have to install.
install.packages("plyr") should do it.
==========================================================
# load the plyr package
library(plyr)

# sample data
T80<- read.csv("/home/john/rdata/sample.csv",  header = TRUE, sep = ";")
# Davide's actual read statement
# T80<-read.table(file="C:/sample.txt", header=T, sep=";")

# Looking for Maize
pattern  <-  c("2Ma", "2Ma","2Ma", "2Ma","2Ma")

# one-row examples to see what is happening
T80[1,3:7]
T80[1, 3:7] == pattern

T80[405, 3:7]
T80[405, 3:7] == pattern

T80[55, 3:7] == pattern

# now we apply the patterns to the entire data set.
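# (the pattern vector is recycled down the rows, not across the columns,
# so this only works because all five entries are the same "2Ma")
pp1  <-  T80[, 3:7] == pattern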

# paste the TRUEs and FALSEs together to form a single variable
concatdat  <-  paste(pp1[, 1], pp1[, 2], pp1[, 3], pp1[, 4],pp1[,5] ,  sep = "+")

# Assemble new data frame.
maizedata  <-  data.frame(T80$WS, concatdat)
names(maizedata)  <-  c("WS", "crop_pattern")

mzcount  <-  ddply(maizedata, .(WS, crop_pattern),  summarize, count = length(crop_pattern))
mzcount  # This is all the data, not just the relevant maize patterns

# This seems to be getting us somewhere, though we are not there yet.
# Does this subset look like we are going in the right direction?
# (keep only the two 5-year return-period patterns for maize)
m51  <-  subset(mzcount,
                crop_pattern %in% c("FALSE+FALSE+FALSE+FALSE+TRUE",
                                    "TRUE+FALSE+FALSE+FALSE+FALSE"))

m51  <-  ddply(m51, .(WS), summarize, count = sum(count))
m51
=================================================================
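The steps above can be wrapped in a small function so that the same code works for any crop and any set of patterns. This is a minimal sketch in base R rather than plyr; the helper name count_patterns, the y01 to y05 column names and the toy data are my own inventions, since sample.csv is not included here. It counts, per WS, the points whose TRUE/FALSE string is one of the wanted patterns, and comparing against a single crop value sidesteps the recycling issue noted in the comment above.

```r
# Toy data standing in for sample.csv (hypothetical values): one row per
# point, WS = spatial entity, y01..y05 = land cover in each year
T80 <- data.frame(
  WS  = c("WS1", "WS1", "WS2", "WS2", "WS3"),
  y01 = c("2Ma", "2Co", "2Ma", "2Ma", "2BC"),
  y02 = c("2Co", "2Co", "2Ma", "2BC", "2BC"),
  y03 = c("2BC", "2Co", "2Ma", "2Co", "2Ma"),
  y04 = c("2We", "2BC", "2Ma", "2We", "2BC"),
  y05 = c("2Co", "2Ma", "2Ma", "2Co", "2Ma"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count, per WS, the points whose "TRUE+FALSE+..."
# string for `crop` is one of the strings in `wanted`
count_patterns <- function(dat, crop, cols, wanted) {
  # compare every cell with a single crop value (no recycling surprises)
  hits <- as.matrix(dat[, cols]) == crop
  # one "TRUE+FALSE+..." string per point, like the paste() step above
  key <- apply(hits, 1, paste, collapse = "+")
  # factor() keeps every WS as a level, so a WS with no match gets a 0
  ws <- factor(dat$WS)
  as.data.frame(table(WS = ws[key %in% wanted]), responseName = "count")
}

five_year <- c("TRUE+FALSE+FALSE+FALSE+FALSE", "FALSE+FALSE+FALSE+FALSE+TRUE")
m51 <- count_patterns(T80, "2Ma", c("y01", "y02", "y03", "y04", "y05"), five_year)
m51
```

With the toy data this reports 2 matching points for WS1, 1 for WS2 and 0 for WS3.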

John Kane
Kingston ON Canada

> -----Original Message-----
> From: ridavide at gmail.com
> Sent: Sat, 15 Sep 2012 19:00:29 +0200
> To: jrkrideau at inbox.com, ruipbarradas at sapo.pt
> Subject: Re: [R] [newbie] aggregating table() results and simplifying
> code with loop
> 
> Thanks Rui, thanks John for your very different solutions.
> 
> I'll try to break my questions into smaller steps following your tips.
> However, not everything is clear to me, so before giving you
> feedback I need to study your answers further. For the moment I can
> specify that I'm looking for the following 19 patterns:
> 
> 1. True, False, False, False, False # return period of 5 years (1/2)
> 2. False, False, False, False, True # return period of 5 years (2/2)
> 3. True, False, False, False, True # return period of 4 years (1/3)
> 4. False, True, False, False, False # return period of 4 years (2/3)
> 5. False, False, False, True, False # return period of 4 years (3/3)
> 6. True, False, False, True, False # return period of 3 years (1/3)
> 7. False, True, False, False, True # return period of 3 years (2/3)
> 8. False, False, True, False, False # return period of 3 years (3/3)
> 9. False, True, False, True, False # return period of 2 years (1/2)
> 10. True, False, True, False, True # return period of 2 years (2/2)
> 11. True, True, True, True, True # mono-succession of 5 years
> 12. False, True, True, True, True # mono-succession of 4 years (1/2)
> 13. True, True, True, True, False # mono-succession of 4 years (2/2)
> 14. True, False, True, True, True # mono-succession of 3 years (1/5)
> 15. True, True, True, False, True # mono-succession of 3 years (2/5)
> 16. False, False, True, True, True # mono-succession of 3 years (3/5)
> 17. True, True, True, False, False # mono-succession of 3 years (4/5)
> 18. False, True, True, True, False # mono-succession of 3 years (5/5)
> 19. True, True, False, True, True # crops repeated two years
> 
> In particular, I want to apply all these 19 patterns to 7 (out of 11)
> land covers: 2BC, 2Co, 2Ma, 2We, 2MG, 2ML, 2PG. The patterns are
> structured as follows: True means presence of a given land cover
> (iteratively, one of the seven listed above), False means any other
> land cover (among the remaining 10).
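Writing the 19 patterns down once as a lookup table means the same code can serve all seven covers. A minimal sketch, assuming 1/0 strings instead of True/False (1 = the cover of interest, 0 = any other cover) and short class labels of my own (RP5 = return period of 5 years, MS3 = mono-succession of 3 years, R2 = crops repeated two years); the function name classify_rows is hypothetical too.

```r
# The 19 patterns listed above: 1 = cover of interest, 0 = any other cover
patterns <- data.frame(
  code  = c("10000", "00001",                            # return period 5
            "10001", "01000", "00010",                   # return period 4
            "10010", "01001", "00100",                   # return period 3
            "01010", "10101",                            # return period 2
            "11111",                                     # mono-succession 5
            "01111", "11110",                            # mono-succession 4
            "10111", "11101", "00111", "11100", "01110", # mono-succession 3
            "11011"),                                    # repeated two years
  class = c(rep("RP5", 2), rep("RP4", 3), rep("RP3", 3), rep("RP2", 2),
            "MS5", rep("MS4", 2), rep("MS3", 5), "R2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: `years` is a character matrix (one row per point,
# five yearly columns); returns each row's class, or NA if none of the 19 match
classify_rows <- function(years, crop) {
  code <- apply((years == crop) * 1L, 1, paste, collapse = "")
  patterns$class[match(code, patterns$code)]
}

years <- rbind(c("2Ma", "2Co", "2BC", "2We", "2Co"),
               c("2Ma", "2Ma", "2Ma", "2Ma", "2Ma"),
               c("2Co", "2Ma", "2Co", "2Ma", "2Co"),
               c("2Ma", "2Co", "2Ma", "2Co", "2Co"))
classify_rows(years, "2Ma")
```

Looping classify_rows() over c("2BC", "2Co", "2Ma", "2We", "2MG", "2ML", "2PG") would give every point a class for each cover. Some five-year sequences, such as 1-0-1-0-0 in the last toy row, are not among the 19 and come back as NA.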
> 
> Thanks again for any further help.
> Greetings,
> Dd
> 
> ***********************************************************
> Davide Rizzo
> website :: http://sites.google.com/site/ridavide/
> 
> 
> On Sat, Sep 15, 2012 at 5:51 PM, John Kane <jrkrideau at inbox.com> wrote:
>> I have not seen any replies to your questions so I will suggest an
>> approach that may work if I can get a function to work.
>> 
>> If I understand what you want, you have a pattern something like this:
>> pattern1  <-  c("2Ma", "no2Ma","no2Ma", "no2Ma","no2Ma")
>> pattern2  <-  c("no2Ma", 'no2Ma', "no2Ma", "no2Ma", "2Ma")
>> 
>> for each five-year period, where 2Ma stands for maize, one of 11 different
>> grains
>>   1AU   2BC   2Co   2Ma   2MG   2ML   2oc   2PG   2SA   2We   3sN
>> 
>> and what you want to know is if each year gives a pattern like
>> 
>> check1 <-  c(TRUE, FALSE, FALSE, FALSE, FALSE)
>> check2  <-  c(FALSE, FALSE, FALSE, FALSE, TRUE)
>> 
>> If I understand the patterns you only care for the two above, is that
>> correct?
>> 
>> I am running out of time today but I think that this approach will get
>> you started
>> ===========================================================
>> 
>> T80<-read.table(file="C:/sample.txt", header=T, sep=";")
>> 
>> # Reminder of just what we want to get as a final result.
>> check1 <-  c(TRUE, FALSE, FALSE, FALSE, FALSE)
>> check2  <-  c(FALSE, FALSE, FALSE, FALSE, TRUE)
>> 
>> pattern1  <-  c("2Ma", "2Ma","2Ma", "2Ma","2Ma")
>> 
>> # one-row examples to see what is happening
>> T80[1,3:7]
>> T80[1, 3:7] == pattern1
>> 
>> T80[405, 3:7]
>> T80[405, 3:7] == pattern1
>> 
>> # now we apply the patterns to the entire data set.
>> pp1  <-  T80[, 3:7] == pattern1
>> pp2  <-  T80[, 3:7] == pattern2
>> 
>> # reassign the WS values so we know where the data is from
>> WSnames  <-  rep(T80$WS, 2)
>> 
>> # Assemble new data frame.
>> maizedata  <-  data.frame(WSnames, rbind(pp1,pp2))
>> ========================================================
>> 
>> Now, assuming this runs for you and I have not made a serious mistake in
>> logic, you should be able to do some subsetting (?subset) to extract
>> only the check1 and check2 patterns above.
>> 
>> This is where I ran into trouble, as I don't have the time this morning
>> to work out the subsetting conditions. It looks tricky and you
>> probably need a couple of subsetting steps.
>> 
>> It's not a pretty solution and, in particular, I expect someone could
>> clean it up to make the subsetting easier or even unnecessary, but I hope
>> it helps.
>> 
>> Once you have extracted what you want, use apply() or perhaps the plyr
>> package to aggregate the results.
>> 
>> Repeat for all grains.  Actually look into setting the whole thing up as
>> a function. You should be able to write the program once as a function
>> and do a loop or an apply() to do all 11 grains in one go.
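A minimal sketch of the "write it once as a function, then apply it to every grain" idea, using base R's sapply() instead of an explicit for loop. The toy data, the helper count_per_ws and the 1/0 coding are assumptions for illustration. Each call returns one count per WS, and because the WS factor has fixed levels every result has the same length, so sapply() can stack them into a crops-by-WS matrix.

```r
# Hypothetical toy data: one row per point, WS plus five yearly columns
land <- data.frame(
  WS  = c("WS1", "WS1", "WS2", "WS2"),
  y01 = c("2Ma", "2Co", "2BC", "2Ma"),
  y02 = c("2Co", "2Co", "2Ma", "2BC"),
  y03 = c("2BC", "2Co", "2Co", "2Co"),
  y04 = c("2We", "2BC", "2Co", "2We"),
  y05 = c("2Co", "2Ma", "2Co", "2Co"),
  stringsAsFactors = FALSE
)
year_cols <- c("y01", "y02", "y03", "y04", "y05")
five_year <- c("10000", "00001")  # 1 = crop present, 0 = any other cover

# Hypothetical helper: count, per WS, the points whose 1/0 code is wanted
count_per_ws <- function(crop, dat, cols, wanted) {
  code <- apply((as.matrix(dat[, cols]) == crop) * 1L, 1, paste, collapse = "")
  ws <- factor(dat$WS)  # fixed levels: every WS appears, even with 0 hits
  table(ws[code %in% wanted])
}

crops <- c("2BC", "2Co", "2Ma", "2We")
# sapply() gives one column per crop; t() turns it into one row per crop
res <- t(sapply(crops, count_per_ws, dat = land, cols = year_cols,
                wanted = five_year))
res
```

For all 11 land covers, crops would simply be the full vector of codes.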
>> 
>> Best of luck.
>> 
>> John Kane
>> Kingston ON Canada
>> 
>> 
>>> -----Original Message-----
>>> From: ridavide at gmail.com
>>> Sent: Thu, 13 Sep 2012 15:36:28 +0200
>>> To: r-help at r-project.org
>>> Subject: [R] [newbie] aggregating table() results and simplifying code
>>> with loop
>>> 
>>> Dear all,
>>> I'm looking for basic help with aggregating table() results and with
>>> writing a loop (if useful).
>>> 
>>> My dataset ( http://goo.gl/gEPKW ) is composed of 23k rows, each one
>>> representing a point in the space of which we know the land cover over
>>> 10 years (column y01 to y10).
>>> 
>>> I need to analyse it with a temporal sliding window of 5 years (y01 to
>>> y05, y02 to y06 and so forth)
>>> For each period I'm looking for specific sequences (e.g., Maize,
>>> -noMaize, -noMaize, -noMaize, -noMaize) to calculate the "return time"
>>> of principal land covers: barley (2BC), colza (2Co), maize (2Ma), etc.
>>> I define the "return time" as the presence of a given land cover
>>> according to a given sequence. Hence, each return time could require
>>> the sum of different sequences (e.g., a return time of 5 years derives
>>> from the sum of [2Ma,no2Ma,no2Ma,no2Ma,no2Ma] +
>>> [no2Ma,no2Ma,no2Ma,no2Ma,2Ma]).
>>> I need to repeat the calculation for each land cover for each time
>>> window. In addition, I need to repeat the process over three datasets
>>> (the one I give is the first one, the second one is from year 12 to
>>> year 24, the third one from year 27 to year 31. So I have breaks in
>>> the monitoring of land cover that prevent me from creating a continuous
>>> dataset). At the end I expect to aggregate the sum for each spatial
>>> entity (column WS)
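The six 5-year windows over y01 to y10 can be generated instead of typed by hand. A minimal sketch, assuming the year columns are named y01 to y10 as described above; the ten-year sequence for a single point is made up. For the other two datasets only the vector of column names would change.

```r
year_cols <- sprintf("y%02d", 1:10)                    # "y01" ... "y10"
win_len   <- 5
starts    <- seq_len(length(year_cols) - win_len + 1)  # 1 to 6
# each window is a vector of five column names, e.g. y02 to y06
windows <- lapply(starts, function(s) year_cols[s:(s + win_len - 1)])
names(windows) <- sprintf("%02d", starts)              # period labels "01".."06"

# Hypothetical land-cover sequence of a single point over the ten years
seq10 <- setNames(c("2Ma", "2Co", "2BC", "2We", "2Co",
                    "2Ma", "2Ma", "2Co", "2BC", "2Co"), year_cols)

# 1/0 presence code of maize in each window
codes <- sapply(windows, function(w) paste((seq10[w] == "2Ma") * 1L, collapse = ""))
codes
```

With a data frame, dat[, windows[["02"]]] picks the columns of the second period, so the per-crop counting can run inside a loop over names(windows).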
>>> 
>>> I've started writing the code for the first crop in the first 5yrs
>>> period (http://goo.gl/FhZNx) then copying and pasting it for each crop
>>> then for each time window...
>>> Moreover I do not know how to aggregate the results of table(). (NB
>>> sometimes I have a different number of WS per table because a given
>>> sequence could be absent in a given spatial entity... so I have the
>>> following warning msg: number of columns of result is not a multiple
>>> of vector length (arg 1)). Therefore, I'm "obliged" to copy&paste the
>>> table corresponding to each sequence....
>>> 
>>> FIRST QUEST. How to aggregate the results of table() when the number
>>> of columns is different?
>>> Or the other way around: Is there a way to have a table where each row
>>> reports the number of points per time return per WS? something like
>>> 
>>> WS1    WS2    WS3    WS4    ...    WS16    crop    period
>>> 23    15    18    43    ...    52       Ma5    01
>>> 18    11    25    84    ...    105       Ma2    01
>>> ...    ...    ...    ...    ...    ...    ...    ...
>>> ...    ...    ...    ...    ...    ...    Co5    01
>>> ...    ...    ...    ...    ...    ...    ...    ...
>>> ...    ...    ...    ...    ...    ...    Ma5    02
>>> ...    ...    ...    ...    ...    ...    ...    ...
>>> In this table, each row should represent a return time for a given land
>>> cover in a given period (one of the six 5-year time windows)?
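On the first question: the differing numbers of columns most likely come from table() only creating a column for the WS values it actually sees. Giving the WS a factor with fixed levels makes every table the same length, so the results stack into the layout sketched above. A minimal sketch with made-up hits; the names all_ws and hits are hypothetical.

```r
# Every spatial entity that should appear as a column (hypothetical names)
all_ws <- c("WS1", "WS2", "WS3")

# Hypothetical input: for each crop/return time and period, the WS of every
# matching point. Ma2 has no points in WS1 or WS3.
hits <- list(
  list(crop = "Ma5", period = "01", ws = c("WS1", "WS1", "WS3")),
  list(crop = "Ma2", period = "01", ws = "WS2"),
  list(crop = "Co5", period = "01", ws = c("WS2", "WS3", "WS3"))
)

# factor() with fixed levels makes table() return one cell per WS, zeros
# included, so every result has the same length and they stack cleanly
counts <- t(sapply(hits, function(h) table(factor(h$ws, levels = all_ws))))

result <- data.frame(counts,
                     crop   = sapply(hits, function(h) h$crop),
                     period = sapply(hits, function(h) h$period),
                     stringsAsFactors = FALSE)
result
```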
>>> 
>>> SECOND QUEST. Could a loop (instead of a modular copy/paste code)
>>> improve the time/reliability of the calculation? If yes, could you
>>> please point me to some entry-level references for writing it?
>>> 
>>> I am aware these are newbie questions, but I have not been able to
>>> solve them using manuals and available sources.
>>> Thank you in advance for your help.
>>> 
>>> Greetings,
>>> Dd
>>> 
>>> PS
>>> R: version 2.14.2 (2012-02-29)
>>> OS: MS Windows XP Home 32-bit SP3
>>> 
>>> *****************************
>>> Davide Rizzo
>>> post-doc researcher
>>> INRA UR055 SAD-ASTER
>>> website :: http://sites.google.com/site/ridavide/
>> 
