[R] Extracting data from a file containing data

Thu Jul 2 02:49:20 CEST 2015

Here is a way to do case I.  It uses the 'tidyr' package and produces
results like:

> case1[[1]]
     YR JF-R_NINO1.2 MAM-R_NINO1.2 JJA-R_NINO1.2 OND-R_NINO1.2
1  1982           ML            ML            ME            SE
2  1983           SE            SE            SE            ME
3  1984           ML            ML            ML            ML
4  1985           SL            SL            SL            ML
5  1986           ME            ML            ML            ME
6  1987           ME            SE            SE            SE
7  1988           ML            ML            SL            SL
8  1989           ML            ML            ML            ML
9  1990           ML            ML            ML            ML
10 1991           ML            ML            ME            ME
11 1992           ME            SE            ME            ML
12 1993           ME            ME            ME            ME
13 1994           ML            SL            ML            ME
14 1995           ME            ML            ML            ML
15 1996           ML            SL            SL            SL
16 1997           ML            SE            SE            SE
17 1998           SE            SE            SE            ML
18 1999           ML            ML            SL            ML
19 2000           ML            ML            ML            ML
20 2001           ML            ME            ML            SL
21 2002           ML            ME            ML            ME
22 2003           ML            SL            ML            ME
23 2004           ML            ML            SL            ME
24 2005           ML            ML            ML            ML
25 2006           ME            ML            ME            SE
26 2007           ME            SL            SL            SL
27 2008           ML            ME            ME            ML
28 2009           ML            ME            ME            ME
29 2010           ME            ME            SL            SL
30 2011           ML            ME            ME            ML
31 2012           ML            ME            ME            ML

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Jun 26, 2015 at 5:27 AM, Peter Tuju <peterenos at ymail.com> wrote:

> Dear Jim Holtman,
>
> Thank you very much for your help.
>
> The problem I'm trying to solve is “To determine whether the evolution of
> ENSO can influence rainfall over Tanzania”. In this study I have two types
> of data, i.e. rainfall data (for 23 stations) and Nino indices data, both
> spanning a period of 31 years (1982-2012).
>
>  *CASE I:*
> 1. In the “*Nino.indices.txt*” data, for all columns of the Nino regions
> (both anomalies and SST), calculate the seasonal means for "January &
> February (JF)", "March, April & May (MAM)", "June, July & August (JJA)" and
> "October, November & December (OND)" for each year, and present the output
> in table form as:
>
>  *Nino indices Mean*
>
>  Years   | JF SST  | JF ANOM | MAM SST | MAM ANOM | JJA SST | JJA ANOM | OND SST | OND SST
>          | Mean    | Mean    | Mean    | Mean     | Mean    | Mean     | Mean    | Mean
>          | NINO1+2 | NINO1+2 | NINO3   | NINO3    | NINO4   | NINO4    | NINO3.4 | NINO3.4
>  1982    |         |         |         |          |         |          |         |
>  1983    |         |         |         |          |         |          |         |
>  - - - - |         |         |         |          |         |          |         |
>  - - - - |         |         |         |          |         |          |         |
>  2012    |         |         |         |          |         |          |         |
>
>  2. To use the yearly anomalies for each column of the Nino regions to
> classify the events as follows:
> (i). If ANOM Mean > 1, assign “SE” (Strong El Niño)
> (ii). If 0 < ANOM Mean <= 1, assign “ME” (Moderate El Niño)
> (iii). If ANOM Mean == 0, assign “NT” (Neutral condition)
> (iv). If ANOM Mean < -1, assign “SL” (Strong La Niña)
> (v). If -1 <= ANOM Mean < 0, assign “ML” (Moderate La Niña)
> The output has to be in table form as:
>
>  *FOR NINO1+2*
>
>  Years   | JF ANOM Mean | MAM ANOM Mean | JJA ANOM Mean | OND SST Mean
>          | NINO1+2      | NINO1+2       | NINO1+2       | NINO1+2
>  1982    | SE           |               |               |
>  1983    |              |               | SL            |
>  - - - - |              |               |               | ML
>  - - - - |              | ME            |               |
>  2012    |              |               |               | SL
>
>  *FOR NINO3*
>
>  Years   | JF ANOM Mean | MAM ANOM Mean | JJA ANOM Mean | OND SST Mean
>          | NINO3        | NINO3         | NINO3         | NINO3
>  1982    | SE           |               |               |
>  1983    |              |               |               |
>  - - - - |              |               |               |
>  - - - - |              | ME            |               |
>  2012    |              |               |               | SL
>
>  *FOR NINO4*
>
>  Years   | JF ANOM Mean | MAM ANOM Mean | JJA ANOM Mean | OND SST Mean
>          | NINO4        | NINO4         | NINO4         | NINO4
>  1982    | SE           |               |               |
>  1983    |              |               |               |
>  - - - - | ML           |               |               | SL
>  - - - - |              | ME            |               |
>  2012    |              |               |               | SL
>
>
>  *FOR NINO3.4*
>
>  Years   | JF ANOM Mean | MAM ANOM Mean | JJA ANOM Mean | OND SST Mean
>          | NINO3.4      | NINO3.4       | NINO3.4       | NINO3.4
>  1982    | SE           | SL            |               |
>  1983    |              |               |               |
>  - - - - |              |               | ML            |
>  - - - - |              | ME            |               |
>  2012    |              |               |               | SL
>
>
>  3. To plot the time series graph for each Nino region using the yearly
> anomalies.
>
>
>  *CASE II:*
>  Consider the rainfall station data:
>  1. In some of the data files, missing values are labeled “m”. I want to
> replace these missing values with the long-term mean.
>  2. Find the rowSum and anomalies for each data file.
>  3. Find the cumsum of the rowSum for each data file.
>  4. Plot the single mass curves, i.e. plot(Year, cumsum), for each file,
> using the file name as the plot title.
>  5. Plot the time series graphs for seasons JF, MAM, JJA and OND for each
> file, titled “Time series graph for <name of the file>”.
>  6. Find the seasonal correlations for JF, MAM, JJA and OND between the
> anomalies of the rainfall station data and those of each Nino region's
> indices, and present the results in table form as:
>
>  *CORRELATIONS OF RAINFALL AND NINO1+2 ANOMALIES*
>
>  Years   | JF | MAM | JJA | OND
>  1982    |    |     |     |
>  1983    |    |     |     |
>  - - - - |    |     |     |
>  - - - - |    |     |     |
>  2012    |    |     |     |
>
>  *CORRELATIONS OF RAINFALL AND NINO3 ANOMALIES*
>
>  Years   | JF | MAM | JJA | OND
>  1982    |    |     |     |
>  1983    |    |     |     |
>  - - - - |    |     |     |
>  - - - - |    |     |     |
>  2012    |    |     |     |
>
>  *CORRELATIONS OF RAINFALL AND NINO4 ANOMALIES*
>
>  Years   | JF | MAM | JJA | OND
>  1982    |
>
>  ...
>
> [Message clipped]
-------------- next part --------------
input <- read.table("C:\\Users\\jh52822\\Downloads\\Nino_indices.txt"   
                    , header = TRUE
                    , as.is = TRUE
                    )
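# map month number to season; month 9 (September) is in no season -> NA
input$season <- factor(c(rep("JF", 2), rep("MAM", 3), rep("JJA", 3)
        , NA, rep("OND", 3)
        )[input$MON], levels = c("JF", "MAM", "JJA", "OND"))

# seasonal means per year; leave off the MON (-2) column from the data.
# rows with season == NA are dropped by aggregate's default na.action
res <- aggregate(. ~ season + YR, data = input[, -2], FUN = mean)
head(res, 10)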

# this function determines the labels to apply
f_labels <- function(x)
{
    # cut() returns a factor, so convert to character
    result <- as.character(cut(x
        , breaks = c(-Inf, -1, 0, 1, Inf)
        , labels = c("SL", "ML", "ME", "SE")
        ))
    # cut() uses right-closed intervals, so exactly -1 lands in "SL";
    # the request puts -1 in "ML"
    result[x == -1] <- "ML"
    # check for zero
    result[x == 0] <- "NT"
    result
}

# get the "ANOM" columns for processing since these are the ones that
# we want to test for values.
anom <- grep("^ANOM", names(res))

# for each ANOM column iterate and compute the labels based on values
for (i in anom){
    # use the name of the previous column (the region) for the new column
    res[[paste0("R_", names(res)[i - 1L])]] <- f_labels(res[[i]])
}

# the tidyr package helps to format the results
# (spread_() is deprecated in current tidyr; pivot_wider() is its replacement)
library(tidyr)
# columns to use as summary -- added above
sum_cols <- paste0("R_", names(res)[anom - 1L])

case1 <- lapply(sum_cols, function(.col){
    # keep only year, season and the label column, then reshape long -> wide
    x <- spread_(res[, c("YR", "season", .col)], "season", .col)
    # append the name of the data to the season
    names(x)[-1] <- paste(names(x)[-1], .col, sep = '-')
    x  # return value
})
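
For checking the labels, the classification rules from the quoted request
can also be written with explicit comparisons, which makes the boundary
cases at -1, 0 and 1 easy to verify by eye. This is only a sketch:
`classify_anom` is a hypothetical name that is not part of the script above,
and the input values are made up for illustration rather than taken from the
Nino data.

```r
# Hypothetical helper (an assumption, not from the original script):
# classify seasonal anomaly means with one explicit comparison per rule.
classify_anom <- function(x) {
    out <- rep(NA_character_, length(x))   # NA input stays NA
    out[which(x > 1)]            <- "SE"   # (i)   strong El Nino
    out[which(x > 0 & x <= 1)]   <- "ME"   # (ii)  moderate El Nino
    out[which(x == 0)]           <- "NT"   # (iii) neutral
    out[which(x < -1)]           <- "SL"   # (iv)  strong La Nina
    out[which(x >= -1 & x < 0)]  <- "ML"   # (v)   moderate La Nina
    out
}

# made-up values that include every boundary
test_vals <- c(-1.5, -1, -0.2, 0, 0.4, 1, 1.7)
labels <- classify_anom(test_vals)
print(data.frame(anom = test_vals, label = labels))
```

The values -1 and 1 come out as "ML" and "ME" respectively, matching rules
(v) and (ii), and 0 comes out as "NT".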