[R] Tricky filtering

Cacique Samurai caciquesamurai using gmail.com
Thu Oct 31 10:37:02 CET 2019


Hello Petr, thanks for your reply!

You are right: my problem is only between ANT01 and ANT02. All the other
stations will be kept in the filtered data. I have six more stations.

It looks like your solution will work pretty well for me! I think I can put
it inside a function and use lapply to apply it to all my data, separating
the fish by code.
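A minimal sketch of that idea, assuming each fish's records sit in one data frame with a `Code` column. The thread only says that fish are identified by a Code, so the column name, the codes "F1"/"F2" and the helper name `filter_fish` are hypothetical. The helper applies Petr's three steps from the reply quoted below and keeps only rows with a non-zero keep value; split() and lapply() then run it once per fish.

```r
# Example records for one fish, copied from the thread's table
one_fish <- data.frame(
  Date = as.Date(c(rep("2019-03-29", 4), rep("2019-03-30", 5))),
  Station = c("ANT01", "ANT01", "ANT02", "ANT02",
              "ANT01", "ANT01", "ANT02", "ANT02", "ETE01"),
  Antenna = c("1", "2", "1", "2", "1", "2", "1", "2", "AH0"),
  Mean_power = c(108, 94, 220, 219, 204, 172, 88, 72, 87),
  N_records = c(1704L, 1219L, 3029L, 2711L, 2289L, 1477L, 913L, 1080L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical: two fish with identical records, told apart only by Code
fish1 <- one_fish; fish1$Code <- "F1"
fish2 <- one_fish; fish2$Code <- "F2"
all_fish <- rbind(fish1, fish2)

# Hypothetical helper: Petr's steps for one fish, then drop rows with keep == 0
filter_fish <- function(d) {
  d$m <- ave(d$N_records, interaction(d$Date, d$Station), FUN = mean)
  d$flag <- ave(d$m, d$Date, FUN = function(x) max(x) == x)
  d$keep <- d$flag + (d$Station == "ETE01") * 1
  d[d$keep > 0, setdiff(names(d), c("m", "flag", "keep"))]
}

# One data frame per Code, filtered separately, then stacked again
result <- do.call(rbind, lapply(split(all_fish, all_fish$Code), filter_fish))
rownames(result) <- NULL
result
```

Splitting by Code first also keeps the daily maximum from mixing records of different fish.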

I just do not understand one thing:

The keep variable takes the value 2 for the ANT station that I have to keep,
but the value 1 for the other stations. How can I keep only the necessary
data after using your solution?
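A minimal sketch of that last step, assuming the example data from the thread and Petr's code as quoted below: rows with keep equal to 0 are dropped and every non-zero value is kept, so 1 and 2 both mean keep. With Petr's code as written, 2 only appears when both terms are 1, that is, an ETE01 row that also holds the day's largest mean N_records.

```r
# Example data from the thread, as a plain data frame
test <- data.frame(
  Date = as.Date(c(rep("2019-03-29", 4), rep("2019-03-30", 5))),
  Station = c("ANT01", "ANT01", "ANT02", "ANT02",
              "ANT01", "ANT01", "ANT02", "ANT02", "ETE01"),
  Antenna = c("1", "2", "1", "2", "1", "2", "1", "2", "AH0"),
  Mean_power = c(108, 94, 220, 219, 204, 172, 88, 72, 87),
  N_records = c(1704L, 1219L, 3029L, 2711L, 2289L, 1477L, 913L, 1080L, 1L),
  stringsAsFactors = FALSE
)

# Petr's steps, unchanged
test$m <- ave(test$N_records, interaction(test$Date, test$Station), FUN = mean)
test$flag <- ave(test$m, test$Date, FUN = function(x) max(x) == x)
test$keep <- test$flag + (test$Station == "ETE01") * 1

# keep is 0 (drop), 1 or 2 (keep); the helper columns are dropped as well
filtered <- subset(test, keep > 0, select = -c(m, flag, keep))
filtered
```

If a 0/1 column is easier to read, `test$flag == 1 | test$Station == "ETE01"` gives a logical keep column with no 2s.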

Thanks again for your attention and help.

Raoni

On Thu, 31 Oct 2019 at 04:30, PIKAL Petr <petr.pikal using precheza.cz> wrote:

> Hi.
>
> Bert's questions need to be answered. But from your question I understand
> that ANT01 and ANT02 are the only stations you want to filter, and that you
> want to keep all the others regardless of any condition. If this is true, I
> would add a new column with one value for the ANT stations and a different
> value for all the others (if you have more than one). Then you could set a
> flag marking the biggest number on each day, and after that add back, on
> each day, the non-ANT stations you want to keep.
>
> I named your data test and converted it to a data frame, as I am not
> familiar with tibbles.
>
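> The code looks like this:
> # mean N_records for each Date x Station combination
> test$m <- ave(test$N_records, interaction(test$Date, test$Station), FUN = mean)
> # 1 for the station(s) with the largest mean on each date, 0 otherwise
> test$flag <- ave(test$m, test$Date, FUN = function(x) max(x) == x)
> # adding 1 for ETE01 keeps that station regardless of the flag
> test$keep <- test$flag + (test$Station == "ETE01") * 1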
>
> But you need to think about the questions Bert asked.
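Raoni mentions six more stations, while the code above tests only for "ETE01". A sketch of a more general variant, assuming ANT01 and ANT02 are the only stations that compete: take the daily maximum over ANT rows only and keep every other station unconditionally. This also covers a case the steps above miss: if a non-ANT station has the largest mean N_records on a day, it takes the flag and both ANT pairs are dropped. The extra ETE01 row on 29/03 below is made up to show that case; it is not in the thread's data.

```r
test <- data.frame(
  Date = as.Date(c(rep("2019-03-29", 4), rep("2019-03-30", 5))),
  Station = c("ANT01", "ANT01", "ANT02", "ANT02",
              "ANT01", "ANT01", "ANT02", "ANT02", "ETE01"),
  Antenna = c("1", "2", "1", "2", "1", "2", "1", "2", "AH0"),
  Mean_power = c(108, 94, 220, 219, 204, 172, 88, 72, 87),
  N_records = c(1704L, 1219L, 3029L, 2711L, 2289L, 1477L, 913L, 1080L, 1L),
  stringsAsFactors = FALSE
)

# Made-up extra row: a busy non-ANT station on 29/03
test <- rbind(test, data.frame(Date = as.Date("2019-03-29"), Station = "ETE01",
                               Antenna = "AH0", Mean_power = 90, N_records = 4000L,
                               stringsAsFactors = FALSE))

ant <- c("ANT01", "ANT02")   # assumption: the only stations that compete
is_ant <- test$Station %in% ant

# Mean N_records per Date x Station, as in Petr's first step
test$m <- ave(test$N_records, interaction(test$Date, test$Station), FUN = mean)

# Petr's original flag/keep, for comparison: ETE01 wins 29/03, ANT02 is lost
petr_flag <- ave(test$m, test$Date, FUN = function(x) max(x) == x)
petr_keep <- petr_flag + (test$Station == "ETE01") * 1

# Variant: daily maximum over ANT rows only; other rows get -Inf so they never win
day_max <- ave(ifelse(is_ant, test$m, -Inf), test$Date, FUN = max)
test$keep <- !is_ant | test$m == day_max

filtered <- test[test$keep, ]
filtered
```

Adding stations to `ant` is the only change needed if more of them turn out to compete in the same way.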
>
> Cheers
> Petr
>
> > -----Original Message-----
> > From: R-help <r-help-bounces using r-project.org> On Behalf Of Bert Gunter
> > Sent: Thursday, October 31, 2019 5:18 AM
> > To: Cacique Samurai <caciquesamurai using gmail.com>
> > Cc: R help <r-help using r-project.org>
> > Subject: Re: [R] Tricky filtering
> >
> > Thanks for the nice dput example, but your specification confuses me.
> > What if the 2 records with the largest Mean_power are not the same as the
> > two with the largest N_records? Do you want to keep all four records? Or
> > one of the various combinations that would keep 3 records? And will you
> > always have two records on a date, or could you have just one? And if the
> > 2 records with the largest Mean_power always also have the largest
> > N_records, then you only need to choose the two with the largest
> > Mean_power and can ignore N_records, right?
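Bert's first question can be checked directly on the data. A minimal sketch, assuming the example data from the thread: for each date, find which ANT station has the larger mean Mean_power and which has the larger mean N_records, and list the dates where they differ. Averaging over a station's antennas is an assumption. The dput names the power column Media_power; this sketch uses the table's Mean_power.

```r
test <- data.frame(
  Date = as.Date(c(rep("2019-03-29", 4), rep("2019-03-30", 5))),
  Station = c("ANT01", "ANT01", "ANT02", "ANT02",
              "ANT01", "ANT01", "ANT02", "ANT02", "ETE01"),
  Antenna = c("1", "2", "1", "2", "1", "2", "1", "2", "AH0"),
  Mean_power = c(108, 94, 220, 219, 204, 172, 88, 72, 87),
  N_records = c(1704L, 1219L, 3029L, 2711L, 2289L, 1477L, 913L, 1080L, 1L),
  stringsAsFactors = FALSE
)

ants <- test[test$Station %in% c("ANT01", "ANT02"), ]

# One row per Date x Station with the mean of each measure over its antennas
by_station <- aggregate(cbind(Mean_power, N_records) ~ Date + Station,
                        data = ants, FUN = mean)

# Winning station per date under each criterion
per_day <- split(by_station, by_station$Date)
best_power <- sapply(per_day, function(d) as.character(d$Station[which.max(d$Mean_power)]))
best_records <- sapply(per_day, function(d) as.character(d$Station[which.max(d$N_records)]))

# Dates where the two criteria disagree need a decision
disagree <- names(best_power)[best_power != best_records]
disagree
```

On the example data both criteria pick the same station each day, so `disagree` is empty; on the full data set it shows whether Bert's case actually occurs.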
> >
> > Once you have answered these questions -- or someone else has a better
> > understanding than I -- it should be easy. It will require a loop of one
> > form or another, however, and therefore might take a while.
> >
> > Cheers,
> > Bert
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along
> > and sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
> >
> >
> > On Wed, Oct 30, 2019 at 7:55 PM Cacique Samurai <caciquesamurai using gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I have fish telemetry data with more than 11 million lines. There are
> > > some false records in the data that I have to eliminate. I can solve
> > > this using a loop, but I think that dplyr::filter could be faster and
> > > more elegant. I just can't figure out how to do it.
> > >
> > > I have already summarized this raw data and have something like this
> > > (dput at the end of the e-mail):
> > >
> > > Date        Station  Antenna  Mean_power  N_records  Action needed (manually inserted)
> > > 29/03/2019  ANT01    1        108         1704       Remove
> > > 29/03/2019  ANT01    2        94          1219       Remove
> > > 29/03/2019  ANT02    1        220         3029       Keep
> > > 29/03/2019  ANT02    2        219         2711       Keep
> > > 30/03/2019  ANT01    1        204         2289       Keep
> > > 30/03/2019  ANT01    2        172         1477       Keep
> > > 30/03/2019  ANT02    1        88          913        Remove
> > > 30/03/2019  ANT02    2        72          1080       Remove
> > > 30/03/2019  ETE01    AH0      87          1          Keep
> > >
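For reference, a summary shaped like the table above can be built from raw detections in base R. This is a sketch under assumptions: the raw data are taken to have one row per detection with a signal-strength column, here called Power (a hypothetical name), and the few detections below are made up, since the thread does not show the raw format.

```r
# Made-up raw detections: one row per record (the real raw format is not shown)
raw <- data.frame(
  Date = as.Date(c("2019-03-29", "2019-03-29", "2019-03-29", "2019-03-30")),
  Station = c("ANT01", "ANT01", "ANT02", "ANT01"),
  Antenna = c("1", "1", "2", "1"),
  Power = c(100, 110, 220, 204),  # hypothetical column name
  stringsAsFactors = FALSE
)

keys <- raw[c("Date", "Station", "Antenna")]

# Mean power and number of records per Date x Station x Antenna;
# both calls use the same grouping, so their rows line up
summ <- aggregate(raw["Power"], by = keys, FUN = mean)
names(summ)[names(summ) == "Power"] <- "Mean_power"
summ$N_records <- aggregate(raw["Power"], by = keys, FUN = length)$Power
summ
```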
> > > The problem occurs between stations ANT01 and ANT02. On the same day, I
> > > have to keep the pair of records with the bigger Mean_power and more
> > > N_records. In this example, I have to keep the records of station ANT02
> > > on 29/03, and those of ANT01 and ETE01 on 30/03. If I had no stations
> > > other than ANT01 and ANT02 on a given day, it would be a simple question.
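A minimal base-R sketch of this rule, under stated assumptions: only ANT01 and ANT02 compete, each station's antennas are averaged per day, a station wins a day only if it has both the larger mean Mean_power and the larger mean N_records, and a split decision is marked "Check" rather than guessed (that is the case Bert asks about above). Every other station is kept. The dput names the power column Media_power; this sketch uses the table's Mean_power. The same grouped logic could also be written with dplyr's group_by() and filter().

```r
test <- data.frame(
  Date = as.Date(c(rep("2019-03-29", 4), rep("2019-03-30", 5))),
  Station = c("ANT01", "ANT01", "ANT02", "ANT02",
              "ANT01", "ANT01", "ANT02", "ANT02", "ETE01"),
  Antenna = c("1", "2", "1", "2", "1", "2", "1", "2", "AH0"),
  Mean_power = c(108, 94, 220, 219, 204, 172, 88, 72, 87),
  N_records = c(1704L, 1219L, 3029L, 2711L, 2289L, 1477L, 913L, 1080L, 1L),
  stringsAsFactors = FALSE
)

ant <- c("ANT01", "ANT02")          # assumption: the only competing stations
is_ant <- test$Station %in% ant
grp <- interaction(test$Date, test$Station, drop = TRUE)

# Per-row means over the antennas of each Date x Station pair
pw <- ave(test$Mean_power, grp, FUN = mean)
nr <- ave(test$N_records, grp, FUN = mean)

# Daily maxima over ANT rows only; other rows get -Inf so they never compete
best_pw <- ave(ifelse(is_ant, pw, -Inf), test$Date, FUN = max)
best_nr <- ave(ifelse(is_ant, nr, -Inf), test$Date, FUN = max)
wins_pw <- pw == best_pw
wins_nr <- nr == best_nr

# Keep other stations; keep an ANT station that wins on both measures;
# flag split decisions for a manual look instead of guessing
test$Action <- ifelse(!is_ant, "Keep",
               ifelse(wins_pw & wins_nr, "Keep",
               ifelse(wins_pw | wins_nr, "Check", "Remove")))

filtered <- test[test$Action == "Keep", ]
test
```

On the example data the computed Action column matches the manually inserted one.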
> > >
> > > I have to do this for each marked fish, which is identified by a Code
> > > (omitted here for brevity).
> > >
> > > Thanks in advance,
> > >
> > > Raoni
> > >
> > >
> > > structure(list(Date = structure(c(17984, 17984, 17984, 17984, 17985,
> > > 17985, 17985, 17985, 17985), class = "Date"), Station =
> > > c("ANT01","ANT01", "ANT02", "ANT02", "ANT01", "ANT01", "ANT02",
> > > "ANT02","ETE01"), Antenna = c("1", "2", "1", "2", "1", "2", "1",
> > > "2","AH0"), Media_power = c(108, 94, 220, 219, 204, 172, 88, 72, 87),
> > > N_records = c(1704L, 1219L, 3029L, 2711L, 2289L, 1477L, 913L, 1080L,
> > > 1L)), row.names = c(NA, -9L), class = c("grouped_df", "tbl_df", "tbl",
> > > "data.frame"), groups = structure(list(Date = structure(c(17984,
> > > 17984, 17985, 17985, 17985), class = "Date"), Station = c("ANT01",
> > > "ANT02", "ANT01", "ANT02", "ETE01"), .rows = list(1:2, 3:4, 5:6, 7:8,
> > > 9L)), row.names = c(NA, -5L), class = c("tbl_df", "tbl",
> > > "data.frame"), .drop = TRUE))
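Petr mentions converting the data to a plain data frame. A minimal base-R sketch of that, using the dput above without the quote markers: evaluating structure() does not need dplyr, and as.data.frame() drops the tibble classes. Removing the leftover "groups" attribute and renaming Media_power to Mean_power, to match the table and the prose, are optional choices.

```r
# The dput from the question (a grouped tibble)
x <- structure(list(Date = structure(c(17984, 17984, 17984, 17984, 17985,
  17985, 17985, 17985, 17985), class = "Date"),
  Station = c("ANT01", "ANT01", "ANT02", "ANT02", "ANT01", "ANT01", "ANT02",
              "ANT02", "ETE01"),
  Antenna = c("1", "2", "1", "2", "1", "2", "1", "2", "AH0"),
  Media_power = c(108, 94, 220, 219, 204, 172, 88, 72, 87),
  N_records = c(1704L, 1219L, 3029L, 2711L, 2289L, 1477L, 913L, 1080L, 1L)),
  row.names = c(NA, -9L),
  class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
  groups = structure(list(Date = structure(c(17984, 17984, 17985, 17985,
    17985), class = "Date"), Station = c("ANT01", "ANT02", "ANT01", "ANT02",
    "ETE01"), .rows = list(1:2, 3:4, 5:6, 7:8, 9L)),
    row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"),
    .drop = TRUE))

# as.data.frame() drops the grouped_df/tbl_df/tbl classes; the "groups"
# attribute is dplyr's grouping metadata and is removed by hand
test <- as.data.frame(x)
attr(test, "groups") <- NULL

# Optional: match the column name used in the table and the prose
names(test)[names(test) == "Media_power"] <- "Mean_power"
str(test)
```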
> > >
> > > --
> > > Raoni Rosa Rodrigues
> > > Research Associate of Fish Transposition Center CTPeixes Universidade
> > > Federal de Minas Gerais - UFMG Brasil rodrigues.raoni using gmail.com
> > >
> > >
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
>


-- 
Raoni Rosa Rodrigues
Research Associate of Fish Transposition Center CTPeixes
Universidade Federal de Minas Gerais - UFMG
Brasil
rodrigues.raoni using gmail.com



