[R] find multiple mode, sorry for not providing enough information

Abby Spurdle
Mon Mar 16 21:56:51 CET 2020


(Sorry, that was supposed to go to the mailing list).

Here's a solution to your original question:
---------
freq <- c (1,2,5,5,10,4,4,8,1,1,8,8,2,4,3,1,2,1,1,138,149,14,1,1)

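unique.consecutive <- function (x)
{       # diff (x) has length (x) - 1 elements, so pad with TRUE
        # to keep the first value of every run (including x [1])
        dx <- diff (x)
        x [c (TRUE, dx != 0)]
}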

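which.maxs <- function (x, ..., include.endpoints=FALSE)
{       # returns the indices of the strict local maxima of x
        if (length (x) < 2)
                stop ("function needs at least two values")
        dx <- diff (x)
        if (any (dx == 0) )
                stop ("function needs unique-consecutive values")
        ndx <- length (dx)
        # an interior point is a maximum if x rises into it and falls after it
        I <- c (FALSE, dx [-ndx] > 0 & dx [-1] < 0, FALSE)
        if (include.endpoints)
        {       # an endpoint counts if its single neighbour is lower
                I [1] <- (dx [1] < 0)
                I [ndx + 1] <- (dx [ndx] > 0)
        }
        which (I)
}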

freq.sub <- unique.consecutive (freq)
maxv <- freq.sub [which.maxs (freq.sub, include.endpoints=TRUE)]

maxv           # 10 8 8 4 2 149
unique (maxv)  # 10 8 4 2 149
---------
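
To turn the maxima back into alleles, the positions in the de-duplicated vector have to be mapped back to positions in the original data. Here is a minimal sketch of one way to do that with rle(), assuming the Var1 and Freq values from the quoted tem4 output and assuming that the first allele of a run of equal frequencies is the one of interest. The names run_start, peak_alleles and labels are made up for this illustration, and the short/long labelling is only my reading of the conversion described in the quoted message.

```r
# Data copied from the quoted tem4 output
var1 <- c(1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
          20, 21, 22, 23, 24, 25, 31)
freq <- c(1, 2, 5, 5, 10, 4, 4, 8, 1, 1, 8, 8, 2, 4, 3, 1, 2, 1, 1,
          138, 149, 14, 1, 1)

# Collapse runs of equal frequencies, remembering where each run starts
runs      <- rle(freq)
run_start <- cumsum(c(1, head(runs$lengths, -1)))
v         <- runs$values

# Strict local maxima of v; the -Inf padding lets endpoints qualify too
peaks <- which(diff(sign(diff(c(-Inf, v, -Inf)))) == -2)

# Allele at the start of each peak run (assumption: first allele of a run)
peak_alleles <- var1[run_start[peaks]]
peak_alleles   # 6 9 12 15 18 23

# One short/long labelling per cut point (hypothetical step):
# alleles below the cut point are "short", the rest are "long"
labels <- sapply(peak_alleles,
                 function(cutoff) ifelse(var1 < cutoff, "short", "long"))
colnames(labels) <- paste0("cut", peak_alleles)
```

Each column of labels corresponds to one candidate cut point, so the number of columns is the number of converted files that would be needed.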

Some comments:

My package, probhat, contains early prototype-quality functions for
discrete kernel smoothing.
These can be used to "smooth" frequency data, which in turn can
eliminate spurious modes.

https://cran.r-project.org/web/packages/probhat/vignettes/probhat.pdf

Unfortunately, bandwidth selection is manual.
Also note that currently it only returns probability mass (not
frequency), but it's very easy to get frequency from probability
mass (multiply by the total count).
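
To illustrate the general idea without relying on the package, here is a rough base-R sketch of discrete kernel smoothing. It is not probhat's API or algorithm: the function smooth_pmass, the triangular kernel and the bandwidth h = 3 are all assumptions chosen for the example, and the data are the Var1 and Freq values from the quoted tem4 output.

```r
# Data copied from the quoted tem4 output
var1 <- c(1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
          20, 21, 22, 23, 24, 25, 31)
freq <- c(1, 2, 5, 5, 10, 4, 4, 8, 1, 1, 8, 8, 2, 4, 3, 1, 2, 1, 1,
          138, 149, 14, 1, 1)

# Hypothetical helper: smoothed probability mass on an integer grid.
# x = observed values, f = their frequencies, h = manually chosen bandwidth
smooth_pmass <- function(x, f, h, grid = min(x):max(x)) {
  # Triangular kernel: weight falls linearly to zero at distance h
  w <- outer(grid, x, function(g, xi) pmax(0, 1 - abs(g - xi) / h))
  s <- as.vector(w %*% f)   # kernel-weighted frequency at each grid point
  s / sum(s)                # normalise so the masses sum to one
}

grid <- min(var1):max(var1)
pm   <- smooth_pmass(var1, freq, h = 3)

# Probability mass back to (smoothed) frequency: multiply by the total count
freq_smooth <- pm * sum(freq)
```

A larger h removes more of the small peaks, at the cost of blurring the large one, so in practice one would try a few values and look at the result.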

I'm planning to resume work on this package in two to three days, so
I'm open to suggestions...
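
The quoted message below asks for a minimal rise and drop on both sides of a candidate maximum. Here is one possible sketch of that, assuming rise and drop are measured from the nearest local minimum on each side, and assuming a peak with no minimum on one side has a rise or drop of zero. The function filter_peaks and the thresholds of 3 are made up for this example; they are not taken from any package.

```r
freq <- c(1, 2, 5, 5, 10, 4, 4, 8, 1, 1, 8, 8, 2, 4, 3, 1, 2, 1, 1,
          138, 149, 14, 1, 1)

# Drop consecutive repeats, keeping the first value of each run
v <- freq[c(TRUE, diff(freq) != 0)]

# Hypothetical helper: indices of local maxima of v whose rise from the
# nearest valley on the left and drop to the nearest valley on the right
# are both at least the given thresholds
filter_peaks <- function(v, min_rise, min_drop) {
  stopifnot(all(diff(v) != 0))   # needs unique-consecutive values
  # Padding with -Inf / Inf lets endpoints count as peaks / valleys
  peaks   <- which(diff(sign(diff(c(-Inf, v, -Inf)))) == -2)
  valleys <- which(diff(sign(diff(c( Inf, v,  Inf)))) ==  2)
  keep <- vapply(peaks, function(p) {
    left  <- valleys[valleys < p]
    right <- valleys[valleys > p]
    rise  <- if (length(left))  v[p] - v[max(left)]  else 0
    drop  <- if (length(right)) v[p] - v[min(right)] else 0
    rise >= min_rise && drop >= min_drop
  }, logical(1))
  peaks[keep]
}

kept <- filter_peaks(v, min_rise = 3, min_drop = 3)
v[kept]   # 10 8 8 149
```

With these thresholds the small peaks at 4 and 2 are dropped, which matches the peaks asked for in the quoted message, but the thresholds would need tuning for other markers.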

On 3/17/20, Yuan Chun Ding <ycding using coh.org> wrote:
> Hi Jim,
>
> Yes, you are right.  I sorted tem4$Var1 first, then looked for rising peaks
> in the Freq variable from left to right.  I guess I probably need to define
> a minimal rise and drop on both sides of a potential maximum, to avoid
> identifying really small rising peaks.  For example, I only want to identify
> the freq values freq=10 (corresponding var1=allele6),
> freq=8 (var1=allele9), freq=8 (var1=allele12), and freq=149 (var1=allele23),
> but ignore freq=4 (var1=allele15) and freq=2 (var1=allele18).
>
> I am still working on it; any help would be really appreciated.
>
> Thank you,
>
> Ding
>
> -----Original Message-----
> From: Jim Lemon [mailto:drjimlemon using gmail.com]
> Sent: Monday, March 16, 2020 1:10 AM
> To: Yuan Chun Ding; r-help mailing list
> Subject: Re: [R] find multiple mode, sorry for not providing enough
> information
>
> Hi Ding,
> While I was completely off the track in my first reply, the subsequent posts
> make your problem somewhat clearer. The way you state the problem suggests
> that the order of the values of "freq" is important.
> That is, it is not just a matter of finding local maxima, but the direction
> in which you approach those maxima is important. For example, I might want
> to only identify maxima with at least four monotonically increasing values
> preceding them and a decrease of at least half the value of the maximum in
> the succeeding value. By breaking down the problem into a set of criteria,
> these can be implemented in a function that will search the values in one
> direction, returning the locations of maxima that fulfil those criteria.
>
> Jim
>
> On Mon, Mar 16, 2020 at 3:11 PM Yuan Chun Ding <ycding using coh.org> wrote:
>>
>> sorry, I just came back.
>>
>> Yes,  Abby's understanding is right.
>>
>> > tem4$Var1
>>  [1]  1  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 20 21 22 23 24 25 31
>> > tem4$Freq
>>  [1]   1   2   5   5  10   4   4   8   1   1   8   8   2   4   3   1   2   1   1 138 149  14   1   1
>>
>> I have 2000 markers; this is just one example marker.  Var1 is a VNTR
>> marker with alleles 1, 3, 4, etc., a multi-allele marker; the corresponding
>> frequency for each allele is 1, 2, 5, etc.  I want to convert this
>> multi-allele marker to bi-allele markers by choosing a cutoff value; I
>> would want the cut point to be allele 6 with frequency of 10, so patients
>> with allele 1 to allele 5 are considered as carrying the "short" allele,
>> and allele 6 to 31 the "long" allele; then sliding to the next rising
>> frequency peak, allele 8 with frequency of 8, etc.
>>
>> Maybe those rising peaks are not really multiple modes, but I want to do
>> this type of data conversion.  I want to first determine the number of
>> modes, m, then convert the input data file into m different input files,
>> then perform Cox regression analysis for each converted file.  I am stuck
>> at the step of finding the m rising peaks.
>>
>> Thank you,
>>
>> Ding
>>
>