[R] two difficult loop

greg holly mak.hholly at gmail.com
Mon Jun 13 04:41:06 CEST 2016


Hi Jim,

Thanks so much for this information. I did not know this, as I am very new
to R. Do you think it would be better to use !duplicated() rather than
unique()?

Thanks in advance,

Greg
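
A minimal sketch of the difference being asked about. The five-row data frame `toy` is hypothetical (its values are borrowed from the sample data further down, not taken from the real 27-million-row file). unique() applied to a single column returns only the distinct values of that column, whereas indexing the data frame with !duplicated() keeps the first whole row for each value:

```r
# Hypothetical toy data; reg is kept as character so leading zeros survive.
toy <- data.frame(
  reg = c("63933", "29220", "63933", "06169", "29220"),
  p   = c(0.250, 0.430, 0.750, 0.440, 0.900),
  stringsAsFactors = FALSE
)

# Sort by reg, then p, as in the original attempt.
toy <- toy[order(toy$reg, toy$p), ]

# unique() on one column: a plain character vector, the other columns are gone.
reg_values <- unique(toy$reg)

# !duplicated(): a logical index that keeps the first row per reg value.
# After the sort above, that first row is the one with the smallest p.
first_rows <- toy[!duplicated(toy$reg), ]

print(reg_values)
print(first_rows)
```

Which of the two is preferable depends on whether only the distinct reg values are needed or the full rows that go with them.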

On Sun, Jun 12, 2016 at 7:06 PM, Jim Lemon <drjimlemon at gmail.com> wrote:

> Hi Greg,
> You've got a problem that you don't seem to have identified. Your
> "reg" field in the "map" data frame can define at most 100000 unique
> values. This means that each value will be repeated about 270 times.
> Unless there are constraints you haven't mentioned, we would expect
> that in 135 cases for each value, the values in each "ref" row will be
> in the reverse order and the spans may overlap. I notice that you may
> have tried to get around this by sorting the "map" data frame, but
> then the order of the rows is different, and the number of rows
> "between" any two values changes. Apart from this, it is almost
> certain that the number of values of "p > 0.85" in the multiple runs
> between each set of "ref" values will be different. It is possible to
> perform both tasks that you mention, but only the second will yield a
> unique or tied value for all of the cases. So your result data frame
> will have an unspecified number of values for each row in "ref" for
> the first task.
>
> Jim
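
A small sketch of the ambiguity described above. The six-row data frame `toy_map` is hypothetical (values borrowed from the sample data), and "the span between two reg values" is assumed here to mean the rows from one match position to the other in the current row order:

```r
# Hypothetical toy data in which the reg value 63933 occurs twice.
toy_map <- data.frame(
  reg = c("29220", "26441", "63933", "06169", "63933", "41838"),
  p   = c(0.430, 0.880, 0.250, 0.440, 0.750, 0.960),
  stringsAsFactors = FALSE
)

# A repeated reg value has more than one row position,
# so "the row for 63933" is not well defined.
pos_63933 <- which(toy_map$reg == "63933")

# Number of rows spanned from the row of `from` to each row of `to`.
span_sizes <- function(d, from, to) {
  start <- which(d$reg == from)[1]   # first match only (an assumption)
  ends  <- which(d$reg == to)
  abs(ends - start) + 1
}

# Original row order: two candidate spans of different length.
spans_original <- span_sizes(toy_map, "29220", "63933")

# After sorting by reg and p, the same pair spans a different number of rows.
sorted_map   <- toy_map[order(toy_map$reg, toy_map$p), ]
spans_sorted <- span_sizes(sorted_map, "29220", "63933")

print(pos_63933)
print(spans_original)
print(spans_sorted)
```

Any solution therefore has to state which occurrence of a repeated value it uses and which row order it works in.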
>
>
> On Mon, Jun 13, 2016 at 6:14 AM, greg holly <mak.hholly at gmail.com> wrote:
> > Dear all,
> >
> > I have two data sets, map and ref. A small part of each data set is
> > given below. map has more than 27 million rows and ref has about 560
> > rows. I need to run two different tasks. My R code for these tasks is
> > given below, but it does not work properly.
> >
> > I sincerely appreciate your help.
> >
> > Regards,
> >
> > Greg
> >
> > Task 1)
> >
> > For example, the first and second columns of row 1 in ref are 29220 and
> > 63933. I need R code that first looks at the first row of ref (29220 and
> > 63933), then sums the column "map$rate" and gives the number of rows that
> > are >0.85. It should then do the same for the second, third, ... rows of
> > ref. At the end I would like a table like the one given below (the
> > results I need). Please note that all values specified in ref exist in
> > the map$reg column.
> >
> > Task 2)
> >
> > Again, for example, the first and second columns of row 1 in ref are
> > 29220 and 63933. I need R code that gives the minimum map$p for the
> > 29220-63933 interval in map, and then does the same for the second,
> > third, ... rows of ref.
> >
> > #my attempt for the first question
> >
> > temp<-map[order(map$reg, map$p),]
> > count<-1
> > temp<-unique(temp$reg
> > for(i in 1:length(ref) {
> >   for(j in 1:length(ref)
> >   {
> > temp1<-if (temp[pos[i]==ref[ref$reg1,] & (temp[pos[j]==ref[ref$reg2,] & temp[cumsum(temp$rate)>0.70,])
> > count=count+1
> >     }
> > }
> >
> > #my attempt for the second question
> >
> > temp<-map[order(map$reg, map$p),]
> > count<-1
> > temp<-unique(temp$reg
> > for(i in 1:length(ref) {
> >   for(j in 1:length(ref)
> >   {
> > temp2<-if (temp[pos[i]==ref[ref$reg1,] & (temp[pos[j]==ref[ref$reg2,])
> > output<-temp2[temp2$p==min(temp2$p),]
> >     }
> > }
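
One way both tasks could be sketched, using the sample rows posted in this thread. Several things here are assumptions rather than anything confirmed in the thread: the interval for a ref row is taken to be every map row from the first occurrence of reg1 to the first occurrence of reg2 in the original row order (inclusive, in either direction); task 1 counts rows with p > 0.85, following Jim's reading of the question; and the helper name `summarise_ref` is hypothetical. The rate column is left out because it is not used under this reading. ref values that do not occur in map give NA.

```r
# Sample rows from the post; reg is character so leading zeros are kept.
map <- data.frame(
  reg = c("10276", "71608", "29220", "99542", "26441", "95082", "36169",
          "55572", "65255", "51960", "55652", "63933", "35170", "06491",
          "85508", "86666", "04758", "06169", "63933", "41838"),
  p   = c(0.700, 0.830, 0.430, 0.220, 0.880, 0.090, 0.480, 0.500, 0.300,
          0.970, 0.388, 0.250, 0.720, 0.370, 0.470, 0.580, 0.810, 0.440,
          0.750, 0.960),
  stringsAsFactors = FALSE
)
ref <- data.frame(
  reg1 = c("29220", "26441", "06169", "74806", "85508"),
  reg2 = c("63933", "41838", "10276", "92643", "95082"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one output row per row of ref.
summarise_ref <- function(map, ref, cutoff = 0.85) {
  # match() returns the position of the FIRST occurrence, or NA if absent.
  i1 <- match(ref$reg1, map$reg)
  i2 <- match(ref$reg2, map$reg)
  n     <- rep(NA_integer_, nrow(ref))
  min_p <- rep(NA_real_, nrow(ref))
  for (k in seq_len(nrow(ref))) {
    if (is.na(i1[k]) || is.na(i2[k])) next   # value not found in map
    rows <- seq(min(i1[k], i2[k]), max(i1[k], i2[k]))
    n[k]     <- sum(map$p[rows] > cutoff)    # task 1: count above cutoff
    min_p[k] <- min(map$p[rows])             # task 2: smallest p in the span
  }
  data.frame(ref, n = n, min_p = min_p)
}

result <- summarise_ref(map, ref)
print(result)
```

The loop runs over the rows of ref only (about 560 in the real data), so the size of map mainly affects the two match() calls and the row slices. The counts in the "results I need" table presumably come from the full data, so they are not expected to match what these 20 sample rows produce.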
> >
> >
> >
> > Data sets
> >
> >   Data = map
> >
> >   reg    p      rate
> >   10276  0.700  3.867e-18
> >   71608  0.830  4.542e-16
> >   29220  0.430  1.948e-15
> >   99542  0.220  1.084e-15
> >   26441  0.880  9.675e-14
> >   95082  0.090  7.349e-13
> >   36169  0.480  9.715e-13
> >   55572  0.500  9.071e-12
> >   65255  0.300  1.688e-11
> >   51960  0.970  1.163e-10
> >   55652  0.388  3.750e-10
> >   63933  0.250  9.128e-10
> >   35170  0.720  7.355e-09
> >   06491  0.370  1.634e-08
> >   85508  0.470  1.057e-07
> >   86666  0.580  7.862e-07
> >   04758  0.810  9.501e-07
> >   06169  0.440  1.104e-06
> >   63933  0.750  2.624e-06
> >   41838  0.960  8.119e-06
> >
> >   Data = ref
> >
> >   reg1   reg2
> >   29220  63933
> >   26441  41838
> >   06169  10276
> >   74806  92643
> >   73732  82451
> >   86042  93502
> >   85508  95082
> >
> >   The results I need
> >
> >   reg1   reg2     n
> >   29220  63933   12
> >   26441  41838   78
> >   06169  10276  125
> >   74806  92643   11
> >   73732  82451   47
> >   86042  93502   98
> >   85508  95082  219
> >
> >         [[alternative HTML version deleted]]
> >
>
