[R] how to complete this task on data management

Petr Pikal petr.pikal at precheza.cz
Wed Aug 23 15:03:57 CEST 2006


Hi

This is a little bit more precise. My suggestion works with unordered 
data and finds the row index of the second item lower than a threshold.

which(diff(cumsum(diff(data<3.5)==1)<2)!=0)+2
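A quick check of the expression above, as a sketch only. The vector `x` is made up for illustration (it is not from the thread) and stands in for `data`; note that it starts above the threshold, so each value below 3.5 is preceded by one above it.

```r
# Hypothetical unordered vector (not from the thread); starts above 3.5
x <- c(5, 1, 4, 2, 6)

below <- x < 3.5              # FALSE TRUE FALSE TRUE FALSE
rises <- diff(below) == 1     # TRUE where a value below 3.5 follows one that is not
nrise <- cumsum(rises)        # running count of such rises: 1 1 2 2

# Position where the count first reaches 2, shifted back to an index into x
idx <- which(diff(nrise < 2) != 0) + 2
idx                           # 4, i.e. x[4] is the second value below 3.5
```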

However, with ordered data you need to modify it slightly:

which(diff(cumsum(diff(data<3.5)!=0)<2)!=0)+2
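To see why the change matters, here is a sketch that runs both versions on the data from the question. It uses a plain vector `x` in place of the `data` data frame, which is an assumption made only to keep the intermediate steps easy to print.

```r
# Data from the original question, as a plain vector instead of a data frame
x <- c(1:5, 1, 2, 3)

below <- x < 3.5   # TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE

# Counting only upward crossings (FALSE -> TRUE): just one occurs here,
# so the count never reaches 2 and nothing is returned
first_try <- which(diff(cumsum(diff(below) == 1) < 2) != 0) + 2
length(first_try)  # 0

# Counting every crossing: the second one marks the start of the
# trailing run of values below 3.5
idx <- which(diff(cumsum(diff(below) != 0) < 2) != 0) + 2
idx                # 6

# Keep only the values before that index
x[seq_len(idx - 1)]  # 1 2 3 4 5
```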

I bet there is some other solution 
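One possible alternative, sketched here with base R's `rle()` (run-length encoding) rather than the `diff`/`cumsum` construction above. It is only an illustration: the names `runs`, `n` and `start` are made up, and it assumes the values to drop form the last run of the vector.

```r
# Data from the original question, as a plain vector
x <- c(1:5, 1, 2, 3)

runs <- rle(x < 3.5)   # lengths 3 2 3, values TRUE FALSE TRUE
n <- length(runs$lengths)

# If the last run is below the threshold, it starts right after all earlier runs;
# otherwise there is no trailing run to drop
start <- if (runs$values[n]) sum(runs$lengths[-n]) + 1 else NA
start                  # 6

# Keep only the values before the trailing run
x[seq_len(start - 1)]  # 1 2 3 4 5
```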

HTH
Petr



On 23 Aug 2006 at 19:23, zhijie zhang wrote:

Date sent:      	Wed, 23 Aug 2006 19:23:49 +0800
From:           	"zhijie zhang" <epistat at gmail.com>
To:             	"Petr Pikal" <petr.pikal at precheza.cz>
Subject:        	Re: [R] how to complete this task on data management

> Dear friends,
>  I'd like to explain it clearly
>   x
> 1 1
> 2 2
> 3 3
> 4 4
> 5 5
> 6 1
> 7 2
> 8 3
> I want to retain the first part of the dataset (1,2,3,4,5) if the
> consecutive values (1,2,3) in the latter part of the dataset are less
> than 3.5. In fact, I want to know the row index (it's 6 in this
> dataset) of the first of those values less than 3.5. My dataset is
> very large, so I need to find the index automatically. My idea is:
> First: find the consecutive values in the latter part of the dataset
> that are less than a certain value; here it's 3.5.
>   X
> 6 1
> 7 2
> 8 3
> 
> Second: identify the index (here it's 6) that corresponds to the
> first value in the latter part of the dataset
>   X
> 6 1
> Finally, select the first (index-1) rows (6-1=5).
>   x
> 1 1
> 2 2
> 3 3
> 4 4
> 5 5
>
> Thanks very much.
> 
> 
> On 8/23/06, Petr Pikal <petr.pikal at precheza.cz> wrote:
> >
> > Hi
> >
> > I am not sure what you really want. If you are trying to preserve
> > the first part of your object, just exclude it from the operation, e.g.
> >
> > data[-(1:5),] will exclude the first five rows from your data frame.
> >
> > However, it is unclear what you want to do next. Instead of the three
> > items, do you want to add only one different item?
> >
> > data.frame(x=c(data[(1:5),],6))
> >
> > or another vector
> >
> > data.frame(x=c(data[(1:5),],some.other.data))
> >
> > The following, probably too complicated, construction tells you the
> > position of the second value lower than some threshold (in this
> > case 3.5) in a vector.
> >
> > which(diff(cumsum(diff(data<3.5)==1)<2)!=0)+2
> >
> > HTH
> > Petr
> >
> >
> >
> > On 23 Aug 2006 at 11:23, zhijie zhang wrote:
> >
> > Date sent:              Wed, 23 Aug 2006 11:23:03 +0800
> > From:                   "zhijie zhang" <epistat at gmail.com>
> > To:                     R-help at stat.math.ethz.ch
> > Subject:                [R] how to complete this task on data
> > management
> >
> > > Dear friends,
> > >  While cleaning my dataset, I ran into a difficulty.
> > >  Suppose my data set is:
> > > > data<-data.frame(x=c(1:5,1,2,3))
> > > > data
> > >   x
> > > 1 1
> > > 2 2
> > > 3 3
> > > 4 4
> > > 5 5
> > > 6 1
> > > 7 2
> > > 8 3
> > > Now I need to add the data that are less than 3.5 at the bottom,
> > > not including the top data, so the result should be:
> > >   x
> > > 1 1
> > > 2 2
> > > 3 3
> > > 4 4
> > > 5 5
> > > 6 6
> > > I tried to use "data[data$x>3.5,]" to do it, but it also deletes
> > > the first several numbers. How can I finish it? Thanks very much.
> > > -- Kind Regards, Zhi Jie,Zhang
> > >
> >
> > Petr Pikal
> > petr.pikal at precheza.cz
> >
> >
> 
> 
> -- 
> Kind Regards,
> Zhi Jie,Zhang ,PHD
> Department of Epidemiology
> School of Public Health
> Fudan University
> Tel:86-21-54237149
> 

Petr Pikal
petr.pikal at precheza.cz


