[R] for loop and if problem

Philipp Pagel p.pagel at wzw.tum.de
Tue Jan 6 17:34:25 CET 2009


On Tue, Jan 06, 2009 at 07:21:48AM -0800, Sake wrote:
> I'm having difficulties with a dataset containing gene names and positions
> of those genes.
> Not such a big problem, but each gene has multiple exons so it's hard to say
> where the gene starts and where it ends. I want the starting and ending
> position of each gene in my dataset.
> Attached is the dataset:
> http://www.nabble.com/file/p21312449/genlistchrompos.csv
> Column 'B' is the gene name, 'G' is the starting position and 'H' is the
> stop position.

I don't really see how 'if' and 'for loops' are involved in the
question. You may want to give us a little more detail on what
exactly you need and what you tried unsuccessfully.  (By the way
-- there are no columns labeled 'B', 'G' or 'H' in the file).

Anyway - I believe this is what you are after:

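# 'dat' is assumed to be the posted file read into a data frame,
# e.g. dat <- read.csv('genlistchrompos.csv')
# get minimum start position by gene
aggregate(dat['Exon_Start.Chr.'], by=list(Gene=dat$Gene), FUN=min)
# get maximum stop position by gene
aggregate(dat['Exon_Stop.Chr.'], by=list(Gene=dat$Gene), FUN=max)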

Of course, these will only reflect the real start and stop
coordinates of the gene if ALL exons are given in the file.
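
If you would rather have both coordinates in one table, something along
these lines might do. This is only a sketch on toy data: the gene names and
positions below are made up for illustration, and only the column names are
taken from your file.

```r
# Toy data standing in for the posted file; all values are invented.
dat <- data.frame(
  Gene            = c("geneA", "geneA", "geneB", "geneB", "geneB"),
  Exon_Start.Chr. = c(100, 300, 50, 400, 900),
  Exon_Stop.Chr.  = c(200, 450, 120, 600, 1000)
)

# smallest exon start and largest exon stop per gene
starts <- aggregate(dat["Exon_Start.Chr."], by = list(Gene = dat$Gene), FUN = min)
stops  <- aggregate(dat["Exon_Stop.Chr."],  by = list(Gene = dat$Gene), FUN = max)

# one row per gene holding both coordinates
gene_range <- merge(starts, stops, by = "Gene")
print(gene_range)
```

Indexing with dat["..."] instead of dat[, "..."] keeps the result a data
frame, so the output columns retain their original names.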

cu
	Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel



