[R] function on factors - how best to proceed

Karin Lagesen karin.lagesen at medisin.uio.no
Wed Sep 19 13:16:54 CEST 2007


Sorry about this one being long, and I apologise beforehand if there
is something obvious here that I have missed. I am new to creating my
own functions in R, and I am uncertain of how they work.

I have a data set that I have read into a data frame:

> gctable[1:5,]
     refseq geometry X60_origin X60_terminus  length  kingdom
1 NC_009484      cir    1790000       773000 3389227 Bacteria
2 NC_009484      cir    1790000       773000 3389227 Bacteria
3 NC_009484      cir    1790000       773000 3389227 Bacteria
4 NC_009484      cir    1790000       773000 3389227 Bacteria
5 NC_009484      cir    1790000       773000 3389227 Bacteria
                  grp feature gene begin dir gc_content replicor LEADLAG
1 Alphaproteobacteria     CDS  CDS   261   +   0.654244    RIGHT    LEAD
2 Alphaproteobacteria     CDS  CDS  1737   -   0.651408    RIGHT     LAG
3 Alphaproteobacteria     CDS  CDS  2902   +   0.607843    RIGHT    LEAD
4 Alphaproteobacteria     CDS  CDS  3693   +   0.617647    RIGHT    LEAD
5 Alphaproteobacteria     CDS  CDS  4227   +   0.699208    RIGHT    LEAD
> 

Most of these columns are factors.
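
A quick way to confirm which columns are factors is to test each column in turn. The sketch below uses a hypothetical two-row stand-in for gctable (the name `toy` and the choice of which columns are factors are assumptions for illustration, not the real data). Note that from R 4.0.0 onwards, data.frame() and read.table() no longer turn strings into factors by default, so the factors are built explicitly here.

```r
# Hypothetical miniature of gctable; values copied from the rows shown above.
# Which columns are factors is an assumption made for this example.
toy <- data.frame(
  X60_origin = factor(c("1790000", "1790000")),
  begin      = c(261L, 1737L),
  replicor   = factor(c("RIGHT", "RIGHT"))
)

# One logical entry per column: TRUE where the column is a factor.
is_fac <- sapply(toy, is.factor)
print(is_fac)

# str() gives a similar overview, including the first few level codes.
str(toy)
```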

Now, I have a function that I would like to employ on this data
frame. Right now I cannot get it to work, and that seems to be due to
the columns in the data frame being factors. I tested it with a data
frame created from vectors, and it worked fine.

The function:

percentdistance <- function(origin, terminus, length, begin, replicor){
print(c(origin, terminus, length, begin, replicor))
d = 0
if (terminus>origin) {
  if(replicor=="LEFT") {
    d = -((origin-begin)%%length)
  }
  else {
    d = (begin-origin)
  }
}
else {
  if (replicor=="LEFT") {
    d=(origin-begin)
  }
  else{
    d = -((begin-origin)%%length)
  }
}
d/length*2
}

The error I get:
> percentdistance(gctable$X60_origin, gctable$X60_terminus, gctable$length, gctable$begin, gctable$replicor)
    [1]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [19]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [37]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [55]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [73]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [91]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
  [109]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
  [127]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
.....
[99919]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
[99937]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
[99955]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
[99973]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
[99991]   2   2   2   2   2   2   2   2   2
 [ reached getOption("max.print") -- omitted 8526091 entries ]]
Error in if (terminus > origin) { : missing value where TRUE/FALSE needed
In addition: Warning messages:
1: > not meaningful for factors in: Ops.factor(terminus, origin) 
2: the condition has length > 1 and only the first element will be used in: if (terminus > origin) { 
> 
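
The runs of 87 and 2 printed above look like factor level codes rather than genome coordinates (an assumption, since the levels themselves are not shown). As a minimal sketch of why that matters, the example below builds a small factor from values in the table and compares its internal codes with the numbers its labels spell out. It also shows that comparing unordered factors with `>` yields NA, which is what `if` then rejects. The variable names are made up for the illustration.

```r
# A factor whose labels look like numbers (values taken from the table above).
f <- factor(c("1790000", "773000", "1790000"))

codes  <- as.integer(f)                # internal level codes, not the values
values <- as.numeric(as.character(f))  # the numbers the labels spell out

print(codes)
print(values)

# ">" is not defined for unordered factors: the result is NA (with a warning),
# and if() cannot branch on NA.
cmp <- suppressWarnings(f > f)
print(cmp)
```

Going through as.character() first is the usual way to recover the original numbers. Calling as.numeric() directly on a factor returns the codes.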

This worked nicely when the inputs were columns from a data frame
created from vectors.

I have also tried the different apply-functions, although I am
uncertain of which one would be appropriate here.
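
For a function that handles one row's worth of values at a time, mapply() is the apply variant that walks several vectors in parallel. The sketch below uses a hypothetical helper, `signed_gap`, and made-up numbers. It is only meant to show the calling pattern, not the real distance calculation.

```r
# Hypothetical scalar helper: expects exactly one value per argument.
signed_gap <- function(origin, begin, replicor) {
  if (replicor == "LEFT") origin - begin else begin - origin
}

# Made-up inputs, one element per "row".
origin   <- c(100, 100, 100)
begin    <- c(40, 130, 90)
replicor <- c("LEFT", "RIGHT", "RIGHT")

# mapply() calls signed_gap once per position, pairing up the i-th elements.
gaps <- mapply(signed_gap, origin, begin, replicor)
print(gaps)
```

Any factor columns would still need converting to plain numbers or strings before being passed in this way.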


I would like to use this function to create a new data frame which
would look something like this:

new_frame = (gctable$feature, gctable$gene, gctable$kingdom, gctable$grp, gctable$gc_content, percentdistance(gctable))

I am uncertain of how to proceed. Should I deconstruct the data frame
within the function, or should I get just the numbers out of the
factors and input that into the function? Or is my solution way off
from how things are done in R?
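
One possible direction, sketched under assumptions: convert the factor columns to numbers inside the function and replace the nested if/else with ifelse(), so that whole columns can be passed at once. The names `percentdistance_vec`, `num`, `toy` and `pdist` are invented for this illustration, and the sketch assumes the factor labels are plain numbers.

```r
# Vectorised variant; the branch logic mirrors percentdistance above.
percentdistance_vec <- function(origin, terminus, length, begin, replicor) {
  # Assumption: factor labels are numeric text, so convert via as.character().
  num <- function(x) as.numeric(as.character(x))
  origin   <- num(origin)
  terminus <- num(terminus)
  length   <- num(length)
  begin    <- num(begin)
  left     <- as.character(replicor) == "LEFT"

  # ifelse() picks a value element by element instead of branching once.
  d <- ifelse(terminus > origin,
              ifelse(left, -((origin - begin) %% length), begin - origin),
              ifelse(left, origin - begin, -((begin - origin) %% length)))
  d / length * 2
}

# Hypothetical two-row stand-in for gctable, using values from the table above.
toy <- data.frame(
  X60_origin   = factor(c("1790000", "1790000")),
  X60_terminus = factor(c("773000", "773000")),
  length       = c(3389227, 3389227),
  begin        = c(261, 1737),
  replicor     = factor(c("RIGHT", "RIGHT")),
  gc_content   = c(0.654244, 0.651408)
)

# Build the new frame from existing columns plus the computed one.
new_frame <- data.frame(
  gc_content = toy$gc_content,
  pdist      = percentdistance_vec(toy$X60_origin, toy$X60_terminus,
                                   toy$length, toy$begin, toy$replicor)
)
print(new_frame)
```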

Thank you very much for your help!

Karin
-- 
Karin Lagesen, PhD student
karin.lagesen at medisin.uio.no
http://folk.uio.no/karinlag


