[R] help with column substaction with a twist
john.archie.mckown at gmail.com
Sun Jul 20 17:36:05 CEST 2014
On Sun, Jul 20, 2014 at 6:33 AM, Ubernerdy <ubernerdy at gmail.com> wrote:
> Hello guys!
> I have been messing around with R for a while now, but this situation has
> me a bit stumped. I was unable to solve it by reading documentation.
> So I have this table (currently in Excel - could export it as csv) with
> values in 3 columns. Let's call them value P (for position), value B
> (beginning) and E (end). P represents the position of a mutation in the
> genome, B and E are the beginnings and ends of a nearby gene that either
> contains the mutation or not.
> I am trying to compute the distance between the mutation and the gene.
> If the mutation is contained in the gene, that is value P is greater than B
> and lesser than E, the result is 0.
> If the mutation is "left" of the gene, the distance is negative and is
> equal to P-B.
> If the mutation is "right" of the gene, the distance is positive and is
> equal to P-E.
> How would i achieve this in R?
> Regards and thanks, S.
Jim has given you the answer in R. So why am I "shooting off my
mouth?". Well, good question! <grin/>
But I thought it might be a bit helpful to mention a couple of things.
First, you can do this same calculation in Excel by using the
=IF(logical-expression,true-expression,false-expression) in another
column on the spreadsheet. Just in case you wanted to do it in Excel
in addition to, or instead of, R. And, like the ifelse() function in
R, you can use a "nested" IF() within either, or both, of the value
expressions of the =IF. Something like:
where "P", "B", and "E" are replaced by the proper cell names. Of
course, the plus in R is that you can create the distance values "all
at once" because R can work with vectors. Whereas in Excel, you'll
need to put the =IF expression in every cell in the appropriate
column. I mention it only for completeness, just in case you need the
data in the spreadsheet also.
Secondly, you mentioned exporting to a csv. I would _guess_ that you
are using Windows, not a Mac nor Linux. If so, then built into Windows
is an ODBC driver for reading (and writing) from (to) an Excel
spreadsheet. Most users don't know much about ODBC, but it is there.
You might be interested:
I'm not where I can work my way through it because I'm at home on a
Linux machine. But if you need some step-by-step, I'll be glad to
_try_ on Monday when I'm back in the office. I just set up an ODBC for
accessing MS-SQL and the process is similar with Excel. Or you might
search YouTube, which is where I found out about this, albeit not
using R. On http://Youtube.com try searching on "set up odbc
connections excel" But I didn't really see too many which were
relevant. Most were how to set up ODBC to allow Excel to read from
some other database.
Anyway, once this is done, you could then read the Excel data using
the RODBC driver. Something like:
ch <- odbcConnect("Excel"); # The name of the ODBC connection you made above
data <- sqlFetch(ch,"SheetName"); # read all the data in the Excel
workbook in the tab named "SheetName"
# please forgive any errors - not where I can test today
Another possibility is to use an "R native" access via either openxlsx
or XLConnect. I use openxlsx myself, not because it is "better", but
because XLConnect uses Apache POI which means using Java which means
it is not as fast. Well, at least in my assumptions XLConnect must be
slower due to Java. I have not done any performance or reliability
testing. I have done some functionality testing and both seem to work
on my test Excel spreadsheet. In addition, these packages can also
create or replace an Excel spreadsheet by rewriting the ???.xlsx file.
Note this is not an "update in place" like with a real database.
Honestly, for me, these latter are easier to understand.
There is nothing more pleasant than traveling and meeting new people!
More information about the R-help