[R] working on a data frame

Matthew mccormack at molbio.mgh.harvard.edu
Mon Jul 28 20:26:26 CEST 2014


Thank you very much Peter, Bill and Petr for some great and quite 
elegant solutions. There is a lot I can learn from these.

     Yes, Bill, to your question about the raw numbers: they are counts 
and they cannot be negative. The data are RNA sequencing data in which 
approximately 32,000 genes are measured for changes between two 
conditions. Some genes are not present (cannot be measured) in the 
first condition but are present in the second, and the reverse is also 
true: some genes are present initially and then absent in the second 
condition (these are often the most interesting genes). This makes it 
difficult to compare the changes of all genes mathematically, so it is 
common practice to change the 0's to 1's and then redo the log2. A count 
of 1 is considered sufficiently small; actually, anything up to 3 or 5 
could be due to 'background noise' in the measurement process, but the 
cutoff is somewhat arbitrary.
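A minimal R sketch of the "change the 0's to 1's, then redo the log2" step described above. The vectors `cond1` and `cond2` and their counts are hypothetical toy values, and the sketch assumes whole-number, non-negative counts, for which `pmax(x, 1)` turns each 0 into 1 and leaves every other count unchanged.

```r
# Hypothetical toy counts for five genes under two conditions.
cond1 <- c(0, 10, 8, 0, 50)
cond2 <- c(12, 20, 0, 0, 25)

# Naive log2 ratio: a zero count gives Inf, -Inf or NaN.
naive <- log2(cond2 / cond1)

# Replace zero counts by 1 (assumes whole-number counts >= 0),
# then redo the log2 ratio.
fixed <- log2(pmax(cond2, 1) / pmax(cond1, 1))

print(naive)  # Inf 1 -Inf NaN -1
print(fixed)  # 3.584963 1 -3 0 -1
```

The non-negativity assumption matters: `pmax(x, 1)` would also raise a fractional value such as 0.5 to 1, so it only matches "replace 0 by 1" for whole-number counts.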

Matthew

On 7/28/2014 2:43 AM, PIKAL Petr wrote:
> Hi
>
> I like to use logical values directly in computations if possible.
>
> yourData[,10] <- yourData[,9]/(yourData[,8]+(yourData[,8]==0))
>
> Logical values are automagically coerced to FALSE = 0 and TRUE = 1 in arithmetic, so they can be used directly in computations. If you really want to change 0 to 1 in column 8, you can use
>
> yourData[,8]  <-  yourData[,8]+(yourData[,8]==0)
>
> without ifelse stuff.
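A small toy illustration of the coercion Petr relies on; the vectors `den` and `num` are hypothetical stand-ins for columns 8 and 9.

```r
den <- c(0, 2, 0, 5)   # stands in for column 8
num <- c(3, 4, 7, 10)  # stands in for column 9

# In arithmetic, FALSE counts as 0 and TRUE as 1, so adding (den == 0)
# raises only the zero entries to 1 and leaves the rest untouched.
den_fixed <- den + (den == 0)
ratio <- num / (den + (den == 0))

print(den_fixed)      # 1 2 1 5
print(ratio)          # 3 2 7 2
print(sum(den == 0))  # 2: summing a logical vector counts the TRUEs
```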
>
> Regards
> Petr
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of William Dunlap
>> Sent: Friday, July 25, 2014 8:07 PM
>> To: Matthew
>> Cc: r-help at r-project.org
>> Subject: Re: [R] working on a data frame
>>
>>> if
>>> yourData[,8]==0,
>>> then
>>> yourData[,8]==1, yourData[,10] <- yourData[,9]/yourData[,8]
>> You could express this in R as
>>     is8Zero <- yourData[,8] == 0
>>     yourData[is8Zero, 8] <- 1
>>     yourData[is8Zero, 10] <- yourData[is8Zero,9] / yourData[is8Zero,8]
>> Note how logical (Boolean) values are used as subscripts - read the '['
>> as 'such that' when using logical subscripts.
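Bill's three lines run on a hypothetical toy data frame (10 columns, made-up values) as a minimal sketch. The logical vector is computed once, before column 8 is modified, so both assignments select the same rows.

```r
# Hypothetical toy data frame: 4 rows, 10 columns (only 8-10 are used).
yourData <- as.data.frame(matrix(0, nrow = 4, ncol = 10))
yourData[, 8] <- c(0, 2, 4, 0)
yourData[, 9] <- c(3, 4, 8, 5)
yourData[, 10] <- yourData[, 9] / yourData[, 8]  # rows 1 and 4 are Inf

# Read yourData[is8Zero, 8] as "column 8 of yourData such that is8Zero".
is8Zero <- yourData[, 8] == 0
yourData[is8Zero, 8] <- 1
yourData[is8Zero, 10] <- yourData[is8Zero, 9] / yourData[is8Zero, 8]

print(which(is8Zero))  # 1 4
print(yourData[, 10])  # 3 2 2 5
```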
>>
>> There are many more ways to express the same thing.
>>
>> (I am tempted to change the algorithm to avoid the divide-by-zero
>> problem by making the quotient (numerator + epsilon)/(denominator +
>> epsilon), where epsilon is a very small number.  I am assuming that the
>> raw numbers are counts or at least cannot be negative.)
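A sketch of the epsilon idea, assuming non-negative counts. The function name `ratio_eps`, the default `eps = 1e-6` and the toy counts are illustrative choices, not values from the thread.

```r
# (numerator + eps) / (denominator + eps): with counts >= 0 and eps > 0 the
# denominator is never zero, and 0/0 becomes eps/eps = 1.
ratio_eps <- function(num, den, eps = 1e-6) {
  (num + eps) / (den + eps)
}

num <- c(12, 20, 0, 0)  # hypothetical toy counts
den <- c(0, 10, 8, 0)
r <- ratio_eps(num, den)
print(log2(r))  # all finite

# The size of eps decides how extreme the zero-count ratios become.
print(log2(ratio_eps(12, 0, eps = 1e-6)))
print(log2(ratio_eps(12, 0, eps = 1)))
```

A very small epsilon turns a zero count into a very large log2 ratio, so the choice of epsilon matters; with `eps = 1` this becomes the common "add one to every count" pseudocount.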
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>>
>> On Fri, Jul 25, 2014 at 10:44 AM, Matthew
>> <mccormack at molbio.mgh.harvard.edu> wrote:
>>> Thank you for your comments, Peter.
>>>
>>> A couple of questions.  Can I do something like the following?
>>>
>>> if
>>> yourData[,8]==0,
>>> then
>>> yourData[,8]==1, yourData[,10] <- yourData[,9]/yourData[,8]
>>>
>>>
>>> I think I am just going to have to learn more about R. I thought
>>> getting into R would be like going from Perl to Python or Java etc.,
>>> but it seems like R programming works differently.
>>>
>>> Matthew
>>>
>>>
>>> On 7/25/2014 12:06 AM, Peter Alspach wrote:
>>>> Tena koe Matthew
>>>>
>>>> "Column 10 contains the result of the value in column 9 divided by
>>>> the value in column 8. If the value in column 8==0, then the division
>>>> can not be done, so I want to change the zero to a one in order to
>>>> do the division."
>>>> That being the case, think in terms of vectors, as Sarah says.  Try:
>>>>
>>>> yourData[,10] <- yourData[,9]/yourData[,8]
>>>> yourData[yourData[,8]==0,10] <- yourData[yourData[,8]==0,9]
>>>>
>>>> This doesn't change the 0 to 1 in column 8, but it doesn't appear you
>>>> actually need to do that.
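Peter's two-step approach run on a hypothetical toy data frame (10 columns, made-up values), as a minimal sketch; it also shows that column 8 itself keeps its zeros.

```r
yourData <- as.data.frame(matrix(0, nrow = 4, ncol = 10))
yourData[, 8] <- c(0, 2, 4, 0)
yourData[, 9] <- c(3, 4, 8, 5)

# Step 1: divide every row at once (rows with a zero in column 8 give Inf).
yourData[, 10] <- yourData[, 9] / yourData[, 8]
# Step 2: dividing by 1 would just return column 9, so for rows where
# column 8 is 0, copy column 9 straight into column 10.
yourData[yourData[, 8] == 0, 10] <- yourData[yourData[, 8] == 0, 9]

print(yourData[, 10])  # 3 2 2 5
print(yourData[, 8])   # 0 2 4 0 (column 8 is left unchanged)
```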
>>>>
>>>> HTH ....
>>>>
>>>> Peter Alspach
>>>>
>>>> -----Original Message-----
>>>> From: r-help-bounces at r-project.org
>>>> [mailto:r-help-bounces at r-project.org]
>>>> On Behalf Of Matthew McCormack
>>>> Sent: Friday, 25 July 2014 3:16 p.m.
>>>> To: Sarah Goslee
>>>> Cc: r-help at r-project.org
>>>> Subject: Re: [R] working on a data frame
>>>>
>>>>
>>>> On 7/24/2014 8:52 PM, Sarah Goslee wrote:
>>>>> Hi,
>>>>>
>>>>> Your description isn't clear:
>>>>>
>>>>> On Thursday, July 24, 2014, Matthew
>>>>> <mccormack at molbio.mgh.harvard.edu <mailto:mccormack at molbio.mgh.harvard.edu>> wrote:
>>>>>       I am coming from the perspective of Excel and VBA scripts, but I
>>>>>       would like to do the following in R.
>>>>>
>>>>>        I have a data frame with 14 columns and 32,795 rows.
>>>>>
>>>>>       I want to check the value in column 8 (row 1) to see if it is a 0.
>>>>>       If it is not a zero, proceed to the next row and check the value
>>>>>       for column 8.
>>>>>       If it is a zero, then
>>>>>       a) change the zero to a 1,
>>>>>       b) divide the value in column 9 (row 1) by 1,
>>>>>
>>>>>
>>>>> Row 1, or the row in which column 8 == 0?
>>>> All rows in which the value in column 8==0.
>>>>> Why do you want to divide by 1?
>>>> Column 10 contains the result of the value in column 9 divided by the
>>>> value in column 8. If the value in column 8 == 0, then the division
>>>> cannot be done, so I want to change the zero to a one in order to do
>>>> the division. This is a fairly standard thing to do with this data.
>>>> (The data are measurements of amounts at two time points. Sometimes a
>>>> thing will not be present at the beginning (0), but very present at
>>>> the later time. Column 10 is the log2 of the change. Infinity is not
>>>> an easy number to work with, so it is common to change the 0 to a 1.
>>>> On the other hand, something may be present at time 1 but not at the
>>>> later time. In this case column 10 would be taking the log2 of a
>>>> number divided by 0, so again the zero is commonly changed to a one in
>>>> order to get a usable value in column 10. In both of the preceding
>>>> cases there was a real change, but Inf and NaN are not helpful.)
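A quick look, using made-up numbers, at how R represents the cases Matthew describes and what replacing the zero with 1 does to them.

```r
# Division by zero does not raise an error in R; it returns special values.
x <- c(5 / 0, 0 / 5, 0 / 0)
print(x)        # Inf 0 NaN
print(log2(x))  # Inf -Inf NaN

# The same ratios with the zero replaced by 1 are all finite.
y <- log2(c(5 / 1, 1 / 5, 1 / 1))
print(y)
```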
>>>>>       c) place the result in column 10 (row 1) and
>>>>>
>>>>>
>>>>> Ditto on the row 1 question.
>>>> I want to work on all rows where column 8 (and column 9) contain a
>>>> zero. Column 10 contains the result of the value in column 9 divided
>>>> by the value in column 8. So, for row 1, column 10 contains the ratio
>>>> of column 9 to column 8 in row 1, and so on through the whole 32,000
>>>> or so rows.
>>>>
>>>> Most rows do not have a zero in columns 8 or 9. Some rows have a zero
>>>> in column 8 only, and some rows have a zero in column 9 only. I want
>>>> to get rid of the zeros in these two columns and then do the division
>>>> to get a manageable value in column 10. Division by zero and Inf are
>>>> not considered 'manageable' by me.
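A sketch of that plan on a hypothetical toy data frame: zeros in columns 8 and 9 are both set to 1, then column 10 is recomputed. It assumes column 10 should hold the log2 of column 9 over column 8, as Matthew describes elsewhere in the thread; drop the `log2()` if column 10 is meant to be the plain ratio.

```r
yourData <- as.data.frame(matrix(0, nrow = 4, ncol = 10))
yourData[, 8] <- c(0, 2, 4, 0)  # zero in rows 1 and 4
yourData[, 9] <- c(8, 4, 0, 0)  # zero in rows 3 and 4

# Replace zeros by 1 in each column separately.
yourData[yourData[, 8] == 0, 8] <- 1
yourData[yourData[, 9] == 0, 9] <- 1

# Recompute column 10 for every row in one vectorized step.
yourData[, 10] <- log2(yourData[, 9] / yourData[, 8])
print(yourData[, 10])  # 3 1 -2 0
```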
>>>>> What do you want column 10 to be if column 8 isn't 0? Does it
>>>>> already have a value? I suppose it must.
>>>> Yes column 10 does have something, but this something can be Inf or
>>>> NaN, which I want to get rid of.
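One way to locate the rows that still hold Inf or NaN, sketched on a hypothetical toy data frame with made-up values: `is.finite()` is FALSE for Inf, -Inf, NaN and NA, so its negation flags exactly those rows.

```r
yourData <- as.data.frame(matrix(0, nrow = 4, ncol = 10))
yourData[, 8] <- c(0, 2, 4, 0)
yourData[, 9] <- c(3, 4, 0, 0)
yourData[, 10] <- yourData[, 9] / yourData[, 8]  # Inf 2 0 NaN

# TRUE for every row whose column 10 is not an ordinary finite number.
bad <- !is.finite(yourData[, 10])
print(which(bad))           # 1 4
print(yourData[bad, 8:10])  # the offending rows, columns 8 to 10
```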
>>>>>       d) repeat this for each of the other 32,794 rows.
>>>>>
>>>>>       Is this possible with an R script, and is this the way to go about
>>>>>       it? If it is, could anyone get me started?
>>>>>
>>>>>
>>>>> Assuming you want to put the new values in the rows where column 8
>>>>> == 0, you can do it in two steps:
>>>>>
>>>>> mydata[,10] <- ifelse(mydata[,8] == 0, mydata[,9]/whatever, mydata[,10])
>>>>> # where whatever is the thing you want to divide by, which probably isn't 1
>>>>> mydata[,8] <- ifelse(mydata[,8] == 0, 1, mydata[,8])
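A runnable version of Sarah's two `ifelse()` calls on a hypothetical toy data frame, with `whatever` set to 1 because Matthew's replies in this thread say that is the intended divisor. Column 10 is updated before column 8, since the first call needs to see the original zeros.

```r
yourData <- as.data.frame(matrix(0, nrow = 4, ncol = 10))
yourData[, 8] <- c(0, 2, 4, 0)
yourData[, 9] <- c(3, 4, 8, 5)
yourData[, 10] <- yourData[, 9] / yourData[, 8]

whatever <- 1  # assumption: the divisor Matthew describes
# ifelse() works element-wise on whole columns, so no loop is needed.
yourData[, 10] <- ifelse(yourData[, 8] == 0, yourData[, 9] / whatever,
                         yourData[, 10])
yourData[, 8] <- ifelse(yourData[, 8] == 0, 1, yourData[, 8])

print(yourData[, 8])   # 1 2 4 1
print(yourData[, 10])  # 3 2 2 5
```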
>>>>>
>>>>> R programming is best done by thinking about vectorizing things,
>>>>> rather than doing them in loops. Reading the Intro to R that comes
>>>>> with your installation is a good place to start.
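A sketch contrasting the row-by-row plan from the original question with a vectorized expression; `col8` and `col9` are hypothetical stand-ins for the two columns, filled with 1,000 made-up values.

```r
# Hypothetical stand-ins for columns 8 and 9 (1,000 rows of made-up values).
col8 <- rep(c(0, 2, 4, 0, 5), times = 200)
col9 <- rep(c(3, 4, 8, 5, 0), times = 200)
n <- length(col8)

# Row-by-row version of the original plan, as one might write it in VBA.
loop_result <- numeric(n)
for (i in seq_len(n)) {
  divisor <- if (col8[i] == 0) 1 else col8[i]
  loop_result[i] <- col9[i] / divisor
}

# Vectorized version: one expression handles every row at once.
vec_result <- col9 / ifelse(col8 == 0, 1, col8)

print(identical(loop_result, vec_result))  # TRUE
```

Both give the same numbers; the vectorized form is shorter and, in R, usually much faster on tens of thousands of rows.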
>>>> Would it be better to change the data frame into a matrix, or
>>>> something else?
>>>> Thanks for your help.
>>>>> Sarah
>>>>>
>>>>>
>>>>>       Matthew
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sarah Goslee
>>>>> http://www.stringpage.com
>>>>> http://www.sarahgoslee.com
>>>>> http://www.functionaldiversity.org
>>>>
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> The contents of this e-mail are confidential and may be subject to
>>>> legal privilege. If you are not the intended recipient you must not
>>>> use, disseminate, distribute or reproduce all or any part of this
>>>> e-mail or attachments. If you have received this e-mail in error,
>>>> please notify the sender and delete all material pertaining to this
>>>> e-mail. Any opinion or views expressed in this e-mail are those of the
>>>> individual sender and may not represent those of The New Zealand
>>>> Institute for Plant and Food Research Limited.
>>>>
> ________________________________
> This e-mail and any documents attached to it may be confidential and are intended only for its intended recipients.
> If you received this e-mail by mistake, please immediately inform its sender. Delete the contents of this e-mail with all attachments and its copies from your system.
> If you are not the intended recipient of this e-mail, you are not authorized to use, disseminate, copy or disclose this e-mail in any manner.
> The sender of this e-mail shall not be liable for any possible damage caused by modifications of the e-mail or by delay with transfer of the email.
>
> In case this e-mail forms part of business dealings:
> - the sender reserves the right to end negotiations about entering into a contract at any time, for any reason, and without stating any reasoning.
> - if the e-mail contains an offer, the recipient is entitled to immediately accept such offer; the sender of this e-mail (offer) excludes any acceptance of the offer on the part of the recipient containing any amendment or variation.
> - the sender insists that the respective contract is concluded only upon an express mutual agreement on all its aspects.
> - the sender of this e-mail informs that he/she is not authorized to enter into any contracts on behalf of the company except for cases in which he/she is expressly authorized to do so in writing, and such authorization or power of attorney is submitted to the recipient or the person represented by the recipient, or the existence of such authorization is known to the recipient or the person represented by the recipient.


