[R] Pairwise correlation

R. Michael Weylandt michael.weylandt at gmail.com
Fri Nov 18 03:35:49 CET 2011


Here's a function Josh Wiley provided in another thread:

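spec.cor <- function(dat, r, ...) {
    x <- cor(dat, ...)                       # correlation matrix of the columns of dat
    x[upper.tri(x, TRUE)] <- NA              # blank diagonal and upper triangle: each pair once
    i <- which(abs(x) >= r, arr.ind = TRUE)  # (row, col) indices of pairs with |cor| >= r
    # one row per pair: the two variable names and their correlation
    data.frame(matrix(colnames(x)[as.vector(i)], ncol = 2), value = x[i])
}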
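
For what it's worth, here is a rough sketch of how the function could be called on the five genes posted further down the thread. The object names expr and hits are just placeholders, and 0.8 is the cutoff mentioned in the message below. Because the genes sit in rows and cor() works on columns, the matrix is transposed first. The function is repeated only so the block runs on its own.

```r
# Josh Wiley's function, repeated so this block is self-contained.
spec.cor <- function(dat, r, ...) {
    x <- cor(dat, ...)
    x[upper.tri(x, TRUE)] <- NA
    i <- which(abs(x) >= r, arr.ind = TRUE)
    data.frame(matrix(colnames(x)[as.vector(i)], ncol = 2), value = x[i])
}

# The five genes posted in this thread: genes in rows, arrays in columns.
expr <- rbind(
    Fth1    = c(26016.01, 23134.66, 17445.71, 39856.04, 27245.45, 23622.98,
                37887.75, 49857.46, 25864.73, 21852.51, 29198.4),
    B2m     = c(7573.64, 7768.52, 6608.24, 8571.65, 6380.78, 6242.76,
                6903.92, 7330.63, 7256.18, 5678.21, 10937.05),
    Tmsb4x  = c(6192.44, 4277.22, 5024.59, 4851.51, 3062.55, 4562.43,
                7948.1, 5018.58, 3200.17, 2855.77, 6139.23),
    `H2-D1` = c(3141.41, 3986.06, 3328.62, 4726.6, 3589.89, 2885.95,
                7509.88, 5257.62, 4742.26, 3431.33, 5300.72),
    Prdx5   = c(3935.7, 3938.9, 3401.68, 4193.14, 4028.95, 3438.19,
                6640.15, 5486.61, 4424.57, 3368.83, 5265.92)
)
colnames(expr) <- paste0("Array", 1:11)

# cor() correlates columns, so transpose to put the genes in columns.
hits <- spec.cor(t(expr), 0.8)   # gene pairs with |Pearson r| >= 0.8
print(hits)

# A threshold of 0 keeps every distinct pair, which is a quick sanity check.
all_pairs <- spec.cor(t(expr), 0)
print(nrow(all_pairs))
```

The first two columns of the result hold the gene names of each pair and the third holds the correlation, so the labels survive without any extra bookkeeping.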

Michael

On Thu, Nov 17, 2011 at 4:08 PM, Musa Hassan <musahass at gmail.com> wrote:
> Hi Michael,
> I was able to solve this. I just used the WGCNA library which allows for
> stringsAsFactors to be defined in the work space making everything stored as
> strings remain strings. My problem now is parsing through the results to
> pull out only significant correlations defined by a certain Pearson
> correlation value say 0.8.
>
> On 17 November 2011 15:32, R. Michael Weylandt <michael.weylandt at gmail.com>
> wrote:
>>
>> I can't see how it's stored like that and the email servers garble it
>> up. Use dput() to create a plain text representation and paste that
>> back in.
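
As a small illustration of the dput() suggestion, here is a sketch with a made-up two-gene object x (values borrowed from the posted sample). It only shows that the printed text is enough to rebuild the object on the other end; txt and y are placeholder names.

```r
# A tiny labelled matrix standing in for a subset of the real data.
x <- matrix(c(26016.01, 7573.64, 23134.66, 7768.52), nrow = 2,
            dimnames = list(c("Fth1", "B2m"), c("Array1", "Array2")))

dput(x)   # prints plain text that can be pasted into an email

# Round trip: capture that text and rebuild the object from it,
# which is what a reader of the email would do after pasting.
txt <- paste(capture.output(dput(x)), collapse = "\n")
y <- eval(parse(text = txt))

print(all.equal(x, y))   # values and dimnames should both come back
```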
>>
>> Thanks,
>> Michael
>>
>> On Thu, Nov 17, 2011 at 9:37 AM, muzz56 <musahass at gmail.com> wrote:
>> > Hi Michael,
>> > Here is a sample of the data.
>> >
>> > Gene    Array1   Array2   Array3   Array4   Array5   Array6   Array7   Array8   Array9   Array10  Array11
>> > Fth1    26016.01 23134.66 17445.71 39856.04 27245.45 23622.98 37887.75 49857.46 25864.73 21852.51 29198.4
>> > B2m     7573.64  7768.52  6608.24  8571.65  6380.78  6242.76  6903.92  7330.63  7256.18  5678.21  10937.05
>> > Tmsb4x  6192.44  4277.22  5024.59  4851.51  3062.55  4562.43  7948.1   5018.58  3200.17  2855.77  6139.23
>> > H2-D1   3141.41  3986.06  3328.62  4726.6   3589.89  2885.95  7509.88  5257.62  4742.26  3431.33  5300.72
>> > Prdx5   3935.7   3938.9   3401.68  4193.14  4028.95  3438.19  6640.15  5486.61  4424.57  3368.83  5265.92
>> >
>> > I want to retain the gene names in the data. What you've proposed will
>> > take them out and I'll have to append them back to the results after
>> > the cor().
>> >
>> > On 17 November 2011 09:33, Michael Weylandt [via R] <
>> > ml-node+s789695n4080177h34 at n4.nabble.com> wrote:
>> >
>> >> I think something like this should do it, but I can't test without
>> >> data:
>> >>
>> >> rownames(mydata) <- mydata[,1] # Put the elements in the first column as rownames
>> >> mydata <- mydata[,-1] # drop the things that are now rownames
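
A possible one-step variant of the same idea, sketched under the assumption that the file has a header row and the gene names in its first column: read.csv() can assign the rownames at read time through its row.names argument. The CSV text is inlined here (a two-gene excerpt of the posted data) so the block runs without a file; csv_text and gene_cor are placeholder names.

```r
# Inline excerpt of the posted data, standing in for the real CSV file.
csv_text <- "Gene,Array1,Array2,Array3
Fth1,26016.01,23134.66,17445.71
B2m,7573.64,7768.52,6608.24"

# row.names = 1 makes the first column (Gene) the rownames while reading,
# so every remaining column is numeric.
mydata <- read.csv(text = csv_text, row.names = 1)

m <- as.matrix(mydata)   # numeric matrix, gene names kept as rownames
gene_cor <- cor(t(m))    # genes label both rows and columns of the result
print(gene_cor)
```

Since the names never enter the matrix as data, nothing has to be appended back after cor().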
>> >>
>> >> Michael
>> >>
>> >> On Thu, Nov 17, 2011 at 9:23 AM, Musa Hassan <[hidden email]> wrote:
>> >>
>> >> > Hi Michael,
>> >> > Thanks for the response. I have noticed that the error occurred
>> >> > during my data read. It appears that the rownames (which, when the
>> >> > data is transposed, become my colnames) were converted to numbers
>> >> > instead of strings as they should be. The original header names
>> >> > don't change, just the rownames. I have to figure out how to import
>> >> > the data and have the strings not converted.
>> >> > Right now I am using:
>> >> > mydata = read.csv("mydata.csv", header=T, stringsAsFactors=F)
>> >> >
>> >> > then to convert the data frame to matrix
>> >> > mydata=data.matrix(mydata)
>> >> >
>> >> > Then I just do the correlation as Peter suggested.
>> >> >
>> >> > expression=cor(t(expression))
>> >> >
>> >> > Thanks.
>> >> >
>> >> > On 17 November 2011 08:51, R. Michael Weylandt <[hidden email]> wrote:
>> >> >>
>> >> >> On Wed, Nov 16, 2011 at 11:22 PM, muzz56 <[hidden email]> wrote:
>> >> >> > Thanks to everyone who replied to my post, I finally got it to work.
>> >> >> > I am, however, not sure how well it worked since it ran so quickly,
>> >> >> > but it seems like I have a 2000 x 2000 data set.
>> >> >>
>> >> >> Behold the great and mighty power that is R! Don't worry -- on a
>> >> >> decent machine the correlation of a 2k x 2k data set should be
>> >> >> pretty
>> >> >> fast. (It's about 9 seconds on my old-ish laptop with a bunch of
>> >> >> other
>> >> >> junk running)
>> >> >>
>> >> >> > My follow-up questions would be: how do I get only pairs with, say,
>> >> >> > a certain Pearson correlation value? Additionally, it seems like my
>> >> >> > output didn't retain the headers but instead replaced them with
>> >> >> > numbers, making it hard to know which gene pairs correlate.
>> >> >>
>> >> >> This is a little worrisome: R carries column names through cor() so
>> >> >> this would suggest you weren't using them. Were your headers listed
>> >> >> as
>> >> >> part of your data (instead of being names)? If so, they would have
>> >> >> been taken as numbers.
>> >> >>
>> >> >> Take a look at dimnames(NAMEOFDATA) -- if your headers aren't there,
>> >> >> then they are being treated as data instead of names. If they are,
>> >> >> can you provide some reproducible code and we can debug more fully.
>> >> >> The easiest way to send data is to use the dput() function to get a
>> >> >> copy-pasteable plain text representation. It would also be great if
>> >> >> you could restrict it to a subset of your data rather than the full
>> >> >> 4M
>> >> >> data points, but if that's hard to do, don't worry.
>> >> >>
>> >> >> You should have expected behavior like
>> >> >>
>> >> >> X <- matrix(1:9,3)
>> >> >> colnames(X) <- c("A","B","C")
>> >> >> cor(X) # Prints with labels
>> >> >>
>> >> >> Michael
>> >> >>
>> >> >> >
>> >> >> > On 16 November 2011 17:11, Nordlund, Dan (DSHS/RDA) [via R] <[hidden email]> wrote:
>> >> >> >
>> >> >> >> > -----Original Message-----
>> >> >> >> > From: [hidden email] [mailto:r-help-bounces at r-project.org] On Behalf Of muzz56
>> >> >> >> > Sent: Wednesday, November 16, 2011 12:28 PM
>> >> >> >> > To: [hidden email]
>> >> >> >> > Subject: Re: [R] Pairwise correlation
>> >> >> >> >
>> >> >> >> > Thanks Peter. I tried this after reading in the csv (read.csv)
>> >> >> >> > and converting the data to a matrix (as.matrix). But when I
>> >> >> >> > tried the correlation, I kept getting the error (x must be
>> >> >> >> > numeric), yet when I view the data, it's numeric.
>> >> >> >> >
>> >> >> >>
>> >> >> >> What does R tell you if you execute the following?
>> >> >> >>
>> >> >> >> str(x)
>> >> >> >>
>> >> >> >> Just because the data looks like it is numeric when it prints
>> >> >> >> doesn't mean it is.
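
To make that point concrete, here is a small sketch with a made-up two-gene data frame df (values taken from the posted sample). It assumes the gene names were read in as an ordinary column, which is one plausible way to end up with this error; m, res and m_num are placeholder names.

```r
df <- data.frame(Gene   = c("Fth1", "B2m"),
                 Array1 = c(26016.01, 7573.64),
                 Array2 = c(23134.66, 7768.52),
                 stringsAsFactors = FALSE)

m <- as.matrix(df)   # one character column makes the whole matrix character
str(m)               # reports chr, although the values print like numbers

# cor() refuses the character matrix; capture the message instead of stopping.
res <- tryCatch(cor(t(m)), error = function(e) conditionMessage(e))
print(res)

# Dropping the Gene column first gives a genuinely numeric matrix.
m_num <- as.matrix(df[, -1])
rownames(m_num) <- df$Gene
str(m_num)
```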
>> >> >> >>
>> >> >> >>
>> >> >> >> Dan
>> >> >> >>
>> >> >> >> Daniel J. Nordlund
>> >> >> >> Washington State Department of Social and Health Services
>> >> >> >> Planning, Performance, and Accountability
>> >> >> >> Research and Data Analysis Division
>> >> >> >> Olympia, WA 98504-5204
>> >> >> >>
>> >> >> >>
>> >> >> >> ______________________________________________
>> >> >> >> [hidden email] mailing list
>> >> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >> >> PLEASE do read the posting guide
>> >> >> >> http://www.R-project.org/posting-guide.html
>> >> >> >> and provide commented, minimal, self-contained, reproducible
>> >> >> >> code.
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >>
>> >
>> >
>
>
