[BioC] Sorting matrix by column

Kasoji, Manjula (NIH/NCI) [C] manjula.kasoji at nih.gov
Tue Oct 23 17:44:07 CEST 2012


I do not get anything:

> orderVector1
Error: object 'orderVector1' not found
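
A note on this result: orderVector1 appears to be an internal C routine used by order(), not an R-level object, so "not found" at the prompt is expected and does not by itself point to a masking package. The "unimplemented type 'list'" part of the message suggests instead that x[,2] is a list, which is what you get when x is a matrix of mode list. The following is a minimal sketch under that assumption. The small matrix is a hypothetical reconstruction from three rows of the posted data, not the poster's actual object, and the names sym, x_sorted, df and df_sorted are illustrative.

```r
# ASSUMPTION: x is a matrix whose cells are list elements (mode "list").
# Hypothetical reconstruction from the first three rows of the posted matrix.
x <- matrix(list("10371400", "10453900", "10375051",
                 "Lypla1", "Tcea1", "Atp6v1h",
                 0.3592492, 0.1886117, 0.6713107),
            nrow = 3,
            dimnames = list(NULL, c("ID", "Gene Symbol", "logFC")))

typeof(x)        # "list" for a list-matrix, "character" for an ordinary one
is.list(x[, 2])  # TRUE here: the column comes back as a list

# order() cannot sort a list, so this call fails; try() captures the error.
res <- try(order(x[, 2]), silent = TRUE)

# unlist() turns the column into a plain character vector that order() accepts.
sym <- unlist(x[, 2])
x_sorted <- x[order(sym), ]

# A data frame is usually easier to work with than a list-matrix.
df <- data.frame(ID     = unlist(x[, "ID"]),
                 Symbol = unlist(x[, "Gene Symbol"]),
                 logFC  = unlist(x[, "logFC"]),
                 stringsAsFactors = FALSE)
df_sorted <- df[order(df$Symbol), ]
```

If typeof(x) reports "character" rather than "list", this diagnosis does not apply and the cause lies elsewhere. The quotes in the printed matrix are only how print() displays character values; a data frame prints strings without them.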



On 10/23/12 11:42AM, "James W. MacDonald" <jmacdon at uw.edu> wrote:

>Also, what do you get from
>
>orderVector1
>
>
>
>On 10/23/2012 11:38 AM, Kasoji, Manjula (NIH/NCI) [C] wrote:
>> Hi Jim,
>>
>> The R session info below does correspond to the session I pasted. When I
>> tried your suggestion, I still get an error:
>>
>>> x[base::order(x[,2]),]
>> Error in base::order(x[, 2]) :
>>    unimplemented type 'list' in 'orderVector1'
>>
>>
>> I see that you don't have quotes around the ID and Gene Symbol names in
>> your matrix. Is there a way to remove the quotes?
>>
>> Thanks!
>>
>> On 10/23/12 11:27AM, "James W. MacDonald"<jmacdon at uw.edu>  wrote:
>>
>>>
>>> On 10/23/2012 11:15 AM, Guest [guest] wrote:
>>>> Hi,
>>>>
>>>> I would like to sort a matrix by a specific column (column 2). I tried
>>>> the order() function, but I get an error. I think it is because the
>>>> values in column 2 are not numeric, they are gene symbols. This may be a
>>>> general R question, but I thought I would post it here since it is
>>>> microarray data analysis.
>>>>
>>>> I have matrix x:
>>>>
>>>>> x
>>>>            ID         Gene Symbol     logFC      Adj.PVal
>>>> 10344624 "10371400" "Lypla1"        0.3592492  0.9999522
>>>> 10344633 "10453900" "Tcea1"         0.1886117  0.9999522
>>>> 10344637 "10375051" "Atp6v1h"       0.6713107  0.9999522
>>>> 10344653 "10575211" "Oprk1"         -0.2342731 0.9999522
>>>> 10344658 "10566254" "Rb1cc1"        1.790676   0.9999522
>>>> 10344674 "10602372" "Fam150a"       1.397496   0.9999522
>>>> 10344679 "10398428" "St18"          -0.3278807 0.9999522
>>>> 10344707 "10383518" "Pcmtd1"        -0.2231074 0.9999522
>>>> 10344713 "10397054" "Ahcy"          -0.1844897 0.9999522
>>>> 10344723 "10384020" "Rrs1"          -0.2322781 0.9999522
>>>> 10344725 "10608710" "Adhfe1"        0.5993566  0.9999522
>>>> 10344741 "10363762" "Hnrnpa3"       -0.2660978 0.9999522
>>>> 10344743 "10375058" "3110035E14Rik" 0.9178868  0.9999522
>>>> 10344750 "10381603" "Sgk3"          -0.2961638 0.9999522
>>>> 10344772 "10442373" "6030422M02Rik" -0.1653454 0.9999522
>>>> 10344789 "10421227" "Cspp1"         -0.1480766 0.9999522
>>>> 10344799 "10534966" "Cspp1"         -0.2436361 0.9999522
>>>> 10344801 "10398408" "Cspp1"         -0.4040665 0.9999522
>>>> 10344803 "10398418" "Cspp1"         -0.2556627 0.9999522
>>>> 10344805 "10572772" "Cspp1"         -0.1864641 0.9999522
>>>>
>>>> I want to sort on the "Gene Symbol" column so that I can remove the
>>>> duplicates and keep the one with the highest log fold change.
>>>>
>>>> I tried the following and received an error.
>>>>> x[order(x[,2]),]
>>>> Error in order(x[, 2]) : unimplemented type 'list' in 'orderVector1'
>>> I am not sure the sessionInfo() you give below corresponds to the
>>> session above. I get:
>>>
>>>> x <- data.frame(ID = 12345:12354,
>>> +     Gene = Rkeys(mogene10sttranscriptclusterSYMBOL)[5001:5010],
>>> +     logFC = rnorm(10), pval = runif(10))
>>>> x
>>>        ID   Gene       logFC      pval
>>> 1  12345  Sepw1  0.56914952 0.4916910
>>> 2  12346  Serf1  0.83929962 0.4816986
>>> 3  12347 Gm4748  0.12462117 0.9372249
>>> 4  12348   Sez6 -0.21468480 0.4921201
>>> 5  12349  Foxp3 -1.36283694 0.4575675
>>> 6  12350  Sfpi1  1.03632565 0.5251826
>>> 7  12351  Sfrp1  0.04689108 0.3068112
>>> 8  12352   Frzb  0.08379607 0.1509499
>>> 9  12353  Sfrp4 -1.61513620 0.9336235
>>> 10 12354  Srsf2  1.56222316 0.2571122
>>>> x[order(x[,2]),]
>>>        ID   Gene       logFC      pval
>>> 5  12349  Foxp3 -1.36283694 0.4575675
>>> 8  12352   Frzb  0.08379607 0.1509499
>>> 3  12347 Gm4748  0.12462117 0.9372249
>>> 1  12345  Sepw1  0.56914952 0.4916910
>>> 2  12346  Serf1  0.83929962 0.4816986
>>> 4  12348   Sez6 -0.21468480 0.4921201
>>> 6  12350  Sfpi1  1.03632565 0.5251826
>>> 7  12351  Sfrp1  0.04689108 0.3068112
>>> 9  12353  Sfrp4 -1.61513620 0.9336235
>>> 10 12354  Srsf2  1.56222316 0.2571122
>>>
>>> It appears you have something loaded that thinks you want to use the
>>> orderVector1() function. You can always specify the function you are
>>> intending with the :: operator (in this case, you want base::order()).
>>>
>>>> x[base::order(x[,2]),]
>>>        ID   Gene       logFC      pval
>>> 5  12349  Foxp3 -1.36283694 0.4575675
>>> 8  12352   Frzb  0.08379607 0.1509499
>>> 3  12347 Gm4748  0.12462117 0.9372249
>>> 1  12345  Sepw1  0.56914952 0.4916910
>>> 2  12346  Serf1  0.83929962 0.4816986
>>> 4  12348   Sez6 -0.21468480 0.4921201
>>> 6  12350  Sfpi1  1.03632565 0.5251826
>>> 7  12351  Sfrp1  0.04689108 0.3068112
>>> 9  12353  Sfrp4 -1.61513620 0.9336235
>>> 10 12354  Srsf2  1.56222316 0.2571122
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>> If anyone has any suggestions for an easy way to sort a significant
>>>> gene list, remove duplicated values, and keep the value with highest
>>>> fold change, that would be helpful!
>>>>
>>>> I've posted my session info below.
>>>>
>>>> Thanks!
>>>>
>>>> Guest
>>>>
>>>>    -- output of sessionInfo():
>>>>
>>>>> sessionInfo()
>>>> R version 2.15.1 (2012-06-22)
>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>>
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] tools_2.15.1
>>>>
>>>> --
>>>> Sent via the guest posting facility at bioconductor.org.
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> -- 
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>
>
>
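
The original request (sort a gene list, drop duplicated symbols, keep the entry with the highest fold change) can be sketched as below. This is one possible approach, not something proposed in the thread. It assumes the data are already in a data frame with ordinary character and numeric columns, and it reads "highest" as largest absolute logFC; drop abs() if the largest signed value is wanted. The rows are copied from the posted matrix, and the names df, ranked and best are illustrative.

```r
# Rows taken from the posted matrix; Cspp1 appears three times.
df <- data.frame(
  ID     = c("10371400", "10453900", "10421227", "10534966", "10398408"),
  Symbol = c("Lypla1", "Tcea1", "Cspp1", "Cspp1", "Cspp1"),
  logFC  = c(0.3592492, 0.1886117, -0.1480766, -0.2436361, -0.4040665),
  stringsAsFactors = FALSE
)

# Step 1: put the strongest change first (ASSUMPTION: ranked by |logFC|).
ranked <- df[order(abs(df$logFC), decreasing = TRUE), ]

# Step 2: duplicated() flags the second and later occurrences of a symbol,
# so after step 1 the row kept for each symbol is its strongest one.
best <- ranked[!duplicated(ranked$Symbol), ]

# Step 3: sort the surviving rows alphabetically by gene symbol.
best <- best[order(best$Symbol), ]
best
```

Sorting by fold change before calling duplicated() is what makes the kept row the strongest one; deduplicating first would keep whichever row happened to come first in the input.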


