[R] unique problem

jim holtman jholtman at gmail.com
Mon Jun 14 18:04:44 CEST 2010


Your process does remove every row that is duplicated across both
columns.  After you do this, there are still duplicate entries in the
first column, which you are trying to use as row names, hence the
error.  Why do you want to use non-unique entries as row names?  Do you
really need the row names, or should you keep only one row for each
unique value of the first column?
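
A minimal sketch of the situation, using invented toy data in place of
Workbook5.txt (the column names ProbeID and Value and all values are
assumptions for illustration, not taken from the real file). It shows
that unique() on a data frame compares whole rows, so IDs in the first
column can still repeat, and two possible ways to get valid row names:

```r
# Toy stand-in for Workbook5; column names and values are hypothetical.
Workbook5 <- data.frame(
  ProbeID = c("A_51_P102339", "A_51_P102339", "A_51_P102339", "A_51_P102518"),
  Value   = c(1.5, 1.5, 2.7, 0.9),
  stringsAsFactors = FALSE
)

# unique() drops a row only when *both* columns match an earlier row.
deduped <- unique(Workbook5)
nrow(deduped)                       # 3: one fully identical row was removed

# The first column can still repeat, which is what row.names<- rejects.
anyDuplicated(deduped[, 1]) > 0     # TRUE

# List every row whose ID occurs more than once, to inspect the conflicts.
dup_ids <- deduped[duplicated(deduped[, 1]), 1]
deduped[deduped[, 1] %in% dup_ids, ]

# Option 1: keep only the first row seen for each ID, then set row names.
first_only <- deduped[!duplicated(deduped[, 1]), ]
rownames(first_only) <- first_only[, 1]

# Option 2: keep every row and let make.unique() add suffixes (.1, .2, ...).
rownames(deduped) <- make.unique(as.character(deduped[, 1]))
```

Option 1 silently discards the later rows for a repeated ID, so it is
only appropriate if those rows really are redundant; Option 2 keeps all
of the data but the row names no longer match the original IDs exactly.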

On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz <frymor at gmail.com> wrote:
> Hello everybody,
>
> I have a matrix of 2 columns and over 27k rows.
> Some of the rows are duplicated, so I tried to remove them with the command
> unique():
>
>> Workbook5 <- read.delim(file =  "Workbook5.txt")
>> dim(Workbook5)
> [1] 27748     2
>> Workbook5 <- unique(Workbook5)
>> dim(Workbook5)
> [1] 20101     2
>
> It removed a lot of lines, but unfortunately not all of them. I wanted to add
> the row names to the matrix and got this error message:
>> rownames(Workbook5) <- Workbook5[,1]
> Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L,  :
>  duplicate 'row.names' are not allowed
> In addition: Warning message:
> non-unique values when setting 'row.names': ‘A_51_P102339’,
> ‘A_51_P102518’, ‘A_51_P103435’, ‘A_51_P103465’,
> ‘A_51_P103594’, ‘A_51_P104409’, ‘A_51_P104718’,
> ‘A_51_P105869’, ‘A_51_P106428’, ‘A_51_P106799’,
> ‘A_51_P107176’, ‘A_51_P107959’, ‘A_51_P108767’,
> ‘A_51_P109258’, ‘A_51_P109708’, ‘A_51_P110341’,
> ‘A_51_P111757’, ‘A_51_P112427’, ‘A_51_P112662’,
> ‘A_51_P113672’, ‘A_51_P115018’, ‘A_51_P116496’,
> ‘A_51_P116636’, ‘A_51_P117666’, ‘A_51_P118132’,
> ‘A_51_P118168’, ‘A_51_P118400’, ‘A_51_P118506’,
> ‘A_51_P119315’, ‘A_51_P120093’, ‘A_51_P120305’,
> ‘A_51_P120738’, ‘A_51_P120785’, ‘A_51_P121134’,
> ‘A_51_P121359’, ‘A_51_P121412’, ‘A_51_P121652’,
> ‘A_51_P121724’, ‘A_51_P121829’, ‘A_51_P122141’,
> ‘A_51_P122964’, ‘A_51_P123422’, ‘A_51_P123895’,
> ‘A_51_P124008’, ‘A_51_P124719’, ‘A_51_P125648’,
> ‘A_51_P125679’, ‘A_51_P125779’ [... truncated]
>
> Is there a better way to discard the duplications in the text file (an Excel
> file is the origin)?
>
>> R.version
>               _
> platform       x86_64-apple-darwin9.8.0
> arch           x86_64
> os             darwin9.8.0
> system         x86_64, darwin9.8.0
> status         Patched
> major          2
> minor          11.1
> year           2010
> month          06
> day            03
> svn rev        52201
> language       R
> version.string R version 2.11.1 Patched (2010-06-03 r52201)
>
> THX
>
> Assa
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


