[R] Sorting Text Frames

Uwe Ligges ligges at statistik.uni-dortmund.de
Wed Sep 7 08:48:54 CEST 2005


Murray Jorgensen wrote:
> [Using 2.0.1 under Windows XP]
> There are a few pages on the internet that list equivalents of
> "thank you" in many languages. I downloaded one from a Google search
> and I thought that it would be interesting and a good R exercise to
> sort the file into the order of the expressions, rather than the languages.
> 
> I tidied up the web page and got it into the format it was nearly in
> already: language name in columns 1-43, the expression in the remaining
> columns.
> 
> Then I read it in:
> 
>  > thanks <- read.fwf("C:\\Files\\Reading\\thankyou.txt", c(43,37))
>  > thanks[1:4,]
>                                             V1            V2
> 1 Abenaki (Maine USA, Montreal Canada)            Wliwni ni
> 2 Abenaki (Maine USA, Montreal Canada)               Wliwni
> 3 Abenaki (Maine USA, Montreal Canada)               Oliwni
> 4 Achí (Baja Verapaz Guatemala)               Mantiox chawe
> 
>  > dim(thanks)
> [1] 1254    2
> 
> Now I tried sorting the frame into the order of the second column:
> 
> tord <- order(thanks$V2)
> sink("C:\\Files\\Reading\\thanks.txt")
> thanks[tord[1:74],]
> sink()
> 
> This gives more or less the expected output, with the file thanks.txt beginning:
> 
>                                                    V1                      V2
> 145      Cahuila (United States)                                '\301cha-ma
> 862      Paipai (Mexico, USA)                                    'Ara'ya:ikm
> 863      Paipai (Mexico, USA)                                    'Ara'yai:km
> 864      Paipai (Mexico, USA)                                     'Ara'ye:km
> 311      Eyak (Alaska)                                            'Awa'ahdah
> 
> [you may get a bit of wrapping there!]
> 
> However, I don't really want just 74 lines; I would like the whole file. But
> if I get rid of the [1:74] or replace 74 with any larger number, I get
> output like this, with no second column:
> 
>                                                    V1
> 145      Cahuila (United States)
> 862      Paipai (Mexico, USA)
> 863      Paipai (Mexico, USA)
> 864      Paipai (Mexico, USA)
> 311      Eyak (Alaska)

I guess there is either too much white space or some special characters in
your variables that cause problems when printing.
Hence you have to "debug" your data yourself.
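A minimal sketch of that debugging, assuming the data were read with read.fwf() as above; the stand-in rows, the file path and the width threshold of 30 are illustrative assumptions, not taken from the original file. One plausible cause, consistent with the guess above: when a data frame is wider than getOption("width"), print() emits its columns in separate blocks, so the V2 column may simply appear further down the sink() file, after all the V1 rows. Trimming the fixed-width padding, flagging unusually wide or non-ASCII entries, and writing with write.table() instead of printing avoids that layout entirely:

```r
# Stand-in for the real data, which would come from something like
#   thanks <- read.fwf("thankyou.txt", widths = c(43, 37))
# (hypothetical path). These three rows are illustrative only.
thanks <- data.frame(
  V1 = c("Eyak (Alaska)        ", "Cahuila (United States)  ",
         "Paipai (Mexico, USA)   "),
  V2 = c("   'Awa'ahdah", "\u00c1cha-ma   ", "  'Ara'ya:ikm"),
  stringsAsFactors = FALSE
)

# Fixed-width reading keeps the column padding: strip leading and
# trailing white space from every column (factors become character).
trim <- function(x) gsub("^[[:space:]]+|[[:space:]]+$", "", x)
thanks[] <- lapply(thanks, function(col) trim(as.character(col)))

# Flag entries that are unusually wide or contain non-ASCII characters;
# either can push a printed data frame past getOption("width").
v2_width  <- nchar(thanks$V2, type = "width")
wide      <- v2_width > 30              # threshold is an arbitrary assumption
non_ascii <- vapply(thanks$V2,
                    function(s) any(utf8ToInt(enc2utf8(s)) > 127L),
                    logical(1), USE.NAMES = FALSE)
print(thanks[non_ascii | wide, ])

# Sort by the expression and write every row, one line per row.
# Unlike print() under sink(), write.table() never splits columns into blocks.
sorted <- thanks[order(thanks$V2), ]
out <- tempfile(fileext = ".txt")       # swap in your own output path
write.table(sorted, out, sep = "\t", quote = FALSE, row.names = FALSE)
```

With the real file, the flagged rows are the first place to look; if nothing stands out, raising options(width = ...) before printing is another way to keep both columns side by side.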

Uwe Ligges



> Does anyone know what is going on?
> Tusen tak (a thousand thanks) in advance, in fact 1254 tak in advance!
> 
> Murray Jorgensen
