[R] How to subset data, by sorting names alphabetically.

Leandro Roser learoser at gmail.com
Fri Feb 13 00:47:44 CET 2015


Hi, a solution could be:

# example matrix a:
a <- matrix(1:100, 10, 10)
# note: assigning strings to column 1 turns a into a character matrix
a[, 1] <- sample(c("aa", "bb", "ab"), 10, replace = TRUE)
a <- a[order(a[, 1]), ]  # order the rows by column 1


# subsetting a:

lev <- levels(as.factor(a[, 1]))
subs <- list()
for (i in seq_along(lev)) {
    # drop = FALSE keeps a one-row subset as a matrix
    subs[[i]] <- a[a[, 1] %in% lev[i], , drop = FALSE]
}

#result:
subs
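
The same grouping can also be sketched without an explicit loop. This is an illustrative variant, not part of the solution above: split() groups the row indices by the value in column 1, and lapply() pulls out the rows of each group. The set.seed() call and the names idx and rows are assumptions added only for this example.

```r
# Sketch only: a loop-free variant of the subsetting above.
set.seed(1)  # assumption: fixed seed, only to make the example reproducible
a <- matrix(1:100, 10, 10)
a[, 1] <- sample(c("aa", "bb", "ab"), 10, replace = TRUE)  # a becomes character

# group the row indices by the value in column 1
idx <- split(seq_len(nrow(a)), a[, 1])

# extract the rows of each group; drop = FALSE keeps one-row groups as matrices
subs <- lapply(idx, function(rows) a[rows, , drop = FALSE])

names(subs)  # one list element per distinct value in column 1
```

Because split() returns a named list, the result is already named by the values of column 1, so no separate names(subs) <- lev step is needed.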


## an alternative, with the values of column 1 as the names of the list:

# example matrix a:
a <- matrix(1:100, 10, 10)
# note: assigning strings to column 1 turns a into a character matrix
a[, 1] <- sample(c("aa", "bb", "ab"), 10, replace = TRUE)
a <- a[order(a[, 1]), ]  # order the rows by column 1

lev <- levels(as.factor(a[, 1]))
subs <- list()
for (i in seq_along(lev)) {
    # -1 drops the name column; drop = FALSE keeps a one-row subset as a matrix
    subs[[i]] <- a[a[, 1] %in% lev[i], -1, drop = FALSE]
}
names(subs) <- lev

#result:
subs
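
For a data frame like the one in the original question, the split() function mentioned in the quoted reply below gives the same kind of named list in one call. A minimal sketch, assuming a data frame called ssdf with a Name column as in the thread; the var1 column, its values and the name by_name are made up for illustration.

```r
# Sketch only: ssdf and Name follow the thread; the values are made up.
ssdf <- data.frame(
  Name = c("aa", "ab", "bd", "ac", "aa", "bd"),
  var1 = 1:6,
  stringsAsFactors = FALSE
)

# one data frame per distinct Name, collected in a list named by those values
by_name <- split(ssdf, ssdf$Name)

names(by_name)   # "aa" "ab" "ac" "bd"
by_name[["aa"]]  # the rows with Name == "aa"
```

Keeping the pieces in one list means they can be processed together with lapply(), instead of creating one workspace object per name.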

2015-02-12 19:20 GMT-03:00 Greg Snow <538280 at gmail.com>:
> The split function does essentially this, but puts the results into a list
> rather than using the dangerous and messy assign function.  The overall
> syntax is simpler as well.
>
> On Thu, Feb 12, 2015 at 3:14 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
>
>> Hi Samarvir,
>> Assuming that you want to generate a separate data frame for each
>> value of "Name",
>>
>> # name of initial data frame is ssdf
>> for(nameval in unique(ssdf$Name)) assign(nameval,ssdf[ssdf$Name==nameval,])
>>
>> This will produce as many data frames as there are unique values of
>> ssdf$Name, each named by the values it contains.
>>
>> Jim
>>
>>
>> On Thu, Feb 12, 2015 at 3:57 PM, samarvir singh <samarvir1996 at gmail.com>
>> wrote:
>> > hello,
>> >
>> > I am cleaning a large data set with 4 million observations and 7 variables.
>> > Of the 7 variables, 1 is a name (string).
>> >
>> > I want to subset the data so that rows with the same name are together.
>> >
>> > Example-
>> >
>> >  Name var1 var2 var3 var4 var5 var6
>> > aa        -       -       -         -     -        -
>> > ab
>> > bd
>> > ac
>> > ad
>> > af
>> > ba
>> > bd
>> > aa
>> > av
>> >
>> > I want to sort the data something like this:
>> >
>> > aa
>> > aa
>> > all aa in a same subset
>> >
>> > and all ab in same subset
>> >
>> > every column with same name in a subset
>> >
>> >
>> >
>> > thanks in advance.
>> > I am new to R community.
>> > appreciate your help
>> > - Samarvir
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538280 at gmail.com



-- 
Lic. Leandro Gabriel Roser
 Laboratorio de Genética
 Dto. de Ecología, Genética y Evolución,
 F.C.E.N., U.B.A.,
 Ciudad Universitaria, PB II, 4to piso,
 Nuñez, Cdad. Autónoma de Buenos Aires,
 Argentina.
 tel ++54 +11 4576-3300 (ext 219)
 fax ++54 +11 4576-3384
