[R] unique/subset problem

Fri Jan 26 17:17:27 CET 2007

Without knowing more about your data, it is hard to say for certain,
but might you be confusing unique _values_ with _factor levels_?

> mydata <- as.factor(sort(rep(1:5, 2)))
# mydata has 10 values, 5 unique values, and 5 factor levels
> mydata
 [1] 1 1 2 2 3 3 4 4 5 5
Levels: 1 2 3 4 5
> unique(mydata)
[1] 1 2 3 4 5
Levels: 1 2 3 4 5
> mydata.subset <- mydata[1:4]
# the subset now has only 2 unique values, but the output
# still lists all five factor levels
> unique(mydata.subset)
[1] 1 2
Levels: 1 2 3 4 5
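
To make the distinction concrete, here is a small sketch that counts
the values actually present in the subset and the levels it still
carries. It rebuilds the same example data; the names n.values,
n.levels and counts are only illustrative.

```r
# Same example data as above: 10 values, 5 factor levels
mydata <- as.factor(sort(rep(1:5, 2)))
mydata.subset <- mydata[1:4]

# Number of distinct values actually present in the subset
n.values <- length(unique(mydata.subset))

# Number of levels carried along from the original factor
n.levels <- nlevels(mydata.subset)

# table() reports a zero count for every unused level
counts <- table(mydata.subset)

print(n.values)
print(n.levels)
print(counts)
```

If a count of "unique" items comes from levels(), nlevels() or
table(), it will reflect the original factor rather than the subset.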

# try drop=TRUE when subsetting with [ to discard the unused levels
> mydata.subset <- mydata[1:4, drop=TRUE]
> unique(mydata.subset)
[1] 1 2
Levels: 1 2
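
The same idea may carry over to a data frame like the one in the
original question. The sketch below uses a made-up stand-in for
dataset (the genome names and numbers are invented, only the column
names come from the question) and assumes genome1 is stored as a
factor.

```r
# Hypothetical stand-in for the poster's data; all values are made up
dataset <- data.frame(
  genome1 = c("gA", "gA", "gB", "gC", "gD"),
  genome2 = c("gB", "gC", "gC", "gD", "gA"),
  dist    = c(0.1, 0.4, 0.2, 0.9, 0.7),
  score   = c(-10, -2, -8, -1, -3),
  stringsAsFactors = TRUE  # explicit, since the default differs between R versions
)

prunedrelatives <- subset(dataset, score < -5)

# subset() keeps every original level of genome1
levels.before <- nlevels(prunedrelatives$genome1)

# Rebuilding the factor keeps only the levels still in use
prunedrelatives$genome1 <- factor(prunedrelatives$genome1)
levels.after <- nlevels(prunedrelatives$genome1)

print(levels.before)
print(levels.after)
```

In more recent versions of R, droplevels(prunedrelatives) should do
the same for every factor column at once.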

Alternatively, if this is the problem and you don't need those
data to be factors, you could always convert them to a more
appropriate form, such as character or numeric vectors, which
carry no levels.
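
A minimal sketch of such a conversion, again using the example data
from this message (as.char and as.num are just illustrative names):

```r
mydata <- as.factor(sort(rep(1:5, 2)))
mydata.subset <- mydata[1:4]

# Character vectors carry no levels, so unique() shows only what is there
as.char <- as.character(mydata.subset)
print(unique(as.char))

# For numbers stored as a factor, convert via character first;
# as.numeric() on the factor itself returns the internal integer codes
as.num <- as.numeric(as.character(mydata.subset))
print(unique(as.num))
```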

Sarah

> > On 1/25/07, lalitha viswanath
> > <lalithaviswanath at yahoo.com> wrote:
> > > Hi
> > > I am new to R programming and am using subset to
> > > extract part of a data as follows
> > >
> > > names(dataset) = c("genome1","genome2","dist","score");
> > > prunedrelatives <- subset(dataset, score < -5);
> > >
> > > However when I use unique to find the number of unique
> > > genomes now present in prunedrelatives I get results
> > > identical to calling unique(dataset$genome1) although
> > > subset has eliminated many genomes and records.
> > >
> > > I would greatly appreciate your input about using
> > > "unique" correctly in this regard.
> > >
> > > Thanks
> > > Lalitha
> > >

-- 
Sarah Goslee
http://www.functionaldiversity.org