[R] Query about extracting subsets from a table

Marc Schwartz marc_schwartz at comcast.net
Tue Jan 23 18:48:33 CET 2007


On Tue, 2007-01-23 at 09:28 -0800, lalitha viswanath wrote:
> Hi
> I am trying to process tabular data as follows:
> 
> Data in the input file is of the form
> 
> genome1 genome2 tree-dist log10escore
> 
> Genome1 and genome2 are alphabetic.
> Tree-dist and log10escore are numeric.
> 
> I wish to extract only those  rows from this table
> where the log10escore is less than -3.
> 
> 
> data <-read.table(filename);
> data$log10escore = data$log10escore[ data$log10escore
> < -3];
> 
> I would like to use this pruned list of escores to get
> the corresponding genomenames and treedist.
> 
> I did not find anything useful in the FAQs and Notes
> on R for this part of the data extraction.
> 
> As I am just beginning programming in R, I would
> appreciate your input about this.
> 
> Thanks
> L

help.search("subset") would lead you to ?subset, where you could do
something like:

DF <- subset(YourData, log10escore < -3)

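If you just wanted the values of the other columns, you could also
use:

DF <- subset(YourData, log10escore < -3,
             select = c(genome1, genome2, tree.dist))

Here YourData is the data frame returned by read.table(). The column
names above assume the file has a header line and was read with
header = TRUE, in which case read.table() changes "tree-dist" to
tree.dist; without a header the columns are named V1 to V4.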
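
As a minimal, self-contained sketch of the subset() approach: the rows
below are invented for illustration, the column names follow the
original post, and I am assuming the file has a header line (so that
read.table() turns "tree-dist" into tree.dist).

```r
# Made-up sample rows; in practice use read.table(filename, header = TRUE)
txt <- "genome1 genome2 tree-dist log10escore
A B 0.12 -5.2
A C 0.40 -1.0
B C 0.33 -3.5
C D 0.90 -2.9"

YourData <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# check.names = TRUE (the default) makes "tree-dist" syntactically valid
names(YourData)

# Keep only the rows with log10escore below -3
DF <- subset(YourData, log10escore < -3)

# Same rows, but only the genome and distance columns
DF2 <- subset(YourData, log10escore < -3,
              select = c(genome1, genome2, tree.dist))
```

Both results are data frames, so the genome names and distances stay
lined up with the scores that passed the filter.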


One additional alternative is to use which(). This will return the
_indices_ of the values that match the criteria.  For example:

  Ind <- which(YourData$log10escore < -3)

In that case, you could then use:

  YourData$genome1[Ind]

and

  YourData$tree.dist[Ind]

(and likewise for genome2). Each of these returns a vector holding the
values of that column for the rows meeting the criterion.

Which approach you take depends upon what else you may want to do with
the data.

See ?which for more information.
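
A small sketch of the which() approach, again with invented values and
assumed column names. It also shows one practical difference from
indexing with the bare logical vector: which() skips missing scores,
whereas a logical index keeps a row of NAs for each of them.

```r
# Made-up scores, including a missing value
YourData <- data.frame(genome1 = c("A", "A", "B", "C"),
                       genome2 = c("B", "C", "C", "D"),
                       tree.dist = c(0.12, 0.40, 0.33, 0.90),
                       log10escore = c(-5.2, -1.0, NA, -3.5),
                       stringsAsFactors = FALSE)

# which() returns the positions of the TRUE values and skips NA
Ind <- which(YourData$log10escore < -3)

# The same index also selects whole rows of the data frame
YourData[Ind, ]

# A bare logical index keeps an all-NA row for the missing score
YourData[YourData$log10escore < -3, ]
```

subset() treats missing values the same way which() does, that is, rows
where the condition evaluates to NA are dropped.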

HTH,

Marc Schwartz


