[R] selection of missing data

Adaikalavan Ramasamy ramasamy at cancer.org.uk
Sun Nov 13 19:56:09 CET 2005


I do not quite follow your post but here are some suggestions. 


1) You can use the na.strings argument to simplify things:

   mela <- read.delim(file="lala.txt", na.strings="-" )
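
As a small self-contained illustration of what na.strings does, here is a sketch that reads made-up tab-separated text instead of a file. The data, the column names and the object name "toy" are assumptions for the example only, not taken from the original file.

```r
# Hypothetical tab-separated data; "-" is the marker to be read in as NA
txt <- "id\tLung\tLiver\n1\t-\tx\n2\tx\tx\n3\t-\t-\n"

# text= is passed through to read.table, so no file is needed here
toy <- read.delim(text = txt, na.strings = "-")

is.na(toy$Lung)       # TRUE where the Lung column held "-"
colSums(is.na(toy))   # number of "-" markers found in each column
```

Once the marker is read in as NA, functions such as is.na, complete.cases and na.omit can be used on it directly.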


2) You can count the number of metastases per row first, then find
the rows where that count is zero.

   met.cols      <- c(11:12, 14:21, 23, 24) # metastasis columns, as in your post
   number.of.met <- rowSums( mela[ , met.cols ] == "-" )
   have.no.met   <- which( number.of.met == 0 )
   mela.no.met   <- mela[ have.no.met , ]

If you had coded your "-" as NA when reading the data in, then the
second line needs to be changed to

   number.of.met <- rowSums( is.na( mela[ , met.cols ] ) )

or simply use complete.cases

   met.cols      <- c(11:12, 14:21, 23, 24) # metastasis columns, as in your post
   mela.no.met   <- mela[ which( complete.cases(mela[ , met.cols]) ) , ]
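
To check that the counting approach and the complete.cases approach select the same rows, here is a minimal sketch on a made-up data frame. The object "toy", its columns and its values are hypothetical; "x" simply stands in for any entry other than the "-" marker.

```r
# Made-up data: "-" is the marker being counted, "x" is anything else
toy <- data.frame(
  age   = c(50, 61, 47, 70),
  Lung  = c("-", "x", "x", "-"),
  Liver = c("x", "x", "-", "-"),
  stringsAsFactors = FALSE
)
met.cols <- c(2, 3)

# Approach 1: count the "-" markers per row, keep rows with none
number.of.met <- rowSums(toy[, met.cols] == "-")
no.met.a      <- toy[which(number.of.met == 0), ]

# Approach 2: recode "-" as NA, then keep rows complete in those columns
toy.na <- toy
toy.na[met.cols] <- lapply(toy.na[met.cols],
                           function(x) replace(x, x == "-", NA))
no.met.b <- toy.na[complete.cases(toy.na[, met.cols]), ]

# Both approaches should pick out the same row names
identical(rownames(no.met.a), rownames(no.met.b))
```

Note that if met.cols holds a single column, toy[, met.cols] drops to a vector and rowSums will fail; toy[, met.cols, drop = FALSE] avoids that.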


3) If you name your columns in a systematic fashion, then you can easily
identify and extract those columns. For example, if your columns were
named

   cn <- c( "age", "colon.met", "PSA.level", "prostate.met", "gender",
            "hospitalisation.days", "status", "liver.met", "ethnicity")

Then you can extract those names ending with ".met" as

   met.cols <- grep( "\\.met$", cn )
   met.cols
   [1] 2 4 8
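
The indices returned by grep can then be used directly for the row selection. A minimal sketch, assuming a made-up data frame "toy" that uses a few of the column names above, with NA already in place of the marker:

```r
# Made-up data frame; values are for illustration only
toy <- data.frame(
  age          = c(50, 61, 47),
  colon.met    = c(NA, "x", "x"),
  prostate.met = c("x", "x", NA),
  gender       = c("M", "F", "M"),
  stringsAsFactors = FALSE
)

# Find the ".met" columns by name rather than by hard-coded position
met.cols <- grep("\\.met$", names(toy))

# Keep only the rows with no NA in any of those columns
toy.complete <- toy[complete.cases(toy[, met.cols]), ]
toy.complete
```

Selecting by name pattern means the code keeps working if columns are added or reordered in the spreadsheet.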


Regards, Adai



On Sun, 2005-11-13 at 18:40 +0100, billemont at cegetel.net wrote:
> Hi, I'm a French medical student.
> I have some data that I imported from Excel. The columns of my data frame
> are the locations of metastases. If there is a metastasis, there is
> the symbol "_". I want to exclude the rows without metastasis, which
> represent the NA data.
> 
> So I wrote this
> 
> mela is the data frame
> 
> # selection of the metastasis location columns
> mela1=ifelse(mela[,c(11:12,14:21,23,24)]=="_",1,0)
> 
> mela4=subset(mela3,Skin ==0 & s.c == 0 & Mucosa ==0 & Soft.ti ==0 & 
> Ln.peri==0 & Ln.med==0 & Ln.abdo==0 & Lung==0 & Liver==0 & 
> Other.Visc==0 & Bone==0 & Marrow==0 & Brain==0 & Other==0)
> ## selection of the rows with no metastasis location
> nrow(mela4)
> 
> But I don't know if it is possible to do the same thing as
> ifelse(mela3,Skin & s.c== 0, 0,NA) with more than one column, and then to
> exclude the NA values from my data with na.omit.
> 
> My last question is: how can I omit only the rows which have NA values in
> the metastasis columns c(11:12,14:21,23,24)?
> 
> Thank you for your help
> 
> 
> 
> Bertrand billemont



