[R] factors and characters when attaching data...more info.

Gary Collins gco at eortc.be
Thu Apr 5 11:05:50 CEST 2001

Please find an amendment to a problem I posted yesterday (04/04/01).
Unfortunately I received only one response, so I will give some more details
about the problem.
I have read in some data called Version3.Studies, and to make life slightly
easier and programming less wordy, I want to attach the dataframe, but when I
do, all of my character fields are forced into factors.

> Version3.Studies <- read.table("c:\\Version3.Studies.dat",
+     header=TRUE, as.is=TRUE, strip.white=TRUE)
> summary(Version3.Studies$Group)
   Length      Mode 
     3103 character 
> is.character(Version3.Studies$Group) # Just to make sure...
[1] TRUE
> unique(Version3.Studies$Group)
 [1] "Lung"         "Mesothelioma" "Breast"       "HeadandNeck"  "Oesophagus"
 [6] "Ovary"        "Brain"        "Prostate"     "Testes"       "Stomach"
[11] "ColonRectum"

Now I attach the data...

> attach(Version3.Studies)
> is.character(Group)
[1] FALSE
> is.factor(Group)
[1] TRUE

> unique(Group)
 [1]         Lung Mesothelioma       Breast  HeadandNeck   Oesophagus
 [6]        Ovary        Brain     Prostate       Testes      Stomach
[11]  ColonRectum 
Levels:          Lung        Brain        Ovary       Breast       Testes
Stomach     Prostate   Oesophagus  ColonRectum  HeadandNeck Mesothelioma 

Now consider the following simple example. I want to extract another field,
say PF, from Version3.Studies, indexing by a label in Group, say Lung.
Without attaching the data, I can simply do

> Version3.Studies$PF[Version3.Studies$Group=="Lung"]

and this returns the appropriate data.
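
For anyone who wants to try this lookup without my data file, a toy version
might look like the following. The data frame `toy` and its values are made up
purely for illustration; only the column names Group and PF come from my data.

```r
# Toy stand-in for Version3.Studies; the values are invented for illustration.
toy <- data.frame(Group = c("Lung", "Breast", "Lung", "Ovary"),
                  PF    = c(10, 20, 30, 40),
                  stringsAsFactors = FALSE)

# Index one column by a label in another column, without attach().
lung.pf <- toy$PF[toy$Group == "Lung"]
print(lung.pf)
```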

After attaching the data, to retrieve the same data, I need to do

> PF[Group=="        Lung"]

inserting the necessary white space.

My question is: why is R forcing my character fields to factors when
attaching a dataframe? Is this what is supposed to happen, and is there a
way around it, keeping my original character fields as character and not as
factors?

Trying to force Group to a character field still keeps the white space which
was created when attaching the dataframe.

> unique(as.character(Group))
 [1] "        Lung" "Mesothelioma" "      Breast" " HeadandNeck" "  Oesophagus"
 [6] "       Ovary" "       Brain" "    Prostate" "      Testes" "     Stomach"
[11] " ColonRectum"
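
As a stopgap, the padding could presumably be stripped with a regular
expression before comparing. This is only a sketch on a few of the padded
labels shown here; `padded`, `stripped` and `PF.toy` are names invented for the
example, and it treats the symptom without explaining why the fields become
factors in the first place.

```r
# A few padded labels as they appear after as.character(Group).
padded <- c("        Lung", "Mesothelioma", "      Breast", " HeadandNeck")

# Remove leading and trailing blanks with a regular expression.
stripped <- gsub("^ +| +$", "", padded)
print(stripped)

# The comparison no longer needs the padding typed in by hand.
PF.toy <- c(1, 2, 3, 4)   # made-up values, for illustration only
print(PF.toy[stripped == "Lung"])
```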
Any help would be greatly appreciated.

Gary Collins.
Dr. Gary S. Collins,
Statistics Research Fellow,
Quality of Life Unit, 
European Organisation for Research and Treatment of Cancer, 
EORTC Data Center, 
Avenue E. Mounier 83, bte. 11,
B-1200 Brussels, Belgium.

Tel: +32 2 774 1 606
Fax: +32 2 779 4 568
Email: gco at eortc.be
