# [R] When creating a data frame with data.frame() transforms "integers" into "factors"

António Brito Camacho toinobc at gmail.com
Sun May 26 17:34:02 CEST 2013

```
Hello Bert.

I understand now that I was trying to do something that didn't make sense, and that was why it failed.
I should have used a histogram to graph the frequency of each number of 'posts' instead of going the convoluted way around and trying to do a scatterplot.
I now understand that table() transforms each value of the variable into a "factor" and counts how many times it shows up. It makes sense that these "factors" are then transformed into "character" when in the data frame, because they are not a quantity, but the representation of the number.
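
# A minimal sketch of the point above (added for illustration, not part of the
# original exchange). The vector reuses the 'posts' values from the original
# question; the name 'posts_int' is an assumption.
posts <- as.integer(c(581, 281, 196, 150, 282, 184, 90, 74, 45, 20, 3, 1, 345, 123))
postFreqCount <- data.frame(table(posts))
# table() classifies by the distinct values, so the 'posts' column here holds
# labels rather than numbers. To get integers back, convert the labels (going
# through as.character avoids picking up the factor's internal codes):
posts_int <- as.integer(as.character(postFreqCount$posts))
# For the frequency graph itself, hist() takes the raw integer vector:
# hist(posts)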

Thanks for the help. Problem solved.

António Brito Camacho

On 26/05/2013, at 15:00, Bert Gunter <gunter.berton at gene.com> wrote:

> 1. Please always cc. the list; do not reply just to me.
>
> 2.  OK, I see. I ERRED. Had you cc'ed the list, someone might have
> pointed this out. The correct example reproduces what you saw.
>
> z<- sample(1:10,30,rep=TRUE)
> table(z)
> w <- data.frame(table(z))
> w
>
>     z  Freq
> 1   1    2
> 2   2    3
> 3   3    1
> 4   4    3
> 5   5    5
> 6   6    3
> 7   7    5
> 8   8    4
> 9   9    1
> 10 10    3
>
>> sapply(w,class)
>        z      Freq
> "factor" "integer"
>
> This is exactly what is expected and documented.  See ?table. So the
> question is: What do you expect?  table() produces an array whose
> cross-classifying factors are the dimensions. data.frame converts this
> into a data frame. Perhaps the following will help clarify:
>
>> z <- data.frame(fac1= sample(LETTERS[1:3],10,rep=TRUE),
>      fac2 = sample(c("j","k"),10,rep=TRUE))
>> z
>   fac1 fac2
> 1     A    k
> 2     B    k
> 3     C    k
> 4     C    k
> 5     B    k
> 6     C    k
> 7     C    k
> 8     A    j
> 9     A    j
> 10    C    j
>
>> table(z)
>
>    fac2
> fac1 j k
>   A 2 1
>   B 0 2
>   C 1 4
>
>> data.frame(table(z))
>
>  fac1 fac2 Freq
> 1    A    j    2
> 2    B    j    0
> 3    C    j    1
> 4    A    k    1
> 5    B    k    2
> 6    C    k    4
>
>> table(z['fac1'])
>
> A B C
> 3 2 5
>
>> data.frame(table(z['fac1']))
>  Var1 Freq
> 1    A    3
> 2    B    2
> 3    C    5
>
> Cheers,
> Bert
>
> On Sat, May 25, 2013 at 6:54 PM, António Camacho <toinobc at gmail.com> wrote:
>> Hello Bert
>> I tried your example and it worked without a problem.
>>
>> But what I want is to create a data frame from the output of the function
>> table(), so in your example I tried "sapply(data.frame(tbl),class)" and the
>> output was z --> factor and Freq --> integer.
>> What is happening in the table() function that is transforming the integers
>> in z into values with labels? I ask because when I do "names(tbl)" it
>> returns each value of z as a name.
>>
>> I read the manual for "[" but I didn't understand it completely. I have to
>> read the introduction to R more carefully.
>>
>> I also tried using "[,", "[[" and "$" for the extraction of the values from
>> the 'posts' column, but the problem persisted.
>>
>> As I said, this code was taken from an example on a webpage. I contacted
>> the author and he confirmed to me that the code worked on his machine, which
>> was running R 2.15.1.
>> Maybe something changed between versions in data.frame()?
>>
>> I really don't understand what I am doing wrong.
>>
>> António
>>
>> On 2013/05/26, at 01:44, Bert Gunter wrote:
>>
>>> Huh?
>>>
>>>> z <- sample(1:10,30,rep=TRUE)
>>>> tbl <- table(z)
>>>> tbl
>>>
>>> z
>>> 1 2 3 4 5 6 7 8 9 10
>>> 4 3 2 6 3 3 2 2 2 3
>>>>
>>>> data.frame(z)
>>>
>>>   z
>>> 1   5
>>> 2   2
>>> 3   4
>>> 4   1
>>> 5   6
>>> 6   4
>>> 7  10
>>> 8   4
>>> 9   3
>>> 10  8
>>> 11 10
>>> 12  4
>>> 13  3
>>> 14  9
>>> 15  2
>>> 16  2
>>> 17  6
>>> 18  1
>>> 19  4
>>> 20  7
>>> 21  9
>>> 22 10
>>> 23  7
>>> 24  5
>>> 25  5
>>> 26  6
>>> 27  8
>>> 28  1
>>> 29  1
>>> 30  4
>>>>
>>>> sapply(data.frame(z),class)
>>>
>>>       z
>>> "integer"
>>>
>>> Your error: you used df['posts']  . You should have used df[,'posts'] .
>>>
>>> The former is a data frame. The latter is a vector. Read the
>>> "Introduction to R tutorial" or ?"[" if you don't understand why.
>>>
>>> -- Bert
>>>
>>> On Sat, May 25, 2013 at 12:36 PM, António Camacho <toinobc at gmail.com>
>>> wrote:
>>>>
>>>> Hello
>>>>
>>>>
>>>> I am a novice to R and I was learning how to do a scatter plot with R using
>>>> an example from a website.
>>>>
>>>> My setup is an iMac with Mac OS X 10.8.3 and R 3.0.1, default install.
>>>>
>>>> I created a .csv file in vim with the following content:
>>>> userID,user,posts
>>>> 1,user1,581
>>>> 2,user2,281
>>>> 3,user3,196
>>>> 4,user4,150
>>>> 5,user5,282
>>>> 6,user6,184
>>>> 7,user7,90
>>>> 8,user8,74
>>>> 9,user9,45
>>>> 10,user10,20
>>>> 11,user11,3
>>>> 12,user12,1
>>>> 13,user13,345
>>>> 14,user14,123
>>>>
>>>> I imported the file into R using: 'df <- read.csv('file.csv')'
>>>> To confirm the data types I did: 'sapply(df, class)'
>>>> which returns "userID" --> "integer"; "user" --> "factor"; "posts" -->
>>>> "integer".
>>>> Then I tried to create another data frame with the number of posts and its
>>>> frequencies,
>>>> so I did: 'postFreqCount<-data.frame(table(df['posts']))'
>>>> This gives me the postFreqCount data frame with two columns, one called
>>>> 'Var1' that has the number of posts each user made, and another column
>>>> 'Freq' with the frequency of each number of posts.
>>>> The problem is that if I do: 'sapply(postFreqCount['Var1'],class)' it
>>>> returns "factor".
>>>> So the data.frame() function transformed a variable that was "integer"
>>>> (posts) to a variable (Var1) that has the same values but is "factor".
>>>> I want to know how to prevent this from happening. How do I keep the
>>>> values from being transformed from "integer" to "factor"?
>>>>
>>>> Thank you for your help
>>>>
>>>> António
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
>>>
>>> Internal Contact Info:
>>> Phone: 467-7374
>>> Website:
>>>
>>
>>