[R] Value Labels: SPSS Dataset to R

Gregory Demin gdem|n @end|ng |rom gm@||@com
Sat Feb 8 18:36:28 CET 2020


Hi,
With the 'expss' package, the code for your task looks like this:

library(haven)
library(expss) # it is important to load expss after haven

# CatsDogs = read_spss("path_to_file")

CatsDogs = structure(
    list(
        Animal = structure(
            c(0, 0, 0, 0, 0, 0, 0, 0, 0,
              0),
            label = "Animal",
            labels = c(Cat = 0, Dog = 1),
            class = "haven_labelled"
        ),
        Training = structure(
            c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0),
            label = "Type of Training",
            labels = c(`Food as Reward` = 0,
                       `Affection as Reward` = 1),
            class = "haven_labelled"
        ),
        Dance = structure(
            c(1,
              1, 0, 1, 1, 0, 1, 0, 1, 1),
            label = "Did they dance?",
            labels = c(No = 0,
                       Yes = 1),
            class = "haven_labelled"
        )
    ),
    row.names = c(NA,-10L),
    class = c("tbl_df", "tbl", "data.frame")
)

CatsDogs = add_labelled_class(CatsDogs) # set labelled class for variables with labels

# frequencies
fre(list(CatsDogs$Training, CatsDogs$Dance))
# |                  |                     | Count | Valid percent | Percent | Responses, % | Cumulative responses, % |
# | ---------------- | ------------------- | ----- | ------------- | ------- | ------------ | ----------------------- |
# | Type of Training |      Food as Reward |     6 |            60 |      60 |           60 |                      60 |
# |                  | Affection as Reward |     4 |            40 |      40 |           40 |                     100 |
# |                  |              #Total |    10 |           100 |     100 |          100 |                         |
# |                  |                <NA> |     0 |               |       0 |              |                         |
# |  Did they dance? |                  No |     3 |            30 |      30 |           30 |                      30 |
# |                  |                 Yes |     7 |            70 |      70 |           70 |                     100 |
# |                  |              #Total |    10 |           100 |     100 |          100 |                         |
# |                  |                <NA> |     0 |               |       0 |              |                         |

# barplots
use_labels(CatsDogs, barplot(table(Training), legend.text = TRUE))
use_labels(CatsDogs, barplot(table(Dance), legend.text = TRUE))
use_labels(CatsDogs, barplot(table(Dance, Training), legend.text = TRUE))

Regards,
Gregory
On Sat, 8 Feb 2020 at 18:36, Yawo Kokuvi <yawo1964 using gmail.com> wrote:

>
> Thanks again - I realized after posting that sjlabelled is indirectly
> referencing haven's read_sav function.  For a moment I thought you were
> referring to the read.spss under the older foreign package.  But then
> realized that read_sav and read_spss are equivalent. So that's clear now.
>
> And I also realized there are so many ways to do the same thing in R - so
> as part of learning, I am discovering these different ways, and knowing
> when to use one over the other.
>
> Thanks for the references - I will read further on them.
>
> cheers, cY
>
> On Sat, Feb 8, 2020 at 10:28 AM John Kane <jrkrideau using gmail.com> wrote:
>
> > "use a different function (read_spss) as John has suggested to import the
> > file."
> >
> > No! As far as I can see, sjlabelled is simply using haven's function
> > read_sav() to read in the data. It is just wrapped in the read_spss()
> > function. There should be no difference between read_sav("sdata.sav") and
> > read_spss("sdata.sav").
> >
> > It just seems to keep the code simpler (more aesthetically pleasing?) if
> > you do not load more packages than needed. Likewise, you do not need to load
> > "labelled", as sjlabelled is taking care of this for you.
> >
> > Oh, BTW  Scratch$sex %>% attr('labels') can be replaced by something like
> > get_labels(dat1) in my example. There usually are a multitude of ways to do
> > the same thing in R.
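For readers following along without sjlabelled, a minimal base-R sketch of where haven keeps the labels that attr('labels') and get_labels() read: a labelled vector is an ordinary numeric vector with a "labels" attribute (label text named onto the codes) and a "label" attribute (the variable label). The vector below is rebuilt by hand with structure(), copying Gregory's Dance values and the attributes from the dput() output in this thread; no packages are loaded, so the haven_labelled class is only a tag here and has no special print method.

```r
# Hand-built copy of the Dance variable (values from Gregory's example,
# attributes from the thread's dput() output). No packages are loaded.
dance <- structure(c(1, 1, 0, 1, 1, 0, 1, 0, 1, 1),
                   label = "Did they dance?",
                   labels = c(No = 0, Yes = 1),
                   class = "haven_labelled")

# Value labels: a named numeric vector, names are the label text
value_labels <- attr(dance, "labels")
print(value_labels)

# Only the label text, in code order
print(names(value_labels))

# The variable label
print(attr(dance, "label"))
```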
> >
> > You might want to have a look at
> > https://cran.r-project.org/web/packages/labelled/vignettes/intro_labelled.html
> > and https://strengejacke.github.io/sjlabelled/articles/labelleddata.html
> > for more about working with labels.
> >
> > On Sat, 8 Feb 2020 at 09:35, Yawo Kokuvi <yawo1964 using gmail.com> wrote:
> >
> >> Thanks so much for all your assistance.  I admit R's learning curve is a
> >> bit steep, but I am eager to learn ... and hopefully teach with it.
> >>
> >> With regard to my problem, I can now see two options: either declare
> >> each categorical variable as a factor, specifying the needed levels and
> >> labels.
> >>
> >> OR
> >>
> >> use a different function (read_spss) as John has suggested to import the
> >> file.
> >>
> >> I will experiment with both.
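A sketch of the first option, assuming the codes and their meanings are known in advance (here taken from the Training variable in the dput() output, with Gregory's values). factor() maps each code listed in levels to the text in the same position of labels; unclass() drops the haven_labelled class as a precaution so factor() sees plain numeric codes.

```r
# Training variable rebuilt by hand as a haven-style labelled vector
training <- structure(c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0),
                      label = "Type of Training",
                      labels = c(`Food as Reward` = 0, `Affection as Reward` = 1),
                      class = "haven_labelled")

# Declare the factor explicitly: codes go in `levels`, text in `labels`
training_f <- factor(unclass(training),
                     levels = c(0, 1),
                     labels = c("Food as Reward", "Affection as Reward"))

# Counts are now reported under the label text instead of 0 / 1
print(table(training_f))
```

The counts should agree with Gregory's fre() output above (6 and 4).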
> >>
> >> With much appreciation, cY
> >>
> >> On Sat, Feb 8, 2020 at 9:25 AM John Kane <jrkrideau using gmail.com> wrote:
> >>
> >>> Hi Yawo Kokuvi;
> >>> As an R newbie transitioning from SPSS to R, expect culture shock and the
> >>> possible feeling that your brain is twisting within your skull, but it is
> >>> well worth it.
> >>>
> >>> Try something like this:
> >>> ##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> dat1 <- structure(list(
> >>>     Animal = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
> >>>         label = "Animal", labels = c(Cat = 0, Dog = 1),
> >>>         class = "haven_labelled"),
> >>>     Training = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
> >>>         label = "Type of Training",
> >>>         labels = c(`Food as Reward` = 0, `Affection as Reward` = 1),
> >>>         class = "haven_labelled"),
> >>>     Dance = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
> >>>         label = "Did they dance?", labels = c(No = 0, Yes = 1),
> >>>         class = "haven_labelled")),
> >>>     row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
> >>>
> >>>
> >>> library(sjlabelled)
> >>> str(dat1)
> >>> get_labels(dat1)
> >>> barplot(table(as_label(dat1$Dance)))
> >>> ##==================================================================
> >>> Your problem seems to be omitting the as_label().
> >>>
> >>> You do not need to load "haven"; read_spss() in sjlabelled should do the
> >>> trick.
> >>>
> >>>
> >>> On Sat, 8 Feb 2020 at 05:44, Rui Barradas <ruipbarradas using sapo.pt> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> Try
> >>>>
> >>>> aux_fun <- function(x){
> >>>>    levels <- attr(x, "labels")
> >>>>    factor(x, labels = names(levels), levels = levels)
> >>>> }
> >>>>
> >>>> newCatsDogs <- as.data.frame(lapply(CatsDogs, aux_fun))
> >>>>
> >>>> str(newCatsDogs)
> >>>> #'data.frame':  10 obs. of  3 variables:
> >>>> # $ Animal  : Factor w/ 2 levels "Cat","Dog": 1 1 1 1 1 1 1 1 1 1
> >>>> # $ Training: Factor w/ 2 levels "Food as Reward",..: 1 1 1 1 1 1 1 1 1 1
> >>>> # $ Dance   : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2
> >>>>
> >>>>
> >>>> As for the
> >>>>   - frequencies: ?table, ?tapply, ?aggregate,
> >>>>   - barplots: ?barplot
> >>>>
> >>>> You can find lots and lots of examples of both online, covering what
> >>>> seem to be simple use cases.
> >>>>
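Along the lines Rui suggests, a base-R sketch of a labelled frequency table with percentages and a bar chart. The factor is built the same way as in aux_fun, from the variable's own "labels" attribute; the data are Gregory's Dance values, so the counts should match his fre() output (3 No, 7 Yes). The plot title and axis text are illustrative choices.

```r
# Gregory's Dance values as a hand-built haven-style labelled vector
dance <- structure(c(1, 1, 0, 1, 1, 0, 1, 0, 1, 1),
                   label = "Did they dance?",
                   labels = c(No = 0, Yes = 1),
                   class = "haven_labelled")

# Turn the codes into a factor using the vector's own value labels
labs <- attr(dance, "labels")
dance_f <- factor(unclass(dance), levels = labs, labels = names(labs))

# Frequencies and percentages, indexed by label text
counts <- table(dance_f)
percent <- round(100 * prop.table(counts), 1)
print(cbind(Count = counts, Percent = percent))

# Bar chart: bars are named after the factor levels (No, Yes)
barplot(counts, main = attr(dance, "label"), ylab = "Count")
```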
> >>>> Hope this helps,
> >>>>
> >>>> Rui Barradas
> >>>>
> >>>> On 08/02/20 at 06:03, Yawo Kokuvi wrote:
> >>>> > Thanks for all. Here is the output from dput. I used a different dataset
> >>>> > containing categorical variables, since the previous one is on a
> >>>> > different computer.
> >>>> >
> >>>> > In the following dataset, my interest is in getting frequencies and
> >>>> > barplots for the two variables: Training and Dance, with value labels
> >>>> > displayed.
> >>>> >
> >>>> > thanks again - cY
> >>>> >
> >>>> >
> >>>> > =========
> >>>> > dput(head(CatsDogs, n = 10))
> >>>> > structure(
> >>>> >    list(
> >>>> >      Animal = structure(
> >>>> >        c(0, 0, 0, 0, 0, 0, 0, 0, 0,
> >>>> >          0),
> >>>> >        label = "Animal",
> >>>> >        labels = c(Cat = 0, Dog = 1),
> >>>> >        class = "haven_labelled"
> >>>> >      ),
> >>>> >      Training = structure(
> >>>> >        c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
> >>>> >        label = "Type of Training",
> >>>> >        labels = c(`Food as Reward` = 0,
> >>>> >                   `Affection as Reward` = 1),
> >>>> >        class = "haven_labelled"
> >>>> >      ),
> >>>> >      Dance = structure(
> >>>> >        c(1,
> >>>> >          1, 1, 1, 1, 1, 1, 1, 1, 1),
> >>>> >        label = "Did they dance?",
> >>>> >        labels = c(No = 0,
> >>>> >                   Yes = 1),
> >>>> >        class = "haven_labelled"
> >>>> >      )
> >>>> >    ),
> >>>> >    row.names = c(NA,-10L),
> >>>> >    class = c("tbl_df", "tbl", "data.frame")
> >>>> > )
> >>>> >
> >>>> >
> >>>> > On Fri, Feb 7, 2020 at 10:14 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:
> >>>> >
> >>>> >> Yes. Most attachments are stripped by the server.
> >>>> >>
> >>>> >> Bert Gunter
> >>>> >>
> >>>> >> "The trouble with having an open mind is that people keep coming along and
> >>>> >> sticking things into it."
> >>>> >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>>> >>
> >>>> >>
> >>>> >> On Fri, Feb 7, 2020 at 5:34 PM John Kane <jrkrideau using gmail.com> wrote:
> >>>> >>
> >>>> >>> Hi,
> >>>> >>> Could you upload some sample data in dput form? Something like
> >>>> >>> dput(head(Scratch, n=13)) will give us some real data to examine. Just
> >>>> >>> copy and paste the output of dput(head(Scratch, n=13)) into the email.
> >>>> >>> This is the best way to ensure that R-help denizens are getting the data
> >>>> >>> in the exact format that you have.
> >>>> >>>
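A small base-R illustration of why John asks for dput() output: dput() prints R code that rebuilds the object, column types and attributes included, so anyone can paste it back into R. The data frame df is a hypothetical stand-in for Scratch, and capture.output() plus eval(parse()) are used here only to show the round trip inside a single script.

```r
# Hypothetical stand-in for the poster's data
df <- data.frame(ID = 1:3, sex = c(1, 1, 2))

# dput() writes R code that recreates the object; capture that code as text
txt <- paste(capture.output(dput(head(df, n = 2))), collapse = "\n")
cat(txt, "\n")  # this is the text you would paste into the email

# Evaluating the pasted text recreates the same data frame
rebuilt <- eval(parse(text = txt))
print(all.equal(rebuilt, head(df, n = 2)))  # TRUE
```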
> >>>> >>> On Fri, 7 Feb 2020 at 15:32, Yawo Kokuvi <yawo1964 using gmail.com> wrote:
> >>>> >>>
> >>>> >>>> Thanks for all your assistance
> >>>> >>>>
> >>>> >>>> Attached, please find the Rdata scratch file I have been using.
> >>>> >>>>
> >>>> >>>> -----------------------------------------------------
> >>>> >>>>
> >>>> >>>>> head(Scratch, n=13)
> >>>> >>>> # A tibble: 13 x 6
> >>>> >>>>        ID           marital        sex      race    paeduc    speduc
> >>>> >>>>     <dbl>         <dbl+lbl>  <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lbl>
> >>>> >>>>  1     1 3 [DIVORCED]      1 [MALE]   1 [WHITE]        NA        NA
> >>>> >>>>  2     2 1 [MARRIED]       1 [MALE]   1 [WHITE]        NA        NA
> >>>> >>>>  3     3 3 [DIVORCED]      1 [MALE]   1 [WHITE]         4        NA
> >>>> >>>>  4     4 4 [SEPARATED]     1 [MALE]   1 [WHITE]        16        NA
> >>>> >>>>  5     5 3 [DIVORCED]      1 [MALE]   1 [WHITE]        18        NA
> >>>> >>>>  6     6 1 [MARRIED]       2 [FEMALE] 1 [WHITE]        14        20
> >>>> >>>>  7     7 1 [MARRIED]       2 [FEMALE] 2 [BLACK]        NA        12
> >>>> >>>>  8     8 1 [MARRIED]       2 [FEMALE] 1 [WHITE]        NA        12
> >>>> >>>>  9     9 3 [DIVORCED]      2 [FEMALE] 1 [WHITE]        11        NA
> >>>> >>>> 10    10 1 [MARRIED]       2 [FEMALE] 1 [WHITE]        16        12
> >>>> >>>> 11    11 5 [NEVER MARRIED] 2 [FEMALE] 2 [BLACK]        NA        NA
> >>>> >>>> 12    12 3 [DIVORCED]      2 [FEMALE] 2 [BLACK]        NA        NA
> >>>> >>>> 13    13 3 [DIVORCED]      2 [FEMALE] 2 [BLACK]        16        NA
> >>>> >>>>
> >>>> >>>> -----------------------------------------------------
> >>>> >>>>
> >>>> >>>> and below is my script/command file.
> >>>> >>>>
> >>>> >>>> #1: Load library and import SPSS dataset
> >>>> >>>> library(haven)
> >>>> >>>> Scratch <- read_sav("~/Desktop/Scratch.sav")
> >>>> >>>>
> >>>> >>>> #2: Save the dataset with a name
> >>>> >>>> save(Scratch, file = "Scratch.Rdata")
> >>>> >>>>
> >>>> >>>> #3: Install & load necessary packages for descriptive statistics
> >>>> >>>> install.packages("freqdist")
> >>>> >>>> library(freqdist)
> >>>> >>>>
> >>>> >>>> install.packages("sjlabelled")
> >>>> >>>> library(sjlabelled)
> >>>> >>>>
> >>>> >>>> install.packages("labelled")
> >>>> >>>> library(labelled)
> >>>> >>>>
> >>>> >>>> install.packages("surveytoolbox")
> >>>> >>>> library(surveytoolbox)
> >>>> >>>>
> >>>> >>>> #4: Check the value labels of gender and marital status
> >>>> >>>> Scratch$sex %>% attr('labels')
> >>>> >>>> Scratch$marital %>% attr('labels')
> >>>> >>>>
> >>>> >>>> #5: Frequency distribution and bar chart for categorical/ordinal level
> >>>> >>>> # variables such as gender (sex)
> >>>> >>>> freqdist(Scratch$sex)
> >>>> >>>> barplot(table(Scratch$marital))
> >>>> >>>>
> >>>> >>>> -----------------------------------------------------
> >>>> >>>>
> >>>> >>>> As you can see from above, I use the haven package to import the data
> >>>> >>>> from SPSS. Apparently, haven keeps the value labels, as the attribute
> >>>> >>>> commands in section #4 of my script show.
> >>>> >>>> The problem is that when I run a frequency distribution for any of the
> >>>> >>>> categorical variables like sex or marital status, only the numbers
> >>>> >>>> (1, 2) are displayed in the output. The labels (male, female), for
> >>>> >>>> example, are not.
> >>>> >>>>
> >>>> >>>> Is there any way to force these to be shown in the output? Is there a
> >>>> >>>> global property that I have to set so that these value labels are
> >>>> >>>> reliably displayed with every output? I read that I can declare them as
> >>>> >>>> factors using as_factor(), but once I do so, how do I invoke them in my
> >>>> >>>> commands so that the value labels show?
> >>>> >>>>
> >>>> >>>> Sorry about all the noob questions, but hopefully I will be able to get
> >>>> >>>> this working.
> >>>> >>>>
> >>>> >>>> Thanks in advance.
> >>>> >>>>
> >>>> >>>>
> >>>> >>>> Thanks - cY
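Rather than a global setting, the approach taken in the replies is to convert the labelled columns to factors once, right after import (haven's as_factor(), which the poster mentions, is the packaged route). A base-R sketch of that step, assuming value labels sit in a "labels" attribute as the attr('labels') output above shows: labelled_to_factor() is a hypothetical helper written for this illustration, not a function from haven or any other package. Unlike Rui's aux_fun, which assumes every column is labelled (as in CatsDogs), it leaves columns without value labels, such as ID in Scratch, unchanged.

```r
# Hypothetical helper (not from haven or any package): turn a vector into a
# factor if it carries a "labels" attribute, otherwise return it unchanged.
labelled_to_factor <- function(x) {
  labs <- attr(x, "labels")
  if (is.null(labs)) return(x)  # e.g. an ID column without value labels
  factor(unclass(x), levels = labs, labels = names(labs))
}

# Small hand-built stand-in for Scratch: one plain column, one labelled column
scratch <- data.frame(ID = 1:3)
scratch$sex <- structure(c(1, 2, 2),
                         labels = c(MALE = 1, FEMALE = 2),
                         class = "haven_labelled")

# Convert every column in one pass; `scratch[] <-` keeps it a data frame
scratch[] <- lapply(scratch, labelled_to_factor)
str(scratch)
print(table(scratch$sex))  # counts reported under MALE / FEMALE
```

After this step, table(), barplot() and similar functions pick up the labels from the factor levels without any further setting.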
> >>>> >>>>
> >>>> >>>>
> >>>> >>>> On Fri, Feb 7, 2020 at 1:14 PM <cpolwart using chemo.org.uk> wrote:
> >>>> >>>>
> >>>> >>>>> I've never used it, but there is a labels function in haven...
> >>>> >>>>>
> >>>> >>>>> On 7 Feb 2020 17:05, Bert Gunter <bgunter.4567 using gmail.com> wrote:
> >>>> >>>>>
> >>>> >>>>> What does your data look like after importing? -- see ?head and ?str to
> >>>> >>>>> tell us. Show us the code that failed to provide "labels." See the
> >>>> >>>>> posting guide below for how to post questions that are likely to elicit
> >>>> >>>>> helpful responses.
> >>>> >>>>>
> >>>> >>>>> I know nothing about the haven package, but see ?factor or go through
> >>>> >>>>> an R tutorial or two to learn about factors, which may be part of the
> >>>> >>>>> issue here. R *generally* obtains whatever "label" info it needs from
> >>>> >>>>> the object being tabled -- see ?tabulate, ?table etc. -- if that's what
> >>>> >>>>> you're doing.
> >>>> >>>>>
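Bert's point about table() can be seen in a few lines of base R: table() names its cells after the values or factor levels of the object it is given, so numeric codes produce cells "1" and "2", while a factor produces its label text. The codes and the MALE/FEMALE text follow the sex variable in the poster's head() output; the five values are illustrative.

```r
# Numeric codes as imported (1 = MALE, 2 = FEMALE in the poster's data)
codes <- c(1, 1, 2, 2, 2)
print(table(codes))  # cells named "1" and "2"

# The same codes as a factor carrying the label text
sex <- factor(codes, levels = c(1, 2), labels = c("MALE", "FEMALE"))
print(table(sex))    # cells named "MALE" and "FEMALE"
```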
> >>>> >>>>> Bert Gunter
> >>>> >>>>>
> >>>> >>>>> "The trouble with having an open mind is that people keep coming along and
> >>>> >>>>> sticking things into it."
> >>>> >>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >>>> >>>>>
> >>>> >>>>>
> >>>> >>>>> On Fri, Feb 7, 2020 at 8:28 AM Yawo Kokuvi <yawo1964 using gmail.com> wrote:
> >>>> >>>>>
> >>>> >>>>>> Hello,
> >>>> >>>>>>
> >>>> >>>>>> I am just transitioning from SPSS to R.
> >>>> >>>>>>
> >>>> >>>>>> I used the haven library to import some of my SPSS data files to R.
> >>>> >>>>>>
> >>>> >>>>>> However, when I run procedures such as frequencies or crosstabs, value
> >>>> >>>>>> labels for categorical variables such as gender (1=male, 2=female) are
> >>>> >>>>>> not shown. The same applies to many other outputs.
> >>>> >>>>>>
> >>>> >>>>>> I am confused.
> >>>> >>>>>>
> >>>> >>>>>> 1. Is there a global setting that I can use to force all categorical
> >>>> >>>>>> variables to display labels?
> >>>> >>>>>>
> >>>> >>>>>> 2. Or, are these labels to be set for each function or package?
> >>>> >>>>>>
> >>>> >>>>>> 3. How can I request the value labels for each function I run?
> >>>> >>>>>>
> >>>> >>>>>> Thanks in advance for your help.
> >>>> >>>>>>
> >>>> >>>>>> Best, Yawo
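For the crosstab case in the original question, a base-R sketch: once both variables are factors, table() with two arguments gives a two-way table whose row and column headings are the value labels, and prop.table() with margin = 1 turns it into row percentages. The values are Gregory's Training and Dance data from the expss example at the top of this thread; the dimension names Training and Dance are chosen here for readability.

```r
# Gregory's Training and Dance values, declared as factors with their labels
training <- factor(c(1, 0, 0, 1, 0, 1, 0, 0, 1, 0),
                   levels = c(0, 1),
                   labels = c("Food as Reward", "Affection as Reward"))
dance <- factor(c(1, 1, 0, 1, 1, 0, 1, 0, 1, 1),
                levels = c(0, 1),
                labels = c("No", "Yes"))

# Two-way table; the argument names become the dimension names
xt <- table(Training = training, Dance = dance)
print(xt)

# Row percentages (each Training row sums to 100)
print(round(100 * prop.table(xt, margin = 1), 1))
```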
> >>>> >>>>>>
> >>>> >>>>>>
> >>>> >>>>>> ______________________________________________
> >>>> >>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> >>>>>> PLEASE do read the posting guide
> >>>> >>>>>> http://www.R-project.org/posting-guide.html
> >>>> >>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>> >>>>>>
> >>>> >>>>>
> >>>> >>>>>
> >>>> >>>>>
> >>>> >>>>>
> >>>> >>>>
> >>>> >>>>
> >>>> >>>
> >>>> >>>
> >>>> >>> --
> >>>> >>> John Kane
> >>>> >>> Kingston ON Canada
> >>>> >>>
> >>>> >>>
> >>>> >>
> >>>> >
> >>>> >
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> John Kane
> >>> Kingston ON Canada
> >>>
> >>
> >
> > --
> > John Kane
> > Kingston ON Canada
> >
>


