[R] "haven" - read_spss: How to avoid extracting value labels instead of long labels?

Ista Zahn istazahn at gmail.com
Fri Nov 13 16:00:58 CET 2015


Why do you think this is a bug in haven? On the contrary, I don't think
this has anything to do with haven at all. The problem seems to be
that attr() does partial matching by default. Check it out:

> x <- 1:3
> attr(x, "labels") <- c("foo", "bar", "baz")
> attr(x, "label")
[1] "foo" "bar" "baz"

and see ?attr for details.
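
To make the behaviour concrete, here is a minimal sketch in base R. The
vector x and its attribute values are made up for illustration; only
attr() and its exact argument are the real API.

```r
# Toy vector with value labels only (no "label" attribute yet)
x <- 1:3
attr(x, "labels") <- c(a = 1, b = 2)

# Default lookup: "label" is a unique partial match for "labels",
# so the value labels come back
partial <- attr(x, "label")

# exact = TRUE turns partial matching off, so the result is NULL
exact <- attr(x, "label", exact = TRUE)

# Once a real "label" attribute exists, the exact match is used
attr(x, "label") <- "LONG LABEL"
both <- attr(x, "label")
```

This would explain why columns with both attributes behave as expected,
while columns with only value labels return them under "label".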

The answer, I think, is to ask for an exact match:

fix_labels <- function(x, TextIfMissing) {
      val <- attr(x, "label", exact = TRUE)
      if (is.null(val)) TextIfMissing else val
}
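
As a rough usage sketch, the function can be applied column by column.
The data frame spss_like below is a hypothetical stand-in for a haven
import (no .sav file is read), and using vapply() instead of sapply() is
a suggestion of this sketch, not part of the original code: it raises an
error instead of silently returning a list if a label is not a single
string.

```r
fix_labels <- function(x, TextIfMissing) {
  # exact = TRUE prevents "label" from partially matching "labels"
  val <- attr(x, "label", exact = TRUE)
  if (is.null(val)) TextIfMissing else val
}

# Hypothetical stand-in for data read with haven::read_spss()
spss_like <- data.frame(RESPID = 1:2, AWARE2 = c(1, 2))
attr(spss_like$RESPID, "label") <- "RESPONDENT ID"
attr(spss_like$AWARE2, "labels") <- c("VERY/SOMEWHAT FAMILIAR" = 1,
                                      "NOT AT ALL FAMILIAR" = 2)

# vapply() insists on exactly one character value per column
longlabels <- vapply(spss_like, fix_labels, character(1),
                     TextIfMissing = "NO LABEL IN SPSS")
```

Here AWARE2 has value labels but no long label, so it should get the
fallback text rather than its value labels.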

Finally, note that the development version of rio
(https://github.com/leeper/rio) has a (non-exported) function for
cleaning up metadata from haven imports. See
https://github.com/leeper/rio/blob/master/R/utils.R#L86

Best,
Ista

On Thu, Nov 12, 2015 at 8:37 PM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> I have to rephrase my question again - it's clearly a small bug in
> haven. Here is what it is about:
>
> If I have a column in SPSS that has BOTH a long label and value
> labels, then everything works fine - I access one with 'label' and
> another with 'labels':
>
> attr(spss1$MYVAR, "label")
> [1] "LONG LABEL"
> attr(spss1$MYVAR, "labels")
>     DEFINITELY CONSIDER       PROBABLY CONSIDER   PROBABLY NOT CONSIDER DEFINITELY NOT CONSIDER
>                       1                       2                       3                       4
>
> However, if I have a column that has no long label and ONLY value
> labels, then it's not working properly:
>
>> attr(spss1$MYVAR, "label")
> VERY/SOMEWHAT FAMILIAR    NOT AT ALL FAMILIAR
>                      1                      2
>> attr(spss1$MYVAR, "labels")
> VERY/SOMEWHAT FAMILIAR    NOT AT ALL FAMILIAR
>                      1                      2
>
> And I actually need to be able to identify if label is empty.
> Thank you for looking into it!
>
> Dimitri
>
>
> On Thu, Nov 12, 2015 at 5:55 PM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>> Looks like a little bug in 'haven':
>>
>> When I actually look at the attributes of one variable that has no
>> long label in SPSS but has Value Labels, I am getting:
>> attr(spss1$WAVE, "label")
>> NULL
>>
>> But when I sapply my function longlabels to my data frame and ask it
>> to print the long labels for each column, for the same column "WAVE" I
>> am getting - instead of NULL:
>> NULL
>> VERY/SOMEWHAT FAMILIAR    NOT AT ALL FAMILIAR
>>                      1                      2
>>
>> This is, of course, incorrect, because it grabs the next attribute
>> (which one? And replaces NULL with it).
>> Any suggestions?
>> Thanks!
>>
>>
>>
>>
>> On Thu, Nov 12, 2015 at 11:56 AM, Dimitri Liakhovitski
>> <dimitri.liakhovitski at gmail.com> wrote:
>>> Hello!
>>>
>>> I don't have an example file, but I think my question should be clear
>>> without it.
>>> I have an SPSS file. I read it in using 'haven':
>>>
>>> library(haven)
>>> spss1 <- read_spss("SPSS_Example.sav")
>>>
>>> I created a function that extracts the long labels (in SPSS - "Label"):
>>>
>>> fix_labels <- function(x, TextIfMissing) {
>>>       val <- attr(x, "label")
>>>       if (is.null(val)) TextIfMissing else val
>>> }
>>> longlabels <- sapply(spss1, fix_labels, TextIfMissing = "NO LABLE IN SPSS")
>>>
>>> This function is supposed to create a vector of long labels and
>>> usually it does, e.g.:
>>>
>>> str(longlabels)
>>>  Named chr [1:64] "Serial number" ...
>>>  - attr(*, "names")= chr [1:64] "Respondent_Serial" "weight" "r7_1" "r7_2" ...
>>>
>>> However, I just got an SPSS file with 92 columns and ran exactly the
>>> same function on it. Now, I am getting not a vector, but a list
>>>
>>> str(longlabels)
>>> List of 92
>>>  $ VEHRATED      : chr "VEHICLE RATED"
>>>  $ RESPID        : chr "RESPONDENT ID"
>>>  $ RESPID8       : chr "8 DIGIT RESPONDENT NUMBER"
>>>
>>> An observation about the structure of longlabels here: those columns
>>> that do NOT have a long label in SPSS but DO have Values (value
>>> labels) - for them my function grabs their value labels, so that now
>>> my long label is recorded as a numeric vector with names, e.g.:
>>>
>>>  $ AWARE2        : Named num [1:2] 1 2
>>>   ..- attr(*, "names")= chr [1:2] "VERY/SOMEWHAT FAMILIAR" "NOT AT ALL FAMILIAR"
>>>
>>> Question: How could I avoid the extraction of the Value Labels for the
>>> columns that have no long labels?
>>>
>>> Thank you very much!
>>> --
>>> Dimitri Liakhovitski
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>
>
>
> --
> Dimitri Liakhovitski
>


