[R] "haven" - read_spss: How to avoid extracting value labels instead of long labels?

Dimitri Liakhovitski dimitri.liakhovitski at gmail.com
Fri Nov 13 18:42:12 CET 2015


You are absolutely right, Ista - it's not haven's fault, my bad.
Of course, it's attr's partial matching, and exact = TRUE fixes it.
Thank you so much!
Dimitri
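
P.S. For the archive, a minimal self-contained sketch of the behaviour, as I understand it. The toy vectors and the names x, y and cols are made up for illustration and do not come from my SPSS file; using vapply instead of sapply is my own substitution, on the assumption that every long label is a single string, so that the result is always a character vector rather than a list.

```r
# Toy stand-in for a column with value labels but no long label (hypothetical data)
x <- c(1, 2, 1)
attr(x, "labels") <- c("VERY/SOMEWHAT FAMILIAR" = 1, "NOT AT ALL FAMILIAR" = 2)

# Toy stand-in for a column that has a long label (hypothetical data)
y <- c(1, 2)
attr(y, "label") <- "LONG LABEL"

# By default attr() matches attribute names partially, so "label" finds "labels"
partial <- attr(x, "label")
# With exact = TRUE only an attribute named exactly "label" is returned
strict <- attr(x, "label", exact = TRUE)

# Ista's corrected function
fix_labels <- function(x, TextIfMissing) {
  val <- attr(x, "label", exact = TRUE)
  if (is.null(val)) TextIfMissing else val
}

# A plain list stands in for the data frame returned by read_spss()
cols <- list(AWARE2 = x, MYVAR = y)

# vapply() errors if any element is not a single string,
# instead of silently falling back to a list as sapply() does
longlabels <- vapply(cols, fix_labels, character(1),
                     TextIfMissing = "NO LABEL IN SPSS")
```

With exact = TRUE the column without a long label gets the fallback text, and vapply would stop with an error if anything other than one string per column came back.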

On Fri, Nov 13, 2015 at 10:00 AM, Ista Zahn <istazahn at gmail.com> wrote:
> Why do you think this is a bug in haven? To the contrary, I don't think
> this has anything to do with haven at all. The problem seems to be
> that attr does partial matching by default. Check it out:
>
>> attr(x, "labels") <- c("foo", "bar", "baz")
>> attr(x, "label")
> [1] "foo" "bar" "baz"
>
> and see ?attr for details.
>
> The answer I think is
>
> fix_labels <- function(x, TextIfMissing) {
>       val <- attr(x, "label", exact = TRUE)
>       if (is.null(val)) TextIfMissing else val
> }
>
> Finally, note that the development version of rio
> (https://github.com/leeper/rio) has a (non-exported) function for
> cleaning up meta data from haven imports. See
> https://github.com/leeper/rio/blob/master/R/utils.R#L86
>
> Best,
> Ista
>
> On Thu, Nov 12, 2015 at 8:37 PM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>> I have to rephrase my question again - it's clearly a small bug in
>> haven. Here is what it is about:
>>
>> If I have a column in SPSS that has BOTH a long label and value
>> labels, then everything works fine - I access one with 'label' and
>> another with 'labels':
>>
>> attr(spss1$MYVAR, "label")
>> [1] "LONG LABEL"
>> attr(spss1$MYVAR, "labels")
>>     DEFINITELY CONSIDER       PROBABLY CONSIDER   PROBABLY NOT CONSIDER DEFINITELY NOT CONSIDER
>>                       1                       2                       3                       4
>>
>> However, if I have a column that has no long label and ONLY value
>> labels, then it's not working properly:
>>
>>> attr(spss1$MYVAR, "label")
>> VERY/SOMEWHAT FAMILIAR    NOT AT ALL FAMILIAR
>>                      1                      2
>>> attr(spss1$MYVAR, "labels")
>> VERY/SOMEWHAT FAMILIAR    NOT AT ALL FAMILIAR
>>                      1                      2
>>
>> And I actually need to be able to identify if label is empty.
>> Thank you for looking into it!
>>
>> Dimitri
>>
>>
>> On Thu, Nov 12, 2015 at 5:55 PM, Dimitri Liakhovitski
>> <dimitri.liakhovitski at gmail.com> wrote:
>>> Looks like a little bug in 'haven':
>>>
>>> When I actually look at the attributes of one variable that has no
>>> long label in SPSS but has Value Labels, I am getting:
>>> attr(spss1$WAVE, "label")
>>> NULL
>>>
>>> But when I sapply my function longlabels to my data frame and ask it
>>> to print the long labels for each column, for the same column "WAVE" I
>>> am getting - instead of NULL:
>>> NULL
>>> VERY/SOMEWHAT FAMILIAR    NOT AT ALL FAMILIAR
>>>                      1                      2
>>>
>>> This is, of course, incorrect, because it grabs the next attribute
>>> (which one?) and replaces NULL with it.
>>> Any suggestions?
>>> Thanks!
>>>
>>>
>>>
>>>
>>> On Thu, Nov 12, 2015 at 11:56 AM, Dimitri Liakhovitski
>>> <dimitri.liakhovitski at gmail.com> wrote:
>>>> Hello!
>>>>
>>>> I don't have an example file, but I think my question should be clear
>>>> without it.
>>>> I have an SPSS file. I read it in using 'haven':
>>>>
>>>> library(haven)
>>>> spss1 <- read_spss("SPSS_Example.sav")
>>>>
>>>> I created a function that extracts the long labels (in SPSS - "Label"):
>>>>
>>>> fix_labels <- function(x, TextIfMissing) {
>>>>       val <- attr(x, "label")
>>>>       if (is.null(val)) TextIfMissing else val
>>>> }
>>>> longlabels <- sapply(spss1, fix_labels, TextIfMissing = "NO LABEL IN SPSS")
>>>>
>>>> This function is supposed to create a vector of long labels and
>>>> usually it does, e.g.:
>>>>
>>>> str(longlabels)
>>>>  Named chr [1:64] "Serial number" ...
>>>>  - attr(*, "names")= chr [1:64] "Respondent_Serial" "weight" "r7_1" "r7_2" ...
>>>>
>>>> However, I just got an SPSS file with 92 columns and ran exactly the
>>>> same function on it. Now, I am getting not a vector, but a list
>>>>
>>>> str(longlabels)
>>>> List of 92
>>>>  $ VEHRATED      : chr "VEHICLE RATED"
>>>>  $ RESPID        : chr "RESPONDENT ID"
>>>>  $ RESPID8       : chr "8 DIGIT RESPONDENT NUMBER"
>>>>
>>>> An observation about the structure of longlabels here: those columns
>>>> that do NOT have a long label in SPSS but DO have Values (value
>>>> labels) - for them my function grabs their value labels, so that now
>>>> my long label is recorded as a numeric vector with names, e.g.:
>>>>
>>>>  $ AWARE2        : Named num [1:2] 1 2
>>>>   ..- attr(*, "names")= chr [1:2] "VERY/SOMEWHAT FAMILIAR" "NOT AT ALL FAMILIAR"
>>>>
>>>> Question: How could I avoid the extraction of the Value Labels for the
>>>> columns that have no long labels?
>>>>
>>>> Thank you very much!
>>>> --
>>>> Dimitri Liakhovitski
>>>
>>>
>>>
>>> --
>>> Dimitri Liakhovitski
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>>



-- 
Dimitri Liakhovitski


