[BioC] Did the behavior of as.vector(Rle(some.factor)) change on purpose?

Martin Morgan mtmorgan at fhcrc.org
Tue Aug 31 17:37:29 CEST 2010


On 08/31/2010 07:15 AM, Steve Lianoglou wrote:
> Hi all,
> 
> It looks as if calling as.vector on a run-length-encoded factor now
> turns it into a character vector.
> 
> Did this happen by accident, or was it a deliberate design decision?

It's a bug fix, so that the Rle methods mirror base R:

> x = factor(letters)
> as.vector(x)
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
> as.factor(x)
 [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

So

> Rle(factor(letters))
'factor' Rle of length 26 with 26 runs
  Lengths: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  Values : a b c d e f g h i j k l m n o p q r s t u v w x y z
Levels(26): a b c d e f g h i j k l m n o p q r s t u v w x y z
> as.vector(Rle(factor(letters)))
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
> as.factor(Rle(factor(letters)))
 [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

There might be edge cases where our own code has not caught up with the
fix; please let us know...

> packageDescription('IRanges')$Version
[1] "1.7.32"


Martin

> 
> Previously:
> 
> R-2.12, IRanges_1.7.19, GenomicRanges_1.1.20
> (A factor of length one is returned):
> 
> R> a <- Rle(strand(c('+', '-', '+', '+', '-')))
> R> as.vector(a[1])
> [1] +
> Levels: + - *
> 
> =============================
> 
> Now:
> R-2.12, IRanges_1.7.31, GenomicRanges_1.1.20
> (The factor is converted to a character):
> 
> R> a <- Rle(strand(c('+', '-', '+', '+', '-')))
> R> as.vector(a[1])
> [1] "+"
> 
> It seems like it would do what is expected (by me :-) if
> `getMethod('as.vector', c("Rle", "missing"))` were changed from:
> 
> function (x, mode = "any")
> rep.int(as.vector(runValue(x)), runLength(x))
> 
> To:
> 
> function (x, mode = "any")
> rep.int(runValue(x), runLength(x))
> 
> but, upon further inspection, it seems like this was how it was
> defined previously anyway, so ... I guess something motivated this
> change?
> 
> The complete sessionInfo for my last (buggy(?)) case is:
> 
> R version 2.12.0 Under development (unstable) (2010-07-07 r52477)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=C
>  [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] GenomicRanges_1.1.20 IRanges_1.7.31
> 
> loaded via a namespace (and not attached):
> [1] tools_2.12.0
> 
> Thanks,
> -steve
> 
> 
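Steve quotes two `as.vector` method bodies. The difference between them can be reproduced in base R without IRanges or GenomicRanges. This is a minimal sketch under stated assumptions. `run_values` and `run_lengths` are hypothetical stand-ins for what runValue(a) and runLength(a) would return for the strand Rle above. `old_style()` and `new_style()` are hypothetical helpers that copy the two rep.int() bodies. The sketch relies on base R's rep.int() keeping the class and levels of a factor input.

```r
# Base R only; IRanges and GenomicRanges are NOT loaded.
# Hypothetical stand-ins for runValue(a) and runLength(a), where
# a <- Rle(strand(c('+', '-', '+', '+', '-'))) in the thread.
run_values  <- factor(c("+", "-", "+", "-"), levels = c("+", "-", "*"))
run_lengths <- c(1L, 1L, 2L, 1L)

# Hypothetical helper copying the earlier body (the one Steve proposes):
# rep.int() on a factor returns a factor with the same levels.
old_style <- function(values, lengths) rep.int(values, lengths)

# Hypothetical helper copying the IRanges 1.7.31 body: as.vector() turns
# the factor runs into character before they are expanded.
new_style <- function(values, lengths) rep.int(as.vector(values), lengths)

res_old <- old_style(run_values, run_lengths)
res_new <- new_style(run_values, run_lengths)

print(res_old)     # a factor; the unused "*" level is kept
print(res_new)     # a plain character vector
print(res_old[1])  # analogue of as.vector(a[1]) under the old body
print(res_new[1])  # analogue of as.vector(a[1]) under the new body

# Code that still needs a factor can rebuild one. Passing the original
# levels preserves "*", which factor(res_new) on its own would drop.
rebuilt <- factor(res_new, levels = levels(run_values))
stopifnot(identical(rebuilt, res_old))
```

The two helpers differ only in the as.vector() call. That call is what makes the result follow base R's as.vector(factor) rule, which Martin's examples illustrate.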


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


