[BioC] append on DNAStringSet produces an empty DNAString as last element

Philip Kensche pkensche at cmbi.ru.nl
Wed Aug 11 13:25:05 CEST 2010


Dear Martin,

> On 08/10/2010 03:01 AM, Philip Kensche wrote:
> > Hi,
> > 
> > I noticed the following:
> > 
> >> append(DNAStringSet(), list(DNAString("aaaa"), DNAString("catc")))
> > 
> > [[1]]
> >   4-letter "DNAString" instance
> > seq: AAAA
> > 
> > [[3A2]]
> >   4-letter "DNAString" instance
> > seq: CATC
> > 
> > [[3]]
> >   A DNAStringSet instance of length 0
> > 
> > I guess, the last element shouldn't be there -- or not?

> this has to do with what base::append does when the first argument is
> zero length,

> > base::append
> function (x, values, after = length(x))
> {
>     lengx <- length(x)
>     if (!after)
>         c(values, x)
>     else if (after >= lengx)
>         c(x, values)
>     else c(x[1L:after], values, x[(after + 1L):lengx])
> }
> <environment: namespace:base>

> which leads to some inconsistent behavior, e.g., dropping zero-length
> atomic vectors but not other data structures

> > append(numeric(), list(1))
> [[1]]
> [1] 1

> > append(new.env(), list(1))
> [[1]]
> [1] 1

> [[2]]
> <environment: 0x461a508>

> I'm not sure what the reason for this behavior is; I might have expected
> list(numeric(), 1) in the first case, list(new.env(), 1) in the second.

If I understand correctly, the problem lies in the append function from package base, i.e. in an R core package.
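
To make the branch explicit, here is a small base-R sketch using your two examples. With a zero-length first argument, `after` defaults to `length(x)`, which is 0, so `!after` is TRUE and append() evaluates `c(values, x)` rather than `c(x, values)`. The variable names are mine and only for illustration.

```r
# Zero-length first argument: after = length(x) = 0, so append()
# takes its first branch and returns c(values, x).
x <- new.env()
values <- list(1)

res_append <- append(x, values)
res_manual <- c(values, x)          # the branch append() actually takes

identical(res_append, res_manual)   # both are built by the same c() call
length(res_append)                  # the environment is kept as a list element

# A zero-length atomic vector contributes no elements to c(),
# so it disappears from the result.
res_numeric <- append(numeric(), values)
length(res_numeric)
```

As far as I can tell, this is also why the empty DNAStringSet ends up as the last element in my original output: it is not a vector, so c() keeps it as a single list element placed after `values`.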

I also noticed that base::append, called with the argument classes c("DNAStringSet", "list"), returns a plain list. I would have expected it to return an extended DNAStringSet.
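
My guess at the reason: c() dispatches on its first argument, and append() puts `values` (a plain list) first when `x` is empty. The following sketch illustrates this with a toy S3 class. Note that `myset`, `new_myset` and `c.myset` are hypothetical names invented for this example and have nothing to do with Biostrings; whether the c() method for DNAStringSet would accept a list of DNAString objects is something I have not checked.

```r
# Hypothetical toy container standing in for a set class (not Biostrings).
new_myset <- function(items = list()) {
    structure(list(items = items), class = "myset")
}

# c() method for the toy class: merge the items of all arguments.
c.myset <- function(...) {
    parts <- lapply(list(...), function(p) {
        if (inherits(p, "myset")) unclass(p)$items else as.list(p)
    })
    new_myset(do.call(c, parts))
}

empty  <- new_myset()
values <- list("AAAA", "CATC")

list_first <- c(values, empty)   # plain list first: default c() is used
set_first  <- c(empty, values)   # toy class first: c.myset() is used

class(list_first)
class(set_first)
```

So the order of the arguments to c() decides which class comes back, independent of which argument is empty.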

Thanks, Martin!

	Philip

P.S.:

> is that '[[3A2]]' in your output correct? It suggests some kind of
> memory corruption (in R?) but I can't reproduce it.

It's not because of R. It must have happened in the editor -- so nothing to worry about :-)


> Martin

> > 
> > 
> > Regards,
> > 
> >	Philip
> > 
> > 
> > 
> > 
> > P.S.:
> > 
> > 
> >> sessionInfo()
> > R version 2.11.1 (2010-05-31) 
> > x86_64-pc-linux-gnu 
> > 
> > locale:
> >  [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C              
> >  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8    
> >  [5] LC_MONETARY=C              LC_MESSAGES=de_DE.UTF-8   
> >  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
> > [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
> > 
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base     
> > 
> > other attached packages:
> > [1] GenomicRanges_1.0.7 Biostrings_2.16.9   IRanges_1.6.6      
> > 
> > loaded via a namespace (and not attached):
> > [1] Biobase_2.8.0   BSgenome_1.16.2
> > 
> > 


> -- 
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109

> Location: Arnold Building M1 B861
> Phone: (206) 667-2793



-- 
  | Philip Kensche <pkensche at cmbi.ru.nl>
  | http://www.cmbi.ru.nl/~pkensche
  |
  | Center for Molecular and Biomolecular Informatics
  | http://www2.cmbi.ru.nl
  |
  | phone +31 (0)24 36 19693
  | fax   +31 (0)24 36 19395


