[BioC] A question when using Biostrings

Hervé Pagès hpages at fhcrc.org
Mon Oct 21 21:24:33 CEST 2013

Hi Xiaoyun,

On 10/21/2013 06:50 AM, Xiaoyun Shang wrote:
> Dear Pages,
> Biostrings is a great package in R. Thank you all!
> I had a question when I used this package to deal with a fasta file.
> Here is my code:
> library("Biostrings")
> human_protein <- readAAStringSet("HUMAN.fasta","fasta",use.names=TRUE)
> names(human_protein[[1]])
> When I use the function 'names' to extract the name of the first sequence, it
> returns 'NULL', but it does have a name (description). How can I get the
> name that is included in the fasta file?

You need to understand the subtle but fundamental difference between
[[ and [ on a list-like object:

   x <- list(A=1:3, B=c(x=1, y=2), C=NULL, D=2:-1)

'x' is a named list:

   > names(x)
   [1] "A" "B" "C" "D"

Those names are the "outer names" (aka "top-level names") of the list.

The 1st list element is an integer vector of length 3:

   > x[[1]]
   [1] 1 2 3

   > length(x[[1]])
   [1] 3

It has no names:

   > names(x[[1]])
   NULL

The 2nd list element is a numeric vector of length 2:

   > x[[2]]
   x y
   1 2

It has names:

   > names(x[[2]])
   [1] "x" "y"

Those names are the "inner names" of the 2nd list element.

To extract the 1st *outer* name of the list, do:

   > names(x)[1]
   [1] "A"

You could get the same result with (note the use of the single
bracket here):

   > names(x[1])
   [1] "A"

This works because x[1] is still a list like 'x', but with only the
1st element kept (so it's a list of length 1), and because [ subsets
the *outer* names along with the list elements:

   > length(x[1])
   [1] 1

   > x[1]
   $A
   [1] 1 2 3

Note that, even if semantically equivalent, 'names(x)[i]' is preferred
over 'names(x[i])'.
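
To make the difference concrete, here is a small sketch in base R that
reuses the same 'x' as above and checks what each operator returns. The
comments show the values I would expect at the prompt:

```r
# Same example list as above.
x <- list(A=1:3, B=c(x=1, y=2), C=NULL, D=2:-1)

# [ keeps the list wrapper; [[ returns the bare element.
is.list(x[1])        # TRUE
is.list(x[[1]])      # FALSE
is.integer(x[[1]])   # TRUE

# The outer name survives [ but not [[.
names(x[1])          # "A"
names(x[[1]])        # NULL

# Both spellings return the same outer name.
identical(names(x)[1], names(x[1]))   # TRUE
```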

The 'human_protein' object you got with readAAStringSet() is an
AAStringSet object, which is a list-like object. So the same applies:
names(human_protein)[1] gives you the name of the 1st sequence.
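
A minimal sketch of what that looks like on an AAStringSet. Since I don't
have your HUMAN.fasta, it assumes a tiny made-up set built in memory (the
names 'protA' and 'protB' and the sequences are hypothetical stand-ins
for your FASTA description lines and records). The same calls should
work on the object returned by readAAStringSet():

```r
library(Biostrings)

# Hypothetical stand-in for the contents of HUMAN.fasta: the vector
# names play the role of the FASTA description lines.
human_protein <- AAStringSet(c(protA="MKTAYIAK", protB="GSHMLE"))

# Outer name of the 1st sequence (the preferred spelling).
names(human_protein)[1]

# Same result: [ returns an AAStringSet of length 1 that keeps its name.
names(human_protein[1])

# [[ returns a bare AAString, which carries no names, hence the NULL.
names(human_protein[[1]])
class(human_protein[[1]])
```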

Hope this helps,

> Many thanks!
> Xiaoyun
> --
> Xiaoyun Shang     MD, PhD
> Institute of Immunology, PLA
> Third Military Medical University
> 30# Gaotanyan, Shapingba,Chongqing
> 400038 P.R.China
> phone: +86 23 6877 1920
> mobile:+86-135 2748 0908
> Email: shangxiaoyun at gmail.com ;
> shangxiaoyun at tmmu.edu.cn

Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
