[R] Merging and extracting data from list

Dr. Viviana Menzel vivianamenzel at gmx.de
Thu Jan 21 19:46:01 CET 2010


Hello R-help group,

I have a question about merging data from two tables (data frames):

Genes list (hSgenes):
name    chr    strand    start    end    transStart    transEnd    symbol    description    feature
ENSG00000223972    1    1    11874    14412    11874    14412        DEAD/H box polypeptide 11 like 1DEAD/H box polypeptide 11 like 3DEAD/H box polypeptide 11 like 9 ;; [Source:UniProtKB/TrEMBL;Acc:B7ZGX0]    gene
ENSG00000227232    1    -1    14363    29570    17551    29343    WASH5P    WAS protein family homolog 5 pseudogene (WASH5P), non-coding RNA [Source:RefSeq DNA;Acc:NR_024540]    gene
.....

Chers list (chersList):
name    chr    start    end    cellType    antibody    features    maxLevel    score
chr1.cher1    1    859132    859732    human    AB    ENSG00000223764 ENSG00000231958 ENSG00000187634    1.25736038968316    0.664381383074449
chr1.cher2    1    889564    890464    human    AB    ENSG00000188976    1.47884233632064    2.88839131446868
chr1.cher3    1    1106364    1106864    human    AB    ENSG00000162571    1.83795654418115    3.58404359147275
....

To the second table I want to add columns with the gene description and symbol, looked up by gene ID in the first table. I used the following method:

# Row index of each chersList feature in hSgenes (NA if not found)
idx <- match(chersList$features, hSgenes$name)
chersMergeGenes <- data.frame(chersList,
                              description = hSgenes$description[idx],
                              symbol = hSgenes$symbol[idx])
write.table(chersMergeGenes, row.names = FALSE, quote = FALSE, sep = "\t",
            file = "chersMergeGenes.txt")
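To see where the NA comes from, here is a minimal sketch with hypothetical toy data: the gene IDs are taken from the tables above, but the symbols and descriptions are made-up placeholders, and only the columns needed for the lookup are kept.

```r
# Hypothetical toy tables: real-looking IDs, placeholder symbols/descriptions
hSgenes <- data.frame(
  name        = c("ENSG00000223764", "ENSG00000231958", "ENSG00000188976"),
  symbol      = c("SYM1", "SYM2", "SYM3"),
  description = c("desc 1", "desc 2", "desc 3"),
  stringsAsFactors = FALSE
)
chersList <- data.frame(
  name     = c("chr1.cher1", "chr1.cher2"),
  features = c("ENSG00000223764 ENSG00000231958", "ENSG00000188976"),
  stringsAsFactors = FALSE
)

# match() compares complete strings: "ENSG00000223764 ENSG00000231958"
# is not equal to any single entry of hSgenes$name, so it gives NA
idx <- match(chersList$features, hSgenes$name)
print(idx)   # NA for the first cher, 3 for the second
```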


It works only partially: when chersList$features contains more than one feature (e.g. "ENSG00000223764 ENSG00000231958 ENSG00000187634"), the result is NA, because match() compares the whole string, which is not equal to any single gene name.
I don't know how to split the features to obtain all the descriptions.
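One way to split them, sketched under the assumption that the IDs inside features are separated by whitespace (with no leading blank): strsplit() breaks each string into a vector of IDs, and each cher row is then repeated once per ID, so that match() sees one ID at a time. The toy tables are hypothetical placeholders with the same column names as above; the new column name feature is an arbitrary choice.

```r
# Hypothetical toy tables (placeholder symbols and descriptions)
hSgenes <- data.frame(
  name        = c("ENSG00000223764", "ENSG00000231958", "ENSG00000188976"),
  symbol      = c("SYM1", "SYM2", "SYM3"),
  description = c("desc 1", "desc 2", "desc 3"),
  stringsAsFactors = FALSE
)
chersList <- data.frame(
  name     = c("chr1.cher1", "chr1.cher2"),
  features = c("ENSG00000223764 ENSG00000231958", "ENSG00000188976"),
  stringsAsFactors = FALSE
)

# Split every features string on runs of whitespace -> list of ID vectors
# (as.character() also covers the case where features is a factor;
#  a leading blank would produce an empty "" ID, so it is assumed absent)
feat <- strsplit(as.character(chersList$features), "[[:space:]]+")

# Repeat each cher row once per ID it contains, then attach the single IDs
n <- vapply(feat, length, integer(1))
chersLong <- chersList[rep(seq_len(nrow(chersList)), n), , drop = FALSE]
chersLong$feature <- unlist(feat)
rownames(chersLong) <- NULL

# Each row now holds exactly one ID, so the original match() lookup works
idx <- match(chersLong$feature, hSgenes$name)
chersLong$description <- hSgenes$description[idx]
chersLong$symbol      <- hSgenes$symbol[idx]
```

This yields one row per cher/gene pair, which is convenient if the result is filtered or merged again later.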

Can someone give me a hint on how to do this?
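If the output should instead keep one row per cher, a possible sketch is to look up all IDs of a row and paste the hits into a single field. The helper lookup() and the separator "; " are hypothetical choices for illustration, the toy tables are placeholders, and IDs without a match are silently dropped.

```r
# Hypothetical toy tables (placeholder symbols and descriptions)
hSgenes <- data.frame(
  name        = c("ENSG00000223764", "ENSG00000231958", "ENSG00000188976"),
  symbol      = c("SYM1", "SYM2", "SYM3"),
  description = c("desc 1", "desc 2", "desc 3"),
  stringsAsFactors = FALSE
)
chersList <- data.frame(
  name     = c("chr1.cher1", "chr1.cher2"),
  features = c("ENSG00000223764 ENSG00000231958", "ENSG00000188976"),
  stringsAsFactors = FALSE
)

# One character vector of IDs per cher (IDs assumed whitespace-separated)
feat <- strsplit(as.character(chersList$features), "[[:space:]]+")

# Hypothetical helper: look up all IDs of one cher among 'keys' and paste
# the non-NA 'values' into one string ("" if none of the IDs is found)
lookup <- function(ids, keys, values) {
  hits <- as.character(values[match(ids, keys)])
  paste(hits[!is.na(hits)], collapse = "; ")
}

chersMergeGenes <- data.frame(
  chersList,
  description = sapply(feat, lookup, keys = hSgenes$name,
                       values = hSgenes$description),
  symbol      = sapply(feat, lookup, keys = hSgenes$name,
                       values = hSgenes$symbol),
  stringsAsFactors = FALSE
)
```

The result can be written out with the same write.table() call as before.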


Another problem:

I have the following data, a named list with the GO terms of each gene:

$ENSG00000000003
[1] "GO:0043123" "GO:0004871"

$ENSG00000000419
 [1] "GO:0018406" "GO:0035269" "GO:0006506" "GO:0019348" "GO:0005789"
 [6] "GO:0005624" "GO:0005783" "GO:0033185" "GO:0004582" "GO:0004169"
[11] "GO:0005515"

$ENSG00000000457
[1] "GO:0005737" "GO:0030027" "GO:0005794" "GO:0005515"
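The printed object is a named list of character vectors. As a sketch, it can be rebuilt by hand (only the three genes shown; the variable name go_list is hypothetical) and flattened with stack() into a two-column gene/GO table; split() then inverts it into a list keyed by GO term, which may be handy when many terms have to be queried.

```r
# The three genes printed above, rebuilt by hand (go_list is a made-up name)
go_list <- list(
  ENSG00000000003 = c("GO:0043123", "GO:0004871"),
  ENSG00000000419 = c("GO:0018406", "GO:0035269", "GO:0006506", "GO:0019348",
                      "GO:0005789", "GO:0005624", "GO:0005783", "GO:0033185",
                      "GO:0004582", "GO:0004169", "GO:0005515"),
  ENSG00000000457 = c("GO:0005737", "GO:0030027", "GO:0005794", "GO:0005515")
)

# stack() gives a data frame: 'values' = GO term, 'ind' = gene name (factor)
go_df <- stack(go_list)

# Invert the mapping: one vector of gene names per GO term
genes_by_go <- split(as.character(go_df$ind), go_df$values)
genes_by_go[["GO:0005515"]]
```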

I want to extract the names of the list elements ($ENSG00000?????) that contain the GO term GO:0005515. How can I do it?
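A direct approach on the list itself, assuming it is stored in a variable called go_list (a hypothetical name; only the three genes shown are rebuilt here): test each element with %in% and use the resulting logical vector to select the names.

```r
# The three genes printed above, rebuilt by hand (go_list is a made-up name)
go_list <- list(
  ENSG00000000003 = c("GO:0043123", "GO:0004871"),
  ENSG00000000419 = c("GO:0018406", "GO:0035269", "GO:0006506", "GO:0019348",
                      "GO:0005789", "GO:0005624", "GO:0005783", "GO:0033185",
                      "GO:0004582", "GO:0004169", "GO:0005515"),
  ENSG00000000457 = c("GO:0005737", "GO:0030027", "GO:0005794", "GO:0005515")
)

# TRUE for every gene whose vector of GO terms contains the wanted term
has_term <- sapply(go_list, function(terms) "GO:0005515" %in% terms)

genes_with_term <- names(go_list)[has_term]
print(genes_with_term)   # "ENSG00000000419" "ENSG00000000457"
```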

Thanks in advance

Viviana
