[BioC] R: Script for filtering

James W. MacDonald jmacdon at med.umich.edu
Thu Apr 23 17:59:10 CEST 2009


OK, that is still not very detailed, so I will make some assumptions.

I assume the file GenomeWideSNP_6.0 is the csv file you can download 
from Affy.

I assume the second 'list' of 500 SNPs is actually a vector of the dbSNP 
RS IDs for 500 SNPs.

I assume you want to know how many of the 500 SNPs in the vector of IDs 
are found on the 6.0 chip.

I assume you read both into R, and called the Affy csv file 'GW6' and 
the vector of RS IDs 'rsid'. I also assume there is a column called 
'rsid' in the GW6 data.frame.
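For the reading step itself, here is a minimal sketch under some assumptions: that the Affy annotation csv starts with metadata lines beginning with '#' (NetAffx annotation files typically do), and that the 500 RS IDs sit in a plain text file, one per line. The toy files, their contents and the column names below are made up for illustration. In the real annotation file the RS ID column will probably not be called 'rsid', so check names(GW6) first.

```r
# Toy file mimicking the assumed layout: "#%" metadata lines followed by
# an ordinary quoted CSV (column names here are hypothetical)
annot_file <- tempfile(fileext = ".csv")
writeLines(c(
  "#%create_date=example",
  "\"rsid\",\"gene\"",
  "\"rs1001\",\"GENEA\"",
  "\"rs1002\",\"GENEB\""
), annot_file)

# comment.char = "#" makes read.csv skip the "#%" lines
# (this assumes no data field itself contains a "#")
GW6 <- read.csv(annot_file, comment.char = "#", stringsAsFactors = FALSE)

# Toy file of RS IDs, one per line; scan() returns a character vector
id_file <- tempfile(fileext = ".txt")
writeLines(c("rs1001", "rs9999"), id_file)
rsid <- scan(id_file, what = character(), quiet = TRUE)

names(GW6)  # check the real column names before subsetting
```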

intersection <- GW6[GW6$rsid %in% rsid,]
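To see what that line does, here is a small self-contained illustration with a made-up three-row stand-in for GW6 (the IDs and gene names are hypothetical, not taken from the real annotation file). The %in% test returns one TRUE/FALSE per row of GW6, and that logical vector selects the rows.

```r
# Hypothetical stand-in for the Affy annotation data.frame (toy values)
GW6 <- data.frame(
  rsid = c("rs1001", "rs1002", "rs1003"),
  gene = c("GENEA", "GENEB", "GENEA"),
  stringsAsFactors = FALSE
)
# Hypothetical vector of RS IDs to look up; rs9999 is not "on the chip"
rsid <- c("rs1003", "rs9999", "rs1001")

# One TRUE/FALSE per row of GW6: is this row's RS ID among the query IDs?
keep <- GW6$rsid %in% rsid

# Logical indexing keeps the matching rows (here rs1001 and rs1003)
intersection <- GW6[keep, ]
print(intersection)
```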

And if there is a column in GW6 called 'gene' that you are interested 
in, then you could instead use

intersection <- GW6[GW6$rsid %in% rsid,"gene"]

to get just that column.
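The original question also asked how many of the query SNPs are on the chip and which ones they are. A minimal sketch, again on hypothetical toy data and assuming the same 'rsid' and 'gene' column names, using only base R set functions:

```r
# Hypothetical toy data standing in for GW6 and the vector of RS IDs
GW6 <- data.frame(
  rsid = c("rs1001", "rs1002", "rs1003"),
  gene = c("GENEA", "GENEB", "GENEA"),
  stringsAsFactors = FALSE
)
rsid <- c("rs1003", "rs9999", "rs1001")

# How many of the query IDs are found on the chip
n_found <- sum(rsid %in% GW6$rsid)

# Which query IDs are on the chip, and which are not
found     <- intersect(rsid, GW6$rsid)
not_found <- setdiff(rsid, GW6$rsid)

# Distinct genes annotated to the matching SNPs (several SNPs can share a gene)
genes <- unique(GW6[GW6$rsid %in% rsid, "gene"])
```

Counting distinct genes rather than rows matters because one gene can carry several SNPs on the chip.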

Hopefully that helps.

But maybe you see my point about detailed questions. When you want to 
know how to do something, you are asking a very _specific_ question. If 
you don't give very specific details about what you are trying to do, 
preferably with sample code if things aren't working the way you think 
they should, then people are left to guess what you want and what you 
have tried.

Best,

Jim



Alberto Goldoni wrote:
> Sorry, I'll be more detailed.
> 
> In R, I need to load the file GenomeWideSNP_6.0, which contains all the SNPs, and then compare that list with a second list containing the SNPs of 500 genes.
> 
> I would like to know how many of the 500 genes in the second list are included in the first list (the GenomeWideSNP_6.0 database), and 
> which SNPs the two lists have in common.
> 
> Best regards,
> 
> ________________________________________
> From: James W. MacDonald [jmacdon at med.umich.edu]
> Sent: Thursday, 23 April 2009 15:35
> To: Alberto Goldoni
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Script for filtering
> 
> Hi Alberto,
> 
> Alberto Goldoni wrote:
>> Hi everybody,
>>
>> I have to extract 500 genes from all the genes present in the GenomeWideSNP_6.0 database.
> 
> I'm not familiar with this database. Could you please give more
> information? Also, there are no genes measured on the GenomeWideSNP_6.0
> chip. This chip measures SNPs, which may or may not be in or near genes.
> 
> 
>> If I have the list of these 500 genes, is there a script to extract only these genes from the complete list?
> 
> This question is too vague to be answered. What is the 'complete list'?
> 
> Maybe you are trying to subset a list or data.frame, in which case you
> should look at
> 
> ?'['
> ?'%in%'
> or perhaps
> ?which
> 
> Best,
> 
> Jim
> 
> 
>> Thanks a lot.
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> --
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826
> 



