[BioC] Communicating gene-probe relationships to sam

Charles C. Berry cberry at tajo.ucsd.edu
Sun May 7 00:22:00 CEST 2006



Charles,

You used method='cat.stat', which is (only) suitable for SNPs, I think.

This seems like an odd choice for detecting SFPs. You would have to do 
more work to get this to work properly if you actually were counting SNPs 
- i.e. you would have to **call** the SNPs. (And reasonably enough, sam() 
doesn't like small sample numbers either.)

I suggest you use a linear model (method='d.stat') like the one in our 
Genome Research paper available here:

 	http://naturalsystems.uchicago.edu/naturalvariation/sfp/

BTW, sam() will not object to very small sample numbers with 
'method="d.stat"'

HTH,

Chuck Berry

On Fri, 5 May 2006, Charles Crane wrote:

> Dear newsgroup:
>    I am trying to use sam to identify single-feature polymorphisms between
> two barley cultivars.  I am running Bioconductor 1.7, Biobase 1.8.0, affy
> 1.8.1, and siggenes 1.4.0 under Mac OS X 10.3.9.  I can successfully extract
> probe-level data from my CEL files, background correct the pm values with
> rma, quantile normalize, and isolate the pm intensities in a data frame, but
> both sam and sam.snp consistently and instantly fail with "Error in
> FUN(data, cl, ...) : There should be at least 25 samples."  This happens
> even though there are 22840 genes and 11 probes per gene in the data.  What
> am I overlooking?  Does sam need some sort of a mapping from the 11 probes
> to their parent gene?  A session history follows.
>    Thank your very much for your time.  Also, do you plan to include
> directions for using sam for SNP analysis, going all the way from ReadAffy
> through sam for a case history, in a future vignette?  It would save the
> uninitiated a lot of time and frustration.
>
> Charles Crane
> USDA-ARS, MWA, Crop Production and Pest Control Research Unit
> And Department of Botany and Plant Pathology, Purdue University
>
>> library(affy)
>> library(siggenes)
>> sidata <- ReadAffy(filenames = c("000113_5hr_kindered_H20_inoc_2x2.CEL",
> "000114_12hr_kindered_H20_inoc_3bx1.CEL",
> +  "000122_5hr_peruvian_H20_inoc_14bx1.CEL",
> "000123_12hr_peruvian_H20_inoc_15bx1.CEL",
> "000141_Non-host-resistant-barley_16bx2.CEL",
> +  "000157_24hr_Kindered_H2O_Inoc_1.CEL",
> "000158_0hr_Kindered_H2O_Inoc_4.CEL",
> "000159_0hr_Kindered_S_passerini_inoc_5.CEL",
> +  "000161_0hr_Peruvian_H2O_inoc_13.CEL",
> "000162_0hr_Peruvian_S_passerini_17.CEL",
> "000189_0hr_403-rep2_H2O_inoc_21b.CEL",
> +  "000190_5hr_403-rep2_H2O_inoc_22b.CEL",
> "000191_12hr_403-rep2_H2O_inoc_23b.CEL",
> "000192_24hr_403-rep2_H2O_inoc_24b.CEL",
> +  "000193_0hr_403-rep2_inoc-s_pass_25b.CEL",
> "000201_0hr_405-rep2_H2O_inoc_33b.CEL",
> "000202_5hr_405-rep2_H2O_inoc_34b.CEL",
> +  "000203_12hr_405-rep2_H2O_inoc_35b.CEL",
> "000204_24hr_405-rep2_H2O_inoc_36b.CEL",
> "000205_0hr_405-rep2_inoc-s_pass_37b.CEL"),
> +  sampleNames = c("000113_5hr_kindered_H20_inoc_2x2",
> "000114_12hr_kindered_H20_inoc_3bx1", "000122_5hr_peruvian_H20_inoc_14bx1",
> +  "000123_12hr_peruvian_H20_inoc_15bx1",
> "000141_Non-host-resistant-barley_16bx2", "000157_24hr_Kindered_H2O_Inoc_1",
> +  "000158_0hr_Kindered_H2O_Inoc_4",
> "000159_0hr_Kindered_S_passerini_inoc_5", "000161_0hr_Peruvian_H2O_inoc_13",
> +  "000162_0hr_Peruvian_S_passerini_17", "000189_0hr_403-rep2_H2O_inoc_21b",
> "000190_5hr_403-rep2_H2O_inoc_22b",
> +  "000191_12hr_403-rep2_H2O_inoc_23b", "000192_24hr_403-rep2_H2O_inoc_24b",
> "000193_0hr_403-rep2_inoc-s_pass_25b",
> +  "000201_0hr_405-rep2_H2O_inoc_33b", "000202_5hr_405-rep2_H2O_inoc_34b",
> "000203_12hr_405-rep2_H2O_inoc_35b",
> +  "000204_24hr_405-rep2_H2O_inoc_36b",
> "000205_0hr_405-rep2_inoc-s_pass_37b"),
> +  phenoData = "Hordeumphenodata.txt")
>> signames <- geneNames(sidata)
>> sirma <- bg.correct(sidata, method = "rma")
>> sinorm <- normalize(sirma)
>> siprobeintensities <- as.data.frame(probes(sinorm, "pm"))
>> write.table(siprobeintensities, file = "siprobeintensities.txt")
>> sicl <- c(1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
>> samout <- sam(siprobeintensities, sicl, method = "cat.stat", gene.names =
> dimnames(siprobeintensities)[[1]], B = 1000, ran = 11279)
> Error in FUN(data, cl, ...) : There should be at least 25 samples.
>> sigenenames <- scan(file = "sigenenamestrimmed.txt", what = 'character')
> Read 251437 items
>> sigenenames[1:22]
> [1] "1200459_Reg_88-1740_at"  "1200459_Reg_88-1740_at"
> [3] "1200459_Reg_88-1740_at"  "1200459_Reg_88-1740_at"
> [5] "1200459_Reg_88-1740_at"  "1200459_Reg_88-1740_at"
> [7] "1200459_Reg_88-1740_at"  "1200459_Reg_88-1740_at"
> [9] "1200459_Reg_88-1740_at"  "1200459_Reg_88-1740_at"
> [11] "1200459_Reg_88-1740_at"  "1289374_Reg_826-1545_at"
> [13] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at"
> [15] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at"
> [17] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at"
> [19] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at"
> [21] "1289374_Reg_826-1545_at" "1289374_Reg_826-1545_at"
>> samout <- sam(siprobeintensities, sicl, method = "cat.stat", gene.names =
> sigenenames, B = 1000, ran = 11279)
> Error in FUN(data, cl, ...) : There should be at least 25 samples.
>
>
>
>    [ Part 3.6: "Included Message" ]
>

Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0717



More information about the Bioconductor mailing list