[BioC] snpStats, read.long, alleles in two columns

Liz Hare doggene at earthlink.net
Wed Mar 7 18:22:59 CET 2012


Hi David and Vincent,

Thanks so much for the quick responses to my problem!

When I did:

CanineHD <- read.long(file="filename",
                       fields=c(snp=1, sample=2, genotype=NA,
                                allele.A=3, allele.B=4),
                       verbose=TRUE)

I got:

Data to be read from the file filename
No confidence thresholds specified
Initial scan of file
First sample: 04-0677/J279
First snp: BICF2G630100019
Last snp: YNp1-608
Last sample: 10-1160
96x173662 matrix to be read
Error in length(gcodes) : 'gcodes' is missing

It looks from the help page like gcodes should only be used if the 
genotype is in one field.

My allele fields contain either A, C, T, G or -. What should I tell gcodes?
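
One possible workaround, not confirmed anywhere in this thread, is to paste the two allele columns into a single two-character genotype column before calling read.long, since the verbose output quoted below says a genotype can be read as "a single field of two characters". The sketch below uses base R only. The helper name combine_alleles and the EXAMPLE_SNP row are made up for illustration, and whether read.long treats "--" as a missing call is an assumption that would need checking against the gcodes documentation.

```r
# Hypothetical helper (not part of snpStats): merge two single-letter allele
# columns of a tab-delimited long-format file into one two-character column.
combine_alleles <- function(infile, outfile) {
  # Read every column as character so IDs such as "04-0677/J279" stay intact.
  long <- read.table(infile, sep = "\t", header = FALSE,
                     colClasses = "character", quote = "", comment.char = "",
                     col.names = c("snp", "sample", "allele1", "allele2"))
  # "C" and "C" become "CC"; "-" and "-" become "--".
  long$genotype <- paste0(long$allele1, long$allele2)
  write.table(long[, c("snp", "sample", "genotype")], outfile, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
  invisible(long)
}

# Small demonstration: two rows taken from the post plus one invented row
# (EXAMPLE_SNP) showing how a "-" call would be carried through.
demo_in  <- tempfile(fileext = ".txt")
demo_out <- tempfile(fileext = ".txt")
writeLines(c("BICF2G630100019\t04-0677/J279\tC\tC",
             "BICF2G630100063\t04-0677/J279\tT\tC",
             "EXAMPLE_SNP\t04-0677/J279\t-\t-"), demo_in)
res <- combine_alleles(demo_in, demo_out)

# Tabulate the combined codes as a sanity check before any further import.
print(table(res$genotype))
```

The rewritten file then has three columns (snp, sample, genotype), which matches the layout of the fields=c(snp=1, sample=2, genotype=3) attempt quoted below.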

Thanks,
Liz

On 3/7/2012 11:17 AM, David Clayton wrote:
> What _should_ work is
>
> ..., fields=c(genotype=NA, allele.A=3, allele.B=4), ...
>
> but I have to agree that the documentation is distinctly lacking.
>
> Let me know if this doesn't work.
>
> David Clayton
>
>
> On 07/03/12 15:16, Liz Hare wrote:
>> Hello,
>>
>> I am trying to read an Illumina final format .txt file (tab-delimited)
>> into snpStats. The file contains 4 columns: snp, sample, allele 1, and
>> allele 2. Some sample lines:
>>
>> BICF2G630100019 04-0677/J279 C C
>> BICF2G630100032 04-0677/J279 T T
>> BICF2G630100034 04-0677/J279 G G
>> BICF2G630100043 04-0677/J279 A A
>> BICF2G630100054 04-0677/J279 T T
>> BICF2G630100063 04-0677/J279 T C
>> BICF2G630100075 04-0677/J279 T T
>> BICF2G63010009 04-0677/J279 G G
>> BICF2G630100090 04-0677/J279 C C
>>
>> I can't figure out from the documentation or vignette on data input how
>> to specify that the alleles are in two columns.
>>
>> This doesn't work:
>>
>> > CanineHD <- read.long(file="filename",
>> + fields=c(snp=1, sample=2, genotype=3, genotype=4),
>> + verbose=TRUE)
>> Data to be read from the file filename
>> No confidence thresholds specified
>> Genotype read as a single field of two characters (which specify the
>> alleles)
>> Initial scan of file
>> First sample: 04-0677/J279
>> First snp: BICF2G630100019
>> Last snp: YNp1-608
>> Last sample: 10-1160
>> 96x173662 matrix to be read
>> Reading genotypes from file
>> 20% 40% 60% 80% 100%
>> .........|.........|.........|.........|.........|
>> -Error in read.long(file = "filename", :
>> at line 1: C (expecting a 2-character genotype field)
>> In addition: Warning message:
>> closing unused connection 3 (filename)
>>
>> So I tried:
>>
>> > CanineHD <- read.long(file="filename",
>> + fields=c(snp=1, sample=2, genotype=3),
>> + gcodes="\t", codes="nucleotide", verbose=TRUE)
>> Error in read.long(file = "filename", :
>> unused argument(s) (codes = "nucleotide")
>> > CanineHD <- read.long(file="filename",
>> + fields=c(snp=1, sample=2, genotype=3),
>> + split="\t", verbose=TRUE)
>> Data to be read from the file filename
>> No confidence thresholds specified
>> Genotype read as a single field of two characters (which specify the
>> alleles)
>> Initial scan of file
>> First sample: 04-0677/J279
>> First snp: BICF2G630100019
>> Last snp: YNp1-608
>> Last sample: 10-1160
>> 96x173662 matrix to be read
>> Reading genotypes from file
>> 20% 40% 60% 80% 100%
>> .........|.........|.........|.........|.........|
>> -Error in read.long(file = "filename", :
>> at line 1: C (expecting a 2-character genotype field)
>> In addition: Warning message:
>> closing unused connection 12 (filename)
>>
>> Is there a keyword for alleles rather than genotypes? I tried
>> substituting the word 'allele' but didn't get anywhere. I suspect I'm
>> not understanding something in the Details section of the documentation.
>>
>> Thanks,
>> Liz
>>
>


