[BioC] error in function reduce from girafe

Joern Toedling Joern.Toedling at curie.fr
Thu Jun 24 17:26:33 CEST 2010


Hi Andreia,

when "reducing" overlapping aligned reads, also the consensus sequence of the
merged reads is determined, using the function consensusString from package
Biostrings. And for some reason this step does not work for you.

I cannot be completely sure but I think there may be an issue with the way
that you generate the AlignedGenomeIntervals object. The recommended way is
to use package ShortRead first to generate an AlignedRead object and then
coercing that object into an AlignedGenomeIntervals object. However, creating
it manually is possible but more prone to such problems.
However, thank you very much for trying it and reporting the results, maybe I
need to implement an additional check of the object to provide users in this
case with a more meaningful error message. 

I think the problem here is that your specified read sequence always seems to
be one shorter than the specified interval lengths. 
For example, in the first row the sequence 
TTTCTAATGAGCCCAGGGAGGGCTAGA
has 27 letters but the interval goes from position 43076512 to 43076539, which
is 28 bases. So maybe your alignment coordinates are in half-open interval
notation, in which case the interval actually ends at position 43076538. Then
the argument 'end' should be:
end=data$V6-1L
Also note that the coordinates in girafe are 1-based, so you may need to
adjust the alignment coordinates accordingly.

It could also be an issue with factors in your data.frame 'data' (which btw.
is not a very good name since there also is a basic R function of that name),
especially the strand could be a candidate here. When you read in data using
read.table it's generally a good idea to use the argument
stringsAsFactors=FALSE to make sure there are no factor-related
errors later on unless you specifically want to make use of factors in your
data.frame.

Please let me know if these suggested changes resolve the problem. Otherwise,
I would be grateful if you can send me a small excerpt of the file that you
read in with read.table off-list and I can have a closer look at it.

Cheers,
Joern


On Thu, 24 Jun 2010 15:18:10 +0100, Andreia Fonseca wrote 
> Dear Joern, 
> 
> I decided to follow your suggestion, and I am trying to use girafe. As the
alignment file that I have as one line for each hit, I have prepared another
file which has an extra column which is the number of matches and only has one
row for each sequence. 
>  V1 V2 V3 V4 V5        V6        V7  V8 V9 
> 1 TTTCTAATGAGCCCAGGGAGGGCTAGA   27  +  5  43076512  43076539 100 27   1 
> 2          ATAACTGTAGAGGCAAGC  18  -  8 141533705 141533723 100 18   1 
> 3        CTGAAGGGTGGATAAATTGG   20  - 22  40929788  40929808 100 20   1 
> 4             ACTGATTGGGCTAGG   15  - 16  28567043  28567058 100 15   1 
> 5     GCGCGTCGCCATGGAGCCCGACG   23  - 19  36605745  36605768 100 23   1 
> 6        TGTTCTGGAACGGGCCGAGC   20  + 15  83680515  83680535 100 20   1 
> 
> I have created an object called data using read.table and then converted it
into an AlignedGenomeIntervals object using the following command: 
> 
> A<-AlignedGenomeIntervals(start=data$V5, end =data$V6, chromosome=data$V4,
strand=data$V3, sequence=as.character(data$V1), matches=data$V9) 
> 
>  organism(A)<-"Hs" 
> 
> and then I called the reduce 
> 
>  B<-reduce(A) 
> 
> and then I am getting the following error: 
> 
> Error in .local(x, ...) : 
>   invalid consensus matrix 'x' (some columns do not sum to 1). 
>   Please make sure 'x' was obtained by a call to consensusMatrix(....,
as.prob=TRUE) 
> 
> can you explain me what this error means? 
> 
> with kind regards, 
> Andreia 
> 

--- 
 Joern Toedling 
 Institut Curie -- U900 
 26 rue d'Ulm, 75005 Paris, FRANCE 
 Tel. +33 (0)156246927



More information about the Bioconductor mailing list