[R] Error Correcting Codes, Simplex

Richard Graham rickhg12hs at gmail.com
Tue Oct 17 08:26:50 CEST 2006


On 10/16/06, Björn Egert <begert at ipb-halle.de> wrote:
> On 10/8/06, Egert, Bjoern <begert at ipb-halle.de> wrote:
> >> Hello,
> >>
> >>   Is there a way in R to construct an (error correcting) binary code
> >>   e.g. for an source alphabet containing integers from 1 to say 255
> >>   with the property that each pair of distinct codewords of length m
> >>   is at Hamming distance exactly m/2 ?
> >>
> >>   I was suggested to use so called simplex codes, which should be
> >>   fairly standard, but I haven't found a direct way via R packages
> >>   to do so, that's why I ask whether there might be in indirect way
> >>   to solve this problem.
> >>
> >>   Example:
> >>   v1  =c(1,2,3,4)
> >>   v2  =c(1,2,5,6)
> >>   similarity(v1,v2)=0.5, (because 2 out of 4 elements are equal).
> >>   Obviously, a binary representation of would yield a different
> >>   similarity of:
> >>   binary(v1) =001 010 011 100
> >>   binary(v1) =001 010 101 110
> >>   similarity(binary(v1),binary(v2))= 9/12
> >>
> >>   Remark: The focus here is not on error correction, but rather the
> >>   binary encoding retaining similarity of the elements of vectors.
> >>
> >> Many thanks,
> >> Bjoern
> >
> > Bjoern,
> >
> > NB:  I'm an R newbie and I only know a bit about error correcting codes.
> >
> > I haven't seen any responses to your questions and I don't know if you
> > still
> > have a need, but it is certainly possible to construct forward error
> > correction
> > codes with all the great math capability in R.
> >
> > It seems you want to generate code words that still have the original
> > bits
> > present.  These are systematic codes and there are lots of them available
> > to use.  Many codes are specified by the code word length (n), number
> > of original data
> > bits in each code word (k), and the minimum Hamming distance of the
> > code words (d)
> > as a [n,k,d] code. Simplex Codes have these parameters: [2^k - 1, k,
> > 2^(k - 1)].  These
> > codes could be generated as a simple matrix multiply in R, but are you
> > sure that's what
> > you want?  The code words will be quite long.
> >
> > Regards,
> > Richard Graham
>
>
> Hello,
>
>   thank you.
>
>   yes, basically, that's what I want.
>   Just a binary encoding of an arbitrary integer value (or vector of
> integers)
>   with the property that each pair of distinct integer values have an
> equal Hamming-
>   distance (m/2), so as to be able to a similarity search
>
>   I got the idea from: Gionis: "Efficient and Tunable Similar Set
> Retrieval" (Chap 3.2)
>
> regards
> Bjoern
>
>

Bjoern,

I read only the section of the paper you mention and I'll trust that
the stated properties of Simplex Codes are true.  I haven't researched
or verified it.

[from http://magma.maths.usyd.edu.au/]
"Magma is a large, well-supported software package designed to solve
computationally hard problems in algebra, number theory, geometry and
combinatorics. It provides a mathematically rigorous environment for
computing with algebraic, number-theoretic, combinatoric and geometric
objects."

I don't understand a fraction of its capability but I still find it to
be very useful.  In fact, they have an online calculator that will
give you the generator matrix you want.  The online Magma calculator
is at:

http://magma.maths.usyd.edu.au/calc/

To calculate the generator matrix I think you are asking for, go to
the above URL and cut/paste the following command:

ExtendCode(SimplexCode(8));

Click "Evaluate" and the output window will contain a "[256, 8, 128]
Linear Code over GF(2)".  You'll need to "massage" this a bit to use
it as a matrix for R.  I'd use Ruby to do this, but anything will do.
If you want to encode more/less than 8 bits, you can modify the above
argument to SimplexCode.

I used "ExtendCode" so that the codeword length == Dmin * 2

The Gionis claim I'll research or verify sometime is that _every_ pair
of Simplex Code words  of length m have Hamming distance == m/2.  If
you have a reference to a proof, I'd like to read it (like I said, I
only know a bit about ECC).

Good Luck with your work!
Richard Graham



More information about the R-help mailing list