[R] A general Mathematical pointer if you could? Thank you.

David H. david at frosted.net
Tue Nov 2 08:17:14 CET 2004


Greetings.

A friend of mine and I are thinking about implementing the SIQ Protocol,
described in the draft RFC here:
http://www.milter.info/milter-siq/draft-irtf-asrg-iar-howe-siq-00.txt

as an Apache Module. This would implement the HTTP "connector" described
in the RFC. While we do not think that this is too complicated, we are
concerned about the whole process of building the necessary data.

I would like to know your opinion on SIQ per se and my idea.

Our goal is to minimise User interaction as best as we can. We would
also train only on Error.

An Error is defined as a host that sends/sent Spam. This would put the
burden of clearly identifying a message as SPAM on the submitter's
shoulders. The submitter would send in his corpus of SPAM, which is then
processed into the system, and a score for the "mail senders" in
question is built.

The score should decay over time. Both the size of the score and its
rate of decay should be determined by the behaviour of the "sending
host". That is, a host that is often reported to send Spam will
naturally have a higher score, and that score will reset slowly. A host
which sends little Spam, or has a single high burst of Spam due to a
(since fixed) misconfiguration, will show a lower score, and the score
will reset quickly.
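
To make the idea concrete, one possible reading is an exponentially
decaying score whose half-life grows with the number of separate days
on which a host was reported. The sketch below is only an illustration:
the names (HostScore, report_spam, current_score), the one-day base
half-life, the 30-day cap, and the "count distinct days" rule are all
assumptions of mine and are not taken from the SIQ draft.

```python
from dataclasses import dataclass

DAY = 86400.0                 # seconds per day
BASE_HALF_LIFE_DAYS = 1.0     # assumed: half-life after one reported day
MAX_HALF_LIFE_DAYS = 30.0     # assumed: upper bound on the half-life


@dataclass
class HostScore:
    score: float = 0.0        # score as of last_update
    active_days: int = 0      # distinct days with at least one report
    last_day: int = -1        # day index of the most recent report
    last_update: float = 0.0  # timestamp (seconds) of the most recent report


def half_life(h: HostScore) -> float:
    """Half-life in seconds; longer for hosts reported on many days."""
    days = min(BASE_HALF_LIFE_DAYS * max(1, h.active_days), MAX_HALF_LIFE_DAYS)
    return days * DAY


def current_score(h: HostScore, now: float) -> float:
    """Score at time `now`, decayed exponentially since the last report."""
    dt = max(0.0, now - h.last_update)
    return h.score * 0.5 ** (dt / half_life(h))


def report_spam(h: HostScore, now: float, weight: float = 1.0) -> None:
    """Record one spam report: decay the old score, then add `weight`."""
    h.score = current_score(h, now) + weight
    day = int(now // DAY)
    if day != h.last_day:     # a burst within one day counts only once
        h.active_days += 1
        h.last_day = day
    h.last_update = now


# A persistent host: one report per day for ten days.
persistent = HostScore()
for d in range(10):
    report_spam(persistent, d * DAY)

# A burst host: ten reports within ten seconds on a single day.
burst = HostScore()
for s in range(10):
    report_spam(burst, float(s))
```

With these assumed parameters, the burst host starts with a large score
but keeps a one-day half-life, so a week later it has almost fully
recovered, while the persistent host is still decaying with a ten-day
half-life.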

I am no mathematician. This is where I need your help. Could you point
me to a newsgroup (preferred) or Mailing-List that could tell me which
discipline in Math is suited for this? I heard that Survival Analysis
and "Time Series" might be suited to fit my "problem".

Just to clarify once more: both the IP address of the "sending" host
and the "domain" by which it tries to identify itself are scored. I
have to find a way to do this fairly. Since we expect a huge influx of
data, this has to be automated as far as possible.

This of course should all result in a public service that will be made
available freely. Thank you for listening to my stammering.

-d



