Message-ID: <40101B1E.3030908@blue-labs.org>
Date: Thu, 22 Jan 2004 13:49:02 -0500
From: David Ford <david+challenge-response@blue-labs.org>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7a) Gecko/20040121
MIME-Version: 1.0
To: David Lang <david.lang@digitalinsight.com>
CC: Jes Sorensen <jes@wildopensource.com>, Zan Lynx <zlynx@acm.org>,
       Andreas Jellinghaus <aj@dungeon.inka.de>,
       Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [OT] Confirmation Spam Blocking was: List 'linux-dvb' closed
 to public posts
References: <ecartis-01212004203954.14209.1@mail.convergence2.de> <20040121194315.GE9327@redhat.com> <Pine.LNX.4.58.0401211155300.2123@home.osdl.org> <1074717499.18964.9.camel@localhost.localdomain> <20040121211550.GK9327@redhat.com> <20040121213027.GN23765@srv-lnx2600.matchmail.com> <pan.2004.01.21.23.40.00.181984@dungeon.inka.de> <1074731162.25704.10.camel@localhost.localdomain> <yq0hdyo15gt.fsf@wildopensource.com> <401000C1.9010901@blue-labs.org> <Pine.LNX.4.58.0401221034090.4548@dlang.diginsite.com>
In-Reply-To: <Pine.LNX.4.58.0401221034090.4548@dlang.diginsite.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1731
Lines: 45

I've been amusing myself once or twice a week by studying some of these 
emails.  Because they use common words, just like your email below, their 
Bayesian score is far too low (SA even grants it a negative point value).

The problem is that "properly trained" is too fluid.  It'd be far more 
achievable if I only talked geek..  Or if I only talked automotive.  Or 
if I only talked medical.  However, my "vocabulary" is far too varied 
to train a Bayesian filter that medical terms, computer terms, or any 
given topic are taboo.

It cuts the gray area far too close to the middle of the road and thus 
makes marking the email as probable spam useless.  All I'm doing now is 
wasting CPU because in the end I'm doing the job of dealing with the 
spam myself.
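
To make the dilution concrete, here is a minimal sketch of a naive 
per-word combiner.  The word probabilities and UNKNOWN_PROB are invented 
for illustration, and summing log-odds is only one simple way to combine 
them; it is not meant to reproduce how SA actually computes its score.

```python
import math

# Hypothetical per-word spam probabilities, as a trained filter might
# hold them.  All values are made up for illustration.
WORD_SPAM_PROB = {
    "viagra": 0.99,
    "mortgage": 0.95,
    "kernel": 0.05,
    "patch": 0.05,
    "engine": 0.20,
    "doctor": 0.30,
}
UNKNOWN_PROB = 0.4  # assumed score for words the filter has never seen


def spam_score(words):
    """Combine per-word probabilities by summing their log-odds."""
    log_odds = 0.0
    for w in words:
        p = WORD_SPAM_PROB.get(w.lower(), UNKNOWN_PROB)
        log_odds += math.log(p) - math.log(1.0 - p)
    # Convert the summed log-odds back into a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


plain_spam = ["viagra", "mortgage"]
# The same message padded with words that look legitimate to this user.
padded_spam = plain_spam + ["kernel", "patch", "engine", "doctor"]

print(round(spam_score(plain_spam), 3))
print(round(spam_score(padded_spam), 3))
```

With these made-up numbers the padded message scores well below the 
unpadded one, because every ordinary word pulls the total toward "ham".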

Yes, I did see this.  I'm not so spiteful and actively pay attention to 
my queue when having this type of correspondence.

David

David Lang wrote:

>On Thu, 22 Jan 2004, David Ford wrote:
>  
>
>>Considering that Bayesian filters are useless against the new spam that
>>is proliferating these days, that's laughable.  Spam now comes with a
>>good 5-10K of random dictionary words.
>>    
>>
>so we need to extend the Bayesian filters to deal with multi-word combos,
>how much legit mail has those dictionary words in it? properly trained
>their presence should help identify the spam.
>
>not that you will ever see this (other than through the list) as I won't
>respond to your confirmation message.
>
>David Lang
>  
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/