Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965039AbWJBQkx (ORCPT ); Mon, 2 Oct 2006 12:40:53 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S965063AbWJBQkx (ORCPT ); Mon, 2 Oct 2006 12:40:53 -0400 Received: from smtp.osdl.org ([65.172.181.4]:47810 "EHLO smtp.osdl.org") by vger.kernel.org with ESMTP id S965039AbWJBQkw (ORCPT ); Mon, 2 Oct 2006 12:40:52 -0400 Date: Mon, 2 Oct 2006 09:40:42 -0700 (PDT) From: Linus Torvalds To: "Martin J. Bligh" cc: Lee Revell , Matti Aarnio , linux-kernel Subject: Re: Spam, bogofilter, etc In-Reply-To: <45212F39.5000307@mbligh.org> Message-ID: References: <1159539793.7086.91.camel@mindpipe> <20061002100302.GS16047@mea-ext.zmailer.org> <1159802486.4067.140.camel@mindpipe> <45212F39.5000307@mbligh.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1611 Lines: 37 On Mon, 2 Oct 2006, Martin J. Bligh wrote: > > If you got rid of "slut" and "schoolgirl" that'd get rid of half of it. The problem with bogo-filter is that THE WHOLE CONCEPT IS FLAWED. I'm sorry, but spam-filtering is simply harder than the bayesian word-count weenies think it is. I even used to _know_ something about bayesian filtering, since it was one of the projects I worked on at uni, and dammit, it's not a good approach, as shown by the fact that it's trivial to get around. I don't know why people got so excited about the whole bayesian thing. It's fine as _one_ small clause in a bigger framework of deciding spam, but it's totally inappropriate for a "yes/no" kind of decision on its own. If you want a yes/no kind of thing, do it on real hard issues, like not accepting email from machines that aren't registered MX gateways. Sure, that will mean that people who just set up their local sendmail thing and connect directly to port 25 will just not be able to email, but let's face it, that's why we have ISP's and DNS in the first place. But don't do it purely on some bogus word analysis. If you want to do word analysis, use it like SpamAssassin does it - with some Bayesian rule perhaps adding a few points to the score. That's entirely appropriate. But running bogo-filter _instead_ of spamassassin is just asinine. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/