Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756461AbXIFLPc (ORCPT ); Thu, 6 Sep 2007 07:15:32 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1755198AbXIFLPY (ORCPT ); Thu, 6 Sep 2007 07:15:24 -0400
Received: from ra.tuxdriver.com ([70.61.120.52]:2351 "EHLO ra.tuxdriver.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755161AbXIFLPX (ORCPT ); Thu, 6 Sep 2007 07:15:23 -0400
Date: Thu, 6 Sep 2007 07:08:36 -0400
From: Neil Horman
To: Patrick McHardy
Cc: Rusty Russell, adam@yggdrasil.com, jcm@jonmasters.org,
	netfilter-devel@lists.netfilter.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/2] Fix (improve) deadlock condition on module removal netfilter socket option removal
Message-ID: <20070906110836.GA31868@hmsreliant.think-freely.org>
References: <20070904202433.GA19083@hmsreliant.think-freely.org>
	<46DEC9BF.9010807@trash.net>
	<1189008806.10802.150.camel@localhost.localdomain>
	<20070905170831.GA25050@hmsreliant.think-freely.org>
	<46DFD790.6040908@trash.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <46DFD790.6040908@trash.net>
User-Agent: Mutt/1.5.12-2006-07-14
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2515
Lines: 56

On Thu, Sep 06, 2007 at 12:33:52PM +0200, Patrick McHardy wrote:
> Neil Horman wrote:
> > On Thu, Sep 06, 2007 at 02:13:26AM +1000, Rusty Russell wrote:
> >
> >>On Wed, 2007-09-05 at 17:22 +0200, Patrick McHardy wrote:
> >>
> >>>But I'm wondering, wouldn't module refcounting alone fix this problem?
> >>>If we make nf_sockopt() call try_module_get(ops->owner), remove_module()
> >>>on ip_tables.ko would simply fail because the refcount is above zero
> >>>(so it would fail at point 3 above). Am I missing something important?
> >>
> >>Yes, that seems the correct solution to me, too. ISTR that this code
> >>predates the current module code.
> >>
> >>Rusty.
> >
> >
> > Thanks guys-
> > When I first started looking at this problem I would have agreed with
> > you, that module reference counting alone would fix the problem. However,
> > delete_module can work in either a non-blocking or a blocking mode. rmmod
> > passes O_NONBLOCK to delete_module, and so is fine, but modprobe does not. So
> > if you currently use modprobe -r to remove modules (as the iptables service
> > script nominally does), modprobe winds up waiting in the kernel for the module
> > reference count to become zero. Since we can hold a reference to the module
> > being removed in the same path that forks a modprobe request to load that same
> > module (which then blocks on the first modprobe's fcntl lock), we still get
> > deadlock. The way I fixed this was by use of the second patch, which brings
> > modprobe's behavior into line with the rmmod utility (which is to default to
> > non-blocking operation), leading to the remove_module failure and breaking of
> > the deadlock that you describe above.
>
>
> Thanks for the explanation, I've applied your patch.

Thanks Patrick!
Neil

> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

-- 
/***************************************************
 *Neil Horman
 *Software Engineer
 *Red Hat, Inc.
 *nhorman@tuxdriver.com
 *gpg keyid: 1024D / 0x92A74FA1
 *http://pgp.mit.edu
 ***************************************************/
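
For readers following the thread, the difference between the blocking and non-blocking removal modes discussed above can be sketched in userspace C. This is only an illustration under stated assumptions, not code from either patch: removal_flags() and try_remove_module() are hypothetical helper names, and the only real interface used is the delete_module(2) system call with the O_NONBLOCK flag. With O_NONBLOCK set, a module whose reference count is still nonzero is expected to make the call fail with EWOULDBLOCK rather than leave the caller waiting in the kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: choose the flags passed to delete_module(2).
 * wait_for_users == 0 models the rmmod behaviour described in the thread
 * (O_NONBLOCK); nonzero models the older modprobe -r behaviour (no flag,
 * so the caller may wait for the reference count to drop to zero).
 */
unsigned int removal_flags(int wait_for_users)
{
    return wait_for_users ? 0u : (unsigned int)O_NONBLOCK;
}

/*
 * Hypothetical helper: ask the kernel to unload the named module.
 * Returns 0 on success or a negative errno value on failure, for
 * example -EWOULDBLOCK when the module is in use and O_NONBLOCK was set.
 * Requires CAP_SYS_MODULE; unprivileged callers get -EPERM.
 */
int try_remove_module(const char *name, int wait_for_users)
{
    assert(name != NULL);

    if (syscall(SYS_delete_module, name, removal_flags(wait_for_users)) == 0)
        return 0;

    return -errno;
}
```

A caller that wants the failure mode Patrick describes would use try_remove_module(name, 0) and treat -EWOULDBLOCK as "module busy, try again later" instead of blocking while a reference is held.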