Date: Tue, 21 Apr 2009 09:43:50 -0700
From: Stephen Hemminger
To: Linus Torvalds
Cc: Paul Mackerras, paulmck@linux.vnet.ibm.com, Eric Dumazet, Evgeniy Polyakov,
	David Miller, kaber@trash.net, jeff.chua.linux@gmail.com, mingo@elte.hu,
	laijs@cn.fujitsu.com, jengelh@medozas.de, r000n@r000n.net,
	linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org,
	netdev@vger.kernel.org, benh@kernel.crashing.org, mathieu.desnoyers@polymtl.ca
Subject: Re: [PATCH] netfilter: use per-cpu recursive lock (v11)

On Tue, 21 Apr 2009 09:13:52 -0700 (PDT) Linus Torvalds wrote:

>
> Ok, so others already pointed out how dangerous/buggy this looks, but I'd
> like to strengthen that a bit more:
>
> On Mon, 20 Apr 2009, Stephen Hemminger wrote:
> > +
> > +/**
> > + * xt_table_info_rdlock_bh - recursive read lock for xt table info
> > + *
> > + * Table processing calls this to hold off any changes to table
> > + * (on current CPU). Always leaves with bottom half disabled.
> > + * If called recursively, then assumes bh/preempt already disabled.
> > + */
> > +void xt_info_rdlock_bh(void)
> > +{
> > +	struct xt_info_lock *lock;
> > +
> > +	preempt_disable();
> > +	lock = &__get_cpu_var(xt_info_locks);
> > +	if (likely(++lock->depth == 0))
> > +		spin_lock_bh(&lock->lock);
> > +	preempt_enable_no_resched();
> > +}
> > +EXPORT_SYMBOL_GPL(xt_info_rdlock_bh);
>
> This function is FUCKED UP.
>
> It's total crap for several reasons:
>
>  - the already-mentioned race with bottom half locking.
>
>    If bottom halves aren't already disabled, then if a bottom half comes in
>    after the "++lock->depth" and before the spin_lock_bh(), then you will
>    have no locking AT ALL for the processing of that bottom half - it will
>    just increment the lock depth again, and nobody will have locked
>    anything at all.
>
>    And if for some reason you can prove that bottom half processing is
>    already disabled, then ALL THAT OTHER CRAP is just that - CRAP. The
>    whole preemption disabling, the whole "_bh()" thing, everything.
>
>    So either it's horribly buggy, or it's horribly broken and pointless.
>    Take your pick.
>
>  - the total lack of comments. Why does that "count" protect anything?
>    It's not a recursive lock, since there is no ownership (two
>    independent accessors could call this and both "get" the lock), so you
>    had damn well better create some big-ass comments about why it's ok in
>    this case, and what the rules are that make it ok.
>
>    So DON'T GO AROUND CALLING IT A RECURSIVE LOCK! Don't write comments
>    that are TOTAL AND UTTER SH*T! Just DON'T!
>
>    It's a "reader lock". It's not "recursive". It never was recursive, it
>    never will be, and calling it that is just a sign that whoever wrote
>    the function is a moron and doesn't know what he is doing. STOP DOING THIS!
>
>  - that _idiotic_ "preempt_enable_no_resched()". F*ck me, but the comment
>    already says that preemption is disabled when exiting, so why does it
>    even bother to enable it? Why play those mindless games with preemption
>    counts, knowing that they are bogus?
>
>    Do it readably. Disable preemption first, and just re-enable it at
>    UNLOCK time. No odd pseudo-reenables anywhere.
>
>    Oh, I know very well that the spin-locking will disable preemption, so
>    it's "correct" to play those games, but the point is, it's just damn
>    stupid and annoying. It just makes the source code actively _misleading_
>    to see crap like that - it looks like you enabled preemption, when in fact
>    the code very much on purpose does _not_ enable preemption at all.
>
> In other words, I really REALLY hate that patch. I think it looks slightly
> better than Eric Dumazet's original patch (at least the locking subtlety
> is now in a function of its own and _could_ be commented upon sanely, and
> if it wasn't broken it might be acceptable), but quite frankly, I'm still
> horribly disgusted with the crap.
>
> Why are you doing this insane thing? If you want a read-lock, just use the
> damned read-write locks already! As far as I can tell, this lock is in
> _no_ way better than just using those counting reader-writer locks, except
> it is open-coded and looks buggy.
>
> There is basically _never_ a good reason to re-implement locking
> primitives: you'll just introduce bugs. As proven very ably by the amount
> of crap above in what is supposed to be a very simple function.
>
> If you want a counting read-lock (in _order_ to allow recursion, but not
> because the lock itself is somehow recursive!), then that function should
> look like this:
>
> 	void xt_info_rdlock_bh(void)
> 	{
> 		struct xt_info_lock *lock;
>
> 		local_bh_disable();
> 		lock = &__get_cpu_var(xt_info_locks);
> 		read_lock(&lock->lock);
> 	}
>
> And then the "unlock" should be the reverse. No games, no crap, and
> hopefully then no bugs. And if you do it that way, you don't even need the
> comments, although quite frankly, it probably makes a lot of sense to talk
> about the interaction between "local_bh_disable()" and the preempt count,
> and why you're not using "read_lock_bh()".
>
> And if you don't want a read-lock, then fine - don't use a read-lock, do
> something else. But then don't just re-implement it (badly) either and
> call it something else!
>
> 		Linus
>
> PS: Ingo, why do the *_bh() functions in kernel/spinlock.c do _both_ a
> "local_bh_disable()" and a "preempt_disable()"? BH disable should disable
> preemption too, no? Or am I confused? In which case we need that in
> the above rdlock_bh too.

Ah, a nice day, with Linus giving constructive feedback. Too bad he has to
channel it out of the dark side.
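
For reference, the unlock side of the variant Linus sketches above would
presumably just be the mirror image. A rough sketch only, assuming
struct xt_info_lock carries a plain rwlock_t member named "lock" (the
names are illustrative, not necessarily the code that ends up merged):

	void xt_info_rdlock_bh(void)
	{
		struct xt_info_lock *lock;

		local_bh_disable();	/* disables BH; via the preempt count this
					 * also keeps us on the current CPU */
		lock = &__get_cpu_var(xt_info_locks);
		read_lock(&lock->lock);	/* counting reader lock, so a nested
					 * call on the same CPU just nests */
	}

	void xt_info_rdunlock_bh(void)
	{
		struct xt_info_lock *lock = &__get_cpu_var(xt_info_locks);

		read_unlock(&lock->lock);
		local_bh_enable();	/* strict reverse order of the lock side */
	}

The writer (table replacement) side would then presumably take
write_lock_bh() on every CPU's lock in turn, which is what makes the
plain read_lock() on the packet-processing fast path sufficient.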
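
As for the PS: if memory serves, the _bh() lock functions in
kernel/spinlock.c at that point looked roughly like the sketch below
(simplified; the lockdep/LOCK_CONTENDED plumbing is omitted).
local_bh_disable() bumps the softirq part of the preempt count, which
already makes the task non-preemptible, so the extra preempt_disable()
looks like belt-and-braces bookkeeping rather than something the
rdlock_bh above would need to copy.

	void __lockfunc _spin_lock_bh(spinlock_t *lock)
	{
		local_bh_disable();	/* raises SOFTIRQ_OFFSET in preempt_count */
		preempt_disable();	/* arguably redundant with the line above */
		_raw_spin_lock(lock);
	}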