Date: Tue, 14 Apr 2009 01:52:55 +0200
From: Ingo Molnar
To: Linus Torvalds
Cc: Andrew Morton, Stephen Hemminger, paulmck@linux.vnet.ibm.com,
	davem@davemloft.net, paulus@samba.org, laijs@cn.fujitsu.com,
	jeff.chua.linux@gmail.com, dada1@cosmosbay.com, jengelh@medozas.de,
	kaber@trash.net, r000n@r000n.net, linux-kernel@vger.kernel.org,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	benh@kernel.crashing.org
Subject: Re: [PATCH] netfilter: use per-cpu spinlock rather than RCU
Message-ID: <20090413235255.GJ817@elte.hu>
References: <20090411174801.GG6822@linux.vnet.ibm.com>
	<18913.53699.544083.320542@cargo.ozlabs.ibm.com>
	<20090412173108.GO6822@linux.vnet.ibm.com>
	<20090412.181330.23529546.davem@davemloft.net>
	<20090413040413.GQ6822@linux.vnet.ibm.com>
	<20090413095309.631cf395@nehalam>
	<20090413152437.c48723f6.akpm@linux-foundation.org>
	<20090413162000.5f8d9a05@nehalam>
	<20090413162620.01353461.akpm@linux-foundation.org>


* Linus Torvalds wrote:

> On Mon, 13 Apr 2009, Andrew Morton wrote:
> >
> > > > > -	rcu_read_lock_bh();
> > > > > -	private = rcu_dereference(table->private);
> > > > > -	table_base = rcu_dereference(private->entries[smp_processor_id()]);
> > > > > +	local_bh_disable();
> > > > > +	spin_lock(&__get_cpu_var(ip_tables_lock));
> > > >
> > > > spin_lock_bh()?
> > >
> > > No. get_cpu_var implies smp_processor_id which is not safe
> > > without preempt_disable (ie bh disable).
> >
> > spin_lock_bh() will dtrt, but spelling it out seems a good idea.
>
> No, spin_lock_bh() will _not_ do the right thing.
>
> On UP it will actually work, for two reasons: it will work because
> (a) it's UP, so there are no issues with smp_processor_id() to begin
> with, but also because even if there _were_ issues, it would still
> work because the whole thing would expand as a macro, and the
> preempt_disable() would actually happen before the argument is
> evaluated.
>
> But on SMP, spin_lock_bh() expands to just _spin_lock_bh(), which is
> a real function - and the argument will be evaluated before the call
> (obviously), and thus before the preempt_disable().
>
> So
>
>	local_bh_disable();
>	spin_lock(&__get_cpu_var(ip_tables_lock));
>
> is correct, and
>
>	spin_lock_bh(&__get_cpu_var(ip_tables_lock));
>
> is _not_ correct. The latter will evaluate
> "&__get_cpu_var(ip_tables_lock)" with no protection against the
> process being switched to another CPU.

One option would be to make spin_lock_bh() more robust for such use. 
The downside would be all the other cases, where the argument 
expression really is constant (but still takes a few instructions to 
calculate) and could be - and should be - evaluated outside of that 
critical section.

So i'd tend to leave it in its current (optimistic) form: we've got 
1283 uses of spin_lock_bh(), and a quick look at git grep suggests 
that the current optimistic optimization matters in practice.
	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/