Date: Mon, 13 Apr 2009 16:20:00 -0700
From: Stephen Hemminger
To: Andrew Morton
Cc: paulmck@linux.vnet.ibm.com, davem@davemloft.net, paulus@samba.org,
	mingo@elte.hu, torvalds@linux-foundation.org, laijs@cn.fujitsu.com,
	jeff.chua.linux@gmail.com, dada1@cosmosbay.com, jengelh@medozas.de,
	kaber@trash.net, r000n@r000n.net, linux-kernel@vger.kernel.org,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	benh@kernel.crashing.org
Subject: Re: [PATCH] netfilter: use per-cpu spinlock rather than RCU
Message-ID: <20090413162000.5f8d9a05@nehalam>
In-Reply-To: <20090413152437.c48723f6.akpm@linux-foundation.org>
References: <20090411174801.GG6822@linux.vnet.ibm.com>
	<18913.53699.544083.320542@cargo.ozlabs.ibm.com>
	<20090412173108.GO6822@linux.vnet.ibm.com>
	<20090412.181330.23529546.davem@davemloft.net>
	<20090413040413.GQ6822@linux.vnet.ibm.com>
	<20090413095309.631cf395@nehalam>
	<20090413152437.c48723f6.akpm@linux-foundation.org>
Organization: Vyatta

On Mon, 13 Apr 2009 15:24:37 -0700
Andrew Morton wrote:

> On Mon, 13 Apr 2009 09:53:09 -0700
> Stephen Hemminger wrote:
>
> > This is an alternative version of ip/ip6/arp tables locking using
> > per-cpu locks.  This avoids the overhead of synchronize_net() during
> > update but still removes the expensive rwlock of earlier versions.
> >
> > The idea for this came from an earlier version done by Eric Dumazet.
> > Locking is done per-cpu: the fast path takes only the current CPU's
> > lock while updating counters; the slow path (reading counters,
> > replacing a table) acquires the locks on all CPUs.
> >
> > The mutex that was added for 2.6.30 in xt_table is unnecessary,
> > since the existing xt[af].mutex is already held there.
> >
> > Tested basic functionality (add/remove/list), but I don't have test
> > cases for stress, ip6tables, or arptables.
> >
> >  unsigned int
> >  ipt_do_table(struct sk_buff *skb,
> > @@ -339,9 +341,10 @@ ipt_do_table(struct sk_buff *skb,
> >
> >  	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
> >
> > -	rcu_read_lock_bh();
> > -	private = rcu_dereference(table->private);
> > -	table_base = rcu_dereference(private->entries[smp_processor_id()]);
> > +	local_bh_disable();
> > +	spin_lock(&__get_cpu_var(ip_tables_lock));
>
> spin_lock_bh()?

No. __get_cpu_var() implies smp_processor_id(), which is not safe
without preempt_disable() (i.e. the bh disable): spin_lock_bh() would
evaluate the per-cpu lock address before bottom halves were disabled,
so the task could still migrate to another CPU first.
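
To make the ordering concrete, a minimal sketch of the pattern (lock
name abbreviated; this is the idea, not the patch itself):

	static DEFINE_PER_CPU(spinlock_t, tables_lock);

	/*
	 * Wrong: spin_lock_bh() evaluates its argument before it gets
	 * a chance to disable bottom halves, so __get_cpu_var() runs
	 * smp_processor_id() with preemption still enabled -- the task
	 * can migrate after computing the address and end up holding
	 * some other CPU's lock while touching this CPU's table copy.
	 *
	 *	spin_lock_bh(&__get_cpu_var(tables_lock));
	 */

	/*
	 * Right: disabling bottom halves also disables preemption, so
	 * the CPU whose lock we look up is the CPU we keep running on.
	 */
	local_bh_disable();
	spin_lock(&__get_cpu_var(tables_lock));
	/* ... bump this CPU's packet/byte counters ... */
	spin_unlock(&__get_cpu_var(tables_lock));
	local_bh_enable();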
> > +	private = table->private;
> > +	table_base = private->entries[smp_processor_id()];
> >
> >  	e = get_entry(table_base, private->hook_entry[hook]);
> >
> > @@ -436,8 +439,8 @@ ipt_do_table(struct sk_buff *skb,
> >  			e = (void *)e + e->next_offset;
> >  		}
> >  	} while (!hotdrop);
> > -
> > -	rcu_read_unlock_bh();
> > +	spin_unlock(&__get_cpu_var(ip_tables_lock));
> > +	local_bh_enable();
> >
> >  #ifdef DEBUG_ALLOW_ALL
> >  	return NF_ACCEPT;
> > @@ -891,86 +894,34 @@ get_counters(const struct xt_table_info
> >  			  struct xt_counters counters[])
> >  {
> >  	unsigned int cpu;
> > -	unsigned int i;
> > -	unsigned int curcpu;
> > +	unsigned int i = 0;
> > +	unsigned int curcpu = raw_smp_processor_id();
> >
> >  	/* Instead of clearing (by a previous call to memset())
> >  	 * the counters and using adds, we set the counters
> >  	 * with data used by 'current' CPU
> >  	 * We dont care about preemption here.
> >  	 */
> > -	curcpu = raw_smp_processor_id();
> > -
> > -	i = 0;
> > +	spin_lock_bh(&per_cpu(ip_tables_lock, curcpu));
> >  	IPT_ENTRY_ITERATE(t->entries[curcpu],
> >  			  t->size,
> >  			  set_entry_to_counter,
> >  			  counters,
> >  			  &i);
> > +	spin_unlock_bh(&per_cpu(ip_tables_lock, curcpu));
> >
> >  	for_each_possible_cpu(cpu) {
> >  		if (cpu == curcpu)
> >  			continue;
> > +
> >  		i = 0;
> > +		spin_lock_bh(&per_cpu(ip_tables_lock, cpu));
> >  		IPT_ENTRY_ITERATE(t->entries[cpu],
> >  				  t->size,
> >  				  add_entry_to_counter,
> >  				  counters,
> >  				  &i);
> > -	}
>
> This would be racy against cpu hotplug if this code was hotplug-aware.
>
> And it should be hotplug aware, really.  num_possible_cpus() can
> exceed num_online_cpus().  The extent by which possible > online is
> controversial, but one can conceive of situations where it is "lots".

It is already doing the right thing with hotplug: the code still needs
to count packets that were processed by a previously online CPU which
is no longer there, which is why it iterates over all possible CPUs
rather than only the online ones.

> Is lib/percpu_counter.c no good for this application?  Unfixably no
> good?  That code automagically handles cpu hotplug.

percpu_counter can't deal with the layout or the load here.
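
Roughly, the mismatch (definitions abbreviated from
include/linux/percpu_counter.h and include/linux/netfilter/x_tables.h):

	/*
	 * percpu_counter maintains one logical value, with small
	 * per-cpu deltas that get folded into a central count.
	 */
	struct percpu_counter {
		spinlock_t lock;
		s64 count;	/* the single value being maintained */
		s32 *counters;	/* per-cpu deltas */
	};

	/*
	 * xtables instead gives every CPU a full copy of the ruleset
	 * blob (private->entries[cpu]), with a packet/byte counter
	 * pair embedded inline in every rule, on the cachelines the
	 * match path is already touching:
	 */
	struct xt_counters {
		u_int64_t pcnt, bcnt;	/* packet and byte counters */
	};

Converting would mean two percpu_counter instances per rule, and would
pull the counters out of the per-cpu table blobs -- exactly the layout
the fast path depends on.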