Date: Tue, 21 Apr 2009 21:10:07 +0200
From: Ingo Molnar
To: Stephen Hemminger, Peter Zijlstra
Cc: Linus Torvalds, Paul Mackerras, paulmck@linux.vnet.ibm.com,
	Eric Dumazet, Evgeniy Polyakov, David Miller, kaber@trash.net,
	jeff.chua.linux@gmail.com, laijs@cn.fujitsu.com, jengelh@medozas.de,
	r000n@r000n.net, linux-kernel@vger.kernel.org,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org,
	benh@kernel.crashing.org, mathieu.desnoyers@polymtl.ca
Subject: Re: [PATCH] netfilter: use per-cpu recursive lock (v11)
Message-ID: <20090421191007.GA15485@elte.hu>
In-Reply-To: <20090421111541.228e977a@nehalam>

* Stephen Hemminger wrote:

> +void xt_info_wrlock_bh(void)
> +{
> +	unsigned int i;
> +
> +	local_bh_disable();
> +	for_each_possible_cpu(i) {
> +		write_lock(&per_cpu(xt_info_locks, i));
> +#if NR_CPUS > (PREEMPT_MASK - 1)
> +		/*
> +		 * Since spin_lock disables preempt, the following is
> +		 * required to avoid overflowing the preempt counter
> +		 */
> +		preempt_enable_no_resched();
> +#endif
> +	}
> +}

Hm, this is rather ugly, and it will make a lot of instrumentation
code explode.

Why not use the obvious solution: a _single_ wrlock for global
access and read_can_lock() plus per-CPU locks in the fastpath?

That way there's no global cacheline bouncing (just the _reading_ of
a global cacheline - which will be nicely localized - on NUMA too) -
and we will hold at most 1-2 locks at once!

Something like:

	__cacheline_aligned DEFINE_RWLOCK(global_wrlock);

	DEFINE_PER_CPU(rwlock_t, local_lock);

	void local_read_lock(void)
	{
	again:
		read_lock(&per_cpu(local_lock, smp_processor_id()));

		if (unlikely(!read_can_lock(&global_wrlock))) {
			read_unlock(&per_cpu(local_lock, smp_processor_id()));
			/*
			 * Just wait for any global write activity:
			 */
			read_unlock_wait(&global_wrlock);
			goto again;
		}
	}

	void global_write_lock(void)
	{
		unsigned int i;

		write_lock(&global_wrlock);

		for_each_possible_cpu(i)
			write_unlock_wait(&per_cpu(local_lock, i));
	}

Note how nesting-friendly this construct is: we don't actually
_hold_ NR_CPUS locks all at once, we simply cycle through all CPUs
and make sure they have our attention.

No preempt overflow. No lockdep explosion. A very fast and scalable
read path.
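(Only the lock side is spelled out above; for completeness, a sketch
of the matching unlock side, under the same assumptions - the names
local_read_unlock()/global_write_unlock() are illustrative, not from
the original mail:)

	void local_read_unlock(void)
	{
		/* Only this CPU's lock is held on the read side: */
		read_unlock(&per_cpu(local_lock, smp_processor_id()));
	}

	void global_write_unlock(void)
	{
		/*
		 * The per-CPU locks were only cycled through via
		 * write_unlock_wait(), never kept held, so only the
		 * global wrlock needs to be dropped:
		 */
		write_unlock(&global_wrlock);
	}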
Okay - we need to implement read_unlock_wait() and
write_unlock_wait(), which are similar to spin_unlock_wait(). The
trivial first approximation is:

	read_unlock_wait(x)
	{
		read_lock(x);
		read_unlock(x);
	}

	write_unlock_wait(x)
	{
		write_lock(x);
		write_unlock(x);
	}

Hm?

	Ingo
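(The whole scheme can be mocked up in user space as a sanity check -
a minimal sketch with pthread rwlocks, where NCPUS, the thread-to-CPU
mapping, and the use of tryrdlock() to approximate read_can_lock()
are assumptions of this sketch, not part of the original proposal:)

	#include <pthread.h>
	#include <stdio.h>

	#define NCPUS	4

	static pthread_rwlock_t global_wrlock = PTHREAD_RWLOCK_INITIALIZER;
	static pthread_rwlock_t local_lock[NCPUS];
	static int shared_counter;

	static void local_read_lock(int cpu)
	{
	again:
		pthread_rwlock_rdlock(&local_lock[cpu]);

		/* read_can_lock() approximated by a trylock that is undone: */
		if (pthread_rwlock_tryrdlock(&global_wrlock) != 0) {
			pthread_rwlock_unlock(&local_lock[cpu]);
			/* read_unlock_wait(): the trivial lock+unlock version */
			pthread_rwlock_rdlock(&global_wrlock);
			pthread_rwlock_unlock(&global_wrlock);
			goto again;
		}
		pthread_rwlock_unlock(&global_wrlock);	/* undo the trylock */
	}

	static void local_read_unlock(int cpu)
	{
		pthread_rwlock_unlock(&local_lock[cpu]);
	}

	static void global_write_lock(void)
	{
		int i;

		pthread_rwlock_wrlock(&global_wrlock);
		/* write_unlock_wait(): the trivial lock+unlock version */
		for (i = 0; i < NCPUS; i++) {
			pthread_rwlock_wrlock(&local_lock[i]);
			pthread_rwlock_unlock(&local_lock[i]);
		}
	}

	static void global_write_unlock(void)
	{
		pthread_rwlock_unlock(&global_wrlock);
	}

	static void *reader(void *arg)
	{
		int cpu = (int)(long)arg, i;

		for (i = 0; i < 100000; i++) {
			local_read_lock(cpu);
			(void)shared_counter;	/* read-side critical section */
			local_read_unlock(cpu);
		}
		return NULL;
	}

	int main(void)
	{
		pthread_t tid[NCPUS];
		int i;

		for (i = 0; i < NCPUS; i++)
			pthread_rwlock_init(&local_lock[i], NULL);
		for (i = 0; i < NCPUS; i++)
			pthread_create(&tid[i], NULL, reader, (void *)(long)i);

		for (i = 0; i < 1000; i++) {
			global_write_lock();
			shared_counter++;	/* write-side critical section */
			global_write_unlock();
		}

		for (i = 0; i < NCPUS; i++)
			pthread_join(tid[i], NULL);

		printf("counter = %d\n", shared_counter);
		return 0;
	}

(Build with -lpthread. Readers that lose the read_can_lock() check
drop their local lock and spin on the global one, exactly as in the
kernel sketch, so a writer in its write_unlock_wait() sweep is never
blocked by a reader that has already seen the global lock taken.)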