Date: Fri, 16 Jan 2009 23:08:32 +0100
From: Ingo Molnar
To: Rusty Russell
Cc: Herbert Xu, akpm@linux-foundation.org, tj@kernel.org, hpa@zytor.com,
    brgerst@gmail.com, ebiederm@xmission.com, cl@linux-foundation.org,
    travis@sgi.com, linux-kernel@vger.kernel.org, steiner@sgi.com,
    hugh@veritas.com, "David S. Miller", netdev@vger.kernel.org,
    Mathieu Desnoyers
Subject: Re: [PATCH] percpu: add optimized generic percpu accessors
Message-ID: <20090116220832.GB20653@elte.hu>
References: <20090115183942.GA6325@elte.hu> <20090116001544.GA11073@elte.hu>
    <20090116001824.GA9221@gondor.apana.org.au>
    <200901170827.33729.rusty@rustcorp.com.au>
In-Reply-To: <200901170827.33729.rusty@rustcorp.com.au>
X-Mailing-List: linux-kernel@vger.kernel.org

* Rusty Russell wrote:

> On Friday 16 January 2009 10:48:24 Herbert Xu wrote:
> > On Fri, Jan 16, 2009 at 01:15:44AM +0100, Ingo Molnar wrote:
> > >
> > > > So if you could design the API such that we have a variant of add/inc
> > > > that automatically disables/enables preemption then we can optimise that
> > > > away on x86.
> > >
> > > Yeah. percpu_add(var, 1) does exactly that on x86.
>
> . No it doesn't.

What do you mean by "No it doesn't"? It does exactly what i claimed it
does.

> It's really nice that everyone's excited about this, but it's more
> complex than this. Unf. I'm too busy preparing for linux.conf.au to
> explain it all properly right now, but here's the highlights:
>
> 1) This only works on static per-cpu vars.
>    - We are working on fixing this, but it's non-trivial for large allocs like
>      those in networking. Small allocs, we have patches for.

How do difficulties of dynamic percpu-alloc make my above suggestion
unsuitable for SNMP stats in networking? Most of those stats are not
dynamically allocated - they are plain straightforward percpu variables.

Plus the majority of percpu usage is static - just like the majority of
local variables is static, not dynamic. So any percpu-alloc complication
is a non-issue.

> 2) The generic versions of these as posted by Tejun are unsuitable for
>    networking; they need to bh_disable. That would make networking less
>    efficient than it is now for non-x86, and to be generic it would have
>    to be local_irq_save/restore anyway.

The generic versions will not be used on 95%+ of the active Linux systems
out there, as they run on x86. If you worry about the remaining 5%, those
can be optimized too.

> 3) local_t was designed to do exactly this: a fast cpu-local counter
>    implemented optimally for each arch. For sparc64, doing a trivalue version
>    seems optimal, for s390 atomics, for x86 single-insn, for powerpc
>    irq_save/restore, etc.

But local_t does not actually solve this problem at all - because one
still has to have per-cpu-ness.

> 4) Unfortunately, local_t has been extended beyond a simple counter, meaning
>    it now has more complex requirements (eg. Mathieu wants nmi-safe, even
>    though that's impossible on sparc and parisc, and percpu_counter wants
>    local_add_return, which makes trival less desirable). These discussions
>    are on the back burner at the moment, but ongoing.

In reality local_t has almost zero users in the kernel - despite being
with us at least since v2.6.12. That pretty much tells us all about its
utility.

The thing is, local_t without proper percpu integration is a toothless
tiger in the jungle. And our APIs do exactly that kind of integration and
i expect them to be more popular than local_t. There's already a dozen
usage sites of it in arch/x86.

	Ingo
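
For readers following the thread, here is a rough userspace model of what "a variant of add/inc that automatically disables/enables preemption" amounts to on an architecture with no single-instruction form. This is only a sketch under stated assumptions: every name in it (model_preempt_disable, model_percpu_add, model_migrate_to, snmp_stat, NR_CPUS = 4) is a hypothetical stand-in for illustration, not the kernel's actual code and not the accessors from the patch under discussion. It does not attempt to show the x86 case, where the thread's claim is that the disable/enable bracket can be optimised away.

```c
#include <assert.h>

#define NR_CPUS 4               /* hypothetical CPU count for this model */

/* Userspace stand-ins for kernel state (assumptions, not kernel code). */
static int preempt_depth;       /* models the preemption-disable nesting count */
static int current_cpu;         /* models the CPU the task is running on */

static void model_preempt_disable(void)
{
    preempt_depth++;
}

static void model_preempt_enable(void)
{
    assert(preempt_depth > 0);  /* enable must pair with an earlier disable */
    preempt_depth--;
}

/* Models the scheduler moving the task; only legal while preemptible. */
static void model_migrate_to(int cpu)
{
    assert(preempt_depth == 0);
    assert(cpu >= 0 && cpu < NR_CPUS);
    current_cpu = cpu;
}

/* A "static per-cpu variable": one slot per CPU, e.g. an SNMP-style counter. */
static long snmp_stat[NR_CPUS];

/*
 * Generic fallback: pin the task to its CPU, update that CPU's slot,
 * then unpin. The task cannot migrate between picking the slot and
 * writing to it, so the update always lands in the current CPU's slot.
 */
static void model_percpu_add(long *var, long val)
{
    model_preempt_disable();
    var[current_cpu] += val;
    model_preempt_enable();
}

/* Reader side: fold all per-CPU slots into one total. */
static long model_percpu_sum(const long *var)
{
    long sum = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += var[cpu];
    return sum;
}
```

A caller would write model_percpu_add(snmp_stat, 1) on the hot path and model_percpu_sum(snmp_stat) when the statistic is read out. The model covers preemption only; point 2 in the quoted text (bh_disable or local_irq_save/restore for networking) would need a stronger bracket than the one sketched here.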