Message-ID: <4F356FA6.4000104@redhat.com>
Date: Fri, 10 Feb 2012 11:27:34 -0800
From: Richard Henderson
To: Linus Torvalds
CC: Andrew MacLeod, paulmck@linux.vnet.ibm.com, Torvald Riegel, Jan Kara, LKML, linux-ia64@vger.kernel.org, dsterba@suse.cz, ptesarik@suse.cz, rguenther@suse.de, gcc@gcc.gnu.org
Subject: Re: Memory corruption due to word sharing

On 02/03/2012 12:00 PM, Linus Torvalds wrote:
>    do {
>      load-link %r,%m
>      if (r == value)
>        return 0;
>      add
>    } while (store-conditional %r,%m)
>    return 1;
>
> and it is used to implement two *very* common (and critical)
> reference-counting use cases:
>
>  - decrement ref-count locklessly, and if it hits free, take a lock
> atomically (ie "it would be a bug for anybody to ever see it decrement
> to zero while holding the lock")
>
>  - increment ref-count locklessly, but if we race with the final free,
> don't touch it, because it's gone (ie "if somebody decremented it to
> zero while holding the lock, we absolutely cannot increment it again")
>
> They may sound odd, but those two operations are absolutely critical
> for most lock-less refcount management. And taking locks for reference
> counting is absolutely death to performance, and is often impossible
> anyway (the lock may be a local lock that is *inside* the structure
> that is being reference-counted, so if the refcount is zero, then you
> cannot even look at the lock!)
>
> In the above example, the load-locked -> store-conditional would
> obviously be a cmpxchg loop instead on architectures that do cmpxchg
> instead of ll/sc.
>
> Of course, it you expose some intrinsic for the whole "ll/sc" model
> (and you then turn it into cmpxchg on demand), we could literally
> open-code it.

We can't expose the ll/sc model directly, due to various target-specific
hardware constraints (e.g. no other memory references, i.e. no spills).
But the "weak" compare-exchange is sorta-kinda close.

A "weak" compare-exchange is a ll/sc pair, without the loop for restart
on sc failure. Thus a weak compare-exchange can fail for arbitrary reasons.
You do, however, get back the current value in the memory, so you can loop
back yourself and try again.

So that loop above becomes the familiar expression as for CAS:

    r = *m;
    do {
      if (r == value)
        return;
      n = r + inc;
    } while (!atomic_compare_exchange(m, &r, n, /*weak=*/true,
                                      __ATOMIC_RELAXED, __ATOMIC_RELAXED));

which, unlike with the current __sync_val_compare_and_swap, does not
result in a nested loop for the ll/sc architectures.

Yes, the ll/sc architectures can technically do slightly better by writing
this as inline asm, but the gain is significantly less than before. Given
that the line must be in L1 cache already, the redundant load from M in
the ll insn should be nearly negligible.

Of course for architectures that implement cas, there is no difference
between "weak" and "strong".
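
For reference, the weak compare-exchange loop above might be spelled out as a complete function roughly as follows. This is only a sketch, assuming a compiler that provides the GCC __atomic builtins (__atomic_load_n and __atomic_compare_exchange_n). The helper names add_unless, inc_not_zero and dec_unless_one are hypothetical and chosen for illustration; they are not kernel or GCC interfaces.

```c
#include <stdbool.h>

/* Hypothetical helper (illustrative name and signature): add inc to *m
 * unless *m currently equals value. Returns false if *m == value and
 * nothing was stored, true once the add has been stored. */
static bool add_unless(int *m, int inc, int value)
{
    int r = __atomic_load_n(m, __ATOMIC_RELAXED);

    do {
        if (r == value)
            return false;
        /* A weak compare-exchange may fail spuriously. On any failure the
         * builtin writes the current contents of *m back into r, so the
         * next iteration re-checks a fresh value without a nested loop. */
    } while (!__atomic_compare_exchange_n(m, &r, r + inc, /*weak=*/true,
                                          __ATOMIC_RELAXED, __ATOMIC_RELAXED));
    return true;
}

/* The two reference-counting cases from the quoted text, as thin wrappers
 * (again hypothetical names). */

/* Increment, unless the count has already dropped to zero. */
static bool inc_not_zero(int *refcount)
{
    return add_unless(refcount, 1, 0);
}

/* Decrement, unless this would be the final reference (count == 1);
 * a false return means the caller should take the lock first. */
static bool dec_unless_one(int *refcount)
{
    return add_unless(refcount, -1, 1);
}
```

With a count of 2, dec_unless_one() stores 1 and returns true; called again, it returns false and leaves the count at 1, which is the point at which the caller would fall back to the locked path. The relaxed memory orders mirror the loop in the message; real reference-counting code may well need stronger ordering on the success path.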

r~