Date: Fri, 9 Jun 2017 21:36:04 +0200
From: Peter Zijlstra
To: Will Deacon
Cc: Paul McKenney, Boqun Feng, linux-kernel@vger.kernel.org,
	Ingo Molnar, Thomas Gleixner
Subject: Re: [RFC][PATCH]: documentation,atomic: Add a new atomic_t document
Message-ID: <20170609193604.ncw3hhgvewzc3h5u@hirez.programming.kicks-ass.net>
References: <20170609092450.jwmldgtli57ozxgq@hirez.programming.kicks-ass.net>
 <20170609154442.GB9236@arm.com>
In-Reply-To: <20170609154442.GB9236@arm.com>

On Fri, Jun 09, 2017 at 04:44:42PM +0100, Will Deacon wrote:
> > +++ b/Documentation/atomic_t.txt	2017-06-09 11:05:31.501599153 +0200
> > @@ -0,0 +1,147 @@
> > +
> > +On atomic types (atomic_t atomic64_t and atomic_long_t).
> > +
> > +The atomic type provides an interface to the architecture's means of atomic
> > +RmW operations between CPUs (it specifically does not order/work/etc. on
> > +IO).
>
> We should be stronger here: atomics to IO could lead to kernel panics (i.e.
> raise a fatal abort), whereas this sounds like they just lose some ordering
> or atomicity guarantees.
>
> > +The 'full' API consists of:
>
> Need to mention the 64-bit and the long variants?

Makes a bit of a mess of things, I felt; maybe I should be a little more
explicit and mention that everything applies to all 3 variants?

> > +Non RmW ops:
> > +
> > +The non-RmW ops are (typically) regular LOADs and STOREs and are canonically
> > +implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
> > +smp_store_release() respectively.
> > +
> > +The one detail to this is that atomic_set() should be observable to the RmW
> > +ops. That is:
> > +
> > +	CPU0					CPU1
> > +
> > +	val = atomic_read(&X)
> > +	do {					atomic_set(&X, 0)
> > +		new = val + 1;
> > +	} while (!atomic_try_cmpxchg(&X, &val, new));
> > +
> > +Should cause the cmpxchg to *FAIL* (when @val != 0). This is typically true;
> > +on 'normal' platforms a regular competing STORE will invalidate a LL/SC.
>
> I see what you're getting at here, but the example is a bit weird because
> CPU1 might hold the store to X in a local store-buffer and not shoot down
> the cmpxchg immediately. I think you need something to show how the write
> to X has at least partially propagated.

Hurm.. would the example from metag atomic_set be better?

> > +The obvious case where this is not so is where we need to implement atomic ops
> > +with a spinlock hashtable; the typical solution is to then implement
> > +atomic_set() with atomic_xchg().
>
> Looking at sparc32, atomic_set takes the hashed lock, so I can't see what
> goes wrong here: atomic_try_cmpxchg will get called with val != 0, but the
> comparison will fail because the value in memory will be 0. What am I
> missing?

This is the reason their atomic_set() is special and needs to take the lock.
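
Concretely, the hashed-lock scheme looks something like this; a simplified
sketch modeled on sparc32's arch/sparc/lib/atomic32.c (renamed and trimmed
for illustration, not the actual kernel code):

#include <linux/spinlock.h>
#include <linux/types.h>

/* Atomics are serialized by a small array of spinlocks, hashed on
 * the variable's address. */
#define ATOMIC_HASH_SIZE	4
#define ATOMIC_HASH(v) \
	(&__atomic_hash[(((unsigned long)(v)) >> 8) & (ATOMIC_HASH_SIZE - 1)])

static spinlock_t __atomic_hash[ATOMIC_HASH_SIZE] = {
	[0 ... ATOMIC_HASH_SIZE - 1] = __SPIN_LOCK_UNLOCKED(__atomic_hash)
};

static int sketch_atomic_cmpxchg(atomic_t *v, int old, int new)
{
	unsigned long flags;
	int ret;

	spin_lock_irqsave(ATOMIC_HASH(v), flags);
	ret = v->counter;
	if (ret == old)
		v->counter = new;
	spin_unlock_irqrestore(ATOMIC_HASH(v), flags);

	return ret;
}

/* atomic_set() cannot be a plain WRITE_ONCE() here: a store issued
 * while another CPU holds the hashed lock halfway through an RmW op
 * gets overwritten when that RmW writes back its result, i.e. the
 * set would not be observable to the RmW ops. Hence it must take
 * the lock too (or be implemented with atomic_xchg()). */
static void sketch_atomic_set(atomic_t *v, int i)
{
	unsigned long flags;

	spin_lock_irqsave(ATOMIC_HASH(v), flags);
	v->counter = i;
	spin_unlock_irqrestore(ATOMIC_HASH(v), flags);
}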
> > +
> > +RmW ops:
> > +
> > +These come in various forms:
> > +
> > + - plain operations without return value: atomic_{}()
>
> Maybe just list the API here, instead of having the separate section
> previously?

That then leaves you with no place to put those comments. I wanted to
slice the API in different ways for each subject without endless
repetition.

> > + - operations which return the modified value: atomic_{}_return()
> > +
> > +   these are limited to the arithmetic operations because those are
> > +   reversible. Bitops are irreversible and therefore the modified value
> > +   is of dubious utility.
> > +
> > + - operations which return the original value: atomic_fetch_{}()
> > +
> > + - swap operations: xchg(), cmpxchg() and try_cmpxchg()
> > +
> > + - misc; the special purpose operations that are commonly used and would,
> > +   given the interface, normally be implemented using (try_)cmpxchg loops but
> > +   are time critical and can, (typically) on LL/SC architectures, be more
> > +   efficiently implemented.
> > +
> > +All these operations are SMP atomic; that is, the operations (for a single
> > +atomic variable) can be fully ordered and no intermediate state is lost or
> > +visible.
> > +
> > +Ordering: (go read memory-barriers.txt first)
> > +
> > +The rule of thumb:
> > +
> > + - non-RmW operations are unordered;
> > +
> > + - RmW operations that have no return value are unordered;
> > +
> > + - RmW operations that have a return value are Sequentially Consistent;
>
> I think it's stronger than that, because they also order non-RmW operations,
> whereas this makes it sound like there's just a total order over all RmW
> operations.

Right, what should I call it?

> > + - RmW operations that are conditional are unordered on FAILURE, otherwise the
> > +   above rules apply.
>
> We should make it clear that "unordered" here refers to accesses to other
> memory locations.
>
> > +
> > +Except of course when an operation has an explicit ordering like:
> > +
> > +	{}_relaxed: unordered
> > +	{}_acquire: the R of the RmW is an ACQUIRE
> > +	{}_release: the W of the RmW is a RELEASE
> > +
> > +NOTE: our ACQUIRE/RELEASE are RCpc
>
> The NOTE belongs in memory-barriers.txt

It is of course; I'll remove that line.

> > +The barriers:
> > +
> > +	smp_mb__{before,after}_atomic()
> > +
> > +only apply to the RmW ops and can be used to augment/upgrade the ordering
> > +inherit to the used atomic op. These barriers provide a full smp_mb().
>
> inherit?

"inherent" is I think the word I wanted..
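
As an aside, for anyone mapping the {}_acquire/{}_release rules above onto
code: a toy test-and-set lock, purely illustrative (it assumes the generic
atomic_xchg_acquire()/atomic_set_release() helpers; don't use this instead
of a real spinlock):

static atomic_t toy_lock_var = ATOMIC_INIT(0);

static void toy_lock(void)
{
	/* The R of the RmW is an ACQUIRE: the critical section cannot
	 * leak up across the successful xchg. */
	while (atomic_xchg_acquire(&toy_lock_var, 1))
		cpu_relax();
}

static void toy_unlock(void)
{
	/* RELEASE store: the critical section cannot leak down past
	 * the store that frees the lock. */
	atomic_set_release(&toy_lock_var, 0);
}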
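
And the typical shape of the smp_mb__{before,after}_atomic() usage is
upgrading a value-less (and therefore unordered) RmW op; a minimal sketch,
struct obj is made up:

struct obj {
	int dead;
	atomic_t ref;
};

static void obj_kill(struct obj *obj)
{
	WRITE_ONCE(obj->dead, 1);
	/* Upgrade the unordered atomic_dec(): provides a full smp_mb()
	 * between the dead store and the reference drop. */
	smp_mb__before_atomic();
	atomic_dec(&obj->ref);
}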