Date: Tue, 27 Jun 2017 16:37:48 -0700
From: "Paul E. McKenney"
Reply-To: paulmck@linux.vnet.ibm.com
To: Linus Torvalds
Cc: Andrea Parri, Linux Kernel Mailing List, priyalee.kushwaha@intel.com,
	drozdziak1@gmail.com, Arnd Bergmann, ldr709@gmail.com,
	Thomas Gleixner, Peter Zijlstra, Josh Triplett, Nicolas Pitre,
	Krister Johansen, Vegard Nossum, dcb314@hotmail.com, Wu Fengguang,
	Frederic Weisbecker, Rik van Riel, Steven Rostedt, Ingo Molnar,
	Alan Stern, Luc Maranget, Jade Alglave
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13
References: <20170612213754.GA7201@linux.vnet.ibm.com>
	<20170614025404.GA2525@andrea>
	<20170614043317.GK3721@linux.vnet.ibm.com>
	<20170614143322.GA3368@andrea>
	<20170614202329.GV3721@linux.vnet.ibm.com>
	<20170619162428.GA9632@andrea>
	<20170627205802.GY3721@linux.vnet.ibm.com>
Message-Id: <20170627233748.GZ3721@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 27, 2017 at 02:48:18PM -0700, Linus Torvalds wrote:
> On Tue, Jun 27, 2017 at 1:58 PM, Paul E. McKenney wrote:
> >
> > So what next?
> >
> > One option would be to weaken the definition of spin_unlock_wait() so
> > that it had acquire semantics but not release semantics.  Alternatively,
> > we could keep the full empty-critical-section semantics and add memory
> > barriers to spin_unlock_wait() implementations, perhaps as shown in the
> > patch below.  I could go either way, though I do have some preference
> > for the stronger semantics.
> >
> > Thoughts?
>
> I would prefer to just say
>
>  - document that spin_unlock_wait() has acquire semantics
>
>  - mindlessly add the smp_mb() to all users
>
>  - let users then decide if they are ok with just acquire
>
> That's partly because I think we actually have *fewer* users than we
> have implementations of spin_unlock_wait(). So adding a few smp_mb()'s
> in the users is actually likely the smaller change.

You are right about that!  There are only five invocations of
spin_unlock_wait() in the kernel, with a sixth that has since been
converted to spin_lock() immediately followed by spin_unlock().

> But it's also because then that allows people who *can* say that
> acquire is sufficient to just use it. People who use
> spin_unlock_wait() tend to have some odd performance reason to do so,
> so I think allowing them to use the more light-weight memory ordering
> if it works for them is a good idea.
>
> But finally, it's partly because I think "acquire" semantics are
> actually the saner ones that we can explain the logic for much more
> clearly.
>
> Basically, acquire semantics means that you are guaranteed to see any
> changes that were done inside a previously locked region.
>
> Doesn't that sound like sensible semantics?

It is the semantics that most implementations of spin_unlock_wait()
provide.  Of the six invocations, two of them very clearly rely only
on the acquire semantics and two others already have the needed memory
barriers in place.  I have queued one patch to add smp_mb() to the
remaining spin_unlock_wait() of the surviving five instances, and
another patch to convert the spin_lock/unlock pair to smp_mb() followed
by spin_unlock_wait().

So, yes, it is a sensible set of semantics.  At this point, agreeing on
-any- reasonable semantics would be good, as it would allow us to get
locking added to the prototype Linux-kernel memory model.  ;-)

> Then, the argument for "smp_mb()" (before the spin_unlock_wait()) becomes:
>
>  - I did a write that will affect any future lock takes
>
>  - the smp_mb() now means that that write will be ordered wrt the
>    acquire that guarantees we've seen all old actions taken by a lock.
>
> Does those kinds of semantics make sense to people?

In case the answer is "yes", the (untested) patch below (combining three
commits) shows the changes that I believe would be required.  A preview
is also available as individual commits on branch
spin_unlock_wait.2017.06.27a on -rcu here:

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git

As usual, thoughts?  ;-)

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index ef68232b5222..cc01b77a079a 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -704,8 +704,10 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
 		/* initialize eh_tries */
 		ap->eh_tries = ATA_EH_MAX_TRIES;
-	} else
+	} else {
+		smp_mb(); /* Add release semantics for spin_unlock_wait(). */
 		spin_unlock_wait(ap->lock);
+	}
 }
 EXPORT_SYMBOL(ata_scsi_cmd_error_handler);

diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index d9510e8522d4..0c3f54e2a1d1 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -373,21 +373,21 @@ static __always_inline int spin_trylock_irq(spinlock_t *lock)
  * spin_unlock_wait - Interpose between successive critical sections
  * @lock: the spinlock whose critical sections are to be interposed.
  *
- * Semantically this is equivalent to a spin_lock() immediately
- * followed by a spin_unlock().  However, most architectures have
- * more efficient implementations in which the spin_unlock_wait()
- * cannot block concurrent lock acquisition, and in some cases
- * where spin_unlock_wait() does not write to the lock variable.
- * Nevertheless, spin_unlock_wait() can have high overhead, so if
- * you feel the need to use it, please check to see if there is
- * a better way to get your job done.
+ * Semantically this is equivalent to a spin_lock() immediately followed
+ * by a mythical spin_unlock() that has no ordering semantics.  However,
+ * most architectures have more efficient implementations in which the
+ * spin_unlock_wait() cannot block concurrent lock acquisition, and in some
+ * cases where spin_unlock_wait() does not write to the lock variable.
+ * Nevertheless, spin_unlock_wait() can have high overhead, so if you
+ * feel the need to use it, please check to see if there is a better way
+ * to get your job done.
  *
- * The ordering guarantees provided by spin_unlock_wait() are:
- *
- * 1.  All accesses preceding the spin_unlock_wait() happen before
- *     any accesses in later critical sections for this same lock.
- * 2.  All accesses following the spin_unlock_wait() happen after
- *     any accesses in earlier critical sections for this same lock.
+ * The spin_unlock_wait() function guarantees that all accesses following
+ * the spin_unlock_wait() happen after any accesses in earlier critical
+ * sections for this same lock.  Please note that it does -not- guarantee
+ * that accesses preceding the spin_unlock_wait() happen before any accesses
+ * in later critical sections for this same lock.  If you need this latter
+ * ordering, precede the spin_unlock_wait() with an smp_mb() or similar.
  */
 static __always_inline void spin_unlock_wait(spinlock_t *lock)
 {

diff --git a/ipc/sem.c b/ipc/sem.c
index 947dc2348271..ef42e55e9dd0 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -307,8 +307,8 @@ static void complexmode_enter(struct sem_array *sma)

 	for (i = 0; i < sma->sem_nsems; i++) {
 		sem = sma->sem_base + i;
-		spin_lock(&sem->lock);
-		spin_unlock(&sem->lock);
+		smp_mb(); /* Add release semantics for spin_unlock_wait(). */
+		spin_unlock_wait(&sem->lock);
 	}
 }