Date: Fri, 30 Jun 2017 13:16:54 +0800
From: Boqun Feng
To: "Paul E. McKenney"
Cc: Alan Stern, Will Deacon, Linus Torvalds, Andrea Parri,
	Linux Kernel Mailing List, priyalee.kushwaha@intel.com,
	Stanisław Drozd, Arnd Bergmann, ldr709@gmail.com, Thomas Gleixner,
	Peter Zijlstra, Josh Triplett, Nicolas Pitre, Krister Johansen,
	Vegard Nossum, dcb314@hotmail.com, Wu Fengguang, Frederic Weisbecker,
	Rik van Riel, Steven Rostedt, Ingo Molnar, Luc Maranget, Jade Alglave
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13
Message-ID: <20170630051654.wsoog5nlwtmbh5y2@tardis>
References: <20170629113848.GA18630@arm.com>
	<20170629181126.GA2393@linux.vnet.ibm.com>
	<20170630025126.jhflffwrnedlqrmz@tardis>
	<20170630040241.GR2393@linux.vnet.ibm.com>
In-Reply-To: <20170630040241.GR2393@linux.vnet.ibm.com>

On Thu, Jun 29, 2017 at 09:02:41PM -0700, Paul E. McKenney wrote:
[...]
> > > o	net/netfilter/nf_conntrack_core.c nf_conntrack_lock()
> > > 	This instance of spin_unlock_wait() interacts with
> > > 	nf_conntrack_all_lock()'s instance of spin_unlock_wait().
> > > 	Although nf_conntrack_all_lock() has an smp_mb(), which I
> > > 	believe provides release semantics given current implementations,
> > > 	nf_conntrack_lock() just has smp_rmb().
> > >
> > > 	I believe that the smp_rmb() needs to be smp_mb().  Am I missing
> > > 	something here that makes the current code safe on x86?
> > >
> >
> > Actually, I think the smp_rmb(), or even the smp_rmb() along with the
> > spin_unlock_wait(), in nf_conntrack_lock() is not needed; we could
> > implement nf_conntrack_lock() as:
> >
> > 	void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
> > 	{
> > 		spin_lock(lock);
> > 		while (unlikely(smp_load_acquire(&nf_conntrack_locks_all))) {
> > 			spin_unlock(lock);
> > 			cpu_relax();
> > 			spin_lock(lock);
> > 		}
> > 	}
> >
> > because in nf_conntrack_all_unlock(), we have:
> >
> > 	smp_store_release(&nf_conntrack_locks_all, false);
> > 	spin_unlock(&nf_conntrack_locks_all_lock);
> >
> > So if we exit the loop, which means we observed nf_conntrack_locks_all
> > being false, we actually hold the per-bucket lock and observe everything
> > before the smp_store_release(), which is the same as everything in the
> > critical section of nf_conntrack_locks_all_lock.  Otherwise, we observe
> > nf_conntrack_locks_all being true, which means a global lock critical
> > section may be on its way; we simply drop the per-bucket lock and test
> > again some time later whether the global lock has been released.
> >
> > So I think spin_unlock_wait() in nf_conntrack_lock() just requires
> > acquire semantics, at least.
> >
> > Maybe I missed something?
>
> Or perhaps I was being too paranoid.
>
> But does the same analysis work in the case where an nf_conntrack_lock()
> races with an nf_conntrack_all_lock()?
>

You mean the smp_mb()+spin_unlock_wait() in nf_conntrack_all_lock(), right?
I think it's different, because nf_conntrack_all_lock() relies on this
release-like operation to let all the subsequent critical sections of the
per-bucket locks observe nf_conntrack_locks_all=true; otherwise
nf_conntrack_lock() will break out of the loop and access some data while
the global lock critical section is doing the same. The variable
@nf_conntrack_locks_all is used for synchronization between the two kinds
of locks and is set by nf_conntrack_all_lock(); I think this makes things
different.

> > > 	I believe that this code could use spin_lock+spin_unlock without
> > > 	significant performance penalties -- I do not believe that
> > > 	nf_conntrack_locks_all_lock gets significant contention.
> > >
> > > raw_spin_unlock_wait() (Courtesy of Andrea Parri with added commentary):
> > >
> > > o	kernel/exit.c do_exit()
> > > 	Seems to rely on both acquire and release semantics.  The
> > > 	raw_spin_unlock_wait() primitive is preceded by a smp_mb().
> > > 	But this is task exit doing spin_unlock_wait() on the task's
> > > 	lock, so spin_lock+spin_unlock should work fine here.
> > >
> > > o	kernel/sched/core.c do_task_dead()
> > > 	Seems to rely on the acquire semantics only.  The
> > > 	raw_spin_unlock_wait() primitive is preceded by an inexplicable
> > > 	smp_mb().  Again, this is task exit doing spin_unlock_wait() on
> > > 	the task's lock, so spin_lock+spin_unlock should work fine here.
> > >
> > > o	kernel/task_work.c task_work_run()
> > > 	Seems to rely on the acquire semantics only.  This is to handle
> >
> > I think this one needs the stronger semantics; the smp_mb() is just
> > hidden in the cmpxchg() before the raw_spin_unlock_wait() ;-)
> >
> > cmpxchg() sets a special value to indicate the task_work has been taken,
> > and raw_spin_unlock_wait() must wait until the next critical section of
> > ->pi_lock (in task_work_cancel()) could observe this, otherwise we may
> > cancel a task_work while executing it.
>
> But either way, replacing the spin_unlock_wait() with a spin_lock()
> immediately followed by a spin_unlock() should work correctly, right?
>

Yep ;-)

I was thinking about the case where we kept spin_unlock_wait() with
simpler acquire semantics; if so, we would actually have to do the
replacement. But I saw your patchset removing it, so it doesn't matter.

Regards,
Boqun

> 							Thanx, Paul
>
> > Regards,
> > Boqun
> >
> > > 	a race with task_work_cancel(), which appears to be quite rare.
> > > 	So the spin_lock+spin_unlock should work fine here.
> > >
> > > spin_lock()/spin_unlock():
> > >
> > > o	ipc/sem.c complexmode_enter()
> > > 	This used to be spin_unlock_wait(), but was changed to a
> > > 	spin_lock()/spin_unlock() pair by 27d7be1801a4 ("ipc/sem.c:
> > > 	avoid using spin_unlock_wait()").
> > >
> > > Looks to me like we really can drop spin_unlock_wait() in favor of
> > > momentarily acquiring the lock.  There are so few use cases that I don't
> > > see a problem open-coding this.  I will put together yet another patch
> > > series for my spin_unlock_wait() collection of patch serieses.  ;-)
> > >
> > > > As regards (2), I did a little digging.  spin_unlock_wait was
> > > > introduced in the 2.1.36 kernel, in mid-April 1997.  I wasn't able to
> > > > find a specific patch for it in the LKML archives.  At the time it
> > > > was used in only one place in the entire kernel (in kernel/exit.c):
> > > >
> > > > void release(struct task_struct * p)
> > > > {
> > > > 	int i;
> > > >
> > > > 	if (!p)
> > > > 		return;
> > > > 	if (p == current) {
> > > > 		printk("task releasing itself\n");
> > > > 		return;
> > > > 	}
> > > > 	for (i=1 ; i<NR_TASKS ; i++)
> > > > 		if (task[i] == p) {
> > > > #ifdef __SMP__
> > > > 			/* FIXME! Cheesy, but kills the window... -DaveM */
> > > > 			while(p->processor != NO_PROC_ID)
> > > > 				barrier();
> > > > 			spin_unlock_wait(&scheduler_lock);
> > > > #endif
> > > > 			nr_tasks--;
> > > > 			task[i] = NULL;
> > > > 			REMOVE_LINKS(p);
> > > > 			release_thread(p);
> > > > 			if (STACK_MAGIC != *(unsigned long *)p->kernel_stack_page)
> > > > 				printk(KERN_ALERT "release: %s kernel stack corruption. Aiee\n", p->comm);
> > > > 			free_kernel_stack(p->kernel_stack_page);
> > > > 			current->cmin_flt += p->min_flt + p->cmin_flt;
> > > > 			current->cmaj_flt += p->maj_flt + p->cmaj_flt;
> > > > 			current->cnswap += p->nswap + p->cnswap;
> > > > 			free_task_struct(p);
> > > > 			return;
> > > > 		}
> > > > 	panic("trying to release non-existent task");
> > > > }
> > > >
> > > > I'm not entirely clear on the point of this call.  It looks like it
> > > > wanted to wait until p was guaranteed not to be running on any
> > > > processor ever again.  (I don't see why it couldn't have just acquired
> > > > the scheduler_lock -- was release() a particularly hot path?)
> > > >
> > > > Although it doesn't matter now, this would mean that the original
> > > > semantics of spin_unlock_wait were different from what we are
> > > > discussing.  It apparently was meant to provide the release guarantee:
> > > > any future critical sections would see the values that were visible
> > > > before the call.  Ironic.
> > >
> > > Cute!!!  ;-)
> > >
> > > 							Thanx, Paul
> > >
>
>
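
For reference, the per-bucket/global lock scheme discussed in this thread
can be sketched in userspace with C11 atomics and pthreads. This is only an
illustration under stated assumptions, not the netfilter code: the names
bucket_lock_acquire(), all_lock_acquire(), all_lock_release(), locks_all and
run_demo() are hypothetical, pthread mutexes stand in for spinlocks, and
sched_yield() stands in for cpu_relax(). The per-bucket side uses an acquire
load paired with a release store, and the global side momentarily takes each
bucket lock where spin_unlock_wait() used to be.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* All sizes and names below are hypothetical, chosen for the demo only. */
#define NBUCKETS	4
#define NWORKERS	4
#define ITERS		20000
#define GLOBAL_ITERS	1000
#define EXPECTED_TOTAL	((long)NWORKERS * ITERS + (long)GLOBAL_ITERS * NBUCKETS)

static pthread_mutex_t bucket_lock[NBUCKETS];
static pthread_mutex_t all_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool locks_all;		/* stands in for nf_conntrack_locks_all */
static long bucket_count[NBUCKETS];	/* guarded by bucket lock or global lock */

/* Per-bucket side: acquire load pairs with the release store below. */
static void bucket_lock_acquire(int i)
{
	pthread_mutex_lock(&bucket_lock[i]);
	while (atomic_load_explicit(&locks_all, memory_order_acquire)) {
		pthread_mutex_unlock(&bucket_lock[i]);
		sched_yield();		/* userspace stand-in for cpu_relax() */
		pthread_mutex_lock(&bucket_lock[i]);
	}
}

/* Global side: set the flag, then lock+unlock each bucket lock in place
 * of spin_unlock_wait(), so later bucket holders must observe the flag. */
static void all_lock_acquire(void)
{
	pthread_mutex_lock(&all_lock);
	atomic_store(&locks_all, true);
	for (int i = 0; i < NBUCKETS; i++) {
		pthread_mutex_lock(&bucket_lock[i]);
		pthread_mutex_unlock(&bucket_lock[i]);
	}
}

static void all_lock_release(void)
{
	atomic_store_explicit(&locks_all, false, memory_order_release);
	pthread_mutex_unlock(&all_lock);
}

static void *worker(void *arg)
{
	int id = (int)(intptr_t)arg;

	for (int n = 0; n < ITERS; n++) {
		int i = (id + n) % NBUCKETS;

		bucket_lock_acquire(i);
		bucket_count[i]++;
		pthread_mutex_unlock(&bucket_lock[i]);
	}
	return NULL;
}

static void *global_user(void *arg)
{
	(void)arg;
	for (int n = 0; n < GLOBAL_ITERS; n++) {
		all_lock_acquire();
		for (int i = 0; i < NBUCKETS; i++)
			bucket_count[i]++;	/* touches every bucket */
		all_lock_release();
	}
	return NULL;
}

/* Runs the workers against one global locker and returns the sum of all
 * bucket counters; no update is lost if the exclusion holds. */
long run_demo(void)
{
	pthread_t workers[NWORKERS], global;
	long total = 0;
	int rc;

	atomic_store(&locks_all, false);
	for (int i = 0; i < NBUCKETS; i++) {
		pthread_mutex_init(&bucket_lock[i], NULL);
		bucket_count[i] = 0;
	}
	for (int w = 0; w < NWORKERS; w++) {
		rc = pthread_create(&workers[w], NULL, worker,
				    (void *)(intptr_t)w);
		assert(rc == 0);
	}
	rc = pthread_create(&global, NULL, global_user, NULL);
	assert(rc == 0);
	for (int w = 0; w < NWORKERS; w++)
		pthread_join(workers[w], NULL);
	pthread_join(global, NULL);
	for (int i = 0; i < NBUCKETS; i++)
		total += bucket_count[i];
	return total;
}
```

Calling run_demo() from a main() built with -pthread should return
EXPECTED_TOTAL. A matching count does not prove the ordering argument; it
only shows that this sketch loses no updates on the machine it ran on.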