Date: Tue, 13 Jun 2017 20:58:43 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Steven Rostedt
Cc: Peter Zijlstra, Krister Johansen, Ingo Molnar,
	linux-kernel@vger.kernel.org, Paul Gortmaker, Thomas Gleixner
Subject: Re: [PATCH tip/sched/core] Add comments to aid in safer usage of swake_up.
Reply-To: paulmck@linux.vnet.ibm.com
In-Reply-To: <20170613211547.49814d25@gandalf.local.home>
References: <20170609032546.GF2553@templeofstupid.com>
	<20170609071957.GJ8337@worktop.programming.kicks-ass.net>
	<20170609124554.GF3721@linux.vnet.ibm.com>
	<20170613192308.173dd86a@gandalf.local.home>
	<20170613234205.GD3721@linux.vnet.ibm.com>
	<20170613211547.49814d25@gandalf.local.home>
Message-Id: <20170614035843.GI3721@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 13, 2017 at 09:15:47PM -0400, Steven Rostedt wrote:
> On Tue, 13 Jun 2017 16:42:05 -0700
> "Paul E. McKenney" wrote:
> 
> > On Tue, Jun 13, 2017 at 07:23:08PM -0400, Steven Rostedt wrote:
> > > On Fri, 9 Jun 2017 05:45:54 -0700
> > > "Paul E. McKenney" wrote:
> > > 
> > > > On Fri, Jun 09, 2017 at 09:19:57AM +0200, Peter Zijlstra wrote:
> > > > > On Thu, Jun 08, 2017 at 08:25:46PM -0700, Krister Johansen wrote:
> > > > > > The behavior of swake_up() differs from that of wake_up(), and from the
> > > > > > swake_up() that came from RT linux. A memory barrier, or some other
> > > > > > synchronization, is needed prior to a swake_up so that the waiter sees
> > > > > > the condition set by the waker, and so that the waker does not see an
> > > > > > empty wait list.
> > > > > 
> > > > > Urgh.. let me stare at that. But it sounds like the wrong solution since
> > > > > we wanted to keep the wait and swait APIs as close as possible.
> > > > 
> > > > But don't they both need some sort of ordering, be it memory barriers or
> > > > locking, to handle the case where the wait/swait doesn't actually sleep?
> > > > 
> > > Looking at an RCU example, and assuming that ordering can move around
> > > within a spin lock, and that changes can leak into a spin lock region
> > > from both before and after. Could we have:
> > > 
> > > (looking at __call_rcu_core() and rcu_gp_kthread())
> > > 
> > > 	CPU0				CPU1
> > > 	----				----
> > > 				__call_rcu_core() {
> > > 
> > > 				 spin_lock(rnp_root)
> > > 				 need_wake = __rcu_start_gp() {
> > > 				  rcu_start_gp_advanced() {
> > > 				   gp_flags = FLAG_INIT
> > > 				  }
> > > 				 }
> > > 
> > >  rcu_gp_kthread() {
> > >    swait_event_interruptible(wq,
> > > 	gp_flags & FLAG_INIT) {
> > >    spin_lock(q->lock)
> > > 
> > >     *fetch wq->task_list here! *
> > > 
> > >    list_add(wq->task_list, q->task_list)
> > >    spin_unlock(q->lock);
> > > 
> > >     *fetch old value of gp_flags here *
> > Both reads of ->gp_flags are READ_ONCE(), so having seen the new value
> > in swait_event_interruptible(), this task/CPU cannot see the old value
> > from some later access.  You have to have accesses to two different
> > variables to require a memory barrier (at least assuming consistent use
> > of READ_ONCE(), WRITE_ONCE(), or equivalent).
> 
> If I'm not mistaken, READ_ONCE() and WRITE_ONCE() are just volatiles
> added. The compiler may not leak or move the fetches, but what
> about the hardware?

The hardware cannot move the references if both references are in the
same thread and to the same variable, which is the case with ->gp_flags.

> A spin_lock() only needs to make sure what is after it does not leak
> before it.
> 
> A spin_unlock() only needs to make sure what is before it must not leak
> after it.

Both true, with the exception of a spin_is_locked() to that same lock
variable, which cannot be reordered with either spin_lock() or
spin_unlock() in either direction.

> From my understanding of reading memory-barriers.txt, there's no
> guarantee that the hardware doesn't let reads or writes that happen
> before a spin_lock() happen after it. Nor does it guarantee that reads
> or writes that happen after a spin_unlock() don't happen before it.
> 
> The spin_locks only need to protect the inside of the critical section,
> not the outside of it leaking in.

Again, quite true.

> I'm looking at this in particular:
> 
> ====
>  (1) ACQUIRE operation implication:
> 
>      Memory operations issued after the ACQUIRE will be completed after the
>      ACQUIRE operation has completed.
> 
>      Memory operations issued before the ACQUIRE may be completed after
>      the ACQUIRE operation has completed.  An smp_mb__before_spinlock(),
>      combined with a following ACQUIRE, orders prior stores against
>      subsequent loads and stores.  Note that this is weaker than smp_mb()!
>      The smp_mb__before_spinlock() primitive is free on many architectures.
> 
>  (2) RELEASE operation implication:
> 
>      Memory operations issued before the RELEASE will be completed before the
>      RELEASE operation has completed.
> 
>      Memory operations issued after the RELEASE may be completed before the
>      RELEASE operation has completed.
> ====

And here is the part you also need to look at:

====
 (*) Overlapping loads and stores within a particular CPU will appear to be
     ordered within that CPU.  This means that for:

	a = READ_ONCE(*X); WRITE_ONCE(*X, b);

     the CPU will only issue the following sequence of memory operations:

	a = LOAD *X, STORE *X = b

     And for:

	WRITE_ONCE(*X, c); d = READ_ONCE(*X);

     the CPU will only issue:

	STORE *X = c, d = LOAD *X

     (Loads and stores overlap if they are targeted at overlapping pieces of
     memory).
====

This section needs some help -- the actual guarantee is stronger, that
all CPUs will agree on the order of volatile same-sized aligned accesses
to a given single location.
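
To make that concrete, here is a rough sketch of the single-variable case
at hand.  The names are illustrative stand-ins, not the actual RCU code,
and it assumes the kernel's READ_ONCE()/WRITE_ONCE() and WARN_ON():

	int flags;				/* stands in for ->gp_flags */

	/* Waker, as in the __call_rcu_core() path (locking elided). */
	void waker(void)
	{
		WRITE_ONCE(flags, 1);		/* "set FLAG_INIT" */
	}

	/* Waiter, as in rcu_gp_kthread()/swait_event_interruptible(). */
	void waiter(void)
	{
		int a = READ_ONCE(flags);	/* condition check inside swait_event */
		int b = READ_ONCE(flags);	/* later recheck of that same condition */

		/*
		 * Same task, same variable: if a saw the new value, b
		 * cannot see the old one, with or without added barriers.
		 */
		WARN_ON(a == 1 && b == 0);
	}
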
So if a previous READ_ONCE() sees the new value, any subsequent READ_ONCE()
from that same variable is guaranteed to also see the new value (or some
later value).

Does that help, or am I missing something here?

							Thanx, Paul

> -- Steve
> > > 
> > > 				 spin_unlock(rnp_root)
> > > 				 rcu_gp_kthread_wake() {
> > > 				  swake_up(wq) {
> > > 				   swait_active(wq) {
> > > 				    list_empty(wq->task_list)
> > > 
> > > 				   } * return false *
> > > 
> > > 				  if (condition) * false *
> > > 				    schedule();
> > > 
> > > Looks like a memory barrier is missing. Perhaps we should slap one into
> > > swait_active()? I don't think it is wise to let users add their own, as
> > > I think we currently have bugs now.
> > 
> > I -know- I have bugs now.  ;-)
> > 
> > But I don't believe this is one of them. Or am I getting confused?
> > 							Thanx, Paul
> 
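
For concreteness, one way the "slap one into swait_active()" idea might
look, against the swait API being discussed (struct swait_queue_head and
its task_list member are real; the added smp_mb() is illustrative only,
a sketch rather than an actual posted patch, and whether such a barrier
belongs in swait_active() or in its callers is the open question above):

	static inline int swait_active(struct swait_queue_head *q)
	{
		/*
		 * Sketch only.  Order the waker's earlier store to the
		 * wakeup condition before the load of ->task_list below,
		 * intended to pair with the full barrier implied by
		 * set_current_state() in prepare_to_swait() on the
		 * waiter's side, so that the waker cannot both miss the
		 * waiter here and have the waiter miss the condition.
		 */
		smp_mb();
		return !list_empty(&q->task_list);
	}

The cost of this placement is an unconditional full barrier on every
swake_up(), even when the caller already provides the needed ordering,
which is part of why keeping the swait and wait APIs aligned matters here.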