Date: Wed, 14 Jun 2017 08:55:31 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Steven Rostedt
Cc: Peter Zijlstra, Krister Johansen, Ingo Molnar,
	linux-kernel@vger.kernel.org, Paul Gortmaker, Thomas Gleixner
Subject: Re: [PATCH tip/sched/core] Add comments to aid in safer usage of swake_up.
In-Reply-To: <20170614091015.01d7dc89@gandalf.local.home>
Message-Id: <20170614155531.GO3721@linux.vnet.ibm.com>

On Wed, Jun 14, 2017 at 09:10:15AM -0400, Steven Rostedt wrote:
> On Tue, 13 Jun 2017 20:58:43 -0700
> "Paul E. McKenney" wrote:
>
> > And here is the part you also need to look at:
>
> Why? We are talking about two different, unrelated variables modified
> on two different CPUs. I don't see where the overlap is.

It does sound like we are talking past each other.  Please see below
for how I was interpreting your sequence of events.

> > ====
> >
> > (*) Overlapping loads and stores within a particular CPU will appear to be
> >     ordered within that CPU.  This means that for:
> >
> > 	a = READ_ONCE(*X); WRITE_ONCE(*X, b);
> >
> >     the CPU will only issue the following sequence of memory operations:
> >
> > 	a = LOAD *X, STORE *X = b
> >
> >     And for:
> >
> > 	WRITE_ONCE(*X, c); d = READ_ONCE(*X);
> >
> >     the CPU will only issue:
> >
> > 	STORE *X = c, d = LOAD *X
> >
> >     (Loads and stores overlap if they are targeted at overlapping pieces of
> >     memory).
> >
> > ====
> >
> > This section needs some help -- the actual guarantee is stronger: all
> > CPUs will agree on the order of volatile same-sized aligned accesses
> > to a given single location.  So if a previous READ_ONCE() sees the new
> > value, any subsequent READ_ONCE() from that same variable is guaranteed
> > to also see the new value (or some later value).
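To make that guarantee concrete, consider a made-up two-CPU sketch
(mine, not text from memory-barriers.txt):

	int x = 0;

	/* CPU 0 */			/* CPU 1 */
	WRITE_ONCE(x, 1);		r1 = READ_ONCE(x);
					r2 = READ_ONCE(x);

The outcome r1 == 1 && r2 == 0 is forbidden: once CPU 1's first
READ_ONCE() has seen the new value, the second READ_ONCE() cannot
return the old one, with no barriers anywhere.  But note that this
guarantee covers only the single location x; it says nothing about
ordering against accesses to any other variable.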
> >
> > Does that help, or am I missing something here?
>
> Maybe I'm missing something. Let me rewrite what I first wrote, and
> then abstract it into a simpler version:
>
> Here's what I first wrote:
>
> (looking at __call_rcu_core() and rcu_gp_kthread())
>
> 	CPU0				CPU1
> 	----				----
>  __call_rcu_core() {
>
>   spin_lock(rnp_root)
>   need_wake = __rcu_start_gp() {
>     rcu_start_gp_advanced() {
>       gp_flags = FLAG_INIT
>     }
>   }
>
> 				rcu_gp_kthread() {
> 				  swait_event_interruptible(wq,
> 						gp_flags & FLAG_INIT) {

This is the first access to ->gp_flags from rcu_gp_kthread().

> 				    spin_lock(q->lock)
>
> 				    *fetch wq->task_list here! *
>
> 				    list_add(wq->task_list, q->task_list)
> 				    spin_unlock(q->lock);
>
> 				    *fetch old value of gp_flags here *

This is the second access to ->gp_flags.  Since you are saying that
->gp_flags is only accessed once, perhaps this code from spin_lock()
down is intended to be an expansion of swait_event_interruptible()?

#define swait_event_interruptible(wq, condition)			\
({									\
	int __ret = 0;							\
	if (!(condition))						\
		__ret = __swait_event_interruptible(wq, condition);	\
	__ret;								\
})

But no: in this case the macro argument named "condition" accesses
->gp_flags, and a control dependency forces that access to precede the
spin_lock() in __prepare_to_swait().  We cannot acquire the spinlock
unless the condition is false, that is, unless the old value was
fetched.  So there is a first fetch of ->gp_flags that is constrained
to happen before the spin_lock().  Any fetch of ->gp_flags after the
spin_unlock() must therefore be a second fetch, which of course might
still return the old value, because the update to ->gp_flags might not
have propagated yet.

But it appears that you are worried about something else.

>  spin_unlock(rnp_root)
>
>  rcu_gp_kthread_wake() {
>    swake_up(wq) {
>      swait_active(wq) {
>        list_empty(wq->task_list)

We don't hold q->lock here, so I am guessing that your concern is that
we aren't guaranteed to see the above list_add().  Is that the case?

If so, your suggested fix is to place an smp_mb() between
swait_event_interruptible()'s access to "condition" and
__prepare_to_swait()'s list_add(), correct?  And also an smp_mb()
before swake_up()'s call to swait_active(), correct?  The second
smp_mb() could be placed by the user, but the first one cannot, at
least not reasonably.

So did I get the point eventually?  ;-)

							Thanx, Paul

>        } * return false *
>
> 				    if (condition) * false *
> 				      schedule();
>
> Now let's make it simpler.  I'll even add the READ_ONCE() and
> WRITE_ONCE() where applicable.
>
> 	CPU0				CPU1
> 	----				----
> 	LOCK(A)
>
> 				LOCK(B)
> 	WRITE_ONCE(X, INIT)
>
> 	(the cpu may postpone writing X)
>
> 				(the cpu can fetch wq list here)
> 				list_add(wq, q)
>
> 				UNLOCK(B)
>
> 				(the cpu may fetch old value of X)
>
> 	(write of X happens here)
>
> 				if (READ_ONCE(X) != INIT)
> 					schedule();
>
> 	UNLOCK(A)
>
> 	if (list_empty(wq))
> 		return;
>
> Tell me again how the READ_ONCE() and WRITE_ONCE() help in this
> scenario?
>
> Because we are using spinlocks, this won't be an issue for most
> architectures.  The bug happens if the fetch for list_empty() leaks
> in before the UNLOCK(A).
>
> If the reading/writing of the list and the reading/writing of gp_flags
> get reversed in either direction by the CPU, then we have a problem.
>
> -- Steve
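For concreteness, here is roughly where those two smp_mb() calls would
land.  This is an untested sketch, not an actual patch; the function
bodies approximate kernel/sched/swait.c of this era, and the comments
tie the barriers back to the scenario above.

	/* Waiter side: called with q->lock held.  The added barrier
	 * orders the list_add() before the subsequent re-read of
	 * "condition" in the ___swait_event() loop -- this is the
	 * barrier a swait user cannot reasonably supply. */
	static void __prepare_to_swait(struct swait_queue_head *q,
				       struct swait_queue *wait)
	{
		wait->task = current;
		if (list_empty(&wait->task_list))
			list_add(&wait->task_list, &q->task_list);
		smp_mb();  /* order list_add() before condition re-check */
	}

	/* Waker side: the added barrier orders the caller's store to
	 * the condition (->gp_flags above) before the lockless
	 * list_empty() check inside swait_active().  A caller could
	 * supply this one itself, just before calling swake_up(). */
	void swake_up(struct swait_queue_head *q)
	{
		unsigned long flags;

		smp_mb();  /* order condition store before swait_active() */
		if (!swait_active(q))
			return;

		raw_spin_lock_irqsave(&q->lock, flags);
		swake_up_locked(q);
		raw_spin_unlock_irqrestore(&q->lock, flags);
	}

Either barrier alone leaves one direction of the simplified scenario
open; it is the pairing that forbids both the stale list_empty() and
the stale READ_ONCE(X).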