Date: Thu, 15 Jun 2017 10:56:29 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Boqun Feng
Cc: Krister Johansen, Steven Rostedt, Peter Zijlstra, Ingo Molnar,
	linux-kernel@vger.kernel.org, Paul Gortmaker, Thomas Gleixner
Subject: Re: [PATCH tip/sched/core] Add comments to aid in safer usage of swake_up.
Message-Id: <20170615175629.GE3721@linux.vnet.ibm.com>
In-Reply-To: <20170615041828.zk3a3sfyudm5p6nl@tardis>

On Thu, Jun 15, 2017 at 12:18:28PM +0800, Boqun Feng wrote:
> On Wed, Jun 14, 2017 at 09:25:58AM -0700, Krister Johansen wrote:
> > On Wed, Jun 14, 2017 at 11:02:40AM -0400, Steven Rostedt wrote:
> > > On Wed, 14 Jun 2017 09:10:15 -0400
> > > Steven Rostedt wrote:
> > >
> > > > Now let's make it simpler. I'll even add the READ_ONCE and WRITE_ONCE
> > > > where applicable.
> > > >
> > > >	CPU0				CPU1
> > > >	----				----
> > > >					LOCK(A)
> > > >
> > > >	LOCK(B)
> > > >					WRITE_ONCE(X, INIT)
> > > >
> > > >					(the cpu may postpone writing X)
> > > >
> > > >					(the cpu can fetch wq list here)
> > > >	list_add(wq, q)
> > > >
> > > >	UNLOCK(B)
> > > >
> > > >	(the cpu may fetch old value of X)
> > > >
> > > >					(write of X happens here)
> > > >
> > > >	if (READ_ONCE(X) != INIT)
> > > >		schedule();
> > > >
> > > >					UNLOCK(A)
> > > >
> > > >					if (list_empty(wq))
> > > >						return;
> > > >
> > > > Tell me again how the READ_ONCE() and WRITE_ONCE() help in this
> > > > scenario?
> > > >
> > > > Because we are using spinlocks, this won't be an issue for most
> > > > architectures. The bug happens if the fetch for the list_empty()
> > > > leaks in before the UNLOCK(A).
> > > >
> > > > If the reading/writing of the list and the reading/writing of
> > > > gp_flags get reversed in either direction by the CPU, then we
> > > > have a problem.
> > >
> > > FYI..
> > >
> > > Both sides need a memory barrier. Otherwise, even with a memory barrier
> > > on CPU1 we can still have:
> > >
> > >	CPU0				CPU1
> > >	----				----
> > >
> > >					LOCK(A)
> > >	LOCK(B)
> > >
> > >	list_add(wq, q)
> > >
> > >	(cpu waits to write wq list)
> > >
> > >	(cpu fetches X)
> > >
> > >					WRITE_ONCE(X, INIT)
> > >
> > >					UNLOCK(A)
> > >
> > >					smp_mb();
> > >
> > >					if (list_empty(wq))
> > >						return;
> > >
> > >	(cpu writes wq list)
> > >
> > >	UNLOCK(B)
> > >
> > >	if (READ_ONCE(X) != INIT)
> > >		schedule()
> > >
> > > Luckily for us, there is a memory barrier on CPU0. In
> > > prepare_to_swait() we have:
> > >
> > >	raw_spin_lock_irqsave(&q->lock, flags);
> > >	__prepare_to_swait(q, wait);
> > >	set_current_state(state);
> > >	raw_spin_unlock_irqrestore(&q->lock, flags);
> > >
> > > And that set_current_state() call includes a memory barrier, which will
> > > prevent the above from happening, as the addition to the wq list must
> > > be flushed before fetching X.
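
(For reference, the ordering Steven describes comes from set_current_state()
itself. In kernels of this vintage it boils down to roughly the following,
paraphrased from include/linux/sched.h and the generic barrier.h; the exact
definition varies by kernel version and architecture:

	/* paraphrase, not the verbatim source */
	#define set_current_state(state_value)				\
		smp_store_mb(current->state, (state_value))

	/* generic fallback: a plain store followed by a full barrier */
	#define smp_store_mb(var, value)				\
		do { WRITE_ONCE(var, value); smp_mb(); } while (0)

So the list_add() done in __prepare_to_swait() cannot be reordered past a
later read of the wait condition.)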
> > >
> > > I still strongly believe that the swait_active() requires a memory
> > > barrier.
> >
> > FWLIW, I agree. There was a smp_mb() in RT-linux's equivalent of
> > swait_activate().
> >
> > https://www.spinics.net/lists/linux-rt-users/msg10340.html
> >
> > If the barrier goes in swait_active() then we don't have to require all
> > of the callers of swait_active and swake_up to issue the barrier
> > instead. Handling this in swait_active is likely to be less error
> > prone. Though, we could also do something like wq_has_sleeper() and use
> > that preferentially in swake_up and its variants.
>
> I think it makes more sense that we delete the swait_active() check in
> swake_up(). Because we seem to encourage users to do the quick check on
> the wait queue on their own, so why do the check again in swake_up()?
> Besides, wake_up() doesn't call waitqueue_active() outside the lock
> critical section either.
>
> So how about the patch below (testing is in progress)? Peter?

It is quite possible that a problem I am seeing is caused by this, but
there are reasons to believe otherwise. And in any case, the problem is
quite rare, taking tens or perhaps even hundreds of hours of rcutorture
to reproduce.

So, would you be willing to create a dedicated swait torture test to
check this out? The usual approach would be to create a circle of
kthreads, with each waiting on the previous kthread and waking up the
next one. Each kthread, after being awakened, checks a variable that
its waker sets just before the wakeup. Have another kthread check for
hangs. Possibly introduce timeouts and random delays to stir things up
a bit.

But maybe such a test already exists. Does anyone know of one? I don't
see anything obvious.

Interested?
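
Something like the following completely untested sketch is what I have
in mind. Every name in it (NR_STRESS, swait_stress_*) is made up for
illustration, and module teardown and error handling are omitted:

	/*
	 * Sketch of a circular swait stress test; illustrative only,
	 * not an existing kernel module.
	 */
	#include <linux/kthread.h>
	#include <linux/module.h>
	#include <linux/sched.h>
	#include <linux/swait.h>

	#define NR_STRESS 8

	static struct swait_queue_head stress_wq[NR_STRESS];
	static unsigned long stress_count[NR_STRESS];
	static struct task_struct *stress_task[NR_STRESS];
	static struct task_struct *stress_check_task;

	/* Each kthread waits on its predecessor and wakes its successor. */
	static int swait_stress(void *arg)
	{
		int me = (long)arg;
		int next = (me + 1) % NR_STRESS;
		unsigned long seen = 0;

		while (!kthread_should_stop()) {
			/* Sleep until our waker bumps our counter. */
			swait_event(stress_wq[me],
				    READ_ONCE(stress_count[me]) != seen ||
				    kthread_should_stop());
			seen = READ_ONCE(stress_count[me]);

			/* Set the successor's condition, *then* wake it. */
			WRITE_ONCE(stress_count[next],
				   stress_count[next] + 1);
			swake_up(&stress_wq[next]);
		}
		return 0;
	}

	/* Watchdog: if the circle stops advancing, a wakeup was lost. */
	static int swait_stress_check(void *arg)
	{
		unsigned long last = 0;

		while (!kthread_should_stop()) {
			schedule_timeout_uninterruptible(10 * HZ);
			if (READ_ONCE(stress_count[0]) == last)
				pr_err("swait_stress: stall, lost wakeup?\n");
			last = READ_ONCE(stress_count[0]);
		}
		return 0;
	}

	static int __init swait_stress_init(void)
	{
		int i;

		for (i = 0; i < NR_STRESS; i++) {
			init_swait_queue_head(&stress_wq[i]);
			stress_task[i] = kthread_run(swait_stress,
						     (void *)(long)i,
						     "swait_stress/%d", i);
		}
		stress_check_task = kthread_run(swait_stress_check, NULL,
						"swait_stress_check");

		/* Kick the circle into motion. */
		WRITE_ONCE(stress_count[0], 1);
		swake_up(&stress_wq[0]);
		return 0;
	}
	module_init(swait_stress_init);
	MODULE_LICENSE("GPL");

The point of interest is that each kthread updates its successor's
counter before calling swake_up(), so a stall reported by the watchdog
would indicate a lost wakeup rather than a test bug. Random delays and
timeout-based waits could be layered on top to stir things up further.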

							Thanx, Paul

> Regards,
> Boqun
>
> --------------------->8
> Subject: [PATCH] swait: Remove the lockless swait_active() check in
> swake_up*()
>
> Steven Rostedt reported a potential race in RCU core because of
> swake_up():
>
>	CPU0				CPU1
>	----				----
>				__call_rcu_core() {
>
>				 spin_lock(rnp_root)
>				 need_wake = __rcu_start_gp() {
>				  rcu_start_gp_advanced() {
>				   gp_flags = FLAG_INIT
>				  }
>				 }
>
>	rcu_gp_kthread() {
>	 swait_event_interruptible(wq,
>		gp_flags & FLAG_INIT) {
>	 spin_lock(q->lock)
>
>				*fetch wq->task_list here! *
>
>	 list_add(wq->task_list, q->task_list)
>	 spin_unlock(q->lock);
>
>	 *fetch old value of gp_flags here *
>
>				 spin_unlock(rnp_root)
>
>				 rcu_gp_kthread_wake() {
>				  swake_up(wq) {
>				   swait_active(wq) {
>				    list_empty(wq->task_list)
>
>				   } * return false *
>
>	 if (condition) * false *
>	  schedule();
>
> In this case, a wakeup is missed, which could cause rcu_gp_kthread to
> wait for a long time.
>
> The reason for this is that we do a lockless swait_active() check in
> swake_up(). To fix this, we can either 1) add a smp_mb() in swake_up()
> before swait_active() to provide the proper order or 2) simply remove
> the swait_active() check in swake_up().
>
> Solution 2 not only fixes this problem but also keeps the swait and
> wait APIs as close as possible, as wake_up() doesn't provide a full
> barrier and doesn't do a lockless check of the wait queue either.
> Moreover, there are users already using swait_active() to do their
> quick checks on the wait queues, so it makes less sense that swake_up()
> and swake_up_all() do this on their own.
>
> This patch therefore removes the lockless swait_active() check in
> swake_up() and swake_up_all().
>
> Reported-by: Steven Rostedt
> Signed-off-by: Boqun Feng
> ---
>  kernel/sched/swait.c | 6 ------
>  1 file changed, 6 deletions(-)
>
> diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
> index 3d5610dcce11..2227e183e202 100644
> --- a/kernel/sched/swait.c
> +++ b/kernel/sched/swait.c
> @@ -33,9 +33,6 @@ void swake_up(struct swait_queue_head *q)
>  {
>  	unsigned long flags;
>  
> -	if (!swait_active(q))
> -		return;
> -
>  	raw_spin_lock_irqsave(&q->lock, flags);
>  	swake_up_locked(q);
>  	raw_spin_unlock_irqrestore(&q->lock, flags);
> @@ -51,9 +48,6 @@ void swake_up_all(struct swait_queue_head *q)
>  	struct swait_queue *curr;
>  	LIST_HEAD(tmp);
>  
> -	if (!swait_active(q))
> -		return;
> -
>  	raw_spin_lock_irq(&q->lock);
>  	list_splice_init(&q->task_list, &tmp);
>  	while (!list_empty(&tmp)) {
> --
> 2.13.0
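
(For completeness, option 1 from the commit message above, done in the
wq_has_sleeper() style that Krister mentioned, would look roughly like
the sketch below. The swq_has_sleeper() helper is invented here for
illustration; no such function exists in the swait API as of this
message:

	/* Hypothetical helper, by analogy with wq_has_sleeper(). */
	static inline bool swq_has_sleeper(struct swait_queue_head *q)
	{
		/*
		 * Pairs with the barrier implied by set_current_state()
		 * in prepare_to_swait(): order the waker's earlier
		 * condition update before it inspects the wait list, so
		 * that either the waiter sees the new condition or the
		 * waker sees the waiter on the list.
		 */
		smp_mb();
		return !list_empty(&q->task_list);
	}

	void swake_up(struct swait_queue_head *q)
	{
		unsigned long flags;

		if (!swq_has_sleeper(q))
			return;

		raw_spin_lock_irqsave(&q->lock, flags);
		swake_up_locked(q);
		raw_spin_unlock_irqrestore(&q->lock, flags);
	}

Whether an unconditional smp_mb() on every wakeup is cheaper than
unconditionally taking q->lock presumably depends on the architecture
and the workload, which may be an argument for simply dropping the
fast path as the patch above does.)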