Date: Thu, 15 Jun 2017 20:09:14 -0700
From: "Paul E. McKenney"
Reply-To: paulmck@linux.vnet.ibm.com
To: Boqun Feng
Cc: Krister Johansen, Steven Rostedt, Peter Zijlstra, Ingo Molnar, linux-kernel@vger.kernel.org, Paul Gortmaker, Thomas Gleixner
Subject: Re: [PATCH tip/sched/core] Add comments to aid in safer usage of swake_up.
Message-Id: <20170616030914.GM3721@linux.vnet.ibm.com>
In-Reply-To: <20170616010757.kegygn4ndivdb4wh@tardis>

On Fri, Jun 16, 2017 at 09:07:57AM +0800, Boqun Feng wrote:
> On Thu, Jun 15, 2017 at 10:56:29AM -0700, Paul E. McKenney wrote:
> [...]
> > > >
> > > > FWLIW, I agree.  There was an smp_mb() in RT-linux's equivalent of
> > > > swait_active().
> > > >
> > > > https://www.spinics.net/lists/linux-rt-users/msg10340.html
> > > >
> > > > If the barrier goes in swait_active() then we don't have to require all
> > > > of the callers of swait_active() and swake_up() to issue the barrier
> > > > instead.  Handling this in swait_active() is likely to be less error
> > > > prone.  Though, we could also do something like wq_has_sleeper() and use
> > > > that preferentially in swake_up() and its variants.
> > > >
> > >
> > > I think it makes more sense that we delete the swait_active() in
> > > swake_up()? Because we seem to encourage users to do the quick check on
> > > the wait queue on their own, so why do the check again in swake_up()?
> > > Besides, wake_up() doesn't call waitqueue_active() outside the lock
> > > critical section either.
> > >
> > > So how about the patch below (testing is in progress)? Peter?
> >
> > It is quite possible that a problem I am seeing is caused by this, but
> > there are reasons to believe otherwise.  And in any case, the problem is
> > quite rare, taking tens or perhaps even hundreds of hours of rcutorture
> > to reproduce.
> >
> > So, would you be willing to create a dedicated swait torture test to check
> > this out?  The usual approach would be to create a circle of kthreads,
> > with each waiting on the previous kthread and waking up the next one.
> > Each kthread, after being awakened, checks a variable that its waker
> > sets just before the wakeup.  Have another kthread check for hangs.
> >
> > Possibly introduce timeouts and random delays to stir things up a bit.
> >
> > But maybe such a test already exists.  Does anyone know of one?  I don't
> > see anything obvious.
> >
>
> Your waketorture patchset[1] seems to be something similar, at least a
> good start ;-)

Glad I could help!  ;-)

> As we don't know which kind of scenario will trigger the problem easily,
> I will play around with different ones, and hopefully we can find a way.

Makes sense, please let me know how it goes!

							Thanx, Paul

> Regards,
> Boqun
>
> [1]: https://marc.info/?l=linux-kernel&m=146602969518960
>
> > Interested?
> >
> > 							Thanx, Paul
> >
> [...]
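
For readers who want to see the shape of the ring test described above,
here is a rough userspace sketch.  It is an illustration only: it uses
pthreads and condition variables rather than kthreads and swait, so it
cannot reproduce a missed swake_up(), and it is not the waketorture
patchset.  All names (ring_torture(), ring_wait(), ring_wake(), RING_MAX,
HANG_TIMEOUT_SEC) are made up for this example, and a timed wait stands in
for the separate hang-checking kthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

#define RING_MAX 16		/* hypothetical upper bound on ring size */
#define HANG_TIMEOUT_SEC 5	/* hypothetical hang threshold */

struct ring_node {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool flag;		/* set by the waker just before the wakeup */
};

struct ring;

struct ring_worker {
	struct ring *ring;
	pthread_t thread;
	unsigned int seed;	/* per-worker seed for random delays */
	int id;
	bool hung;		/* true if a wakeup never arrived */
};

struct ring {
	struct ring_node node[RING_MAX];
	struct ring_worker worker[RING_MAX];
	int nthreads;
	int laps;
};

/* Set the flag first, then wake the waiter. */
static void ring_wake(struct ring_node *n)
{
	pthread_mutex_lock(&n->lock);
	n->flag = true;
	pthread_cond_signal(&n->cond);
	pthread_mutex_unlock(&n->lock);
}

/* Wait for the flag; return false if it is still clear at the deadline. */
static bool ring_wait(struct ring_node *n)
{
	struct timespec deadline;
	int err = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += HANG_TIMEOUT_SEC;

	pthread_mutex_lock(&n->lock);
	while (!n->flag && err != ETIMEDOUT)
		err = pthread_cond_timedwait(&n->cond, &n->lock, &deadline);
	bool ok = n->flag;
	n->flag = false;
	pthread_mutex_unlock(&n->lock);
	return ok;
}

/* Occasionally sleep for a few microseconds to stir things up. */
static void ring_random_delay(unsigned int *seed)
{
	if (rand_r(seed) % 8 == 0) {
		struct timespec ts = { .tv_sec = 0, .tv_nsec = rand_r(seed) % 50000 };
		nanosleep(&ts, NULL);
	}
}

static void *ring_worker_fn(void *arg)
{
	struct ring_worker *w = arg;
	struct ring *r = w->ring;
	struct ring_node *mine = &r->node[w->id];
	struct ring_node *next = &r->node[(w->id + 1) % r->nthreads];

	for (int lap = 0; lap < r->laps; lap++) {
		if (w->id == 0)		/* worker 0 starts each lap */
			ring_wake(next);
		if (!ring_wait(mine)) {
			w->hung = true;
			break;
		}
		ring_random_delay(&w->seed);
		if (w->id != 0)		/* the others pass the wakeup along */
			ring_wake(next);
	}
	return NULL;
}

/* Run the ring; return how many workers timed out waiting for a wakeup. */
int ring_torture(int nthreads, int laps)
{
	struct ring r = { .nthreads = nthreads, .laps = laps };
	int hung = 0;

	assert(nthreads >= 1 && nthreads <= RING_MAX);
	for (int i = 0; i < nthreads; i++) {
		pthread_mutex_init(&r.node[i].lock, NULL);
		pthread_cond_init(&r.node[i].cond, NULL);
		r.worker[i].ring = &r;
		r.worker[i].id = i;
		r.worker[i].seed = (unsigned int)i + 1;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_create(&r.worker[i].thread, NULL, ring_worker_fn, &r.worker[i]);
	for (int i = 0; i < nthreads; i++) {
		pthread_join(r.worker[i].thread, NULL);
		hung += r.worker[i].hung;
	}
	for (int i = 0; i < nthreads; i++) {
		pthread_cond_destroy(&r.node[i].cond);
		pthread_mutex_destroy(&r.node[i].lock);
	}
	return hung;
}
```

A caller would run something like ring_torture(4, 1000) and treat a nonzero
return as a hang.  Because the flag is tested under the mutex here, this
version should never lose a wakeup; a kernel version would presumably
replace ring_wait() and ring_wake() with the swait primitives under test,
which is where the lockless swait_active() check would come into play.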