Date: Fri, 28 Jun 2019 19:40:45 +0900
From: Byungchul Park
To: "Paul E.
McKenney"
Cc: Scott Wood, Joel Fernandes, Steven Rostedt, Sebastian Andrzej Siewior,
	rcu, LKML, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Josh Triplett, Mathieu Desnoyers, Lai Jiangshan
Subject: Re: [RFC] Deadlock via recursive wakeup via RCU with threadirqs
Message-ID: <20190628104045.GA8394@X58A-UD3R>
References: <20190627103455.01014276@gandalf.local.home>
 <20190627153031.GA249127@google.com>
 <20190627155506.GU26519@linux.ibm.com>
 <20190627173831.GW26519@linux.ibm.com>
 <20190627181638.GA209455@google.com>
 <20190627184107.GA26519@linux.ibm.com>
 <13761fee4b71cc004ad0d6709875ce917ff28fce.camel@redhat.com>
 <20190627203612.GD26519@linux.ibm.com>
 <20190628073138.GB13650@X58A-UD3R>
In-Reply-To: <20190628073138.GB13650@X58A-UD3R>

On Fri, Jun 28, 2019 at 04:31:38PM +0900, Byungchul Park wrote:
> On Thu, Jun 27, 2019 at 01:36:12PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 27, 2019 at 03:17:27PM -0500, Scott Wood wrote:
> > > On Thu, 2019-06-27 at 11:41 -0700, Paul E. McKenney wrote:
> > > > On Thu, Jun 27, 2019 at 02:16:38PM -0400, Joel Fernandes wrote:
> > > > >
> > > > > I think the fix should be to prevent the wake-up not based on whether
> > > > > we are in hard/soft-interrupt mode, but on whether we are doing the
> > > > > rcu_read_unlock() from a scheduler path (if we can detect that).
> > > >
> > > > Or just don't do the wakeup at all, if it comes to that.  I don't know
> > > > of any way to determine whether rcu_read_unlock() is being called from
> > > > the scheduler, but it has been some time since I asked Peter Zijlstra
> > > > about that.
> > > >
> > > > Of course, unconditionally refusing to do the wakeup might not be a
> > > > happy thing for NO_HZ_FULL kernels that don't implement IRQ work.
> > > >
> > > Couldn't smp_send_reschedule() be used instead?
> >
> > Good point.  If current -rcu doesn't fix things for Sebastian's case,
> > that would be well worth looking at.  But there must be some reason
> > why Peter Zijlstra didn't suggest it when he instead suggested using
> > the IRQ work approach.
> >
> > Peter, thoughts?
>
> Hello,
>
> Isn't the following scenario possible?
>
> The original code
> -----------------
>   rcu_read_lock();
>   ...
>   /* Expedite */
>   WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
>   ...
>   __rcu_read_unlock();
>     if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
>       rcu_read_unlock_special(t);
>         WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
>         rcu_preempt_deferred_qs_irqrestore(t, flags);
>     barrier(); /* ->rcu_read_unlock_special load before assign */
>     t->rcu_read_lock_nesting = 0;
>
> The reordered code by machine
> -----------------------------
>   rcu_read_lock();
>   ...
>   /* Expedite */
>   WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
>   ...
>   __rcu_read_unlock();
>     if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
>       rcu_read_unlock_special(t);
>     t->rcu_read_lock_nesting = 0;        <--- LOOK AT THIS!!!
>         WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
>         rcu_preempt_deferred_qs_irqrestore(t, flags);
>     barrier(); /* ->rcu_read_unlock_special load before assign */
>
> An interrupt happens
> --------------------
>   rcu_read_lock();
>   ...
>   /* Expedite */
>   WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
>   ...
>   __rcu_read_unlock();
>     if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
>       rcu_read_unlock_special(t);
>     t->rcu_read_lock_nesting = 0;        <--- LOOK AT THIS!!!
>     <--- Handle an (any) irq
>       rcu_read_lock();
>       /* This call should be skipped */
>       rcu_read_unlock_special(t);

Wait... I am a little bit confused about the reordering.
The 'STORE t->rcu_read_lock_nesting = 0' can happen before the
'STORE t->rcu_read_unlock_special.b.exp_hint = false' regardless of the
order the compiler generated under the barrier(), because the two stores
are to independent locations, so the architecture is within its rights
to reorder them.

Then... is this scenario actually possible?  Or do all architectures
somehow deal properly with interrupts across this kind of reordering?

Thanks,
Byungchul

>         WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
>         rcu_preempt_deferred_qs_irqrestore(t, flags);
>     barrier(); /* ->rcu_read_unlock_special load before assign */
>
> We should not end up handling the special work twice like this; that is
> one cause of the problem, though another problem is of course calling
> ttwu without being aware that we are in a context already holding the
> pi lock.
>
> Apart from the discussion about how to avoid ttwu in an improper
> condition, I think the following is necessary.  I may have something
> missing.  It would be appreciated if you let me know in case I'm wrong.
>
> Anyway, logically I think we should prevent reordering between
> t->rcu_read_lock_nesting and t->rcu_read_unlock_special.b.exp_hint not
> only by the compiler but also by the machine, like the below.
>
> Do I miss something?
>
> Thanks,
> Byungchul
>
> ---8<---
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 3c8444e..9b137f1 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -412,7 +412,13 @@ void __rcu_read_unlock(void)
>  		barrier();  /* assign before ->rcu_read_unlock_special load */
>  		if (unlikely(READ_ONCE(t->rcu_read_unlock_special.s)))
>  			rcu_read_unlock_special(t);
> -		barrier();  /* ->rcu_read_unlock_special load before assign */
> +		/*
> +		 * Prevent reordering between the clearing of
> +		 * t->rcu_read_unlock_special in
> +		 * rcu_read_unlock_special() and the following
> +		 * assignment to t->rcu_read_lock_nesting.
> +		 */
> +		smp_wmb();
>  		t->rcu_read_lock_nesting = 0;
>  	}
>  	if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {