Date: Fri, 28 Jun 2019 19:12:41 -0400
From: Joel Fernandes
To: "Paul E. McKenney"
Cc: Sebastian Andrzej Siewior, Peter Zijlstra, Steven Rostedt, rcu, LKML,
    Thomas Gleixner, Ingo Molnar, Josh Triplett, Mathieu Desnoyers,
    Lai Jiangshan
Subject: Re: [RFC] Deadlock via recursive wakeup via RCU with threadirqs
Message-ID: <20190628231241.GA9243@google.com>
References: <20190627181638.GA209455@google.com>
    <20190627184107.GA26519@linux.ibm.com>
    <20190628135433.GE3402@hirez.programming.kicks-ass.net>
    <20190628153050.GU26519@linux.ibm.com>
    <20190628184026.fds6scgi2pnjnc5p@linutronix.de>
    <20190628185219.GA26519@linux.ibm.com>
    <20190628192407.GA89956@google.com>
    <20190628200423.GB26519@linux.ibm.com>
    <20190628214018.GB249127@google.com>
    <20190628222547.GE26519@linux.ibm.com>
In-Reply-To: <20190628222547.GE26519@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 28, 2019 at 03:25:47PM -0700, Paul E. McKenney wrote:
> On Fri, Jun 28, 2019 at 05:40:18PM -0400, Joel Fernandes wrote:
> > Hi Paul,
> >
> > On Fri, Jun 28, 2019 at 01:04:23PM -0700, Paul E.
> > McKenney wrote:
> > [snip]
> > > > > > Commit
> > > > > > - 23634ebc1d946 ("rcu: Check for wakeup-safe conditions in
> > > > > >   rcu_read_unlock_special()") does not trigger the bug within 94
> > > > > >   attempts.
> > > > > >
> > > > > > - 48d07c04b4cc1 ("rcu: Enable elimination of Tree-RCU softirq
> > > > > >   processing") needed 12 attempts to trigger the bug.
> > > > >
> > > > > That matches my belief that 23634ebc1d946 ("rcu: Check for wakeup-safe
> > > > > conditions in rcu_read_unlock_special()") will at least greatly decrease
> > > > > the probability of this bug occurring.
> > > >
> > > > I was just typing a reply that I can't reproduce it with:
> > > >   rcu: Check for wakeup-safe conditions in rcu_read_unlock_special()
> > > >
> > > > I am trying to revert enough of this patch to see what would break things,
> > > > however I think a better exercise might be to understand more what the patch
> > > > does why it fixes things in the first place ;-) It is probably the
> > > > deferred_qs thing.
> > >
> > > The deferred_qs flag is part of it!  Looking forward to hearing what
> > > you come up with as being the critical piece of this commit.
> >
> > The new deferred_qs flag indeed saves the machine from the dead-lock.
> >
> > If we don't want the deferred_qs, then the below patch also fixes the issue.
> > However, I am more sure than not that it does not handle all cases (such as
> > what if we previously had an expedited grace period IPI in a previous reader
> > section and had to to defer processing. Then it seems a similar deadlock
> > would present. But anyway, the below patch does fix it for me! It is based on
> > your -rcu tree commit 23634ebc1d946f19eb112d4455c1d84948875e31 (rcu: Check
> > for wakeup-safe conditions in rcu_read_unlock_special()).
>
> The point here being that you rely on .b.blocked rather than
> .b.deferred_qs.  Hmmm...  There are a number of places that check all
> the bits via the .s leg of the rcu_special union.
> The .s check in
> rcu_preempt_need_deferred_qs() should be OK because it is conditioned
> on t->rcu_read_lock_nesting of zero or negative.
> Do rest of those also work out OK?
>
> It would be nice to remove the flag, but doing so clearly needs careful
> review and testing.

Agreed. I am planning to audit this code within the next couple of weeks, so
I will be on the lookout for any optimization opportunities related to this.
I will let you know if this can work.

For now I like your patch better because it is more conservative and doesn't
cause any space overhead.

If you'd like, please feel free to include my Tested-by on it:

Tested-by: Joel Fernandes (Google)

If you get a chance, could you also point me to any tests that show a
performance improvement with the irqwork patch in the expedited GP use case?
I'd like to try it out as well. I guess rcuperf should have some?

thanks!

 - Joel
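
For readers following the .s versus .b.blocked point above, here is a minimal userspace sketch of the idea. It is not taken from the kernel tree: the type name rcu_special_model, the helper names, and the need_qs and exp_hint fields are assumptions made for illustration. Only .b.blocked, .b.deferred_qs, and the .s leg are named in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the per-task rcu_special union discussed above.
 * Only .b.blocked, .b.deferred_qs and .s come from the thread; the
 * remaining fields and the exact layout are assumptions.
 */
union rcu_special_model {
	struct {
		uint8_t blocked;     /* reader was preempted inside its critical section */
		uint8_t need_qs;     /* assumed field: a quiescent state is wanted */
		uint8_t exp_hint;    /* assumed field: expedited grace period hint */
		uint8_t deferred_qs; /* quiescent-state processing was deferred */
	} b;
	uint32_t s;                  /* all four flags viewed as a single word */
};

/* The flags must fill .s exactly, or a zero test on .s would miss bits. */
static_assert(sizeof(union rcu_special_model) == sizeof(uint32_t),
	      "flag bytes must cover the whole .s word");

/* Hypothetical helper: test every flag at once via the .s leg. */
static bool any_special(const union rcu_special_model *sp)
{
	return sp->s != 0;
}

/* Hypothetical helper: a narrower test that relies only on .b.blocked. */
static bool blocked_only(const union rcu_special_model *sp)
{
	return sp->b.blocked != 0;
}
```

With only deferred_qs set, any_special() reports true while blocked_only() reports false. That gap is why each site that checks .s would need review before the flag could be dropped.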