Date: Tue, 23 Jul 2019 20:05:21 +0900
From: Byungchul Park
To: Joel Fernandes
Cc: "Paul E.
 McKenney", Byungchul Park, rcu, LKML, kernel-team@lge.com
Subject: Re: [PATCH] rcu: Make jiffies_till_sched_qs writable
Message-ID: <20190723110521.GA28883@X58A-UD3R>
References: <20190713151330.GE26519@linux.ibm.com>
 <20190713154257.GE133650@google.com>
 <20190713174111.GG26519@linux.ibm.com>
 <20190719003942.GA28226@X58A-UD3R>
 <20190719074329.GY14271@linux.ibm.com>
 <20190719195728.GF14271@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 19, 2019 at 04:33:56PM -0400, Joel Fernandes wrote:
> On Fri, Jul 19, 2019 at 3:57 PM Paul E. McKenney wrote:
> >
> > On Fri, Jul 19, 2019 at 06:57:58PM +0900, Byungchul Park wrote:
> > > On Fri, Jul 19, 2019 at 4:43 PM Paul E. McKenney wrote:
> > > >
> > > > On Thu, Jul 18, 2019 at 08:52:52PM -0400, Joel Fernandes wrote:
> > > > > On Thu, Jul 18, 2019 at 8:40 PM Byungchul Park wrote:
> > > > > [snip]
> > > > > > > - There is a bug in the CPU stopper machinery itself preventing it
> > > > > > > from scheduling the stopper on Y. Even though Y is not holding up the
> > > > > > > grace period.
> > > > > >
> > > > > > Or any thread on Y is busy with preemption/irq disabled preventing the
> > > > > > stopper from being scheduled on Y.
> > > > > >
> > > > > > Or something is stuck in ttwu() to wake up the stopper on Y due to any
> > > > > > scheduler locks such as pi_lock or rq->lock or something.
> > > > > >
> > > > > > I think what you mentioned can happen easily.
> > > > > >
> > > > > > Basically we would need information about preemption/irq disabled
> > > > > > sections on Y and scheduler's current activity on every cpu at that time.
> > > > >
> > > > > I think all that's needed is an NMI backtrace on all CPUs. An ARM we
> > > > > don't have NMI solutions and only IPI or interrupt based backtrace
> > > > > works which should at least catch and the preempt disable and softirq
> > > > > disable cases.
> > > >
> > > > True, though people with systems having hundreds of CPUs might not
> > > > thank you for forcing an NMI backtrace on each of them. Is it possible
> > > > to NMI only the ones that are holding up the CPU stopper?
> > >
> > > What a good idea! I think it's possible!
> > >
> > > But we need to think about the case NMI doesn't work when the
> > > holding-up was caused by IRQ disabled.
> > >
> > > Though it's just around the corner of weekend, I will keep thinking
> > > on it during weekend!
> >
> > Very good!
>
> Me too will think more about it ;-) Agreed with point about 100s of
> CPUs usecase,
>
> Thanks, have a great weekend,

BTW, if there's any long code section with irq/preemption disabled, then
the problem would not be only about RCU stalls. And we can also use the
latency tracer or something to detect the bad situation.

So in this case, sending ipi/nmi to the CPUs where the stoppers cannot
be scheduled does not give us additional meaningful information.

I think Paul started to think about this to solve some real problem. I
seriously love to help RCU and it's my pleasure to dig deep into this
kind of RCU stuff, but I've yet to define exactly what the problem is.
Sorry.

Could you share the real issue? I think you don't have to reproduce it.
Just sharing the issue that you got inspired from is enough. Then I
might be able to develop 'how' with Joel! :-) It's our pleasure!

Thanks,
Byungchul