Date: Fri, 19 Jul 2019 09:54:03 +0900
From: Byungchul Park
To: "Paul E. McKenney"
Cc: Joel Fernandes, Byungchul Park, rcu, LKML, kernel-team@lge.com
Subject: Re: [PATCH] rcu: Make jiffies_till_sched_qs writable
Message-ID: <20190719005403.GB28226@X58A-UD3R>
References: <20190711195839.GA163275@google.com> <20190712063240.GD7702@X58A-UD3R> <20190712125116.GB92297@google.com> <20190713151330.GE26519@linux.ibm.com> <20190713154257.GE133650@google.com> <20190713174111.GG26519@linux.ibm.com> <20190718213419.GV14271@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20190718213419.GV14271@linux.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 18, 2019 at 02:34:19PM -0700, Paul E. McKenney wrote:
> On Thu, Jul 18, 2019 at 12:14:22PM -0400, Joel Fernandes wrote:
> > Trimming the list a bit to keep my noise level low,
> >
> > On Sat, Jul 13, 2019 at 1:41 PM Paul E. McKenney wrote:
> > [snip]
> > > > > It still feels like you guys are hyperfocusing on this one particular
> > > > > knob.  I instead need you to look at the interrelating knobs as a group.
> > > >
> > > > Thanks for the hints, we'll do that.
> > > >
> > > > > On the debugging side, suppose someone gives you an RCU bug report.
> > > > > What information will you need?  How can you best get that information
> > > > > without excessive numbers of over-and-back interactions with the guy
> > > > > reporting the bug?  As part of this last question, what information is
> > > > > normally supplied with the bug?  Alternatively, what information are
> > > > > bug reporters normally expected to provide when asked?
> > > >
> > > > I suppose I could dig out some of our Android bug reports of the past where
> > > > there were RCU issues, but if there are any fires you are currently fighting,
> > > > do send them our way as debugging homework ;-)
> > >
> > > Suppose that you were getting RCU CPU stall warnings featuring
> > > multi_cpu_stop() called from cpu_stopper_thread().  Of course, this really
> > > means that some other CPU/task is holding up multi_cpu_stop() without
> > > also blocking the current grace period.
> >
> > So I took a shot at this, trying to learn how CPU stoppers work in
> > relation to this problem.
> >
> > I am assuming here that CPU X has entered the MULTI_STOP_DISABLE_IRQ state
> > in multi_cpu_stop() but another CPU Y has not yet entered this state.
> > So CPU X is stalling RCU, but it is really because of CPU Y. Now in the
> > problem statement, you mentioned CPU Y is not holding up the grace
> > period, which means Y doesn't have any of IRQ, BH or preemption
> > disabled; but it is still somehow stalling RCU indirectly by troubling X.
> >
> > This can only happen if:
> > - CPU Y has a thread executing on it that is higher priority than CPU
> >   X's stopper thread, which prevents the stopper from being scheduled.
> >   But the CPU stopper thread (migration/..) is highest-priority RT, so
> >   this would be some kind of an odd scheduler bug.
> > - There is a bug in the CPU stopper machinery itself preventing it
> >   from scheduling the stopper on Y, even though Y is not holding up the
> >   grace period.
>
> - CPU Y might have already passed through its quiescent state for
>	the current grace period, then disabled IRQs indefinitely.

Or for longer than the period that RCU considers a stall. Or with
preemption disabled for that long. Or the stopper on Y might not even
have been woken up inside the scheduler yet, for some reason such as
lock contention.
>	Now, CPU Y would block a later grace period, but CPU X is
>	preventing the current grace period from ending, so no such
>	later grace period can start.
>
> > Did I get that right? Would be exciting to run the rcutorture test
> > once Paul has it available to reproduce this problem.
>
> Working on it!  Slow, I know!
>
>							Thanx, Paul