Date: Thu, 21 May 2020 02:40:36 +0200
From: Frederic Weisbecker
To: Peter Zijlstra
Cc: Qian Cai, "Paul E. McKenney", Linux Kernel Mailing List, Thomas Gleixner, Michael Ellerman, linuxppc-dev, Borislav Petkov
Subject: Re: Endless soft-lockups for compiling workload since next-20200519
Message-ID: <20200521004035.GA15455@lenoir>
In-Reply-To: <20200520125056.GC325280@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 20, 2020 at 02:50:56PM +0200, Peter Zijlstra wrote:
> On Tue, May 19, 2020 at 11:58:17PM -0400, Qian Cai wrote:
> > Just a head up. Repeatedly compiling kernels for a while would trigger
> > endless soft-lockups since next-20200519 on both x86_64 and powerpc.
> > .config are in,
>
> Could be 90b5363acd47 ("sched: Clean up scheduler_ipi()"), although I've
> not seen anything like that myself. Let me go have a look.
>
>
> In as far as the logs are readable (they're a wrapped mess, please don't
> do that!), they contain very little useful, as is typical with IPIs :/
>
> > [ 1167.993773][ C1] WARNING: CPU: 1 PID: 0 at kernel/smp.c:127
> > flush_smp_call_function_queue+0x1fa/0x2e0

So I've tried to think of a race that could produce that and here
is the only thing I could come up with.
It's a bit complicated unfortunately:

CPU 0                                                   CPU 1
-----                                                   -----

tick {
    trigger_load_balance() {
        raise_softirq(SCHED_SOFTIRQ);
        //but nohz_flags(0) = 0
    }
                                                        kick_ilb() {
                                                            atomic_fetch_or(...., nohz_flags(0))
    softirq() {                                             #VMEXIT or anything that could stop a CPU for a while
        run_rebalance_domain() {
            nohz_idle_balance() {
                atomic_andnot(NOHZ_KICK_MASK, nohz_flags(0))
            }
        }
    }
}

// schedule
nohz_newidle_balance() {
    kick_ilb() { // pick current CPU
        atomic_fetch_or(...., nohz_flags(0))                #VMENTER
        smp_call_function_single_async() {                  smp_call_function_single_async() {
            // verified csd->flags != CSD_LOCK                  // verified csd->flags != CSD_LOCK
            csd->flags = CSD_LOCK                               csd->flags = CSD_LOCK
            //execute in place                                  //queue and send IPI
            csd->flags = 0
            nohz_csd_func()
        }
    }
}

IPI {
    flush_smp_call_function_queue() {
        csd_unlock() {
            WARN_ON(csd->flags != CSD_LOCK) <---------!!!!!

The root cause here would be that trigger_load_balance() unconditionally raises
the softirq. And I have to confess I'm not clear why, since the softirq is
essentially a no-op when nohz_flags() is 0.

Thanks.