Received: by 2002:a25:868d:0:0:0:0:0 with SMTP id z13csp1901446ybk; Thu, 21 May 2020 19:03:19 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzlPjZab2NvqzLC2aoBpXUQPIjuRMjAVuuCFxiGXF7E6wCQs4z2zQsS92rrBDDVaPK/+Lep X-Received: by 2002:a17:906:bce3:: with SMTP id op3mr5784376ejb.230.1590112999311; Thu, 21 May 2020 19:03:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1590112999; cv=none; d=google.com; s=arc-20160816; b=WLBtrEErQ4nhSmSzZWB4iC/X5fC9ET2EqPPZUnM5CIWXAeGAWdT2R0phEfpzZc3rYD MCjt/MHtu+nbO7vbEhFlWKblWF4lB0sCd7+ufP7FTPC4eVB3kmzZe8/ED/KboGBurG31 8+Q2J1BdjbDURRGtsPbx43w3PwTq/s28O/flS9ERpBFMOjTYoH/8qlU2teI12onIifPf 2TGYEoUXyhTnKflalw5u8D8ZsLp8vloZz3mMnvgw0+gzj6AEPE8uMBu2IRkBnEoR6OyM /xfAuXI8NxlFmlZWf4PlR6wCxuxA3Dj58EQjqQqsIfNIiE3UgSc2wNJZl2u0DfjM/HE4 Lj5A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=7VXhRvEZ7AZfj1IqAb7ZWt33FE0XL8kfxx31Yf9Ecuw=; b=D0NACWrMicymR5aDn8UUpSpGDsmI2luAYgyoEC4m2PKDMs3kPzHHti6jiBBg1az/eT e/T+oEI0h2CPiTp41T63CoH9ic4n0M2xboP9kjzawlTCt6BJY5gBZ/TDOCMxOWZNSqmA 9Dhs3kpKQ0EZKaB69DUuJfqKyCvWm/Si74PVLe1a76ihIQ4jdgkP1BlBIpZMNsU//ihe 5kYYKyscbpbgp7BS229EHc4ydv74067xzB2QekBUhGlVF/R1GafjGtuY4AEEU2AC+C51 D9VrcT+ixS4gyB7GV/6ffQANfRKRdGYQK6sDqB+o3AiYHp3Gzu8XbR1eTWRGcL2OnNTL +0wg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@lca.pw header.s=google header.b=WPC8+ucV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id oy4si4115677ejb.433.2020.05.21.19.02.56; Thu, 21 May 2020 19:03:19 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@lca.pw header.s=google header.b=WPC8+ucV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727954AbgEVCAq (ORCPT + 99 others); Thu, 21 May 2020 22:00:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50344 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726335AbgEVCAo (ORCPT ); Thu, 21 May 2020 22:00:44 -0400 Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0D1C4C061A0E for ; Thu, 21 May 2020 19:00:44 -0700 (PDT) Received: by mail-qk1-x743.google.com with SMTP id g185so9343281qke.7 for ; Thu, 21 May 2020 19:00:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lca.pw; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=7VXhRvEZ7AZfj1IqAb7ZWt33FE0XL8kfxx31Yf9Ecuw=; b=WPC8+ucVfXY+1tpUgjR07S2UMMPw8h0/8LAGLBS8Znu29ayxkCmGkQpm2rNlHp681Y iEFsGHCj5SpuQl0hdoK/kk1vRohCokt3UVU3OzqsF9AVFV7syE3vg5b9x7AtbqhMBOFh 9VEPHONSG6IIavPXa3Wew7CX9MQHd+PzBOqQTKrZ2ej1Ok4DN3jqMad6rxuXVfI+tVvY VXONL7CzMbE6YarfqQ5be2TdAMby/zZXM2bF4ErdDFiS7uZMST40ntdFNm6UnAmd11qW YEWD8OXyEjMIgKs0hTUa13Sm+ogq/8Y9p1nmkbUBzUBUQr0L6zcqsHdfPakA4caM8sR5 /AvQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=7VXhRvEZ7AZfj1IqAb7ZWt33FE0XL8kfxx31Yf9Ecuw=; b=hkDa0LCNaENfRj5Jey+4Q/Amc3IW7V5uQMEajTPUqOsqJ0VXARExYOJcaIq8pJz53b aQmVPp9Vw2JD3DJ8riQfXNRMbZZH8obEds4PhqIGL7Xq70zkw8PPqgzeJ+cOOAEHhT8a SctsBSBSUqtBZftOR1tEtMo5O48mY3CFOIR85uuxfLwvFJ0qreKMFw1IxsfsBI+QLvga 0kyyIIQ3QQpbwsw6zmR7VsBUMu3dJ7YvMbfFZ8UIoqn71Yqt04jiZDr270xSFB8lbK2Y Xm6xjymApAzuhvWH0yKQi37KlieEZI6MnLLVBehe5wbtTAQ70xnG7WMqtMmuHg17Aepf WPHw== X-Gm-Message-State: AOAM530krcYxecwwuzOdyA2V8Ml9xun5rOrBsntRd1F2Sbd6MzEhKG9s bQXJo+DSOIeYyE/lOM/DoEAA1FcKuElvkw== X-Received: by 2002:a05:620a:a53:: with SMTP id j19mr13148335qka.183.1590112843194; Thu, 21 May 2020 19:00:43 -0700 (PDT) Received: from ovpn-112-192.phx2.redhat.com (pool-71-184-117-43.bstnma.fios.verizon.net. [71.184.117.43]) by smtp.gmail.com with ESMTPSA id s201sm949467qka.8.2020.05.21.19.00.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 21 May 2020 19:00:42 -0700 (PDT) Date: Thu, 21 May 2020 22:00:33 -0400 From: Qian Cai To: Peter Zijlstra Cc: Frederic Weisbecker , "Paul E. McKenney" , Linux Kernel Mailing List , Thomas Gleixner , Michael Ellerman , linuxppc-dev , Borislav Petkov Subject: Re: Endless soft-lockups for compiling workload since next-20200519 Message-ID: <20200522020033.GA17093@ovpn-112-192.phx2.redhat.com> References: <20200520125056.GC325280@hirez.programming.kicks-ass.net> <20200521004035.GA15455@lenoir> <20200521093938.GG325280@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200521093938.GG325280@hirez.programming.kicks-ass.net> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 21, 2020 at 11:39:38AM +0200, Peter Zijlstra wrote: > On Thu, May 21, 2020 at 02:40:36AM +0200, Frederic Weisbecker wrote: > > On Wed, May 20, 2020 at 02:50:56PM +0200, Peter Zijlstra wrote: > > > On Tue, May 19, 2020 at 11:58:17PM -0400, Qian Cai wrote: > > > > Just a head up. Repeatedly compiling kernels for a while would trigger > > > > endless soft-lockups since next-20200519 on both x86_64 and powerpc. > > > > .config are in, > > > > > > Could be 90b5363acd47 ("sched: Clean up scheduler_ipi()"), although I've > > > not seen anything like that myself. Let me go have a look. > > > > > > > > > In as far as the logs are readable (they're a wrapped mess, please don't > > > do that!), they contain very little useful, as is typical with IPIs :/ > > > > > > > [ 1167.993773][ C1] WARNING: CPU: 1 PID: 0 at kernel/smp.c:127 > > > > flush_smp_call_function_queue+0x1fa/0x2e0 > > > > So I've tried to think of a race that could produce that and here is > > the only thing I could come up with. It's a bit complicated unfortunately: > > This: > > > smp_call_function_single_async() { smp_call_function_single_async() { > > // verified csd->flags != CSD_LOCK // verified csd->flags != CSD_LOCK > > csd->flags = CSD_LOCK csd->flags = CSD_LOCK > > concurrent smp_call_function_single_async() using the same csd is what > I'm looking at as well. Now in the ILB case there is an easy cure: > > (because there is only a single ilb target) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 01f94cf52783..b6d8a7b991f0 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -10033,7 +10033,7 @@ static void kick_ilb(unsigned int flags) > * is idle. And the softirq performing nohz idle load balance > * will be run before returning from the IPI. > */ > - smp_call_function_single_async(ilb_cpu, &cpu_rq(ilb_cpu)->nohz_csd); > + smp_call_function_single_async(ilb_cpu, &this_rq()->nohz_csd); > } > > /* > > Qian, can you give that a spin? Running for a few hours now. It works fine.