Date: Mon, 26 Apr 2021 12:35:33 +0100
From: Mel Gorman
To: Srikar Dronamraju
Cc: Ingo Molnar, Peter Zijlstra, LKML, Rik van Riel, Thomas Gleixner,
    Valentin Schneider, Vincent Guittot, Dietmar Eggemann,
    Michael Ellerman, Gautham R Shenoy, Parth Shah
Subject: Re: [PATCH 00/10] sched/fair: wake_affine improvements
Message-ID: <20210426113533.GD4239@techsingularity.net>
In-Reply-To: <20210426103032.GI2633526@linux.vnet.ibm.com>

On Mon, Apr 26, 2021 at 04:00:32PM +0530, Srikar Dronamraju wrote:
> * Mel Gorman [2021-04-23 13:38:55]:
>
> Hi Mel,
>
> > On Fri, Apr 23, 2021 at 04:01:29PM +0530, Srikar Dronamraju wrote:
> > > > The series also oopses a *lot* and didn't get through a run of basic
> > > > workloads on x86 on any of three machines. An example oops is
> > > >
> > >
> > > Can you pass me your failing config? I am somehow not seeing this
> > > either on x86 or on Powerpc on multiple systems.
> >
> > The machines have since moved onto testing something else (Rik's patch
> > for newidle) but the attached config should be close enough.
> >
> > > Also if possible cat /proc/schedstat and cat
> > > /proc/sys/kernel/sched_domain/cpu0/domain*/name
> > >
> >
> > For the vanilla kernel
> >
> > SMT
> > MC
> > NUMA
>
> I was able to reproduce the problem and analyze why it would panic in
> cpus_share_cache().
>
> In my patch(es), we have code snippets like this.
>
> 	if (tsds->idle_core != -1) {
> 		if (cpumask_test_cpu(tsds->idle_core, p->cpus_ptr))
> 			return tsds->idle_core;
> 		return this_cpu;
> 	}
>
> Here, when we tested the idle_core and cpumask_test_cpu,
> tsds->idle_core may not have been -1. However, by the time it returns,
> tsds->idle_core could be -1.
>
> cpus_share_cache() then tries to find sd_llc_id for -1 and crashes.
>
> It's easier to reproduce this on a machine with more cores in an
> LLC than, say, a Power10/Power9. Hence we are hitting this more often
> on x86.
>
> One way could be to save the idle_core to a local variable, but that
> negates the whole purpose since we may end up choosing a busy CPU. I
> will find a way to fix this problem.
>

As there is no locking that protects the variable, it's inherently
race-prone. A READ_ONCE to a local variable may be your only choice.

-- 
Mel Gorman
SUSE Labs
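
For illustration only, a minimal userspace sketch of the READ_ONCE-to-a-local pattern suggested above. It is not the patch itself: the READ_ONCE() macro here is a volatile-cast stand-in for the kernel's, and struct llc_shared, cpu_allowed() and pick_target(), along with the -1 "no decision" return, are hypothetical names and conventions invented for the example. The point is that the check, the affinity test and the return all use one snapshot, so the returned value can never be the -1 that a concurrent writer stores after the check.

```c
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel's READ_ONCE(): force exactly one load
 * through a volatile-qualified access (uses the GCC/Clang __typeof__
 * extension). The real macro lives in the kernel headers.
 */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, simplified stand-in for the per-LLC shared state. */
struct llc_shared {
	int idle_core;	/* -1 if none known; updated by other CPUs, no lock */
};

/* Hypothetical affinity check: bit N of 'allowed' means CPU N is allowed. */
static bool cpu_allowed(int cpu, unsigned long allowed)
{
	return cpu >= 0 && cpu < (int)(8 * sizeof(allowed)) &&
	       ((allowed >> cpu) & 1UL);
}

/*
 * Returns the chosen CPU, or -1 if no idle core was recorded (assumed
 * convention: the caller then falls back to some other selection path).
 */
static int pick_target(struct llc_shared *tsds, unsigned long allowed,
		       int this_cpu)
{
	/* Single snapshot: every later use sees the same value. */
	int core = READ_ONCE(tsds->idle_core);

	if (core != -1) {
		if (cpu_allowed(core, allowed))
			return core;
		return this_cpu;
	}
	return -1;
}
```

The snapshot only guarantees that the value tested is the value returned. As noted in the quoted mail, the core may have become busy by the time the caller uses it, so any stronger guarantee would need a different mechanism.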