Subject: Re: [PATCH 04/19] sched: Prepare for Core-wide rq->lock
From: Aubrey Li
To: Aubrey Li, Josh Don
Cc: Don Hiatt, Peter Zijlstra, Joel Fernandes, "Hyser,Chris", Ingo Molnar, Vincent Guittot, Valentin Schneider, Mel Gorman, linux-kernel, Thomas Gleixner
Date: Wed, 28 Apr 2021 14:05:15 +0800
Message-ID: <5c289c5a-a120-a1d0-ca89-2724a1445fe8@linux.intel.com>
References: <20210422120459.447350175@infradead.org> <20210422123308.196692074@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/28/21 9:03 AM, Aubrey Li wrote:
> On Wed, Apr 28, 2021 at 7:36 AM Josh Don wrote:
>>
>> On Tue, Apr 27, 2021 at 10:10 AM Don Hiatt wrote:
>>> Hi Josh and Peter,
>>>
>>> I've been running into soft lockups and hard lockups when running a script
>>> that just cycles setting the cookie of a group of processes over and over again.
>>>
>>> Unfortunately the only way I can reproduce this is by setting the cookies
>>> on qemu. I've tried sysbench, stress-ng but those seem to work just fine.
>>>
>>> I'm running Peter's branch and even tried the suggested changes here but
>>> still see the same behavior. I enabled panic on hard lockup and here below
>>> is a snippet of the log.
>>>
>>> Is there anything you'd like me to try or have any debugging you'd like me to
>>> do? I'd certainly like to get to the bottom of this.
>>
>> Hi Don,
>>
>> I tried to repro using qemu, but did not generate a lockup. Could you
>> provide more details on what your script is doing (or better yet,
>> share the script directly)? I would have expected you to potentially
>> hit a lockup if you were cycling sched_core being enabled and
>> disabled, but it sounds like you are just recreating the cookie for a
>> process group over and over?
>>
>
> I saw something similar on a bare metal hardware. Also tried the suggested
> patch here and no luck. Panic stack attached with
> softlockup_all_cpu_backtrace=1.
> (sorry, my system has 192 cpus and somehow putting 184 cpus offline causes
> system hang without any message...)

Can you please try the following change to see if the problem is gone on your side?

Thanks,
-Aubrey

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f732642e3e09..1ef13b50dfcd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -493,14 +493,17 @@ void double_rq_lock(struct rq *rq1, struct rq *rq2)
 {
 	lockdep_assert_irqs_disabled();
 
-	if (rq1->cpu > rq2->cpu)
-		swap(rq1, rq2);
-
-	raw_spin_rq_lock(rq1);
-	if (__rq_lockp(rq1) == __rq_lockp(rq2))
-		return;
-
-	raw_spin_rq_lock_nested(rq2, SINGLE_DEPTH_NESTING);
+	if (__rq_lockp(rq1) == __rq_lockp(rq2)) {
+		raw_spin_rq_lock(rq1);
+	} else {
+		if (__rq_lockp(rq1) < __rq_lockp(rq2)) {
+			raw_spin_rq_lock(rq1);
+			raw_spin_rq_lock_nested(rq2, SINGLE_DEPTH_NESTING);
+		} else {
+			raw_spin_rq_lock(rq2);
+			raw_spin_rq_lock_nested(rq1, SINGLE_DEPTH_NESTING);
+		}
+	}
 }
 #endif
 
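
For illustration only, here is a rough userspace sketch of the ordering used in the change above: take the lock once if both queues resolve to the same lock, otherwise take the two locks in address order so every caller agrees on the order. This is not kernel code. struct queue, double_queue_lock() and double_queue_unlock() are hypothetical names, and pthread mutexes stand in for the rq locks. One reading of the change (an assumption, the mail does not say so) is that once siblings can share a lock, the order of rq->cpu no longer has to match the order of the locks actually taken.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct rq: each queue points at the lock
 * that protects it. Two queues may point at the same lock, loosely
 * mimicking siblings that share a core-wide lock.
 */
struct queue {
	int cpu;
	pthread_mutex_t *lock;
};

/*
 * Lock both queues. The order is decided by the address of the lock,
 * not by the cpu number, so two callers passing the same pair in
 * opposite order still acquire the locks in the same sequence.
 */
static void double_queue_lock(struct queue *q1, struct queue *q2)
{
	assert(q1 && q2);

	if (q1->lock == q2->lock) {
		/* Shared lock: take it exactly once. */
		pthread_mutex_lock(q1->lock);
		return;
	}

	if ((uintptr_t)q1->lock < (uintptr_t)q2->lock) {
		pthread_mutex_lock(q1->lock);
		pthread_mutex_lock(q2->lock);
	} else {
		pthread_mutex_lock(q2->lock);
		pthread_mutex_lock(q1->lock);
	}
}

/* Release what double_queue_lock() took; a shared lock is released once. */
static void double_queue_unlock(struct queue *q1, struct queue *q2)
{
	assert(q1 && q2);

	pthread_mutex_unlock(q1->lock);
	if (q1->lock != q2->lock)
		pthread_mutex_unlock(q2->lock);
}
```

A caller would pair double_queue_lock(&a, &b) with double_queue_unlock(&a, &b). The sketch leaves out everything kernel-specific, such as interrupts being disabled and the lockdep nesting annotation.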