From: Aubrey Li
Date: Wed, 28 Apr 2021 18:57:21 +0800
Subject: Re: [PATCH 04/19] sched: Prepare for Core-wide rq->lock
To: Aubrey Li
Cc: Josh Don, Don Hiatt, Peter Zijlstra, Joel Fernandes, "Hyser,Chris", Ingo Molnar, Vincent Guittot, Valentin Schneider, Mel Gorman, linux-kernel, Thomas Gleixner
In-Reply-To: <5c289c5a-a120-a1d0-ca89-2724a1445fe8@linux.intel.com>
References: <20210422120459.447350175@infradead.org> <20210422123308.196692074@infradead.org> <5c289c5a-a120-a1d0-ca89-2724a1445fe8@linux.intel.com>

On Wed, Apr 28, 2021 at 2:05 PM Aubrey Li wrote:
>
> On 4/28/21 9:03 AM, Aubrey Li wrote:
> > On Wed, Apr 28, 2021 at 7:36 AM Josh Don wrote:
> >>
> >> On Tue, Apr 27, 2021 at 10:10 AM Don Hiatt wrote:
> >>> Hi Josh and Peter,
> >>>
> >>> I've been running into soft lockups and hard lockups when running a script
> >>> that just cycles setting the cookie of a group of processes over and over again.
> >>>
> >>> Unfortunately the only way I can reproduce this is by setting the cookies
> >>> on qemu. I've tried sysbench, stress-ng but those seem to work just fine.
> >>>
> >>> I'm running Peter's branch and even tried the suggested changes here but
> >>> still see the same behavior. I enabled panic on hard lockup, and below
> >>> is a snippet of the log.
> >>>
> >>> Is there anything you'd like me to try or have any debugging you'd like me to
> >>> do? I'd certainly like to get to the bottom of this.
> >>
> >> Hi Don,
> >>
> >> I tried to repro using qemu, but did not generate a lockup. Could you
> >> provide more details on what your script is doing (or better yet,
> >> share the script directly)? I would have expected you to potentially
> >> hit a lockup if you were cycling sched_core being enabled and
> >> disabled, but it sounds like you are just recreating the cookie for a
> >> process group over and over?
> >>
> >
> > I saw something similar on bare-metal hardware. I also tried the suggested
> > patch here, with no luck. Panic stack attached with
> > softlockup_all_cpu_backtrace=1.
> > (Sorry, my system has 192 CPUs, and somehow putting 184 CPUs offline causes
> > a system hang without any message...)
>
> Can you please try the following change to see if the problem is gone on your side?
>

Please ignore this patch; the change to double_rq_unlock() in Peter's last patch fixed the problem.

Thanks,
-Aubrey

>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f732642e3e09..1ef13b50dfcd 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -493,14 +493,17 @@ void double_rq_lock(struct rq *rq1, struct rq *rq2)
>  {
>  	lockdep_assert_irqs_disabled();
>
> -	if (rq1->cpu > rq2->cpu)
> -		swap(rq1, rq2);
> -
> -	raw_spin_rq_lock(rq1);
> -	if (__rq_lockp(rq1) == __rq_lockp(rq2))
> -		return;
> -
> -	raw_spin_rq_lock_nested(rq2, SINGLE_DEPTH_NESTING);
> +	if (__rq_lockp(rq1) == __rq_lockp(rq2)) {
> +		raw_spin_rq_lock(rq1);
> +	} else {
> +		if (__rq_lockp(rq1) < __rq_lockp(rq2)) {
> +			raw_spin_rq_lock(rq1);
> +			raw_spin_rq_lock_nested(rq2, SINGLE_DEPTH_NESTING);
> +		} else {
> +			raw_spin_rq_lock(rq2);
> +			raw_spin_rq_lock_nested(rq1, SINGLE_DEPTH_NESTING);
> +		}
> +	}
>  }
>  #endif
>
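The quoted (and later withdrawn) patch orders the two runqueue locks by lock address instead of by CPU number, and takes a shared core-wide lock only once. The user-space sketch below models that idea with pthread mutexes so it can be run outside the kernel. Everything in it is hypothetical: struct fake_rq, double_fake_rq_lock(), double_fake_rq_unlock() and the sibling layout (CPUs 0/96 sharing one lock, 1/97 sharing another) are illustrative stand-ins, not kernel code and not the change in Peter's patch. It also shows the constraint a matching unlock has to respect: when both runqueues share a lock, that lock was taken once and must be released once.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define ITERATIONS 100000

/* Hypothetical stand-in for struct rq: each runqueue points at the lock
 * that currently protects it, so two runqueues may share one lock, as SMT
 * siblings do when a core-wide lock is in use. */
struct fake_rq {
	int cpu;
	pthread_mutex_t *lockp;
};

/* Take both locks without ABBA deadlock: a shared lock is taken once;
 * distinct locks are always taken in ascending address order. */
static void double_fake_rq_lock(struct fake_rq *a, struct fake_rq *b)
{
	if (a->lockp == b->lockp) {
		pthread_mutex_lock(a->lockp);
		return;
	}
	if ((uintptr_t)a->lockp < (uintptr_t)b->lockp) {
		pthread_mutex_lock(a->lockp);
		pthread_mutex_lock(b->lockp);
	} else {
		pthread_mutex_lock(b->lockp);
		pthread_mutex_lock(a->lockp);
	}
}

/* Unlock mirrors the lock path: a shared lock was taken once, so it is
 * released once; releasing it a second time would be a bug. */
static void double_fake_rq_unlock(struct fake_rq *a, struct fake_rq *b)
{
	if (a->lockp != b->lockp)
		pthread_mutex_unlock(b->lockp);
	pthread_mutex_unlock(a->lockp);
}

/* Hypothetical topology: CPUs 0 and 96 share core0's lock,
 * CPUs 1 and 97 share core1's lock. */
static pthread_mutex_t core0 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t core1 = PTHREAD_MUTEX_INITIALIZER;
static struct fake_rq rq0 = { 0, &core0 }, rq96 = { 96, &core0 };
static struct fake_rq rq1 = { 1, &core1 }, rq97 = { 97, &core1 };

static long counter; /* every pair used below includes core0, which guards this */

static void *worker(void *arg)
{
	struct fake_rq **pair = arg;

	for (int i = 0; i < ITERATIONS; i++) {
		double_fake_rq_lock(pair[0], pair[1]);
		counter++;
		double_fake_rq_unlock(pair[0], pair[1]);
	}
	return NULL;
}

/* Run three threads over overlapping pairs; returns total critical sections. */
static long run_demo(void)
{
	static struct fake_rq *pair_a[2] = { &rq0, &rq97 };
	static struct fake_rq *pair_b[2] = { &rq1, &rq96 };
	static struct fake_rq *pair_c[2] = { &rq0, &rq96 }; /* shared lock */
	struct fake_rq **pairs[3] = { pair_a, pair_b, pair_c };
	pthread_t tids[3];

	counter = 0;
	for (int i = 0; i < 3; i++) {
		int rc = pthread_create(&tids[i], NULL, worker, pairs[i]);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < 3; i++)
		pthread_join(tids[i], NULL);
	return counter;
}
```

In this toy model, ordering by CPU number would lock core0 then core1 for the pair (0, 97) but core1 then core0 for (1, 96), the classic ABBA pattern; ordering by address gives every thread the same order, and run_demo() returns 3 * ITERATIONS. Whether that hazard matches the kernel's real sibling mapping is not established in this thread.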