From: Valentin Schneider
To: Dietmar Eggemann
Cc: Peter Zijlstra, tglx@linutronix.de, mingo@kernel.org,
 linux-kernel@vger.kernel.org, bigeasy@linutronix.de, qais.yousef@arm.com,
 swood@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org,
 rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
 bristot@redhat.com, vincent.donnefort@arm.com
Subject: Re: [PATCH 0/9] sched: Migrate disable support
Date: Fri, 25 Sep 2020 18:49:26 +0100
References: <20200921163557.234036895@infradead.org>
 <6f55a303-0e5c-8e84-65d3-798b589a5d75@arm.com>
 <20200925101030.GA2594@hirez.programming.kicks-ass.net>
User-agent: mu4e 0.9.17; emacs 26.3
X-Mailing-List: linux-kernel@vger.kernel.org

On 25/09/20 13:19, Valentin Schneider wrote:
> On 25/09/20 12:58, Dietmar Eggemann wrote:
>> With Valentin's print_rq() inspired test snippet I always see one of the
>> RT user tasks as the second guy? BTW, it has to be RT tasks, never
>> triggered with CFS tasks.
>>
>> [ 57.849268] CPU2 nr_running=2
>> [ 57.852241] p=migration/2
>> [ 57.854967] p=task0-0
>
> I can also trigger the BUG_ON() using the built-in locktorture module
> (+enabling hotplug torture), and it happens very early on. I can't trigger
> it under qemu sadly :/ Also, in my case it's always a kworker:
>
> [ 0.830462] CPU3 nr_running=2
> [ 0.833443] p=migration/3
> [ 0.836150] p=kworker/3:0
>
> I'm looking into what workqueue.c is doing about hotplug...

So with
- The pending migration fixup (20200925095615.GA2651@hirez.programming.kicks-ass.net)
- The workqueue set_cpus_allowed_ptr() change (from IRC)
- The set_rq_offline() move + DL/RT pull && rq->online (also from IRC)

my Juno survives rtmutex + hotplug locktorture, where it would previously
explode < 1s after boot (mostly due to the workqueue thing).

I stared a bit more at the rq_offline() + DL/RT bits and they look fine to
me.

The one thing I'm not entirely sure about is while you plugged the
class->balance() hole, AIUI we might still get RT (DL?) pull callbacks
enqueued - say if we just unthrottled an RT RQ and something changes the
priority of one of the freshly-released tasks (user or rtmutex
interaction), I don't see any stopgap preventing a pull from happening.

I slapped the following on top of my kernel and it didn't die, although I'm
not sure I'm correctly stressing this path. Perhaps we could limit that to
the pull paths, since technically we're okay with pushing out of an !online
RQ.

---
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 50aac5b6db26..00d1a7b85e97 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1403,7 +1403,7 @@ queue_balance_callback(struct rq *rq,
 {
 	lockdep_assert_held(&rq->lock);
 
-	if (unlikely(head->next))
+	if (unlikely(head->next || !rq->online))
 		return;
 
 	head->func = (void (*)(struct callback_head *))func;
---
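
For anyone wanting to poke at the effect of that check outside the kernel, here is a minimal userspace sketch in C. It is not kernel code and rests on several assumptions: struct rq and struct callback_head are cut down to the fields the hunk touches, lock_held plus assert() stand in for rq->lock and lockdep_assert_held(), the callback is stored with its real type instead of being cast, and the helper returns a bool purely so the outcome can be observed. run_balance_callbacks(), count_pull() and demo_offline_rq() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rq;

/* Simplified stand-in for the kernel's callback_head (assumption). */
struct callback_head {
	struct callback_head *next;
	void (*func)(struct rq *rq);
};

/* Simplified stand-in for the kernel's struct rq (assumption). */
struct rq {
	bool lock_held;                         /* models holding rq->lock */
	bool online;                            /* models rq->online */
	struct callback_head *balance_callback; /* pending callbacks */
};

/*
 * Model of queue_balance_callback() with the proposed check applied.
 * Returns true if the callback was queued; the kernel helper returns void.
 */
static bool queue_balance_callback(struct rq *rq, struct callback_head *head,
				   void (*func)(struct rq *rq))
{
	assert(rq->lock_held); /* models lockdep_assert_held(&rq->lock) */

	/* Skip if already linked into a list, or if the RQ is offline. */
	if (head->next || !rq->online)
		return false;

	head->func = func;
	head->next = rq->balance_callback;
	rq->balance_callback = head;
	return true;
}

/* Hypothetical helper: run and clear all pending callbacks, return the count. */
static int run_balance_callbacks(struct rq *rq)
{
	struct callback_head *head = rq->balance_callback;
	int ran = 0;

	rq->balance_callback = NULL;
	while (head) {
		struct callback_head *next = head->next;

		head->next = NULL;
		head->func(rq);
		head = next;
		ran++;
	}
	return ran;
}

static int pulls; /* number of times the stub pull callback ran */

/* Hypothetical stub standing in for an RT/DL pull callback. */
static void count_pull(struct rq *rq)
{
	(void)rq;
	pulls++;
}

/* Queue one callback while online and one after going offline, then run them. */
int demo_offline_rq(void)
{
	struct rq rq = { .lock_held = true, .online = true, .balance_callback = NULL };
	struct callback_head a = { 0 }, b = { 0 };

	queue_balance_callback(&rq, &a, count_pull); /* queued: RQ is online */
	rq.online = false;                           /* models the RQ going offline */
	queue_balance_callback(&rq, &b, count_pull); /* dropped: RQ is offline */

	return run_balance_callbacks(&rq);
}
```

In this model demo_offline_rq() runs exactly one callback, the one queued while the RQ was still online. Limiting the check to the pull paths would amount to moving the !rq->online test out of the shared helper and into the callers that queue pull callbacks; that variant is not shown here.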