From: Thomas Gleixner
To: Vincent Guittot
Cc: Xiongfeng Wang, vschneid@redhat.com, Phil Auld, vdonnefort@google.com,
    Linux Kernel Mailing List, Wei Li, "liaoyu (E)", zhangqiao22@huawei.com,
    Peter Zijlstra, Dietmar Eggemann, Ingo Molnar
Subject: Re: [Question] report a race condition between CPU hotplug state
    machine and hrtimer 'sched_cfs_period_timer' for cfs bandwidth throttling
References: <8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com>
    <87mt18it1y.ffs@tglx> <68baeac9-9fa7-5594-b5e7-4baf8ac86b77@huawei.com>
    <875y774wvp.ffs@tglx>
Date: Thu, 29 Jun 2023 00:01:41 +0200
Message-ID: <87pm5f2qm2.ffs@tglx>

On Wed, Jun 28 2023 at 14:35, Vincent Guittot wrote:
> On Wed, 28 Jun 2023 at 14:03, Thomas Gleixner wrote:
>> No, because this is fundamentally wrong.
>>
>> If the CPU is on the way out, then the scheduler hotplug machinery
>> has to handle the period timer so that the problem Xiongfeng analyzed
>> does not happen in the first place.
>
> But the hrtimer was enqueued before it starts to offline the cpu

It does not really matter when it was enqueued. The important point is
that it was enqueued on that outgoing CPU for whatever reason.

> Then, hrtimers_dead_cpu should take care of migrating the hrtimer out
> of the outgoing cpu but :
> - it must run on another target cpu to migrate the hrtimer.
> - it runs in the context of the caller which can be throttled.

Sure. I completely understand the problem. The hrtimer hotplug callback
does not run because the task is stuck and waits for the timer to
expire. Circular dependency.

>> sched_cpu_wait_empty() would be the obvious place to cleanup armed CFS
>> timers, but let me look into whether we can migrate hrtimers early in
>> general.
>
> but for that we must check if the timer is enqueued on the outgoing
> cpu and we then need to choose a target cpu.

You're right. I somehow assumed that cfs knows where it queued stuff,
but obviously it does not.

I think we can avoid all that by simply taking that user space task out
of the picture completely, which avoids debating whether there are
other possible weird conditions to consider altogether.

Something like the untested below should just work.

Thanks,

        tglx
---
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1490,6 +1490,13 @@ static int cpu_down(unsigned int cpu, en
 	return err;
 }
 
+static long __cpu_device_down(void *arg)
+{
+	struct device *dev = arg;
+
+	return cpu_down(dev->id, CPUHP_OFFLINE);
+}
+
 /**
  * cpu_device_down - Bring down a cpu device
  * @dev: Pointer to the cpu device to offline
@@ -1502,7 +1509,12 @@ static int cpu_down(unsigned int cpu, en
  */
 int cpu_device_down(struct device *dev)
 {
-	return cpu_down(dev->id, CPUHP_OFFLINE);
+	unsigned int cpu = cpumask_any_but(cpu_online_mask, dev->id);
+
+	if (cpu >= nr_cpu_ids)
+		return -EBUSY;
+
+	return work_on_cpu(cpu, __cpu_device_down, dev);
 }
 
 int remove_cpu(unsigned int cpu)
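
For readers unfamiliar with the API the patch leans on: work_on_cpu()
queues a function on a specific CPU's workqueue and blocks until it
returns, which is what lets cpu_device_down() push the actual offline
work onto a surviving CPU instead of running it in the caller's
(possibly throttled) context. Below is a minimal, hypothetical module
sketch of that pattern; it is not part of the patch or the thread, and
all names in it are illustrative.

/*
 * Illustrative sketch only (not from the thread): run a callback on
 * some other online CPU via work_on_cpu() and wait for its result,
 * the same pattern the cpu_device_down() change above relies on.
 */
#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/module.h>
#include <linux/printk.h>
#include <linux/smp.h>
#include <linux/workqueue.h>

static long run_elsewhere(void *unused)
{
	/* Runs in a kworker bound to the CPU chosen in the init function. */
	pr_info("work_on_cpu demo: callback on CPU %d\n", raw_smp_processor_id());
	return 0;
}

static int __init workoncpu_demo_init(void)
{
	/* As in the patch: pick any online CPU other than the current one. */
	unsigned int cpu = cpumask_any_but(cpu_online_mask, raw_smp_processor_id());

	if (cpu >= nr_cpu_ids)
		return -EBUSY;

	/* Blocks until run_elsewhere() has completed on that CPU. */
	return (int)work_on_cpu(cpu, run_elsewhere, NULL);
}

static void __exit workoncpu_demo_exit(void)
{
}

module_init(workoncpu_demo_init);
module_exit(workoncpu_demo_exit);
MODULE_LICENSE("GPL");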