Message-ID: <3256f9c1-dccd-5995-5b14-afaae281be90@huawei.com>
Date: Thu, 24 Aug 2023 15:25:22 +0800
Subject: Re: [Question] report a race condition between CPU hotplug state machine and hrtimer 'sched_cfs_period_timer' for cfs bandwidth throttling
To: Thomas Gleixner, Vincent Guittot
CC:
    Xiongfeng Wang, Phil Auld, Linux Kernel Mailing List, Wei Li,
    Peter Zijlstra, Dietmar Eggemann, Ingo Molnar
References: <87h6oqdq0i.ffs@tglx>
From: Yu Liao
In-Reply-To: <87h6oqdq0i.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2023/8/23 18:14, Thomas Gleixner wrote:
> Subject: cpu/hotplug: Prevent self deadlock on CPU hot-unplug
> From: Thomas Gleixner
> Date: Wed, 23 Aug 2023 10:47:02 +0200
>
> Xiongfeng reported and debugged a self deadlock of the task which initiates
> and controls a CPU hot-unplug operation vs. the CFS bandwidth timer.
>
>     CPU1                                            CPU2
>
> T1 sets cfs_quota
>    starts hrtimer cfs_bandwidth 'period_timer'
> T1 is migrated to CPU2
>                                                     T1 initiates offlining of CPU1
> Hotplug operation starts
>   ...
> 'period_timer' expires and is re-enqueued on CPU1
>   ...
> take_cpu_down()
>   CPU1 shuts down and does not handle timers
>   anymore. They have to be migrated in the
>   post dead hotplug steps by the control task.
>
>                                                     T1 runs the post dead offline operation
>                                                     T1 is scheduled out
>                                                     T1 waits for 'period_timer' to expire
>
> T1 waits there forever if it is scheduled out before it can execute the hrtimer
> offline callback hrtimers_dead_cpu().
>
> Cure this by delegating the hotplug control operation to a worker thread on
> an online CPU. This takes the initiating user space task, which might be
> affected by the bandwidth timer, completely out of the picture.
>
> Reported-by: Xiongfeng Wang
> Signed-off-by: Thomas Gleixner
> Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com
> ---
>  kernel/cpu.c |   24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1467,8 +1467,22 @@ static int __ref _cpu_down(unsigned int
>  	return ret;
>  }
>
> +struct cpu_down_work {
> +	unsigned int		cpu;
> +	enum cpuhp_state	target;
> +};
> +
> +static long __cpu_down_maps_locked(void *arg)
> +{
> +	struct cpu_down_work *work = arg;
> +
> +	return _cpu_down(work->cpu, 0, work->target);
> +}
> +
>  static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
>  {
> +	struct cpu_down_work work = { .cpu = cpu, .target = target, };
> +
>  	/*
>  	 * If the platform does not support hotplug, report it explicitly to
>  	 * differentiate it from a transient offlining failure.
> @@ -1477,7 +1491,15 @@ static int cpu_down_maps_locked(unsigned
>  		return -EOPNOTSUPP;
>  	if (cpu_hotplug_disabled)
>  		return -EBUSY;
> -	return _cpu_down(cpu, 0, target);
> +
> +	/*
> +	 * Ensure that the control task does not run on the to be offlined
> +	 * CPU to prevent a deadlock against cfs_b->period_timer.
> +	 */
> +	cpu = cpumask_any_but(cpu_online_mask, cpu);
> +	if (cpu >= nr_cpu_ids)
> +		return -EBUSY;
> +	return work_on_cpu(cpu, __cpu_down_maps_locked, &work);
>  }
>
>  static int cpu_down(unsigned int cpu, enum cpuhp_state target)

Thanks for the patch.

Tested in v6.5-rc5 with test script provided by Xiongfeng, this patch works.

Tested-by: Yu Liao

Best regards,
Yu
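
For readers following the control flow of the quoted patch, here is a rough user-space model of the new path in cpu_down_maps_locked(): pick an online CPU other than the one going down, bail out with -EBUSY if there is none, and otherwise hand the offline operation to that CPU. It is only a sketch under stated assumptions. Every model_* name and MODEL_NR_CPUS is hypothetical and not part of the kernel. The online mask is a plain 64-bit word, the stand-in for work_on_cpu() just calls the function synchronously instead of queueing it to a kworker, and the 'target' state from struct cpu_down_work is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model constant; stands in for nr_cpu_ids. */
#define MODEL_NR_CPUS 8u

/* Hypothetical counterpart of struct cpu_down_work (without 'target'). */
struct model_down_work {
	uint64_t *online_mask;	/* bit i set means CPU i is online */
	unsigned int cpu;	/* CPU to take offline */
};

/* Records which CPU the model delegated the control operation to. */
static unsigned int model_last_control_cpu = MODEL_NR_CPUS;

/*
 * Model of cpumask_any_but(): return an online CPU other than 'cpu',
 * or MODEL_NR_CPUS if there is none. This model always picks the
 * lowest-numbered candidate; the kernel helper only promises "any".
 */
static unsigned int model_any_but(uint64_t online_mask, unsigned int cpu)
{
	for (unsigned int i = 0; i < MODEL_NR_CPUS; i++) {
		if (i != cpu && (online_mask & (UINT64_C(1) << i)))
			return i;
	}
	return MODEL_NR_CPUS;
}

/* Model of the delegated callback: mark the CPU offline, report success. */
static long model_do_down(void *arg)
{
	struct model_down_work *work = arg;

	assert(work->cpu < MODEL_NR_CPUS);
	*work->online_mask &= ~(UINT64_C(1) << work->cpu);
	return 0;
}

/*
 * Stand-in for work_on_cpu(): the kernel runs 'fn' in a worker thread
 * bound to 'cpu' and waits for it; the model calls it directly.
 */
static long model_work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
{
	model_last_control_cpu = cpu;
	return fn(arg);
}

/* Model of the patched cpu_down_maps_locked() tail. */
static long model_cpu_down(uint64_t *online_mask, unsigned int cpu)
{
	struct model_down_work work = { .online_mask = online_mask, .cpu = cpu };
	unsigned int control = model_any_but(*online_mask, cpu);

	/* No other online CPU can run the control operation. */
	if (control >= MODEL_NR_CPUS)
		return -EBUSY;
	return model_work_on_cpu(control, model_do_down, &work);
}
```

With CPUs 0 and 1 online, model_cpu_down() for CPU 1 runs the control step on CPU 0 and clears bit 1 of the mask. A second call for CPU 0 then returns -EBUSY because no other online CPU remains, which corresponds to the "cpu >= nr_cpu_ids" check in the patch.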