From: Thomas Gleixner
To: Vincent Guittot, Xiongfeng Wang
Cc: vschneid@redhat.com, Phil Auld, vdonnefort@google.com,
    Linux Kernel Mailing List, Wei Li, "liaoyu (E)",
    zhangqiao22@huawei.com, Peter Zijlstra, Dietmar Eggemann, Ingo Molnar
Subject: Re: [Question] report a race condition between CPU hotplug state machine and hrtimer 'sched_cfs_period_timer' for cfs bandwidth throttling
In-Reply-To:
References: <8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com> <87mt18it1y.ffs@tglx> <68baeac9-9fa7-5594-b5e7-4baf8ac86b77@huawei.com>
Date: Wed, 28 Jun 2023 14:03:22 +0200
Message-ID: <875y774wvp.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 27 2023 at 18:46, Vincent Guittot wrote:
> On Mon, 26 Jun 2023 at 10:23, Xiongfeng Wang wrote:
>> > diff --cc kernel/sched/fair.c
>> > index d9d6519fae01,bd6624353608..000000000000
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@@ -5411,10 -5411,16 +5411,15 @@@ void start_cfs_bandwidth(struct cfs_ban
>> >   {
>> >   	lockdep_assert_held(&cfs_b->lock);
>> >
>> > -	if (cfs_b->period_active)
>> > +	if (cfs_b->period_active) {
>> > +		struct hrtimer_clock_base *clock_base = cfs_b->period_timer.base;
>> > +		int cpu = clock_base->cpu_base->cpu;
>> > +		if (!cpu_active(cpu) && cpu != smp_processor_id())
>> > +			hrtimer_start_expires(&cfs_b->period_timer, HRTIMER_MODE_ABS_PINNED);
>> >   		return;
>> > +	}
>
> I have been able to reproduce your problem and run your fix on top. I
> still wonder if there is a
> Could we have a helper from hrtimer to get the cpu of the clock_base ?

No, because this is fundamentally wrong.
If the CPU is on the way out, then the scheduler hotplug machinery has to
handle the period timer so that the problem Xiongfeng analyzed does not
happen in the first place.

sched_cpu_wait_empty() would be the obvious place to cleanup armed CFS
timers, but let me look into whether we can migrate hrtimers early in
general.

Aside of that the above is wrong by itself.

   if (cfs_b->period_active)
	hrtimer_start_expires(&cfs_b->period_timer, HRTIMER_MODE_ABS_PINNED);

This only ends up on the outgoing CPU if either:

  1) The code runs on the outgoing CPU

or

  2) The hrtimer is concurrently executing the hrtimer callback on the
     outgoing CPU.

So this:

   if (cfs_b->period_active) {
	struct hrtimer_clock_base *clock_base = cfs_b->period_timer.base;
	int cpu = clock_base->cpu_base->cpu;

	if (!cpu_active(cpu) && cpu != smp_processor_id())
		hrtimer_start_expires(&cfs_b->period_timer, HRTIMER_MODE_ABS_PINNED);
	return;
   }

only works, if

  1) The code runs _not_ on the outgoing CPU

and

  2) The hrtimer is _not_ concurrently executing the hrtimer callback on
     the outgoing CPU.

If the callback is executing (it spins on cfs_b->lock), then the timer is
requeued on the outgoing CPU. Not what you want, right?

Plus accessing hrtimer->clock_base->cpu_base->cpu lockless is fragile at
best.

Thanks,

        tglx