From: Yu Liao
Subject: [PATCH] tick/broadcast-hrtimer: Prevent the timer device on broadcast duty CPU from being disabled
Date: Mon, 18 Dec 2023 10:58:44 +0800
Message-ID: <20231218025844.55675-1-liaoyu15@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

It was found that running the LTP hotplug stress test on an aarch64
system could produce rcu_sched stall warnings.

The issue is the following:

CPU1 (owns the broadcast hrtimer)    CPU2

                                     tick_broadcast_enter()
                                       // shut down local timer device
                                     ...
                                     tick_broadcast_exit()
                                       // exits with tick_broadcast_force_mask
                                       // set, timer device remains disabled

                                     initiates offlining of CPU1
take_cpu_down()
  // CPU1 shuts down and does not
  // send broadcast IPI anymore
                                     takedown_cpu()
                                       hotplug_cpu__broadcast_tick_pull()
                                         // move broadcast hrtimer to this CPU
                                         clockevents_program_event()
                                           bc_set_next()
                                             hrtimer_start()
                                             // does not call hrtimer_reprogram()
                                             // to program the timer device if
                                             // expires equals dev->next_event,
                                             // so the timer device remains
                                             // disabled

CPU2 takes over the broadcast duty, but its local timer device is
disabled, causing many CPUs to become stuck.

Fix this by calling tick_program_event() to reprogram the local timer
device in this scenario.
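To make the failure sequence above concrete, here is a small user-space C model of it. It is a sketch, not kernel code: `struct model_clockevent`, `model_hrtimer_start()` and `model_pull_broadcast()` are hypothetical names, times are plain integers standing in for ktime_t, and the reprogramming rule is a simplifying assumption (the device is only touched when the new expiry is strictly earlier than its pending next_event, which covers the "expires equals dev->next_event" case described above).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CPU2's local clock event device. */
struct model_clockevent {
	long long next_event; /* last programmed expiry (arbitrary time units) */
	bool enabled;         /* false once tick_broadcast_enter() shut it down */
};

/*
 * Simplified model of the hrtimer_start() path (assumption): the device
 * is reprogrammed, and thereby re-armed, only when the new expiry is
 * strictly earlier than the event it already holds.
 */
static void model_hrtimer_start(struct model_clockevent *dev, long long expires)
{
	assert(dev);
	if (expires < dev->next_event) {
		dev->next_event = expires;
		dev->enabled = true;
	}
	/* Otherwise the device is left untouched, even if it is disabled. */
}

/*
 * Replays the sequence from the commit message on CPU2 and reports
 * whether its local device is armed after taking over broadcast duty.
 */
static bool model_pull_broadcast(long long local_next, long long bc_expires)
{
	/*
	 * tick_broadcast_enter() shut the device down; tick_broadcast_exit()
	 * left it disabled because the force bit was set. next_event keeps
	 * its old value.
	 */
	struct model_clockevent cpu2 = { .next_event = local_next, .enabled = false };

	/* takedown_cpu() -> ... -> bc_set_next() -> hrtimer_start() */
	model_hrtimer_start(&cpu2, bc_expires);
	return cpu2.enabled;
}
```

In this model, model_pull_broadcast(1000, 1000) returns false (CPU2's device stays disabled, so nobody delivers the broadcast), while model_pull_broadcast(1000, 900) returns true because the earlier expiry triggers a reprogram.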
Signed-off-by: Yu Liao
---
 kernel/time/tick-broadcast-hrtimer.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c
index e28f9210f8a1..6a4a612581fb 100644
--- a/kernel/time/tick-broadcast-hrtimer.c
+++ b/kernel/time/tick-broadcast-hrtimer.c
@@ -42,10 +42,22 @@ static int bc_shutdown(struct clock_event_device *evt)
  */
 static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
 {
+	ktime_t next_event = this_cpu_ptr(&tick_cpu_device)->evtdev->next_event;
+
 	/*
-	 * This is called either from enter/exit idle code or from the
-	 * broadcast handler. In all cases tick_broadcast_lock is held.
-	 *
+	 * This can be called from CPU offline operation to move broadcast
+	 * assignment. If tick_broadcast_force_mask is set, the CPU local
+	 * timer device may be disabled. And hrtimer_reprogram() will not be
+	 * called if the timer is not the first expiring timer. Reprogram
+	 * the cpu local timer device to ensure we can take over the
+	 * broadcast duty.
+	 */
+	if (tick_check_broadcast_expired() && expires >= next_event)
+		tick_program_event(next_event, 1);
+
+	/*
+	 * This is called from enter/exit idle code, broadcast handler or
+	 * CPU offline operation. In all cases tick_broadcast_lock is held.
 	 * hrtimer_cancel() cannot be called here neither from the
 	 * broadcast handler nor from the enter/exit idle code. The idle
 	 * code can run into the problem described in bc_shutdown() and the
-- 
2.33.0
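The guard the patch adds to bc_set_next() can be exercised in the same kind of user-space model. Again this is only a sketch under assumptions: the `model_` names are hypothetical, `force_bit` stands in for the result of tick_check_broadcast_expired() (which tests whether the current CPU is set in tick_broadcast_force_mask), `model_tick_program_event()` simply re-arms the modeled device at the given expiry, and the hrtimer rule is the same simplification as before.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the CPU-local clock event device. */
struct model_clockevent {
	long long next_event; /* last programmed expiry (arbitrary time units) */
	bool enabled;         /* whether the modeled device will fire */
};

/* Hypothetical stand-in for tick_program_event(expires, 1): re-arm the device. */
static void model_tick_program_event(struct model_clockevent *dev, long long expires)
{
	assert(dev);
	dev->next_event = expires;
	dev->enabled = true;
}

/* Simplified hrtimer_start() rule (assumption): reprogram only on a strictly earlier expiry. */
static void model_hrtimer_start(struct model_clockevent *dev, long long expires)
{
	if (expires < dev->next_event)
		model_tick_program_event(dev, expires);
}

/*
 * Models the patched bc_set_next(). force_bit plays the role of
 * tick_check_broadcast_expired(). Returns whether the device is armed.
 */
static bool model_bc_set_next(struct model_clockevent *dev, bool force_bit,
			      long long expires)
{
	long long next_event = dev->next_event;

	/* Guard added by the patch: re-arm at the pending local event. */
	if (force_bit && expires >= next_event)
		model_tick_program_event(dev, next_event);

	/* Unchanged path: queue the broadcast hrtimer. */
	model_hrtimer_start(dev, expires);
	return dev->enabled;
}
```

In this model the explicit reprogram only matters when expires >= next_event; for an earlier expiry the hrtimer path re-arms the device by itself, which matches the patch comment's remark that hrtimer_reprogram() is not called when the timer is not the first expiring one.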