Message-ID: <44fa61a6-9ceb-0ebb-141f-0e2e703db47d@huawei.com>
Date: Fri, 12 Jan 2024 15:40:27 +0800
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] tick/broadcast-hrtimer: Prevent the timer device on broadcast duty CPU from being disabled
References: <20231218025844.55675-1-liaoyu15@huawei.com>
From: Yu Liao
In-Reply-To: <20231218025844.55675-1-liaoyu15@huawei.com>

Hi Thomas,

Kindly ping.

On 2023/12/18 10:58, Yu Liao wrote:
> It was found that running the LTP hotplug stress test on an aarch64
> system could produce rcu_sched stall warnings.
>
> The issue is the following:
>
> CPU1 (owns the broadcast hrtimer)    CPU2
>
>                                      tick_broadcast_enter()
>                                      //shut down local timer device
>                                      ...
>                                      tick_broadcast_exit()
>                                      //exits with tick_broadcast_force_mask set,
>                                      timer device remains disabled
>
>                                      initiates offlining of CPU1
> take_cpu_down()
> //CPU1 shuts down and does
> not send broadcast IPI anymore
>                                      takedown_cpu()
>                                      hotplug_cpu__broadcast_tick_pull()
>                                      //move broadcast hrtimer to this CPU
>                                      clockevents_program_event()
>                                      bc_set_next()
>                                      hrtimer_start()
>                                      //does not call hrtimer_reprogram()
>                                      to program timer device if expires
>                                      equals dev->next_event, so the timer
>                                      device remains disabled.
>
> CPU2 takes over the broadcast duty but local timer device is disabled,
> causing many CPUs to become stuck.
>
> Fix this by calling tick_program_event() to reprogram the local timer
> device in this scenario.
>
> Signed-off-by: Yu Liao
> ---
>  kernel/time/tick-broadcast-hrtimer.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c
> index e28f9210f8a1..6a4a612581fb 100644
> --- a/kernel/time/tick-broadcast-hrtimer.c
> +++ b/kernel/time/tick-broadcast-hrtimer.c
> @@ -42,10 +42,22 @@ static int bc_shutdown(struct clock_event_device *evt)
>   */
>  static int bc_set_next(ktime_t expires, struct clock_event_device *bc)
>  {
> +	ktime_t next_event = this_cpu_ptr(&tick_cpu_device)->evtdev->next_event;
> +
>  	/*
> -	 * This is called either from enter/exit idle code or from the
> -	 * broadcast handler. In all cases tick_broadcast_lock is held.
> -	 *
> +	 * This can be called from CPU offline operation to move broadcast
> +	 * assignment. If tick_broadcast_force_mask is set, the CPU local
> +	 * timer device may be disabled. And hrtimer_reprogram() will not
> +	 * be called if the timer is not the first expiring timer. Reprogram
> +	 * the cpu local timer device to ensure we can take over the
> +	 * broadcast duty.
> +	 */
> +	if (tick_check_broadcast_expired() && expires >= next_event)
> +		tick_program_event(next_event, 1);
> +
> +	/*
> +	 * This is called from enter/exit idle code, broadcast handler or
> +	 * CPU offline operation. In all cases tick_broadcast_lock is held.
>  	 * hrtimer_cancel() cannot be called here neither from the
>  	 * broadcast handler nor from the enter/exit idle code. The idle
>  	 * code can run into the problem described in bc_shutdown() and the