Received: by 2002:a05:6358:1087:b0:cb:c9d3:cd90 with SMTP id j7csp8345519rwi; Tue, 25 Oct 2022 05:43:08 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6SVjE2K9G1hxtjqTo3rqVYlWs6IhHJe2FajdJ4NwE9SUuXIv3fBLoUBCs+AzSUO/hinnpy X-Received: by 2002:a17:906:db02:b0:780:24e:cf9 with SMTP id xj2-20020a170906db0200b00780024e0cf9mr32595882ejb.460.1666701787881; Tue, 25 Oct 2022 05:43:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666701787; cv=none; d=google.com; s=arc-20160816; b=LIgGRbYXD7TwVa2TX7r0IBFa9sR7NRCW8gLhXb87xffWeFgFG6m201ukZ5hKn+Jb56 yNNso8DPR7Or+az22vvzxYJ+DuPAxWXEAu8txCRD+PAGYheJGK0hZpP6mdlqSwlHFsRh 7NJRcf1pOXhfYYPTlD9uP/ipPLRwMxHw1puBH0wsZUmsNHAw39qiJnNAsius7D8cGW5A BArCOBmaLoFgfMq8eN0GlyZlEt/i1MyAS8qe4m17/MtOKtgo0qa9/pVXkH+gAna12NVw jl68Fqk6D3+F1p45o7EnAYjcD8qX1gUUXf/nac9giKM60yYa3HabxL/ZJhONHqqIZvjh PwmQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:user-agent:references:in-reply-to :subject:cc:to:from:message-id:date:dkim-signature; bh=sxOqD+u+7zXPz+jndRBh4NEJR7B95Ul7TDMceG/rap0=; b=fEuSw3tkQSrOH8vbA0DurkOeeSxgolE46kGcys2kCgWPwO/zb7aN2hmP7IWKdaAy5z vekGKInHVdtaopLu5L5PSc2cJweBVQt9z49fKwWWBM0KwKsUKfBpyKN7lZeFxAaa8kcx Bu2ZEKpE/ZvvERrOkhwG3Swsm+L8phMAPBnSVnly6OJz9L/fbC56RxrY+ss+xtskVKjg a69NUD6Q51kAIP2wz2fkJ8SzSC5FQg2QrXMqc6JmFPX0DAmuoFM2eoGVihO9rlb7IHFs E5IlG3frHdYrKJAWY37pQ+hzePUJ+ISM11wFwieXqleB/e45Mhx+IRXfdsiOeZqU32G2 JA7Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=HniQmcAd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hc10-20020a170907168a00b007a7d37e4681si3075877ejc.845.2022.10.25.05.42.42; Tue, 25 Oct 2022 05:43:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=HniQmcAd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231414AbiJYMbg (ORCPT + 99 others); Tue, 25 Oct 2022 08:31:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48756 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231226AbiJYMbe (ORCPT ); Tue, 25 Oct 2022 08:31:34 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BF409183E0E for ; Tue, 25 Oct 2022 05:31:33 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 75707B81CF3 for ; Tue, 25 Oct 2022 12:31:32 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 25CF4C433C1; Tue, 25 Oct 2022 12:31:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1666701091; bh=b1zNicorc/HJVH0BSmMidSsDFebw2PrHkiVA0QaNePk=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=HniQmcAdqSD66Y8GZLLlRuh6NTDDU1kC3cINJWFci86/lZp+WdfpuXxdm9d6XnWek uTB2lrJ0B2Stfr2qichoVX1PiaLL3xrVHokw5BYQNBI3e8Kr83HjplC9ENXxt+Y7FE CgvBnBp6dYBZSzYd4O/XkFDGY87u6By4cQIq7olJS5vzhKu4vmk891hlTNtIusXU7O z5HFmRyd+5oCWjZrsMXm82jPaP3Bo3C5tMB4A/u2WqBqHq6lt08f6MRlWsIEUFsrom wVBRr9CGW520c8HYaY8c/77YUgxv35h6G+EsbB+EmEFX3N8idyL122QAWF5WM2GgMJ MIunwl+mqPc5Q== Received: from sofa.misterjones.org ([185.219.108.64] helo=goblin-girl.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1onJ5c-001WR2-N7; Tue, 25 Oct 2022 13:31:28 +0100 Date: Tue, 25 Oct 2022 13:31:28 +0100 Message-ID: <86o7u0dqzj.wl-maz@kernel.org> From: Marc Zyngier To: Joe Korty Cc: Mark Rutland , Daniel Lezcano , Thomas Gleixner , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v2] arm64: arch_timer: XGene-1: TVAL register math error breaks timer expiry calculation. In-Reply-To: <20221024165422.GA51107@zipoli.concurrent-rt.com> References: <20221024165422.GA51107@zipoli.concurrent-rt.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/27.1 (aarch64-unknown-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 185.219.108.64 X-SA-Exim-Rcpt-To: joe.korty@concurrent-rt.com, mark.rutland@arm.com, daniel.lezcano@linaro.org, tglx@linutronix.de, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false X-Spam-Status: No, score=-7.6 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Joe, Thanks for respinning this. Some comments below. Nothing major, but worth keeping in mind for your next patches. On Mon, 24 Oct 2022 17:54:22 +0100, Joe Korty wrote: > > arm64: arch_timer: XGene-1: TVAL register math error breaks timer expiry calculation. > nit: this line is already in the email subject which will be captured when the patch is applied. No need to repeat it. Also, no final '.' at the end of the patch subject. > The TVAL register is 32 bit signed. Thus only the lower 31 bits are > available to specify when an interrupt is to occur at some time in the > near future. Attempting to specify a larger interval with TVAL results > in a negative time delta which means the timer fires immediately upon > being programmed, rather than firing at that expected future time. > > The solution is for linux to declare that TVAL is a 31 bit register rather nit: s/linux/Linux/ > than give its true size of 32 bits. This prevents linux from programming > TVAL with a too-large value. Note that, prior to 5.16, this little trick > was the standard way to handle TVAL in linux, so there is nothing new > happening here on that front. > > Test procedure: for some reason, the lockup watchdog is sensitive to > this bug. My interpretation is that the softlockup detector hides the issue, because it keeps generating short timer deadlines that are within the scope of the broken timer. Disable it, and you start using NO_HZ with much longer timer deadlines, which turns into an interrupt flood: 11: 1124855130 949168462 758009394 76417474 104782230 30210281 310890 1734323687 GICv2 29 Level arch_timer And "much longer" isn't that long: it takes less than 43s to underflow TVAL at 50MHz (the frequency of the counter on XGene-1). > When we turn the watchdog off, then run a little 'hello world' > program in each of the CPUs, there will often be one that 1) hangs up > forever, 2) hangs up what seems like forever, but actully contines after > a few minutes. In either case, the program cannot be freed by a ^C. > This test sequence requires CONFIG_SOFTLOCKUP_DETECTOR, and probably > requires that one of the NO_HZ Kconfig options be specified. > > The sequence is, for an 8 cpu Mustang XGene-1: > > echo 0 >/proc/sys/kernel/watchdog > for i in {0..7}; do taskset -c $i echo hi there $i; done > > Note that though the hangup usually happens, it does not > always happen. The first line is enough to kill the machine here. Nice one! :D > > Some comments on the v1 version of this patch by Marc Zyngier: nit: mentioning previous versions in the commit message isn't very helpful, as the git tree won't carry that patch. Just saying "additional comment from John Doe:" is enough. > > XGene implements CVAL (a 64bit comparator) in terms of TVAL (a countdown > register) instead of the other way around. TVAL being a 32bit register, > the width of the counter should equally be 32. However, TVAL is a > *signed* value, and keeps counting down in the negative range once the > timer fires. > > It means that any TVAL value with bit 31 set will fire immediately, > as it cannot be distinguished from an already expired timer. Reducing > the timer range back to a paltry 31 bits papers over the issue. > > Another problem cannot be fixed though, which is that the timer interrupt > *must* be handled within the negative countdown period, or the interrupt > will be lost (TVAL will rollover to a positive value, indicative of a > new timer deadline). > > [ v2: Expanded CC list - jak ] > [ v2: Revamped changelog - jak ] > [ v2: streamlined inlined comments - jak ] This information should be stashed below the '---' line so that it isn't captured in the commit when applying the patch. > > Cc: stable@vger.kernel.org # 5.16+ > Fixes: 012f18850452 ("clocksource/drivers/arm_arch_timer: Work around broken CVAL implementations") > Signed-off-by: Joe Korty > --- > base-commit: v6.0 > Index: b/drivers/clocksource/arm_arch_timer.c > =================================================================== > --- a/drivers/clocksource/arm_arch_timer.c > +++ b/drivers/clocksource/arm_arch_timer.c > @@ -804,6 +804,9 @@ static u64 __arch_timer_check_delta(void > /* > * XGene-1 implements CVAL in terms of TVAL, meaning > * that the maximum timer range is 32bit. Shame on them. > + * > + * Note that TVAL is signed, thus has only 31 of its > + * 32 bits to express magnitude. > */ > MIDR_ALL_VERSIONS(MIDR_CPU_MODEL(ARM_CPU_IMP_APM, > APM_CPU_PART_POTENZA)), > @@ -811,8 +814,8 @@ static u64 __arch_timer_check_delta(void > }; > > if (is_midr_in_range_list(read_cpuid_id(), broken_cval_midrs)) { > - pr_warn_once("Broken CNTx_CVAL_EL1, limiting width to 32bits"); > - return CLOCKSOURCE_MASK(32); > + pr_warn_once("Broken CNTx_CVAL_EL1, using 32 bit TVAL instead.\n"); s/TVAL/CNTx_TVAL_EL1/ instead. > + return CLOCKSOURCE_MASK(31); > } > #endif > return CLOCKSOURCE_MASK(arch_counter_get_width()); > With the commit message suitably amended: Reviewed-by: Marc Zyngier Daniel, would you mind fixing it up when applying this patch? XGene is trivially broken without this fix, and it would be good if it could make it in one of the 6.1-rc. Thanks, M. -- Without deviation from the norm, progress is not possible.