Received: by 2002:a05:6358:1087:b0:cb:c9d3:cd90 with SMTP id j7csp1497784rwi; Thu, 27 Oct 2022 17:03:37 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4HpYQsCChInsWnyX3po+d70XEexVzRpdP39P3sbS48PO3qSRaTebmouV0jh7PLuQROdQ47 X-Received: by 2002:a65:4909:0:b0:46b:2752:e4ab with SMTP id p9-20020a654909000000b0046b2752e4abmr45034201pgs.293.1666915417084; Thu, 27 Oct 2022 17:03:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666915417; cv=none; d=google.com; s=arc-20160816; b=Rpg00MnUQzvvbzyz9oyTA+/qIxpFa6O8aLlXD1l0aGHyRaIgfJFibUh//ItGKzjduI MEfC1uwIUvnl7L81KblRJfCEkrxDhNZ1+HCZQ8Xq4Dfw0p4d5NyEwMX/3K+SS3wBrc6R 21ItuWP3iVLVYwoAwQSFMQChzEeyeZqCelHuTuGb5mKdrxjlWADj3rNzkoGbdNbsQ21k d8YyJtGlXWIbSCxd2U3J09mWC0pACrLGmzTnDLDYD8bl8D2vV2RcQtvvtFa/uD86V/VW yhHMrh/TgkpK5tJffjmZSgM4qTmBAx2vp1G07KxACliSZdtjpVbkGvvdN9n0UXSFJwC+ DnYA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:subject :from:references:cc:to:content-language:user-agent:mime-version:date :message-id:feedback-id:dkim-signature:dkim-signature; bh=G032cFkkJJwDvICoMjP1OrBVYRlJZIJ+SIsojS+9x8Q=; b=ai9SFKn6g0xRLofga5GHDEfl51kDhSpRYEGjRRA3Cbn1cd7DZ6K5vA0WWfu/oVeJys E1saiYIGWWpcEtg9LUqhqsKr2MmeLw07OYkEtzq7x8TaD+dTw37HUw5WkPuCTsawkyzY 52b57VghVe/LwHjcX8qpaMXzWMdKNz4L984QXO3EOarfWWKWH6ziNCkGbK12mZY45ud/ +E1eynBXbCSTqh6Ool9fCMHG+JzBkPrB3auX3CKgp8N+uKWynnmgmRqIf3rxRrsJw5qG x6wrGkFLRT7boBvxkXd9D1+T78FDv+LMMYVGkMJnA/qDQkbQPPdzXEa1G/n2W7rBkpZf PYZw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@sholland.org header.s=fm1 header.b=S5jJYcKg; dkim=pass header.i=@messagingengine.com header.s=fm3 header.b=EAmnHEuK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=sholland.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c11-20020a170903234b00b001869fd74e0csi3889935plh.311.2022.10.27.17.03.23; Thu, 27 Oct 2022 17:03:37 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@sholland.org header.s=fm1 header.b=S5jJYcKg; dkim=pass header.i=@messagingengine.com header.s=fm3 header.b=EAmnHEuK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=sholland.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235289AbiJ0Xly (ORCPT + 99 others); Thu, 27 Oct 2022 19:41:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41682 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235011AbiJ0Xlw (ORCPT ); Thu, 27 Oct 2022 19:41:52 -0400 Received: from out5-smtp.messagingengine.com (out5-smtp.messagingengine.com [66.111.4.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 02B7913D1B for ; Thu, 27 Oct 2022 16:41:50 -0700 (PDT) Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 611015C0105; Thu, 27 Oct 2022 19:41:47 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Thu, 27 Oct 2022 19:41:47 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sholland.org; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm1; t=1666914107; x= 1667000507; bh=G032cFkkJJwDvICoMjP1OrBVYRlJZIJ+SIsojS+9x8Q=; b=S 5jJYcKgTkqxDorI3XYDq2oMfz9bR+58651vMVKwlzyS4yWoTbJxsOnYSDCVI73sB wQvfbs/vFp7iA199XA1pwKxEOuFXZLLxZzBd3c0I61Op3Ya8jw+Ugn2Vn5xylQZq h7kCUA1bWrcZs8s12YYsyBf/4s/frGfv5oqdT64Jlj8B44Dhfyh5EjCDkgKGCiEW ib6/6C0FqUfp2esRF6p7korNijLR4BNZ5nr5jRCrv+H3e3hD/+Zz1Dkmu3AOznxk PAlwhkh+DQSDzN4T8AKVPiAL6TbHNA7QDgyCnrQpqbWhUNZJfcnLAeIXk9rXAHkn TQ8tfivbwIGVyA/9g89wg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to:x-me-proxy:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1666914107; x= 1667000507; bh=G032cFkkJJwDvICoMjP1OrBVYRlJZIJ+SIsojS+9x8Q=; b=E AmnHEuKu/FUCuOG2z9q2ekXp6i2iG8VN6hMudQAzXXLMH2aPW+e0uNR5g4U1gbjH ysWCmfQ4YXnRpom9DEyIXtDF0eOTevVAmcRDl1D6srVQDIn+DZWWIsk7NA3acw/l G+AJp2p4e9mjCcYUMyEHKCbSe/0QgPl1LNm3vmqX7uHqHKQRaxs6Rf00A4U/mmDI NixaC7xiMAk8jqUseoLJOecl8FJfNOrl2ctTy/MG5sQslDjAv6KwdbFFD4MbQl7m Qr4OyBy4HENhvMUadfIib+kR4Ck7UtmQ8vIZscaqxBbR5SQs6SgkxosNbskMzQ5u spS3Qt309OJ3uPU1QptFw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvgedrtdehgddvgecutefuodetggdotefrodftvf curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc fjughrpefkffggfgfvvehfhffujggtgfesthekredttdefjeenucfhrhhomhepufgrmhhu vghlucfjohhllhgrnhguuceoshgrmhhuvghlsehshhholhhlrghnugdrohhrgheqnecugg ftrfgrthhtvghrnheptdeitdejvdejveffveejffffjeekfefhgeehjeeijefgkedtfeeh leetfeekvdfgnecuffhomhgrihhnpehgihhthhhusgdrtghomhdpkhgvrhhnvghlrdhorh hgnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepshgr mhhuvghlsehshhholhhlrghnugdrohhrgh X-ME-Proxy: Feedback-ID: i0ad843c9:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 27 Oct 2022 19:41:46 -0400 (EDT) Message-ID: <40abce35-b8de-cd5c-f544-fcf344e7057c@sholland.org> Date: Thu, 27 Oct 2022 18:41:45 -0500 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux ppc64le; rv:102.0) Gecko/20100101 Thunderbird/102.3.3 Content-Language: en-US To: Palmer Dabbelt , apatel@ventanamicro.com Cc: Conor Dooley , daniel.lezcano@linaro.org, tglx@linutronix.de, aou@eecs.berkeley.edu, atishp@atishpatra.org, dmitriy@oss-tech.org, Paul Walmsley , linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org References: From: Samuel Holland Subject: Re: [PATCH] clocksource/drivers/riscv: Events are stopped during CPU suspend In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_DNSWL_LOW, SPF_HELO_PASS,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/27/22 18:07, Palmer Dabbelt wrote: > On Mon, 24 Oct 2022 01:04:55 PDT (-0700), apatel@ventanamicro.com wrote: >> On Mon, Oct 24, 2022 at 10:31 AM Samuel Holland >> wrote: >>> >>> On 10/9/22 18:45, Palmer Dabbelt wrote: >>> > On Thu, 29 Sep 2022 14:50:45 PDT (-0700), Conor Dooley wrote: >>> >> On Sun, May 08, 2022 at 08:21:21PM -0500, Samuel Holland wrote: >>> >>> Some implementations of the SBI time extension depend on hart-local >>> >>> state (for example, CSRs) that are lost or hardware that is powered >>> >>> down when a CPU is suspended. To be safe, the clockevents driver >>> >>> cannot assume that timer IRQs will be received during CPU suspend. >>> >>> >>> >>> Fixes: 62b019436814 ("clocksource: new RISC-V SBI timer driver") >>> >>> Signed-off-by: Samuel Holland >>> >>> --- >>> >>> >>> >>>  drivers/clocksource/timer-riscv.c | 2 +- >>> >>>  1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> >>> >>> diff --git a/drivers/clocksource/timer-riscv.c >>> >>> b/drivers/clocksource/timer-riscv.c >>> >>> index 1767f8bf2013..593d5a957b69 100644 >>> >>> --- a/drivers/clocksource/timer-riscv.c >>> >>> +++ b/drivers/clocksource/timer-riscv.c >>> >>> @@ -34,7 +34,7 @@ static int riscv_clock_next_event(unsigned long >>> delta, >>> >>>  static unsigned int riscv_clock_event_irq; >>> >>>  static DEFINE_PER_CPU(struct clock_event_device, >>> riscv_clock_event) = { >>> >>>      .name            = "riscv_timer_clockevent", >>> >>> -    .features        = CLOCK_EVT_FEAT_ONESHOT, >>> >>> +    .features        = CLOCK_EVT_FEAT_ONESHOT | >>> CLOCK_EVT_FEAT_C3STOP, >>> > >>> > This is listed as being x86-specific in the header, but there's a >>> hanful >>> > of other ports that enable it for timers as well.  Looks like arm is >>> > setting this based on DT, which seems reasonable to me: we're working >>> > around a firmware bug, there should be some way to turn off that >>> > workaround for firmware that doesn't have the bug. Looks like Intel >>> already >>> > turns this off when ARAT is supported, which seems to be the case for >>> > anything modern, so maybe we're just tripping up on some untested >>> behavior here? >>> > I'm not sure exactly how we should probe this, but having it only >>> enabled >>> > when we need the workaround seems like the right way to go. >>> >>> I opened an issue against the SBI spec about what exactly it requires, >>> but I got no responses: >>> >>> https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98 >>> >>> My interpretation of the SBI specification is that it does not require >>> maintaining any hart-local state across a non-retentive hart suspend. >>> Unless the SBI spec says the timer must fire during/after suspend, then >>> there is no firmware bug. > > IMO this is a grey area in the spec: it says both "The hart will > automatically come out of suspended state and resume normal execution > when it receives an interrupt or platform specific hardware event." but > "sstatus.SIE = 0" on resume.  It's not clear _when_ "sstatus.SIE" must > take the value 0 (ie, is it before sleeping or after taking up) and if > "receives an interrupt" means _any_ interrupt, or just enabled interrupts. > > I agree we can't say it's a firmware bug, though.  There's certainly > some reading of the spec that allows for this -- even if there wasn't > we'd still have to live with whatever the firmware does, but here I > think it's just a Linux bug. > >> SBI spec only defines the mechanism to enter HART suspend state. All >> other details (such as timer interrupt firing during/after suspend) >> are platform >> or implementation specific which needs to be discovered from DT/ACPI. > > From that bug it sounds like it's really platform-specific whether or > not it's possible to wake up from non-retentive suspend, so maybe we > should just add some sort of DT node that says "non-retentive suspend > works" and then only use it on those systems? No, this isn't about whether non-retentive suspend works generally. The only behavior in question is specifically whether or not the _SBI timer extension_ can wake from non-retentive suspend. For example, on Allwinner D1, all PLIC inputs can wake from non-retentive suspend. This includes several MMIO timers. But notably it does not include the CLINT. And the CLINT is what OpenSBI chooses as the backend for the SBI timer extension. You could make an argument that OpenSBI should use one of those other MMIO timers, and then the C3STOP flag would not be needed, but the SBI spec does not _require_ doing this. And (in the absence of a DT/ACPI property declaring otherwise), the driver can only rely on what the standard requires. >>> > That said, I'm not actually sure this C3STOP feature does what we want >>> > given the commit description.  The timers on RISC-V are sort of in >>> this >>> > odd middle-ground between being per-CPU timers and system timers: the >>> > time they produce is global (or at least close, due to the mtime >>> > synchronization rules) but the actual interrupts are only one-shot and >>> > only local. >>> >>> And if we cannot rely on the interrupt being delivered, we cannot rely >>> on the SBI time extension to work across cpuidle entry. >> >> Just like ARM, we need a DT property to discover this platform specific >> behaviour. >> >> I had sent out a small series to do this for DT. >> Refer, "[PATCH v2 0/2] Improve CLOCK_EVT_FEAT_C3STOP feature setting" >> https://lore.kernel.org/all/20220727114302.302201-1-apatel@ventanamicro.com/ This is a reasonable solution for me, if we can agree on the binding. > I broadly agree with the "we should split out the timer node" stuff > there.  It used to be part of the core, but it's not any more.  Looks > like that was the hangup, though I'm not sure setting C3STOP is even the > right fix any more... > >>> > From poking around the code I think this just tries to >>> > setup a periodic broadcast timer, but since we use software >>> fallbacks to >>> > emulate those we'll still end up losing the interrupts/ticks if the >>> CPU >>> > that was asked for an interrupt has gone to sleep and lost that state. >>> >>> So by extension, non-retentive cpuidle states cannot be used if the SBI >>> timer is the only available timer, since there is no hardware broadcast >>> timer to use as a backup. >>> >>> > I'm not sure if I'm just misunderstanding what's going on here, >>> though. >>> > Is there something that describes the behavior this fixes in more >>> detail? >>> >>> The motivating scenario for this patch is the C906, where the MTIMER is >>> in the same reset domain as the CPU, so the timer state is lost during >>> non-retentive suspend. Without this patch, if riscv_timer_clockevent is >>> the current clockevent driver, then the CPU fails to wake up from >>> suspend. However, this same problem would occur on any CPU where the >>> timer or interrupt delivery stops working during suspend. >> >> Yes, I recall CLOCK_EVT_FEAT_C3STOP was added for C906 but >> we should go the DT/ACPI route. > > I agree, though I think this should disable non-retentive suspend as > opposed to setting C3STOP. Please see above. Disabling non-retentive suspend is a much bigger hammer than needed. >> Regards, >> Anup >> >>> >>> >>>      .rating            = 100, >>> >>>      .set_next_event        = riscv_clock_next_event, >>> >>>  }; >>> >> >>> >> After a bit of a painful bisection (with a misdirection into the >>> v5.19 >>> >> printk reverts along the way) I have arrived at this commit for >>> causing >>> >> me some issues. >>> >> >>> >> If an AXI read to the PCIe controller on PolarFire SoC times out, the >>> >> system will stall, with an expected: >>> >>      io scheduler mq-deadline registered >>> >>      io scheduler kyber registered >>> >>      microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000 >>> >> ranges: >>> >>      microchip-pcie 2000000000.pcie:      MEM >>> >> 0x2008000000..0x2087ffffff -> 0x0008000000 >>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>      microchip-pcie 2000000000.pcie: axi read request error >>> >>      microchip-pcie 2000000000.pcie: axi read timeout >>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>      Freeing initrd memory: 7336K >>> >>      mc_event_handler: 667402 callbacks suppressed >>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>      mc_event_handler: 666588 callbacks suppressed >>> >> >>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>     mc_event_handler: 666748 callbacks suppressed >>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>     rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: >>> >>     rcu:     0-...0: (1 GPs behind) idle=19f/1/0x4000000000000002 >>> >> softirq=34/36 fqs=2626 >>> >>         (detected by 1, t=5256 jiffies, g=-1151, q=1143 ncpus=4) >>> >>     Task dump for CPU 0: >>> >>     task:swapper/0       state:R  running task     stack:    0 pid: >>> >> 1 ppid:     0 flags:0x00000008 >>> >>     Call Trace: >>> >>     mc_event_handler: 666648 callbacks suppressed >>> >> >>> >>  With this patch applied, the system just locks up without RCU >>> stalling: >>> >>     io scheduler mq-deadline registered >>> >>     io scheduler kyber registered >>> >>     microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000 >>> >> ranges: >>> >>     microchip-pcie 2000000000.pcie:      MEM >>> >> 0x2008000000..0x2087ffffff -> 0x0008000000 >>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>     microchip-pcie 2000000000.pcie: axi read request error >>> >>     microchip-pcie 2000000000.pcie: axi read timeout >>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer >>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer >>> >>     Freeing initrd memory: 7332K >>> >> >>> >> As of yet, I have no idea if RCU stalls for other reasons would >>> also be >>> >> lost. >>> > >>> > Sorry this broke stuff.  I'm not entirely sure why this would mask RCU >>> > stalls, but it seems like we're hitting some pretty odd paths here and >>> > I'm not sure this is expected to work at all for us. >>> >>> I'm confused here. The RCU stall is itself a bug, right? Are you sure >>> this patch is wrongly masking the stall, or is it possibly just avoiding >>> some buggy code and not causing a stall in the first place? I still don't see why you consider this patch to have broken anything. Can anyone explain why getting an RCU stall is better than not getting one? Regards, Samuel