Date: Thu, 5 Jan 2023 14:12:11 +0000
From: Mark Rutland
To: Yogesh Lal
Cc: maz@kernel.org, daniel.lezcano@linaro.org, tglx@linutronix.de, linux-arm-kernel@lists.infradead.org, "linux-kernel@vger.kernel.org", "linux-arm-msm@vger.kernel.org"
Subject: Re: ERRATUM_858921 is broken on 5.15 kernel

On Thu, Jan 05, 2023 at 07:03:48PM +0530, Yogesh Lal wrote:
> Hi,
>
> We are observing issue on A73 core where ERRATUM_858921 is broken.

Do you *only* see this issue on v5.15.y, or is mainline (e.g. v6.2-rc2) also
broken?

I don't see any fix that fits your exact description below, but I do see that
we've made a bunch of changes in this area since.

>
> On 5.15 kernel arch_timer_enable_workaround is set by reading
> arm64_858921_read_cntpct_el0 and arm64_858921_read_cntvct_el0 during timer
> register using following path.
>
> arch_timer_enable_workaround->atomic_set(&timer_unstable_counter_workaround_in_use,
> 1);
>
> [code snap]
> static
> void arch_timer_enable_workaround(const struct arch_timer_erratum_workaround *wa,
> 				  bool local)
> {
> 	int i;
>
> 	if (local) {
> 		__this_cpu_write(timer_unstable_counter_workaround, wa);
> 	} else {
> 		for_each_possible_cpu(i)
> 			per_cpu(timer_unstable_counter_workaround, i) = wa;
> 	}
>
> 	if (wa->read_cntvct_el0 || wa->read_cntpct_el0)
> 		atomic_set(&timer_unstable_counter_workaround_in_use, 1);
>
>
> and based on above workaround enablement , appropriate function to get
> counter is used.
>
> static void __init arch_counter_register(unsigned type)
> {
> 	u64 start_count;
>
> 	/* Register the CP15 based counter if we have one */
> 	if (type & ARCH_TIMER_TYPE_CP15) {
> 		u64 (*rd)(void);
>
> 		if ((IS_ENABLED(CONFIG_ARM64) && !is_hyp_mode_available()) ||
> 		    arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI) {
> 			if (arch_timer_counter_has_wa())
> 				rd = arch_counter_get_cntvct_stable;
> 			else
> 				rd = arch_counter_get_cntvct;
> 		} else {
> 			if (arch_timer_counter_has_wa())
> 				rd = arch_counter_get_cntpct_stable;
> 			else
> 				rd = arch_counter_get_cntpct;
> 		}
> [snap]
> 	/* 56 bits minimum, so we assume worst case rollover */
> 	sched_clock_register(arch_timer_read_counter, 56, arch_timer_rate);
>
>
> As our boot cores are not impacted by errata sched_clock_register() will
> register !arch_timer_counter_has_wa() callback.

It would be helpful to mention this fact (that the system is big.LITTLE, and the
boot cores are not Cortex-A73) earlier in the report.

> Now when errata impacted core boots up and sched_clock_register already
> register will !arch_timer_counter_has_wa() path.
> As sched_clock_register is not per_cpu bases so arch_timer_read_counter will
> always point to !arch_timer_counter_has_wa() function calls.

Hmm... yes, AFAICT this cannot work unless the affected CPUs are up before we
probe, and it doesn't make much sense for arch_counter_register() to look at
arch_timer_counter_has_wa() since it can be called before all CPUs are up.
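
The ordering problem can be illustrated with a small user-space model. This is
only a sketch under stated assumptions: workaround_in_use, read_counter,
counter_register(), cpu_online() and stable_reader_selected() are hypothetical
stand-ins for timer_unstable_counter_workaround_in_use,
arch_timer_read_counter, arch_counter_register() and the CPU bringup path. It
is not the kernel's code and it does not read any real counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model state; none of these names exist in the kernel. */
static bool workaround_in_use;          /* models the global "in use" flag */
static uint64_t (*read_counter)(void);  /* models arch_timer_read_counter  */

/* Dummy readers; the distinct return values only keep them distinguishable. */
static uint64_t read_plain(void)  { return 1; }  /* non-workaround accessor */
static uint64_t read_stable(void) { return 2; }  /* workaround accessor     */

/* Models registration: the reader is chosen exactly once, at probe time. */
static void counter_register(void)
{
	read_counter = workaround_in_use ? read_stable : read_plain;
}

/* Models a CPU coming online; an affected CPU sets the global flag. */
static void cpu_online(bool affected)
{
	if (affected)
		workaround_in_use = true;
}

/*
 * Runs the sequence: boot CPU online, register the counter, late CPU online.
 * Returns true if the registered reader is the workaround-aware one.
 */
static bool stable_reader_selected(bool boot_affected, bool late_affected)
{
	workaround_in_use = false;
	read_counter = NULL;

	cpu_online(boot_affected);
	counter_register();
	assert(read_counter != NULL);
	cpu_online(late_affected);   /* too late: the reader is already fixed */

	return read_counter == read_stable;
}
```

In this model, stable_reader_selected(false, true) returns false: the flag is
set when the affected CPU comes up, but the reader chosen at registration time
is never revisited, which matches the big.LITTLE case in the report.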

> Looks like this bug is side effect of following patch:
>
> commit 0ea415390cd345b7d09e8c9ebd4b68adfe873043
> Author: Marc Zyngier
> Date:   Mon Apr 8 16:49:07 2019 +0100
>
>     clocksource/arm_arch_timer: Use arch_timer_read_counter to access stable
> counters
>
>     Instead of always going via arch_counter_get_cntvct_stable to access the
>     counter workaround, let's have arch_timer_read_counter point to the
>     right method.
>
>     For that, we need to track whether any CPU in the system has a
>     workaround for the counter. This is done by having an atomic variable
>     tracking this.
>
>     Acked-by: Mark Rutland
>     Signed-off-by: Marc Zyngier
>     Signed-off-by: Will Deacon
>

Yeah, that does look to be broken, but I think there are further issues anyway
(e.g. late onlining).

AFAICT we need to detect this *stupidly early* in the CPU bringup path in order
to handle this safely, which is quite painful.

What a great.

Thanks,
Mark.