References: <20230419225604.21204-1-dianders@chromium.org>
From: Sumit Garg
Date: Tue, 16 May 2023 15:39:21 +0530
Subject: Re: [PATCH v8 00/10] arm64: Add framework to turn an IPI as NMI
To: Doug Anderson
Cc: Mark Rutland, Catalin Marinas, Will Deacon, Daniel Thompson,
 Marc Zyngier, ito-yuichi@fujitsu.com,
 kgdb-bugreport@lists.sourceforge.net, Chen-Yu Tsai, Masayoshi Mizuma,
 Peter Zijlstra, Ard Biesheuvel, "Rafael J . Wysocki",
 linux-arm-kernel@lists.infradead.org, Stephen Boyd, Lecopzer Chen,
 Thomas Gleixner, linux-perf-users@vger.kernel.org, Alexandru Elisei,
 Andrey Konovalov, Ben Dooks, Borislav Petkov, Christophe Leroy,
 "Darrick J. Wong", Dave Hansen, "David S. Miller", "Eric W. Biederman",
 Frederic Weisbecker, Gaosheng Cui, "Gautham R. Shenoy",
 Greg Kroah-Hartman, "Guilherme G. Piccoli", Guo Ren, "H. Peter Anvin",
 Huacai Chen, Ingo Molnar, Ingo Molnar, "Jason A. Donenfeld",
 Jason Wessel, Jianmin Lv, Jiaxun Yang, Jinyang He, Joey Gouly, Kees Cook,
 Laurent Dufour, Masahiro Yamada, Masayoshi Mizuma, Michael Ellerman,
 Nicholas Piggin, "Paul E. McKenney", Philippe Mathieu-Daudé,
 Pierre Gondois, Qing Zhang, "Russell King (Oracle)", Russell King,
 Thomas Bogendoerfer, Ulf Hansson, WANG Xuerui,
 linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, loongarch@lists.linux.dev,
 sparclinux@vger.kernel.org, x86@kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 10 May 2023 at 22:20, Doug Anderson wrote:
>
> Hi,
>
> On Wed, May 10, 2023 at 9:30 AM Mark Rutland wrote:
> >
> > On Wed, May 10, 2023 at 08:28:17AM -0700, Doug Anderson wrote:
> > > Hi,
> >
> > Hi Doug,
> >
> > > On Wed, Apr 19, 2023 at 3:57 PM Douglas Anderson wrote:
> > > > This is an attempt to resurrect Sumit's old patch series [1] that
> > > > allowed us to use the arm64 pseudo-NMI to get backtraces of CPUs and
> > > > also to round up CPUs in kdb/kgdb. The last post from Sumit that I
> > > > could find was v7, so I called this series v8. I haven't copied all of
> > > > his old changelogs here, but you can find them from the link.
> > > >

Thanks Doug for picking up this work and for all your additions/improvements.

> > > > Since v7, I have:
> > > > * Addressed the small amount of feedback that was there for v7.
> > > > * Rebased.
> > > > * Added a new patch that prevents us from spamming the logs with idle
> > > >   tasks.
> > > > * Added an extra patch to gracefully fall back to regular IPIs if
> > > >   pseudo-NMIs aren't there.
> > > >
> > > > Since there appear to be a few different patch series related to
> > > > being able to use NMIs to get stack traces of crashed systems, let me
> > > > try to organize them to the best of my understanding:
> > > >
> > > > a) This series. On its own, a) will (among other things) enable stack
> > > >    traces of all running processes with the soft lockup detector if
> > > >    you've enabled the sysctl "kernel.softlockup_all_cpu_backtrace". On
> > > >    its own, a) doesn't give a hard lockup detector.
> > > >
> > > > b) A different recently-posted series [2] that adds a hard lockup
> > > >    detector based on perf. On its own, b) gives a stack crawl of the
> > > >    locked up CPU but no stack crawls of other CPUs (even if they're
> > > >    locked too). Together with a) + b) we get everything (full lockup
> > > >    detect, full ability to get stack crawls).
> > > >
> > > > c) The old Android "buddy" hard lockup detector [3] that I'm
> > > >    considering trying to upstream. If b) lands then I believe c) would
> > > >    be redundant (at least for arm64). c) on its own is really only
> > > >    useful on arm64 for platforms that can print CPU_DBGPCSR somehow
> > > >    (see [4]). a) + c) is roughly as good as a) + b).
> >
> > > It's been 3 weeks and I haven't heard a peep on this series. That
> > > means nobody has any objections and it's all good to land, right?
> > > Right? :-P

For me it was months of waiting without any feedback. So I think you are
lucky :) or at least better than me at poking the arm64 maintainers.
> >
> > FWIW, there are still longstanding soundness issues in the arm64 pseudo-NMI
> > support (and fixing that requires an overhaul of our DAIF / IRQ flag
> > management, which I've been chipping away at for a number of releases), so I
> > hadn't looked at this in detail yet because the foundations are still somewhat
> > dodgy.
> >
> > I appreciate that this has been around for a while, and it's on my queue to
> > look at.
>
> Ah, thanks for the heads up! We've been thinking about turning this on
> in production in ChromeOS because it will help us track down a whole
> class of field-generated crash reports that are otherwise opaque to
> us. It sounds as if maybe that's not a good idea quite yet? Do you
> have any idea of how much farther along this needs to go? ...of
> course, we've also run into issues with Mediatek devices because they
> don't save/restore GICR registers properly [1]. In theory, we might be
> able to work around that in the kernel.
>
> In any case, even if there are bugs that would prevent turning this on
> for production, it still seems like we could still land this series.
> It simply wouldn't do anything until someone turned on pseudo NMIs,
> which wouldn't happen till the kinks are worked out.

I agree here. We should be able to make the foundations robust later on.
IMHO, until we turn on features surrounding pseudo NMIs, I am not sure
how we can have true confidence in the underlying robustness.

-Sumit

>
> ...actually, I guess I should say that if all the patches of the
> current series do land then it actually _would_ still do something,
> even without pseudo-NMI. Assuming the last patch looks OK, it would at
> least start falling back to using regular IPIs to do backtraces. That
> wouldn't get backtraces on hard locked up CPUs but it would be better
> than what we have today where we don't get any backtraces. This would
> get arm64 on par with arm32...
>
> [1] https://issuetracker.google.com/281831288
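
For readers following the fallback discussion above, here is a toy model of which CPUs would be expected to answer a backtrace request with and without pseudo-NMIs. It is only a sketch of the behaviour described in the thread, not kernel code: `Cpu`, `irqs_masked`, and `backtrace_responders` are hypothetical names, and a hard locked up CPU is modelled simply as one with interrupts masked.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cpu:
    cpu_id: int
    irqs_masked: bool  # hypothetical flag: True models a hard locked up CPU


def backtrace_responders(cpus: List[Cpu], pseudo_nmi_enabled: bool) -> List[int]:
    """Return ids of CPUs expected to answer a backtrace request (toy model)."""
    if pseudo_nmi_enabled:
        # Assumption: an NMI-like IPI is still delivered while ordinary
        # interrupts are masked, so every CPU responds.
        return [c.cpu_id for c in cpus]
    # Fallback path: a regular IPI is only taken by CPUs that have not
    # masked interrupts, so hard locked up CPUs stay silent.
    return [c.cpu_id for c in cpus if not c.irqs_masked]


cpus = [Cpu(0, False), Cpu(1, True), Cpu(2, False)]
print(backtrace_responders(cpus, pseudo_nmi_enabled=True))   # [0, 1, 2]
print(backtrace_responders(cpus, pseudo_nmi_enabled=False))  # [0, 2]
```

In this model the fallback loses only CPU 1, which matches the point made in the thread that regular IPIs would not reach hard locked up CPUs but would still be better than getting no backtraces at all.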