Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp928972iob; Wed, 4 May 2022 10:52:13 -0700 (PDT) X-Google-Smtp-Source: ABdhPJztEDpYSxHrV9y4bAH4PcJ9teyVSLwB33rLznwFYBi3osOtY7o5wlClq13Oj5kDLsProKAJ X-Received: by 2002:a17:90a:e7c3:b0:1dc:5d85:9fd0 with SMTP id kb3-20020a17090ae7c300b001dc5d859fd0mr734584pjb.168.1651686733651; Wed, 04 May 2022 10:52:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1651686733; cv=none; d=google.com; s=arc-20160816; b=uR3BvCtNUykpRjQ0JvIe3+HHZ56asedsJx0hxFpUrexAAtN971qQQrx0PhNsSLns1k YET4QM6EnLcZxJTGMi/GVrLUebt4d23dw5KbW83uSOmlsS0spqWmSVLFwU5RQbbJCJvA q2o0kQZsUOP9HI3IVxCt54I26/F2r7WLnfRd1sFPneTguhV1/A497Kq05nSuurzlDNWS 6PfsZ8MyZc/Q5pcnkE+DMv9j1s1n6almx4bGeCtMfUJE0ktEKI4yPlDysTWztRaFH39R uPO9vBYjNSxbJbHhGQKLdD/nAndey93gW0Sc539hUnN4mszfxgdv7Eo1tbZkpfD46otj Tw+Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=czLZHU7MG6H1spBLHXY5jaZbzV59tXkMuN+Ck8Z7wc4=; b=R+INfquq33qqNAxSdl7aUU4pBvoO6tIeKICgz0d1N529HbePMmO+d+ElvdXZnB7nv9 pb5ZVgQOf9lPbsYGg9lwD/+y3odCSkPK3ZetLCj/W5PG/ApqWIsykipH5BOyTXIvj4s8 uDHN56cpUiGC33unPQdaS8DgDPaWLeDQOgnBwX+2f5qiwLnaADmBAA9l5xIWSmJCYjSi Jh39dIgiy7rBzgQK+5MT/zqDa8WMtQZAhzlm/e8SSI9GlWWA5zNFqrAxdz3PAg6KD4wZ AtwQRMyHKvh/wzSXtWZlif12BdCWgh/+alEW13S8pjzVKJYHDPxO3rq5mA+VcNARP5Yu sF1A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=hhWjsHnm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 17-20020a630a11000000b003c14501027asi14104417pgk.774.2022.05.04.10.51.56; Wed, 04 May 2022 10:52:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=hhWjsHnm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243473AbiECWHy (ORCPT + 99 others); Tue, 3 May 2022 18:07:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49238 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234803AbiECWHx (ORCPT ); Tue, 3 May 2022 18:07:53 -0400 Received: from mail-oa1-x2d.google.com (mail-oa1-x2d.google.com [IPv6:2001:4860:4864:20::2d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 85FB9424B5 for ; Tue, 3 May 2022 15:04:19 -0700 (PDT) Received: by mail-oa1-x2d.google.com with SMTP id 586e51a60fabf-e2fa360f6dso18598641fac.2 for ; Tue, 03 May 2022 15:04:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=czLZHU7MG6H1spBLHXY5jaZbzV59tXkMuN+Ck8Z7wc4=; b=hhWjsHnmVba6J3iaRJaO09nZL4LxjIq8C9W91LOfNC4XgTIwgTHymewopsSR7xhrbE wJfMPDofFT9Orm5X1Xmnq6csdeG70AmAjQHsJ6oidSvnu+c5NqT0zTpfwvGt6c7imJaY Yx4JIofQGY7s+yKOtcqFuF1p2zpjcELmsl0YE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=czLZHU7MG6H1spBLHXY5jaZbzV59tXkMuN+Ck8Z7wc4=; b=wQRguaFNdmzFsqJ2OgQd9uSHQYMqJu/+kp2fFKGqum10vQjkTnzWwGpp8VYOqWjFdb 0ZLQ90qLVMQZwJ17/8JslDxi6dYObyOg3LKuMakH3NmktoI1Jp1p1KI6h7qGgjgtkH5v bIqM0/njMSa5kam3ZVjo0TJPC9cyVYyYzUXpKSdqj2refYtzb6Y+0vmLHoaR4bGAqxcb cWOi8QzxvqX1dW+YmDCzy67BC8uGGWJtrVF/ksfu4Bs5wTlKpBEgcNWQIPNbYiZWDvPb SJ8fNIeOXelPr+47BwDppjKJQWRQavAV2eG8KhZZjFUgFaD8+yIxhFvh8PBc+GYneZxQ VQFg== X-Gm-Message-State: AOAM532oEsiBoZI2xEVgK+NTxvVzwdMVxOUeSiMeWXbiNRFlrBy5eu54 6RAbdkJGBAjo7LGf+oXrOuHEhZXTJ3mzYCcE X-Received: by 2002:a05:6870:d68e:b0:e2:af08:6cc3 with SMTP id z14-20020a056870d68e00b000e2af086cc3mr2682022oap.189.1651615458636; Tue, 03 May 2022 15:04:18 -0700 (PDT) Received: from mail-oi1-f175.google.com (mail-oi1-f175.google.com. [209.85.167.175]) by smtp.gmail.com with ESMTPSA id r4-20020a056830120400b0060603221240sm4405948otp.16.2022.05.03.15.04.18 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 03 May 2022 15:04:18 -0700 (PDT) Received: by mail-oi1-f175.google.com with SMTP id m11so19573516oib.11 for ; Tue, 03 May 2022 15:04:18 -0700 (PDT) X-Received: by 2002:a05:6808:219a:b0:325:93fc:e0fd with SMTP id be26-20020a056808219a00b0032593fce0fdmr2775646oib.241.1651615054192; Tue, 03 May 2022 14:57:34 -0700 (PDT) MIME-Version: 1.0 References: <20220427224924.592546-1-gpiccoli@igalia.com> <20220427224924.592546-5-gpiccoli@igalia.com> In-Reply-To: From: Evan Green Date: Tue, 3 May 2022 14:56:58 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 04/30] firmware: google: Convert regular spinlock into trylock on panic path To: "Guilherme G. Piccoli" Cc: Andrew Morton , bhe@redhat.com, pmladek@suse.com, kexec@lists.infradead.org, LKML , bcm-kernel-feedback-list@broadcom.com, linuxppc-dev@lists.ozlabs.org, linux-alpha@vger.kernel.org, linux-edac@vger.kernel.org, linux-hyperv@vger.kernel.org, linux-leds@vger.kernel.org, linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org, Linux PM , linux-remoteproc@vger.kernel.org, linux-s390@vger.kernel.org, linux-tegra@vger.kernel.org, linux-um@lists.infradead.org, linux-xtensa@linux-xtensa.org, netdev@vger.kernel.org, openipmi-developer@lists.sourceforge.net, rcu@vger.kernel.org, sparclinux@vger.kernel.org, xen-devel@lists.xenproject.org, x86@kernel.org, kernel-dev@igalia.com, kernel@gpiccoli.net, halves@canonical.com, fabiomirmar@gmail.com, alejandro.j.jimenez@oracle.com, Andy Shevchenko , Arnd Bergmann , Borislav Petkov , Jonathan Corbet , d.hatayama@jp.fujitsu.com, dave.hansen@linux.intel.com, dyoung@redhat.com, feng.tang@intel.com, Greg Kroah-Hartman , mikelley@microsoft.com, hidehiro.kawai.ez@hitachi.com, jgross@suse.com, john.ogness@linutronix.de, Kees Cook , luto@kernel.org, mhiramat@kernel.org, mingo@redhat.com, paulmck@kernel.org, peterz@infradead.org, rostedt@goodmis.org, senozhatsky@chromium.org, Alan Stern , Thomas Gleixner , vgoyal@redhat.com, vkuznets@redhat.com, Will Deacon , Ard Biesheuvel , David Gow , Julius Werner Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-2.7 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Guilherme, On Tue, May 3, 2022 at 12:12 PM Guilherme G. Piccoli wrote: > > On 03/05/2022 15:03, Evan Green wrote: > > [...] > > gsmi_shutdown_reason() is a common function called in other scenarios > > as well, like reboot and thermal trip, where it may still make sense > > to wait to acquire a spinlock. Maybe we should add a parameter to > > gsmi_shutdown_reason() so that you can get your change on panic, but > > we don't convert other callbacks into try-fail scenarios causing us to > > miss logs. > > > > Hi Evan, thanks for your feedback, much appreciated! > What I've done in other cases like this was to have a helper checking > the spinlock in the panic notifier - if we can acquire that, go ahead > but if not, bail out. For a proper example of an implementation, check > patch 13 of the series: > https://lore.kernel.org/lkml/20220427224924.592546-14-gpiccoli@igalia.com/ . > > Do you agree with that, or prefer really a parameter in > gsmi_shutdown_reason() ? I'll follow your choice =) I'm fine with either, thanks for the link. Mostly I want to make sure other paths to gsmi_shutdown_reason() aren't also converted to a try. > > > > Though thinking more about it, is this really a Good Change (TM)? The > > spinlock itself already disables interrupts, meaning the only case > > where this change makes a difference is if the panic happens from > > within the function that grabbed the spinlock (in which case the > > callback is also likely to panic), or in an NMI that panics within > > that window. The downside of this change is that if one core was > > politely working through an event with the lock held, and another core > > panics, we now might lose the panic log, even though it probably would > > have gone through fine assuming the other core has a chance to > > continue. > > My feeling is that this is a good change, indeed - a lot of places are > getting changed like this, in this series. > > Reasoning: the problem with your example is that, by default, secondary > CPUs are disabled in the panic path, through an IPI mechanism. IPIs take > precedence and interrupt the work in these CPUs, effectively > interrupting the "polite work" with the lock held heh The IPI can only interrupt a CPU with irqs disabled if the IPI is an NMI. I haven't looked before to see if we use NMI IPIs to corral the other CPUs on panic. On x86, I grepped my way down to native_stop_other_cpus(), which looks like it does a normal IPI, waits 1 second, then does an NMI IPI. So, if a secondary CPU has the lock held, on x86 it has roughly 1s to finish what it's doing and re-enable interrupts before smp_send_stop() brings the NMI hammer down. I think this should be more than enough time for the secondary CPU to get out and release the lock. So then it makes sense to me that you're fixing cases where we panicked with the lock held, or hung with the lock held. Given the 1 second grace period x86 gives us, I'm on board, as that helps mitigate the risk that we bailed out early with the try and should have spun a bit longer instead. Thanks. -Evan > > Then, such CPU is put to sleep and we finally reach the panic notifier > hereby discussed, in the main CPU. If the other CPU was shut-off *with > the lock held*, it's never finishing such work, so the lock is never to > be released. Conclusion: the spinlock can't be acquired, hence we broke > the machine (which is already broken, given it's panic) in the path of > this notifier. > This should be really rare, but..possible. So I think we should protect > against this scenario. > > We can grab others' feedback if you prefer, and of course you have the > rights to refuse this change in the gsmi code, but from my > point-of-view, I don't see any advantage in just assume the risk, > specially since the change is very very simple. > > Cheers, > > > Guilherme