Date: Thu, 10 Mar 2022 12:35:55 +0000
From: "Russell King (Oracle)"
To: Ard Biesheuvel
Cc: Naresh Kamboju, open list, Linux-Next Mailing List, Linux ARM,
    Linus Walleij, Arnd Bergmann, Corentin Labbe, Stephen Rothwell
Subject: Re: [next] arm: Internal error: Oops: 5 PC is at __read_once_word_nocheck
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 09, 2022 at 09:42:29PM +0100, Ard Biesheuvel wrote:
> On Wed, 9 Mar 2022 at 20:39, Russell King (Oracle) wrote:
> >
> > On Wed, Mar 09, 2022 at 08:14:30PM +0100, Ard Biesheuvel wrote:
> > > The backtrace dumped by __die() uses the pt_regs from the exception
> > > context as the starting point, so the exception entry code that deals
> > > with the condition that triggered the oops is omitted, and does not
> > > have to be unwound.
> >
> > That is true, but that's not really the case I was thinking about.
> > I was thinking more about cases such as RCU stalls, soft lockups,
> > etc.
> >
> > For example:
> >
> > https://www.linuxquestions.org/questions/linux-kernel-70/kenel-v4-4-60-panic-in-igmp6_send-and-and-__neigh_create-4175704721/
> >
> > In that stack trace, the interesting bits are not the beginning of
> > the stack trace down to __irq_svc, but everything beyond __irq_svc,
> > since the lockup is probably caused by being stuck in
> > _raw_write_lock_bh().
> >
> > It's these situations that we will totally destroy debuggability for,
> > and the only way around that would be to force frame pointers and
> > ARM builds (not Thumb-2 as that requires the unwinder)... which means
> > a Thumb-2 kernel soft lockup would be undebuggable.
> >
>
> Indeed.
>
> But that means that the only other choice we have is to retain the
> imprecise nature of the current solution (which usually works fine
> btw), and simply deal with the faulting double dereference of vsp in
> the unwinder code. We simply don't know whether the exception was
> taken at a point where the stack frame is consistent with the unwind
> data.

Okay, further analysis (for the record, since I've said much of this
on IRC):

What we have currently is a robust unwinder that will cope when things
go wrong, such as an interrupt taken during the prologue of a function.
The way it copes is by two mechanisms:

	/* store the highest address on the stack to avoid crossing it */
	low = frame->sp;
	ctrl.sp_high = ALIGN(low, THREAD_SIZE);

These two represent the allowable bounds of the kernel stack. When we
run the unwinder, before each unwind instruction we check whether the
current SP value is getting close to the top of the kernel stack, and
if so, turn on additional checking:

		if ((ctrl.sp_high - ctrl.vrs[SP]) < sizeof(ctrl.vrs))
			ctrl.check_each_pop = 1;

that will ensure that if we go off the top of the kernel stack, the
unwinder will report failure, and not access those addresses.
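
As a minimal sketch of the bounds setup and near-top check described
above (this is not the kernel source: the THREAD_SIZE value, the
16-entry register set and the helper names stack_top() and
near_stack_top() are assumptions made for illustration, and addresses
are modelled as 32-bit values):

```c
#include <stdint.h>

/* Assumed for illustration; the real THREAD_SIZE depends on the kernel config. */
#define THREAD_SIZE UINT32_C(8192)

/* Assumed size of the virtual register set (stands in for ctrl.vrs). */
#define NUM_VRS 16

/* Round x up to a multiple of a (a power of two), in the manner of ALIGN(). */
static uint32_t align_up(uint32_t x, uint32_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Hypothetical helper: upper bound of the stack that contains 'low'. */
uint32_t stack_top(uint32_t low)
{
	return align_up(low, THREAD_SIZE);
}

/*
 * Hypothetical helper: returns 1 when less than one full register set
 * of space remains between sp and sp_high, which is the condition
 * under which each individual pop would need to be checked.
 */
int near_stack_top(uint32_t sp, uint32_t sp_high)
{
	return (sp_high - sp) < NUM_VRS * sizeof(uint32_t);
}
```

With these assumed values, stack_top(0xc0a01f00) gives 0xc0a02000. An
SP of 0xc0a01f00 leaves 256 bytes, so the extra checking stays off; an
SP of 0xc0a01fe0 leaves only 32 bytes, so it is turned on.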

After each instruction, we check whether the SP value is within the
above bounds:

		if (ctrl.vrs[SP] < low || ctrl.vrs[SP] >= ctrl.sp_high)
			return -URC_FAILURE;

This means that the unwinder can never modify SP to point outside of
the kernel stack region identified by low..ctrl.sp_high, thereby
protecting the load inside unwind_pop_register() from ever
dereferencing something outside of the kernel stack. Moreover, it also
prevents the unwinder modifying SP to point below the current stack
frame.

The problem has been introduced by trying to make the unwinder cope
with IRQ stacks in b6506981f880 ("ARM: unwind: support unwinding across
multiple stacks"):

-	if (!load_sp)
+	if (!load_sp) {
 		ctrl->vrs[SP] = (unsigned long)vsp;
+	} else {
+		ctrl->sp_low = ctrl->vrs[SP];
+		ctrl->sp_high = ALIGN(ctrl->sp_low, THREAD_SIZE);
+	}

Now, whenever SP is loaded, we reset the allowable range for the SP
value, and this completely defeats the protections we previously had
which were ensuring that:

1) the SP value doesn't jump back _down_ the kernel stack resulting
   in an infinite unwind loop.
2) the SP value doesn't end up outside the kernel stack.

We need those protections to prevent these problems that are being
reported - and the most efficient way I can think of doing that is to
somehow validate the new SP value _before_ we modify sp_low and
sp_high, so these two limits are always valid.

Merely changing the READ_ONCE_NOCHECK() to be get_kernel_nocheck() will
only partly fix this problem - it will stop the unwinder oopsing the
kernel, but definitely doesn't protect against (1) and doesn't protect
against SP pointing at something that is accessible (e.g. a device or
other kernel memory.)

We're yet again at Thursday, with the last linux-next prior to the
merge window being created this evening, which really doesn't leave
much time to get this sorted...
and we can't say "this code should have been in earlier in the cycle"
this time around, because these changes to the unwinder have been
present in linux-next prior to 5.17-rc2.

Annoyingly, it seems merging stuff earlier in the cycle doesn't
actually solve the problem of these last minute debugging panics.

Any suggestions for what we should do? Can we come up with some way to
validate the new SP value before 6pm UTC this evening?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!