Date: Wed, 9 Mar 2022 00:01:26 +0000
From: "Russell King (Oracle)"
To: Ard Biesheuvel
Cc: Corentin Labbe, Linus Walleij, Arnd Bergmann, Linux ARM, Linux Kernel Mailing List
Subject: Re: boot flooded with unwind: Index not found
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 02, 2022 at 11:22:29AM +0000, Russell King (Oracle) wrote:
> On Wed, Mar 02, 2022 at 12:19:40PM +0100, Ard Biesheuvel wrote:
> > On Wed, 2 Mar 2022 at 12:12, Russell King (Oracle)
> > wrote:
> > >
> > > On Wed, Mar 02, 2022 at 11:09:49AM +0100, Corentin Labbe wrote:
> > > > The crash disappeared (but the suspicious RCU usage is still here).
> > >
> > > As the trace on those is:
> > >
> > > [ 0.239629] unwind_backtrace from show_stack+0x10/0x14
> > > [ 0.239654] show_stack from init_stack+0x1c54/0x2000
> > >
> > > unwind_backtrace() and show_stack() are both C code, the compiler will
> > > emit the unwind information for it. show_stack() isn't called from
> > > assembly code, only from C code, so the next function's unwind
> > > information should also be generated by the compiler.
> > >
> > > However, init_stack is not a function - it's an array of unsigned long.
> > > There is no way this should appear in the trace, and this suggests that
> > > the unwind of show_stack() has gone wrong.
> > >
> > > I don't see anything obvious in Ard's changes that would cause that
> > > though.
> > >
> > > Did it used to work fine with previous versions of linux-next - those
> > > versions where we had Ard's "arm-vmap-stacks-v6" tag merged in
> > > (commit 2fa394824493) and did this only appear when I merged
> > > "arm-ftrace-for-rmk" (commit 74aaaa1e9bba) ? Did merging
> > > "arm-ftrace-for-rmk" cause any change in your .config?
> > >
> >
> > I can reproduce the RCU warnings, and I have tracked this down to the
> > change I made to return_address() for the graph tracer, which I
> > thought was justified after removing the call to
> > kernel_text_address():
> >
> > --- a/arch/arm/include/asm/ftrace.h
> > +++ b/arch/arm/include/asm/ftrace.h
> > @@ -35,26 +35,8 @@ static inline unsigned long
> > ftrace_call_adjust(unsigned long addr)
> >
> >  #ifndef __ASSEMBLY__
> >
> > -#if defined(CONFIG_FRAME_POINTER) && !defined(CONFIG_ARM_UNWIND)
> > -/*
> > - * return_address uses walk_stackframe to do it's work. If both
> > - * CONFIG_FRAME_POINTER=y and CONFIG_ARM_UNWIND=y walk_stackframe uses unwind
> > - * information. For this to work in the function tracer many functions would
> > - * have to be marked with __notrace. So for now just depend on
> > - * !CONFIG_ARM_UNWIND.
> > - */
> > -
> >  void *return_address(unsigned int);
> >
> > -#else
> > -
> > -static inline void *return_address(unsigned int level)
> > -{
> > -	return NULL;
> > -}
> > -
> > -#endif
> > -
> >  #define ftrace_return_address(n) return_address(n)
> >
> >  #define ARCH_HAS_SYSCALL_MATCH_SYM_NAME
> >
> > However, the function graph tracer works happily with this bit
> > reverted, and so that is probably the best course of action here.
> >
> > I have already sent the patch that reintroduces the
> > kernel_text_address() check - would you prefer a v2 of that one with
> > this change incorporated? Or a second patch that just reverts the
> > above? (Given that the bogus dereference was invoked from
> > return_address() as well, I suspect that this change would make the
> > get_kernel_nofault() change I proposed in this thread redundant)
>
> I'd prefer patches on top of my devel-stable branch, thanks.

To reiterate what I've just put on IRC - we have not got to the bottom of this problem yet - it still very much exists.

There seems to be something of a fundamental issue with the unwinder: it now appears to be going wrong and failing to unwind beyond a couple of functions, and the address it's coming out with appears to be incorrect.

I've only just discovered this because I created my very own bug, and yet again, the timing sucks given the proximity of the merge window.

I'm getting:

[ 13.198803] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[ 13.198820] [] (show_stack) from [] (0xc2be78d4)

for the WARN_ON() stacktrace, and the address that apparently called show_stack() is most definitely rubbish and incorrect. This makes any WARN_ON() condition undebuggable.

This is with both 9183/1 and 9184/1 applied on top of pulling your "arm-ftrace-for-rmk" tag, and also with just the "arm-vmap-stacks-v6" tag.
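The point being argued here, that a reported caller such as init_stack+0x1c54/0x2000 or 0xc2be78d4 cannot be genuine because it does not lie in code, amounts to a range check on each "from" address in a backtrace. A minimal userspace sketch in C follows, for illustration only: DEMO_STEXT, DEMO_ETEXT, demo_is_text() and demo_first_bogus_frame() are hypothetical names, and the bounds are invented rather than taken from any real kernel build. In the kernel itself, kernel_text_address() is the relevant check, and it also accepts module text and other executable regions, not just one fixed range.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * HYPOTHETICAL bounds of the core kernel text, invented for this example.
 * A real kernel derives them from linker symbols (_stext/_etext), and
 * kernel_text_address() additionally accepts module and other text.
 */
#define DEMO_STEXT 0xc0008000UL
#define DEMO_ETEXT 0xc0a00000UL

/* True if addr falls inside the hypothetical [DEMO_STEXT, DEMO_ETEXT) range. */
static bool demo_is_text(unsigned long addr)
{
	return addr >= DEMO_STEXT && addr < DEMO_ETEXT;
}

/*
 * Scan the "called from" addresses of a backtrace and return the index of
 * the first one that cannot be code (for example an address inside
 * init_stack, which is a data array), or n if every entry looks plausible.
 */
static size_t demo_first_bogus_frame(const unsigned long *frames, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!demo_is_text(frames[i]))
			return i;
	return n;
}
```

With the made-up bounds, an array holding 0xc0110010 followed by 0xc2be78d4 makes demo_first_bogus_frame() return 1, flagging the second entry. Passing the check proves little on its own: an unwinder that has gone wrong can still produce an address that happens to fall inside text, so a test like this only catches the most obviously bogus frames.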
This seems to point at one of these patches breaking the unwinder:

a1c510d0adc6 ARM: implement support for vmap'ed stacks
532319b9c418 ARM: unwind: disregard unwind info before stack frame is set up
4ab6827081c6 ARM: unwind: dump exception stack from calling frame
b6506981f880 ARM: unwind: support unwinding across multiple stacks

Given that the unwinder is broken, I wonder whether 9183/1 and 9184/1 are actually required.

I did try to point this problem out a few emails back:

"As the trace on those is:

[ 0.239629] unwind_backtrace from show_stack+0x10/0x14
[ 0.239654] show_stack from init_stack+0x1c54/0x2000

unwind_backtrace() and show_stack() are both C code, the compiler will emit the unwind information for it. show_stack() isn't called from assembly code, only from C code, so the next function's unwind information should also be generated by the compiler.

However, init_stack is not a function - it's an array of unsigned long. There is no way this should appear in the trace, and this suggests that the unwind of show_stack() has gone wrong."

In Corentin's case, there is no way init_stack should ever appear in the stack trace. In my case, it's not init_stack, but 0xc2be78d4.

Can you try testing out a dummy WARN_ON(1) test in your kernel please?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!