Date: Wed, 9 Mar 2022 17:11:40 +0000
From: "Russell King (Oracle)"
To: Naresh Kamboju
Cc: Ard Biesheuvel, open list, Linux-Next Mailing List, Linux ARM,
    Linus Walleij, Arnd Bergmann, Corentin Labbe, Stephen Rothwell
Subject: Re: [next] arm: Internal error: Oops: 5 PC is at __read_once_word_nocheck
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 09, 2022 at 10:08:25PM +0530, Naresh Kamboju wrote:
> Hi Russell,
>
> On Wed, 9 Mar 2022 at 20:37, Russell King (Oracle) wrote:
> >
> > On Wed, Mar 09, 2022 at 03:57:32PM +0100, Ard Biesheuvel wrote:
> > > On Wed, 9 Mar 2022 at 15:44, Naresh Kamboju wrote:
> > >
> >
> > Well, we unwound until:
> >
> > __irq_svc from migrate_disable+0x0/0x70
> >
> > and then crashed - and the key thing there is that we're at the start
> > of migrate_disable() when we took an interrupt.
> >
> > For some reason, this triggers an access to address 0x10, which faults.
> > We then try unwinding again, and successfully unwind all the way back
> > to the same point (the line above) which then causes the unwinder to
> > again access address 0x10, and the cycle repeats with the stack
> > growing bigger and bigger.
> >
> > I'd suggest also testing without the revert but with my patch.
>
> I have tested your patch on top of linux next-20220309 and still see kernel
> crash as below [1]. build link [2].
>
> [ 26.812060] 8<--- cut here ---
> [ 26.813459] Unhandled fault: page domain fault (0x01b) at 0xb6a3ab70
> [ 26.816139] [b6a3ab70] *pgd=fb28a835
> [ 26.817770] Internal error: : 1b [#1] SMP ARM
> [ 26.819636] Modules linked in:
> [ 26.820956] CPU: 0 PID: 211 Comm: haveged Not tainted 5.17.0-rc7-next-20220309 #1
> [ 26.824519] Hardware name: Generic DT based system
> [ 26.827148] PC is at __read_once_word_nocheck+0x0/0x8
> [ 26.829856] LR is at unwind_frame+0x7dc/0xab4
>
> - Naresh
>
> [1] https://lkft.validation.linaro.org/scheduler/job/4688599#L596
> [2] https://builds.tuxbuild.com/269gYLGuAdmltuLhIUDAjS2fg1Q/

I think the problem has just moved:

[ 27.113085] __irq_svc from __copy_to_user_std+0x24/0x378

The code at the start of __copy_to_user_std is:

   0:   e3a034bf        mov     r3, #-1090519040        ; 0xbf000000
   4:   e243c001        sub     ip, r3, #1
   8:   e05cc000        subs    ip, ip, r0
   c:   228cc001        addcs   ip, ip, #1
  10:   205cc002        subscs  ip, ip, r2
  14:   33a00000        movcc   r0, #0
  18:   e320f014        csdb
  1c:   e3a03000        mov     r3, #0
  20:   e92d481d        push    {r0, r2, r3, r4, fp, lr}
  24:   e1a0b00d        mov     fp, sp

and the unwind information will be:

0xc056f14c : @0xc0b89b84
  Compact model index: 1
  0x9b      vsp = r11
  0xb1 0x0d pop {r0, r2, r3}
  0x84 0x81 pop {r4, r11, r14}
  0xb0      finish

The problem is that the unwind information says "starting at offset 0x1c,
to unwind do the following operations". The first of which is to move r11
(fp) to the stack pointer. However, r11 isn't set up until function offset
0x24.
You've hit that instruction, which hasn't executed yet, but the stack has
been modified by pushing r0, r2-r4, fp and lr onto it.

Given this, there is no way that the unwinder (as it currently stands)
can do its job properly between 0x1c and 0x24.

I don't think this is specifically caused by Ard's patches, but by the
addition of KASAN, which has the effect of calling the unwinder at random
points in the kernel (when an interrupt happens) and it's clear from the
above that there are windows in the code where, if we attempt to unwind
using the unwind information, we will fail because the program state is
not consistent with the unwind information.

Ard's patch that changes:

	ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));

to use get_kernel_nofault() should have the effect of protecting against
the oops, but the side effect is that it is fundamentally not possible
with the way these things are to unwind at these points - which means
it's not possible to get a stacktrace there.

So, I don't think this is a "new" problem, but a weakness of using the
unwinder to get a backtrace for KASAN.

Do you have any way to work out exactly when this problem first appeared?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
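
For anyone following along, the window between offsets 0x1c and 0x24 can be
illustrated with a small toy model. This is only a sketch, not kernel code:
it interprets the four unwind opcodes quoted above against a made-up machine
state, once with "mov fp, sp" already executed and once without. The names
run_unwind, state_after_push and UnwindFault, and every register value and
address, are hypothetical and chosen purely for illustration. Only the
opcode bytes and the push list come from the message above.

```python
# Toy model of the ARM EHABI unwind opcodes quoted in this message.
# Not kernel code: all register values and addresses are invented.


class UnwindFault(Exception):
    """The toy unwinder tried to read an address that is not mapped."""


def run_unwind(ops, regs, memory):
    """Interpret a small subset of EHABI opcodes; return the caller's regs."""
    regs = list(regs)      # r0..r15; r11 = fp, r13 = sp, r14 = lr, r15 = pc
    vsp = regs[13]         # virtual stack pointer starts at the current sp

    def pop():
        nonlocal vsp
        if vsp not in memory:
            raise UnwindFault(f"read of unmapped address {vsp:#x}")
        value = memory[vsp]
        vsp += 4
        return value

    i = 0
    while i < len(ops):
        op = ops[i]
        i += 1
        if op == 0xB0:                       # finish
            break
        elif op == 0xB1:                     # pop r0-r3 under a 4-bit mask
            mask = ops[i]
            i += 1
            for reg in range(4):
                if mask & (1 << reg):
                    regs[reg] = pop()
        elif (op & 0xF0) == 0x90:            # vsp = r[nnnn]
            vsp = regs[op & 0x0F]
        elif (op & 0xF0) == 0x80:            # pop r4-r15 under a 12-bit mask
            mask = ((op & 0x0F) << 8) | ops[i]
            i += 1
            for bit in range(12):
                if mask & (1 << bit):
                    regs[4 + bit] = pop()
        else:
            raise ValueError(f"opcode {op:#x} is not modelled here")
    regs[13] = vsp
    regs[15] = regs[14]                      # return address comes from lr
    return regs


ENTRY_SP = 0x1000        # hypothetical sp on entry to the function
CALLER_FP = 0xDEAD0000   # hypothetical stale r11; deliberately not mapped
CALLER_LR = 0xC0100000   # hypothetical return address
UNWIND_OPS = [0x9B, 0xB1, 0x0D, 0x84, 0x81, 0xB0]


def state_after_push(fp_set_up):
    """Registers and stack just after 'push {r0, r2, r3, r4, fp, lr}'."""
    regs = [0] * 16
    regs[0], regs[2], regs[3], regs[4] = 0xA0, 0xA2, 0, 0xA4
    regs[11], regs[14] = CALLER_FP, CALLER_LR
    sp = ENTRY_SP - 6 * 4
    pushed = [regs[0], regs[2], regs[3], regs[4], regs[11], regs[14]]
    memory = {sp + 4 * n: word for n, word in enumerate(pushed)}
    regs[13] = sp
    if fp_set_up:          # 'mov fp, sp' at offset 0x24 has executed
        regs[11] = sp
    return regs, memory


def try_unwind(fp_set_up):
    regs, memory = state_after_push(fp_set_up)
    try:
        out = run_unwind(UNWIND_OPS, regs, memory)
        return f"sp={out[13]:#x} fp={out[11]:#x} pc={out[15]:#x}"
    except UnwindFault as exc:
        return f"fault: {exc}"


print("after  mov fp, sp:", try_unwind(True))
print("before mov fp, sp:", try_unwind(False))
```

With fp set up, the toy unwinder walks the six pushed words and lands back
on the entry sp. Without it, the very first opcode loads the stale r11 into
vsp and the next read goes to whatever address the caller happened to leave
there, which is the same shape of failure as the fault in the log above.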