Received: by 2002:a25:868d:0:0:0:0:0 with SMTP id z13csp1791172ybk; Mon, 11 May 2020 04:28:42 -0700 (PDT) X-Google-Smtp-Source: APiQypLQtth1gPiOJtEqt7nwRHS7TjAx07iLU8hyhWnn7FNRe+PJS0+iNp/fr+5oY7u2/KgnzGwD X-Received: by 2002:a17:906:6b1b:: with SMTP id q27mr12521615ejr.158.1589196522360; Mon, 11 May 2020 04:28:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1589196522; cv=none; d=google.com; s=arc-20160816; b=zmDzgPBXiuej/LhP/I/Zd/7G40Sytglv5Y0UcZOHTmZbKUl+qkjWztjE/QM8nKTWI9 3lgYGAlHd/5YrXXW4XhPCaKrtS+a4k2XB0OxIua2DrZtyiO5Ee3Smx2XqXVt9/rG/QOH anWWDJdupwGY95jd4v3lIjvfkgyq/urRBX8ZTMt+I5B2Y12Vgo/vXDusuh4nZwS1rBtj qj0T5rNNbxQW/4AKKF7iCIjauO/A+uJhgsAkltZIyksT1sDyOSjq+EwA9i5dAB6XFwWs Anng31B0XwQEXI/ldcPCMBQWPqjJNPpz7/5IzLRVdxMp54vdJY+wnKkEGWwVqFr4qnq/ unOA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=UQoUMFJEDqNsaZqucFPwIlK+TSleEV8JzzUokEa2gUA=; b=vC6+U4yJzp5TH8wEYpMKK+BTW9pXpmow59j02xksXXg+iXluUTxoC7Y43SrNebCIhx 5D964O/uG8+LMom0Tpt5YmyVnS3K1bCSLtwvlGw3QL3FI6HiHvQ76ZOR6eLEp08MYBcn 6J3VrSn7leHq8XWg231FolANrJufZo/DC5KcyUbnwR1QqYxHxbAq0OjTZcDfuNLtPyI2 m9233m/zHNiDzNhyApvrVxqTx5Bebp4+wDGlBIZ1r/ZLavuRdeQZtCJLRfN1HIpyVgLX 456dzK2mHS7emTXhC34ZG0NWOmUroL1F4kjee+iI6lcsaq5a3HHtd0xr8FOUU8OpDFce qdnA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id y5si5658556edr.90.2020.05.11.04.28.19; Mon, 11 May 2020 04:28:42 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729570AbgEKLZI (ORCPT + 99 others); Mon, 11 May 2020 07:25:08 -0400 Received: from foss.arm.com ([217.140.110.172]:57958 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726068AbgEKLZH (ORCPT ); Mon, 11 May 2020 07:25:07 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D3D7F101E; Mon, 11 May 2020 04:25:06 -0700 (PDT) Received: from [10.57.36.85] (unknown [10.57.36.85]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 823AC3F305; Mon, 11 May 2020 04:25:05 -0700 (PDT) Subject: Re: arm64: tegra186: bpmp: kernel crash while decompressing initrd To: Mian Yousaf Kaukab , talho@nvidia.com, thierry.reding@gmail.com, jonathanh@nvidia.com, linux-tegra@vger.kernel.org Cc: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, afaerber@suse.de References: <20200508084041.23366-1-ykaukab@suse.de> From: Robin Murphy Message-ID: Date: Mon, 11 May 2020 12:25:00 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Thunderbird/68.8.0 MIME-Version: 1.0 In-Reply-To: <20200508084041.23366-1-ykaukab@suse.de> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-GB Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2020-05-08 9:40 am, Mian Yousaf Kaukab wrote: > I am seeing following kernel crash on Jetson TX2. Board is flashed with > firmware bits from L4T R32.4.2 with upstream u-boot. Crash always > happens while decompressing initrd. Initrd is approximately 80 MiB in > size and compressed with xz (xz --check=crc32 --lzma2=dict=32MiB). > Crash is not observed if the same initrd is compressed with gzip. > [1] was a previous attempt to workaround the same issue. > > [ 0.651168] Trying to unpack rootfs image as initramfs... > [ 2.890171] SError Interrupt on CPU0, code 0xbf40c000 -- SError > [ 2.890174] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G S 5.7.0-rc4-next-20200505 #22 > [ 2.890175] Hardware name: nvidia p2771-0000/p2771-0000, BIOS 2020.04-rc3 03/25/2020 > [ 2.890176] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--) > [ 2.890177] pc : lzma_main+0x648/0x908 > [ 2.890178] lr : lzma_main+0x330/0x908 > [ 2.890179] sp : ffff80001003bb70 > [ 2.890180] x29: ffff80001003bb70 x28: 0000000004d794a4 > [ 2.890183] x27: 0000000004769941 x26: ffff0001eb064000 > [ 2.890185] x25: ffff0001eb060028 x24: 0000000000000002 > [ 2.890187] x23: 0000000000000003 x22: 0000000000000007 > [ 2.890189] x21: 0000000000611f4b x20: ffff0001eb060000 > [ 2.890192] x19: ffff80001003bcb8 x18: 0000000000000068 > [ 2.890194] x17: 00000000000000c0 x16: fffffe00076b2108 > [ 2.890196] x15: 0000000000000800 x14: 0000000000ffffff > [ 2.890198] x13: 0000000000000001 x12: ffff0001eb060000 > [ 2.890200] x11: 0000000000000600 x10: ffff0001eb060028 > [ 2.890202] x9 : 00000000ffbb2a08 x8 : 0000000000000ed0 > [ 2.890204] x7 : 00000000011553ec x6 : 0000000000000000 > [ 2.890206] x5 : 0000000000000000 x4 : 0000000000000006 > [ 2.890208] x3 : 00000000015a29e4 x2 : ffff0001eb062d0c > [ 2.890210] x1 : 000000000000000c x0 : 000000000263de44 > > With some debugging aid ported from Nvidia downstream kernel [2] the > actual cause was found: > > [ 0.761525] Trying to unpack rootfs image as initramfs... > [ 2.955499] CPU0: SError: mpidr=0x80000100, esr=0xbf40c000 > [ 2.955502] CPU1: SError: mpidr=0x80000000, esr=0xbe000000 > [ 2.955505] CPU2: SError: mpidr=0x80000001, esr=0xbe000000 > [ 2.955506] CPU3: SError: mpidr=0x80000101, esr=0xbf40c000 > [ 2.955507] ROC:CCE Machine Check Error: > [ 2.955508] ROC:CCE Registers: > [ 2.955509] STAT: 0xb400000000400415 > [ 2.955510] ADDR: 0x400c00e7a00c > [ 2.955511] MSC1: 0x80ffc > [ 2.955512] MSC2: 0x3900000000800 > [ 2.955513] -------------------------------------- > [ 2.955514] Decoded ROC:CCE Machine Check: > [ 2.955515] Uncorrected (this is fatal) > [ 2.955516] Error reporting enabled when error arrived > [ 2.955517] Error Code = 0x415 > [ 2.955518] Poison Error > [ 2.955518] Command = NCRd (0xc) > [ 2.955519] Address Type = Non-Secure DRAM > [ 2.955521] Address = 0x30039e80 -- 30000000.sysram + 0x39e80 > [ 2.955521] TLimit = 0x3ff > [ 2.955522] Poison Error Mask = 0x80 > [ 2.955523] More Info = 0x800 > [ 2.955524] Timeout Info = 0x0 > [ 2.955525] Poison Info = 0x800 > [ 2.955526] Read Request failed GSC checks > [ 2.955527] Source = L2_1 (A57) (0x1) > [ 2.955528] TID = 0xe > > IIUC, there was read request for 0x30039e80 from EL1/2 which failed. > This address falls in the sysram security aperture and hence a read > from normal mode failed. > > sysram is mapped at 0x3000_0000 to 0x3004_ffff and is managed by the > sram driver (drivers/misc/sram.c). There are two reserved pools for > BPMP driver communication at 0x3004_e000 and 0x3004_f000 of 0x1000 > bytes each. > > sram driver maps complete 0x3000_0000 to 0x3004_ffff range as normal > memory. That's your problem. It's not really worth attempting to reason about, the architecture says that anything mapped as Normal memory may be speculatively accessed at any time, so no amount of second-guessing is going to save you in general. Don't make stuff accessible to the kernel that it doesn't need to access, and especially don't make stuff accessible to the kernel if accessing it will kill the system. > However, only the BPMP reserved pools (0x3004_e000 - 0x3004_ffff) > are accessible from the kernel. Address 0x3003_9e80 is inaccessible > from the kernel and a read to it (which I believe is speculative) > causes the SError. Only driver which uses sysram is not initialized at > this point (rootfs_initcall level). As since > commit d70f5e541ab3 ("firmware: tegra: Make BPMP a regular driver") > bpmp driver is initialized at device_initcall level. > > If none of the drivers on the kernel side using 0x3003_9e80 address > range. Why a read to it occurs even speculatively? Could it be that > some EL3 software didn’t cleanup after itself properly? Any > suggestions on debugging this issue further? > > Another solution suggested in [1] was to add no-memory-wc in sysram > node in device-tree so that sysram is mapped as device-memory. Thus > preventing any speculative access. However, it causes another set of > issues with the bpmp driver. That's may be a discussion for another > time. AFAICS the truly correct solution is what Stephen initially suggested there - for the boot process to somehow describe which parts of SRAM have been reserved by Secure software and/or which parts remain Non-Secure, and for the kernel driver to only map and use the latter. Robin.