Received: by 2002:a05:6a10:a852:0:0:0:0 with SMTP id d18csp199844pxy; Fri, 7 May 2021 00:55:29 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyvi6UQ8HuPi3Mejim1Ei9TD9yNOzMOzwiMt+gPWRHHN9okJzf3DKuUtemPaZCr56OQ6UQf X-Received: by 2002:a17:90a:1708:: with SMTP id z8mr22180841pjd.51.1620374129733; Fri, 07 May 2021 00:55:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1620374129; cv=none; d=google.com; s=arc-20160816; b=Da4H32DfK4172nkDOQlI0KJBHNRqkoOTIAtGuhUq/0EbUrhvWGMsZCrvyZza2Af6i2 H19d2T6vkgWmPHEWXyfcW91mFRCz/PfLr58T5iG7urkMF1XV1SGqVtzGJ97jgpE8AnjI ZL/Sb5f7aC4bOtVca91562a5QhEE7kfDZ1UPoqI65dGhNdfEW44kLdHL9wdlOtPQqm4W uj0tGL/9IHKUh5szHD6MetymC1hHKrURwObxUZ39bwymOfBBaofwH6JyBBNHR/Gah7O5 NQX2mGKRwmLYsazj+FlmivaBFxXsqu3M9U+uILBXA54+b1TlJaFVhqVY85HMqiVb9cd7 Xsyg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:to:references:message-id :content-transfer-encoding:cc:date:in-reply-to:from:subject :mime-version:dkim-signature:dkim-filter; bh=juupss22foYcpZymKqUksNf88DXt4O8bI54oKjTR0n0=; b=H0m1wDSHlxAyRu68xli4q8Q76uFMVaNoOOHl5iSmMKXtCya5Dxgp5LOTp2Sclzr5Dn kcJfUCcPUdjTtXCYqsr67//hmbKSN+vbXVZMI0nGqZE+J2RbSwVHgjssLAwMW4EOMUbS amMt446lU18rVwJH4mmFu8uJXWugMOefpEuhlm400ok6GANoSctg0Dk2AT/+egIDd2Jj u2XJqN0qxxFCt9co1xE3GvFWHf2yoKpaL17U+FW7WNmEGku+EuT0hhCXz12QCZJJ8Pw3 TpQ5tnFpdRSAf2MBSUhf8AE2CTMf4W0whkfFO7ExktjTekNyiFAENdrJTWntKxN/s0lw QyGg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linux.microsoft.com header.s=default header.b=SgEaDOIr; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linux.microsoft.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id b22si6177291plz.247.2021.05.07.00.55.17; Fri, 07 May 2021 00:55:29 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linux.microsoft.com header.s=default header.b=SgEaDOIr; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linux.microsoft.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235042AbhEGHBk (ORCPT + 99 others); Fri, 7 May 2021 03:01:40 -0400 Received: from linux.microsoft.com ([13.77.154.182]:44780 "EHLO linux.microsoft.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231355AbhEGHBi (ORCPT ); Fri, 7 May 2021 03:01:38 -0400 Received: from minint-m3g9p8n.europe.corp.microsoft.com (unknown [49.207.195.141]) by linux.microsoft.com (Postfix) with ESMTPSA id 5829E20B7178; Fri, 7 May 2021 00:00:36 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com 5829E20B7178 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1620370839; bh=juupss22foYcpZymKqUksNf88DXt4O8bI54oKjTR0n0=; h=Subject:From:In-Reply-To:Date:Cc:References:To:From; b=SgEaDOIrDrLcHRtsDBjt4RVrhDZ/e6wH8+LXCy+JMPTK09FJu+/eZTYb0SroIGaZ/ mqolOktnY9YRGlcjr43R4EiHJ4v6ccX2vLFYFFfVSEVHGC/HCzYO3oOulTeqWhFNB6 fcORXmQkpKr5jdIF7e0SKX6IQbtgoSKnwP6CPNh0= Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.60.0.2.21\)) Subject: Re: [PATCH] optee: Disable shm cache when booting the crash kernel From: Allen Pais In-Reply-To: <20210507035816.426585-1-tyhicks@linux.microsoft.com> Date: Fri, 7 May 2021 12:30:33 +0530 Cc: jens.wiklander@linaro.org, zajec5@gmail.com, Allen Pais , bcm-kernel-feedback-list@broadcom.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, op-tee@lists.trustedfirmware.org Content-Transfer-Encoding: quoted-printable Message-Id: <720CDF03-42F9-43C3-B3B3-999E4A5E2864@linux.microsoft.com> References: <20210225090610.242623-1-allen.lkml@gmail.com> <20210507035816.426585-1-tyhicks@linux.microsoft.com> To: Tyler Hicks X-Mailer: Apple Mail (2.3654.60.0.2.21) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On 07-May-2021, at 9:28 AM, Tyler Hicks = wrote: >=20 > The .shutdown hook is not called after a kernel crash when a kdump > kernel is pre-loaded. A kexec into the kdump kernel takes place as > quickly as possible without allowing drivers to clean up. >=20 > That means that the OP-TEE shared memory cache, which was initialized = by > the kernel that crashed, is still in place when the kdump kernel is > booted. As the kdump kernel is shutdown, the .shutdown hook is called, > which calls optee_disable_shm_cache(), and OP-TEE's > OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not > mapped for the kdump kernel since the cache was set up by the previous > kernel. Trying to dereference the tee_shm pointer or otherwise = translate > the address results in a fault that cannot be handled: >=20 > Unable to handle kernel paging request at virtual address = ffff4317b9c09744 > Mem abort info: > ESR =3D 0x96000004 > EC =3D 0x25: DABT (current EL), IL =3D 32 bits > SET =3D 0, FnV =3D 0 > EA =3D 0, S1PTW =3D 0 > Data abort info: > ISV =3D 0, ISS =3D 0x00000004 > CM =3D 0, WnR =3D 0 > swapper pgtable: 4k pages, 48-bit VAs, pgdp=3D0000000970b1e000 > [ffff4317b9c09744] pgd=3D0000000000000000, p4d=3D0000000000000000 > Internal error: Oops: 96000004 [#1] SMP > Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O) > CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G O = 5.10.19.8 #1 > Hardware name: Redacted (DT) > pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=3D--) > pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363) > lr : optee_disable_shm_cache = (/usr/src/kernel/drivers/tee/optee/call.c:441) > sp : ffff80001005bb70 > x29: ffff80001005bb70 x28: ffff608e74648e00 > x27: ffff80001005bb98 x26: dead000000000100 > x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa > x23: ffff608e74cf8818 x22: ffff608e738be600 > x21: ffff80001005bbc8 x20: ffff608e738be638 > x19: ffff4317b9c09700 x18: ffffffffffffffff > x17: 0000000000000041 x16: ffffba61b5171764 > x15: 0000000000000004 x14: 0000000000000fff > x13: ffffba61b5c9dfc8 x12: 0000000000000003 > x11: 0000000000000000 x10: 0000000000000000 > x9 : ffffba61b5413824 x8 : 00000000ffff4317 > x7 : 0000000000000000 x6 : 0000000000000000 > x5 : 0000000000000000 x4 : 0000000000000000 > x3 : 0000000000000000 x2 : ffff4317b9c09700 > x1 : 00000000ffff4317 x0 : ffff4317b9c09700 > Call trace: > tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363) > optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441) > optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636) > platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800) > device_shutdown (/usr/src/kernel/include/linux/device.h:758 = /usr/src/kernel/drivers/base/core.c:4078) > kernel_restart (/usr/src/kernel/kernel/reboot.c:221 = /usr/src/kernel/kernel/reboot.c:248) > __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 = /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312) > do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 = /usr/src/kernel/arch/arm64/kernel/syscall.c:158 = /usr/src/kernel/arch/arm64/kernel/syscall.c:197) > el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368) > el0_sync_handler = (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428) > el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671) > Code: aa0003f3 b5000060 12800003 14000002 (b9404663) >=20 > When booting the kdump kernel, drain the shared memory cache while = being > careful to not translate the addresses returned from > OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are = drained > and the cache is disabled, proceed with re-enabling the cache so that = we > aren't dealing with invalid addresses while shutting down the kdump > kernel. >=20 > Signed-off-by: Tyler Hicks > --- >=20 > This patch fixes a crash introduced by "optee: fix tee out of memory > failure seen during kexec reboot"[1]. However, I don't think that the > original two patch series[2] plus this patch is the full solution to > properly handling OP-TEE shared memory across kexec. >=20 > While testing this fix, I did about 10 kexec reboots and then = triggered > a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel > became unresponsive during boot while steadily streaming the following > errors to the serial console: >=20 > arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with = "arm-smmu.disable_bypass=3D0" to allow, but this may have security = implications > arm-smmu 64000000.mmu: GFSR 0x00000002, GFSYNR0 0x00000002, = GFSYNR1 0x00002000, GFSYNR2 0x00000000 >=20 > I suspect that this is related to the problems of OP-TEE shared memory > handling across kexec. My current hunch is that while we've disabled = the > shared memory cache with this patch, we haven't unregistered all of = the > addresses that the previous kernel (which crashed) had registered with > OP-TEE and that perhaps OP-TEE OS is still trying to make use those > addresses? >=20 > I'm still pretty early in investigating that assumption and > I'm learning about OP-TEE as I go but I wanted to get this initial > fix-of-the-fix out so that it was clear that the v2 of the series[2] = is > not complete. >=20 > [1] = https://lore.kernel.org/lkml/20210225090610.242623-2-allen.lkml@gmail.com/= > [2] = https://lore.kernel.org/lkml/20210225090610.242623-1-allen.lkml@gmail.com/= #t >=20 > drivers/tee/optee/call.c | 11 ++++++++++- > drivers/tee/optee/core.c | 13 +++++++++++-- > drivers/tee/optee/optee_private.h | 2 +- > 3 files changed, 22 insertions(+), 4 deletions(-) >=20 > diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c > index 6132cc8d014c..799e84bec63d 100644 > --- a/drivers/tee/optee/call.c > +++ b/drivers/tee/optee/call.c > @@ -417,8 +417,10 @@ void optee_enable_shm_cache(struct optee *optee) > * optee_disable_shm_cache() - Disables caching of some shared memory = allocation > * in OP-TEE > * @optee: main service struct > + * @is_mapped: true if the cached shared memory addresses were = mapped by this > + * kernel, are safe to dereference, and should be freed > */ > -void optee_disable_shm_cache(struct optee *optee) > +void optee_disable_shm_cache(struct optee *optee, bool is_mapped) > { > struct optee_call_waiter w; >=20 > @@ -437,6 +439,13 @@ void optee_disable_shm_cache(struct optee *optee) > if (res.result.status =3D=3D OPTEE_SMC_RETURN_OK) { > struct tee_shm *shm; >=20 Thanks Tyler. =46rom what I understand from my email exchange with Jens, I don=E2=80=99= t Think we want to touch optee_disable_shm_cache(), I could be wrong too, @Jens, comments? > + /* > + * Shared memory references that were not mapped = by > + * this kernel must be ignored to prevent a = crash. > + */ > + if (!is_mapped) > + continue; > + > shm =3D reg_pair_to_ptr(res.result.shm_upper32, > res.result.shm_lower32); > tee_shm_free(shm); > diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c > index 69d1f698907c..9985c671bd1f 100644 > --- a/drivers/tee/optee/core.c > +++ b/drivers/tee/optee/core.c > @@ -6,6 +6,7 @@ > #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt >=20 > #include > +#include > #include > #include > #include > @@ -588,7 +589,7 @@ static int optee_remove(struct platform_device = *pdev) > * reference counters and also avoid wild pointers in secure = world > * into the old shared memory range. > */ > - optee_disable_shm_cache(optee); > + optee_disable_shm_cache(optee, true); >=20 > /* > * The two devices have to be unregistered before we can free = the > @@ -618,7 +619,7 @@ static int optee_remove(struct platform_device = *pdev) > */ > static void optee_shutdown(struct platform_device *pdev) > { > - optee_disable_shm_cache(platform_get_drvdata(pdev)); > + optee_disable_shm_cache(platform_get_drvdata(pdev), true); > } >=20 > static int optee_probe(struct platform_device *pdev) > @@ -705,6 +706,14 @@ static int optee_probe(struct platform_device = *pdev) > optee->memremaped_shm =3D memremaped_shm; > optee->pool =3D pool; >=20 > + /* > + * The kexec into the crash kernel did not call our .shutdown = hook. The > + * shm cache objects registered with OP-TEE are not valid for = the crash > + * kernel. > + */ > + if (is_kdump_kernel()) > + optee_disable_shm_cache(optee, false); > + Am glad this solves the kdump crash that we have been seeing. - Allen > optee_enable_shm_cache(optee); >=20 > if (optee->sec_caps & OPTEE_SMC_SEC_CAP_DYNAMIC_SHM) > diff --git a/drivers/tee/optee/optee_private.h = b/drivers/tee/optee/optee_private.h > index e25b216a14ef..16d8c82213e7 100644 > --- a/drivers/tee/optee/optee_private.h > +++ b/drivers/tee/optee/optee_private.h > @@ -158,7 +158,7 @@ int optee_invoke_func(struct tee_context *ctx, = struct tee_ioctl_invoke_arg *arg, > int optee_cancel_req(struct tee_context *ctx, u32 cancel_id, u32 = session); >=20 > void optee_enable_shm_cache(struct optee *optee); > -void optee_disable_shm_cache(struct optee *optee); > +void optee_disable_shm_cache(struct optee *optee, bool is_mapped); >=20 > int optee_shm_register(struct tee_context *ctx, struct tee_shm *shm, > struct page **pages, size_t num_pages, > --=20 > 2.25.1