Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp956515pxb; Wed, 3 Mar 2021 22:27:36 -0800 (PST) X-Google-Smtp-Source: ABdhPJzrDyh+rxKdK1uEjaeDbyiD9aFFFRrmQF1c7ZaH7qnWERG3Db8RHfhnoXZglfAipRe/ew/t X-Received: by 2002:a05:6402:510f:: with SMTP id m15mr2743254edd.78.1614839256797; Wed, 03 Mar 2021 22:27:36 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1614839256; cv=none; d=google.com; s=arc-20160816; b=k1IJ1IgTL8wnjTuYOGQA0aO2YjSkbncxDCczbsCP/RMNR2lIrDZqsdDk165PxO7qgq QaZn/LzVV6yiDWde/BP7JNBkMoNgh6HE0it7SB77g0f2aV2QY2RZSy58znLLQNu0tREK Fg278twbPMrFav9exXQEiThhWxRBOpygez6e5leVx5lo7cFSJ8C4N7t8jCKZmRBtnuTF XSx8y4eUao0GALCjgKRRPNyLtOngrUYv7Pu9ZLd6EbeL6jdlOdfJSM8GEIrtcI27BAMQ DB+Bgdud4rQ75vdlahupcbFqJrjmlBCfpg2vPAqwm+XdDROm7lS1TiQVpusqsbMMcFv6 astg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:references:in-reply-to:subject:cc:to:from :dkim-signature; bh=qnXEMb74LFjJdDFTchm+rFQQSCK63cIEqJKvUj+9GG8=; b=oacROqT9xB7eGBCfWGIbxY3WYl/tYMXy4ynwRW1rFrgSBnhNhyJDgHkxKQGAdjTeFY pg8W4KGxcJpxHzjcq+SSUh9ROiu+qo6Nc9bmREuxKNXRHrcKtDlVNXR/IoMFPu5tNbJ5 siQVivZDgegsVN6CUdxNnlKw7rzyMQn1L+JnEZ73hvxtPbWaDQYH9Kied6eK7C6bCCc4 XnabkBJeVfY74vI6aAIKhH6yqaMbfOyXl28VjijtaDD97RlFTYdt51eGV/cd53ttIa7z KJW9BJrNrYss/ZMfE5YqfKoUMPcI3hxaiMDXjYlXvscUe8xf8tfGo5agp98prw5zpTc3 ffQQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ellerman.id.au header.s=201909 header.b="gz/h8ua+"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id r18si16855237eds.390.2021.03.03.22.27.14; Wed, 03 Mar 2021 22:27:36 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@ellerman.id.au header.s=201909 header.b="gz/h8ua+"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243412AbhCBP4x (ORCPT + 99 others); Tue, 2 Mar 2021 10:56:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54568 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351205AbhCBNgn (ORCPT ); Tue, 2 Mar 2021 08:36:43 -0500 Received: from ozlabs.org (bilbo.ozlabs.org [IPv6:2401:3900:2:1::2]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 22C76C061788 for ; Tue, 2 Mar 2021 03:42:06 -0800 (PST) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4DqZtX1LwJz9sVt; Tue, 2 Mar 2021 22:40:04 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ellerman.id.au; s=201909; t=1614685205; bh=zOGT8bj8ifBQh24/0m1Tf2gT7cTgyGIpmxD6cBaKfmk=; h=From:To:Cc:Subject:In-Reply-To:References:Date:From; b=gz/h8ua+vCSnhQX5pZeLs0F7oBJEBvvfLAA3vx+vrC7aFcX8YkmXFGwuwdGdn6Lwm 7Peej5u/D0rBl1VfOKX/SaNFsJkFTawze2yEQpwiGucrjlfsb5ExB2dZYYWKgRrFyK xId3CqZWs+4Sbe3Z6ESiSaYRzkD89rs5gf0tNMX6X6T4csNAkoJz8oQqvSsgZOe9jn fflE9j8Vd3t3QcBvdOAmESipqsXXuK0fQPzOGgxA2h9rW8nslZJGtGHaGevD9nRfVC uvWf3KVX/ePWPqXyP060I97XNoT4QHbELGfDkJwhxmcK/7r1I5L/c7Z1DYojYrVy98 lqzDK0gborrEg== From: Michael Ellerman To: Christophe Leroy , Marco Elver Cc: Alexander Potapenko , Benjamin Herrenschmidt , Paul Mackerras , Dmitry Vyukov , LKML , linuxppc-dev@lists.ozlabs.org, kasan-dev Subject: Re: [RFC PATCH v1] powerpc: Enable KFENCE for PPC32 In-Reply-To: <08a96c5d-4ae7-03b4-208f-956226dee6bb@csgroup.eu> References: <51c397a23631d8bb2e2a6515c63440d88bf74afd.1614674144.git.christophe.leroy@csgroup.eu> <08a96c5d-4ae7-03b4-208f-956226dee6bb@csgroup.eu> Date: Tue, 02 Mar 2021 22:40:03 +1100 Message-ID: <87h7ltss18.fsf@mpe.ellerman.id.au> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Christophe Leroy writes: > Le 02/03/2021 =C3=A0 10:53, Marco Elver a =C3=A9crit=C2=A0: >> On Tue, 2 Mar 2021 at 10:27, Christophe Leroy >> wrote: >>> Le 02/03/2021 =C3=A0 10:21, Alexander Potapenko a =C3=A9crit : >>>>> [ 14.998426] BUG: KFENCE: invalid read in finish_task_switch.isra.0= +0x54/0x23c >>>>> [ 14.998426] >>>>> [ 15.007061] Invalid read at 0x(ptrval): >>>>> [ 15.010906] finish_task_switch.isra.0+0x54/0x23c >>>>> [ 15.015633] kunit_try_run_case+0x5c/0xd0 >>>>> [ 15.019682] kunit_generic_run_threadfn_adapter+0x24/0x30 >>>>> [ 15.025099] kthread+0x15c/0x174 >>>>> [ 15.028359] ret_from_kernel_thread+0x14/0x1c >>>>> [ 15.032747] >>>>> [ 15.034251] CPU: 0 PID: 111 Comm: kunit_try_catch Tainted: G B >>>>> 5.12.0-rc1-s3k-dev-01534-g4f14ae75edf0-dirty #4674 >>>>> [ 15.045811] =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>>>> [ 15.053324] # test_invalid_access: EXPECTATION FAILED at mm/kf= ence/kfence_test.c:636 >>>>> [ 15.053324] Expected report_matches(&expect) to be true, but i= s false >>>>> [ 15.068359] not ok 21 - test_invalid_access >>>> >>>> The test expects the function name to be test_invalid_access, i. e. >>>> the first line should be "BUG: KFENCE: invalid read in >>>> test_invalid_access". >>>> The error reporting function unwinds the stack, skips a couple of >>>> "uninteresting" frames >>>> (https://elixir.bootlin.com/linux/v5.12-rc1/source/mm/kfence/report.c#= L43) >>>> and uses the first "interesting" one frame to print the report header >>>> (https://elixir.bootlin.com/linux/v5.12-rc1/source/mm/kfence/report.c#= L226). >>>> >>>> It's strange that test_invalid_access is missing altogether from the >>>> stack trace - is that expected? >>>> Can you try printing the whole stacktrace without skipping any frames >>>> to see if that function is there? >>>> >>> >>> Booting with 'no_hash_pointers" I get the following. Does it helps ? >>> >>> [ 16.837198] =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>> [ 16.848521] BUG: KFENCE: invalid read in finish_task_switch.isra.0+0= x54/0x23c >>> [ 16.848521] >>> [ 16.857158] Invalid read at 0xdf98800a: >>> [ 16.861004] finish_task_switch.isra.0+0x54/0x23c >>> [ 16.865731] kunit_try_run_case+0x5c/0xd0 >>> [ 16.869780] kunit_generic_run_threadfn_adapter+0x24/0x30 >>> [ 16.875199] kthread+0x15c/0x174 >>> [ 16.878460] ret_from_kernel_thread+0x14/0x1c >>> [ 16.882847] >>> [ 16.884351] CPU: 0 PID: 111 Comm: kunit_try_catch Tainted: G B >>> 5.12.0-rc1-s3k-dev-01534-g4f14ae75edf0-dirty #4674 >>> [ 16.895908] NIP: c016eb8c LR: c02f50dc CTR: c016eb38 >>> [ 16.900963] REGS: e2449d90 TRAP: 0301 Tainted: G B >>> (5.12.0-rc1-s3k-dev-01534-g4f14ae75edf0-dirty) >>> [ 16.911386] MSR: 00009032 CR: 22000004 XER: 0000= 0000 >>> [ 16.918153] DAR: df98800a DSISR: 20000000 >>> [ 16.918153] GPR00: c02f50dc e2449e50 c1140d00 e100dd24 c084b13c 0000= 0008 c084b32b c016eb38 >>> [ 16.918153] GPR08: c0850000 df988000 c0d10000 e2449eb0 22000288 >>> [ 16.936695] NIP [c016eb8c] test_invalid_access+0x54/0x108 >>> [ 16.942125] LR [c02f50dc] kunit_try_run_case+0x5c/0xd0 >>> [ 16.947292] Call Trace: >>> [ 16.949746] [e2449e50] [c005a5ec] finish_task_switch.isra.0+0x54/0x2= 3c (unreliable) >>=20 >> The "(unreliable)" might be a clue that it's related to ppc32 stack >> unwinding. Any ppc expert know what this is about? >>=20 >>> [ 16.957443] [e2449eb0] [c02f50dc] kunit_try_run_case+0x5c/0xd0 >>> [ 16.963319] [e2449ed0] [c02f63ec] kunit_generic_run_threadfn_adapter= +0x24/0x30 >>> [ 16.970574] [e2449ef0] [c004e710] kthread+0x15c/0x174 >>> [ 16.975670] [e2449f30] [c001317c] ret_from_kernel_thread+0x14/0x1c >>> [ 16.981896] Instruction dump: >>> [ 16.984879] 8129d608 38e7eb38 81020280 911f004c 39000000 995f0024 90= 7f0028 90ff001c >>> [ 16.992710] 3949000a 915f0020 3d40c0d1 3d00c085 <8929000a> 3908adb0 = 812a4b98 3d40c02f >>> [ 17.000711] =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>> [ 17.008223] # test_invalid_access: EXPECTATION FAILED at mm/kfen= ce/kfence_test.c:636 >>> [ 17.008223] Expected report_matches(&expect) to be true, but is = false >>> [ 17.023243] not ok 21 - test_invalid_access >>=20 >> On a fault in test_invalid_access, KFENCE prints the stack trace based >> on the information in pt_regs. So we do not think there's anything we >> can do to improve stack printing pe-se. > > stack printing, probably not. Would be good anyway to mark the last level= [unreliable] as the ppc does. > > IIUC, on ppc the address in the stack frame of the caller is written by t= he caller. In most tests,=20 > there is some function call being done before the fault, for instance=20 > test_kmalloc_aligned_oob_read() does a call to kunit_do_assertion which p= opulates the address of the=20 > call in the stack. However this is fragile. > > This works for function calls because in order to call a subfunction, a f= unction has to set up a=20 > stack frame in order to same the value in the Link Register, which contai= ns the address of the=20 > function's parent and that will be clobbered by the sub-function call. > > However, it cannot be done by exceptions, because exceptions can happen i= n a function that has no=20 > stack frame (because that function has no need to call a subfunction and = doesn't need to same=20 > anything on the stack). If the exception handler was writting the caller'= s address in the stack=20 > frame, it would in fact write it in the parent's frame, leading to a mess. > > But in fact the information is in pt_regs, it is in regs->nip so KFENCE s= hould be able to use that=20 > instead of the stack. > >>=20 >> What's confusing is that it's only this test, and none of the others. >> Given that, it might be code-gen related, which results in some subtle >> issue with stack unwinding. There are a few things to try, if you feel >> like it: >>=20 >> -- Change the unwinder, if it's possible for ppc32. > > I don't think it is possible. I think this actually is the solution. It seems the good architectures have all added support for arch_stack_walk(), and we have not. Looking at some of the implementations of arch_stack_walk() it seems it's expected that the first entry emitted includes the PC (or NIP on ppc). For us stack_trace_save() calls save_stack_trace() which only emits entries from the stack, which doesn't necessarily include the function NIP is pointing to. So I think it's probably on us to update to that new API. Or at least update our save_stack_trace() to fabricate an entry using the NIP, as it seems that's what callers expect. cheers