Date: Thu, 4 Mar 2021 14:57:30 +0000
From: Mark Rutland
To: Marco Elver
Cc: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
    Michael Ellerman, LKML, linuxppc-dev@lists.ozlabs.org, kasan-dev,
    Catalin Marinas, Will Deacon, linux-arm-kernel@lists.infradead.org,
    broonie@kernel.org
Subject: Re: [PATCH v1] powerpc: Include running function as first entry
    in save_stack_trace() and friends
Message-ID: <20210304145730.GC54534@C02TD0UTHF1T.local>
References: <1802be3e-dc1a-52e0-1754-a40f0ea39658@csgroup.eu>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-kernel@vger.kernel.org

[adding Mark Brown]

On Wed, Mar 03, 2021 at 04:20:43PM +0100, Marco Elver wrote:
> On Wed, Mar 03, 2021 at 03:52PM +0100, Christophe Leroy wrote:
> > On 03/03/2021 at 15:38, Marco Elver wrote:
> > > On Wed, 3 Mar 2021 at 15:09, Christophe Leroy wrote:
> > > >
> > > > It seems like all other sane architectures, namely x86 and arm64
> > > > at least, include the running function as top entry when saving
> > > > stack trace.
> > > >
> > > > Functionalities like KFENCE expect it.
> > > >
> > > > Do the same on powerpc, it allows KFENCE to properly identify the faulting
> > > > function as depicted below. Before the patch KFENCE was identifying
> > > > finish_task_switch.isra as the faulting function.
> > > >
> > > > [ 14.937370] ==================================================================
> > > > [ 14.948692] BUG: KFENCE: invalid read in test_invalid_access+0x54/0x108
> > > > [ 14.948692]
> > > > [ 14.956814] Invalid read at 0xdf98800a:
> > > > [ 14.960664]  test_invalid_access+0x54/0x108
> > > > [ 14.964876]  finish_task_switch.isra.0+0x54/0x23c
> > > > [ 14.969606]  kunit_try_run_case+0x5c/0xd0
> > > > [ 14.973658]  kunit_generic_run_threadfn_adapter+0x24/0x30
> > > > [ 14.979079]  kthread+0x15c/0x174
> > > > [ 14.982342]  ret_from_kernel_thread+0x14/0x1c
> > > > [ 14.986731]
> > > > [ 14.988236] CPU: 0 PID: 111 Comm: kunit_try_catch Tainted: G B 5.12.0-rc1-01537-g95f6e2088d7e-dirty #4682
> > > > [ 14.999795] NIP: c016ec2c LR: c02f517c CTR: c016ebd8
> > > > [ 15.004851] REGS: e2449d90 TRAP: 0301 Tainted: G B (5.12.0-rc1-01537-g95f6e2088d7e-dirty)
> > > > [ 15.015274] MSR: 00009032 CR: 22000004 XER: 00000000
> > > > [ 15.022043] DAR: df98800a DSISR: 20000000
> > > > [ 15.022043] GPR00: c02f517c e2449e50 c1142080 e100dd24 c084b13c 00000008 c084b32b c016ebd8
> > > > [ 15.022043] GPR08: c0850000 df988000 c0d10000 e2449eb0 22000288
> > > > [ 15.040581] NIP [c016ec2c] test_invalid_access+0x54/0x108
> > > > [ 15.046010] LR [c02f517c] kunit_try_run_case+0x5c/0xd0
> > > > [ 15.051181] Call Trace:
> > > > [ 15.053637] [e2449e50] [c005a68c] finish_task_switch.isra.0+0x54/0x23c (unreliable)
> > > > [ 15.061338] [e2449eb0] [c02f517c] kunit_try_run_case+0x5c/0xd0
> > > > [ 15.067215] [e2449ed0] [c02f648c] kunit_generic_run_threadfn_adapter+0x24/0x30
> > > > [ 15.074472] [e2449ef0] [c004e7b0] kthread+0x15c/0x174
> > > > [ 15.079571] [e2449f30] [c001317c] ret_from_kernel_thread+0x14/0x1c
> > > > [ 15.085798] Instruction dump:
> > > > [ 15.088784] 8129d608 38e7ebd8 81020280 911f004c 39000000 995f0024 907f0028 90ff001c
> > > > [ 15.096613] 3949000a 915f0020 3d40c0d1 3d00c085 <8929000a> 3908adb0 812a4b98 3d40c02f
> > > > [ 15.104612] ==================================================================
> > > >
> > > > Signed-off-by: Christophe Leroy
> > >
> > > Acked-by: Marco Elver
> > >
> > > Thank you, I think this looks like the right solution. Just a question below:
> > >
> > ...
> >
> > > > @@ -59,23 +70,26 @@ void save_stack_trace(struct stack_trace *trace)
> > > >
> > > >  	sp = current_stack_frame();
> > > >
> > > > -	save_context_stack(trace, sp, current, 1);
> > > > +	save_context_stack(trace, sp, (unsigned long)save_stack_trace, current, 1);
> > >
> > > This causes ip == save_stack_trace and also below for
> > > save_stack_trace_tsk. Does this mean save_stack_trace() is included in
> > > the trace?
> > > Looking at kernel/stacktrace.c, I think the library wants
> > > to exclude itself from the trace, as it does '.skip = skipnr + 1' (and
> > > '.skip = skipnr + (current == tsk)' for the _tsk variant).
> > >
> > > If the arch-helper here is included, should this use _RET_IP_ instead?
> > >
> >
> > Don't really know, I was inspired by arm64 which has:
> >
> > void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
> > 		     struct task_struct *task, struct pt_regs *regs)
> > {
> > 	struct stackframe frame;
> >
> > 	if (regs)
> > 		start_backtrace(&frame, regs->regs[29], regs->pc);
> > 	else if (task == current)
> > 		start_backtrace(&frame,
> > 				(unsigned long)__builtin_frame_address(0),
> > 				(unsigned long)arch_stack_walk);
> > 	else
> > 		start_backtrace(&frame, thread_saved_fp(task),
> > 				thread_saved_pc(task));
> >
> > 	walk_stackframe(task, &frame, consume_entry, cookie);
> > }
> >
> > But looking at x86 you may be right, so what should be done really ?
>
> x86:
>
> [ 2.843292] calling stack_trace_save:
> [ 2.843705]  test_func+0x6c/0x118
> [ 2.844184]  do_one_initcall+0x58/0x270
> [ 2.844618]  kernel_init_freeable+0x1da/0x23a
> [ 2.845110]  kernel_init+0xc/0x166
> [ 2.845494]  ret_from_fork+0x22/0x30
>
> [ 2.867525] calling stack_trace_save_tsk:
> [ 2.868017]  test_func+0xa9/0x118
> [ 2.868530]  do_one_initcall+0x58/0x270
> [ 2.869003]  kernel_init_freeable+0x1da/0x23a
> [ 2.869535]  kernel_init+0xc/0x166
> [ 2.869957]  ret_from_fork+0x22/0x30
>
> arm64:
>
> [ 3.786911] calling stack_trace_save:
> [ 3.787147]  stack_trace_save+0x50/0x78
> [ 3.787443]  test_func+0x84/0x13c
> [ 3.787738]  do_one_initcall+0x5c/0x310
> [ 3.788099]  kernel_init_freeable+0x214/0x294
> [ 3.788363]  kernel_init+0x18/0x164
> [ 3.788585]  ret_from_fork+0x10/0x30
>
> [ 3.803615] calling stack_trace_save_tsk:
> [ 3.804266]  stack_trace_save_tsk+0x9c/0x100
> [ 3.804541]  test_func+0xc4/0x13c
> [ 3.804803]  do_one_initcall+0x5c/0x310
> [ 3.805031]  kernel_init_freeable+0x214/0x294
> [ 3.805284]  kernel_init+0x18/0x164
> [ 3.805505]  ret_from_fork+0x10/0x30
>
> +Cc arm64 folks.
>
> So I think the arm64 version also has a bug, because I think a user of
> really doesn't care about the library function
> itself. And from reading kernel/stacktrace.c I think it wants to exclude
> itself entirely.
>
> It's a shame that isn't better documented, but I'm
> pretty sure that including the library functions in the trace is not
> useful.

I agree this behaviour isn't desirable, and that the lack of
documentation is unfortunate.

It looks like GCC is happy to give us the function-entry-time FP if we
use __builtin_frame_address(1), and assuming clang is similarly happy
we can do:

| diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
| index ad20981dfda4..5dfbf915eb7f 100644
| --- a/arch/arm64/kernel/stacktrace.c
| +++ b/arch/arm64/kernel/stacktrace.c
| @@ -203,8 +203,8 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
| 		start_backtrace(&frame, regs->regs[29], regs->pc);
| 	else if (task == current)
| 		start_backtrace(&frame,
| -				(unsigned long)__builtin_frame_address(0),
| -				(unsigned long)arch_stack_walk);
| +				(unsigned long)__builtin_frame_address(1),
| +				(unsigned long)__builtin_return_address(0));
| 	else
| 		start_backtrace(&frame, thread_saved_fp(task),
| 				thread_saved_pc(task));

... such that arch_stack_walk() will try to avoid including itself in a
trace, and so the existing skipping should (w/ caveats below) skip
stack_trace_save() or stack_trace_save_tsk().
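
For reference, here's a minimal userspace sketch of how those builtins
relate (assuming a GCC or clang build with frame pointers; the function
names are purely illustrative and not part of the proposed patch):

| /* Illustrative only: shows what each builtin reports. */
| #include <stdio.h>
|
| static void __attribute__((noinline)) walker(void)
| {
| 	/* Frame of walker() itself. */
| 	void *own_fp = __builtin_frame_address(0);
| 	/* Frame of walker()'s caller, i.e. the state at function entry. */
| 	void *caller_fp = __builtin_frame_address(1);
| 	/* Address walker() will return to, inside its caller. */
| 	void *caller_pc = __builtin_return_address(0);
|
| 	printf("own fp:    %p\n", own_fp);
| 	printf("caller fp: %p\n", caller_fp);
| 	printf("caller pc: %p\n", caller_pc);
| }
|
| static void __attribute__((noinline)) caller(void)
| {
| 	walker();
| }
|
| int main(void)
| {
| 	caller();
| 	return 0;
| }

Starting the unwind from the caller's frame and return address, rather
than from __builtin_frame_address(0) and the walker's own address, is
what keeps the walker itself out of the resulting trace.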

If that works for you, I can spin that as a patch, though we'll need to
check that it doesn't introduce a new fencepost error elsewhere.

The bigger problem here is that skipping is dodgy to begin with, and
this is still liable to break in some cases. One big concern is that
(especially with LTO) we cannot guarantee the compiler will not inline
or outline functions, causing the skip value to be too large or too
small. That's liable to happen to callers, and in theory (though
unlikely in practice), portions of arch_stack_walk() or
stack_trace_save() could get outlined too.

Unless we can get some strong guarantees from compiler folk such that
we can guarantee a specific function acts as a boundary for unwinding
(and doesn't itself get split, etc), the only reliable way I can think
of to solve this requires an assembly trampoline.

Whatever we do is liable to need some invasive rework.

Thanks,
Mark.