Received: by 2002:a05:7412:b130:b0:e2:908c:2ebd with SMTP id az48csp2256978rdb; Mon, 20 Nov 2023 06:30:02 -0800 (PST) X-Google-Smtp-Source: AGHT+IH9gKMgNKFd8Cr9a0ldDnsGKxoGNun82953QIuAtNGoMojFU3w1tgjEToCMNRAIR4Tbn/b/ X-Received: by 2002:a17:902:7c96:b0:1cc:7a8e:606b with SMTP id y22-20020a1709027c9600b001cc7a8e606bmr5721246pll.43.1700490602493; Mon, 20 Nov 2023 06:30:02 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1700490602; cv=none; d=google.com; s=arc-20160816; b=00+QmObQ0kKWQbK16dYtS7FiVOVrAuSBg1Poq60AthIEqrL7b5M/bppbqK7RQYCwiQ nPS9HNAUmDdMlL43Y6svD2jKepSm9jkb8rctFIrLkretgH2q3utpcBY3z2SW0KK51EVi VuwQLg5aHtGRrOhtC9GMDcbmExt2cHyRu70fXQvjGF/ltZZzWND8crT2wJkAwd3GV8of 8zBOcSoWsBcow4kOKUtekVV1kCTsXYHDgwY3JHqxsK6CCLhP9FJaKiv6YbbZZorxiBMF esvgm8TovqPl4sCQw14g7Q8aLenRv1KdyPGA52Zy2aFAWTQEWRAyNojHhorcQ1r7VxPO bzpQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=Xd/N7BfrUt0GpDcdVP3L83LJX6QKDIJHelto8R9Trks=; fh=g6WJsk/hBEqlJ8cCPhwtHUTXOBOM+sUF8inssB7OczM=; b=b1dNtrU43r6U7aQdPChz3kvel8fx41nfOlXbmy0HMjj82xuMVn3Mpvb2vZqkdVP/4V K3nbrqt6oct9EkF5ldXJd8cJjlmCJ1v6dSgOa22OJV8DYA8TcW93hKiPWIj6Ex5cn+Tt Mp3ZmOOg9Cm2PL1y6N9IqN4dirK5g+8tLM5jvtGM1UI/BJ4+p1qpTj9lGp6wKI0yfPJh 7k8ecRV7xY6CUTqM5tArZumSDy/eLvUevzBi2WoTH0GFIBE4qJwwUXcPFawBf63X1VNW Lxg83MDb4/MDFqdXU924gx0g7C2/6xc3KKK13BXf1/Y8PvNl5Kw69//wJ1KLHTa8dfom qhFA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from groat.vger.email (groat.vger.email. [2620:137:e000::3:5]) by mx.google.com with ESMTPS id d8-20020a170903230800b001cc60ca8f43si8442312plh.358.2023.11.20.06.30.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 20 Nov 2023 06:30:02 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) client-ip=2620:137:e000::3:5; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id B7DE880A49A5; Mon, 20 Nov 2023 06:29:55 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233546AbjKTO3b (ORCPT + 99 others); Mon, 20 Nov 2023 09:29:31 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48994 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233522AbjKTO33 (ORCPT ); Mon, 20 Nov 2023 09:29:29 -0500 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id D486AED for ; Mon, 20 Nov 2023 06:29:24 -0800 (PST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id ACF25FEC; Mon, 20 Nov 2023 06:30:09 -0800 (PST) Received: from FVFF77S0Q05N.cambridge.arm.com (FVFF77S0Q05N.cambridge.arm.com [10.1.25.119]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 82A383F6C4; Mon, 20 Nov 2023 06:29:22 -0800 (PST) Date: Mon, 20 Nov 2023 14:29:17 +0000 From: Mark Rutland To: qiwuchen55@gmail.com Cc: catalin.marinas@arm.com, will@kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, chenqiwu Subject: Re: [PATCH] arm64: Add user stacktrace support Message-ID: References: <20231118134504.154842-1-qiwu.chen@transsion.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20231118134504.154842-1-qiwu.chen@transsion.com> X-Spam-Status: No, score=-0.8 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Mon, 20 Nov 2023 06:29:56 -0800 (PST) On Sat, Nov 18, 2023 at 09:45:04PM +0800, qiwuchen55@gmail.com wrote: > From: chenqiwu > > 1. Introduce and export arch_dump_user_stacktrace() API to support > user stacktrace dump for a user task (both current and non-current task). > A example test about the log format of user stacktrace as shown below: > [test-515] Dump user backtrace: > <0xffffb0c1a750> in /lib/aarch64-linux-gnu/libc-2.32.so[ffffb0b53000-ffffb0cb1000] > <0xaaaacbf8097c> in /mnt/test[aaaacbf80000-aaaacbf81000] > <0xffffb0b778b8> in /lib/aarch64-linux-gnu/libc-2.32.so[ffffb0b53000-ffffb0cb1000] > <0xaaaacbf80834> in /mnt/test[aaaacbf80000-aaaacbf81000] Where is this used? We already have user stacktracing code in arch/arm64/kernel/perf_callchain.c which doesn't depend on this API. What does this API enable that we don't support today? > 2. Add arch_stack_walk_user() implementation to support userstacktrace transsionce option. What is this 'userstacktrace transsionce option' ? > A example test about the output format of ftrace userstacktrace as shown below: > bash-489 [000] ..... 2167.660775: sched_process_fork: comm=bash pid=489 child_comm=bash child_pid=596 > bash-489 [000] ..... 2167.660787: > => /lib/aarch64-linux-gnu/libc-2.32.so[+0xa76d8] > => /bin/bash[+0x5f354] > => /bin/bash[+0x4876c] > => /bin/bash[+0x4aec4] > => /bin/bash[+0x4da48] > => /bin/bash[+0x4b710] > => /bin/bash[+0x4c31c] > => /bin/bash[+0x339b0] > > Tested-by-by: chenqiwu > Signed-off-by: chenqiwu > --- > arch/arm64/Kconfig | 1 + > arch/arm64/kernel/stacktrace.c | 208 +++++++++++++++++++++++++++++++++ > include/linux/stacktrace.h | 10 ++ > 3 files changed, 219 insertions(+) As above, we already have user stacktracing code, and we shouldn't add *distinct* unwinders. Either that code should be factored out and reused, or this code should replace it. > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index 7b071a004..4c5066f88 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -255,6 +255,7 @@ config ARM64 > select TRACE_IRQFLAGS_SUPPORT > select TRACE_IRQFLAGS_NMI_SUPPORT > select HAVE_SOFTIRQ_ON_OWN_STACK > + select USER_STACKTRACE_SUPPORT > help > ARM 64-bit (AArch64) Linux support. > > diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c > index 17f66a74c..4e7bf2922 100644 > --- a/arch/arm64/kernel/stacktrace.c > +++ b/arch/arm64/kernel/stacktrace.c > @@ -215,6 +215,214 @@ static bool dump_backtrace_entry(void *arg, unsigned long where) > return true; > } > > +/* The struct defined for AArch64 userspace stack frame */ > +struct stack_frame_user { > + unsigned long fp; > + unsigned long sp; > + unsigned long pc; > +}; > + > +/* > + * The function of AArch64 userspace stack frame unwind method. > + * Note: If the caller is not current task, it's supposed to call > + * access_process_vm() to access another task' address space. > + */ > +static int arch_unwind_user_frame(struct task_struct *tsk, unsigned long high, > + struct stack_frame_user *frame) > +{ > + int ret = 0; > + unsigned long fp = frame->fp; > + unsigned long low = frame->sp; > + > + if (fp < low || fp > high || fp & 0xf) > + return -EFAULT; > + > + frame->sp = fp + 0x10; Given you always set frame->sp as fp + 0x10, why does frame->sp need to exist at all? Per AAPCS64, the frame record only conatins a copy of the FP and LR, and is *not* directly associated with the SP, so I don't think we should pretend it is. > + /* Disable page fault to make sure get_user going on wheels */ I have no idea what this comment is trying to say. Why exactly you you think we need to disable page faults? Isn't that going to make this fail arbitrarily when we *can* fault pages in? I know that the existing perf unwinder does this, but that's a design problem we'd like to solve (e.g. by deferring the unwind until return to userspace). > + pagefault_disable(); > + if (tsk == current) { > + if (get_user(frame->fp, (unsigned long __user *)fp) || > + get_user(frame->pc, (unsigned long __user *)(fp + 8))) > + ret = -EFAULT; > + } else { > + if (access_process_vm(tsk, fp, &frame->fp, > + sizeof(unsigned long), 0) != sizeof(unsigned long) || > + access_process_vm(tsk, fp + 0x08, &frame->pc, > + sizeof(unsigned long), 0) != sizeof(unsigned long)) > + ret = -EFAULT; > + } > + pagefault_enable(); If task isn't current, userspace could be running and this will be racy and unreliable. Where is this used with task != current? Why do we need to support that case at all? What does this do for COMPAT tasks? > + > + return ret; > +} > + > +/* > + * Print the executable address and corresponding VMA info. > + */ > +static void print_vma_addr_info(char *prefix, struct task_struct *task, > + unsigned long ip, const char *loglvl) > +{ > + struct mm_struct *mm; > + struct vm_area_struct *vma; > + > + if (task != current) > + mm = get_task_mm(task); > + else > + mm = task->mm; Why can't we always use get_task_mm(), even for task == current? > + > + if (!mm) > + return; > + /* > + * we might be running from an atomic context so we cannot sleep > + */ > + if (!mmap_read_trylock(mm)) { > + mmput(mm); > + return; > + } When is this called from an atomic context? > + > + vma = find_vma(mm, ip); > + if (vma && vma->vm_file) { > + struct file *f = vma->vm_file; > + char *buf = (char *)__get_free_page(GFP_NOWAIT); > + > + if (buf) { > + char *p; > + > + p = file_path(f, buf, PAGE_SIZE); > + if (IS_ERR(p)) > + p = "?"; > + printk("%s%s%s[%lx-%lx]\n", loglvl, prefix, p, > + vma->vm_start, > + vma->vm_end); > + free_page((unsigned long)buf); > + } > + } > + mmap_read_unlock(mm); > + if (task != current) > + mmput(mm); > +} > + > +static struct vm_area_struct *find_user_stack_vma(struct task_struct *task, unsigned long sp) > +{ > + struct mm_struct *mm; > + struct vm_area_struct *vma; > + > + if (task != current) > + mm = get_task_mm(task); > + else > + mm = task->mm; > + > + if (!mm) > + return NULL; > + /* > + * we might be running from an atomic context so we cannot sleep > + */ > + if (!mmap_read_trylock(mm)) { > + mmput(mm); > + return NULL; > + } > + vma = find_vma(mm, sp); > + mmap_read_unlock(mm); > + if (task != current) > + mmput(mm); What guarantees the VMA is safe to use after this? What ensures that it won't be freed? What ensures that it is still valid and not subject to concurrent modification? > + > + return vma; > +} > + > +static void dump_user_backtrace_entry(struct task_struct *tsk, > + unsigned long where, const char *loglvl) > +{ > + char prefix[64]; > + > + snprintf(prefix, sizeof(prefix), "<0x%lx> in ", where); > + print_vma_addr_info(prefix, tsk, where, loglvl); > +} > + > +void arch_dump_user_stacktrace(struct pt_regs *regs, struct task_struct *tsk, > + const char *loglvl) > +{ > + struct stack_frame_user frame; > + struct vm_area_struct *vma; > + unsigned long userstack_start, userstack_end; > + > + if (!tsk) > + tsk = current; > + > + /* > + * If @regs is not specified or caller is not current task,. > + * @regs is supposed to get from @tsk. > + */ > + if (!regs || tsk != current) > + regs = task_pt_regs(tsk); The user state is *always* in task_pt_regs(tsk), even when tsk == current. Why does this function take the regs as an argument at all? > + > + /* TODO: support stack unwind for compat user mode */ > + if (compat_user_mode(regs)) > + return; > + > + userstack_start = regs->user_regs.sp; > + vma = find_user_stack_vma(tsk, userstack_start); > + if (!vma) > + return; > + > + userstack_end = vma->vm_end; > + frame.fp = regs->user_regs.regs[29]; > + frame.sp = userstack_start; > + frame.pc = regs->user_regs.pc; > + > + printk("%s[%s-%d] Dump user backtrace:\n", loglvl, tsk->comm, tsk->pid); > + while (1) { > + unsigned long where = frame.pc; > + > + if (!where || where & 0x3) > + break; > + dump_user_backtrace_entry(tsk, where, loglvl); > + if (arch_unwind_user_frame(tsk, userstack_end, &frame) < 0) > + break; > + } > +} > +EXPORT_SYMBOL_GPL(arch_dump_user_stacktrace); Where is this used from? Why should it be exported? > + > +/** > + * stack_trace_save_user - Save user space stack traces into a storage array > + * @consume_entry: Callback for save a user space stack trace > + * @cookie: Caller supplied pointer handed back by arch_stack_walk() > + * @regs: The pt_regs pointer of current task > + */ > +void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie, > + const struct pt_regs *regs) > +{ > + struct stack_frame_user frame; > + struct vm_area_struct *vma; > + unsigned long userstack_start, userstack_end; > + struct task_struct *tsk = current; > + > + /* TODO: support stack unwind for compat user mode */ > + if (!regs || !user_mode(regs) || compat_user_mode(regs)) > + return; > + > + userstack_start = regs->user_regs.sp; > + vma = find_user_stack_vma(tsk, userstack_start); > + if (!vma) Yet again this duplicates the code above. If we really need this, then arch_stack_walk_user() should be the real unwinder, and the caes above should be built atop arch_stack_walk_user(). > + return; > + > + userstack_end = vma->vm_end; > + frame.fp = regs->user_regs.regs[29]; > + frame.sp = userstack_start; > + frame.pc = regs->user_regs.pc; > + > + while (1) { > + unsigned long where = frame.pc; > + > + /* Sanity check: ABI requires pc to be aligned 4 bytes. */ > + if (!where || where & 0x3) > + break; Why do we care whether the PC is valid? There are plenty of other things that we could check (e.g. whether this points to executable memory), but it seems kinda pointless to care beyond whether we can unwind the frame. Note that we're missing the LR anyway, so this *isn't* a reliable unwind. Thanks, Mark. > + if (!consume_entry(cookie, where)) > + break; > + if (arch_unwind_user_frame(tsk, userstack_end, &frame) < 0) > + break; > + } > +} > + > void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk, > const char *loglvl) > { > diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h > index 97455880a..bc5a7bf56 100644 > --- a/include/linux/stacktrace.h > +++ b/include/linux/stacktrace.h > @@ -60,6 +60,16 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void *cookie, > > void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie, > const struct pt_regs *regs); > + > +/** > + * arch_dump_user_stacktrace - Architecture specific function to dump the > + * stack trace for user process > + * @regs: Pointer to the pt_regs of user process > + * @tsk: Pointer to the task_struct of user process > + * @loglvl: Log level > + */ > +void arch_dump_user_stacktrace(struct pt_regs *regs, struct task_struct *tsk, > + const char *loglvl); > #endif /* CONFIG_ARCH_STACKWALK */ > > #ifdef CONFIG_STACKTRACE > -- > 2.25.1 >