Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp458184pxf; Wed, 31 Mar 2021 07:40:19 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxo0v41nYiz2MfcnRGq2OwlTHaUfM5bCVAD2reVmsoNRMNrcH/xuehIPs4TsmjtQItAAMX7 X-Received: by 2002:a17:907:7684:: with SMTP id jv4mr3987535ejc.231.1617201619163; Wed, 31 Mar 2021 07:40:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1617201619; cv=none; d=google.com; s=arc-20160816; b=JYo7Xz/7PAvxO85GqfqoTi+Zod57vYlcI0PiQJopjzv79wE1sgSuFghCdzMqN+n2sG vRDnpLbuFDcRjq3L8zUPy0muw2B7Es+VZbL59Wf0yb7FV5fujWVcV093EItYPVQGibas 5JJGkjORDzG+DY0Rowq8vzlS7QUuLRE7npdr05dZJxYTKtXNJBDgt23dEthDQmStWkiE 3n2Llxi8Q6/cx/bt9D8TyLSAbjMaXYmeNsFasjCKoWhTfB4orp+Wh+P1ebDSYwbG9SMK wUhIg61Lfak/n5zWPhBqzHVaH0OMXkP409xPnyIls3lnl4wltd+/k74uVkcgfP4r9h/b /TxQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date; bh=BwJtFOKPOPb1wWSi4ncMiLkukeMAC77cpIQXCk3Nsys=; b=FJEF+VntjGQ3RUKgkJvaLxGvHpusCnxIuMZoDsw7xXRP/NFH0wX1ywiBHTg/VmpFzy bFKEKdoewcUlO78xtyKOYUkn2j7xg/274nll1nlChgQQKCuZT+VcDEmtOR4LcOJFmn0E D+jF1XULo2Vne41Y/hokoFi987nEcDCz6ZQBB/e59QVIb4HW/euVOCsL4TqVTW2MEEFj R8il0iTwJgYzP9ivl1yaqlhdcKnEATua6mxp/TjILk5FIGs+KiT1+8UqTBNtnMO1nbHJ +HA6+iynff9NBtuTpfyg3K39+GDL080M67D7U/qVTXTI1AnDnpV1zM67RA4ABA775k44 SJmw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id hd34si1891367ejc.197.2021.03.31.07.39.54; Wed, 31 Mar 2021 07:40:19 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236120AbhCaOiW (ORCPT + 99 others); Wed, 31 Mar 2021 10:38:22 -0400 Received: from mail.kernel.org ([198.145.29.99]:45286 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235842AbhCaOhv (ORCPT ); Wed, 31 Mar 2021 10:37:51 -0400 Received: from gandalf.local.home (cpe-66-24-58-225.stny.res.rr.com [66.24.58.225]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id E945D60FF3; Wed, 31 Mar 2021 14:37:50 +0000 (UTC) Date: Wed, 31 Mar 2021 10:37:49 -0400 From: Steven Rostedt To: Vasily Gorbik Cc: Josef Bacik , linux-kernel@vger.kernel.org Subject: Re: User stacktrace garbage when USER_STACKTRACE_SUPPORT is not enabled Message-ID: <20210331103749.01a7c283@gandalf.local.home> In-Reply-To: References: X-Mailer: Claws Mail 3.17.8 (GTK+ 2.24.33; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 31 Mar 2021 13:52:45 +0200 Vasily Gorbik wrote: > Hi Steven, > > At least on s390 since commit cbc3b92ce037 ("tracing: Set kernel_stack's > caller size properly") kernel stack trace contains 8 garbage values in the end. > I assume those are supposed to be filled by ftrace_trace_userstack, which is > only implemented on x86. You assume wrong ;-) No, the kernel_stack trace does not do anything for user stack unless the architecture does something with it, and x86 does not do anything with it when running kernel stack. > > sshd-804 [050] 1997.252608: kernel_stack: > => trampoline_probe_handler (549628c94) > => kprobe_handler (549629260) > => kprobe_exceptions_notify (549629370) > => notify_die (549686e5e) > => illegal_op (54960d440) > => __do_pgm_check (54a106b08) > => pgm_check_handler (54a112cc8) > => kretprobe_trampoline (549629438) > => kretprobe_trampoline (549629436) > => do_syscall (549611ee6) > => __do_syscall (54a106ccc) > => system_call (54a112b5a) > => 769010000000322 > => 22125e4d8 > => 22125e8f8 > => e000054100040100 > => _end (3220000000c) > => 2 > => 20f892ec00000002 > => 20f898b800000002 > > kernel/trace/trace_entries.h: > 159 #define FTRACE_STACK_ENTRIES 8 > 160 > 161 FTRACE_ENTRY(kernel_stack, stack_entry, > 162 > 163 TRACE_STACK, > 164 > 165 F_STRUCT( > 166 __field( int, size ) > 167 __array( unsigned long, caller, FTRACE_STACK_ENTRIES ) > 168 ), > > Is there any reason to keep those 8 extra values in the caller array if > CONFIG_USER_STACKTRACE_SUPPORT is not enabled? Any advice how to fix that > gracefully? It seems to work if I simply set FTRACE_STACK_ENTRIES to 0 when > CONFIG_USER_STACKTRACE_SUPPORT is not enabled. Note, the above is just a holder for user space parsing. It plays no role in how the kernel reports the data from the /sys/kernel/tracing/trace file. Here's the code that does the actual work (from kernel/tracing/trace.c) fstack = this_cpu_ptr(ftrace_stacks.stacks) + stackidx; size = ARRAY_SIZE(fstack->calls); if (regs) { nr_entries = stack_trace_save_regs(regs, fstack->calls, size, skip); } else { nr_entries = stack_trace_save(fstack->calls, size, skip); } size = nr_entries * sizeof(unsigned long); event = __trace_buffer_lock_reserve(buffer, TRACE_STACK, sizeof(*entry) + size, trace_ctx); if (!event) goto out; entry = ring_buffer_event_data(event); memcpy(&entry->caller, fstack->calls, size); entry->size = nr_entries; ftrace_stacks.stacks is a per cpu variable of size PAGE_SIZE and divided by 4, so it can be used in normal, softirq, interrupt and NMI context. Depending on if regs is passed in (and I'll assume it is not here), then stack_trace_save() is called. Which looks like this: unsigned int stack_trace_save(unsigned long *store, unsigned int size, unsigned int skipnr) { stack_trace_consume_fn consume_entry = stack_trace_consume_entry; struct stacktrace_cookie c = { .store = store, .size = size, .skip = skipnr + 1, }; arch_stack_walk(consume_entry, &c, current, NULL); return c.len; } arch_stack_walk() is arch specific, and it does whatever s390 does for stack walks, and the number of entries is returned by c.len, and that value determines what is seen. But after writing all of the above, I think I found a bug! It's this: size = nr_entries * sizeof(unsigned long); event = __trace_buffer_lock_reserve(buffer, TRACE_STACK, sizeof(*entry) + size, trace_ctx); I said the above commit did not play a role in output, but it does play a role in creating the struct stack_trace entry. And by making it a fixed array (even though it's not used) it added 8 more entries to the stack! This should fix the problem: -- Steve diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 3c605957bb5c..507a30bf26e4 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -2985,7 +2985,8 @@ static void __ftrace_trace_stack(struct trace_buffer *buffer, size = nr_entries * sizeof(unsigned long); event = __trace_buffer_lock_reserve(buffer, TRACE_STACK, - sizeof(*entry) + size, trace_ctx); + (sizeof(*entry) - sizeof(entry->caller)) + size, + trace_ctx); if (!event) goto out; entry = ring_buffer_event_data(event);