Received: by 2002:a25:b794:0:0:0:0:0 with SMTP id n20csp4387008ybh; Tue, 6 Aug 2019 10:46:57 -0700 (PDT) X-Google-Smtp-Source: APXvYqyuMj2y/bS7o/LO1nDFnncI92bzIdX08A70/zwizXD49UStidmPUgH5jUhGvrV3Ppq754fM X-Received: by 2002:a17:902:b582:: with SMTP id a2mr4422466pls.128.1565113617553; Tue, 06 Aug 2019 10:46:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1565113617; cv=none; d=google.com; s=arc-20160816; b=txEPlYplLlqqEw7BqLV2NpMsSPNIZqqoZ9LY1KB72diFFhur9VKaHplhIiZcMbeILQ 0LhaDH3w0FhbZSeyR/BpqWuPCy+NFj+WjIiL50t4waBD5QUyh13ZJJPemVJ+NUPZUKwz 1sfNGI3prvQ63GzeTT4ZCBmbK+uvaXcWFVRT/gCXeC3Rc84VgQkQbp4uoEbZcrBQMpxd WT8sPIjWZwe9xvXICF6YNWfghF45v5XF3cGs8WGxARS97GRKSDFqSfazF3aQUkJCKId9 DD7NfzxWPP0aX4ECnNwJLxStaXYLCmvcLIIAPYz9wL2ISX4I4ETUiE8EFnOyLCdR/dCQ tHag== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=++FX/M76MkJ77Wmy6ZQS0b99NlwKRsO0LdWt36B9bDo=; b=X7Qw/C+9e6YFzx1mTV+VCuaPftRBy8EsF5CD0j/r1xCO4a1//nuw6HakrOfUmXzKSi SKHuvdBoiVNqi0mQbnJ6sj8aSvbR1eoKCwAPkJA6393OM+hyCp/Qgv5K0L/zs/HFVYAM GEqs3BdCTydjfmA7/864aTd6oH43p/dP8E800CoHKfVRYMeeiTeXSvmIjFL50EFzz+z3 anNKTAZ4Xo2rFma2gr1czsM4xb7Zn7W13rE4gwzPCxzwMxrCJIwwVBdJiTL13cGuBuv7 4MRuvkg570GyqkPmqLUGI4k98nIYW0GV+qXCCx0G/MzHCGKBXuTp1EZHGvPEx94mTUUk K6kg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@joelfernandes.org header.s=google header.b=eWzzE7nu; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a24si47878895pfi.205.2019.08.06.10.46.41; Tue, 06 Aug 2019 10:46:57 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@joelfernandes.org header.s=google header.b=eWzzE7nu; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732996AbfHFRZX (ORCPT + 99 others); Tue, 6 Aug 2019 13:25:23 -0400 Received: from mail-pg1-f196.google.com ([209.85.215.196]:46741 "EHLO mail-pg1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729161AbfHFRZX (ORCPT ); Tue, 6 Aug 2019 13:25:23 -0400 Received: by mail-pg1-f196.google.com with SMTP id w3so4735102pgt.13 for ; Tue, 06 Aug 2019 10:25:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=joelfernandes.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=++FX/M76MkJ77Wmy6ZQS0b99NlwKRsO0LdWt36B9bDo=; b=eWzzE7nuUqqomzdRaZM4fbGQAsQ9CcWdhoShx2noBZxcc1DJ+7oJERIsTVprpW6Ium WrK9ML+WKEOGF8KZ2XARf7QTUspWfLJiYS6PVQNpGSTeE2Gy74nWuBYVKF4ILLL5h2Qz aJRG4k9Ueidxzse3Rx0HUDSIgNWEYBB6UbOMc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=++FX/M76MkJ77Wmy6ZQS0b99NlwKRsO0LdWt36B9bDo=; b=Ud4FNSBb8hF/0xrRm1DDfJQ0vbPwEMmSA7NIglAT8r2f7pjkITlNGX1HjF0peuC5jH TfK39YBngM6Tpj/BAamJ73yj5wt2JYu2qs9DsplZW7QsIhuXdxq5C1Kbwpbu7dzp1ZIv LrggKff7cx9dNFzXUQy0IJu1yCO/yyNprIaGP8u3sGXt5Ln4RE7LVRXx9D/4Bw6dPNCQ iwU6H/2uEgNBWu76qJyjySP9oRQIyXW0cMLT5ugYG9G+SIUWu32M8YrvZNVbjQNgVfwe LmcqEZbD4zkpcQ0/Nh9xPVlahrbrrKErnTd/+Fie9KdTNo5D3Q94yBL8Da+I2yqtNhWU OyHw== X-Gm-Message-State: APjAAAV4SyG2QeQH45HGGAg3DEcXxMVoMwipvJg+l9QN04F7jCjt6+Eg RXUPFKaEClsaSHA9RhYPWVvs9g== X-Received: by 2002:a63:6fc9:: with SMTP id k192mr3981375pgc.20.1565112321645; Tue, 06 Aug 2019 10:25:21 -0700 (PDT) Received: from localhost ([2620:15c:6:12:9c46:e0da:efbf:69cc]) by smtp.gmail.com with ESMTPSA id r75sm112195247pfc.18.2019.08.06.10.25.20 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Tue, 06 Aug 2019 10:25:20 -0700 (PDT) Date: Tue, 6 Aug 2019 13:25:19 -0400 From: Joel Fernandes To: Steven Rostedt Cc: Jiping Ma , mingo@redhat.com, catalin.marinas@arm.com, will.deacon@arm.com, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org Subject: Re: [PATCH v3] tracing: Function stack size and its name mismatch in arm64 Message-ID: <20190806172519.GD39951@google.com> References: <20190802094103.163576-1-jiping.ma2@windriver.com> <20190802112259.0530a648@gandalf.local.home> <20190802120920.3b1f4351@gandalf.local.home> <20190802121124.6b41f26a@gandalf.local.home> <20190806154811.GB39951@google.com> <20190806123455.487ac02b@gandalf.local.home> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190806123455.487ac02b@gandalf.local.home> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Aug 06, 2019 at 12:34:55PM -0400, Steven Rostedt wrote: > On Tue, 6 Aug 2019 11:48:11 -0400 > Joel Fernandes wrote: > > > > > diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h > > > index 5ab5200b2bdc..13a4832cfb00 100644 > > > --- a/arch/arm64/include/asm/ftrace.h > > > +++ b/arch/arm64/include/asm/ftrace.h > > > @@ -13,6 +13,7 @@ > > > #define HAVE_FUNCTION_GRAPH_FP_TEST > > > #define MCOUNT_ADDR ((unsigned long)_mcount) > > > #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE > > > +#define ARCH_RET_ADDR_AFTER_LOCAL_VARS 1 > > > > > > #ifndef __ASSEMBLY__ > > > #include > > > diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c > > > index 5d16f73898db..050c6bd9beac 100644 > > > --- a/kernel/trace/trace_stack.c > > > +++ b/kernel/trace/trace_stack.c > > > @@ -158,6 +158,18 @@ static void check_stack(unsigned long ip, unsigned long *stack) > > > i++; > > > } > > > > > > +#ifdef ARCH_RET_ADDR_AFTER_LOCAL_VARS > > > + /* > > > + * Most archs store the return address before storing the > > > + * function's local variables. But some archs do this backwards. > > > + */ > > > + if (x > 1) { > > > + memmove(&stack_trace_index[0], &stack_trace_index[1], > > > + sizeof(stack_trace_index[0]) * (x - 1)); > > > + x--; > > > + } > > > +#endif > > > + > > > stack_trace_nr_entries = x; > > > > > > if (task_stack_end_corrupted(current)) { > > > > > > I am not fully understanding the fix :(. If the positions of the data and > > FP/LR are swapped, then there should be a loop of some sort where the FP/LR > > are copied repeatedly to undo the mess we are discussing. But in this patch > > I see only one copy happening. May be I just don't understand this code well > > enough. Are there any more clues for helping understand the fix? > > Here's the best way to explain this. The code is using the stack trace > to figure out which function is the stack hog. Or perhaps a serious of > stack hogs. On x86, a call stores the return address as it calls the > next function. Then that function allocates its stack frame for its > local variables and saving of registers. This makes perfect sense, (probably also makes sense to push this whole explanation into either the changelog or the kernel documentation) Thanks a lot, Steve! - Joel > on x86: > > [ top of stack ] > 0: sys call entry frame > 10: return addr to entry code > 11: start of sys_foo frame > 20: return addr to sys_foo > 21: start of kernel_func_bar frame > 30: return addr to kernel_func_bar > 31: [ do trace stack here ] > > > Then we do a save_stack_trace which returns the addresses of the > functions it finds. Which would be (from the bottom of the stack to the > top) > > return addr to kernel_func_bar > return addr to sys_foo > return addr to entry code > > What we do here is try to figure out how much stack each of theses > functions have. So we loop through the stack looking for the addresses > returned by the save_stack trace, and see where on the stack this is. > This gives us: > > return addr to kernel_func_bar [ 30 ] > return addr to sys_foo [ 20 ] > return addr to entry frame [ 10 ] > > From this, we can conclude (on x86) that the size of the stack used for > kernel_func_bar is 30 - 20 = 10. Because on the stack we have: > > 20: return addr to sys_foo > 21: start of kernel_func_bar frame <<-- kernel_func_bar stack frame > 30: return addr to kernel_func_bar > > > Now, what Jiping reported, is that on arm64, it saves the link register > (the return address) when it is needed to, which is after the stack > frame for the current function has been saved. That means we have > something that looks like this: > > on arm64: > > [ top of stack ] > 0: sys call entry frame > 10: start of sys_foo_frame > 19: return addr to entry code << lr saved before calling kern_func_bar > 20: start of kernel_func_bar frame > 29: return addr to sys_foo_frame << lr saved before calling next function > 30: [ do trace stack here ] > > Now, I have a question. To call the mcount code (ftrace and the stack > tracing), you need to save the return address of kern_func_bar > somewhere, otherwise the call to mcount will overwrite the lr. But > let's say it does and then forgets it, so we have: > > 30: return addr of kernel_func_bar frame > 31: [ do trace stack here ] > > Now save_stack_trace gives us the same result: > > return addr to kernel_func bar > return addr to sys_foo > return addr to entry frame > > But we get a different result when finding them in the location of the > stack. > > return addr to kernel_func_bar [ 30 ] > return addr to sys foo [ 29 ] > return addr to entry frame [ 19 ] > > The simple subtractions will be off: > > kernel_func_bar stack size = 30 - 29 = 1 > Or even, sys_foo 29 - 19 = 10, but if we look at the stack: > > 10: start of sys_foo_frame > 19: return addr to entry_code > 20: start of kernel_func_bar frame > 29: return addr to sys_foo > > We are measuring the kernel_func_bar frame for sys_foo! > > We are off by one here. > > stack_trace_index[] is an array of the offsets mapping to the function > return addresses found. If we shift it by one, then we then sync the > functions found with their frames: > > stack_trace_index[0] = 30 > stack_trace_index[1] = 29 > stack_trace_index[2] = 19 > > memmove((&stack_trace_index[0], &stack_trace_index[1], > sizeof(stack_trace_index[0]) * (x - 1)); > > Makes that: > > stack_trace_index[0] = 29 > stack_trace_index[1] = 19 > > And we do x-- to lose the last frame. > > With the stack_dump_trace being: > > stack_dump_trace[0] = return addr kernel_func_bar > stack_dump_trace[1] = return addr sys_foo > > we then match which frame size belongs to which function better. > > > > > > Also, this stack trace loop (original code) is a bit hairy :) It appears > > there is a call to stack_trace_save() followed by another loop that goes > > through the returned entries from there and tries to generate a set of > > indexes. Isn't the real issue that the entries returned by stack_trace_save() > > are a out of whack? I am curious also if other users of stack_trace_save() > > will suffer from the same issue. > > No, the order is fine. The issue is that we are using the location of > the return address in the stack to find out what function has the > biggest stack usage, and our assumption for arm64 is incorrect in that > location. > > -- Steve