Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754845AbcDYPRF (ORCPT ); Mon, 25 Apr 2016 11:17:05 -0400 Received: from bombadil.infradead.org ([198.137.202.9]:36578 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751750AbcDYPRE (ORCPT ); Mon, 25 Apr 2016 11:17:04 -0400 Date: Mon, 25 Apr 2016 17:16:38 +0200 From: Peter Zijlstra To: Andy Lutomirski Cc: Dmitry Safonov , Thomas Gleixner , Shuah Khan , Ingo Molnar , Dave Hansen , Borislav Petkov , khorenko@virtuozzo.com, X86 ML , Andrew Morton , xemul@virtuozzo.com, linux-kselftest@vger.kernel.org, Cyrill Gorcunov , Dmitry Safonov <0x7f454c46@gmail.com>, "linux-kernel@vger.kernel.org" , "H. Peter Anvin" Subject: Re: [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to change compatible mode Message-ID: <20160425151638.GL3430@twins.programming.kicks-ass.net> References: <5707D9F1.3090102@virtuozzo.com> <570E79EF.7030408@virtuozzo.com> <20160420110402.GY3408@twins.programming.kicks-ass.net> <20160420190507.GF3408@twins.programming.kicks-ass.net> <20160421201241.GM3430@twins.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2012-12-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5623 Lines: 165 On Thu, Apr 21, 2016 at 04:27:19PM -0700, Andy Lutomirski wrote: > > Did that help? Or did I confuse you moar? > > > > I think I'm starting to get it. What if we rearrange slightly, like this: > > perf_sample_data already has a struct perf_regs in it. We could add a > flags field to the first chunk of perf_sample_data: > > u64 sample_flags; I actually considered that for another problem. Didn't like it then, but seeing how I still haven't figured out a better way and you're now proposing this too, maybe... Part of the problem is that this will completely exhaust that first cacheline :/ > perf_sample_data_init sets sample_flags to zero. And while we're on struct perf_sample_data, that thing has gotten insanely large. We carry it on-stack! It should be fairly easy to take regs_user_copy out and use a per-cpu array of them things for this I think, see below. > Now we rename perf_sample_regs_user to __perf_sample_regs_user and > make it non-static. We also teach it to set do data->sample_flags |= > PERF_SAMPLE_FLAGS_HAS_REGS_USER. We add: > > static void perf_fetch_regs_user(struct perf_sample_data *data, struct > pt_regs *interrupt_regs) > { > if (data->sample_flags & PERF_SAMPLE_FLAGS_HAS_REGS_USER) > return; > > __perf_sample_regs_user(&data->regs_user, interrupt_regs, > &data->regs_user_copy); > } I meant to change perf_prepare_sample() to do: u64 sample_type = event->attr.sample_type & ~data.sample_type; or something similar, such that we can override/avoid some of the work there. > (Hmm. This only really works well if we can guarantee that > interrupt_regs remains valid for the life of the perf_sample_data > object. Could we perhaps move the interrupt_regs pointer *into* > perf_sample_data and stop passing it all over the place?) So the problem with that is that we'll now overflow the one cacheline, and the last time I really looked at this that made samples that much slower. It might be time to re-evaluate this stuff, since pretty much everything will eventually write into perf_sample_data::ip etc.. which is the second line anyway. Also, looking at it, we actually have a pointer in there for this, perf_sample_data::regs_intr::regs, but its at the very tail of this monster, 4 cachelines off the normal path. > We change all the callers of perf_sample_regs_user to use > perf_fetch_regs_user instead. There's only the one site currently, but yeah. > What do you think? If you like it, I can probably find some time to > give it a shot, but I don't guarantee that I won't miss some subtlety > in its interaction with the rest of the event output code. Sure give it a go, I'll stomp on it to fix the pebs-time issue (we need to skip perf_prepare_sample's PERF_SAMPLE_TIME branch for that). > On a vaguely related note, why is the big prebs-to-pt_regs copy > conditional on (sample_type & PERF_SAMPLE_REGS_INTR)? I bet it would > be faster to make it unconditional, because you could avoid copying > over the entire pt_regs struct if PERF_SAMPLE_REGS_INTR isn't set. Hmm, yes.. that code did move about a bit, not sure what it looked like originally. In any case, That fully copy is overkill in the simple case as well, I think that could get away with only copying cs,flags. Compile tested only.. --- Subject: perf: Replace perf_sample_data::regs_user_copy with per-cpu storage struct perf_sample_data is immense, and we carry it on stack, shrink it some. struct perf_sample_data { /* size: 384, cachelines: 6, members: 19 */ } Signed-off-by: Peter Zijlstra (Intel) --- include/linux/perf_event.h | 2 -- kernel/events/core.c | 23 +++++++++++++++++------ 2 files changed, 17 insertions(+), 8 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 85749ae8cb5f..dd2cab6c5bbb 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -795,8 +795,6 @@ struct perf_sample_data { * on arch details. */ struct perf_regs regs_user; - struct pt_regs regs_user_copy; - struct perf_regs regs_intr; u64 stack_user_size; } ____cacheline_aligned; diff --git a/kernel/events/core.c b/kernel/events/core.c index eabeb2aec00f..72754607d2cd 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5146,15 +5146,27 @@ perf_output_sample_regs(struct perf_output_handle *handle, } } -static void perf_sample_regs_user(struct perf_regs *regs_user, - struct pt_regs *regs, - struct pt_regs *regs_user_copy) +static DEFINE_PER_CPU(struct pt_regs, __regs_user[4]); + +static struct pt_regs *regs_user_ptr(void) +{ + if (in_nmi()) + return this_cpu_ptr(&__regs_user[0]); + if (in_interrupt()) + return this_cpu_ptr(&__regs_user[1]); + if (in_serving_softirq()) + return this_cpu_ptr(&__regs_user[2]); + return this_cpu_ptr(&__regs_user[3]); +} + +static void +perf_sample_regs_user(struct perf_regs *regs_user, struct pt_regs *regs) { if (user_mode(regs)) { regs_user->abi = perf_reg_abi(current); regs_user->regs = regs; } else if (current->mm) { - perf_get_regs_user(regs_user, regs, regs_user_copy); + perf_get_regs_user(regs_user, regs, regs_user_ptr()); } else { regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE; regs_user->regs = NULL; @@ -5638,8 +5650,7 @@ void perf_prepare_sample(struct perf_event_header *header, } if (sample_type & (PERF_SAMPLE_REGS_USER | PERF_SAMPLE_STACK_USER)) - perf_sample_regs_user(&data->regs_user, regs, - &data->regs_user_copy); + perf_sample_regs_user(&data->regs_user, regs); if (sample_type & PERF_SAMPLE_REGS_USER) { /* regs dump ABI info */