2020-05-16 00:17:23

by Thomas Gleixner

[permalink] [raw]
Subject: [patch V6 04/37] x86: Make hardware latency tracing explicit


The hardware latency tracer calls into trace_sched_clock and ends up in
various instrumentable functions which is problemeatic vs. the kprobe
handling especially the text poke machinery. It's invoked from
nmi_enter/exit(), i.e. non-instrumentable code.

Use nmi_enter/exit_notrace() instead. These variants do not invoke the
hardware latency tracer which avoids chasing down complex callchains to
make them non-instrumentable.

The real interesting measurement is the actual NMI handler. Add an explicit
invocation for the hardware latency tracer to it.

#DB and #BP are uninteresting as they really should not be in use when
analzying hardware induced latencies.

If #DF hits, hardware latency is definitely not interesting anymore and in
case of a machine check the hardware latency is not the most troublesome
issue either.

Signed-off-by: Thomas Gleixner <[email protected]>

---
arch/x86/kernel/cpu/mce/core.c | 4 ++--
arch/x86/kernel/nmi.c | 6 ++++--
arch/x86/kernel/traps.c | 12 +++++++-----
3 files changed, 13 insertions(+), 9 deletions(-)

--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1916,7 +1916,7 @@ static __always_inline void exc_machine_
mce_check_crashing_cpu())
return;

- nmi_enter();
+ nmi_enter_notrace();
/*
* The call targets are marked noinstr, but objtool can't figure
* that out because it's an indirect call. Annotate it.
@@ -1924,7 +1924,7 @@ static __always_inline void exc_machine_
instrumentation_begin();
machine_check_vector(regs);
instrumentation_end();
- nmi_exit();
+ nmi_exit_notrace();
}

static __always_inline void exc_machine_check_user(struct pt_regs *regs)
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -334,6 +334,7 @@ static noinstr void default_do_nmi(struc
__this_cpu_write(last_nmi_rip, regs->ip);

instrumentation_begin();
+ ftrace_nmi_handler_enter();

handled = nmi_handle(NMI_LOCAL, regs);
__this_cpu_add(nmi_stats.normal, handled);
@@ -420,6 +421,7 @@ static noinstr void default_do_nmi(struc
unknown_nmi_error(reason, regs);

out:
+ ftrace_nmi_handler_exit();
instrumentation_end();
}

@@ -536,14 +538,14 @@ DEFINE_IDTENTRY_NMI(exc_nmi)
}
#endif

- nmi_enter();
+ nmi_enter_notrace();

inc_irq_stat(__nmi_count);

if (!ignore_nmis)
default_do_nmi(regs);

- nmi_exit();
+ nmi_exit_notrace();

#ifdef CONFIG_X86_64
if (unlikely(this_cpu_read(update_debug_stack))) {
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -387,7 +387,7 @@ DEFINE_IDTENTRY_DF(exc_double_fault)
}
#endif

- nmi_enter();
+ nmi_enter_notrace();
instrumentation_begin();
notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);

@@ -632,12 +632,14 @@ DEFINE_IDTENTRY_RAW(exc_int3)
instrumentation_end();
idtentry_exit(regs);
} else {
- nmi_enter();
+ nmi_enter_notrace();
instrumentation_begin();
+ ftrace_nmi_handler_enter();
if (!do_int3(regs))
die("int3", regs, 0);
+ ftrace_nmi_handler_exit();
instrumentation_end();
- nmi_exit();
+ nmi_exit_notrace();
}
}

@@ -849,7 +851,7 @@ static void noinstr handle_debug(struct
static __always_inline void exc_debug_kernel(struct pt_regs *regs,
unsigned long dr6)
{
- nmi_enter();
+ nmi_enter_notrace();
/*
* The SDM says "The processor clears the BTF flag when it
* generates a debug exception." Clear TIF_BLOCKSTEP to keep
@@ -871,7 +873,7 @@ static __always_inline void exc_debug_ke
if (dr6)
handle_debug(regs, dr6, false);

- nmi_exit();
+ nmi_exit_notrace();
}

static __always_inline void exc_debug_user(struct pt_regs *regs,


2020-05-17 05:45:20

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [patch V6 04/37] x86: Make hardware latency tracing explicit

On Fri, May 15, 2020 at 5:10 PM Thomas Gleixner <[email protected]> wrote:
>
>
> The hardware latency tracer calls into trace_sched_clock and ends up in
> various instrumentable functions which is problemeatic vs. the kprobe
> handling especially the text poke machinery. It's invoked from
> nmi_enter/exit(), i.e. non-instrumentable code.
>
> Use nmi_enter/exit_notrace() instead. These variants do not invoke the
> hardware latency tracer which avoids chasing down complex callchains to
> make them non-instrumentable.
>
> The real interesting measurement is the actual NMI handler. Add an explicit
> invocation for the hardware latency tracer to it.
>
> #DB and #BP are uninteresting as they really should not be in use when
> analzying hardware induced latencies.
>

> @@ -849,7 +851,7 @@ static void noinstr handle_debug(struct
> static __always_inline void exc_debug_kernel(struct pt_regs *regs,
> unsigned long dr6)
> {
> - nmi_enter();
> + nmi_enter_notrace();

Why can't exc_debug_kernel() handle instrumentation? We shouldn't
recurse into #DB since we've already cleared DR7, right?

2020-05-17 08:55:01

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch V6 04/37] x86: Make hardware latency tracing explicit

Andy Lutomirski <[email protected]> writes:
> On Fri, May 15, 2020 at 5:10 PM Thomas Gleixner <[email protected]> wrote:
>>
>>
>> The hardware latency tracer calls into trace_sched_clock and ends up in
>> various instrumentable functions which is problemeatic vs. the kprobe
>> handling especially the text poke machinery. It's invoked from
>> nmi_enter/exit(), i.e. non-instrumentable code.
>>
>> Use nmi_enter/exit_notrace() instead. These variants do not invoke the
>> hardware latency tracer which avoids chasing down complex callchains to
>> make them non-instrumentable.
>>
>> The real interesting measurement is the actual NMI handler. Add an explicit
>> invocation for the hardware latency tracer to it.
>>
>> #DB and #BP are uninteresting as they really should not be in use when
>> analzying hardware induced latencies.
>>
>
>> @@ -849,7 +851,7 @@ static void noinstr handle_debug(struct
>> static __always_inline void exc_debug_kernel(struct pt_regs *regs,
>> unsigned long dr6)
>> {
>> - nmi_enter();
>> + nmi_enter_notrace();
>
> Why can't exc_debug_kernel() handle instrumentation? We shouldn't
> recurse into #DB since we've already cleared DR7, right?

It can later on. The point is that the trace stuff calls into the world
and some more before the entry handling is complete.

Remember this is about ensuring that all the state is properly
established before any of this instrumentation muck can happen.

DR7 handling is specific to #DB and done even before nmi_enter to
prevent recursion.

Thanks,

tglx


2020-05-18 05:55:01

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [patch V6 04/37] x86: Make hardware latency tracing explicit

On Sun, May 17, 2020 at 1:48 AM Thomas Gleixner <[email protected]> wrote:
>
> Andy Lutomirski <[email protected]> writes:
> > On Fri, May 15, 2020 at 5:10 PM Thomas Gleixner <[email protected]> wrote:
> >>
> >>
> >> The hardware latency tracer calls into trace_sched_clock and ends up in
> >> various instrumentable functions which is problemeatic vs. the kprobe
> >> handling especially the text poke machinery. It's invoked from
> >> nmi_enter/exit(), i.e. non-instrumentable code.
> >>
> >> Use nmi_enter/exit_notrace() instead. These variants do not invoke the
> >> hardware latency tracer which avoids chasing down complex callchains to
> >> make them non-instrumentable.
> >>
> >> The real interesting measurement is the actual NMI handler. Add an explicit
> >> invocation for the hardware latency tracer to it.
> >>
> >> #DB and #BP are uninteresting as they really should not be in use when
> >> analzying hardware induced latencies.
> >>
> >
> >> @@ -849,7 +851,7 @@ static void noinstr handle_debug(struct
> >> static __always_inline void exc_debug_kernel(struct pt_regs *regs,
> >> unsigned long dr6)
> >> {
> >> - nmi_enter();
> >> + nmi_enter_notrace();
> >
> > Why can't exc_debug_kernel() handle instrumentation? We shouldn't
> > recurse into #DB since we've already cleared DR7, right?
>
> It can later on. The point is that the trace stuff calls into the world
> and some more before the entry handling is complete.
>
> Remember this is about ensuring that all the state is properly
> established before any of this instrumentation muck can happen.
>
> DR7 handling is specific to #DB and done even before nmi_enter to
> prevent recursion.

So why is this change needed?

2020-05-18 08:05:08

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [patch V6 04/37] x86: Make hardware latency tracing explicit

On Sat, May 16, 2020 at 01:45:51AM +0200, Thomas Gleixner wrote:
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -334,6 +334,7 @@ static noinstr void default_do_nmi(struc
> __this_cpu_write(last_nmi_rip, regs->ip);
>
> instrumentation_begin();
> + ftrace_nmi_handler_enter();
>
> handled = nmi_handle(NMI_LOCAL, regs);
> __this_cpu_add(nmi_stats.normal, handled);
> @@ -420,6 +421,7 @@ static noinstr void default_do_nmi(struc
> unknown_nmi_error(reason, regs);
>
> out:
> + ftrace_nmi_handler_exit();
> instrumentation_end();
> }

Yeah, so I'm confused about this and the previous patch too. Why not
do just this? Remove that ftrace_nmi_handler.* crud from
nmi_{enter,exit}() and stick it here? Why do we needs the
nmi_{enter,exit}_notrace() thing?

2020-05-18 08:07:08

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch V6 04/37] x86: Make hardware latency tracing explicit

Andy Lutomirski <[email protected]> writes:
> On Sun, May 17, 2020 at 1:48 AM Thomas Gleixner <[email protected]> wrote:
>> Remember this is about ensuring that all the state is properly
>> established before any of this instrumentation muck can happen.
>>
>> DR7 handling is specific to #DB and done even before nmi_enter to
>> prevent recursion.
>
> So why is this change needed?

We really want nmi_enter() to be the carefully crafted mechanism which
establishes correct state in whatever strange context the exception
hits. Not more, not less.

Random instrumentation has absolutely no business there and I went a
long way to make sure that this is enforcible by objtool.

Aside of that the tracing which is contained in nmi_enter() is about
taking timestamps for hardware latency detection. If someone runs
hardware latency detection with active break/watchpoints then I really
can't help it.

Thanks,

tglx

2020-05-18 08:11:23

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch V6 04/37] x86: Make hardware latency tracing explicit

Peter Zijlstra <[email protected]> writes:
> On Sat, May 16, 2020 at 01:45:51AM +0200, Thomas Gleixner wrote:
>> --- a/arch/x86/kernel/nmi.c
>> +++ b/arch/x86/kernel/nmi.c
>> @@ -334,6 +334,7 @@ static noinstr void default_do_nmi(struc
>> __this_cpu_write(last_nmi_rip, regs->ip);
>>
>> instrumentation_begin();
>> + ftrace_nmi_handler_enter();
>>
>> handled = nmi_handle(NMI_LOCAL, regs);
>> __this_cpu_add(nmi_stats.normal, handled);
>> @@ -420,6 +421,7 @@ static noinstr void default_do_nmi(struc
>> unknown_nmi_error(reason, regs);
>>
>> out:
>> + ftrace_nmi_handler_exit();
>> instrumentation_end();
>> }
>
> Yeah, so I'm confused about this and the previous patch too. Why not
> do just this? Remove that ftrace_nmi_handler.* crud from
> nmi_{enter,exit}() and stick it here? Why do we needs the
> nmi_{enter,exit}_notrace() thing?

Because you then have to fixup _all_ architectures which use
nmi_enter/exit().

Thanks,

tglx

2020-05-18 08:14:58

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [patch V6 04/37] x86: Make hardware latency tracing explicit

On Mon, May 18, 2020 at 10:05:56AM +0200, Thomas Gleixner wrote:
> Peter Zijlstra <[email protected]> writes:
> > On Sat, May 16, 2020 at 01:45:51AM +0200, Thomas Gleixner wrote:
> >> --- a/arch/x86/kernel/nmi.c
> >> +++ b/arch/x86/kernel/nmi.c
> >> @@ -334,6 +334,7 @@ static noinstr void default_do_nmi(struc
> >> __this_cpu_write(last_nmi_rip, regs->ip);
> >>
> >> instrumentation_begin();
> >> + ftrace_nmi_handler_enter();
> >>
> >> handled = nmi_handle(NMI_LOCAL, regs);
> >> __this_cpu_add(nmi_stats.normal, handled);
> >> @@ -420,6 +421,7 @@ static noinstr void default_do_nmi(struc
> >> unknown_nmi_error(reason, regs);
> >>
> >> out:
> >> + ftrace_nmi_handler_exit();
> >> instrumentation_end();
> >> }
> >
> > Yeah, so I'm confused about this and the previous patch too. Why not
> > do just this? Remove that ftrace_nmi_handler.* crud from
> > nmi_{enter,exit}() and stick it here? Why do we needs the
> > nmi_{enter,exit}_notrace() thing?
>
> Because you then have to fixup _all_ architectures which use
> nmi_enter/exit().

We probably have to anyway. But I can do that later I suppose.

2020-05-18 20:47:35

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [patch V6 04/37] x86: Make hardware latency tracing explicit

On Mon, May 18, 2020 at 1:03 AM Thomas Gleixner <[email protected]> wrote:
>
> Andy Lutomirski <[email protected]> writes:
> > On Sun, May 17, 2020 at 1:48 AM Thomas Gleixner <[email protected]> wrote:
> >> Remember this is about ensuring that all the state is properly
> >> established before any of this instrumentation muck can happen.
> >>
> >> DR7 handling is specific to #DB and done even before nmi_enter to
> >> prevent recursion.
> >
> > So why is this change needed?
>
> We really want nmi_enter() to be the carefully crafted mechanism which
> establishes correct state in whatever strange context the exception
> hits. Not more, not less.
>
> Random instrumentation has absolutely no business there and I went a
> long way to make sure that this is enforcible by objtool.
>
> Aside of that the tracing which is contained in nmi_enter() is about
> taking timestamps for hardware latency detection. If someone runs
> hardware latency detection with active break/watchpoints then I really
> can't help it.
>

Okay. I'll stop looking for the bug you're fixing, then.

2020-05-20 20:12:28

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch V6 04/37] x86: Make hardware latency tracing explicit

Peter Zijlstra <[email protected]> writes:
> On Mon, May 18, 2020 at 10:05:56AM +0200, Thomas Gleixner wrote:
>> Peter Zijlstra <[email protected]> writes:
>> > On Sat, May 16, 2020 at 01:45:51AM +0200, Thomas Gleixner wrote:
>> >> --- a/arch/x86/kernel/nmi.c
>> >> +++ b/arch/x86/kernel/nmi.c
>> >> @@ -334,6 +334,7 @@ static noinstr void default_do_nmi(struc
>> >> __this_cpu_write(last_nmi_rip, regs->ip);
>> >>
>> >> instrumentation_begin();
>> >> + ftrace_nmi_handler_enter();
>> >>
>> >> handled = nmi_handle(NMI_LOCAL, regs);
>> >> __this_cpu_add(nmi_stats.normal, handled);
>> >> @@ -420,6 +421,7 @@ static noinstr void default_do_nmi(struc
>> >> unknown_nmi_error(reason, regs);
>> >>
>> >> out:
>> >> + ftrace_nmi_handler_exit();
>> >> instrumentation_end();
>> >> }
>> >
>> > Yeah, so I'm confused about this and the previous patch too. Why not
>> > do just this? Remove that ftrace_nmi_handler.* crud from
>> > nmi_{enter,exit}() and stick it here? Why do we needs the
>> > nmi_{enter,exit}_notrace() thing?
>>
>> Because you then have to fixup _all_ architectures which use
>> nmi_enter/exit().
>
> We probably have to anyway. But I can do that later I suppose.

Second thoughts. For #DB and #INT3 we can just keep nmi_enter(), needs
just annotation in nmi_enter() around that trace muck.

For #NMI and #MCE I rather avoid the early trace call and do it once we
have reached "stable" state, i.e. avoid it in the whole nested NMI mess.

Thanks,

tglx

2020-05-20 20:18:14

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [patch V6 04/37] x86: Make hardware latency tracing explicit



> On May 20, 2020, at 1:10 PM, Thomas Gleixner <[email protected]> wrote:
>
> Peter Zijlstra <[email protected]> writes:
>>> On Mon, May 18, 2020 at 10:05:56AM +0200, Thomas Gleixner wrote:
>>> Peter Zijlstra <[email protected]> writes:
>>>> On Sat, May 16, 2020 at 01:45:51AM +0200, Thomas Gleixner wrote:
>>>>> --- a/arch/x86/kernel/nmi.c
>>>>> +++ b/arch/x86/kernel/nmi.c
>>>>> @@ -334,6 +334,7 @@ static noinstr void default_do_nmi(struc
>>>>> __this_cpu_write(last_nmi_rip, regs->ip);
>>>>>
>>>>> instrumentation_begin();
>>>>> + ftrace_nmi_handler_enter();
>>>>>
>>>>> handled = nmi_handle(NMI_LOCAL, regs);
>>>>> __this_cpu_add(nmi_stats.normal, handled);
>>>>> @@ -420,6 +421,7 @@ static noinstr void default_do_nmi(struc
>>>>> unknown_nmi_error(reason, regs);
>>>>>
>>>>> out:
>>>>> + ftrace_nmi_handler_exit();
>>>>> instrumentation_end();
>>>>> }
>>>>
>>>> Yeah, so I'm confused about this and the previous patch too. Why not
>>>> do just this? Remove that ftrace_nmi_handler.* crud from
>>>> nmi_{enter,exit}() and stick it here? Why do we needs the
>>>> nmi_{enter,exit}_notrace() thing?
>>>
>>> Because you then have to fixup _all_ architectures which use
>>> nmi_enter/exit().
>>
>> We probably have to anyway. But I can do that later I suppose.
>
> Second thoughts. For #DB and #INT3 we can just keep nmi_enter(), needs
> just annotation in nmi_enter() around that trace muck.
>
> For #NMI and #MCE I rather avoid the early trace call and do it once we
> have reached "stable" state, i.e. avoid it in the whole nested NMI mess.
>
>

What’s the issue? The actual meat is mostly in the asm for NMI, and for MCE it’s just the sync-all-the-cores thing. The actual simultaneous NMI-and-MCE case is utterly busted regardless, and I’ve been thinking about how to fix it. It won’t be pretty, but nmi_enter() will have nothing to do with it.

2020-05-20 22:23:06

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch V6 04/37] x86: Make hardware latency tracing explicit

Andy Lutomirski <[email protected]> writes:
>> On May 20, 2020, at 1:10 PM, Thomas Gleixner <[email protected]> wrote:
>> Peter Zijlstra <[email protected]> writes:
>>> We probably have to anyway. But I can do that later I suppose.
>>
>> Second thoughts. For #DB and #INT3 we can just keep nmi_enter(), needs
>> just annotation in nmi_enter() around that trace muck.
>>
>> For #NMI and #MCE I rather avoid the early trace call and do it once we
>> have reached "stable" state, i.e. avoid it in the whole nested NMI mess.
>
> What’s the issue? The actual meat is mostly in the asm for NMI, and
> for MCE it’s just the sync-all-the-cores thing. The actual
> simultaneous NMI-and-MCE case is utterly busted regardless, and I’ve
> been thinking about how to fix it. It won’t be pretty, but nmi_enter()
> will have nothing to do with it.

The issue is that I want to avoid anything which is not essential just
for pure paranoia reasons.

I can drop that and just move the trace muck after RCU is safe and
annotate it properly.