2015-08-18 19:12:17

by Andy Lutomirski

Subject: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts

This fixes a couple minor holes if we took an IRQ very early in syscall
processing:

- We could enter the IRQ with CONTEXT_USER. Everything worked (RCU
  was fine), but we could warn if all the debugging options were
  set.

- We could have the IRQ regs overlap task_pt_regs. I'm not aware
  of anything important that would break, but some of the /proc
  stuff could plausibly have gotten confused.

Fix it the straightforward way: finish filling in pt_regs and call
enter_from_user_mode before enabling interrupts if _TIF_NOHZ is set.

This should be the last piece of the puzzle needed to get rid of most
remaining exception_enter calls. (vmalloc faults are still tricky,
but they're mostly fatal in the syscall prologue already.)

Signed-off-by: Andy Lutomirski <[email protected]>
---

This is the last significant functionality change I'll send for 4.3, I
hope. With this applied, context tracking for all non-NMI, non-debug
entries should be exact.

There's probably some (minor) performance regression on
CONFIG_CONTEXT_TRACKING=y kernels that aren't using nohz. If so
(I'll benchmark it later this week), I'll try to rig up a simple
patch to NOP out the hooks if nohz is off.

Sasha, this should fix the intermittent DEBUG_LOCKS splat you're
seeing.

I don't intend to send v2 of the #BP stuff for 4.3. The pile is plenty
big already.
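
For reference, here is roughly what the enter_from_user_mode() hook this
patch moves earlier looks like in 4.3-era arch/x86/entry/common.c (a
simplified sketch, not the verbatim source):

#ifdef CONFIG_CONTEXT_TRACKING
/* Called on entry from user mode; IRQs must still be off. */
__visible void enter_from_user_mode(void)
{
        CT_WARN_ON(ct_state() != CONTEXT_USER);
        user_exit();            /* tell RCU and vtime we left user mode */
}
#endif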

arch/x86/entry/common.c | 12 +-------
arch/x86/entry/entry_64.S | 32 ++++++++++++++------
arch/x86/entry/entry_64_compat.S | 60 +++++++++++++++++++++++++++++---------
arch/x86/include/asm/thread_info.h | 3 +-
4 files changed, 71 insertions(+), 36 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 80dcc9261ca3..b570cea2f469 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -70,21 +70,11 @@ unsigned long syscall_trace_enter_phase1(struct pt_regs *regs, u32 arch)
u32 work;

BUG_ON(regs != task_pt_regs(current));
+ CT_WARN_ON(ct_state() != CONTEXT_KERNEL);

work = ACCESS_ONCE(current_thread_info()->flags) &
_TIF_WORK_SYSCALL_ENTRY;

-#ifdef CONFIG_CONTEXT_TRACKING
- /*
- * If TIF_NOHZ is set, we are required to call user_exit() before
- * doing anything that could touch RCU.
- */
- if (work & _TIF_NOHZ) {
- enter_from_user_mode();
- work &= ~_TIF_NOHZ;
- }
-#endif
-
#ifdef CONFIG_SECCOMP
/*
* Do seccomp first -- it should minimize exposure of other
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e2d078c9dfe4..6bf0c7ecf399 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -142,20 +142,16 @@ ENTRY(entry_SYSCALL_64)
*/
GLOBAL(entry_SYSCALL_64_after_swapgs)

+ /*
+ * IRQs must be off while we use rsp_scratch to keep it from
+ * being clobbered by a different task.
+ */
movq %rsp, PER_CPU_VAR(rsp_scratch)
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp

/* Construct struct pt_regs on stack */
pushq $__USER_DS /* pt_regs->ss */
pushq PER_CPU_VAR(rsp_scratch) /* pt_regs->sp */
- /*
- * Re-enable interrupts.
- * We use 'rsp_scratch' as a scratch space, hence irq-off block above
- * must execute atomically in the face of possible interrupt-driven
- * task preemption. We must enable interrupts only after we're done
- * with using rsp_scratch:
- */
- ENABLE_INTERRUPTS(CLBR_NONE)
pushq %r11 /* pt_regs->flags */
pushq $__USER_CS /* pt_regs->cs */
pushq %rcx /* pt_regs->ip */
@@ -171,8 +167,17 @@ GLOBAL(entry_SYSCALL_64_after_swapgs)
pushq %r11 /* pt_regs->r11 */
sub $(6*8), %rsp /* pt_regs->bp, bx, r12-15 not saved */

- testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
+ testl $(_TIF_WORK_SYSCALL_ENTRY | _TIF_NOHZ), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
jnz tracesys
+
+ /*
+ * Re-enable interrupts. IRQ tracing already thinks that IRQs are
+ * on (since we treat user mode as having IRQs on), and the
+ * prologue above is too short for it to be worth adding a
+ * tracing round trip.
+ */
+ ENABLE_INTERRUPTS(CLBR_NONE)
+
entry_SYSCALL_64_fastpath:
#if __SYSCALL_MASK == ~0
cmpq $__NR_syscall_max, %rax
@@ -235,6 +240,15 @@ GLOBAL(int_ret_from_sys_call_irqs_off)

/* Do syscall entry tracing */
tracesys:
+#ifdef CONFIG_CONTEXT_TRACKING
+ /* This is slow enough that it's worth tracing. */
+ TRACE_IRQS_OFF
+ call enter_from_user_mode
+ TRACE_IRQS_ON
+#endif
+
+ ENABLE_INTERRUPTS(CLBR_NONE)
+
movq %rsp, %rdi
movl $AUDIT_ARCH_X86_64, %esi
call syscall_trace_enter_phase1
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index ff32a289b5d1..099ec1174ff9 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -103,11 +103,19 @@ ENTRY(entry_SYSENTER_compat)
jnz sysenter_fix_flags
sysenter_flags_fixed:

+#ifdef CONFIG_CONTEXT_TRACKING
+ /* This is slow enough that it's worth tracing. */
+ TRACE_IRQS_OFF
+ call enter_from_user_mode
+ TRACE_IRQS_ON
+#endif
+
/*
* Re-enable interrupts. IRQ tracing already thinks that IRQs are
* on (since we treat user mode as having IRQs on), and the
* prologue above is too short for it to be worth adding a
- * tracing round trip.
+ * tracing round trip except in the CONFIG_CONTEXT_TRACKING
+ * case.
*/
ENABLE_INTERRUPTS(CLBR_NONE)

@@ -318,15 +326,10 @@ ENDPROC(entry_SYSENTER_compat)
* with the int 0x80 path.
*/
ENTRY(entry_SYSCALL_compat)
- /*
- * Interrupts are off on entry.
- * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
- * it is too small to ever cause noticeable irq latency.
- */
+ /* Interrupts are off on entry. */
SWAPGS_UNSAFE_STACK
movl %esp, %r8d
movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
- ENABLE_INTERRUPTS(CLBR_NONE)

/* Zero-extending 32-bit regs, do not remove */
movl %eax, %eax
@@ -346,6 +349,22 @@ ENTRY(entry_SYSCALL_compat)
pushq $-ENOSYS /* pt_regs->ax */
sub $(10*8), %rsp /* pt_regs->r8-11, bp, bx, r12-15 not saved */

+#ifdef CONFIG_CONTEXT_TRACKING
+ /* This is slow enough that it's worth tracing. */
+ TRACE_IRQS_OFF
+ call enter_from_user_mode
+ TRACE_IRQS_ON
+#endif
+
+ /*
+ * Re-enable interrupts. IRQ tracing already thinks that IRQs are
+ * on (since we treat user mode as having IRQs on), and the
+ * prologue above is too short for it to be worth adding a
+ * tracing round trip except in the CONFIG_CONTEXT_TRACKING
+ * case.
+ */
+ ENABLE_INTERRUPTS(CLBR_NONE)
+
/*
* No need to do an access_ok check here because r8 has been
* 32-bit zero extended:
@@ -354,6 +373,7 @@ ENTRY(entry_SYSCALL_compat)
1: movl (%r8), %r9d
_ASM_EXTABLE(1b, ia32_badarg)
ASM_CLAC
+
orl $TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
jnz cstar_tracesys
@@ -518,14 +538,9 @@ ia32_ret_from_sys_call:
*/

ENTRY(entry_INT80_compat)
- /*
- * Interrupts are off on entry.
- * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
- * it is too small to ever cause noticeable irq latency.
- */
+ /* Interrupts are off on entry. */
PARAVIRT_ADJUST_EXCEPTION_FRAME
SWAPGS
- ENABLE_INTERRUPTS(CLBR_NONE)

/* Zero-extending 32-bit regs, do not remove */
movl %eax, %eax
@@ -545,9 +560,17 @@ ENTRY(entry_INT80_compat)
sub $(6*8), %rsp /* pt_regs->bp, bx, r12-15 not saved */

orl $TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
- testl $_TIF_WORK_SYSCALL_ENTRY, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
+ testl $(_TIF_WORK_SYSCALL_ENTRY | _TIF_NOHZ), ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
jnz ia32_tracesys

+ /*
+ * Re-enable interrupts. IRQ tracing already thinks that IRQs are
+ * on (since we treat user mode as having IRQs on), and the
+ * prologue above is too short for it to be worth adding a
+ * tracing round trip.
+ */
+ ENABLE_INTERRUPTS(CLBR_NONE)
+
ia32_do_call:
/* 32-bit syscall -> 64-bit C ABI argument conversion */
movl %edi, %r8d /* arg5 */
@@ -564,6 +587,15 @@ ia32_do_call:
jmp int_ret_from_sys_call

ia32_tracesys:
+#ifdef CONFIG_CONTEXT_TRACKING
+ /* This is slow enough that it's worth tracing. */
+ TRACE_IRQS_OFF
+ call enter_from_user_mode
+ TRACE_IRQS_ON
+#endif
+
+ ENABLE_INTERRUPTS(CLBR_NONE)
+
SAVE_EXTRA_REGS
movq %rsp, %rdi /* &pt_regs -> arg1 */
call syscall_trace_enter
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 8afdc3e44247..3c5a96815dec 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -140,8 +140,7 @@ struct thread_info {
/* work to do in syscall_trace_enter() */
#define _TIF_WORK_SYSCALL_ENTRY \
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT | \
- _TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT | \
- _TIF_NOHZ)
+ _TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT)

/* work to do on any return to user space */
#define _TIF_ALLWORK_MASK \
--
2.4.3


2015-08-18 22:16:33

by Frederic Weisbecker

Subject: Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts

On Tue, Aug 18, 2015 at 12:11:59PM -0700, Andy Lutomirski wrote:
> This fixes a couple minor holes if we took an IRQ very early in syscall
> processing:
>
> - We could enter the IRQ with CONTEXT_USER. Everything worked (RCU
> was fine), but we could warn if all the debugging options were
> set.

So this is fixing issues after your changes that call user_exit() from
IRQs, right?

But the IRQs aren't supposed to call user_exit(), they have their own hooks.
That's where the real issue is.

2015-08-18 22:35:51

by Andy Lutomirski

Subject: Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts

On Tue, Aug 18, 2015 at 3:16 PM, Frederic Weisbecker <[email protected]> wrote:
> On Tue, Aug 18, 2015 at 12:11:59PM -0700, Andy Lutomirski wrote:
>> This fixes a couple minor holes if we took an IRQ very early in syscall
>> processing:
>>
>> - We could enter the IRQ with CONTEXT_USER. Everything worked (RCU
>> was fine), but we could warn if all the debugging options were
>> set.
>
> So this is fixing issues after your changes that call user_exit() from
> IRQs, right?

Yes. Here's an example splat, courtesy of Sasha:

https://gist.github.com/sashalevin/a006a44989312f6835e7

>
> But the IRQs aren't supposed to call user_exit(), they have their own hooks.
> That's where the real issue is.

In -tip, the assumption is that we *always* switch to CONTEXT_KERNEL
when entering the kernel for a non-NMI reason. That means that we can
avoid all of the (expensive!) checks for what context we're in. It
also means that (other than IRQs, which need further cleanup), we only
switch once per user/kernel switch.

The cost of doing so should be essentially zero, modulo artifacts from
poor inlining.
to be, and it has the potential to be quite fast. For one thing, we
never invoke context tracking with IRQs on, and Rik had some profiles
suggesting that a bunch of the overhead involved dealing with repeated
irq flag manipulation.

One way or another, IRQs need to switch from RCU-not-watching to
RCU-watching, and I don't see what's wrong with user_exit for this
purpose. Of course, if user_exit is slow, we should fix that.
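
For the record, user_exit() boils down to roughly this (simplified from
4.2-era kernel/context_tracking.c; the recursion guard and tracepoints
are omitted):

void context_tracking_exit(enum ctx_state state)
{
        unsigned long flags;

        local_irq_save(flags);
        if (__this_cpu_read(context_tracking.state) == state) {
                if (__this_cpu_read(context_tracking.active)) {
                        rcu_user_exit();        /* RCU starts watching */
                        if (state == CONTEXT_USER)
                                vtime_user_exit(current); /* account user time */
                }
                __this_cpu_write(context_tracking.state, CONTEXT_KERNEL);
        }
        local_irq_restore(flags);
}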

Also, this isn't really related to IRQs calling user_exit. It's that
IRQs can recurse into other entries (#GP in Sasha's case) which also
validate the context.

None of the speedups that will be enabled are written yet, but I
strongly suspect they will be soon :)

In my book, the fact that we now have context tracking assertions all
over the place is a good thing. It means we're much less likely to
break it.
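
(The assertion itself is trivial; roughly, from
include/linux/context_tracking.h:

#define CT_WARN_ON(cond) WARN_ON(context_tracking_is_enabled() && (cond))

so it's compiled to nothing unless context tracking is enabled.)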

--Andy


--
Andy Lutomirski
AMA Capital Management, LLC

2015-08-18 23:02:41

by Frederic Weisbecker

Subject: Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts

On Tue, Aug 18, 2015 at 03:35:30PM -0700, Andy Lutomirski wrote:
> On Tue, Aug 18, 2015 at 3:16 PM, Frederic Weisbecker <[email protected]> wrote:
> > On Tue, Aug 18, 2015 at 12:11:59PM -0700, Andy Lutomirski wrote:
> >> This fixes a couple minor holes if we took an IRQ very early in syscall
> >> processing:
> >>
> >> - We could enter the IRQ with CONTEXT_USER. Everything worked (RCU
> >> was fine), but we could warn if all the debugging options were
> >> set.
> >
> > So this is fixing issues after your changes that call user_exit() from
> > IRQs, right?
>
> Yes. Here's an example splat, courtesy of Sasha:
>
> https://gist.github.com/sashalevin/a006a44989312f6835e7
>
> >
> > But the IRQs aren't supposed to call user_exit(), they have their own hooks.
> > That's where the real issue is.
>
> In -tip, the assumption is that we *always* switch to CONTEXT_KERNEL
> when entering the kernel for a non-NMI reason.

Why? IRQs don't need that! We already have irq_enter()/irq_exit().

And we don't want to call rcu_user_*() pairs on IRQs, you're
introducing a serious performance regression here! And I'm talking about
the code that's currently in -tip.

> That means that we can
> avoid all of the (expensive!) checks for what context we're in.

If you're referring to context tracking, the context check is a per-cpu
read. Not something that's usually considered expensive.
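
Roughly this (a sketch of the ct_state() helper;
context_tracking_is_enabled() is a static key):

static inline enum ctx_state ct_state(void)
{
        return context_tracking_is_enabled() ?
                this_cpu_read(context_tracking.state) : CONTEXT_DISABLED;
}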

> It also means that (other than IRQs, which need further cleanup), we only
> switch once per user/kernel switch.

???

>
> The cost of doing so should be essentially zero, modulo artifacts from
> poor inlining.

And modulo rcu_user_*() that do multiple costly atomic_add_return() operations
implying full memory barriers. Plus the unnecessary vtime accounting that doubles
the existing one in irq_enter/exit() (those even imply a lock currently, which will
probably be turned to seqcount, but still, full memory barriers...).
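
Concretely, each rcu_user_enter()/rcu_user_exit() ends up doing something
like this (a simplified sketch of 4.2's rcu_eqs_enter_common(); the exit
side is symmetric):

        struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);

        /* CPUs seeing the inc must see prior RCU read-side sections. */
        smp_mb__before_atomic();
        atomic_inc(&rdtp->dynticks);    /* full-barrier-class atomic */
        smp_mb__after_atomic();         /* order against next sojourn */
        WARN_ON_ONCE(atomic_read(&rdtp->dynticks) & 0x1);

That's the cost you pay a second time per IRQ when the arch hook
duplicates what irq_enter()/irq_exit() already do.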

I'm sorry but I'm going to NACK any code that does that in IRQs (and again that
concerns current tip:x86/asm).

2015-08-18 23:08:14

by Andy Lutomirski

Subject: Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts

On Tue, Aug 18, 2015 at 4:02 PM, Frederic Weisbecker <[email protected]> wrote:
> On Tue, Aug 18, 2015 at 03:35:30PM -0700, Andy Lutomirski wrote:
>> On Tue, Aug 18, 2015 at 3:16 PM, Frederic Weisbecker <[email protected]> wrote:
>> > On Tue, Aug 18, 2015 at 12:11:59PM -0700, Andy Lutomirski wrote:
>> >> This fixes a couple minor holes if we took an IRQ very early in syscall
>> >> processing:
>> >>
>> >> - We could enter the IRQ with CONTEXT_USER. Everything worked (RCU
>> >> was fine), but we could warn if all the debugging options were
>> >> set.
>> >
>> > So this is fixing issues after your changes that call user_exit() from
>> > IRQs, right?
>>
>> Yes. Here's an example splat, courtesy of Sasha:
>>
>> https://gist.github.com/sashalevin/a006a44989312f6835e7
>>
>> >
>> > But the IRQs aren't supposed to call user_exit(), they have their own hooks.
>> > That's where the real issue is.
>>
>> In -tip, the assumption is that we *always* switch to CONTEXT_KERNEL
>> when entering the kernel for a non-NMI reason.
>
> Why? IRQs don't need that! We already have irq_enter()/irq_exit().
>

Those are certainly redundant. I want to have a real hook to call
that says "switch to IRQ context from CONTEXT_USER" or "switch to IRQ
context from CONTEXT_KERNEL" (aka noop), but that doesn't currently
exist.
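
Roughly this hypothetical pair (neither function exists today; the names
are made up just to illustrate the shape):

/* Hypothetical -- a sketch, not in any tree. */
static inline enum ctx_state irq_context_enter(void)
{
        enum ctx_state prev = ct_state();

        if (prev == CONTEXT_USER)
                user_exit();    /* RCU starts watching */
        return prev;            /* CONTEXT_KERNEL case is a noop */
}

static inline void irq_context_exit(enum ctx_state prev)
{
        if (prev == CONTEXT_USER)
                user_enter();   /* resume user context tracking */
}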

> And we don't want to call rcu_user_*() pairs on IRQs, you're
> introducing a serious performance regression here! And I'm talking about
> the code that's currently in -tip.

Is there an easy way to fix it? For example, could we figure out what
makes it take so long and make it faster? If we need to, we could
back out the IRQ bit and change the assertions for 4.3, but I'd rather
keep the exact context tracking if at all possible.

>
>> That means that we can
>> avoid all of the (expensive!) checks for what context we're in.
>
> If you're referring to context tracking, the context check is a per-cpu
> read. Not something that's usually considered expensive.

In -tip, there aren't even extra branches, except those imposed by the
user_exit implementation.

>
>> It also means that (other than IRQs, which need further cleanup), we only
>> switch once per user/kernel switch.
>
> ???

In 4.2 and before, we can switch multiple times on the way out of the
kernel, via SCHEDULE_USER, do_notify_resume, etc. In -tip, we do it
exactly once no matter what.

>
>>
>> The cost of doing so should be essentially zero, modulo artifacts from
>> poor inlining.
>
> And modulo rcu_user_*() that do multiple costly atomic_add_return() operations
> implying full memory barriers. Plus the unnecessary vtime accounting that doubles
> the existing one in irq_enter/exit() (those even imply a lock currently, which will
> probably be turned to seqcount, but still, full memory barriers...).
>
> I'm sorry but I'm going to NACK any code that does that in IRQs (and again that
> concerns current tip:x86/asm).

Why do we need these heavyweight barriers?

If there's actually a measurable performance hit in IRQs in -tip, then
can we come up with a better fix? For example, we could change all
the new CT_WARN_ON calls to check "are we in CONTEXT_KERNEL or in IRQ
context" and make the IRQ entry do a lighter weight context tracking
operation.
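
Something like this hypothetical relaxed assertion (in_irq() is the
existing hardirq-context test):

        CT_WARN_ON(ct_state() != CONTEXT_KERNEL && !in_irq());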

But I think I'm still missing something fundamental about the
performance: why is irq_enter() any faster than user_exit()?

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC

2015-08-19 00:31:16

by Andy Lutomirski

Subject: Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts

On Tue, Aug 18, 2015 at 12:11 PM, Andy Lutomirski <[email protected]> wrote:
> This fixes a couple minor holes if we took an IRQ very early in syscall
> processing:
>

The patch is buggy. v2 coming soon, hopefully.

--Andy

2015-08-19 15:54:42

by Andy Lutomirski

Subject: Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts

On Tue, Aug 18, 2015 at 5:30 PM, Andy Lutomirski <[email protected]> wrote:
> On Tue, Aug 18, 2015 at 12:11 PM, Andy Lutomirski <[email protected]> wrote:
>> This fixes a couple minor holes if we took an IRQ very early in syscall
>> processing:
>>
>
> The patch is buggy. v2 coming soon, hopefully.
>

No, let's drop this for now. It would be straightforward but quite
messy to fix, and I don't want to introduce further
incomprehensibility into the asm. I'll send the warning change
instead, and we can fix it better for 4.4.

--Andy

2015-08-19 17:10:14

by Frederic Weisbecker

Subject: Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts

On Tue, Aug 18, 2015 at 04:07:51PM -0700, Andy Lutomirski wrote:
> On Tue, Aug 18, 2015 at 4:02 PM, Frederic Weisbecker <[email protected]> wrote:
> > On Tue, Aug 18, 2015 at 03:35:30PM -0700, Andy Lutomirski wrote:
> >> On Tue, Aug 18, 2015 at 3:16 PM, Frederic Weisbecker <[email protected]> wrote:
> >> > On Tue, Aug 18, 2015 at 12:11:59PM -0700, Andy Lutomirski wrote:
> >> >> This fixes a couple minor holes if we took an IRQ very early in syscall
> >> >> processing:
> >> >>
> >> >> - We could enter the IRQ with CONTEXT_USER. Everything worked (RCU
> >> >> was fine), but we could warn if all the debugging options were
> >> >> set.
> >> >
> >> > So this is fixing issues after your changes that call user_exit() from
> >> > IRQs, right?
> >>
> >> Yes. Here's an example splat, courtesy of Sasha:
> >>
> >> https://gist.github.com/sashalevin/a006a44989312f6835e7
> >>
> >> >
> >> > But the IRQs aren't supposed to call user_exit(), they have their own hooks.
> >> > That's where the real issue is.
> >>
> >> In -tip, the assumption is that we *always* switch to CONTEXT_KERNEL
> >> when entering the kernel for a non-NMI reason.
> >
> > Why? IRQs don't need that! We already have irq_enter()/irq_exit().
> >
>
> Those are certainly redundant.

So? What's the point in duplicating a hook in arch code that core code already
has?

> I want to have a real hook to call
> that says "switch to IRQ context from CONTEXT_USER" or "switch to IRQ
> context from CONTEXT_KERNEL" (aka noop), but that doesn't currently
> exist.

You're not answering _why_ you want that.

>
> > And we don't want to call rcu_user_*() pairs on IRQs, you're
> > introducing a serious performance regression here! And I'm talking about
> > the code that's currently in -tip.
>
> Is there an easy way to fix it? For example, could we figure out what
> makes it take so long and make it faster?

Sure, just remove your arch IRQ hook.

> If we need to, we could
> back out the IRQ bit and change the assertions for 4.3, but I'd rather
> keep the exact context tracking if at all possible.

I have no idea what you mean by exact context tracking here.

But if we ever want to call irq_enter() using arch hooks, we need serious
reasons, and I have no idea why we would ever want to do that, since it
involves complicating the code by $NR_ARCHS and moving C code to asm. And
that's certainly not something we are going to plan now for next week's
merge window.

> >> That means that we can
> >> avoid all of the (expensive!) checks for what context we're in.
> >
> > If you're referring to context tracking, the context check is a per-cpu
> > read. Not something that's usually considered expensive.
>
> In -tip, there aren't even extra branches, except those imposed by the
> user_exit implementation.

No, there is the "call enter_from_user_mode" in the IRQ fast path.

>
> >
> >> It also means that (other than IRQs, which need further cleanup), we only
> >> switch once per user/kernel switch.
> >
> > ???
>
> In 4.2 and before, we can switch multiple times on the way out of the
> kernel, via SCHEDULE_USER, do_notify_resume, etc. In -tip, we do it
> exactly once no matter what.

That's what we want for syscalls but not for IRQs.

>
> >
> >>
> >> The cost of doing so should be essentially zero, modulo artifacts from
> >> poor inlining.
> >
> > And modulo rcu_user_*() that do multiple costly atomic_add_return() operations
> > implying full memory barriers. Plus the unnecessary vtime accounting that doubles
> > the existing one in irq_enter/exit() (those even imply a lock currently, which will
> > probably be turned to seqcount, but still, full memory barriers...).
> >
> > I'm sorry but I'm going to NACK any code that does that in IRQs (and again that
> > concerns current tip:x86/asm).
>
> Why do we need these heavyweight barriers?

Actually it's not full barriers but atomic ones (smp_mb__after_atomic_stuff()).
I suspect we can't do much better given RCU requirements.

Still, we don't need to call it twice.

>
> If there's actually a measurable performance hit in IRQs in -tip, then
> can we come up with a better fix?

I'm sure it's very easily measurable.

> For example, we could change all
> the new CT_WARN_ON calls to check "are we in CONTEXT_KERNEL or in IRQ
> context" and make the IRQ entry do a lighter weight context tracking
> operation.

I don't see what we need to check actually. Context tracking can be in any
state while in IRQ.

>
> But I think I'm still missing something fundamental about the
> performance: why is irq_enter() any faster than user_exit()?

It's slightly faster, at least because it takes care of nested IRQs, which
are likely with softirqs that get interrupted.

Now of course we wouldn't call user_exit() in this case, but the hook is there
in generic code; no need for anything from the arch.
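
For nested IRQs the generic hook is nearly free; roughly (simplified
from 4.2's rcu_irq_enter() in kernel/rcu/tree.c, with tracing and the
irq-save dance omitted):

void rcu_irq_enter(void)
{
        struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
        long long oldval = rdtp->dynticks_nesting;

        rdtp->dynticks_nesting++;
        if (oldval)             /* nested: no atomics, no barriers */
                return;
        rcu_eqs_exit_common(oldval, true); /* only the outermost pays */
}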

2015-08-21 07:50:41

by Ingo Molnar

Subject: Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts


* Frederic Weisbecker <[email protected]> wrote:

> > I want to have a real hook to call that says "switch to IRQ context from
> > CONTEXT_USER" or "switch to IRQ context from CONTEXT_KERNEL" (aka noop), but
> > that doesn't currently exist.
>
> You're not answering _why_ you want that.

So we'd have a comprehensive, 100% coverage, self-sufficient set of callbacks that
track the kernel's current context state at the points where the context switches
actually occur - not just something cobbled together heterogeneously. The low level
x86 asm code was rather messy in this area; better organization would be welcome,
and I don't think we can overdo it.

( I'm assuming here that it can all be done for zero or negative cost, and that
the result will be correct and won't hurt existing users in any fashion. )

Thanks,

Ingo

2015-08-21 13:14:09

by Frederic Weisbecker

Subject: Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts

On Fri, Aug 21, 2015 at 09:50:35AM +0200, Ingo Molnar wrote:
>
> * Frederic Weisbecker <[email protected]> wrote:
>
> > > I want to have a real hook to call that says "switch to IRQ context from
> > > CONTEXT_USER" or "switch to IRQ context from CONTEXT_KERNEL" (aka noop), but
> > > that doesn't currently exist.
> >
> > You're not answering _why_ you want that.
>
> So we'd have a comprehensive, 100% coverage, self-sufficient set of callbacks that
> track the kernel's current context state at the points where the context switches
> actually occur - not just something cobbled together heterogeneously. The low level
> x86 asm code was rather messy in this area; better organization would be welcome,
> and I don't think we can overdo it.
>
> ( I'm assuming here that it can all be done for zero or negative cost, and that
> the result will be correct and won't hurt existing users in any fashion. )

Sure, if x86 needs such a thing for internal organization it's ok, but don't
do context tracking calls there. irq_enter() and irq_exit() already take care
of RCU and time accounting. What Andy is doing is roughly like calling these
callbacks twice for no reason. It's horrible for IRQ performance.

The merge window is like tomorrow and he keeps arguing against that obvious
regression instead of fixing it.

Thanks.