2015-08-21 05:03:28

by Andy Lutomirski

Subject: [PATCH] x86/traps: Weaken context tracking entry assertions

We were asserting that we were all the way in CONTEXT_KERNEL when
exception handlers were called. While having this be true is, I
think, a nice goal (or maybe a variant in which we assert that we're
in CONTEXT_KERNEL or some new IRQ context), we're not quite there.

In particular, if an IRQ interrupts the SYSCALL prologue and the IRQ
handler in turn causes an exception, the exception entry will be
called in RCU IRQ mode but with CONTEXT_USER.
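The sequence above can be modelled in a few lines of userspace C. This is a hypothetical sketch, not kernel code: `struct cpu_model`, the `*_model` helpers and the rule that RCU watches whenever the CPU is in CONTEXT_KERNEL or inside an IRQ are simplifying assumptions made for illustration. It shows why the old `CT_WARN_ON(ct_state() != CONTEXT_KERNEL)` fires in this case while an `rcu_is_watching()`-style check does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; NOT the kernel's implementation. */
enum ctx_state { CONTEXT_KERNEL, CONTEXT_USER };

struct cpu_model {
	enum ctx_state ct;   /* what context tracking believes */
	int irq_nesting;     /* depth of rcu_irq_enter()-style calls */
};

/* Assumed rule: RCU watches in kernel context or inside any IRQ. */
static bool rcu_is_watching_model(const struct cpu_model *c)
{
	return c->ct == CONTEXT_KERNEL || c->irq_nesting > 0;
}

static void rcu_irq_enter_model(struct cpu_model *c)
{
	c->irq_nesting++;
}

static void rcu_irq_exit_model(struct cpu_model *c)
{
	assert(c->irq_nesting > 0);   /* an unbalanced exit is a bug */
	c->irq_nesting--;
}

/* Old assertion: complain unless fully in CONTEXT_KERNEL. */
static bool old_check_warns(const struct cpu_model *c)
{
	return c->ct != CONTEXT_KERNEL;
}

/* New assertion: complain only if RCU is not watching. */
static bool new_check_warns(const struct cpu_model *c)
{
	return !rcu_is_watching_model(c);
}

/*
 * SYSCALL has executed but user_exit() has not run yet, so context
 * tracking still says CONTEXT_USER. An IRQ arrives (RCU starts
 * watching) and the IRQ handler takes an exception. The returned
 * state is what the exception handler observes.
 */
static struct cpu_model state_at_nested_exception(void)
{
	struct cpu_model c = { .ct = CONTEXT_USER, .irq_nesting = 0 };

	rcu_irq_enter_model(&c);   /* IRQ entry */
	return c;
}
```

For the state returned by `state_at_nested_exception()`, `old_check_warns()` is true and `new_check_warns()` is false. After `rcu_irq_exit_model()` the new check fires again, which is the case it still catches: entry code that never woke RCU.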

This is okay (nothing goes wrong), but until we fix up the SYSCALL
prologue, we need to avoid warning.
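The replacement check, `rcu_lockdep_assert(rcu_is_watching(), ...)`, reports through `lockdep_rcu_suspicious()` at most once per call site, and only in CONFIG_PROVE_RCU builds. The sketch below is a hypothetical userspace stand-in for that warn-once behaviour; `WARN_ONCE_UNLESS`, `warnings_emitted` and `fake_trap_handler()` are made up for illustration, and the real macro's config and runtime gating is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings_emitted;   /* total reports across all call sites */

/*
 * Hypothetical warn-once assertion: report the first time `cond` is
 * false at this call site, then stay quiet.
 */
#define WARN_ONCE_UNLESS(cond, msg)                                      \
	do {                                                             \
		static bool warned_;                                     \
		if (!warned_ && !(cond)) {                               \
			warned_ = true;                                  \
			warnings_emitted++;                              \
			fprintf(stderr, "%s:%d: %s\n", __FILE__,         \
				__LINE__, (msg));                        \
		}                                                        \
	} while (0)

/* Hypothetical trap handler entry using the weakened check. */
static void fake_trap_handler(bool rcu_watching)
{
	WARN_ONCE_UNLESS(rcu_watching, "entry code didn't wake RCU");
}
```

Calling `fake_trap_handler(false)` twice bumps `warnings_emitted` only once, so a single bad entry path produces one report rather than a flood.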

Signed-off-by: Andy Lutomirski <[email protected]>
---
arch/x86/kernel/traps.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 86a82eafb96f..45e8d9891fa3 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -112,7 +112,7 @@ static inline void preempt_conditional_cli(struct pt_regs *regs)
 void ist_enter(struct pt_regs *regs)
 {
 	if (user_mode(regs)) {
-		CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
+		rcu_lockdep_assert(rcu_is_watching(), "entry code didn't wake RCU");
 	} else {
 		/*
 		 * We might have interrupted pretty much anything. In
@@ -282,7 +282,7 @@ static void do_error_trap(struct pt_regs *regs, long error_code, char *str,
 {
 	siginfo_t info;
 
-	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
+	rcu_lockdep_assert(rcu_is_watching(), "entry code didn't wake RCU");
 
 	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) !=
 			NOTIFY_STOP) {
@@ -364,7 +364,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
 	const struct bndcsr *bndcsr;
 	siginfo_t *info;
 
-	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
+	rcu_lockdep_assert(rcu_is_watching(), "entry code didn't wake RCU");
 	if (notify_die(DIE_TRAP, "bounds", regs, error_code,
 			X86_TRAP_BR, SIGSEGV) == NOTIFY_STOP)
 		return;
@@ -442,7 +442,7 @@ do_general_protection(struct pt_regs *regs, long error_code)
 {
 	struct task_struct *tsk;
 
-	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
+	rcu_lockdep_assert(rcu_is_watching(), "entry code didn't wake RCU");
 	conditional_sti(regs);
 
 	if (v8086_mode(regs)) {
@@ -496,7 +496,7 @@ dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
 		return;
 
 	ist_enter(regs);
-	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
+	rcu_lockdep_assert(rcu_is_watching(), "entry code didn't wake RCU");
 #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
 	if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
 				SIGTRAP) == NOTIFY_STOP)
@@ -729,14 +729,14 @@ static void math_error(struct pt_regs *regs, int error_code, int trapnr)
 
 dotraplinkage void do_coprocessor_error(struct pt_regs *regs, long error_code)
 {
-	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
+	rcu_lockdep_assert(rcu_is_watching(), "entry code didn't wake RCU");
 	math_error(regs, error_code, X86_TRAP_MF);
 }
 
 dotraplinkage void
 do_simd_coprocessor_error(struct pt_regs *regs, long error_code)
 {
-	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
+	rcu_lockdep_assert(rcu_is_watching(), "entry code didn't wake RCU");
 	math_error(regs, error_code, X86_TRAP_XF);
 }
 
@@ -749,7 +749,7 @@ do_spurious_interrupt_bug(struct pt_regs *regs, long error_code)
 dotraplinkage void
 do_device_not_available(struct pt_regs *regs, long error_code)
 {
-	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
+	rcu_lockdep_assert(rcu_is_watching(), "entry code didn't wake RCU");
 	BUG_ON(use_eager_fpu());
 
 #ifdef CONFIG_MATH_EMULATION
@@ -775,7 +775,7 @@ dotraplinkage void do_iret_error(struct pt_regs *regs, long error_code)
 {
 	siginfo_t info;
 
-	CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
+	rcu_lockdep_assert(rcu_is_watching(), "entry code didn't wake RCU");
 	local_irq_enable();
 
 	info.si_signo = SIGILL;
--
2.4.3


2015-08-21 06:23:34

by Ingo Molnar

Subject: Re: [PATCH] x86/traps: Weaken context tracking entry assertions


* Andy Lutomirski <[email protected]> wrote:

> We were asserting that we were all the way in CONTEXT_KERNEL when exception
> handlers were called. While having this be true is, I think, a nice goal (or
> maybe a variant in which we assert that we're in CONTEXT_KERNEL or some new IRQ
> context), we're not quite there.
>
> In particular, if an IRQ interrupts the SYSCALL prologue and the IRQ handler in
> turn causes an exception, the exception entry will be called in RCU IRQ mode but
> with CONTEXT_USER.

Hm, so what harm would there be in making IRQ handlers enter CONTEXT_KERNEL?
Would nohz-full break?

I'd rather have a bit more tracking overhead here than lose such useful sanity
checks.

Thanks,

Ingo

2015-08-21 13:24:44

by Frederic Weisbecker

Subject: Re: [PATCH] x86/traps: Weaken context tracking entry assertions

On Thu, Aug 20, 2015 at 10:03:21PM -0700, Andy Lutomirski wrote:
> We were asserting that we were all the way in CONTEXT_KERNEL when
> exception handlers were called. While having this be true is, I
> think, a nice goal (or maybe a variant in which we assert that we're
> in CONTEXT_KERNEL or some new IRQ context), we're not quite there.
>
> In particular, if an IRQ interrupts the SYSCALL prologue and the IRQ
> handler in turn causes an exception, the exception entry will be
> called in RCU IRQ mode but with CONTEXT_USER.
>
> This is okay (nothing goes wrong), but until we fix up the SYSCALL
> prologue, we need to avoid warning.

We can avoid interrupts before the context tracking call, but we'll
never be able to remove every possibility of an exception. I don't think
we can assume that without making context tracking more fragile.

>
> Signed-off-by: Andy Lutomirski <[email protected]>

ACK!

Thanks!

We can indeed definitely trigger an exception in the kernel entry code
(syscall, exception, irq) before the user_exit() call and that
would break the checks. We can fix that later with context tracking
calls on exception entry code. I still think an exception slow path
based on static keys is the best way to go there.
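For reference, the exception-side hooks that already exist, `exception_enter()` and `exception_exit()` in include/linux/context_tracking.h, have roughly the shape sketched below: a static-key test on the fast path, and a state switch only when context tracking is enabled and the CPU was not already in CONTEXT_KERNEL. This is a simplified userspace model, not the kernel code; a plain `bool` stands in for the static key, and `ct_state_now`, `slow_path_calls` and the `*_model` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum ctx_state { CONTEXT_KERNEL, CONTEXT_USER };

/* Stand-in for the static key; the real one is patched at runtime. */
static bool context_tracking_enabled;
static enum ctx_state ct_state_now = CONTEXT_KERNEL;
static int slow_path_calls;    /* times we actually switched state */

static enum ctx_state exception_enter_model(void)
{
	enum ctx_state prev;

	if (!context_tracking_enabled)
		return CONTEXT_KERNEL;   /* fast path: nothing to do */

	prev = ct_state_now;
	if (prev != CONTEXT_KERNEL) {
		slow_path_calls++;
		ct_state_now = CONTEXT_KERNEL;   /* like context_tracking_exit() */
	}
	return prev;
}

static void exception_exit_model(enum ctx_state prev)
{
	if (context_tracking_enabled && prev != CONTEXT_KERNEL) {
		/* The handler itself must not have left kernel context. */
		assert(ct_state_now == CONTEXT_KERNEL);
		ct_state_now = prev;             /* like context_tracking_enter() */
	}
}
```

With the flag off, exception entry does no context-tracking work at all, which is the property a static-key-based slow path is meant to preserve.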

2015-08-21 13:38:58

by Frederic Weisbecker

Subject: Re: [PATCH] x86/traps: Weaken context tracking entry assertions

On Fri, Aug 21, 2015 at 08:23:28AM +0200, Ingo Molnar wrote:
>
> * Andy Lutomirski <[email protected]> wrote:
>
> > We were asserting that we were all the way in CONTEXT_KERNEL when exception
> > handlers were called. While having this be true is, I think, a nice goal (or
> > maybe a variant in which we assert that we're in CONTEXT_KERNEL or some new IRQ
> > context), we're not quite there.
> >
> > In particular, if an IRQ interrupts the SYSCALL prologue and the IRQ handler in
> > turn causes an exception, the exception entry will be called in RCU IRQ mode but
> > with CONTEXT_USER.
>
> Hm, so what harm would there be in making IRQ handlers enter CONTEXT_KERNEL?
> Would nohz-full break?

That would mean doubling the calls to vtime and RCU that the generic IRQ
handlers already make. We could instead add a CONTEXT_IRQ flag if you guys really
want to track IRQs, one that takes care not to call the RCU and time accounting
hooks twice.
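The double-accounting concern can be made concrete with a small hypothetical model. Neither `CONTEXT_IRQ` nor any of the functions below existed in the kernel at the time; the counters only record how often the RCU and vtime hooks would run if IRQ entry switched all the way to CONTEXT_KERNEL, compared with a marker state that records "in an IRQ" without calling those hooks again. The return path of the first option is not modelled.

```c
#include <assert.h>

/* Hypothetical: CONTEXT_IRQ is the proposal, not existing code. */
enum ctx_state { CONTEXT_KERNEL, CONTEXT_USER, CONTEXT_IRQ };

struct cpu_model {
	enum ctx_state state;   /* context-tracking view */
	enum ctx_state saved;   /* restored on the outermost IRQ exit */
	int irq_depth;          /* IRQ nesting depth */
	int rcu_hook_calls;     /* how often RCU entry hooks ran */
	int vtime_hook_calls;   /* how often vtime accounting ran */
};

/* Generic IRQ entry, which already informs RCU and vtime once. */
static void generic_irq_enter(struct cpu_model *c)
{
	c->rcu_hook_calls++;
	c->vtime_hook_calls++;
}

/* Option 1: switch to CONTEXT_KERNEL as user_exit() would, which runs
 * the RCU and vtime hooks a second time. */
static void irq_enter_to_kernel(struct cpu_model *c)
{
	generic_irq_enter(c);
	if (c->state == CONTEXT_USER) {
		c->rcu_hook_calls++;
		c->vtime_hook_calls++;
		c->state = CONTEXT_KERNEL;
	}
}

/* Option 2: a CONTEXT_IRQ marker that only records the state. */
static void irq_enter_ctx_irq(struct cpu_model *c)
{
	generic_irq_enter(c);
	if (c->irq_depth++ == 0) {
		c->saved = c->state;
		c->state = CONTEXT_IRQ;
	}
}

static void irq_exit_ctx_irq(struct cpu_model *c)
{
	assert(c->irq_depth > 0 && c->state == CONTEXT_IRQ);
	if (--c->irq_depth == 0)
		c->state = c->saved;
}
```

Starting from CONTEXT_USER, option 1 runs each hook twice per IRQ and option 2 runs it once, while still letting an exception handler see that it is nested inside an IRQ.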

Exceptions can still happen on IRQ entry before we run the context tracking call,
though. I think there will always be some fragility of this kind, due to the drift
between the software-tracked context and the real context.

>
> I'd rather have a bit more tracking overhead here than lose such useful sanity
> checks.

The RCU check should be a sufficient sanity check here.

2015-08-21 14:40:01

by Andy Lutomirski

Subject: Re: [PATCH] x86/traps: Weaken context tracking entry assertions

On Thu, Aug 20, 2015 at 11:23 PM, Ingo Molnar <[email protected]> wrote:
>
> * Andy Lutomirski <[email protected]> wrote:
>
>> We were asserting that we were all the way in CONTEXT_KERNEL when exception
>> handlers were called. While having this be true is, I think, a nice goal (or
>> maybe a variant in which we assert that we're in CONTEXT_KERNEL or some new IRQ
>> context), we're not quite there.
>>
>> In particular, if an IRQ interrupts the SYSCALL prologue and the IRQ handler in
>> turn causes an exception, the exception entry will be called in RCU IRQ mode but
>> with CONTEXT_USER.
>
> Hm, so what harm would there be in making IRQ handlers enter CONTEXT_KERNEL?
> Would nohz-full break?
>

We already do it for IRQs that hit user mode. We don't do it for IRQs
that hit kernel mode because we don't need it yet (with this patch
applied) and because IMO we have no business taking IRQs from kernel
mode while in CONTEXT_USER.
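The window described here can be sketched as an ordering problem: if the SYSCALL prologue enables IRQs before user_exit() runs, an IRQ can arrive from kernel mode while context tracking still says CONTEXT_USER; if user_exit() runs first with IRQs still off, it cannot. The model below is hypothetical and tracks only two flags; it is not the entry_64.S logic. A step string such as `"EX"` means "enable IRQs, then user_exit()" (roughly the order the patch has to tolerate), and `"XE"` is the reverse.

```c
#include <assert.h>
#include <stdbool.h>

enum ctx_state { CONTEXT_KERNEL, CONTEXT_USER };

struct cpu_model {
	enum ctx_state state;   /* context-tracking view */
	bool irqs_enabled;      /* can an IRQ be taken right now? */
};

/* True if an IRQ taken now would hit kernel mode in CONTEXT_USER. */
static bool irq_window_open(const struct cpu_model *c)
{
	return c->irqs_enabled && c->state == CONTEXT_USER;
}

/* Each step is 'E' (enable IRQs) or 'X' (user_exit()). Returns true
 * if the window was open after any step. */
static bool prologue_has_window(const char *steps)
{
	struct cpu_model c = { CONTEXT_USER, false };  /* just after SYSCALL */
	bool window = false;

	for (; *steps; steps++) {
		if (*steps == 'E')
			c.irqs_enabled = true;
		else if (*steps == 'X')
			c.state = CONTEXT_KERNEL;
		else
			assert(!"unknown step");
		window = window || irq_window_open(&c);
	}
	return window;
}
```

`prologue_has_window("EX")` is true and `prologue_has_window("XE")` is false. The real fix also has to deal with the compat entries' user access for arg6, which this model ignores.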

I want to fix the latter in 4.4. It's easy for native entries (it's
exactly the entry_64.S part of the other patch I sent), but it's
currently a big mess for compat entries because of the uaccess for
arg6, and I got that totally wrong in my patch. Rather than further
complicating the asm, I think I want to try moving all of the compat
entries into C for 4.4. I ran out of time to do it for 4.3.

Also, Rik said a while ago that *huge* context tracking speedups would
become possible if we promised to stop calling the context tracking
hooks with IRQs on. That's almost done in -tip -- I think the only
remaining ones are the syscall entries. (syscall return is done in
-tip.)
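One reason an IRQs-off calling convention helps: a hook that may be called with IRQs on must protect itself, as context_tracking_exit() does with local_irq_save()/local_irq_restore(), while a hook with an IRQs-off contract can simply assert it. The model below is hypothetical and only illustrates that contract; `irqs_disabled_now`, `irq_save_restore_pairs` and both `user_exit_*` functions are made up, and the sketch does not reproduce whatever larger speedups Rik had in mind.

```c
#include <assert.h>
#include <stdbool.h>

enum ctx_state { CONTEXT_KERNEL, CONTEXT_USER };

static bool irqs_disabled_now;      /* models the CPU's interrupt flag, inverted */
static int irq_save_restore_pairs;  /* cost paid by the tolerant hook */

/* Hook that tolerates callers with IRQs on: save/restore around the switch. */
static void user_exit_tolerant(enum ctx_state *state)
{
	bool was_disabled = irqs_disabled_now;   /* like local_irq_save() */

	irqs_disabled_now = true;
	irq_save_restore_pairs++;
	*state = CONTEXT_KERNEL;
	irqs_disabled_now = was_disabled;        /* like local_irq_restore() */
}

/* Hook under an "always called with IRQs off" contract. */
static void user_exit_irqs_off(enum ctx_state *state)
{
	assert(irqs_disabled_now);   /* the caller broke the contract otherwise */
	*state = CONTEXT_KERNEL;
}
```

Both hooks end in CONTEXT_KERNEL; only the tolerant one pays for a save/restore pair on every call.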

I could teach IRQ entries to switch all the way to CONTEXT_KERNEL even
if they interrupt syscall entry, but that would also make the asm
messier for minimal short-term-only gain.

> I'd rather have a bit more tracking overhead here than lose such useful sanity
> checks.

I agree, but even the weaker sanity checks retain a decent amount of the value.

--Andy

Subject: [tip:core/core] x86/traps: Weaken context tracking entry assertions

Commit-ID: f0a97af83f6287357dcc100c859ec0066f164f32
Gitweb: http://git.kernel.org/tip/f0a97af83f6287357dcc100c859ec0066f164f32
Author: Andy Lutomirski <[email protected]>
AuthorDate: Thu, 20 Aug 2015 22:03:21 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Sat, 22 Aug 2015 11:12:10 +0200

x86/traps: Weaken context tracking entry assertions

We were asserting that we were all the way in CONTEXT_KERNEL
when exception handlers were called. While having this be true
is, I think, a nice goal (or maybe a variant in which we assert
that we're in CONTEXT_KERNEL or some new IRQ context), we're not
quite there.

In particular, if an IRQ interrupts the SYSCALL prologue and the
IRQ handler in turn causes an exception, the exception entry
will be called in RCU IRQ mode but with CONTEXT_USER.

This is okay (nothing goes wrong), but until we fix up the
SYSCALL prologue, we need to avoid warning.

Signed-off-by: Andy Lutomirski <[email protected]>
Acked-by: Frederic Weisbecker <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/c81faf3916346c0e04346c441392974f49cd7184.1440133286.git.luto@kernel.org
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/traps.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
