LinuxLists.cc - [PATCH -v5 0/5] rcu,nohz,kvm: use RCU extended quiescent state when running KVM guest

2015-02-10 20:28:07

Subject: [PATCH -v5 0/5] rcu,nohz,kvm: use RCU extended quiescent state when running KVM guest

When running a KVM guest on a system with NOHZ_FULL enabled, and the
KVM guest running with idle=poll mode, we still get wakeups of the
rcuos/N threads.

This problem has already been solved for user space by telling the
RCU subsystem that the CPU is in an extended quiescent state while
running user space code.

This patch series extends that code a little bit to make it usable
to track KVM guest space, too.

I tested the code by booting a KVM guest with idle=poll, on a system
with NOHZ_FULL enabled on most CPUs, and a VCPU thread bound to a
CPU. In a 10 second interval, rcuos/N threads on other CPUs got woken
up several times, while the rcuos thread on the CPU running the bound
and alwasy running VCPU thread never got woken up once.

Thanks to Christian Borntraeger, Paul McKenney, Paulo Bonzini,
Frederic Weisbecker, and Will Deacon for reviewing and improving
earlier versions of this patch series.

2015-02-10 20:28:08

by Rik van Riel

[permalink] [raw]

Subject: [PATCH 1/5] context_tracking: generalize context tracking APIs to support user and guest

From: Rik van Riel <[email protected]>

Split out the mechanism from context_tracking_user_enter and
context_tracking_user_exit into context_tracking_enter and
context_tracking_exit. Leave the old functions in order to avoid
breaking ARM, which calls these functions from assembler code,
and cannot easily use C enum parameters.

Add the expected ctx_state as a parameter to context_tracking_enter and
context_tracking_exit, allowing the same functions to not just track
kernel <> user space switching, but also kernel <> guest transitions.

Signed-off-by: Rik van Riel <[email protected]>
---
include/linux/context_tracking.h | 8 +++++---
kernel/context_tracking.c | 43 ++++++++++++++++++++++++++--------------
2 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 37b81bd51ec0..954253283709 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -10,6 +10,8 @@
#ifdef CONFIG_CONTEXT_TRACKING
extern void context_tracking_cpu_set(int cpu);

+extern void context_tracking_enter(enum ctx_state state);
+extern void context_tracking_exit(enum ctx_state state);
extern void context_tracking_user_enter(void);
extern void context_tracking_user_exit(void);
extern void __context_tracking_task_switch(struct task_struct *prev,
@@ -35,7 +37,7 @@ static inline enum ctx_state exception_enter(void)
return 0;

prev_ctx = this_cpu_read(context_tracking.state);
- context_tracking_user_exit();
+ context_tracking_exit(prev_ctx);

return prev_ctx;
}
@@ -43,8 +45,8 @@ static inline enum ctx_state exception_enter(void)
static inline void exception_exit(enum ctx_state prev_ctx)
{
if (context_tracking_is_enabled()) {
- if (prev_ctx == IN_USER)
- context_tracking_user_enter();
+ if (prev_ctx != IN_KERNEL)
+ context_tracking_enter(prev_ctx);
}
}

diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 937ecdfdf258..38e38aeac8b9 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -39,15 +39,15 @@ void context_tracking_cpu_set(int cpu)
}

/**
- * context_tracking_user_enter - Inform the context tracking that the CPU is going to
- * enter userspace mode.
+ * context_tracking_enter - Inform the context tracking that the CPU is going
+ * enter user or guest space mode.
*
* This function must be called right before we switch from the kernel
- * to userspace, when it's guaranteed the remaining kernel instructions
- * to execute won't use any RCU read side critical section because this
- * function sets RCU in extended quiescent state.
+ * to user or guest space, when it's guaranteed the remaining kernel
+ * instructions to execute won't use any RCU read side critical section
+ * because this function sets RCU in extended quiescent state.
*/
-void context_tracking_user_enter(void)
+void context_tracking_enter(enum ctx_state state)
{
unsigned long flags;

@@ -75,7 +75,7 @@ void context_tracking_user_enter(void)
WARN_ON_ONCE(!current->mm);

local_irq_save(flags);
- if ( __this_cpu_read(context_tracking.state) != IN_USER) {
+ if ( __this_cpu_read(context_tracking.state) != state) {
if (__this_cpu_read(context_tracking.active)) {
trace_user_enter(0);
/*
@@ -101,24 +101,31 @@ void context_tracking_user_enter(void)
* OTOH we can spare the calls to vtime and RCU when context_tracking.active
* is false because we know that CPU is not tickless.
*/
- __this_cpu_write(context_tracking.state, IN_USER);
+ __this_cpu_write(context_tracking.state, state);
}
local_irq_restore(flags);
}
+NOKPROBE_SYMBOL(context_tracking_enter);
+
+void context_tracking_user_enter(void)
+{
+ context_tracking_enter(IN_USER);
+}
NOKPROBE_SYMBOL(context_tracking_user_enter);

/**
- * context_tracking_user_exit - Inform the context tracking that the CPU is
- * exiting userspace mode and entering the kernel.
+ * context_tracking_exit - Inform the context tracking that the CPU is
+ * exiting user or guest mode and entering the kernel.
*
- * This function must be called after we entered the kernel from userspace
- * before any use of RCU read side critical section. This potentially include
- * any high level kernel code like syscalls, exceptions, signal handling, etc...
+ * This function must be called after we entered the kernel from user or
+ * guest space before any use of RCU read side critical section. This
+ * potentially include any high level kernel code like syscalls, exceptions,
+ * signal handling, etc...
*
* This call supports re-entrancy. This way it can be called from any exception
* handler without needing to know if we came from userspace or not.
*/
-void context_tracking_user_exit(void)
+void context_tracking_exit(enum ctx_state state)
{
unsigned long flags;

@@ -129,7 +136,7 @@ void context_tracking_user_exit(void)
return;

local_irq_save(flags);
- if (__this_cpu_read(context_tracking.state) == IN_USER) {
+ if (__this_cpu_read(context_tracking.state) == state) {
if (__this_cpu_read(context_tracking.active)) {
/*
* We are going to run code that may use RCU. Inform
@@ -143,6 +150,12 @@ void context_tracking_user_exit(void)
}
local_irq_restore(flags);
}
+NOKPROBE_SYMBOL(context_tracking_exit);
+
+void context_tracking_user_exit(void)
+{
+ context_tracking_exit(IN_USER);
+}
NOKPROBE_SYMBOL(context_tracking_user_exit);

/**
--
1.9.3

2015-02-10 20:28:02

by Rik van Riel

[permalink] [raw]

Subject: [PATCH 2/5] nohz: add stub context_tracking_is_enabled

From: Rik van Riel <[email protected]>

With code elsewhere doing something conditional on whether or not
context tracking is enabled, we want a stub function that tells us
context tracking is not enabled, when CONFIG_CONTEXT_TRACKING is
not set.

Signed-off-by: Rik van Riel <[email protected]>
---
include/linux/context_tracking_state.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h
index 97a81225d037..72ab10fe1e46 100644
--- a/include/linux/context_tracking_state.h
+++ b/include/linux/context_tracking_state.h
@@ -39,6 +39,8 @@ static inline bool context_tracking_in_user(void)
#else
static inline bool context_tracking_in_user(void) { return false; }
static inline bool context_tracking_active(void) { return false; }
+static inline bool context_tracking_is_enabled(void) { return false; }
+static inline bool context_tracking_cpu_is_enabled(void) { return false; }
#endif /* CONFIG_CONTEXT_TRACKING */

#endif
--
1.9.3

2015-02-10 20:28:06

by Rik van Riel

[permalink] [raw]

Subject: [PATCH 3/5] rcu,nohz: run vtime_user_enter/exit only when state == IN_USER

From: Rik van Riel <[email protected]>

Only run vtime_user_enter, vtime_user_exit, and the user enter & exit
trace points when we are entering or exiting user state, respectively.

The KVM code in guest_enter and guest_exit already take care of calling
vtime_guest_enter and vtime_guest_exit, respectively.

The RCU code only distinguishes between "idle" and "not idle or kernel".
There should be no need to add an additional (unused) state there.

Signed-off-by: Rik van Riel <[email protected]>
---
kernel/context_tracking.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 38e38aeac8b9..0e4e318d5ea4 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -77,7 +77,6 @@ void context_tracking_enter(enum ctx_state state)
local_irq_save(flags);
if ( __this_cpu_read(context_tracking.state) != state) {
if (__this_cpu_read(context_tracking.active)) {
- trace_user_enter(0);
/*
* At this stage, only low level arch entry code remains and
* then we'll run in userspace. We can assume there won't be
@@ -85,7 +84,10 @@ void context_tracking_enter(enum ctx_state state)
* user_exit() or rcu_irq_enter(). Let's remove RCU's dependency
* on the tick.
*/
- vtime_user_enter(current);
+ if (state == IN_USER) {
+ trace_user_enter(0);
+ vtime_user_enter(current);
+ }
rcu_user_enter();
}
/*
@@ -143,8 +145,10 @@ void context_tracking_exit(enum ctx_state state)
* RCU core about that (ie: we may need the tick again).
*/
rcu_user_exit();
- vtime_user_exit(current);
- trace_user_exit(0);
+ if (state == IN_USER) {
+ vtime_user_exit(current);
+ trace_user_exit(0);
+ }
}
__this_cpu_write(context_tracking.state, IN_KERNEL);
}
--
1.9.3

2015-02-10 20:28:00

by Rik van Riel

[permalink] [raw]

Subject: [PATCH 4/5] nohz,kvm: export context_tracking_user_enter/exit

From: Rik van Riel <[email protected]>

Export context_tracking_user_enter/exit so it can be used by KVM.

Signed-off-by: Rik van Riel <[email protected]>
---
kernel/context_tracking.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
index 0e4e318d5ea4..5bdf1a342ab3 100644
--- a/kernel/context_tracking.c
+++ b/kernel/context_tracking.c
@@ -108,6 +108,7 @@ void context_tracking_enter(enum ctx_state state)
local_irq_restore(flags);
}
NOKPROBE_SYMBOL(context_tracking_enter);
+EXPORT_SYMBOL_GPL(context_tracking_enter);

void context_tracking_user_enter(void)
{
@@ -155,6 +156,7 @@ void context_tracking_exit(enum ctx_state state)
local_irq_restore(flags);
}
NOKPROBE_SYMBOL(context_tracking_exit);
+EXPORT_SYMBOL_GPL(context_tracking_exit);

void context_tracking_user_exit(void)
{
--
1.9.3

2015-02-10 20:28:04

by Rik van Riel

[permalink] [raw]

Subject: [PATCH 5/5] kvm,rcu,nohz: use RCU extended quiescent state when running KVM guest

From: Rik van Riel <[email protected]>

The host kernel is not doing anything while the CPU is executing
a KVM guest VCPU, so it can be marked as being in an extended
quiescent state, identical to that used when running user space
code.

The only exception to that rule is when the host handles an
interrupt, which is already handled by the irq code, which
calls rcu_irq_enter and rcu_irq_exit.

The guest_enter and guest_exit functions already switch vtime
accounting independent of context tracking. Leave those calls
where they are, instead of moving them into the context tracking
code.

Signed-off-by: Rik van Riel <[email protected]>
---
include/linux/context_tracking.h | 6 ++++++
include/linux/context_tracking_state.h | 1 +
include/linux/kvm_host.h | 3 ++-
3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index 954253283709..b65fd1420e53 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -80,10 +80,16 @@ static inline void guest_enter(void)
vtime_guest_enter(current);
else
current->flags |= PF_VCPU;
+
+ if (context_tracking_is_enabled())
+ context_tracking_enter(IN_GUEST);
}

static inline void guest_exit(void)
{
+ if (context_tracking_is_enabled())
+ context_tracking_exit(IN_GUEST);
+
if (vtime_accounting_enabled())
vtime_guest_exit(current);
else
diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h
index 72ab10fe1e46..90a7bab8779e 100644
--- a/include/linux/context_tracking_state.h
+++ b/include/linux/context_tracking_state.h
@@ -15,6 +15,7 @@ struct context_tracking {
enum ctx_state {
IN_KERNEL = 0,
IN_USER,
+ IN_GUEST,
} state;
};

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 26f106022c88..c7828a6a9614 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -772,7 +772,8 @@ static inline void kvm_guest_enter(void)
* one time slice). Lets treat guest mode as quiescent state, just like
* we do with user-mode execution.
*/
- rcu_virt_note_context_switch(smp_processor_id());
+ if (!context_tracking_cpu_is_enabled())
+ rcu_virt_note_context_switch(smp_processor_id());
}

static inline void kvm_guest_exit(void)
--
1.9.3

2015-02-10 21:28:32

by Paul E. McKenney

[permalink] [raw]

Subject: Re: [PATCH 1/5] context_tracking: generalize context tracking APIs to support user and guest

On Tue, Feb 10, 2015 at 03:27:50PM -0500, [email protected] wrote:
> From: Rik van Riel <[email protected]>
>
> Split out the mechanism from context_tracking_user_enter and
> context_tracking_user_exit into context_tracking_enter and
> context_tracking_exit. Leave the old functions in order to avoid
> breaking ARM, which calls these functions from assembler code,
> and cannot easily use C enum parameters.
>
> Add the expected ctx_state as a parameter to context_tracking_enter and
> context_tracking_exit, allowing the same functions to not just track
> kernel <> user space switching, but also kernel <> guest transitions.
>
> Signed-off-by: Rik van Riel <[email protected]>

Reviewed-by: Paul E. McKenney <[email protected]>

> ---
> include/linux/context_tracking.h | 8 +++++---
> kernel/context_tracking.c | 43 ++++++++++++++++++++++++++--------------
> 2 files changed, 33 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> index 37b81bd51ec0..954253283709 100644
> --- a/include/linux/context_tracking.h
> +++ b/include/linux/context_tracking.h
> @@ -10,6 +10,8 @@
> #ifdef CONFIG_CONTEXT_TRACKING
> extern void context_tracking_cpu_set(int cpu);
>
> +extern void context_tracking_enter(enum ctx_state state);
> +extern void context_tracking_exit(enum ctx_state state);
> extern void context_tracking_user_enter(void);
> extern void context_tracking_user_exit(void);
> extern void __context_tracking_task_switch(struct task_struct *prev,
> @@ -35,7 +37,7 @@ static inline enum ctx_state exception_enter(void)
> return 0;
>
> prev_ctx = this_cpu_read(context_tracking.state);
> - context_tracking_user_exit();
> + context_tracking_exit(prev_ctx);
>
> return prev_ctx;
> }
> @@ -43,8 +45,8 @@ static inline enum ctx_state exception_enter(void)
> static inline void exception_exit(enum ctx_state prev_ctx)
> {
> if (context_tracking_is_enabled()) {
> - if (prev_ctx == IN_USER)
> - context_tracking_user_enter();
> + if (prev_ctx != IN_KERNEL)
> + context_tracking_enter(prev_ctx);
> }
> }
>
> diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> index 937ecdfdf258..38e38aeac8b9 100644
> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -39,15 +39,15 @@ void context_tracking_cpu_set(int cpu)
> }
>
> /**
> - * context_tracking_user_enter - Inform the context tracking that the CPU is going to
> - * enter userspace mode.
> + * context_tracking_enter - Inform the context tracking that the CPU is going
> + * enter user or guest space mode.
> *
> * This function must be called right before we switch from the kernel
> - * to userspace, when it's guaranteed the remaining kernel instructions
> - * to execute won't use any RCU read side critical section because this
> - * function sets RCU in extended quiescent state.
> + * to user or guest space, when it's guaranteed the remaining kernel
> + * instructions to execute won't use any RCU read side critical section
> + * because this function sets RCU in extended quiescent state.
> */
> -void context_tracking_user_enter(void)
> +void context_tracking_enter(enum ctx_state state)
> {
> unsigned long flags;
>
> @@ -75,7 +75,7 @@ void context_tracking_user_enter(void)
> WARN_ON_ONCE(!current->mm);
>
> local_irq_save(flags);
> - if ( __this_cpu_read(context_tracking.state) != IN_USER) {
> + if ( __this_cpu_read(context_tracking.state) != state) {
> if (__this_cpu_read(context_tracking.active)) {
> trace_user_enter(0);
> /*
> @@ -101,24 +101,31 @@ void context_tracking_user_enter(void)
> * OTOH we can spare the calls to vtime and RCU when context_tracking.active
> * is false because we know that CPU is not tickless.
> */
> - __this_cpu_write(context_tracking.state, IN_USER);
> + __this_cpu_write(context_tracking.state, state);
> }
> local_irq_restore(flags);
> }
> +NOKPROBE_SYMBOL(context_tracking_enter);
> +
> +void context_tracking_user_enter(void)
> +{
> + context_tracking_enter(IN_USER);
> +}
> NOKPROBE_SYMBOL(context_tracking_user_enter);
>
> /**
> - * context_tracking_user_exit - Inform the context tracking that the CPU is
> - * exiting userspace mode and entering the kernel.
> + * context_tracking_exit - Inform the context tracking that the CPU is
> + * exiting user or guest mode and entering the kernel.
> *
> - * This function must be called after we entered the kernel from userspace
> - * before any use of RCU read side critical section. This potentially include
> - * any high level kernel code like syscalls, exceptions, signal handling, etc...
> + * This function must be called after we entered the kernel from user or
> + * guest space before any use of RCU read side critical section. This
> + * potentially include any high level kernel code like syscalls, exceptions,
> + * signal handling, etc...
> *
> * This call supports re-entrancy. This way it can be called from any exception
> * handler without needing to know if we came from userspace or not.
> */
> -void context_tracking_user_exit(void)
> +void context_tracking_exit(enum ctx_state state)
> {
> unsigned long flags;
>
> @@ -129,7 +136,7 @@ void context_tracking_user_exit(void)
> return;
>
> local_irq_save(flags);
> - if (__this_cpu_read(context_tracking.state) == IN_USER) {
> + if (__this_cpu_read(context_tracking.state) == state) {
> if (__this_cpu_read(context_tracking.active)) {
> /*
> * We are going to run code that may use RCU. Inform
> @@ -143,6 +150,12 @@ void context_tracking_user_exit(void)
> }
> local_irq_restore(flags);
> }
> +NOKPROBE_SYMBOL(context_tracking_exit);
> +
> +void context_tracking_user_exit(void)
> +{
> + context_tracking_exit(IN_USER);
> +}
> NOKPROBE_SYMBOL(context_tracking_user_exit);
>
> /**
> --
> 1.9.3
>

2015-02-10 21:29:39

by Paul E. McKenney

[permalink] [raw]

Subject: Re: [PATCH 2/5] nohz: add stub context_tracking_is_enabled

On Tue, Feb 10, 2015 at 03:27:51PM -0500, [email protected] wrote:
> From: Rik van Riel <[email protected]>
>
> With code elsewhere doing something conditional on whether or not
> context tracking is enabled, we want a stub function that tells us
> context tracking is not enabled, when CONFIG_CONTEXT_TRACKING is
> not set.
>
> Signed-off-by: Rik van Riel <[email protected]>

Reviewed-by: Paul E. McKenney <[email protected]>

> ---
> include/linux/context_tracking_state.h | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h
> index 97a81225d037..72ab10fe1e46 100644
> --- a/include/linux/context_tracking_state.h
> +++ b/include/linux/context_tracking_state.h
> @@ -39,6 +39,8 @@ static inline bool context_tracking_in_user(void)
> #else
> static inline bool context_tracking_in_user(void) { return false; }
> static inline bool context_tracking_active(void) { return false; }
> +static inline bool context_tracking_is_enabled(void) { return false; }
> +static inline bool context_tracking_cpu_is_enabled(void) { return false; }
> #endif /* CONFIG_CONTEXT_TRACKING */
>
> #endif
> --
> 1.9.3
>

2015-02-10 21:36:06

by Paul E. McKenney

[permalink] [raw]

Subject: Re: [PATCH 3/5] rcu,nohz: run vtime_user_enter/exit only when state == IN_USER

On Tue, Feb 10, 2015 at 03:27:52PM -0500, [email protected] wrote:
> From: Rik van Riel <[email protected]>
>
> Only run vtime_user_enter, vtime_user_exit, and the user enter & exit
> trace points when we are entering or exiting user state, respectively.
>
> The KVM code in guest_enter and guest_exit already take care of calling
> vtime_guest_enter and vtime_guest_exit, respectively.
>
> The RCU code only distinguishes between "idle" and "not idle or kernel".
> There should be no need to add an additional (unused) state there.
>
> Signed-off-by: Rik van Riel <[email protected]>

Reviewed-by: Paul E. McKenney <[email protected]>

> ---
> kernel/context_tracking.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> index 38e38aeac8b9..0e4e318d5ea4 100644
> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -77,7 +77,6 @@ void context_tracking_enter(enum ctx_state state)
> local_irq_save(flags);
> if ( __this_cpu_read(context_tracking.state) != state) {
> if (__this_cpu_read(context_tracking.active)) {
> - trace_user_enter(0);
> /*
> * At this stage, only low level arch entry code remains and
> * then we'll run in userspace. We can assume there won't be
> @@ -85,7 +84,10 @@ void context_tracking_enter(enum ctx_state state)
> * user_exit() or rcu_irq_enter(). Let's remove RCU's dependency
> * on the tick.
> */
> - vtime_user_enter(current);
> + if (state == IN_USER) {
> + trace_user_enter(0);
> + vtime_user_enter(current);
> + }
> rcu_user_enter();
> }
> /*
> @@ -143,8 +145,10 @@ void context_tracking_exit(enum ctx_state state)
> * RCU core about that (ie: we may need the tick again).
> */
> rcu_user_exit();
> - vtime_user_exit(current);
> - trace_user_exit(0);
> + if (state == IN_USER) {
> + vtime_user_exit(current);
> + trace_user_exit(0);
> + }
> }
> __this_cpu_write(context_tracking.state, IN_KERNEL);
> }
> --
> 1.9.3
>

2015-02-10 21:36:27

by Paul E. McKenney

[permalink] [raw]

Subject: Re: [PATCH 4/5] nohz,kvm: export context_tracking_user_enter/exit

On Tue, Feb 10, 2015 at 03:27:53PM -0500, [email protected] wrote:
> From: Rik van Riel <[email protected]>
>
> Export context_tracking_user_enter/exit so it can be used by KVM.
>
> Signed-off-by: Rik van Riel <[email protected]>

Reviewed-by: Paul E. McKenney <[email protected]>

> ---
> kernel/context_tracking.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c
> index 0e4e318d5ea4..5bdf1a342ab3 100644
> --- a/kernel/context_tracking.c
> +++ b/kernel/context_tracking.c
> @@ -108,6 +108,7 @@ void context_tracking_enter(enum ctx_state state)
> local_irq_restore(flags);
> }
> NOKPROBE_SYMBOL(context_tracking_enter);
> +EXPORT_SYMBOL_GPL(context_tracking_enter);
>
> void context_tracking_user_enter(void)
> {
> @@ -155,6 +156,7 @@ void context_tracking_exit(enum ctx_state state)
> local_irq_restore(flags);
> }
> NOKPROBE_SYMBOL(context_tracking_exit);
> +EXPORT_SYMBOL_GPL(context_tracking_exit);
>
> void context_tracking_user_exit(void)
> {
> --
> 1.9.3
>

2015-02-10 21:42:15

by Paul E. McKenney

[permalink] [raw]

Subject: Re: [PATCH 5/5] kvm,rcu,nohz: use RCU extended quiescent state when running KVM guest

On Tue, Feb 10, 2015 at 03:27:54PM -0500, [email protected] wrote:
> From: Rik van Riel <[email protected]>
>
> The host kernel is not doing anything while the CPU is executing
> a KVM guest VCPU, so it can be marked as being in an extended
> quiescent state, identical to that used when running user space
> code.
>
> The only exception to that rule is when the host handles an
> interrupt, which is already handled by the irq code, which
> calls rcu_irq_enter and rcu_irq_exit.
>
> The guest_enter and guest_exit functions already switch vtime
> accounting independent of context tracking. Leave those calls
> where they are, instead of moving them into the context tracking
> code.
>
> Signed-off-by: Rik van Riel <[email protected]>

The checks for context_tracking_is_enabled() around the calls to
context_tracking_enter() and context_tracking_exit() (as opposed to
only within these functions) looked strange at first, but it avoids
a needless unconditional call in cases where the static_key disables
context tracking. (This is due to separate compilation.)

So...

Reviewed-by: Paul E. McKenney <[email protected]>

> ---
> include/linux/context_tracking.h | 6 ++++++
> include/linux/context_tracking_state.h | 1 +
> include/linux/kvm_host.h | 3 ++-
> 3 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> index 954253283709..b65fd1420e53 100644
> --- a/include/linux/context_tracking.h
> +++ b/include/linux/context_tracking.h
> @@ -80,10 +80,16 @@ static inline void guest_enter(void)
> vtime_guest_enter(current);
> else
> current->flags |= PF_VCPU;
> +
> + if (context_tracking_is_enabled())
> + context_tracking_enter(IN_GUEST);
> }
>
> static inline void guest_exit(void)
> {
> + if (context_tracking_is_enabled())
> + context_tracking_exit(IN_GUEST);
> +
> if (vtime_accounting_enabled())
> vtime_guest_exit(current);
> else
> diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h
> index 72ab10fe1e46..90a7bab8779e 100644
> --- a/include/linux/context_tracking_state.h
> +++ b/include/linux/context_tracking_state.h
> @@ -15,6 +15,7 @@ struct context_tracking {
> enum ctx_state {
> IN_KERNEL = 0,
> IN_USER,
> + IN_GUEST,
> } state;
> };
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 26f106022c88..c7828a6a9614 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -772,7 +772,8 @@ static inline void kvm_guest_enter(void)
> * one time slice). Lets treat guest mode as quiescent state, just like
> * we do with user-mode execution.
> */
> - rcu_virt_note_context_switch(smp_processor_id());
> + if (!context_tracking_cpu_is_enabled())
> + rcu_virt_note_context_switch(smp_processor_id());
> }
>
> static inline void kvm_guest_exit(void)
> --
> 1.9.3
>

2015-02-11 19:43:34

by Rik van Riel

[permalink] [raw]

Subject: [PATCH -v5 6/5] context_tracking: fix exception_enter when already in IN_KERNEL

If exception_enter happens when already in IN_KERNEL state, the
code still calls context_tracking_exit, which ends up in
rcu_eqs_exit_common, which explodes with a WARN_ON when it is
called in a situation where dynticks are not enabled.

This can be avoided by having exception_enter only switch to
IN_KERNEL state if the current state is not already IN_KERNEL.

Signed-off-by: Rik van Riel <[email protected]>
Reported-by: Luiz Capitulino <[email protected]>
---
Frederic, you will want this "bonus" patch, too :)

Thanks to Luiz for finding this one. Whatever I was running did not
trigger this issue...

include/linux/context_tracking.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
index b65fd1420e53..9da230406e8c 100644
--- a/include/linux/context_tracking.h
+++ b/include/linux/context_tracking.h
@@ -37,7 +37,8 @@ static inline enum ctx_state exception_enter(void)
return 0;

prev_ctx = this_cpu_read(context_tracking.state);
- context_tracking_exit(prev_ctx);
+ if (prev_ctx != IN_KERNEL)
+ context_tracking_exit(prev_ctx);

return prev_ctx;
}

2015-02-11 21:27:31

by Paul E. McKenney

[permalink] [raw]

Subject: Re: [PATCH -v5 6/5] context_tracking: fix exception_enter when already in IN_KERNEL

On Wed, Feb 11, 2015 at 02:43:19PM -0500, Rik van Riel wrote:
> If exception_enter happens when already in IN_KERNEL state, the
> code still calls context_tracking_exit, which ends up in
> rcu_eqs_exit_common, which explodes with a WARN_ON when it is
> called in a situation where dynticks are not enabled.
>
> This can be avoided by having exception_enter only switch to
> IN_KERNEL state if the current state is not already IN_KERNEL.

Ugh... Time to formally verify, sounds like...

Thanx, Paul

> Signed-off-by: Rik van Riel <[email protected]>
> Reported-by: Luiz Capitulino <[email protected]>
> ---
> Frederic, you will want this "bonus" patch, too :)
>
> Thanks to Luiz for finding this one. Whatever I was running did not
> trigger this issue...
>
> include/linux/context_tracking.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> index b65fd1420e53..9da230406e8c 100644
> --- a/include/linux/context_tracking.h
> +++ b/include/linux/context_tracking.h
> @@ -37,7 +37,8 @@ static inline enum ctx_state exception_enter(void)
> return 0;
>
> prev_ctx = this_cpu_read(context_tracking.state);
> - context_tracking_exit(prev_ctx);
> + if (prev_ctx != IN_KERNEL)
> + context_tracking_exit(prev_ctx);
>
> return prev_ctx;
> }
>

2015-02-12 15:42:39

by Frederic Weisbecker

[permalink] [raw]

Subject: Re: [PATCH -v5 6/5] context_tracking: fix exception_enter when already in IN_KERNEL

On Wed, Feb 11, 2015 at 02:43:19PM -0500, Rik van Riel wrote:
> If exception_enter happens when already in IN_KERNEL state, the
> code still calls context_tracking_exit, which ends up in
> rcu_eqs_exit_common, which explodes with a WARN_ON when it is
> called in a situation where dynticks are not enabled.

Fortunately context_tracking_exit() already has a current_state == IN_KERNEL
check so this shouldn't be a problem.

Meanwhile I'll still take the patch, it's better to handle that
from the caller.

Thanks.

>
> This can be avoided by having exception_enter only switch to
> IN_KERNEL state if the current state is not already IN_KERNEL.
>
> Signed-off-by: Rik van Riel <[email protected]>
> Reported-by: Luiz Capitulino <[email protected]>
> ---
> Frederic, you will want this "bonus" patch, too :)
>
> Thanks to Luiz for finding this one. Whatever I was running did not
> trigger this issue...
>
> include/linux/context_tracking.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h
> index b65fd1420e53..9da230406e8c 100644
> --- a/include/linux/context_tracking.h
> +++ b/include/linux/context_tracking.h
> @@ -37,7 +37,8 @@ static inline enum ctx_state exception_enter(void)
> return 0;
>
> prev_ctx = this_cpu_read(context_tracking.state);
> - context_tracking_exit(prev_ctx);
> + if (prev_ctx != IN_KERNEL)
> + context_tracking_exit(prev_ctx);
>
> return prev_ctx;
> }
>

2015-02-12 16:44:28

by Rik van Riel

[permalink] [raw]

Subject: Re: [PATCH -v5 6/5] context_tracking: fix exception_enter when already in IN_KERNEL

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/12/2015 10:42 AM, Frederic Weisbecker wrote:
> On Wed, Feb 11, 2015 at 02:43:19PM -0500, Rik van Riel wrote:
>> If exception_enter happens when already in IN_KERNEL state, the
>> code still calls context_tracking_exit, which ends up in
>> rcu_eqs_exit_common, which explodes with a WARN_ON when it is
>> called in a situation where dynticks are not enabled.
>
> Fortunately context_tracking_exit() already has a current_state ==
> IN_KERNEL check so this shouldn't be a problem.

No, it had a hard-coded "current_state == IN_USER" check,
which is very close, but ...

... I replaced that with a state argument, and forgot to
ensure that it never gets called with state == IN_KERNEL.
This patch fixes that.

> Meanwhile I'll still take the patch, it's better to handle that
> from the caller.

Thanks.

- --
All rights reversed
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJU3Mr+AAoJEM553pKExN6DYNUH/2m9CtXhLdTHOEHRvxg41PCZ
/xafetUOS9cka0CNuiYpUuvfMSucoePW7YqUXqjYSIP25DsAleOh0qdep1Ob5bH+
2BqZNMwK3QDHf1+/V7nulnjVkeHtpXJm0HIZOjc06xeL+9T6ydB1vhQGIMLrGL9S
LvOstI3fseeIgglwYc2Gx7H7e99oOkxysvwMMvcMrW0cPSRAOdYxINQnfYW8A5kq
DTTXwWuJRZa4FLtP3wLpvocm5dMGDwTsDmuOk1PmXYlsTsO6H2BmCeio0euzStoJ
l+jR4x7Aq2KXES7gnMgpPw1iON3xKJ/RbXF8IC/doII8FYEV8Raxnf7hl47etBw=
=yIjW
-----END PGP SIGNATURE-----

2015-02-12 17:00:09

by Frederic Weisbecker

[permalink] [raw]

Subject: Re: [PATCH -v5 6/5] context_tracking: fix exception_enter when already in IN_KERNEL

On Thu, Feb 12, 2015 at 10:47:10AM -0500, Rik van Riel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 02/12/2015 10:42 AM, Frederic Weisbecker wrote:
> > On Wed, Feb 11, 2015 at 02:43:19PM -0500, Rik van Riel wrote:
> >> If exception_enter happens when already in IN_KERNEL state, the
> >> code still calls context_tracking_exit, which ends up in
> >> rcu_eqs_exit_common, which explodes with a WARN_ON when it is
> >> called in a situation where dynticks are not enabled.
> >
> > Fortunately context_tracking_exit() already has a current_state ==
> > IN_KERNEL check so this shouldn't be a problem.
>
> No, it had a hard-coded "current_state == IN_USER" check,
> which is very close, but ...
>
> ... I replaced that with a state argument, and forgot to
> ensure that it never gets called with state == IN_KERNEL.
> This patch fixes that.

Ah that's right! Well I'm going to merge this patch to 1/5 then to
avoid breaking bisection.

Thanks.

2015-02-12 17:09:18

by Frederic Weisbecker

[permalink] [raw]

Subject: Re: [PATCH -v5 0/5] rcu,nohz,kvm: use RCU extended quiescent state when running KVM guest

On Tue, Feb 10, 2015 at 03:27:49PM -0500, [email protected] wrote:
> When running a KVM guest on a system with NOHZ_FULL enabled, and the
> KVM guest running with idle=poll mode, we still get wakeups of the
> rcuos/N threads.
>
> This problem has already been solved for user space by telling the
> RCU subsystem that the CPU is in an extended quiescent state while
> running user space code.
>
> This patch series extends that code a little bit to make it usable
> to track KVM guest space, too.
>
> I tested the code by booting a KVM guest with idle=poll, on a system
> with NOHZ_FULL enabled on most CPUs, and a VCPU thread bound to a
> CPU. In a 10 second interval, rcuos/N threads on other CPUs got woken
> up several times, while the rcuos thread on the CPU running the bound
> and alwasy running VCPU thread never got woken up once.
>
> Thanks to Christian Borntraeger, Paul McKenney, Paulo Bonzini,
> Frederic Weisbecker, and Will Deacon for reviewing and improving
> earlier versions of this patch series.
>

So the patchset look good, thanks everyone. I'm applying the series
and will see anything explode.

I'll wait until the end of the merge window to push it to Ingo.

Thanks.

2015-02-12 17:49:48

by Rik van Riel

[permalink] [raw]

Subject: Re: [PATCH -v5 6/5] context_tracking: fix exception_enter when already in IN_KERNEL

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 02/12/2015 12:00 PM, Frederic Weisbecker wrote:
> On Thu, Feb 12, 2015 at 10:47:10AM -0500, Rik van Riel wrote:
>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>>
>> On 02/12/2015 10:42 AM, Frederic Weisbecker wrote:
>>> On Wed, Feb 11, 2015 at 02:43:19PM -0500, Rik van Riel wrote:
>>>> If exception_enter happens when already in IN_KERNEL state,
>>>> the code still calls context_tracking_exit, which ends up in
>>>> rcu_eqs_exit_common, which explodes with a WARN_ON when it
>>>> is called in a situation where dynticks are not enabled.
>>>
>>> Fortunately context_tracking_exit() already has a current_state
>>> == IN_KERNEL check so this shouldn't be a problem.
>>
>> No, it had a hard-coded "current_state == IN_USER" check, which
>> is very close, but ...
>>
>> ... I replaced that with a state argument, and forgot to ensure
>> that it never gets called with state == IN_KERNEL. This patch
>> fixes that.
>
> Ah that's right! Well I'm going to merge this patch to 1/5 then to
> avoid breaking bisection.

Thank you, Frederic!

- --
All rights reversed
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJU3OeDAAoJEM553pKExN6D7BsIAJ8CKC73jQ8T5Dqa/tlHV7Db
QFSJdpxP+7jCZwssehgpjpxCwtJ0UvGgle5OwX/POUhmagHxHmxVydOBz+xfYdBr
UuGkEl5TL+oyoMUr80Q4RTnJSZN08zi+THqiv33tyPUj6cNiycBZAuho3ELTRNOA
bRcHrMW+xd95uqoung7dSKrgA2jcym3+umNGnQb0gniraqcNLAmWs+jfAO8yZLJg
vk8bIKed6epQ3n6gcdYe0A28cLOuBvjEs5JNcEPxujY/349sjitKR2pLQ6HsfHLV
frlKsh7qQIRtoUJLO9ZBBDtGrmThwBwH8rw+GcVR8zviPNvV4IRrx47VBcHDWjc=
=mwFO
-----END PGP SIGNATURE-----