2018-03-25 17:53:51

by Yury Norov

Subject: [PATCH 0/2] smp: don't kick CPUs running idle or nohz_full tasks

kick_all_cpus_sync() is used to broadcast IPIs to all online CPUs to force
them to sync caches, TLBs etc. It is called from only 3 places: mm/slab,
arm64 and powerpc code.

With the framework introduced in commit b8c17e6664c46 ("rcu: Maintain special
bits at bottom of ->dynticks counter"), we can delay synchronization work
for CPUs in extended quiescent state (idle or nohz_full userspace).

As Paul E. McKenney wrote:

--

Currently, IPIs are used to force other CPUs to invalidate their TLBs
in response to a kernel virtual-memory mapping change. This works, but
degrades both battery lifetime (for idle CPUs) and real-time response
(for nohz_full CPUs), and in addition results in unnecessary IPIs due to
the fact that CPUs executing in usermode are unaffected by stale kernel
mappings. It would be better to cause a CPU executing in usermode to
wait until it is entering kernel mode to do the flush, first to avoid
interrupting usermode tasks and second to handle multiple flush requests
with a single flush in the case of a long-running user task.

--

For mm/slab and arm64 it looks safe to delay synchronization. This is done
in patch #2 by introducing the kick_active_cpus_sync() function. For powerpc
I'm not sure, and I'd like to ask the powerpc people: is it safe to do the
same for that code as well? If so, we can drop kick_all_cpus_sync() completely.

Yury Norov (2):
rcu: declare rcu_eqs_special_set() in public header
smp: introduce kick_active_cpus_sync()

arch/arm64/kernel/insn.c | 2 +-
include/linux/rcutree.h | 1 +
include/linux/smp.h | 2 ++
kernel/smp.c | 24 ++++++++++++++++++++++++
mm/slab.c | 2 +-
5 files changed, 29 insertions(+), 2 deletions(-)

--
2.14.1



2018-03-25 17:54:00

by Yury Norov

Subject: [PATCH 1/2] rcu: declare rcu_eqs_special_set() in public header

rcu_eqs_special_set() is declared only in internal header
kernel/rcu/tree.h and stubbed in include/linux/rcutiny.h.

This patch declares rcu_eqs_special_set() in include/linux/rcutree.h, so
it can be used in non-rcu kernel code.

Signed-off-by: Yury Norov <[email protected]>
---
include/linux/rcutree.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index fd996cdf1833..448f20f27396 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -74,6 +74,7 @@ static inline void synchronize_rcu_bh_expedited(void)
void rcu_barrier(void);
void rcu_barrier_bh(void);
void rcu_barrier_sched(void);
+bool rcu_eqs_special_set(int cpu);
unsigned long get_state_synchronize_rcu(void);
void cond_synchronize_rcu(unsigned long oldstate);
unsigned long get_state_synchronize_sched(void);
--
2.14.1


2018-03-25 17:54:24

by Yury Norov

Subject: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

kick_all_cpus_sync() forces all CPUs to sync caches by sending a broadcast IPI.
If a CPU is in extended quiescent state (idle task or nohz_full userspace), this
work may be done at the exit of that state. Delaying synchronization helps to
save power if the CPU is idle and decreases latency for real-time tasks.

This patch introduces kick_active_cpus_sync() and uses it in mm/slab and arm64
code to delay synchronization.

For task isolation (https://lkml.org/lkml/2017/11/3/589), an IPI to the CPU
running an isolated task would be fatal, as it breaks isolation. Delaying the
synchronization work helps to maintain the isolated state.

I've tested it with the test from the task isolation series on ThunderX2 for
more than 10 hours (10k giga-ticks) without breaking isolation.

Signed-off-by: Yury Norov <[email protected]>
---
arch/arm64/kernel/insn.c | 2 +-
include/linux/smp.h | 2 ++
kernel/smp.c | 24 ++++++++++++++++++++++++
mm/slab.c | 2 +-
4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 2718a77da165..9d7c492e920e 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -291,7 +291,7 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
* synchronization.
*/
ret = aarch64_insn_patch_text_nosync(addrs[0], insns[0]);
- kick_all_cpus_sync();
+ kick_active_cpus_sync();
return ret;
}
}
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 9fb239e12b82..27215e22240d 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -105,6 +105,7 @@ int smp_call_function_any(const struct cpumask *mask,
smp_call_func_t func, void *info, int wait);

void kick_all_cpus_sync(void);
+void kick_active_cpus_sync(void);
void wake_up_all_idle_cpus(void);

/*
@@ -161,6 +162,7 @@ smp_call_function_any(const struct cpumask *mask, smp_call_func_t func,
}

static inline void kick_all_cpus_sync(void) { }
+static inline void kick_active_cpus_sync(void) { }
static inline void wake_up_all_idle_cpus(void) { }

#ifdef CONFIG_UP_LATE_INIT
diff --git a/kernel/smp.c b/kernel/smp.c
index 084c8b3a2681..0358d6673850 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -724,6 +724,30 @@ void kick_all_cpus_sync(void)
}
EXPORT_SYMBOL_GPL(kick_all_cpus_sync);

+/**
+ * kick_active_cpus_sync - Force CPUs that are not in extended
+ * quiescent state (idle or nohz_full userspace) sync by sending
+ * IPI. Extended quiescent state CPUs will sync at the exit of
+ * that state.
+ */
+void kick_active_cpus_sync(void)
+{
+	int cpu;
+	struct cpumask kernel_cpus;
+
+	smp_mb();
+
+	cpumask_clear(&kernel_cpus);
+	preempt_disable();
+	for_each_online_cpu(cpu) {
+		if (!rcu_eqs_special_set(cpu))
+			cpumask_set_cpu(cpu, &kernel_cpus);
+	}
+	smp_call_function_many(&kernel_cpus, do_nothing, NULL, 1);
+	preempt_enable();
+}
+EXPORT_SYMBOL_GPL(kick_active_cpus_sync);
+
/**
* wake_up_all_idle_cpus - break all cpus out of idle
* wake_up_all_idle_cpus try to break all cpus which is in idle state even
diff --git a/mm/slab.c b/mm/slab.c
index 324446621b3e..678d5dbd6f46 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3856,7 +3856,7 @@ static int __do_tune_cpucache(struct kmem_cache *cachep, int limit,
* cpus, so skip the IPIs.
*/
if (prev)
- kick_all_cpus_sync();
+ kick_active_cpus_sync();

check_irq_on();
cachep->batchcount = batchcount;
--
2.14.1


2018-03-25 19:14:15

by Paul E. McKenney

Subject: Re: [PATCH 1/2] rcu: declare rcu_eqs_special_set() in public header

On Sun, Mar 25, 2018 at 08:50:03PM +0300, Yury Norov wrote:
> rcu_eqs_special_set() is declared only in internal header
> kernel/rcu/tree.h and stubbed in include/linux/rcutiny.h.
>
> This patch declares rcu_eqs_special_set() in include/linux/rcutree.h, so
> it can be used in non-rcu kernel code.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> include/linux/rcutree.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
> index fd996cdf1833..448f20f27396 100644
> --- a/include/linux/rcutree.h
> +++ b/include/linux/rcutree.h
> @@ -74,6 +74,7 @@ static inline void synchronize_rcu_bh_expedited(void)
> void rcu_barrier(void);
> void rcu_barrier_bh(void);
> void rcu_barrier_sched(void);
> +bool rcu_eqs_special_set(int cpu);
> unsigned long get_state_synchronize_rcu(void);
> void cond_synchronize_rcu(unsigned long oldstate);
> unsigned long get_state_synchronize_sched(void);

Good point, a bit hard to use otherwise. ;-)

I removed the declaration from rcutree.h and updated the commit log as
follows. Does it look OK?

Thanx, Paul

------------------------------------------------------------------------

commit 4497105b718a819072d48a675916d9d200b5327f
Author: Yury Norov <[email protected]>
Date: Sun Mar 25 20:50:03 2018 +0300

rcu: Declare rcu_eqs_special_set() in public header

Because rcu_eqs_special_set() is declared only in internal header
kernel/rcu/tree.h and stubbed in include/linux/rcutiny.h, it is
inaccessible outside of the RCU implementation. This patch therefore
moves the rcu_eqs_special_set() declaration to include/linux/rcutree.h,
which allows it to be used in non-rcu kernel code.

Signed-off-by: Yury Norov <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index fd996cdf1833..448f20f27396 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -74,6 +74,7 @@ static inline void synchronize_rcu_bh_expedited(void)
void rcu_barrier(void);
void rcu_barrier_bh(void);
void rcu_barrier_sched(void);
+bool rcu_eqs_special_set(int cpu);
unsigned long get_state_synchronize_rcu(void);
void cond_synchronize_rcu(unsigned long oldstate);
unsigned long get_state_synchronize_sched(void);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 59ad0e23c722..d5f617aaa744 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -415,7 +415,6 @@ extern struct rcu_state rcu_preempt_state;
#endif /* #ifdef CONFIG_PREEMPT_RCU */

int rcu_dynticks_snap(struct rcu_dynticks *rdtp);
-bool rcu_eqs_special_set(int cpu);

#ifdef CONFIG_RCU_BOOST
DECLARE_PER_CPU(unsigned int, rcu_cpu_kthread_status);


2018-03-25 19:20:11

by Yury Norov

Subject: Re: [PATCH 1/2] rcu: declare rcu_eqs_special_set() in public header

On Sun, Mar 25, 2018 at 12:12:43PM -0700, Paul E. McKenney wrote:
> On Sun, Mar 25, 2018 at 08:50:03PM +0300, Yury Norov wrote:
> > rcu_eqs_special_set() is declared only in internal header
> > kernel/rcu/tree.h and stubbed in include/linux/rcutiny.h.
> >
> > This patch declares rcu_eqs_special_set() in include/linux/rcutree.h, so
> > it can be used in non-rcu kernel code.
> >
> > Signed-off-by: Yury Norov <[email protected]>
> > ---
> > include/linux/rcutree.h | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
> > index fd996cdf1833..448f20f27396 100644
> > --- a/include/linux/rcutree.h
> > +++ b/include/linux/rcutree.h
> > @@ -74,6 +74,7 @@ static inline void synchronize_rcu_bh_expedited(void)
> > void rcu_barrier(void);
> > void rcu_barrier_bh(void);
> > void rcu_barrier_sched(void);
> > +bool rcu_eqs_special_set(int cpu);
> > unsigned long get_state_synchronize_rcu(void);
> > void cond_synchronize_rcu(unsigned long oldstate);
> > unsigned long get_state_synchronize_sched(void);
>
> Good point, a bit hard to use otherwise. ;-)
>
> I removed the declaration from rcutree.h and updated the commit log as
> follows. Does it look OK?

Of course.

Thanks,
Yury

> ------------------------------------------------------------------------
>
> commit 4497105b718a819072d48a675916d9d200b5327f
> Author: Yury Norov <[email protected]>
> Date: Sun Mar 25 20:50:03 2018 +0300
>
> rcu: Declare rcu_eqs_special_set() in public header
>
> Because rcu_eqs_special_set() is declared only in internal header
> kernel/rcu/tree.h and stubbed in include/linux/rcutiny.h, it is
> inaccessible outside of the RCU implementation. This patch therefore
> moves the rcu_eqs_special_set() declaration to include/linux/rcutree.h,
> which allows it to be used in non-rcu kernel code.
>
> Signed-off-by: Yury Norov <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
> index fd996cdf1833..448f20f27396 100644
> --- a/include/linux/rcutree.h
> +++ b/include/linux/rcutree.h
> @@ -74,6 +74,7 @@ static inline void synchronize_rcu_bh_expedited(void)
> void rcu_barrier(void);
> void rcu_barrier_bh(void);
> void rcu_barrier_sched(void);
> +bool rcu_eqs_special_set(int cpu);
> unsigned long get_state_synchronize_rcu(void);
> void cond_synchronize_rcu(unsigned long oldstate);
> unsigned long get_state_synchronize_sched(void);
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index 59ad0e23c722..d5f617aaa744 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -415,7 +415,6 @@ extern struct rcu_state rcu_preempt_state;
> #endif /* #ifdef CONFIG_PREEMPT_RCU */
>
> int rcu_dynticks_snap(struct rcu_dynticks *rdtp);
> -bool rcu_eqs_special_set(int cpu);
>
> #ifdef CONFIG_RCU_BOOST
> DECLARE_PER_CPU(unsigned int, rcu_cpu_kthread_status);

2018-03-25 19:24:26

by Paul E. McKenney

Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

On Sun, Mar 25, 2018 at 08:50:04PM +0300, Yury Norov wrote:
> kick_all_cpus_sync() forces all CPUs to sync caches by sending broadcast IPI.
> If CPU is in extended quiescent state (idle task or nohz_full userspace), this
> work may be done at the exit of this state. Delaying synchronization helps to
> save power if CPU is in idle state and decrease latency for real-time tasks.
>
> This patch introduces kick_active_cpus_sync() and uses it in mm/slab and arm64
> code to delay syncronization.
>
> For task isolation (https://lkml.org/lkml/2017/11/3/589), IPI to the CPU running
> isolated task would be fatal, as it breaks isolation. The approach with delaying
> of synchronization work helps to maintain isolated state.
>
> I've tested it with test from task isolation series on ThunderX2 for more than
> 10 hours (10k giga-ticks) without breaking isolation.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> arch/arm64/kernel/insn.c | 2 +-
> include/linux/smp.h | 2 ++
> kernel/smp.c | 24 ++++++++++++++++++++++++
> mm/slab.c | 2 +-
> 4 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> index 2718a77da165..9d7c492e920e 100644
> --- a/arch/arm64/kernel/insn.c
> +++ b/arch/arm64/kernel/insn.c
> @@ -291,7 +291,7 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
> * synchronization.
> */
> ret = aarch64_insn_patch_text_nosync(addrs[0], insns[0]);
> - kick_all_cpus_sync();
> + kick_active_cpus_sync();
> return ret;
> }
> }
> diff --git a/include/linux/smp.h b/include/linux/smp.h
> index 9fb239e12b82..27215e22240d 100644
> --- a/include/linux/smp.h
> +++ b/include/linux/smp.h
> @@ -105,6 +105,7 @@ int smp_call_function_any(const struct cpumask *mask,
> smp_call_func_t func, void *info, int wait);
>
> void kick_all_cpus_sync(void);
> +void kick_active_cpus_sync(void);
> void wake_up_all_idle_cpus(void);
>
> /*
> @@ -161,6 +162,7 @@ smp_call_function_any(const struct cpumask *mask, smp_call_func_t func,
> }
>
> static inline void kick_all_cpus_sync(void) { }
> +static inline void kick_active_cpus_sync(void) { }
> static inline void wake_up_all_idle_cpus(void) { }
>
> #ifdef CONFIG_UP_LATE_INIT
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 084c8b3a2681..0358d6673850 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -724,6 +724,30 @@ void kick_all_cpus_sync(void)
> }
> EXPORT_SYMBOL_GPL(kick_all_cpus_sync);
>
> +/**
> + * kick_active_cpus_sync - Force CPUs that are not in extended
> + * quiescent state (idle or nohz_full userspace) sync by sending
> + * IPI. Extended quiescent state CPUs will sync at the exit of
> + * that state.
> + */
> +void kick_active_cpus_sync(void)
> +{
> + int cpu;
> + struct cpumask kernel_cpus;
> +
> + smp_mb();
> +
> + cpumask_clear(&kernel_cpus);
> + preempt_disable();
> + for_each_online_cpu(cpu) {
> + if (!rcu_eqs_special_set(cpu))

If we get here, the CPU is not in a quiescent state, so we therefore
must IPI it, correct?

But don't you also need to define rcu_eqs_special_exit() so that RCU
can invoke it when it next leaves its quiescent state? Or are you able
to ignore the CPU in that case? (If you are able to ignore the CPU in
that case, I could give you a lower-cost function to get your job done.)

Thanx, Paul

> + cpumask_set_cpu(cpu, &kernel_cpus);
> + }
> + smp_call_function_many(&kernel_cpus, do_nothing, NULL, 1);
> + preempt_enable();
> +}
> +EXPORT_SYMBOL_GPL(kick_active_cpus_sync);
> +
> /**
> * wake_up_all_idle_cpus - break all cpus out of idle
> * wake_up_all_idle_cpus try to break all cpus which is in idle state even
> diff --git a/mm/slab.c b/mm/slab.c
> index 324446621b3e..678d5dbd6f46 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3856,7 +3856,7 @@ static int __do_tune_cpucache(struct kmem_cache *cachep, int limit,
> * cpus, so skip the IPIs.
> */
> if (prev)
> - kick_all_cpus_sync();
> + kick_active_cpus_sync();
>
> check_irq_on();
> cachep->batchcount = batchcount;
> --
> 2.14.1
>


2018-03-25 20:15:58

by Yury Norov

Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

On Sun, Mar 25, 2018 at 12:23:28PM -0700, Paul E. McKenney wrote:
> On Sun, Mar 25, 2018 at 08:50:04PM +0300, Yury Norov wrote:
> > kick_all_cpus_sync() forces all CPUs to sync caches by sending broadcast IPI.
> > If CPU is in extended quiescent state (idle task or nohz_full userspace), this
> > work may be done at the exit of this state. Delaying synchronization helps to
> > save power if CPU is in idle state and decrease latency for real-time tasks.
> >
> > This patch introduces kick_active_cpus_sync() and uses it in mm/slab and arm64
> > code to delay syncronization.
> >
> > For task isolation (https://lkml.org/lkml/2017/11/3/589), IPI to the CPU running
> > isolated task would be fatal, as it breaks isolation. The approach with delaying
> > of synchronization work helps to maintain isolated state.
> >
> > I've tested it with test from task isolation series on ThunderX2 for more than
> > 10 hours (10k giga-ticks) without breaking isolation.
> >
> > Signed-off-by: Yury Norov <[email protected]>
> > ---
> > arch/arm64/kernel/insn.c | 2 +-
> > include/linux/smp.h | 2 ++
> > kernel/smp.c | 24 ++++++++++++++++++++++++
> > mm/slab.c | 2 +-
> > 4 files changed, 28 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> > index 2718a77da165..9d7c492e920e 100644
> > --- a/arch/arm64/kernel/insn.c
> > +++ b/arch/arm64/kernel/insn.c
> > @@ -291,7 +291,7 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
> > * synchronization.
> > */
> > ret = aarch64_insn_patch_text_nosync(addrs[0], insns[0]);
> > - kick_all_cpus_sync();
> > + kick_active_cpus_sync();
> > return ret;
> > }
> > }
> > diff --git a/include/linux/smp.h b/include/linux/smp.h
> > index 9fb239e12b82..27215e22240d 100644
> > --- a/include/linux/smp.h
> > +++ b/include/linux/smp.h
> > @@ -105,6 +105,7 @@ int smp_call_function_any(const struct cpumask *mask,
> > smp_call_func_t func, void *info, int wait);
> >
> > void kick_all_cpus_sync(void);
> > +void kick_active_cpus_sync(void);
> > void wake_up_all_idle_cpus(void);
> >
> > /*
> > @@ -161,6 +162,7 @@ smp_call_function_any(const struct cpumask *mask, smp_call_func_t func,
> > }
> >
> > static inline void kick_all_cpus_sync(void) { }
> > +static inline void kick_active_cpus_sync(void) { }
> > static inline void wake_up_all_idle_cpus(void) { }
> >
> > #ifdef CONFIG_UP_LATE_INIT
> > diff --git a/kernel/smp.c b/kernel/smp.c
> > index 084c8b3a2681..0358d6673850 100644
> > --- a/kernel/smp.c
> > +++ b/kernel/smp.c
> > @@ -724,6 +724,30 @@ void kick_all_cpus_sync(void)
> > }
> > EXPORT_SYMBOL_GPL(kick_all_cpus_sync);
> >
> > +/**
> > + * kick_active_cpus_sync - Force CPUs that are not in extended
> > + * quiescent state (idle or nohz_full userspace) sync by sending
> > + * IPI. Extended quiescent state CPUs will sync at the exit of
> > + * that state.
> > + */
> > +void kick_active_cpus_sync(void)
> > +{
> > + int cpu;
> > + struct cpumask kernel_cpus;
> > +
> > + smp_mb();
> > +
> > + cpumask_clear(&kernel_cpus);
> > + preempt_disable();
> > + for_each_online_cpu(cpu) {
> > + if (!rcu_eqs_special_set(cpu))
>
> If we get here, the CPU is not in a quiescent state, so we therefore
> must IPI it, correct?
>
> But don't you also need to define rcu_eqs_special_exit() so that RCU
> can invoke it when it next leaves its quiescent state? Or are you able
> to ignore the CPU in that case? (If you are able to ignore the CPU in
> that case, I could give you a lower-cost function to get your job done.)
>
> Thanx, Paul

What's actually needed for synchronization is issuing a memory barrier on the
target CPUs before they start executing kernel code.

smp_mb() is implicitly called in the smp_call_function*() path for this. In the
rcu_eqs_special_set() -> rcu_dynticks_eqs_exit() path, smp_mb__after_atomic()
is called just before rcu_eqs_special_exit().

So I think rcu_eqs_special_exit() may be left untouched. An empty
rcu_eqs_special_exit() in the new RCU path corresponds to the empty
do_nothing() in the old IPI path.

Or is my understanding of smp_mb__after_atomic() wrong? By default,
smp_mb__after_atomic() is just an alias for smp_mb(). But some
architectures define it differently. x86, for example, aliases it to
just barrier(), with the comment: "Atomic operations are already
serializing on x86".

I was initially thinking that it's also fine to leave
rcu_eqs_special_exit() empty in this case, but now I'm not sure...

Anyway, answering your question: we shouldn't ignore quiescent
CPUs, and the rcu_eqs_special_set() path is really needed, as it issues
a memory barrier on them.

Yury

> > + cpumask_set_cpu(cpu, &kernel_cpus);
> > + }
> > + smp_call_function_many(&kernel_cpus, do_nothing, NULL, 1);
> > + preempt_enable();
> > +}
> > +EXPORT_SYMBOL_GPL(kick_active_cpus_sync);
> > +
> > /**
> > * wake_up_all_idle_cpus - break all cpus out of idle
> > * wake_up_all_idle_cpus try to break all cpus which is in idle state even
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 324446621b3e..678d5dbd6f46 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -3856,7 +3856,7 @@ static int __do_tune_cpucache(struct kmem_cache *cachep, int limit,
> > * cpus, so skip the IPIs.
> > */
> > if (prev)
> > - kick_all_cpus_sync();
> > + kick_active_cpus_sync();
> >
> > check_irq_on();
> > cachep->batchcount = batchcount;
> > --
> > 2.14.1
> >

2018-03-26 08:54:40

by Andrea Parri

Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

Hi Yury,

On Sun, Mar 25, 2018 at 08:50:04PM +0300, Yury Norov wrote:
> kick_all_cpus_sync() forces all CPUs to sync caches by sending broadcast IPI.
> If CPU is in extended quiescent state (idle task or nohz_full userspace), this
> work may be done at the exit of this state. Delaying synchronization helps to
> save power if CPU is in idle state and decrease latency for real-time tasks.
>
> This patch introduces kick_active_cpus_sync() and uses it in mm/slab and arm64
> code to delay syncronization.
>
> For task isolation (https://lkml.org/lkml/2017/11/3/589), IPI to the CPU running
> isolated task would be fatal, as it breaks isolation. The approach with delaying
> of synchronization work helps to maintain isolated state.
>
> I've tested it with test from task isolation series on ThunderX2 for more than
> 10 hours (10k giga-ticks) without breaking isolation.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> arch/arm64/kernel/insn.c | 2 +-
> include/linux/smp.h | 2 ++
> kernel/smp.c | 24 ++++++++++++++++++++++++
> mm/slab.c | 2 +-
> 4 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> index 2718a77da165..9d7c492e920e 100644
> --- a/arch/arm64/kernel/insn.c
> +++ b/arch/arm64/kernel/insn.c
> @@ -291,7 +291,7 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
> * synchronization.
> */
> ret = aarch64_insn_patch_text_nosync(addrs[0], insns[0]);
> - kick_all_cpus_sync();
> + kick_active_cpus_sync();
> return ret;
> }
> }
> diff --git a/include/linux/smp.h b/include/linux/smp.h
> index 9fb239e12b82..27215e22240d 100644
> --- a/include/linux/smp.h
> +++ b/include/linux/smp.h
> @@ -105,6 +105,7 @@ int smp_call_function_any(const struct cpumask *mask,
> smp_call_func_t func, void *info, int wait);
>
> void kick_all_cpus_sync(void);
> +void kick_active_cpus_sync(void);
> void wake_up_all_idle_cpus(void);
>
> /*
> @@ -161,6 +162,7 @@ smp_call_function_any(const struct cpumask *mask, smp_call_func_t func,
> }
>
> static inline void kick_all_cpus_sync(void) { }
> +static inline void kick_active_cpus_sync(void) { }
> static inline void wake_up_all_idle_cpus(void) { }
>
> #ifdef CONFIG_UP_LATE_INIT
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 084c8b3a2681..0358d6673850 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -724,6 +724,30 @@ void kick_all_cpus_sync(void)
> }
> EXPORT_SYMBOL_GPL(kick_all_cpus_sync);
>
> +/**
> + * kick_active_cpus_sync - Force CPUs that are not in extended
> + * quiescent state (idle or nohz_full userspace) sync by sending
> + * IPI. Extended quiescent state CPUs will sync at the exit of
> + * that state.
> + */
> +void kick_active_cpus_sync(void)
> +{
> + int cpu;
> + struct cpumask kernel_cpus;
> +
> + smp_mb();

(A general remark only:)

checkpatch.pl should have warned about the fact that this barrier is
missing an accompanying comment (which accesses are being "ordered",
what is the pairing barrier, etc.).

Moreover if, as your reply above suggested, your patch is relying on
"implicit barriers" (something I would not recommend) then even more
so you should comment on these requirements.

This could: (a) force you to reason about the memory ordering stuff,
(b) ease the task of reviewing and adopting your patch, (c) ease the
task of preserving those requirements (as implementations change).

Andrea


> +
> + cpumask_clear(&kernel_cpus);
> + preempt_disable();
> + for_each_online_cpu(cpu) {
> + if (!rcu_eqs_special_set(cpu))
> + cpumask_set_cpu(cpu, &kernel_cpus);
> + }
> + smp_call_function_many(&kernel_cpus, do_nothing, NULL, 1);
> + preempt_enable();
> +}
> +EXPORT_SYMBOL_GPL(kick_active_cpus_sync);
> +
> /**
> * wake_up_all_idle_cpus - break all cpus out of idle
> * wake_up_all_idle_cpus try to break all cpus which is in idle state even
> diff --git a/mm/slab.c b/mm/slab.c
> index 324446621b3e..678d5dbd6f46 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3856,7 +3856,7 @@ static int __do_tune_cpucache(struct kmem_cache *cachep, int limit,
> * cpus, so skip the IPIs.
> */
> if (prev)
> - kick_all_cpus_sync();
> + kick_active_cpus_sync();
>
> check_irq_on();
> cachep->batchcount = batchcount;
> --
> 2.14.1
>

2018-03-26 12:46:43

by Paul E. McKenney

Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

On Sun, Mar 25, 2018 at 11:11:54PM +0300, Yury Norov wrote:
> On Sun, Mar 25, 2018 at 12:23:28PM -0700, Paul E. McKenney wrote:
> > On Sun, Mar 25, 2018 at 08:50:04PM +0300, Yury Norov wrote:
> > > kick_all_cpus_sync() forces all CPUs to sync caches by sending broadcast IPI.
> > > If CPU is in extended quiescent state (idle task or nohz_full userspace), this
> > > work may be done at the exit of this state. Delaying synchronization helps to
> > > save power if CPU is in idle state and decrease latency for real-time tasks.
> > >
> > > This patch introduces kick_active_cpus_sync() and uses it in mm/slab and arm64
> > > code to delay syncronization.
> > >
> > > For task isolation (https://lkml.org/lkml/2017/11/3/589), IPI to the CPU running
> > > isolated task would be fatal, as it breaks isolation. The approach with delaying
> > > of synchronization work helps to maintain isolated state.
> > >
> > > I've tested it with test from task isolation series on ThunderX2 for more than
> > > 10 hours (10k giga-ticks) without breaking isolation.
> > >
> > > Signed-off-by: Yury Norov <[email protected]>
> > > ---
> > > arch/arm64/kernel/insn.c | 2 +-
> > > include/linux/smp.h | 2 ++
> > > kernel/smp.c | 24 ++++++++++++++++++++++++
> > > mm/slab.c | 2 +-
> > > 4 files changed, 28 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> > > index 2718a77da165..9d7c492e920e 100644
> > > --- a/arch/arm64/kernel/insn.c
> > > +++ b/arch/arm64/kernel/insn.c
> > > @@ -291,7 +291,7 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
> > > * synchronization.
> > > */
> > > ret = aarch64_insn_patch_text_nosync(addrs[0], insns[0]);
> > > - kick_all_cpus_sync();
> > > + kick_active_cpus_sync();
> > > return ret;
> > > }
> > > }
> > > diff --git a/include/linux/smp.h b/include/linux/smp.h
> > > index 9fb239e12b82..27215e22240d 100644
> > > --- a/include/linux/smp.h
> > > +++ b/include/linux/smp.h
> > > @@ -105,6 +105,7 @@ int smp_call_function_any(const struct cpumask *mask,
> > > smp_call_func_t func, void *info, int wait);
> > >
> > > void kick_all_cpus_sync(void);
> > > +void kick_active_cpus_sync(void);
> > > void wake_up_all_idle_cpus(void);
> > >
> > > /*
> > > @@ -161,6 +162,7 @@ smp_call_function_any(const struct cpumask *mask, smp_call_func_t func,
> > > }
> > >
> > > static inline void kick_all_cpus_sync(void) { }
> > > +static inline void kick_active_cpus_sync(void) { }
> > > static inline void wake_up_all_idle_cpus(void) { }
> > >
> > > #ifdef CONFIG_UP_LATE_INIT
> > > diff --git a/kernel/smp.c b/kernel/smp.c
> > > index 084c8b3a2681..0358d6673850 100644
> > > --- a/kernel/smp.c
> > > +++ b/kernel/smp.c
> > > @@ -724,6 +724,30 @@ void kick_all_cpus_sync(void)
> > > }
> > > EXPORT_SYMBOL_GPL(kick_all_cpus_sync);
> > >
> > > +/**
> > > + * kick_active_cpus_sync - Force CPUs that are not in extended
> > > + * quiescent state (idle or nohz_full userspace) sync by sending
> > > + * IPI. Extended quiescent state CPUs will sync at the exit of
> > > + * that state.
> > > + */
> > > +void kick_active_cpus_sync(void)
> > > +{
> > > + int cpu;
> > > + struct cpumask kernel_cpus;
> > > +
> > > + smp_mb();
> > > +
> > > + cpumask_clear(&kernel_cpus);
> > > + preempt_disable();
> > > + for_each_online_cpu(cpu) {
> > > + if (!rcu_eqs_special_set(cpu))
> >
> > If we get here, the CPU is not in a quiescent state, so we therefore
> > must IPI it, correct?
> >
> > But don't you also need to define rcu_eqs_special_exit() so that RCU
> > can invoke it when it next leaves its quiescent state? Or are you able
> > to ignore the CPU in that case? (If you are able to ignore the CPU in
> > that case, I could give you a lower-cost function to get your job done.)
> >
> > Thanx, Paul
>
> What's actually needed for synchronization is issuing memory barrier on target
> CPUs before we start executing kernel code.
>
> smp_mb() is implicitly called in smp_call_function*() path for it. In
> rcu_eqs_special_set() -> rcu_dynticks_eqs_exit() path, smp_mb__after_atomic()
> is called just before rcu_eqs_special_exit().
>
> So I think, rcu_eqs_special_exit() may be left untouched. Empty
> rcu_eqs_special_exit() in new RCU path corresponds empty do_nothing() in old
> IPI path.
>
> Or my understanding of smp_mb__after_atomic() is wrong? By default,
> smp_mb__after_atomic() is just alias to smp_mb(). But some
> architectures define it differently. x86, for example, aliases it to
> just barrier() with a comment: "Atomic operations are already
> serializing on x86".
>
> I was initially thinking that it's also fine to leave
> rcu_eqs_special_exit() empty in this case, but now I'm not sure...
>
> Anyway, answering to your question, we shouldn't ignore quiescent
> CPUs, and rcu_eqs_special_set() path is really needed as it issues
> memory barrier on them.

An alternative approach would be for me to make something like this
and export it:

	bool rcu_cpu_in_eqs(int cpu)
	{
		struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
		int snap;

		smp_mb(); /* Obtain consistent snapshot, pairs with update. */
		snap = atomic_read(&rdtp->dynticks);
		smp_mb(); /* See above. */
		return !(snap & RCU_DYNTICK_CTRL_CTR);
	}

Then you could replace your use of rcu_eqs_special_set() above with
the new rcu_cpu_in_eqs(). This would avoid the RMW atomic and, more
important, the unnecessary write to ->dynticks.

Or am I missing something?

Thanx, Paul

> Yury
>
> > > + cpumask_set_cpu(cpu, &kernel_cpus);
> > > + }
> > > + smp_call_function_many(&kernel_cpus, do_nothing, NULL, 1);
> > > + preempt_enable();
> > > +}
> > > +EXPORT_SYMBOL_GPL(kick_active_cpus_sync);
> > > +
> > > /**
> > > * wake_up_all_idle_cpus - break all cpus out of idle
> > > * wake_up_all_idle_cpus try to break all cpus which is in idle state even
> > > diff --git a/mm/slab.c b/mm/slab.c
> > > index 324446621b3e..678d5dbd6f46 100644
> > > --- a/mm/slab.c
> > > +++ b/mm/slab.c
> > > @@ -3856,7 +3856,7 @@ static int __do_tune_cpucache(struct kmem_cache *cachep, int limit,
> > > * cpus, so skip the IPIs.
> > > */
> > > if (prev)
> > > - kick_all_cpus_sync();
> > > + kick_active_cpus_sync();
> > >
> > > check_irq_on();
> > > cachep->batchcount = batchcount;
> > > --
> > > 2.14.1
> > >
>


2018-03-26 18:58:58

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

On Mon, 26 Mar 2018 10:53:13 +0200
Andrea Parri <[email protected]> wrote:

> > --- a/kernel/smp.c
> > +++ b/kernel/smp.c
> > @@ -724,6 +724,30 @@ void kick_all_cpus_sync(void)
> > }
> > EXPORT_SYMBOL_GPL(kick_all_cpus_sync);
> >
> > +/**
> > + * kick_active_cpus_sync - Force CPUs that are not in extended
> > + * quiescent state (idle or nohz_full userspace) sync by sending
> > + * IPI. Extended quiescent state CPUs will sync at the exit of
> > + * that state.
> > + */
> > +void kick_active_cpus_sync(void)
> > +{
> > + int cpu;
> > + struct cpumask kernel_cpus;
> > +
> > + smp_mb();
>
> (A general remark only:)
>
> checkpatch.pl should have warned about the fact that this barrier is
> missing an accompanying comment (which accesses are being "ordered",
> what is the pairing barrier, etc.).

He could have simply copied the comment above the smp_mb() for
kick_all_cpus_sync():

/* Make sure the change is visible before we kick the cpus */

The kick itself is pretty much a synchronization primitive.

That is, you make some changes and then you need all CPUs to see them,
and you call kick_active_cpus_sync(), which is the barrier that makes
sure your previous changes are seen on all CPUs before you proceed
further. Note, the matching barrier is implicit in the IPI itself.

-- Steve


>
> Moreover if, as your reply above suggested, your patch is relying on
> "implicit barriers" (something I would not recommend) then even more
> so you should comment on these requirements.
>
> This could: (a) force you to reason about the memory ordering stuff,
> (b) easy the task of reviewing and adopting your patch, (c) easy the
> task of preserving those requirements (as implementations changes).
>
> Andrea
>
>
> > +
> > + cpumask_clear(&kernel_cpus);
> > + preempt_disable();
> > + for_each_online_cpu(cpu) {
> > + if (!rcu_eqs_special_set(cpu))
> > + cpumask_set_cpu(cpu, &kernel_cpus);
> > + }
> > + smp_call_function_many(&kernel_cpus, do_nothing, NULL, 1);
> > + preempt_enable();
> > +}
> > +EXPORT_SYMBOL_GPL(kick_active_cpus_sync);
> > +
> > /**
> > * wake_up_all_idle_cpus - break all cpus out of idle
> > * wake_up_all_idle_cpus try to break all cpus which is in idle state even
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 324446621b3e..678d5dbd6f46 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -3856,7 +3856,7 @@ static int __do_tune_cpucache(struct kmem_cache *cachep, int limit,
> > * cpus, so skip the IPIs.
> > */
> > if (prev)
> > - kick_all_cpus_sync();
> > + kick_active_cpus_sync();
> >
> > check_irq_on();
> > cachep->batchcount = batchcount;
> > --
> > 2.14.1
> >


2018-03-27 10:31:51

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

On Sun, Mar 25, 2018 at 08:50:04PM +0300, Yury Norov wrote:
> kick_all_cpus_sync() forces all CPUs to sync caches by sending broadcast IPI.
> If CPU is in extended quiescent state (idle task or nohz_full userspace), this
> work may be done at the exit of this state. Delaying synchronization helps to
> save power if CPU is in idle state and decrease latency for real-time tasks.
>
> This patch introduces kick_active_cpus_sync() and uses it in mm/slab and arm64
> code to delay syncronization.
>
> For task isolation (https://lkml.org/lkml/2017/11/3/589), IPI to the CPU running
> isolated task would be fatal, as it breaks isolation. The approach with delaying
> of synchronization work helps to maintain isolated state.
>
> I've tested it with test from task isolation series on ThunderX2 for more than
> 10 hours (10k giga-ticks) without breaking isolation.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> arch/arm64/kernel/insn.c | 2 +-
> include/linux/smp.h | 2 ++
> kernel/smp.c | 24 ++++++++++++++++++++++++
> mm/slab.c | 2 +-
> 4 files changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> index 2718a77da165..9d7c492e920e 100644
> --- a/arch/arm64/kernel/insn.c
> +++ b/arch/arm64/kernel/insn.c
> @@ -291,7 +291,7 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
> * synchronization.
> */
> ret = aarch64_insn_patch_text_nosync(addrs[0], insns[0]);
> - kick_all_cpus_sync();
> + kick_active_cpus_sync();
> return ret;
> }
> }

I think this means that runtime modifications to the kernel text might not
be picked up by CPUs coming out of idle. Shouldn't we add an ISB on that
path to avoid executing stale instructions?

Will

2018-03-28 10:59:49

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

On Tue, Mar 27, 2018 at 11:21:17AM +0100, Will Deacon wrote:
> On Sun, Mar 25, 2018 at 08:50:04PM +0300, Yury Norov wrote:
> > kick_all_cpus_sync() forces all CPUs to sync caches by sending broadcast IPI.
> > If CPU is in extended quiescent state (idle task or nohz_full userspace), this
> > work may be done at the exit of this state. Delaying synchronization helps to
> > save power if CPU is in idle state and decrease latency for real-time tasks.
> >
> > This patch introduces kick_active_cpus_sync() and uses it in mm/slab and arm64
> > code to delay syncronization.
> >
> > For task isolation (https://lkml.org/lkml/2017/11/3/589), IPI to the CPU running
> > isolated task would be fatal, as it breaks isolation. The approach with delaying
> > of synchronization work helps to maintain isolated state.
> >
> > I've tested it with test from task isolation series on ThunderX2 for more than
> > 10 hours (10k giga-ticks) without breaking isolation.
> >
> > Signed-off-by: Yury Norov <[email protected]>
> > ---
> > arch/arm64/kernel/insn.c | 2 +-
> > include/linux/smp.h | 2 ++
> > kernel/smp.c | 24 ++++++++++++++++++++++++
> > mm/slab.c | 2 +-
> > 4 files changed, 28 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> > index 2718a77da165..9d7c492e920e 100644
> > --- a/arch/arm64/kernel/insn.c
> > +++ b/arch/arm64/kernel/insn.c
> > @@ -291,7 +291,7 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
> > * synchronization.
> > */
> > ret = aarch64_insn_patch_text_nosync(addrs[0], insns[0]);
> > - kick_all_cpus_sync();
> > + kick_active_cpus_sync();
> > return ret;
> > }
> > }
>
> I think this means that runtime modifications to the kernel text might not
> be picked up by CPUs coming out of idle. Shouldn't we add an ISB on that
> path to avoid executing stale instructions?

Thanks, Will, for the hint. I'll do that.

Yury

2018-03-28 13:01:21

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

On Mon, Mar 26, 2018 at 02:57:35PM -0400, Steven Rostedt wrote:
> On Mon, 26 Mar 2018 10:53:13 +0200
> Andrea Parri <[email protected]> wrote:
>
> > > --- a/kernel/smp.c
> > > +++ b/kernel/smp.c
> > > @@ -724,6 +724,30 @@ void kick_all_cpus_sync(void)
> > > }
> > > EXPORT_SYMBOL_GPL(kick_all_cpus_sync);
> > >
> > > +/**
> > > + * kick_active_cpus_sync - Force CPUs that are not in extended
> > > + * quiescent state (idle or nohz_full userspace) sync by sending
> > > + * IPI. Extended quiescent state CPUs will sync at the exit of
> > > + * that state.
> > > + */
> > > +void kick_active_cpus_sync(void)
> > > +{
> > > + int cpu;
> > > + struct cpumask kernel_cpus;
> > > +
> > > + smp_mb();
> >
> > (A general remark only:)
> >
> > checkpatch.pl should have warned about the fact that this barrier is
> > missing an accompanying comment (which accesses are being "ordered",
> > what is the pairing barrier, etc.).
>
> He could have simply copied the comment above the smp_mb() for
> kick_all_cpus_sync():
>
> /* Make sure the change is visible before we kick the cpus */
>
> The kick itself is pretty much a synchronization primitive.
>
> That is, you make some changes and then you need all CPUs to see it,
> and you call: kick_active_cpus_synch(), which is the barrier to make
> sure you previous changes are seen on all CPUS before you proceed
> further. Note, the matching barrier is implicit in the IPI itself.
>
> -- Steve

I know that I could have copied the comment from kick_all_cpus_sync(), but I
don't like copy-pasting in general, and as Steven said, this smp_mb() is
already inside a synchronization routine, so we may hope that users of
kick_*_cpus_sync() will explain better why they need it...

>
> >
> > Moreover if, as your reply above suggested, your patch is relying on
> > "implicit barriers" (something I would not recommend) then even more
> > so you should comment on these requirements.
> >
> > This could: (a) force you to reason about the memory ordering stuff,
> > (b) easy the task of reviewing and adopting your patch, (c) easy the
> > task of preserving those requirements (as implementations changes).
> >
> > Andrea

I need v2 anyway, and I will add comments to address all questions in this
thread.

I also hope that we'll agree that for powerpc it's also safe to delay
synchronization, and if so, we will have no users of kick_all_cpus_sync(),
and can drop it.

(It looks that way, because a nohz_full userspace CPU cannot have pending
IPIs, but I'd like to get confirmation from powerpc people.)

Would it make sense to rename kick_all_cpus_sync() to smp_mb_sync(), which
would stand for 'synchronous memory barrier on all online CPUs'?

Yury

2018-03-28 13:38:29

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

On Mon, Mar 26, 2018 at 05:45:55AM -0700, Paul E. McKenney wrote:
> On Sun, Mar 25, 2018 at 11:11:54PM +0300, Yury Norov wrote:
> > On Sun, Mar 25, 2018 at 12:23:28PM -0700, Paul E. McKenney wrote:
> > > On Sun, Mar 25, 2018 at 08:50:04PM +0300, Yury Norov wrote:
> > > > kick_all_cpus_sync() forces all CPUs to sync caches by sending broadcast IPI.
> > > > If CPU is in extended quiescent state (idle task or nohz_full userspace), this
> > > > work may be done at the exit of this state. Delaying synchronization helps to
> > > > save power if CPU is in idle state and decrease latency for real-time tasks.
> > > >
> > > > This patch introduces kick_active_cpus_sync() and uses it in mm/slab and arm64
> > > > code to delay syncronization.
> > > >
> > > > For task isolation (https://lkml.org/lkml/2017/11/3/589), IPI to the CPU running
> > > > isolated task would be fatal, as it breaks isolation. The approach with delaying
> > > > of synchronization work helps to maintain isolated state.
> > > >
> > > > I've tested it with test from task isolation series on ThunderX2 for more than
> > > > 10 hours (10k giga-ticks) without breaking isolation.
> > > >
> > > > Signed-off-by: Yury Norov <[email protected]>
> > > > ---
> > > > arch/arm64/kernel/insn.c | 2 +-
> > > > include/linux/smp.h | 2 ++
> > > > kernel/smp.c | 24 ++++++++++++++++++++++++
> > > > mm/slab.c | 2 +-
> > > > 4 files changed, 28 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> > > > index 2718a77da165..9d7c492e920e 100644
> > > > --- a/arch/arm64/kernel/insn.c
> > > > +++ b/arch/arm64/kernel/insn.c
> > > > @@ -291,7 +291,7 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
> > > > * synchronization.
> > > > */
> > > > ret = aarch64_insn_patch_text_nosync(addrs[0], insns[0]);
> > > > - kick_all_cpus_sync();
> > > > + kick_active_cpus_sync();
> > > > return ret;
> > > > }
> > > > }
> > > > diff --git a/include/linux/smp.h b/include/linux/smp.h
> > > > index 9fb239e12b82..27215e22240d 100644
> > > > --- a/include/linux/smp.h
> > > > +++ b/include/linux/smp.h
> > > > @@ -105,6 +105,7 @@ int smp_call_function_any(const struct cpumask *mask,
> > > > smp_call_func_t func, void *info, int wait);
> > > >
> > > > void kick_all_cpus_sync(void);
> > > > +void kick_active_cpus_sync(void);
> > > > void wake_up_all_idle_cpus(void);
> > > >
> > > > /*
> > > > @@ -161,6 +162,7 @@ smp_call_function_any(const struct cpumask *mask, smp_call_func_t func,
> > > > }
> > > >
> > > > static inline void kick_all_cpus_sync(void) { }
> > > > +static inline void kick_active_cpus_sync(void) { }
> > > > static inline void wake_up_all_idle_cpus(void) { }
> > > >
> > > > #ifdef CONFIG_UP_LATE_INIT
> > > > diff --git a/kernel/smp.c b/kernel/smp.c
> > > > index 084c8b3a2681..0358d6673850 100644
> > > > --- a/kernel/smp.c
> > > > +++ b/kernel/smp.c
> > > > @@ -724,6 +724,30 @@ void kick_all_cpus_sync(void)
> > > > }
> > > > EXPORT_SYMBOL_GPL(kick_all_cpus_sync);
> > > >
> > > > +/**
> > > > + * kick_active_cpus_sync - Force CPUs that are not in extended
> > > > + * quiescent state (idle or nohz_full userspace) sync by sending
> > > > + * IPI. Extended quiescent state CPUs will sync at the exit of
> > > > + * that state.
> > > > + */
> > > > +void kick_active_cpus_sync(void)
> > > > +{
> > > > + int cpu;
> > > > + struct cpumask kernel_cpus;
> > > > +
> > > > + smp_mb();
> > > > +
> > > > + cpumask_clear(&kernel_cpus);
> > > > + preempt_disable();
> > > > + for_each_online_cpu(cpu) {
> > > > + if (!rcu_eqs_special_set(cpu))
> > >
> > > If we get here, the CPU is not in a quiescent state, so we therefore
> > > must IPI it, correct?
> > >
> > > But don't you also need to define rcu_eqs_special_exit() so that RCU
> > > can invoke it when it next leaves its quiescent state? Or are you able
> > > to ignore the CPU in that case? (If you are able to ignore the CPU in
> > > that case, I could give you a lower-cost function to get your job done.)
> > >
> > > Thanx, Paul
> >
> > What's actually needed for synchronization is issuing memory barrier on target
> > CPUs before we start executing kernel code.
> >
> > smp_mb() is implicitly called in smp_call_function*() path for it. In
> > rcu_eqs_special_set() -> rcu_dynticks_eqs_exit() path, smp_mb__after_atomic()
> > is called just before rcu_eqs_special_exit().
> >
> > So I think, rcu_eqs_special_exit() may be left untouched. Empty
> > rcu_eqs_special_exit() in new RCU path corresponds empty do_nothing() in old
> > IPI path.
> >
> > Or my understanding of smp_mb__after_atomic() is wrong? By default,
> > smp_mb__after_atomic() is just alias to smp_mb(). But some
> > architectures define it differently. x86, for example, aliases it to
> > just barrier() with a comment: "Atomic operations are already
> > serializing on x86".
> >
> > I was initially thinking that it's also fine to leave
> > rcu_eqs_special_exit() empty in this case, but now I'm not sure...
> >
> > Anyway, answering to your question, we shouldn't ignore quiescent
> > CPUs, and rcu_eqs_special_set() path is really needed as it issues
> > memory barrier on them.
>
> An alternative approach would be for me to make something like this
> and export it:
>
> bool rcu_cpu_in_eqs(int cpu)
> {
> struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
> int snap;
>
> smp_mb(); /* Obtain consistent snapshot, pairs with update. */
> snap = READ_ONCE(&rdtp->dynticks);
> smp_mb(); /* See above. */
> return !(snap & RCU_DYNTICK_CTRL_CTR);
> }
>
> Then you could replace your use of rcu_cpu_in_eqs() above with

Did you mean replace rcu_eqs_special_set()?

> the new rcu_cpu_in_eqs(). This would avoid the RMW atomic, and, more
> important, the unnecessary write to ->dynticks.
>
> Or am I missing something?
>
> Thanx, Paul

This will not work, because EQS CPUs will not be obliged to call
smp_mb() on exit from EQS.

Let's sync our understanding of the IPI and RCU mechanisms.

Traditional IPI scheme looks like this:

CPU1:                                   CPU2:
touch shared resource();                /* running any code */
smp_mb();
smp_call_function();            --->    handle_IPI()
                                        {
                                                /* Make resource visible */
                                                smp_mb();
                                                do_nothing();
                                        }

And new RCU scheme for eqs CPUs looks like this:

CPU1:                                   CPU2:
touch shared resource();                /* Running EQS */
smp_mb();

if (RCU_DYNTICK_CTRL_CTR)
        set(RCU_DYNTICK_CTRL_MASK);     /* Still in EQS */

                                        /* And later */
                                        rcu_dynticks_eqs_exit()
                                        {
                                                if (RCU_DYNTICK_CTRL_MASK) {
                                                        /* Make resource visible */
                                                        smp_mb();
                                                        rcu_eqs_special_exit();
                                                }
                                        }

Is it correct?

Yury

2018-03-28 13:59:01

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

On Wed, Mar 28, 2018 at 04:36:05PM +0300, Yury Norov wrote:
> On Mon, Mar 26, 2018 at 05:45:55AM -0700, Paul E. McKenney wrote:
> > On Sun, Mar 25, 2018 at 11:11:54PM +0300, Yury Norov wrote:
> > > On Sun, Mar 25, 2018 at 12:23:28PM -0700, Paul E. McKenney wrote:
> > > > On Sun, Mar 25, 2018 at 08:50:04PM +0300, Yury Norov wrote:
> > > > > kick_all_cpus_sync() forces all CPUs to sync caches by sending broadcast IPI.
> > > > > If CPU is in extended quiescent state (idle task or nohz_full userspace), this
> > > > > work may be done at the exit of this state. Delaying synchronization helps to
> > > > > save power if CPU is in idle state and decrease latency for real-time tasks.
> > > > >
> > > > > This patch introduces kick_active_cpus_sync() and uses it in mm/slab and arm64
> > > > > code to delay syncronization.
> > > > >
> > > > > For task isolation (https://lkml.org/lkml/2017/11/3/589), IPI to the CPU running
> > > > > isolated task would be fatal, as it breaks isolation. The approach with delaying
> > > > > of synchronization work helps to maintain isolated state.
> > > > >
> > > > > I've tested it with test from task isolation series on ThunderX2 for more than
> > > > > 10 hours (10k giga-ticks) without breaking isolation.
> > > > >
> > > > > [ ... ]
> > > > >
> > > > > +/**
> > > > > + * kick_active_cpus_sync - Force CPUs that are not in extended
> > > > > + * quiescent state (idle or nohz_full userspace) sync by sending
> > > > > + * IPI. Extended quiescent state CPUs will sync at the exit of
> > > > > + * that state.
> > > > > + */
> > > > > +void kick_active_cpus_sync(void)
> > > > > +{
> > > > > + int cpu;
> > > > > + struct cpumask kernel_cpus;
> > > > > +
> > > > > + smp_mb();
> > > > > +
> > > > > + cpumask_clear(&kernel_cpus);
> > > > > + preempt_disable();
> > > > > + for_each_online_cpu(cpu) {
> > > > > + if (!rcu_eqs_special_set(cpu))
> > > >
> > > > If we get here, the CPU is not in a quiescent state, so we therefore
> > > > must IPI it, correct?
> > > >
> > > > But don't you also need to define rcu_eqs_special_exit() so that RCU
> > > > can invoke it when it next leaves its quiescent state? Or are you able
> > > > to ignore the CPU in that case? (If you are able to ignore the CPU in
> > > > that case, I could give you a lower-cost function to get your job done.)
> > > >
> > > > Thanx, Paul
> > >
> > > What's actually needed for synchronization is issuing memory barrier on target
> > > CPUs before we start executing kernel code.
> > >
> > > smp_mb() is implicitly called in smp_call_function*() path for it. In
> > > rcu_eqs_special_set() -> rcu_dynticks_eqs_exit() path, smp_mb__after_atomic()
> > > is called just before rcu_eqs_special_exit().
> > >
> > > So I think, rcu_eqs_special_exit() may be left untouched. Empty
> > > rcu_eqs_special_exit() in new RCU path corresponds empty do_nothing() in old
> > > IPI path.
> > >
> > > Or my understanding of smp_mb__after_atomic() is wrong? By default,
> > > smp_mb__after_atomic() is just alias to smp_mb(). But some
> > > architectures define it differently. x86, for example, aliases it to
> > > just barrier() with a comment: "Atomic operations are already
> > > serializing on x86".
> > >
> > > I was initially thinking that it's also fine to leave
> > > rcu_eqs_special_exit() empty in this case, but now I'm not sure...
> > >
> > > Anyway, answering to your question, we shouldn't ignore quiescent
> > > CPUs, and rcu_eqs_special_set() path is really needed as it issues
> > > memory barrier on them.
> >
> > An alternative approach would be for me to make something like this
> > and export it:
> >
> > bool rcu_cpu_in_eqs(int cpu)
> > {
> > struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
> > int snap;
> >
> > smp_mb(); /* Obtain consistent snapshot, pairs with update. */
> > snap = READ_ONCE(&rdtp->dynticks);
> > smp_mb(); /* See above. */
> > return !(snap & RCU_DYNTICK_CTRL_CTR);
> > }
> >
> > Then you could replace your use of rcu_cpu_in_eqs() above with
>
> Did you mean replace rcu_eqs_special_set()?

Yes, apologies for my confusion, and good show figuring it out. ;-)

> > the new rcu_cpu_in_eqs(). This would avoid the RMW atomic, and, more
> > important, the unnecessary write to ->dynticks.
> >
> > Or am I missing something?
> >
> > Thanx, Paul
>
> This will not work because EQS CPUs will not be charged to call
> smp_mb() on exit of EQS.

Actually, CPUs are guaranteed to do a value-returning atomic increment
of ->dynticks on EQS exit, which implies smp_mb() both before and after
that atomic increment.

> Lets sync our understanding of IPI and RCU mechanisms.
>
> Traditional IPI scheme looks like this:
>
> CPU1: CPU2:
> touch shared resource(); /* running any code */
> smp_mb();
> smp_call_function(); ---> handle_IPI()

                                        EQS exit here, so implied
                                        smp_mb() on both sides of the
                                        ->dynticks increment.

> {
> /* Make resource visible */
> smp_mb();
> do_nothing();
> }
>
> And new RCU scheme for eqs CPUs looks like this:
>
> CPU1: CPU2:
> touch shared resource(); /* Running EQS */
> smp_mb();
>
> if (RCU_DYNTICK_CTRL_CTR)
> set(RCU_DYNTICK_CTRL_MASK); /* Still in EQS */
>
> /* And later */
> rcu_dynticks_eqs_exit()
> {
> if (RCU_DYNTICK_CTRL_MASK) {
> /* Make resource visible */
> smp_mb();
> rcu_eqs_special_exit();
> }
> }
>
> Is it correct?

You are missing the atomic_add_return() that is already in
rcu_dynticks_eqs_exit(), and this value-returning atomic operation again
implies smp_mb() both before and after. So you should be covered without
needing to worry about RCU_DYNTICK_CTRL_MASK.

Or am I missing something subtle here?

Thanx, Paul


2018-03-28 14:43:17

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

On Wed, Mar 28, 2018 at 06:56:17AM -0700, Paul E. McKenney wrote:
> On Wed, Mar 28, 2018 at 04:36:05PM +0300, Yury Norov wrote:
> > On Mon, Mar 26, 2018 at 05:45:55AM -0700, Paul E. McKenney wrote:
> > > On Sun, Mar 25, 2018 at 11:11:54PM +0300, Yury Norov wrote:
> > > > On Sun, Mar 25, 2018 at 12:23:28PM -0700, Paul E. McKenney wrote:
> > > > > On Sun, Mar 25, 2018 at 08:50:04PM +0300, Yury Norov wrote:
> > > > > > kick_all_cpus_sync() forces all CPUs to sync caches by sending broadcast IPI.
> > > > > > If CPU is in extended quiescent state (idle task or nohz_full userspace), this
> > > > > > work may be done at the exit of this state. Delaying synchronization helps to
> > > > > > save power if CPU is in idle state and decrease latency for real-time tasks.
> > > > > >
> > > > > > This patch introduces kick_active_cpus_sync() and uses it in mm/slab and arm64
> > > > > > code to delay syncronization.
> > > > > >
> > > > > > For task isolation (https://lkml.org/lkml/2017/11/3/589), IPI to the CPU running
> > > > > > isolated task would be fatal, as it breaks isolation. The approach with delaying
> > > > > > of synchronization work helps to maintain isolated state.
> > > > > >
> > > > > > I've tested it with test from task isolation series on ThunderX2 for more than
> > > > > > 10 hours (10k giga-ticks) without breaking isolation.
> > > > > >
> > > > > > [ ... ]
> > > > > >
> > > > > > +/**
> > > > > > + * kick_active_cpus_sync - Force CPUs that are not in extended
> > > > > > + * quiescent state (idle or nohz_full userspace) sync by sending
> > > > > > + * IPI. Extended quiescent state CPUs will sync at the exit of
> > > > > > + * that state.
> > > > > > + */
> > > > > > +void kick_active_cpus_sync(void)
> > > > > > +{
> > > > > > + int cpu;
> > > > > > + struct cpumask kernel_cpus;
> > > > > > +
> > > > > > + smp_mb();
> > > > > > +
> > > > > > + cpumask_clear(&kernel_cpus);
> > > > > > + preempt_disable();
> > > > > > + for_each_online_cpu(cpu) {
> > > > > > + if (!rcu_eqs_special_set(cpu))
> > > > >
> > > > > If we get here, the CPU is not in a quiescent state, so we therefore
> > > > > must IPI it, correct?
> > > > >
> > > > > But don't you also need to define rcu_eqs_special_exit() so that RCU
> > > > > can invoke it when it next leaves its quiescent state? Or are you able
> > > > > to ignore the CPU in that case? (If you are able to ignore the CPU in
> > > > > that case, I could give you a lower-cost function to get your job done.)
> > > > >
> > > > > Thanx, Paul
> > > >
> > > > What's actually needed for synchronization is issuing memory barrier on target
> > > > CPUs before we start executing kernel code.
> > > >
> > > > smp_mb() is implicitly called in smp_call_function*() path for it. In
> > > > rcu_eqs_special_set() -> rcu_dynticks_eqs_exit() path, smp_mb__after_atomic()
> > > > is called just before rcu_eqs_special_exit().
> > > >
> > > > So I think, rcu_eqs_special_exit() may be left untouched. Empty
> > > > rcu_eqs_special_exit() in new RCU path corresponds empty do_nothing() in old
> > > > IPI path.
> > > >
> > > > Or my understanding of smp_mb__after_atomic() is wrong? By default,
> > > > smp_mb__after_atomic() is just alias to smp_mb(). But some
> > > > architectures define it differently. x86, for example, aliases it to
> > > > just barrier() with a comment: "Atomic operations are already
> > > > serializing on x86".
> > > >
> > > > I was initially thinking that it's also fine to leave
> > > > rcu_eqs_special_exit() empty in this case, but now I'm not sure...
> > > >
> > > > Anyway, answering to your question, we shouldn't ignore quiescent
> > > > CPUs, and rcu_eqs_special_set() path is really needed as it issues
> > > > memory barrier on them.
> > >
> > > An alternative approach would be for me to make something like this
> > > and export it:
> > >
> > > bool rcu_cpu_in_eqs(int cpu)
> > > {
> > > struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
> > > int snap;
> > >
> > > smp_mb(); /* Obtain consistent snapshot, pairs with update. */
> > > snap = READ_ONCE(&rdtp->dynticks);
> > > smp_mb(); /* See above. */
> > > return !(snap & RCU_DYNTICK_CTRL_CTR);
> > > }
> > >
> > > Then you could replace your use of rcu_cpu_in_eqs() above with
> >
> > Did you mean replace rcu_eqs_special_set()?
>
> Yes, apologies for my confusion, and good show figuring it out. ;-)
>
> > > the new rcu_cpu_in_eqs(). This would avoid the RMW atomic, and, more
> > > important, the unnecessary write to ->dynticks.
> > >
> > > Or am I missing something?
> > >
> > > Thanx, Paul
> >
> > This will not work because EQS CPUs will not be charged to call
> > smp_mb() on exit of EQS.
>
> Actually, CPUs are guaranteed to do a value-returning atomic increment
> of ->dynticks on EQS exit, which implies smp_mb() both before and after
> that atomic increment.
>
> > Lets sync our understanding of IPI and RCU mechanisms.
> >
> > Traditional IPI scheme looks like this:
> >
> > CPU1: CPU2:
> > touch shared resource(); /* running any code */
> > smp_mb();
> > smp_call_function(); ---> handle_IPI()
>
> EQS exit here, so implied
> smp_mb() on both sides of the
> ->dynticks increment.
>
> > {
> > /* Make resource visible */
> > smp_mb();
> > do_nothing();
> > }
> >
> > And new RCU scheme for eqs CPUs looks like this:
> >
> > CPU1: CPU2:
> > touch shared resource(); /* Running EQS */
> > smp_mb();
> >
> > if (RCU_DYNTICK_CTRL_CTR)
> > set(RCU_DYNTICK_CTRL_MASK); /* Still in EQS */
> >
> > /* And later */
> > rcu_dynticks_eqs_exit()
> > {
> > if (RCU_DYNTICK_CTRL_MASK) {
> > /* Make resource visible */
> > smp_mb();
> > rcu_eqs_special_exit();
> > }
> > }
> >
> > Is it correct?
>
> You are missing the atomic_add_return() that is already in
> rcu_dynticks_eqs_exit(), and this value-returning atomic operation again
> implies smp_mb() both before and after. So you should be covered without
> needing to worry about RCU_DYNTICK_CTRL_MASK.
>
> Or am I missing something subtle here?

Ah, now I understand, thank you. I'll collect the other comments and
submit v2 with this change.

Yury

2018-03-28 14:46:17

by Paul E. McKenney

Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

On Wed, Mar 28, 2018 at 05:41:40PM +0300, Yury Norov wrote:
> On Wed, Mar 28, 2018 at 06:56:17AM -0700, Paul E. McKenney wrote:
> > On Wed, Mar 28, 2018 at 04:36:05PM +0300, Yury Norov wrote:
> > > On Mon, Mar 26, 2018 at 05:45:55AM -0700, Paul E. McKenney wrote:
> > > > On Sun, Mar 25, 2018 at 11:11:54PM +0300, Yury Norov wrote:
> > > > > On Sun, Mar 25, 2018 at 12:23:28PM -0700, Paul E. McKenney wrote:
> > > > > > On Sun, Mar 25, 2018 at 08:50:04PM +0300, Yury Norov wrote:
> > > > > > > kick_all_cpus_sync() forces all CPUs to sync caches by sending broadcast IPI.
> > > > > > > If CPU is in extended quiescent state (idle task or nohz_full userspace), this
> > > > > > > work may be done at the exit of this state. Delaying synchronization helps to
> > > > > > > save power if CPU is in idle state and decrease latency for real-time tasks.
> > > > > > >
> > > > > > > This patch introduces kick_active_cpus_sync() and uses it in mm/slab and arm64
> > > > > > > code to delay synchronization.
> > > > > > >
> > > > > > > For task isolation (https://lkml.org/lkml/2017/11/3/589), IPI to the CPU running
> > > > > > > isolated task would be fatal, as it breaks isolation. The approach with delaying
> > > > > > > of synchronization work helps to maintain isolated state.
> > > > > > >
> > > > > > > I've tested it with test from task isolation series on ThunderX2 for more than
> > > > > > > 10 hours (10k giga-ticks) without breaking isolation.
> > > > > > >
> > > > > > > Signed-off-by: Yury Norov <[email protected]>
> > > > > > > ---
> > > > > > > arch/arm64/kernel/insn.c | 2 +-
> > > > > > > include/linux/smp.h | 2 ++
> > > > > > > kernel/smp.c | 24 ++++++++++++++++++++++++
> > > > > > > mm/slab.c | 2 +-
> > > > > > > 4 files changed, 28 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> > > > > > > index 2718a77da165..9d7c492e920e 100644
> > > > > > > --- a/arch/arm64/kernel/insn.c
> > > > > > > +++ b/arch/arm64/kernel/insn.c
> > > > > > > @@ -291,7 +291,7 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
> > > > > > > * synchronization.
> > > > > > > */
> > > > > > > ret = aarch64_insn_patch_text_nosync(addrs[0], insns[0]);
> > > > > > > - kick_all_cpus_sync();
> > > > > > > + kick_active_cpus_sync();
> > > > > > > return ret;
> > > > > > > }
> > > > > > > }
> > > > > > > diff --git a/include/linux/smp.h b/include/linux/smp.h
> > > > > > > index 9fb239e12b82..27215e22240d 100644
> > > > > > > --- a/include/linux/smp.h
> > > > > > > +++ b/include/linux/smp.h
> > > > > > > @@ -105,6 +105,7 @@ int smp_call_function_any(const struct cpumask *mask,
> > > > > > > smp_call_func_t func, void *info, int wait);
> > > > > > >
> > > > > > > void kick_all_cpus_sync(void);
> > > > > > > +void kick_active_cpus_sync(void);
> > > > > > > void wake_up_all_idle_cpus(void);
> > > > > > >
> > > > > > > /*
> > > > > > > @@ -161,6 +162,7 @@ smp_call_function_any(const struct cpumask *mask, smp_call_func_t func,
> > > > > > > }
> > > > > > >
> > > > > > > static inline void kick_all_cpus_sync(void) { }
> > > > > > > +static inline void kick_active_cpus_sync(void) { }
> > > > > > > static inline void wake_up_all_idle_cpus(void) { }
> > > > > > >
> > > > > > > #ifdef CONFIG_UP_LATE_INIT
> > > > > > > diff --git a/kernel/smp.c b/kernel/smp.c
> > > > > > > index 084c8b3a2681..0358d6673850 100644
> > > > > > > --- a/kernel/smp.c
> > > > > > > +++ b/kernel/smp.c
> > > > > > > @@ -724,6 +724,30 @@ void kick_all_cpus_sync(void)
> > > > > > > }
> > > > > > > EXPORT_SYMBOL_GPL(kick_all_cpus_sync);
> > > > > > >
> > > > > > > +/**
> > > > > > > + * kick_active_cpus_sync - Force CPUs that are not in extended
> > > > > > > + * quiescent state (idle or nohz_full userspace) sync by sending
> > > > > > > + * IPI. Extended quiescent state CPUs will sync at the exit of
> > > > > > > + * that state.
> > > > > > > + */
> > > > > > > +void kick_active_cpus_sync(void)
> > > > > > > +{
> > > > > > > + int cpu;
> > > > > > > + struct cpumask kernel_cpus;
> > > > > > > +
> > > > > > > + smp_mb();
> > > > > > > +
> > > > > > > + cpumask_clear(&kernel_cpus);
> > > > > > > + preempt_disable();
> > > > > > > + for_each_online_cpu(cpu) {
> > > > > > > + if (!rcu_eqs_special_set(cpu))
> > > > > >
> > > > > > If we get here, the CPU is not in a quiescent state, so we therefore
> > > > > > must IPI it, correct?
> > > > > >
> > > > > > But don't you also need to define rcu_eqs_special_exit() so that RCU
> > > > > > can invoke it when it next leaves its quiescent state? Or are you able
> > > > > > to ignore the CPU in that case? (If you are able to ignore the CPU in
> > > > > > that case, I could give you a lower-cost function to get your job done.)
> > > > > >
> > > > > > Thanx, Paul
> > > > >
> > > > > What's actually needed for synchronization is issuing memory barrier on target
> > > > > CPUs before we start executing kernel code.
> > > > >
> > > > > smp_mb() is implicitly called in smp_call_function*() path for it. In
> > > > > rcu_eqs_special_set() -> rcu_dynticks_eqs_exit() path, smp_mb__after_atomic()
> > > > > is called just before rcu_eqs_special_exit().
> > > > >
> > > > > So I think, rcu_eqs_special_exit() may be left untouched. Empty
> > > > > rcu_eqs_special_exit() in new RCU path corresponds empty do_nothing() in old
> > > > > IPI path.
> > > > >
> > > > > Or my understanding of smp_mb__after_atomic() is wrong? By default,
> > > > > smp_mb__after_atomic() is just alias to smp_mb(). But some
> > > > > architectures define it differently. x86, for example, aliases it to
> > > > > just barrier() with a comment: "Atomic operations are already
> > > > > serializing on x86".
> > > > >
> > > > > I was initially thinking that it's also fine to leave
> > > > > rcu_eqs_special_exit() empty in this case, but now I'm not sure...
> > > > >
> > > > > Anyway, answering to your question, we shouldn't ignore quiescent
> > > > > CPUs, and rcu_eqs_special_set() path is really needed as it issues
> > > > > memory barrier on them.
> > > >
> > > [...]
>
> Ah, now I understand, thank you. I'll collect the other comments and
> submit v2 with this change.

Very good, looking forward to seeing v2.

Thanx, Paul


2018-04-01 11:12:53

by Yury Norov

Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

On Tue, Mar 27, 2018 at 11:21:17AM +0100, Will Deacon wrote:
> On Sun, Mar 25, 2018 at 08:50:04PM +0300, Yury Norov wrote:
> > kick_all_cpus_sync() forces all CPUs to sync caches by sending broadcast IPI.
> > If CPU is in extended quiescent state (idle task or nohz_full userspace), this
> > work may be done at the exit of this state. Delaying synchronization helps to
> > save power if CPU is in idle state and decrease latency for real-time tasks.
> >
> > This patch introduces kick_active_cpus_sync() and uses it in mm/slab and arm64
> > code to delay synchronization.
> >
> > For task isolation (https://lkml.org/lkml/2017/11/3/589), IPI to the CPU running
> > isolated task would be fatal, as it breaks isolation. The approach with delaying
> > of synchronization work helps to maintain isolated state.
> >
> > I've tested it with test from task isolation series on ThunderX2 for more than
> > 10 hours (10k giga-ticks) without breaking isolation.
> >
> > Signed-off-by: Yury Norov <[email protected]>
> > ---
> > arch/arm64/kernel/insn.c | 2 +-
> > include/linux/smp.h | 2 ++
> > kernel/smp.c | 24 ++++++++++++++++++++++++
> > mm/slab.c | 2 +-
> > 4 files changed, 28 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> > index 2718a77da165..9d7c492e920e 100644
> > --- a/arch/arm64/kernel/insn.c
> > +++ b/arch/arm64/kernel/insn.c
> > @@ -291,7 +291,7 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
> > * synchronization.
> > */
> > ret = aarch64_insn_patch_text_nosync(addrs[0], insns[0]);
> > - kick_all_cpus_sync();
> > + kick_active_cpus_sync();
> > return ret;
> > }
> > }
>
> I think this means that runtime modifications to the kernel text might not
> be picked up by CPUs coming out of idle. Shouldn't we add an ISB on that
> path to avoid executing stale instructions?
>
> Will

commit 153ae9d5667e7baab4d48c48e8ec30fbcbd86f1e
Author: Yury Norov <[email protected]>
Date: Sat Mar 31 15:05:23 2018 +0300

Hi Will, Paul,

On my system there are 3 paths that go through rcu_dynticks_eqs_exit(),
and so require isb().

The first path starts at gic_handle_irq() on the secondary_start_kernel stack.
gic_handle_irq() already issues isb(), so nothing needs to be done there.

The second path starts at the el0_svc entry; the third path is the exit from
do_idle() on the secondary_start_kernel stack.

For the do_idle() path there is the arch_cpu_idle_exit() hook, currently unused
by arm64, so I picked it. And for el0_svc, I've introduced an isb_if_eqs
macro and call it at the beginning of el0_svc_naked.

I've tested it on ThunderX2 machine, and it works for me.

Below are my call traces and the patch for them. If you're OK with it, I think
I'm ready to submit v2 (but maybe split this patch for better readability).

Yury

[ 585.412095] Call trace:
[ 585.412097] [<fffffc00080878d8>] dump_backtrace+0x0/0x380
[ 585.412099] [<fffffc0008087c6c>] show_stack+0x14/0x20
[ 585.412101] [<fffffc0008a091ec>] dump_stack+0x98/0xbc
[ 585.412104] [<fffffc0008122080>] rcu_dynticks_eqs_exit+0x68/0x70
[ 585.412105] [<fffffc00081260f0>] rcu_irq_enter+0x48/0x50
[ 585.412106] [<fffffc00080c92c4>] irq_enter+0xc/0x70
[ 585.412108] [<fffffc0008113a64>] __handle_domain_irq+0x3c/0x120
[ 585.412109] [<fffffc00080816c4>] gic_handle_irq+0xc4/0x180
[ 585.412110] Exception stack(0xfffffc001130fe20 to 0xfffffc001130ff60)
[ 585.412112] fe20: 00000000000000a0 0000000000000000 0000000000000001 0000000000000000
[ 585.412113] fe40: 0000028f6f0b0000 0000000000000020 0013cd6f53963b31 0000000000000000
[ 585.412144] fe60: 0000000000000002 fffffc001130fed0 0000000000000b80 0000000000003400
[ 585.412146] fe80: 0000000000000000 0000000000000001 0000000000000000 00000000000001db
[ 585.412147] fea0: fffffc0008247a78 000003ff86dc61f8 0000000000000014 fffffc0008fc0000
[ 585.412149] fec0: fffffc00090143e8 fffffc0009014000 fffffc0008fc94a0 0000000000000000
[ 585.412150] fee0: 0000000000000000 fffffe8f46bb1700 0000000000000000 0000000000000000
[ 585.412152] ff00: 0000000000000000 fffffc001130ff60 fffffc0008085034 fffffc001130ff60
[ 585.412153] ff20: fffffc0008085038 0000000000400149 fffffc0009014000 fffffc0008fc94a0
[ 585.412155] ff40: ffffffffffffffff 0000000000000000 fffffc001130ff60 fffffc0008085038
[ 585.412156] [<fffffc0008082fb0>] el1_irq+0xb0/0x124
[ 585.412158] [<fffffc0008085038>] arch_cpu_idle+0x10/0x18
[ 585.412159] [<fffffc00080ff38c>] do_idle+0x10c/0x1d8
[ 585.412160] [<fffffc00080ff5ec>] cpu_startup_entry+0x24/0x28
[ 585.412162] [<fffffc000808db04>] secondary_start_kernel+0x15c/0x1a0
[ 585.412164] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.0-isolation-160735-g59b71c1-dirty #18

[ 585.412058] Call trace:
[ 585.412060] [<fffffc00080878d8>] dump_backtrace+0x0/0x380
[ 585.412062] [<fffffc0008087c6c>] show_stack+0x14/0x20
[ 585.412064] [<fffffc0008a091ec>] dump_stack+0x98/0xbc
[ 585.412066] [<fffffc0008122080>] rcu_dynticks_eqs_exit+0x68/0x70
[ 585.412068] [<fffffc00081232bc>] rcu_eqs_exit.isra.23+0x64/0x80
[ 585.412069] [<fffffc000812609c>] rcu_user_exit+0xc/0x18
[ 585.412071] [<fffffc000817c34c>] __context_tracking_exit.part.2+0x74/0x98
[ 585.412072] [<fffffc000817c3e0>] context_tracking_exit.part.3+0x40/0x50
[ 585.412074] [<fffffc000817c4e0>] context_tracking_user_exit+0x30/0x38
[ 585.412075] Exception stack(0xfffffc00385afec0 to 0xfffffc00385b0000)
[ 585.412076] fec0: 00000000000000b1 000002aacd702420 0000000000000200 00000000000001f4
[ 585.412078] fee0: 0000000000000000 0000000000000008 000002aabec9af17 ffffffffffffffff
[ 585.412079] ff00: 0000000000000016 ffffffffffffffff 000003ffe7619470 0000000000000057
[ 585.412081] ff20: a3d70a3d70a3d70b 000000000000016d 2ce33e6c02ce33e7 00000000000001db
[ 585.412082] ff40: 000002aabec7d260 000003ff86dc61f8 0000000000000014 00000000000001f4
[ 585.412083] ff60: 0000000000000000 000002aabecab000 000002aacd6e2830 0000000000000001
[ 585.412085] ff80: 000002aacd6e2830 000002aabec58380 0000000000000054 000002aabebccf50
[ 585.412086] ffa0: 0000000000000054 000003ffe7619540 000002aabebcf538 000003ffe7619540
[ 585.412088] ffc0: 000003ff86dc6410 0000000060000000 00000000000000b1 0000000000000016
[ 585.412089] ffe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 585.412091] [<fffffc0008083498>] el0_svc_naked+0xc/0x3c
[ 585.446067] CPU: 68 PID: 0 Comm: swapper/68 Not tainted 4.14.0-isolation-160735-g59b71c1-dirty #18

[ 585.412038] Call trace:
[ 585.412041] [<fffffc00080878d8>] dump_backtrace+0x0/0x380
[ 585.412042] [<fffffc0008087c6c>] show_stack+0x14/0x20
[ 585.412045] [<fffffc0008a091ec>] dump_stack+0x98/0xbc
[ 585.412047] [<fffffc0008122080>] rcu_dynticks_eqs_exit+0x68/0x70
[ 585.412049] [<fffffc00081232bc>] rcu_eqs_exit.isra.23+0x64/0x80
[ 585.412050] [<fffffc0008126080>] rcu_idle_exit+0x18/0x28
[ 585.412052] [<fffffc00080ff398>] do_idle+0x118/0x1d8
[ 585.412053] [<fffffc00080ff5ec>] cpu_startup_entry+0x24/0x28
[ 585.412055] [<fffffc000808db04>] secondary_start_kernel+0x15c/0x1a0
[ 585.412057] CPU: 22 PID: 4315 Comm: nginx Not tainted 4.14.0-isolation-160735-g59b71c1-dirty #18

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index e1c59d4008a8..efa5060a2f1c 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -35,22 +35,29 @@
#include <asm/unistd.h>

/*
- * Context tracking subsystem. Used to instrument transitions
- * between user and kernel mode.
+ * Save/restore needed during syscalls. Restore syscall arguments from
+ * the values already saved on stack during kernel_entry.
*/
- .macro ct_user_exit, syscall = 0
-#ifdef CONFIG_CONTEXT_TRACKING
- bl context_tracking_user_exit
- .if \syscall == 1
- /*
- * Save/restore needed during syscalls. Restore syscall arguments from
- * the values already saved on stack during kernel_entry.
- */
+ .macro __restore_syscall_args
ldp x0, x1, [sp]
ldp x2, x3, [sp, #S_X2]
ldp x4, x5, [sp, #S_X4]
ldp x6, x7, [sp, #S_X6]
- .endif
+ .endm
+
+ .macro el0_svc_restore_syscall_args
+#if !defined(CONFIG_TINY_RCU) || defined(CONFIG_CONTEXT_TRACKING)
+ __restore_syscall_args
+#endif
+ .endm
+
+/*
+ * Context tracking subsystem. Used to instrument transitions
+ * between user and kernel mode.
+ */
+ .macro ct_user_exit
+#ifdef CONFIG_CONTEXT_TRACKING
+ bl context_tracking_user_exit
#endif
.endm

@@ -433,6 +440,20 @@ __bad_stack:
ASM_BUG()
.endm

+/*
+ * Flush I-cache if CPU is in extended quiescent state
+ */
+ .macro isb_if_eqs
+#ifndef CONFIG_TINY_RCU
+ bl rcu_is_watching
+ tst w0, #0xff
+ b.ne 1f
+ /* Pairs with aarch64_insn_patch_text for EQS CPUs. */
+ isb
+1:
+#endif
+ .endm
+
el0_sync_invalid:
inv_entry 0, BAD_SYNC
ENDPROC(el0_sync_invalid)
@@ -840,8 +861,10 @@ el0_svc:
mov wsc_nr, #__NR_syscalls
el0_svc_naked: // compat entry point
stp x0, xscno, [sp, #S_ORIG_X0] // save the original x0 and syscall number
+ isb_if_eqs
enable_dbg_and_irq
- ct_user_exit 1
+ ct_user_exit
+ el0_svc_restore_syscall_args

ldr x16, [tsk, #TSK_TI_FLAGS] // check for syscall hooks
tst x16, #_TIF_SYSCALL_WORK
@@ -874,10 +897,7 @@ __sys_trace:
mov x1, sp // pointer to regs
cmp wscno, wsc_nr // check upper syscall limit
b.hs __ni_sys_trace
- ldp x0, x1, [sp] // restore the syscall args
- ldp x2, x3, [sp, #S_X2]
- ldp x4, x5, [sp, #S_X4]
- ldp x6, x7, [sp, #S_X6]
+ __restore_syscall_args
ldr x16, [stbl, xscno, lsl #3] // address in the syscall table
blr x16 // call sys_* routine

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 2dc0f8482210..f11afd2aa33a 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -88,6 +88,12 @@ void arch_cpu_idle(void)
trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
}

+void arch_cpu_idle_exit(void)
+{
+ if (!rcu_is_watching())
+ isb();
+}
+
#ifdef CONFIG_HOTPLUG_CPU
void arch_cpu_idle_dead(void)
{

2018-04-01 14:14:28

by Paul E. McKenney

Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

On Sun, Apr 01, 2018 at 02:11:08PM +0300, Yury Norov wrote:
> On Tue, Mar 27, 2018 at 11:21:17AM +0100, Will Deacon wrote:
> > On Sun, Mar 25, 2018 at 08:50:04PM +0300, Yury Norov wrote:
> > > [...]
> >
> > I think this means that runtime modifications to the kernel text might not
> > be picked up by CPUs coming out of idle. Shouldn't we add an ISB on that
> > path to avoid executing stale instructions?
> >
> > Will
>
> commit 153ae9d5667e7baab4d48c48e8ec30fbcbd86f1e
> Author: Yury Norov <[email protected]>
> Date: Sat Mar 31 15:05:23 2018 +0300
>
> Hi Will, Paul,
>
> On my system there are 3 paths that go through rcu_dynticks_eqs_exit(),
> and so require isb().
>
> The first path starts at gic_handle_irq() on the secondary_start_kernel stack.
> gic_handle_irq() already issues isb(), so nothing needs to be done there.
>
> The second path starts at the el0_svc entry; the third path is the exit from
> do_idle() on the secondary_start_kernel stack.
>
> For the do_idle() path there is the arch_cpu_idle_exit() hook, currently unused
> by arm64, so I picked it. And for el0_svc, I've introduced an isb_if_eqs
> macro and call it at the beginning of el0_svc_naked.
>
> I've tested it on ThunderX2 machine, and it works for me.
>
> Below are my call traces and the patch for them. If you're OK with it, I think
> I'm ready to submit v2 (but maybe split this patch for better readability).

I must defer to Will on this one.

Thanx, Paul



2018-04-03 13:50:14

by Mark Rutland

Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

Hi Yury,

On Sun, Apr 01, 2018 at 02:11:08PM +0300, Yury Norov wrote:
> +/*
> + * Flush I-cache if CPU is in extended quiescent state
> + */

This comment is misleading. An ISB doesn't touch the I-cache; it forces
a context synchronization event.

> + .macro isb_if_eqs
> +#ifndef CONFIG_TINY_RCU
> + bl rcu_is_watching
> + tst w0, #0xff
> + b.ne 1f

The TST+B.NE can be a CBNZ:

bl rcu_is_watching
cbnz x0, 1f
isb
1:

> + /* Pairs with aarch64_insn_patch_text for EQS CPUs. */
> + isb
> +1:
> +#endif
> + .endm
> +
> el0_sync_invalid:
> inv_entry 0, BAD_SYNC
> ENDPROC(el0_sync_invalid)
> @@ -840,8 +861,10 @@ el0_svc:
> mov wsc_nr, #__NR_syscalls
> el0_svc_naked: // compat entry point
> stp x0, xscno, [sp, #S_ORIG_X0] // save the original x0 and syscall number
> + isb_if_eqs
> enable_dbg_and_irq
> - ct_user_exit 1
> + ct_user_exit

I don't think this is safe. Here we issue the ISB *before* exiting a
quiescent state, so I think we can race with another CPU that calls
kick_active_cpus_sync(), e.g.

CPU0                            CPU1

                                ISB
patch_some_text()
kick_active_cpus_sync()
                                ct_user_exit

                                // not synchronized!
                                use_of_patched_text()

... and therefore the ISB has no effect, which could be disastrous.

I believe we need the ISB *after* we transition into a non-quiescent
state, so that we can't possibly miss a context synchronization event.

Thanks,
Mark.

2018-04-04 03:38:55

by Yury Norov

Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

Hi Mark,

Thank you for review.

On Tue, Apr 03, 2018 at 02:48:32PM +0100, Mark Rutland wrote:
> Hi Yury,
>
> On Sun, Apr 01, 2018 at 02:11:08PM +0300, Yury Norov wrote:
> > +/*
> > + * Flush I-cache if CPU is in extended quiescent state
> > + */
>
> This comment is misleading. An ISB doesn't touch the I-cache; it forces
> a context synchronization event.
>
> > + .macro isb_if_eqs
> > +#ifndef CONFIG_TINY_RCU
> > + bl rcu_is_watching
> > + tst w0, #0xff
> > + b.ne 1f
>
> The TST+B.NE can be a CBNZ:
>
> bl rcu_is_watching
> cbnz x0, 1f
> isb
> 1:
>
> > + /* Pairs with aarch64_insn_patch_text for EQS CPUs. */
> > + isb
> > +1:
> > +#endif
> > + .endm
> > +
> > el0_sync_invalid:
> > inv_entry 0, BAD_SYNC
> > ENDPROC(el0_sync_invalid)
> > @@ -840,8 +861,10 @@ el0_svc:
> > mov wsc_nr, #__NR_syscalls
> > el0_svc_naked: // compat entry point
> > stp x0, xscno, [sp, #S_ORIG_X0] // save the original x0 and syscall number
> > + isb_if_eqs
> > enable_dbg_and_irq
> > - ct_user_exit 1
> > + ct_user_exit
>
> I don't think this is safe. Here we issue the ISB *before* exiting a
> quiescent state, so I think we can race with another CPU that calls
> kick_active_cpus_sync(), e.g.
>
> CPU0                                  CPU1
>
> ISB
>                                       patch_some_text()
>                                       kick_active_cpus_sync()
> ct_user_exit
>
> // not synchronized!
> use_of_patched_text()
>
> ... and therefore the ISB has no effect, which could be disastrous.
>
> I believe we need the ISB *after* we transition into a non-quiescent
> state, so that we can't possibly miss a context synchronization event.

I decided to put the isb() in the entry path because there's a chance
that patched code is executed before we exit the quiescent state. But
after some headscratching, I think your approach is safe. I'll do as
you suggested here.

Thanks,
Yury

2018-04-04 09:10:08

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 2/2] smp: introduce kick_active_cpus_sync()

On Wed, Apr 04, 2018 at 06:36:25AM +0300, Yury Norov wrote:
> On Tue, Apr 03, 2018 at 02:48:32PM +0100, Mark Rutland wrote:
> > On Sun, Apr 01, 2018 at 02:11:08PM +0300, Yury Norov wrote:
> > > @@ -840,8 +861,10 @@ el0_svc:
> > > mov wsc_nr, #__NR_syscalls
> > > el0_svc_naked: // compat entry point
> > > stp x0, xscno, [sp, #S_ORIG_X0] // save the original x0 and syscall number
> > > + isb_if_eqs
> > > enable_dbg_and_irq
> > > - ct_user_exit 1
> > > + ct_user_exit
> >
> > I don't think this is safe. Here we issue the ISB *before* exiting a
> > quiescent state, so I think we can race with another CPU that calls
> > kick_active_cpus_sync(), e.g.
> >
> > CPU0                                  CPU1
> >
> > ISB
> >                                       patch_some_text()
> >                                       kick_active_cpus_sync()
> > ct_user_exit
> >
> > // not synchronized!
> > use_of_patched_text()
> >
> > ... and therefore the ISB has no effect, which could be disastrous.
> >
> > I believe we need the ISB *after* we transition into a non-quiescent
> > state, so that we can't possibly miss a context synchronization event.
>
> I decided to put the isb() in the entry path because there's a chance
> that patched code is executed before we exit the quiescent state.

If we do patch entry text, then I think we have no option but to use
kick_all_cpus_sync(), or we risk races similar to the above.

> But after some headscratching, I think your approach is safe. I'll
> do as you suggested here.

Sounds good.

Thanks,
Mark.