Hello!
This series contains miscellaneous fixes to RCU:
1. Tighten up __call_rcu's alignment check for the rcu_head structure
now that no arch does two-byte alignment of pointers.
2. Remove obsolete comment about when rcu_check_callbacks() was invoked.
3. Remove obsolete comment about which function opportunistically
notes grace-period beginnings and endings.
4. Update the RCU_TRACE Kconfig option's help text to note that
it enables event tracing in addition to debugfs, courtesy of
Nikolay Borisov.
5. Add event tracing for long read-side delays in rcutorture
in order to better debug too-short grace periods.
6. Make expedited grace periods recheck dyntick-idle state to avoid
sending needless IPIs.
7. Don't kick CPUs unless there is grace-period activity currently
in progress.
Thanx, Paul
------------------------------------------------------------------------
include/trace/events/rcu.h | 5 ++++-
kernel/rcu/rcutorture.c | 11 ++++++++++-
kernel/rcu/tree.c | 17 ++++++-----------
kernel/rcu/tree.h | 1 +
kernel/rcu/tree_exp.h | 12 +++++++++++-
lib/Kconfig.debug | 3 ++-
6 files changed, 34 insertions(+), 15 deletions(-)
In the deep past, rcu_check_callbacks() was only invoked if rcu_pending()
returned true. That was fine, but these days rcu_check_callbacks()
is invoked unconditionally. This commit therefore removes the obsolete
sentence from the header comment.
Reported-by: Michalis Kokologiannakis <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tree.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 37e4f7d2be0c..d52780024f9f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2828,8 +2828,7 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
* Also schedule RCU core processing.
*
* This function must be called from hardirq context. It is normally
- * invoked from the scheduling-clock interrupt. If rcu_pending returns
- * false, there is no point in invoking rcu_check_callbacks().
+ * invoked from the scheduling-clock interrupt.
*/
void rcu_check_callbacks(int user)
{
--
2.5.2
Commit 720abae3d68ae ("rcu: force alignment on struct
callback_head/rcu_head") forced the rcu_head (AKA callback_head)
structure's alignment to pointer size, that is, to 4-byte boundaries on
32-bit systems and to 8-byte boundaries on 64-bit systems. This
commit therefore checks for this same alignment in __call_rcu(),
which used to contain a looser check for two-byte alignment.
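To illustrate the arithmetic: on a 64-bit system, sizeof(void *) - 1
is 0x7, so the new check catches any rcu_head whose address is not a
multiple of 8. A hypothetical rcu_head at an address ending in 0x4
would now trigger the warning (0x4 & 0x7 == 0x4), even though the old
two-byte check (0x4 & 0x1 == 0) would have let it pass:

	/* Old check: misses even-but-misaligned addresses. */
	WARN_ON_ONCE((unsigned long)head & 0x1);

	/* New check: requires full pointer-size alignment. */
	WARN_ON_ONCE((unsigned long)head & (sizeof(void *) - 1));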
Signed-off-by: Paul E. McKenney <[email protected]>
Tested-by: Geert Uytterhoeven <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
---
kernel/rcu/tree.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 69a5611a7e7c..37e4f7d2be0c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3121,7 +3121,9 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func,
unsigned long flags;
struct rcu_data *rdp;
- WARN_ON_ONCE((unsigned long)head & 0x1); /* Misaligned rcu_head! */
+ /* Misaligned rcu_head! */
+ WARN_ON_ONCE((unsigned long)head & (sizeof(void *) - 1));
+
if (debug_rcu_head_queue(head)) {
/* Probable double call_rcu(), so leak the callback. */
WRITE_ONCE(head->func, rcu_leak_callback);
--
2.5.2
The current code can result in spurious kicks when there are no grace
periods in progress and no grace-period-related requests. This is
sort of OK for a diagnostic aid, but the resulting ftrace-dump messages
in dmesg are annoying. This commit therefore avoids spurious kicks
in the common case.
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tree.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 865af187498e..96c52e43f7ca 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1304,7 +1304,8 @@ static void rcu_stall_kick_kthreads(struct rcu_state *rsp)
if (!rcu_kick_kthreads)
return;
j = READ_ONCE(rsp->jiffies_kick_kthreads);
- if (time_after(jiffies, j) && rsp->gp_kthread) {
+ if (time_after(jiffies, j) && rsp->gp_kthread &&
+ (rcu_gp_in_progress(rsp) || READ_ONCE(rsp->gp_flags))) {
WARN_ONCE(1, "Kicking %s grace-period kthread\n", rsp->name);
rcu_ftrace_dump(DUMP_ALL);
wake_up_process(rsp->gp_kthread);
--
2.5.2
The __call_rcu() comment about opportunistically noting grace-period
beginnings and endings is obsolete. RCU still does such opportunistic
noting, but in __call_rcu_core() rather than __call_rcu(), and there
already is an appropriate comment in __call_rcu_core(). This commit
therefore removes the obsolete comment.
Reported-by: Michalis Kokologiannakis <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tree.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d52780024f9f..865af187498e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3131,13 +3131,6 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func,
}
head->func = func;
head->next = NULL;
-
- /*
- * Opportunistically note grace-period endings and beginnings.
- * Note that we might see a beginning right after we see an
- * end, but never vice versa, since this CPU has to pass through
- * a quiescent state betweentimes.
- */
local_irq_save(flags);
rdp = this_cpu_ptr(rsp->rda);
--
2.5.2
From: Nikolay Borisov <[email protected]>
This commit brings the RCU_TRACE Kconfig option's help text up to date
by noting that it enables additional event tracing as well as debugfs.
Signed-off-by: Nikolay Borisov <[email protected]>
[ paulmck: Do some wordsmithing. ]
Signed-off-by: Paul E. McKenney <[email protected]>
---
lib/Kconfig.debug | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 33bc56cf60d7..4a431f18f218 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1430,7 +1430,8 @@ config RCU_TRACE
select TRACE_CLOCK
help
This option provides tracing in RCU which presents stats
- in debugfs for debugging RCU implementation.
+ in debugfs for debugging RCU implementation. It also enables
+ additional tracepoints for ftrace-style event tracing.
Say Y here if you want to enable RCU tracing
Say N if you are unsure.
--
2.5.2
Although rcutorture will occasionally do a 50-millisecond grace-period
delay, these delays are quite rare. And rightly so, because otherwise
the read rate would be quite low. This means that it can be important
to identify whether or not a given run contained a long-delay read.
This commit therefore inserts a trace_rcu_torture_read() event to flag
runs containing long delays.
Signed-off-by: Paul E. McKenney <[email protected]>
---
include/trace/events/rcu.h | 5 ++++-
kernel/rcu/rcutorture.c | 11 ++++++++++-
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index d3e756539d44..b31e05bc8e26 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -698,7 +698,10 @@ TRACE_EVENT(rcu_batch_end,
/*
* Tracepoint for rcutorture readers. The first argument is the name
* of the RCU flavor from rcutorture's viewpoint and the second argument
- * is the callback address.
+ * is the callback address. The third callback is the start time in
+ * seconds, and the last two arguments are the grace period numbers
+ * and the beginning and end of the read, respectively. Note that the
+ * callback address can be NULL.
*/
TRACE_EVENT(rcu_torture_read,
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index bf08fee53dc7..87c51225ceec 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -289,15 +289,24 @@ static int rcu_torture_read_lock(void) __acquires(RCU)
static void rcu_read_delay(struct torture_random_state *rrsp)
{
+ unsigned long started;
+ unsigned long completed;
const unsigned long shortdelay_us = 200;
const unsigned long longdelay_ms = 50;
+ unsigned long long ts;
/* We want a short delay sometimes to make a reader delay the grace
* period, and we want a long delay occasionally to trigger
* force_quiescent_state. */
- if (!(torture_random(rrsp) % (nrealreaders * 2000 * longdelay_ms)))
+ if (!(torture_random(rrsp) % (nrealreaders * 2000 * longdelay_ms))) {
+ started = cur_ops->completed();
+ ts = rcu_trace_clock_local();
mdelay(longdelay_ms);
+ completed = cur_ops->completed();
+ do_trace_rcu_torture_read(cur_ops->name, NULL, ts,
+ started, completed);
+ }
if (!(torture_random(rrsp) % (nrealreaders * 2 * shortdelay_us)))
udelay(shortdelay_us);
#ifdef CONFIG_PREEMPT
--
2.5.2
Expedited grace periods check dyntick-idle state, and avoid sending
IPIs to idle CPUs, including those running guest OSes, and, on NOHZ_FULL
kernels, nohz_full CPUs. However, the kernel has been observed checking
a CPU while it was non-idle, but sending the IPI after it has gone
idle. This commit therefore rechecks idle state immediately before
sending the IPI, refraining from IPIing CPUs that have since gone idle.
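In outline, the recheck works as follows (a condensed sketch of the
patch below, not the exact tree_exp.h code):

	/* Selection pass: snapshot each CPU's dynticks counter. */
	rdp->exp_dynticks_snap = atomic_add_return(0, &rdtp->dynticks);
	if (!(rdp->exp_dynticks_snap & 0x1))	/* Even: CPU is idle. */
		mask_ofl_test |= rdp->grpmask;	/* No IPI needed. */

	/* IPI pass: recheck immediately before interrupting the CPU. */
	if (atomic_add_return(0, &rdtp->dynticks) != rdp->exp_dynticks_snap) {
		/* CPU passed through idle, which is a quiescent state. */
		mask_ofl_test |= mask;
		continue;
	}
	ret = smp_call_function_single(cpu, func, rsp, 0);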
Reported-by: Rik van Riel <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tree.h | 1 +
kernel/rcu/tree_exp.h | 12 +++++++++++-
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index e99a5234d9ed..fe98dd24adf8 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -404,6 +404,7 @@ struct rcu_data {
atomic_long_t exp_workdone1; /* # done by others #1. */
atomic_long_t exp_workdone2; /* # done by others #2. */
atomic_long_t exp_workdone3; /* # done by others #3. */
+ int exp_dynticks_snap; /* Double-check need for IPI. */
/* 7) Callback offloading. */
#ifdef CONFIG_RCU_NOCB_CPU
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 24343eb87b58..d3053e99fdb6 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -358,8 +358,10 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+ rdp->exp_dynticks_snap =
+ atomic_add_return(0, &rdtp->dynticks);
if (raw_smp_processor_id() == cpu ||
- !(atomic_add_return(0, &rdtp->dynticks) & 0x1) ||
+ !(rdp->exp_dynticks_snap & 0x1) ||
!(rnp->qsmaskinitnext & rdp->grpmask))
mask_ofl_test |= rdp->grpmask;
}
@@ -377,9 +379,17 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
/* IPI the remaining CPUs for expedited quiescent state. */
for_each_leaf_node_possible_cpu(rnp, cpu) {
unsigned long mask = leaf_node_cpu_bit(rnp, cpu);
+ struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
+ struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+
if (!(mask_ofl_ipi & mask))
continue;
retry_ipi:
+ if (atomic_add_return(0, &rdtp->dynticks) !=
+ rdp->exp_dynticks_snap) {
+ mask_ofl_test |= mask;
+ continue;
+ }
ret = smp_call_function_single(cpu, func, rsp, 0);
if (!ret) {
mask_ofl_ipi &= ~mask;
--
2.5.2
On Mon, Nov 14, 2016 at 08:57:11AM -0800, Paul E. McKenney wrote:
> Although rcutorture will occasionally do a 50-millisecond grace-period
> delay, these delays are quite rare. And rightly so, because otherwise
> the read rate would be quite low. This means that it can be important
> to identify whether or not a given run contained a long-delay read.
> This commit therefore inserts a trace_rcu_torture_read() event to flag
> runs containing long delays.
>
> Signed-off-by: Paul E. McKenney <[email protected]>
A couple of apparent typos below. With those fixed:
Reviewed-by: Josh Triplett <[email protected]>
> include/trace/events/rcu.h | 5 ++++-
> kernel/rcu/rcutorture.c | 11 ++++++++++-
> 2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index d3e756539d44..b31e05bc8e26 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -698,7 +698,10 @@ TRACE_EVENT(rcu_batch_end,
> /*
> * Tracepoint for rcutorture readers. The first argument is the name
> * of the RCU flavor from rcutorture's viewpoint and the second argument
> - * is the callback address.
> + * is the callback address. The third callback is the start time in
> + * seconds, and the last two arguments are the grace period numbers
> + * and the beginning and end of the read, respectively. Note that the
> + * callback address can be NULL.
s/third callback/third argument/?
Also, s/and the beginning/of the beginning/?
> TRACE_EVENT(rcu_torture_read,
>
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index bf08fee53dc7..87c51225ceec 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -289,15 +289,24 @@ static int rcu_torture_read_lock(void) __acquires(RCU)
>
> static void rcu_read_delay(struct torture_random_state *rrsp)
> {
> + unsigned long started;
> + unsigned long completed;
> const unsigned long shortdelay_us = 200;
> const unsigned long longdelay_ms = 50;
> + unsigned long long ts;
>
> /* We want a short delay sometimes to make a reader delay the grace
> * period, and we want a long delay occasionally to trigger
> * force_quiescent_state. */
>
> - if (!(torture_random(rrsp) % (nrealreaders * 2000 * longdelay_ms)))
> + if (!(torture_random(rrsp) % (nrealreaders * 2000 * longdelay_ms))) {
> + started = cur_ops->completed();
> + ts = rcu_trace_clock_local();
> mdelay(longdelay_ms);
> + completed = cur_ops->completed();
> + do_trace_rcu_torture_read(cur_ops->name, NULL, ts,
> + started, completed);
> + }
> if (!(torture_random(rrsp) % (nrealreaders * 2 * shortdelay_us)))
> udelay(shortdelay_us);
> #ifdef CONFIG_PREEMPT
> --
> 2.5.2
>
On Mon, Nov 14, 2016 at 08:57:12AM -0800, Paul E. McKenney wrote:
> Expedited grace periods check dyntick-idle state, and avoid sending
> IPIs to idle CPUs, including those running guest OSes, and, on NOHZ_FULL
> kernels, nohz_full CPUs. However, the kernel has been observed checking
> a CPU while it was non-idle, but sending the IPI after it has gone
> idle. This commit therefore rechecks idle state immediately before
> sending the IPI, refraining from IPIing CPUs that have since gone idle.
>
> Reported-by: Rik van Riel <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
atomic_add_return(0, ...) seems odd. Do you actually want that, rather
than atomic_read(...)? If so, can you please document exactly why?
> kernel/rcu/tree.h | 1 +
> kernel/rcu/tree_exp.h | 12 +++++++++++-
> 2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
> index e99a5234d9ed..fe98dd24adf8 100644
> --- a/kernel/rcu/tree.h
> +++ b/kernel/rcu/tree.h
> @@ -404,6 +404,7 @@ struct rcu_data {
> atomic_long_t exp_workdone1; /* # done by others #1. */
> atomic_long_t exp_workdone2; /* # done by others #2. */
> atomic_long_t exp_workdone3; /* # done by others #3. */
> + int exp_dynticks_snap; /* Double-check need for IPI. */
>
> /* 7) Callback offloading. */
> #ifdef CONFIG_RCU_NOCB_CPU
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 24343eb87b58..d3053e99fdb6 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -358,8 +358,10 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
> struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
> struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
>
> + rdp->exp_dynticks_snap =
> + atomic_add_return(0, &rdtp->dynticks);
> if (raw_smp_processor_id() == cpu ||
> - !(atomic_add_return(0, &rdtp->dynticks) & 0x1) ||
> + !(rdp->exp_dynticks_snap & 0x1) ||
> !(rnp->qsmaskinitnext & rdp->grpmask))
> mask_ofl_test |= rdp->grpmask;
> }
> @@ -377,9 +379,17 @@ static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
> /* IPI the remaining CPUs for expedited quiescent state. */
> for_each_leaf_node_possible_cpu(rnp, cpu) {
> unsigned long mask = leaf_node_cpu_bit(rnp, cpu);
> + struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
> + struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
> +
> if (!(mask_ofl_ipi & mask))
> continue;
> retry_ipi:
> + if (atomic_add_return(0, &rdtp->dynticks) !=
> + rdp->exp_dynticks_snap) {
> + mask_ofl_test |= mask;
> + continue;
> + }
> ret = smp_call_function_single(cpu, func, rsp, 0);
> if (!ret) {
> mask_ofl_ipi &= ~mask;
> --
> 2.5.2
>
On Mon, Nov 14, 2016 at 08:56:48AM -0800, Paul E. McKenney wrote:
> Hello!
>
> This series contains miscellaneous fixes to RCU:
>
> 1. Tighten up __call_rcu's alignment check for the rcu_head structure
> now that no arch does two-byte alignment of pointers.
>
> 2. Remove obsolete comment about when rcu_check_callbacks() was invoked.
>
> 3. Remove obsolete comment about which function opportunistically
> notes grace-period beginnings and endings.
>
> 4. Update the RCU_TRACE Kconfig option's help text to note that
> it enables event tracing in addition to debugfs, courtesy of
> Nikolay Borisov.
>
> 5. Add event tracing for long read-side delays in rcutorture
> in order to better debug too-short grace periods.
>
> 6. Make expedited grace periods recheck dyntick-idle state to avoid
> sending needless IPIs.
>
> 7. Don't kick CPUs unless there is grace-period activity currently
> in progress.
I replied to patches 5 and 6 with comments. For 1-4 and 7:
Reviewed-by: Josh Triplett <[email protected]>
On Mon, Nov 14, 2016 at 09:25:12AM -0800, Josh Triplett wrote:
> On Mon, Nov 14, 2016 at 08:57:12AM -0800, Paul E. McKenney wrote:
> > Expedited grace periods check dyntick-idle state, and avoid sending
> > IPIs to idle CPUs, including those running guest OSes, and, on NOHZ_FULL
> > kernels, nohz_full CPUs. However, the kernel has been observed checking
> > a CPU while it was non-idle, but sending the IPI after it has gone
> > idle. This commit therefore rechecks idle state immediately before
> > sending the IPI, refraining from IPIing CPUs that have since gone idle.
> >
> > Reported-by: Rik van Riel <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
>
> atomic_add_return(0, ...) seems odd. Do you actually want that, rather
> than atomic_read(...)? If so, can you please document exactly why?
Yes that is weird. The only effective difference is that it would do a
load-exclusive instead of a regular load.
On Mon, Nov 14, 2016 at 06:37:33PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 14, 2016 at 09:25:12AM -0800, Josh Triplett wrote:
> > On Mon, Nov 14, 2016 at 08:57:12AM -0800, Paul E. McKenney wrote:
> > > Expedited grace periods check dyntick-idle state, and avoid sending
> > > IPIs to idle CPUs, including those running guest OSes, and, on NOHZ_FULL
> > > kernels, nohz_full CPUs. However, the kernel has been observed checking
> > > a CPU while it was non-idle, but sending the IPI after it has gone
> > > idle. This commit therefore rechecks idle state immediately before
> > > sending the IPI, refraining from IPIing CPUs that have since gone idle.
> > >
> > > Reported-by: Rik van Riel <[email protected]>
> > > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > atomic_add_return(0, ...) seems odd. Do you actually want that, rather
> > than atomic_read(...)? If so, can you please document exactly why?
>
> Yes that is weird. The only effective difference is that it would do a
> load-exclusive instead of a regular load.
It is weird, and checking to see if it is safe to convert it and its
friends to something with less overhead is on my list. This starts
with a patch series I will post soon that consolidates all these
atomic_add_return() calls into a single function, which will ease testing
and other verification.
All that aside, please keep in mind that much is required from this load.
It is part of a network of ordered operations that guarantee that any
operation from any CPU preceding a given grace period is seen to precede
any other operation from any CPU following that same grace period.
And each and every CPU must agree on the order of those two operations,
otherwise, RCU is broken.
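In case it helps, atomic_add_return(0, ...) is a value-returning
read-modify-write atomic, so it implies full ordering around the
counter sample, roughly as if it were (a sketch only, and whether
this weaker form would actually suffice here is part of what needs
checking):

	smp_mb();	/* Order against this CPU's prior accesses. */
	snap = atomic_read(&rdtp->dynticks);
	smp_mb();	/* Order against this CPU's later accesses. */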
In addition, please note also that these operations are nowhere near
any fastpaths.
In fact, the specific operation you are concerned about is in an expedited
grace period, which has significant overhead. This added check can
substitute for an IPI, so, believe it or not, it is an optimization.
Thanx, Paul
On Mon, Nov 14, 2016 at 09:21:10AM -0800, Josh Triplett wrote:
> On Mon, Nov 14, 2016 at 08:57:11AM -0800, Paul E. McKenney wrote:
> > Although rcutorture will occasionally do a 50-millisecond grace-period
> > delay, these delays are quite rare. And rightly so, because otherwise
> > the read rate would be quite low. This means that it can be important
> > to identify whether or not a given run contained a long-delay read.
> > This commit therefore inserts a trace_rcu_torture_read() event to flag
> > runs containing long delays.
> >
> > Signed-off-by: Paul E. McKenney <[email protected]>
>
> A couple of apparent typos below. With those fixed:
> Reviewed-by: Josh Triplett <[email protected]>
>
> > include/trace/events/rcu.h | 5 ++++-
> > kernel/rcu/rcutorture.c | 11 ++++++++++-
> > 2 files changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > index d3e756539d44..b31e05bc8e26 100644
> > --- a/include/trace/events/rcu.h
> > +++ b/include/trace/events/rcu.h
> > @@ -698,7 +698,10 @@ TRACE_EVENT(rcu_batch_end,
> > /*
> > * Tracepoint for rcutorture readers. The first argument is the name
> > * of the RCU flavor from rcutorture's viewpoint and the second argument
> > - * is the callback address.
> > + * is the callback address. The third callback is the start time in
> > + * seconds, and the last two arguments are the grace period numbers
> > + * and the beginning and end of the read, respectively. Note that the
> > + * callback address can be NULL.
>
> s/third callback/third argument/?
Good catch, fixed!
> Also, s/and the beginning/of the beginning/?
Let's see... "the last two arguments are the grace period numbers and
the beginning and end of the read, respectively." -ENONSENSE for sure.
I believe that the "and" needs to become "at" as follows:
"the last two arguments are the grace period numbers at the beginning
and end of the read, respectively."
Does that help?
Thanx, Paul
> > TRACE_EVENT(rcu_torture_read,
> >
> > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > index bf08fee53dc7..87c51225ceec 100644
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -289,15 +289,24 @@ static int rcu_torture_read_lock(void) __acquires(RCU)
> >
> > static void rcu_read_delay(struct torture_random_state *rrsp)
> > {
> > + unsigned long started;
> > + unsigned long completed;
> > const unsigned long shortdelay_us = 200;
> > const unsigned long longdelay_ms = 50;
> > + unsigned long long ts;
> >
> > /* We want a short delay sometimes to make a reader delay the grace
> > * period, and we want a long delay occasionally to trigger
> > * force_quiescent_state. */
> >
> > - if (!(torture_random(rrsp) % (nrealreaders * 2000 * longdelay_ms)))
> > + if (!(torture_random(rrsp) % (nrealreaders * 2000 * longdelay_ms))) {
> > + started = cur_ops->completed();
> > + ts = rcu_trace_clock_local();
> > mdelay(longdelay_ms);
> > + completed = cur_ops->completed();
> > + do_trace_rcu_torture_read(cur_ops->name, NULL, ts,
> > + started, completed);
> > + }
> > if (!(torture_random(rrsp) % (nrealreaders * 2 * shortdelay_us)))
> > udelay(shortdelay_us);
> > #ifdef CONFIG_PREEMPT
> > --
> > 2.5.2
> >
>
On Mon, Nov 14, 2016 at 10:44:21AM -0800, Paul E. McKenney wrote:
> On Mon, Nov 14, 2016 at 09:21:10AM -0800, Josh Triplett wrote:
> > On Mon, Nov 14, 2016 at 08:57:11AM -0800, Paul E. McKenney wrote:
> > > Although rcutorture will occasionally do a 50-millisecond grace-period
> > > delay, these delays are quite rare. And rightly so, because otherwise
> > > the read rate would be quite low. This means that it can be important
> > > to identify whether or not a given run contained a long-delay read.
> > > This commit therefore inserts a trace_rcu_torture_read() event to flag
> > > runs containing long delays.
> > >
> > > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > A couple of apparent typos below. With those fixed:
> > Reviewed-by: Josh Triplett <[email protected]>
> >
> > > include/trace/events/rcu.h | 5 ++++-
> > > kernel/rcu/rcutorture.c | 11 ++++++++++-
> > > 2 files changed, 14 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > > index d3e756539d44..b31e05bc8e26 100644
> > > --- a/include/trace/events/rcu.h
> > > +++ b/include/trace/events/rcu.h
> > > @@ -698,7 +698,10 @@ TRACE_EVENT(rcu_batch_end,
> > > /*
> > > * Tracepoint for rcutorture readers. The first argument is the name
> > > * of the RCU flavor from rcutorture's viewpoint and the second argument
> > > - * is the callback address.
> > > + * is the callback address. The third callback is the start time in
> > > + * seconds, and the last two arguments are the grace period numbers
> > > + * and the beginning and end of the read, respectively. Note that the
> > > + * callback address can be NULL.
> >
> > s/third callback/third argument/?
>
> Good catch, fixed!
>
> > Also, s/and the beginning/of the beginning/?
>
> Let's see... "the last two arguments are the grace period numbers and
> the beginning and end of the read, respectively." -ENONSENSE for sure.
>
> I believe that the "and" needs to become "at" as follows:
>
> "the last two arguments are the grace period numbers at the beginning
> and end of the read, respectively."
>
> Does that help?
That works, yes.
On Mon, Nov 14, 2016 at 10:12:37AM -0800, Paul E. McKenney wrote:
> On Mon, Nov 14, 2016 at 06:37:33PM +0100, Peter Zijlstra wrote:
> > On Mon, Nov 14, 2016 at 09:25:12AM -0800, Josh Triplett wrote:
> > > On Mon, Nov 14, 2016 at 08:57:12AM -0800, Paul E. McKenney wrote:
> > > > Expedited grace periods check dyntick-idle state, and avoid sending
> > > > IPIs to idle CPUs, including those running guest OSes, and, on NOHZ_FULL
> > > > kernels, nohz_full CPUs. However, the kernel has been observed checking
> > > > a CPU while it was non-idle, but sending the IPI after it has gone
> > > > idle. This commit therefore rechecks idle state immediately before
> > > > sending the IPI, refraining from IPIing CPUs that have since gone idle.
> > > >
> > > > Reported-by: Rik van Riel <[email protected]>
> > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > >
> > > atomic_add_return(0, ...) seems odd. Do you actually want that, rather
> > > than atomic_read(...)? If so, can you please document exactly why?
> >
> > Yes that is weird. The only effective difference is that it would do a
> > load-exclusive instead of a regular load.
>
> It is weird, and checking to see if it is safe to convert it and its
> friends to something with less overhead is on my list. This starts
> with a patch series I will post soon that consolidates all these
> atomic_add_return() calls into a single function, which will ease testing
> and other verification.
>
> All that aside, please keep in mind that much is required from this load.
> It is part of a network of ordered operations that guarantee that any
> operation from any CPU preceding a given grace period is seen to precede
> any other operation from any CPU following that same grace period.
> And each and every CPU must agree on the order of those two operations,
> otherwise, RCU is broken.
OK, so something similar to:
	smp_mb();
	atomic_read();
then? That would order, with global transitivity, against prior
operations.
> In addition, please note also that these operations are nowhere near
> any fastpaths.
My concern is mostly that it reads very weird. I appreciate this not
being fast path code, but confusing code is bad in any form.
On Tue, Nov 15, 2016 at 09:16:55AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 14, 2016 at 10:12:37AM -0800, Paul E. McKenney wrote:
> > On Mon, Nov 14, 2016 at 06:37:33PM +0100, Peter Zijlstra wrote:
> > > On Mon, Nov 14, 2016 at 09:25:12AM -0800, Josh Triplett wrote:
> > > > On Mon, Nov 14, 2016 at 08:57:12AM -0800, Paul E. McKenney wrote:
> > > > > Expedited grace periods check dyntick-idle state, and avoid sending
> > > > > IPIs to idle CPUs, including those running guest OSes, and, on NOHZ_FULL
> > > > > kernels, nohz_full CPUs. However, the kernel has been observed checking
> > > > > a CPU while it was non-idle, but sending the IPI after it has gone
> > > > > idle. This commit therefore rechecks idle state immediately before
> > > > > sending the IPI, refraining from IPIing CPUs that have since gone idle.
> > > > >
> > > > > Reported-by: Rik van Riel <[email protected]>
> > > > > Signed-off-by: Paul E. McKenney <[email protected]>
> > > >
> > > > atomic_add_return(0, ...) seems odd. Do you actually want that, rather
> > > > than atomic_read(...)? If so, can you please document exactly why?
> > >
> > > Yes that is weird. The only effective difference is that it would do a
> > > load-exclusive instead of a regular load.
> >
> > It is weird, and checking to see if it is safe to convert it and its
> > friends to something with less overhead is on my list. This starts
> > with a patch series I will post soon that consolidates all these
> > atomic_add_return() calls into a single function, which will ease testing
> > and other verification.
> >
> > All that aside, please keep in mind that much is required from this load.
> > It is part of a network of ordered operations that guarantee that any
> > operation from any CPU preceding a given grace period is seen to precede
> > any other operation from any CPU following that same grace period.
> > And each and every CPU must agree on the order of those two operations,
> > otherwise, RCU is broken.
>
> OK, so something similar to:
>
> 	smp_mb();
> 	atomic_read();
>
> then? That would order, with global transitivity, against prior
> operations.
Maybe. The consolidation in the later patch series is a first step
towards potential weakening.
> > In addition, please note also that these operations are nowhere near
> > any fastpaths.
>
> My concern is mostly that it reads very weird. I appreciate this not
> being fast path code, but confusing code is bad in any form.
It is the long-standing code that has been checking dyntick-idle counters
for quite some time. This patch just applies that same code to a new use
case within the expedited grace periods, as you can see by looking a bit
earlier in that same function.
Thanx, Paul