2020-06-18 22:13:39

by Joel Fernandes

Subject: [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate

rcu_segcblist_accelerate() returns true if a GP is to be
started/requested and false if not. During tracing, I found that it is
asking that GPs be requested

The exact flow seems to be something like:
1. Callbacks are queued on CPU A - into the NEXT list.
2. softirq runs on CPU A, accelerate all CBs from NEXT->WAIT and request a GP X.
3. GP thread wakes up on another CPU, starts the GP X and requests QS from CPU A.
4. CPU A's softirq runs again presumably because of #3.
5. CPU A's softirq now does acceleration again, this time no CBs are
accelerated since last attempt, but it still requests GP X+1 which
could be useless.

The fix is to prevent the useless GP start if we detect that there are
no CBs to accelerate.

With this, we have the following improvement in short runs of
rcutorture (5 seconds each):
+----+-------------------+-------------------+
| # | Number of GPs | Number of Wakeups |
+====+=========+=========+=========+=========+
| 1 | With | Without | With | Without |
+----+---------+---------+---------+---------+
| 2 | 75 | 89 | 113 | 119 |
+----+---------+---------+---------+---------+
| 3 | 62 | 91 | 105 | 123 |
+----+---------+---------+---------+---------+
| 4 | 60 | 79 | 98 | 110 |
+----+---------+---------+---------+---------+
| 5 | 63 | 79 | 99 | 112 |
+----+---------+---------+---------+---------+
| 6 | 57 | 89 | 96 | 123 |
+----+---------+---------+---------+---------+
| 7 | 64 | 85 | 97 | 118 |
+----+---------+---------+---------+---------+
| 8 | 58 | 83 | 98 | 113 |
+----+---------+---------+---------+---------+
| 9 | 57 | 77 | 89 | 104 |
+----+---------+---------+---------+---------+
| 10 | 66 | 82 | 98 | 119 |
+----+---------+---------+---------+---------+
| 11 | 52 | 82 | 83 | 117 |
+----+---------+---------+---------+---------+

Cc: [email protected]
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/rcu/rcu_segcblist.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 9a0f66133b4b3..4782cf17bf4f9 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -475,8 +475,15 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
* Also advance to the oldest segment of callbacks whose
* ->gp_seq[] completion is at or after that passed in via "seq",
* skipping any empty segments.
+ *
+ * Note that "i" is the youngest segment of the list after which
+ * any older segments than "i" would not be mutated or assigned
+ * GPs. For example, if i == WAIT_TAIL, then neither WAIT_TAIL,
+ * nor DONE_TAIL will be touched. Only CBs in NEXT_TAIL will be
+ * merged with NEXT_READY_TAIL and the GP numbers of both of
+ * them would be updated.
*/
- if (++i >= RCU_NEXT_TAIL)
+ if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
return false;

/*
--
2.27.0.111.gc72c7da667-goog


2020-06-18 22:13:57

by Joel Fernandes

Subject: [PATCH 3/7] rcu/trace: Add name of the source for gp_seq

The gp_seq value can come from any of the rdp, rsp, or rnp structures.

Only the rsp is the global source of truth (most accurate GP info). The
rnp can be off by ~1 and the rdp can be off by way more. Add some more
context to traces to clarify where it comes from.

Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
include/trace/events/rcu.h | 12 ++++++++----
kernel/rcu/tree.c | 32 ++++++++++++++++----------------
kernel/rcu/tree_plugin.h | 2 +-
3 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index a6d49864dcc27..cb5363564f7ed 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -68,24 +68,28 @@ TRACE_EVENT(rcu_utilization,
*/
TRACE_EVENT_RCU(rcu_grace_period,

- TP_PROTO(const char *rcuname, unsigned long gp_seq, const char *gpevent),
+ TP_PROTO(const char *rcuname, const char *gp_seq_src,
+ unsigned long gp_seq, const char *gpevent),

- TP_ARGS(rcuname, gp_seq, gpevent),
+ TP_ARGS(rcuname, gp_seq_src, gp_seq, gpevent),

TP_STRUCT__entry(
__field(const char *, rcuname)
+ __field(const char *, gp_seq_src)
__field(unsigned long, gp_seq)
__field(const char *, gpevent)
),

TP_fast_assign(
__entry->rcuname = rcuname;
+ __entry->gp_seq_src = gp_seq_src;
__entry->gp_seq = gp_seq;
__entry->gpevent = gpevent;
),

- TP_printk("%s %lu %s",
- __entry->rcuname, __entry->gp_seq, __entry->gpevent)
+ TP_printk("%s %s_gp_seq=%lu %s",
+ __entry->rcuname, __entry->gp_seq_src,
+ __entry->gp_seq, __entry->gpevent)
);

/*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c61af6a33fbfd..81df1b837dd9d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1334,7 +1334,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
goto unlock_out;
}
- trace_rcu_grace_period(rcu_state.name, data_race(rcu_state.gp_seq), TPS("newreq"));
+ trace_rcu_grace_period(rcu_state.name, "rsp", data_race(rcu_state.gp_seq), TPS("newreq"));
ret = true; /* Caller must wake GP kthread. */
unlock_out:
/* Push furthest requested GP to leaf node and rcu_data structure. */
@@ -1437,9 +1437,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)

/* Trace depending on how much we were able to accelerate. */
if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
- trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB"));
+ trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccWaitCB"));
else
- trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB"));
+ trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccReadyCB"));

/* Count CBs for tracing. */
rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
@@ -1543,7 +1543,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
if (!offloaded)
ret = rcu_advance_cbs(rnp, rdp); /* Advance CBs. */
rdp->core_needs_qs = false;
- trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuend"));
+ trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuend"));
} else {
if (!offloaded)
ret = rcu_accelerate_cbs(rnp, rdp); /* Recent CBs. */
@@ -1559,7 +1559,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
* set up to detect a quiescent state, otherwise don't
* go looking for one.
*/
- trace_rcu_grace_period(rcu_state.name, rnp->gp_seq, TPS("cpustart"));
+ trace_rcu_grace_period(rcu_state.name, "rnp", rnp->gp_seq, TPS("cpustart"));
need_qs = !!(rnp->qsmask & rdp->grpmask);
rdp->cpu_no_qs.b.norm = need_qs;
rdp->core_needs_qs = need_qs;
@@ -1660,7 +1660,7 @@ static bool rcu_gp_init(void)
/* Record GP times before starting GP, hence rcu_seq_start(). */
rcu_seq_start(&rcu_state.gp_seq);
ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
- trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
+ trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("start"));
raw_spin_unlock_irq_rcu_node(rnp);

/*
@@ -1828,7 +1828,7 @@ static void rcu_gp_fqs_loop(void)
WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
jiffies + (j ? 3 * j : 2));
}
- trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
+ trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
TPS("fqswait"));
rcu_state.gp_state = RCU_GP_WAIT_FQS;
ret = swait_event_idle_timeout_exclusive(
@@ -1843,7 +1843,7 @@ static void rcu_gp_fqs_loop(void)
/* If time for quiescent-state forcing, do it. */
if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
(gf & RCU_GP_FLAG_FQS)) {
- trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
+ trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
TPS("fqsstart"));
rcu_gp_fqs(first_gp_fqs);
gf = 0;
@@ -1851,7 +1851,7 @@ static void rcu_gp_fqs_loop(void)
first_gp_fqs = false;
gf = rcu_state.cbovld ? RCU_GP_FLAG_OVLD : 0;
}
- trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
+ trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
TPS("fqsend"));
cond_resched_tasks_rcu_qs();
WRITE_ONCE(rcu_state.gp_activity, jiffies);
@@ -1862,7 +1862,7 @@ static void rcu_gp_fqs_loop(void)
cond_resched_tasks_rcu_qs();
WRITE_ONCE(rcu_state.gp_activity, jiffies);
WARN_ON(signal_pending(current));
- trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
+ trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
TPS("fqswaitsig"));
ret = 1; /* Keep old FQS timing. */
j = jiffies;
@@ -1945,7 +1945,7 @@ static void rcu_gp_cleanup(void)
raw_spin_lock_irq_rcu_node(rnp); /* GP before ->gp_seq update. */

/* Declare grace period done, trace first to use old GP number. */
- trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("end"));
+ trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("end"));
rcu_seq_end(&rcu_state.gp_seq);
ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
rcu_state.gp_state = RCU_GP_IDLE;
@@ -1962,7 +1962,7 @@ static void rcu_gp_cleanup(void)
if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
- trace_rcu_grace_period(rcu_state.name,
+ trace_rcu_grace_period(rcu_state.name, "rsp",
rcu_state.gp_seq,
TPS("newreq"));
} else {
@@ -1982,7 +1982,7 @@ static int __noreturn rcu_gp_kthread(void *unused)

/* Handle grace-period start. */
for (;;) {
- trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
+ trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
TPS("reqwait"));
rcu_state.gp_state = RCU_GP_WAIT_GPS;
swait_event_idle_exclusive(rcu_state.gp_wq,
@@ -1996,7 +1996,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
cond_resched_tasks_rcu_qs();
WRITE_ONCE(rcu_state.gp_activity, jiffies);
WARN_ON(signal_pending(current));
- trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
+ trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
TPS("reqwaitsig"));
}

@@ -2240,7 +2240,7 @@ int rcutree_dying_cpu(unsigned int cpu)
return 0;

blkd = !!(rnp->qsmask & rdp->grpmask);
- trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
+ trace_rcu_grace_period(rcu_state.name, "rsp", READ_ONCE(rnp->gp_seq),
blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
return 0;
}
@@ -3733,7 +3733,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
rdp->core_needs_qs = false;
rdp->rcu_iw_pending = false;
rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
- trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
+ trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuonl"));
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
rcu_prepare_kthreads(cpu);
rcu_spawn_cpu_nocb_kthread(cpu);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 982fc5be52698..32f761cf16c33 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -262,7 +262,7 @@ static void rcu_qs(void)
{
RCU_LOCKDEP_WARN(preemptible(), "rcu_qs() invoked with preemption enabled!!!\n");
if (__this_cpu_read(rcu_data.cpu_no_qs.s)) {
- trace_rcu_grace_period(TPS("rcu_preempt"),
+ trace_rcu_grace_period(TPS("rcu_preempt"), "rdp",
__this_cpu_read(rcu_data.gp_seq),
TPS("cpuqs"));
__this_cpu_write(rcu_data.cpu_no_qs.b.norm, false);
--
2.27.0.111.gc72c7da667-goog

2020-06-18 22:28:39

by Joel Fernandes

Subject: [PATCH 5/7] rcu/trace: Use rsp's gp_seq in acceleration's rcu_grace_period tracepoint

During acceleration of CBs, the rsp's gp_seq is rcu_seq_snap'd. This is
the value used for acceleration - it is the value of gp_seq at which it
is safe to execute all callbacks in the callback list.

The rdp's gp_seq is not very useful for this scenario. Make
rcu_grace_period report the rsp's gp_seq instead as it allows one to
reason about how the acceleration works.

Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/rcu/tree.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 81df1b837dd9d..c3bae7a83d792 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1437,9 +1437,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)

/* Trace depending on how much we were able to accelerate. */
if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
- trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccWaitCB"));
+ trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("AccWaitCB"));
else
- trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccReadyCB"));
+ trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("AccReadyCB"));

/* Count CBs for tracing. */
rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
--
2.27.0.111.gc72c7da667-goog

2020-06-18 22:54:05

by Paul E. McKenney

Subject: Re: [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate

On Thu, Jun 18, 2020 at 04:29:49PM -0400, Joel Fernandes (Google) wrote:

First, this looks like a very nice optimization, thank you!

> rcu_segcblist_accelerate() returns true if a GP is to be
> started/requested and false if not. During tracing, I found that it is
> asking that GPs be requested

s/GPs/unnecessary GPs/? Plus "." at end of the sentence.

> The exact flow seems to be something like:
> 1. Callbacks are queued on CPU A - into the NEXT list.
> 2. softirq runs on CPU A, accelerate all CBs from NEXT->WAIT and request a GP X.
> 3. GP thread wakes up on another CPU, starts the GP X and requests QS from CPU A.
> 4. CPU A's softirq runs again presumably because of #3.

Yes, that is one reason RCU softirq might run again.

> 5. CPU A's softirq now does acceleration again, this time no CBs are
> accelerated since last attempt, but it still requests GP X+1 which
> could be useless.

I can't think of a case where this request helps. How about: "but
it still unnecessarily requests GP X+1"?

> The fix is, prevent the useless GP start if we detect no CBs are there
> to accelerate.
>
> With this, we have the following improvement in short runs of
> rcutorture (5 seconds each):
> +----+-------------------+-------------------+
> | # | Number of GPs | Number of Wakeups |
> +====+=========+=========+=========+=========+
> | 1 | With | Without | With | Without |
> +----+---------+---------+---------+---------+
> | 2 | 75 | 89 | 113 | 119 |
> +----+---------+---------+---------+---------+
> | 3 | 62 | 91 | 105 | 123 |
> +----+---------+---------+---------+---------+
> | 4 | 60 | 79 | 98 | 110 |
> +----+---------+---------+---------+---------+
> | 5 | 63 | 79 | 99 | 112 |
> +----+---------+---------+---------+---------+
> | 6 | 57 | 89 | 96 | 123 |
> +----+---------+---------+---------+---------+
> | 7 | 64 | 85 | 97 | 118 |
> +----+---------+---------+---------+---------+
> | 8 | 58 | 83 | 98 | 113 |
> +----+---------+---------+---------+---------+
> | 9 | 57 | 77 | 89 | 104 |
> +----+---------+---------+---------+---------+
> | 10 | 66 | 82 | 98 | 119 |
> +----+---------+---------+---------+---------+
> | 11 | 52 | 82 | 83 | 117 |
> +----+---------+---------+---------+---------+

So the reductions in wakeups range from 5% to 40%, with almost a 20%
overall reduction in wakeups across all the runs. That should be of
some use to someone. ;-)

I do run rcutorture quite a bit, but is there a more real-world
benchmark that could be tried?

> Cc: [email protected]
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> ---
> kernel/rcu/rcu_segcblist.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 9a0f66133b4b3..4782cf17bf4f9 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -475,8 +475,15 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
> * Also advance to the oldest segment of callbacks whose
> * ->gp_seq[] completion is at or after that passed in via "seq",
> * skipping any empty segments.
> + *
> + * Note that "i" is the youngest segment of the list after which
> + * any older segments than "i" would not be mutated or assigned
> + * GPs. For example, if i == WAIT_TAIL, then neither WAIT_TAIL,
> + * nor DONE_TAIL will be touched. Only CBs in NEXT_TAIL will be
> + * merged with NEXT_READY_TAIL and the GP numbers of both of
> + * them would be updated.

In this case, only the GP number of NEXT_READY_TAIL would be updated,
correct? Or am I missing something subtle in the loop just past the
end of this patch?

Thanx, Paul

> */
> - if (++i >= RCU_NEXT_TAIL)
> + if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
> return false;
>
> /*
> --
> 2.27.0.111.gc72c7da667-goog
>

2020-06-18 23:06:07

by Paul E. McKenney

Subject: Re: [PATCH 5/7] rcu/trace: Use rsp's gp_seq in acceleration's rcu_grace_period tracepoint

On Thu, Jun 18, 2020 at 04:29:53PM -0400, Joel Fernandes (Google) wrote:
> During acceleration of CB, the rsp's gp_seq is rcu_seq_snap'd. This is
> the value used for acceleration - it is the value of gp_seq at which it
> is safe to execute all callbacks in the callback list.
>
> The rdp's gp_seq is not very useful for this scenario. Make
> rcu_grace_period report the rsp's gp_seq instead as it allows one to
> reason about how the acceleration works.

Good catch, but please instead trace the gp_seq_req local variable.

Thanx, Paul

> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> ---
> kernel/rcu/tree.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 81df1b837dd9d..c3bae7a83d792 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1437,9 +1437,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
>
> /* Trace depending on how much we were able to accelerate. */
> if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
> - trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccWaitCB"));
> + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("AccWaitCB"));
> else
> - trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccReadyCB"));
> + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("AccReadyCB"));
>
> /* Count CBs for tracing. */
> rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> --
> 2.27.0.111.gc72c7da667-goog
>

2020-06-18 23:21:55

by Paul E. McKenney

Subject: Re: [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate

On Thu, Jun 18, 2020 at 03:11:19PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 04:29:49PM -0400, Joel Fernandes (Google) wrote:
>
> First, this looks like a very nice optimization, thank you!
>
> > rcu_segcblist_accelerate() returns true if a GP is to be
> > started/requested and false if not. During tracing, I found that it is
> > asking that GPs be requested
>
> s/GPs/unnecessary GPs/? Plus "." at end of the sentence.
>
> > The exact flow seems to be something like:
> > 1. Callbacks are queued on CPU A - into the NEXT list.
> > 2. softirq runs on CPU A, accelerate all CBs from NEXT->WAIT and request a GP X.
> > 3. GP thread wakes up on another CPU, starts the GP X and requests QS from CPU A.
> > 4. CPU A's softirq runs again presumably because of #3.
>
> Yes, that is one reason RCU softirq might run again.
>
> > 5. CPU A's softirq now does acceleration again, this time no CBs are
> > accelerated since last attempt, but it still requests GP X+1 which
> > could be useless.
>
> I can't think of a case where this request helps. How about: "but
> it still unnecessarily requests GP X+1"?
>
> > The fix is, prevent the useless GP start if we detect no CBs are there
> > to accelerate.
> >
> > With this, we have the following improvement in short runs of
> > rcutorture (5 seconds each):
> > +----+-------------------+-------------------+
> > | # | Number of GPs | Number of Wakeups |
> > +====+=========+=========+=========+=========+
> > | 1 | With | Without | With | Without |
> > +----+---------+---------+---------+---------+
> > | 2 | 75 | 89 | 113 | 119 |
> > +----+---------+---------+---------+---------+
> > | 3 | 62 | 91 | 105 | 123 |
> > +----+---------+---------+---------+---------+
> > | 4 | 60 | 79 | 98 | 110 |
> > +----+---------+---------+---------+---------+
> > | 5 | 63 | 79 | 99 | 112 |
> > +----+---------+---------+---------+---------+
> > | 6 | 57 | 89 | 96 | 123 |
> > +----+---------+---------+---------+---------+
> > | 7 | 64 | 85 | 97 | 118 |
> > +----+---------+---------+---------+---------+
> > | 8 | 58 | 83 | 98 | 113 |
> > +----+---------+---------+---------+---------+
> > | 9 | 57 | 77 | 89 | 104 |
> > +----+---------+---------+---------+---------+
> > | 10 | 66 | 82 | 98 | 119 |
> > +----+---------+---------+---------+---------+
> > | 11 | 52 | 82 | 83 | 117 |
> > +----+---------+---------+---------+---------+
>
> So the reductions in wakeups range from 5% to 40%, with almost a 20%
> overall reduction in wakeups across all the runs. That should be of
> some use to someone. ;-)
>
> I do run rcutorture quite a bit, but is there a more real-world
> benchmark that could be tried?
>
> > Cc: [email protected]
> > Signed-off-by: Joel Fernandes (Google) <[email protected]>

As discussed over IRC, I updated the patch as shown below. Does that
work for you?

Thanx, Paul

------------------------------------------------------------------------

commit ec037e1f438074eb16fd68a63d699fc419c9ba0c
Author: Joel Fernandes (Google) <[email protected]>
Date: Thu Jun 18 16:29:49 2020 -0400

rcu/segcblist: Prevent useless GP start if no CBs to accelerate

The rcu_segcblist_accelerate() function returns true iff it is necessary
to request another grace period. A tracing session showed that this
function unnecessarily requests grace periods.

For example, consider the following sequence of events:
1. Callbacks are queued only on the NEXT segment of CPU A's callback list.
2. CPU A runs RCU_SOFTIRQ, accelerating these callbacks from NEXT to WAIT.
3. Thus rcu_segcblist_accelerate() returns true, requesting grace period N.
4. RCU's grace-period kthread wakes up on CPU B and starts grace period N.
5. CPU A notices the new grace period and invokes RCU_SOFTIRQ.
6. CPU A's RCU_SOFTIRQ again invokes rcu_segcblist_accelerate(), but
there are no new callbacks. However, rcu_segcblist_accelerate()
nevertheless (uselessly) requests a new grace period N+1.

This extra grace period results in additional lock contention and also
additional wakeups, all for no good reason.

This commit therefore adds a check to rcu_segcblist_accelerate() that
prevents the return of true when there are no new callbacks.

This change reduces the number of grace periods (GPs) and wakeups in each
of eleven five-second rcutorture runs as follows:

+----+-------------------+-------------------+
| # | Number of GPs | Number of Wakeups |
+====+=========+=========+=========+=========+
| 1 | With | Without | With | Without |
+----+---------+---------+---------+---------+
| 2 | 75 | 89 | 113 | 119 |
+----+---------+---------+---------+---------+
| 3 | 62 | 91 | 105 | 123 |
+----+---------+---------+---------+---------+
| 4 | 60 | 79 | 98 | 110 |
+----+---------+---------+---------+---------+
| 5 | 63 | 79 | 99 | 112 |
+----+---------+---------+---------+---------+
| 6 | 57 | 89 | 96 | 123 |
+----+---------+---------+---------+---------+
| 7 | 64 | 85 | 97 | 118 |
+----+---------+---------+---------+---------+
| 8 | 58 | 83 | 98 | 113 |
+----+---------+---------+---------+---------+
| 9 | 57 | 77 | 89 | 104 |
+----+---------+---------+---------+---------+
| 10 | 66 | 82 | 98 | 119 |
+----+---------+---------+---------+---------+
| 11 | 52 | 82 | 83 | 117 |
+----+---------+---------+---------+---------+

The reduction in the number of wakeups ranges from 5% to 40%.

Cc: [email protected]
[ paulmck: Rework commit log and comment. ]
Signed-off-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 9a0f661..2d2a6b6b9 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -475,8 +475,16 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
* Also advance to the oldest segment of callbacks whose
* ->gp_seq[] completion is at or after that passed in via "seq",
* skipping any empty segments.
+ *
+ * Note that segment "i" (and any lower-numbered segments
+ * containing older callbacks) will be unaffected, and their
+ * grace-period numbers remain unchanged. For example, if i ==
+ * WAIT_TAIL, then neither WAIT_TAIL nor DONE_TAIL will be touched.
+ * Instead, the CBs in NEXT_TAIL will be merged with those in
+ * NEXT_READY_TAIL and the grace-period number of NEXT_READY_TAIL
+ * would be updated. NEXT_TAIL would then be empty.
*/
- if (++i >= RCU_NEXT_TAIL)
+ if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
return false;

/*

2020-06-18 23:54:25

by Joel Fernandes

Subject: Re: [PATCH 3/7] rcu/trace: Add name of the source for gp_seq

On Thu, Jun 18, 2020 at 03:19:01PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 04:29:51PM -0400, Joel Fernandes (Google) wrote:
> > The gp_seq value can come from any of the rdp, rsp, or rnp structures.
> >
> > Only the rsp is the global source of truth (most accurate GP info). The
> > rnp can be off by ~1 and the rdp can be off by way more. Add some more
> > context to traces to clarify where it comes from.
>
> This would be better done in scripting that processes the trace messages.
> I must pass on this one.

I don't think so. I think people reading traces would get confused. Recently
I had to talk with someone about this as well when going over traces. I
myself struggle with it.

> For future reference, the TPS() around strings is not optional. Without
> it, trace messages from crash dumps are garbled, if I remember correctly.

Ok. I will fix that.

thanks,

- Joel

>
> Thanx, Paul
>
> > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > ---
> > include/trace/events/rcu.h | 12 ++++++++----
> > kernel/rcu/tree.c | 32 ++++++++++++++++----------------
> > kernel/rcu/tree_plugin.h | 2 +-
> > 3 files changed, 25 insertions(+), 21 deletions(-)
> >
> > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > index a6d49864dcc27..cb5363564f7ed 100644
> > --- a/include/trace/events/rcu.h
> > +++ b/include/trace/events/rcu.h
> > @@ -68,24 +68,28 @@ TRACE_EVENT(rcu_utilization,
> > */
> > TRACE_EVENT_RCU(rcu_grace_period,
> >
> > - TP_PROTO(const char *rcuname, unsigned long gp_seq, const char *gpevent),
> > + TP_PROTO(const char *rcuname, const char *gp_seq_src,
> > + unsigned long gp_seq, const char *gpevent),
> >
> > - TP_ARGS(rcuname, gp_seq, gpevent),
> > + TP_ARGS(rcuname, gp_seq_src, gp_seq, gpevent),
> >
> > TP_STRUCT__entry(
> > __field(const char *, rcuname)
> > + __field(const char *, gp_seq_src)
> > __field(unsigned long, gp_seq)
> > __field(const char *, gpevent)
> > ),
> >
> > TP_fast_assign(
> > __entry->rcuname = rcuname;
> > + __entry->gp_seq_src = gp_seq_src;
> > __entry->gp_seq = gp_seq;
> > __entry->gpevent = gpevent;
> > ),
> >
> > - TP_printk("%s %lu %s",
> > - __entry->rcuname, __entry->gp_seq, __entry->gpevent)
> > + TP_printk("%s %s_gp_seq=%lu %s",
> > + __entry->rcuname, __entry->gp_seq_src,
> > + __entry->gp_seq, __entry->gpevent)
> > );
> >
> > /*
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index c61af6a33fbfd..81df1b837dd9d 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1334,7 +1334,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
> > trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
> > goto unlock_out;
> > }
> > - trace_rcu_grace_period(rcu_state.name, data_race(rcu_state.gp_seq), TPS("newreq"));
> > + trace_rcu_grace_period(rcu_state.name, "rsp", data_race(rcu_state.gp_seq), TPS("newreq"));
> > ret = true; /* Caller must wake GP kthread. */
> > unlock_out:
> > /* Push furthest requested GP to leaf node and rcu_data structure. */
> > @@ -1437,9 +1437,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
> >
> > /* Trace depending on how much we were able to accelerate. */
> > if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
> > - trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB"));
> > + trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccWaitCB"));
> > else
> > - trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB"));
> > + trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccReadyCB"));
> >
> > /* Count CBs for tracing. */
> > rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> > @@ -1543,7 +1543,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> > if (!offloaded)
> > ret = rcu_advance_cbs(rnp, rdp); /* Advance CBs. */
> > rdp->core_needs_qs = false;
> > - trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuend"));
> > + trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuend"));
> > } else {
> > if (!offloaded)
> > ret = rcu_accelerate_cbs(rnp, rdp); /* Recent CBs. */
> > @@ -1559,7 +1559,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> > * set up to detect a quiescent state, otherwise don't
> > * go looking for one.
> > */
> > - trace_rcu_grace_period(rcu_state.name, rnp->gp_seq, TPS("cpustart"));
> > + trace_rcu_grace_period(rcu_state.name, "rnp", rnp->gp_seq, TPS("cpustart"));
> > need_qs = !!(rnp->qsmask & rdp->grpmask);
> > rdp->cpu_no_qs.b.norm = need_qs;
> > rdp->core_needs_qs = need_qs;
> > @@ -1660,7 +1660,7 @@ static bool rcu_gp_init(void)
> > /* Record GP times before starting GP, hence rcu_seq_start(). */
> > rcu_seq_start(&rcu_state.gp_seq);
> > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
> > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("start"));
> > raw_spin_unlock_irq_rcu_node(rnp);
> >
> > /*
> > @@ -1828,7 +1828,7 @@ static void rcu_gp_fqs_loop(void)
> > WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
> > jiffies + (j ? 3 * j : 2));
> > }
> > - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > TPS("fqswait"));
> > rcu_state.gp_state = RCU_GP_WAIT_FQS;
> > ret = swait_event_idle_timeout_exclusive(
> > @@ -1843,7 +1843,7 @@ static void rcu_gp_fqs_loop(void)
> > /* If time for quiescent-state forcing, do it. */
> > if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
> > (gf & RCU_GP_FLAG_FQS)) {
> > - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > TPS("fqsstart"));
> > rcu_gp_fqs(first_gp_fqs);
> > gf = 0;
> > @@ -1851,7 +1851,7 @@ static void rcu_gp_fqs_loop(void)
> > first_gp_fqs = false;
> > gf = rcu_state.cbovld ? RCU_GP_FLAG_OVLD : 0;
> > }
> > - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > TPS("fqsend"));
> > cond_resched_tasks_rcu_qs();
> > WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > @@ -1862,7 +1862,7 @@ static void rcu_gp_fqs_loop(void)
> > cond_resched_tasks_rcu_qs();
> > WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > WARN_ON(signal_pending(current));
> > - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > TPS("fqswaitsig"));
> > ret = 1; /* Keep old FQS timing. */
> > j = jiffies;
> > @@ -1945,7 +1945,7 @@ static void rcu_gp_cleanup(void)
> > raw_spin_lock_irq_rcu_node(rnp); /* GP before ->gp_seq update. */
> >
> > /* Declare grace period done, trace first to use old GP number. */
> > - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("end"));
> > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("end"));
> > rcu_seq_end(&rcu_state.gp_seq);
> > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > rcu_state.gp_state = RCU_GP_IDLE;
> > @@ -1962,7 +1962,7 @@ static void rcu_gp_cleanup(void)
> > if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
> > WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
> > WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
> > - trace_rcu_grace_period(rcu_state.name,
> > + trace_rcu_grace_period(rcu_state.name, "rsp",
> > rcu_state.gp_seq,
> > TPS("newreq"));
> > } else {
> > @@ -1982,7 +1982,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
> >
> > /* Handle grace-period start. */
> > for (;;) {
> > - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > TPS("reqwait"));
> > rcu_state.gp_state = RCU_GP_WAIT_GPS;
> > swait_event_idle_exclusive(rcu_state.gp_wq,
> > @@ -1996,7 +1996,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
> > cond_resched_tasks_rcu_qs();
> > WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > WARN_ON(signal_pending(current));
> > - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > TPS("reqwaitsig"));
> > }
> >
> > @@ -2240,7 +2240,7 @@ int rcutree_dying_cpu(unsigned int cpu)
> > return 0;
> >
> > blkd = !!(rnp->qsmask & rdp->grpmask);
> > - trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
> > + trace_rcu_grace_period(rcu_state.name, "rsp", READ_ONCE(rnp->gp_seq),
> > blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
> > return 0;
> > }
> > @@ -3733,7 +3733,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
> > rdp->core_needs_qs = false;
> > rdp->rcu_iw_pending = false;
> > rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
> > - trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
> > + trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuonl"));
> > raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > rcu_prepare_kthreads(cpu);
> > rcu_spawn_cpu_nocb_kthread(cpu);
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 982fc5be52698..32f761cf16c33 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -262,7 +262,7 @@ static void rcu_qs(void)
> > {
> > RCU_LOCKDEP_WARN(preemptible(), "rcu_qs() invoked with preemption enabled!!!\n");
> > if (__this_cpu_read(rcu_data.cpu_no_qs.s)) {
> > - trace_rcu_grace_period(TPS("rcu_preempt"),
> > + trace_rcu_grace_period(TPS("rcu_preempt"), "rdp",
> > __this_cpu_read(rcu_data.gp_seq),
> > TPS("cpuqs"));
> > __this_cpu_write(rcu_data.cpu_no_qs.b.norm, false);
> > --
> > 2.27.0.111.gc72c7da667-goog
> >

2020-06-19 00:00:28

by Joel Fernandes

Subject: Re: [PATCH 5/7] rcu/trace: Use rsp's gp_seq in acceleration's rcu_grace_period tracepoint

On Thu, Jun 18, 2020 at 03:27:21PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 04:29:53PM -0400, Joel Fernandes (Google) wrote:
> > During callback (CB) acceleration, the rsp's gp_seq is passed through
> > rcu_seq_snap(). This snapshot is the value used for acceleration: the
> > gp_seq value at which it is safe to execute all callbacks in the list.
> >
> > The rdp's gp_seq is not very useful for this scenario. Make
> > rcu_grace_period report the rsp's gp_seq instead, as that makes it
> > easier to reason about how the acceleration works.
>
> Good catch, but please instead trace the gp_seq_req local variable.

Oh sure, I will do that. Actually I was planning to, but was wondering if it
would be inconsistent with other trace_rcu_grace_period() invocations which
pass a real gp_seq (one that is as old as rcu_state.gp_seq). But agreed, this
is better since the acceleration uses it.

thanks,

- Joel

> Thanx, Paul
>
> > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > ---
> > kernel/rcu/tree.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 81df1b837dd9d..c3bae7a83d792 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1437,9 +1437,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
> >
> > /* Trace depending on how much we were able to accelerate. */
> > if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
> > - trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccWaitCB"));
> > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("AccWaitCB"));
> > else
> > - trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccReadyCB"));
> > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("AccReadyCB"));
> >
> > /* Count CBs for tracing. */
> > rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> > --
> > 2.27.0.111.gc72c7da667-goog
> >

2020-06-19 00:04:19

by Steven Rostedt

Subject: Re: [PATCH 3/7] rcu/trace: Add name of the source for gp_seq

On Thu, 18 Jun 2020 15:19:01 -0700
"Paul E. McKenney" <[email protected]> wrote:

> For future reference, the TPS() around strings is not optional. Without
> it, trace messages from crash dumps are garbled, if I remember correctly.

When you pass in a string like this, only the pointer to the string is
saved in the ring buffer. User space tools have no idea what those
pointers are. The TPS() around strings maps those pointers to the
string and shows them in the /sys/kernel/tracing/printk_formats file,
such that perf and trace-cmd know how to make sense of those strings.

-- Steve

2020-06-19 00:06:59

by Joel Fernandes

Subject: Re: [PATCH 1/7] rcu/segcblist: Prevent useless GP start if no CBs to accelerate

On Thu, Jun 18, 2020 at 04:09:34PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 03:11:19PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 18, 2020 at 04:29:49PM -0400, Joel Fernandes (Google) wrote:
> >
> > First, this looks like a very nice optimization, thank you!

Thanks!

> > > Cc: [email protected]
> > > Signed-off-by: Joel Fernandes (Google) <[email protected]>
>
> As discussed over IRC, I updated the patch as shown below. Does that
> work for you?

Yes, that works for me. Thanks!

- Joel


> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit ec037e1f438074eb16fd68a63d699fc419c9ba0c
> Author: Joel Fernandes (Google) <[email protected]>
> Date: Thu Jun 18 16:29:49 2020 -0400
>
> rcu/segcblist: Prevent useless GP start if no CBs to accelerate
>
> The rcu_segcblist_accelerate() function returns true iff it is necessary
> to request another grace period. A tracing session showed that this
> function unnecessarily requests grace periods.
>
> For example, consider the following sequence of events:
> 1. Callbacks are queued only on the NEXT segment of CPU A's callback list.
> 2. CPU A runs RCU_SOFTIRQ, accelerating these callbacks from NEXT to WAIT.
> 3. Thus rcu_segcblist_accelerate() returns true, requesting grace period N.
> 4. RCU's grace-period kthread wakes up on CPU B and starts grace period N.
> 5. CPU A notices the new grace period and invokes RCU_SOFTIRQ.
> 6. CPU A's RCU_SOFTIRQ again invokes rcu_segcblist_accelerate(), but
> there are no new callbacks. However, rcu_segcblist_accelerate()
> nevertheless (uselessly) requests a new grace period N+1.
>
> This extra grace period results in additional lock contention and also
> additional wakeups, all for no good reason.
>
> This commit therefore adds a check to rcu_segcblist_accelerate() that
> prevents the return of true when there are no new callbacks.
>
> This change reduces the number of grace periods (GPs) and wakeups in each
> of eleven five-second rcutorture runs as follows:
>
> +----+-------------------+-------------------+
> | # | Number of GPs | Number of Wakeups |
> +====+=========+=========+=========+=========+
> | 1 | With | Without | With | Without |
> +----+---------+---------+---------+---------+
> | 2 | 75 | 89 | 113 | 119 |
> +----+---------+---------+---------+---------+
> | 3 | 62 | 91 | 105 | 123 |
> +----+---------+---------+---------+---------+
> | 4 | 60 | 79 | 98 | 110 |
> +----+---------+---------+---------+---------+
> | 5 | 63 | 79 | 99 | 112 |
> +----+---------+---------+---------+---------+
> | 6 | 57 | 89 | 96 | 123 |
> +----+---------+---------+---------+---------+
> | 7 | 64 | 85 | 97 | 118 |
> +----+---------+---------+---------+---------+
> | 8 | 58 | 83 | 98 | 113 |
> +----+---------+---------+---------+---------+
> | 9 | 57 | 77 | 89 | 104 |
> +----+---------+---------+---------+---------+
> | 10 | 66 | 82 | 98 | 119 |
> +----+---------+---------+---------+---------+
> | 11 | 52 | 82 | 83 | 117 |
> +----+---------+---------+---------+---------+
>
> The reduction in the number of wakeups ranges from 5% to 40%.
>
> Cc: [email protected]
> [ paulmck: Rework commit log and comment. ]
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index 9a0f661..2d2a6b6b9 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -475,8 +475,16 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq)
> * Also advance to the oldest segment of callbacks whose
> * ->gp_seq[] completion is at or after that passed in via "seq",
> * skipping any empty segments.
> + *
> + * Note that segment "i" (and any lower-numbered segments
> + * containing older callbacks) will be unaffected, and their
> + * grace-period numbers remain unchanged. For example, if i ==
> + * WAIT_TAIL, then neither WAIT_TAIL nor DONE_TAIL will be touched.
> + * Instead, the CBs in NEXT_TAIL will be merged with those in
> + * NEXT_READY_TAIL, and the grace-period number of NEXT_READY_TAIL
> + * will be updated. NEXT_TAIL will then be empty.
> */
> - if (++i >= RCU_NEXT_TAIL)
> + if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL)
> return false;
>
> /*

2020-06-19 00:14:06

by Joel Fernandes

Subject: [PATCH 4/7] rcu/trace: Print negative GP numbers correctly

GP numbers start at -300 and gp_seq numbers start at -1200 (for a
state-bit shift of 2). These negative numbers are printed as unsigned
long, which not only takes up more text space but also forces the
reader to mentally convert the huge unsigned value back to its negative
form. Just print the negative numbers directly.

Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
include/trace/events/rcu.h | 62 ++++++++++++++++++++------------------
1 file changed, 32 insertions(+), 30 deletions(-)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index cb5363564f7ed..bc24862790623 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -76,18 +76,18 @@ TRACE_EVENT_RCU(rcu_grace_period,
TP_STRUCT__entry(
__field(const char *, rcuname)
__field(const char *, gp_seq_src)
- __field(unsigned long, gp_seq)
+ __field(long, gp_seq)
__field(const char *, gpevent)
),

TP_fast_assign(
__entry->rcuname = rcuname;
__entry->gp_seq_src = gp_seq_src;
- __entry->gp_seq = gp_seq;
+ __entry->gp_seq = (long)gp_seq;
__entry->gpevent = gpevent;
),

- TP_printk("%s %s_gp_seq=%lu %s",
+ TP_printk("%s %s_gp_seq=%ld %s",
__entry->rcuname, __entry->gp_seq_src,
__entry->gp_seq, __entry->gpevent)
);
@@ -118,8 +118,8 @@ TRACE_EVENT_RCU(rcu_future_grace_period,

TP_STRUCT__entry(
__field(const char *, rcuname)
- __field(unsigned long, gp_seq)
- __field(unsigned long, gp_seq_req)
+ __field(long, gp_seq)
+ __field(long, gp_seq_req)
__field(u8, level)
__field(int, grplo)
__field(int, grphi)
@@ -128,16 +128,16 @@ TRACE_EVENT_RCU(rcu_future_grace_period,

TP_fast_assign(
__entry->rcuname = rcuname;
- __entry->gp_seq = gp_seq;
- __entry->gp_seq_req = gp_seq_req;
+ __entry->gp_seq = (long)gp_seq;
+ __entry->gp_seq_req = (long)gp_seq_req;
__entry->level = level;
__entry->grplo = grplo;
__entry->grphi = grphi;
__entry->gpevent = gpevent;
),

- TP_printk("%s %lu %lu %u %d %d %s",
- __entry->rcuname, __entry->gp_seq, __entry->gp_seq_req, __entry->level,
+ TP_printk("%s %ld %ld %u %d %d %s",
+ __entry->rcuname, (long)__entry->gp_seq, (long)__entry->gp_seq_req, __entry->level,
__entry->grplo, __entry->grphi, __entry->gpevent)
);

@@ -157,7 +157,7 @@ TRACE_EVENT_RCU(rcu_grace_period_init,

TP_STRUCT__entry(
__field(const char *, rcuname)
- __field(unsigned long, gp_seq)
+ __field(long, gp_seq)
__field(u8, level)
__field(int, grplo)
__field(int, grphi)
@@ -166,14 +166,14 @@ TRACE_EVENT_RCU(rcu_grace_period_init,

TP_fast_assign(
__entry->rcuname = rcuname;
- __entry->gp_seq = gp_seq;
+ __entry->gp_seq = (long)gp_seq;
__entry->level = level;
__entry->grplo = grplo;
__entry->grphi = grphi;
__entry->qsmask = qsmask;
),

- TP_printk("%s %lu %u %d %d %lx",
+ TP_printk("%s %ld %u %d %d %lx",
__entry->rcuname, __entry->gp_seq, __entry->level,
__entry->grplo, __entry->grphi, __entry->qsmask)
);
@@ -201,17 +201,17 @@ TRACE_EVENT_RCU(rcu_exp_grace_period,

TP_STRUCT__entry(
__field(const char *, rcuname)
- __field(unsigned long, gpseq)
+ __field(long, gpseq)
__field(const char *, gpevent)
),

TP_fast_assign(
__entry->rcuname = rcuname;
- __entry->gpseq = gpseq;
+ __entry->gpseq = (long)gpseq;
__entry->gpevent = gpevent;
),

- TP_printk("%s %lu %s",
+ TP_printk("%s %ld %s",
__entry->rcuname, __entry->gpseq, __entry->gpevent)
);

@@ -320,17 +320,17 @@ TRACE_EVENT_RCU(rcu_preempt_task,

TP_STRUCT__entry(
__field(const char *, rcuname)
- __field(unsigned long, gp_seq)
+ __field(long, gp_seq)
__field(int, pid)
),

TP_fast_assign(
__entry->rcuname = rcuname;
- __entry->gp_seq = gp_seq;
+ __entry->gp_seq = (long)gp_seq;
__entry->pid = pid;
),

- TP_printk("%s %lu %d",
+ TP_printk("%s %ld %d",
__entry->rcuname, __entry->gp_seq, __entry->pid)
);

@@ -347,17 +347,17 @@ TRACE_EVENT_RCU(rcu_unlock_preempted_task,

TP_STRUCT__entry(
__field(const char *, rcuname)
- __field(unsigned long, gp_seq)
+ __field(long, gp_seq)
__field(int, pid)
),

TP_fast_assign(
__entry->rcuname = rcuname;
- __entry->gp_seq = gp_seq;
+ __entry->gp_seq = (long)gp_seq;
__entry->pid = pid;
),

- TP_printk("%s %lu %d", __entry->rcuname, __entry->gp_seq, __entry->pid)
+ TP_printk("%s %ld %d", __entry->rcuname, __entry->gp_seq, __entry->pid)
);

/*
@@ -378,7 +378,7 @@ TRACE_EVENT_RCU(rcu_quiescent_state_report,

TP_STRUCT__entry(
__field(const char *, rcuname)
- __field(unsigned long, gp_seq)
+ __field(long, gp_seq)
__field(unsigned long, mask)
__field(unsigned long, qsmask)
__field(u8, level)
@@ -389,7 +389,7 @@ TRACE_EVENT_RCU(rcu_quiescent_state_report,

TP_fast_assign(
__entry->rcuname = rcuname;
- __entry->gp_seq = gp_seq;
+ __entry->gp_seq = (long)gp_seq;
__entry->mask = mask;
__entry->qsmask = qsmask;
__entry->level = level;
@@ -398,7 +398,7 @@ TRACE_EVENT_RCU(rcu_quiescent_state_report,
__entry->gp_tasks = gp_tasks;
),

- TP_printk("%s %lu %lx>%lx %u %d %d %u",
+ TP_printk("%s %ld %lx>%lx %u %d %d %u",
__entry->rcuname, __entry->gp_seq,
__entry->mask, __entry->qsmask, __entry->level,
__entry->grplo, __entry->grphi, __entry->gp_tasks)
@@ -419,19 +419,19 @@ TRACE_EVENT_RCU(rcu_fqs,

TP_STRUCT__entry(
__field(const char *, rcuname)
- __field(unsigned long, gp_seq)
+ __field(long, gp_seq)
__field(int, cpu)
__field(const char *, qsevent)
),

TP_fast_assign(
__entry->rcuname = rcuname;
- __entry->gp_seq = gp_seq;
+ __entry->gp_seq = (long)gp_seq;
__entry->cpu = cpu;
__entry->qsevent = qsevent;
),

- TP_printk("%s %lu %d %s",
+ TP_printk("%s %ld %d %s",
__entry->rcuname, __entry->gp_seq,
__entry->cpu, __entry->qsevent)
);
@@ -520,17 +520,19 @@ TRACE_EVENT_RCU(rcu_segcb,
TP_STRUCT__entry(
__field(const char *, ctx)
__array(int, cb_count, 4)
- __array(unsigned long, gp_seq, 4)
+ __array(long, gp_seq, 4)
),

TP_fast_assign(
+ int i;
__entry->ctx = ctx;
memcpy(__entry->cb_count, cb_count, 4 * sizeof(int));
- memcpy(__entry->gp_seq, gp_seq, 4 * sizeof(unsigned long));
+ for (i = 0; i < 4; i++)
+ __entry->gp_seq[i] = (long)(gp_seq[i]);
),

TP_printk("%s cb_count: (DONE=%d, WAIT=%d, NEXT_READY=%d, NEXT=%d) "
- "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, NEXT=%lu)", __entry->ctx,
+ "gp_seq: (DONE=%ld, WAIT=%ld, NEXT_READY=%ld, NEXT=%ld)", __entry->ctx,
__entry->cb_count[0], __entry->cb_count[1], __entry->cb_count[2], __entry->cb_count[3],
__entry->gp_seq[0], __entry->gp_seq[1], __entry->gp_seq[2], __entry->gp_seq[3])

--
2.27.0.111.gc72c7da667-goog

2020-06-19 00:20:55

by Joel Fernandes

Subject: [PATCH 2/7] rcu/trace: Add tracing for how segcb list changes

Track how the segcb list changes before/after acceleration, during
queuing and during dequeuing.

This proved useful in discovering an optimization that avoids unwanted
GP requests when no callbacks are accelerated.

Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
include/trace/events/rcu.h | 25 +++++++++++++++++++++++++
kernel/rcu/rcu_segcblist.c | 37 +++++++++++++++++++++++++++++++++++++
kernel/rcu/rcu_segcblist.h | 7 +++++++
kernel/rcu/tree.c | 24 ++++++++++++++++++++++++
4 files changed, 93 insertions(+)

diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index 02dcd119f3263..a6d49864dcc27 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -507,6 +507,31 @@ TRACE_EVENT_RCU(rcu_callback,
__entry->qlen)
);

+TRACE_EVENT_RCU(rcu_segcb,
+
+ TP_PROTO(const char *ctx, int *cb_count, unsigned long *gp_seq),
+
+ TP_ARGS(ctx, cb_count, gp_seq),
+
+ TP_STRUCT__entry(
+ __field(const char *, ctx)
+ __array(int, cb_count, 4)
+ __array(unsigned long, gp_seq, 4)
+ ),
+
+ TP_fast_assign(
+ __entry->ctx = ctx;
+ memcpy(__entry->cb_count, cb_count, 4 * sizeof(int));
+ memcpy(__entry->gp_seq, gp_seq, 4 * sizeof(unsigned long));
+ ),
+
+ TP_printk("%s cb_count: (DONE=%d, WAIT=%d, NEXT_READY=%d, NEXT=%d) "
+ "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, NEXT=%lu)", __entry->ctx,
+ __entry->cb_count[0], __entry->cb_count[1], __entry->cb_count[2], __entry->cb_count[3],
+ __entry->gp_seq[0], __entry->gp_seq[1], __entry->gp_seq[2], __entry->gp_seq[3])
+
+);
+
/*
* Tracepoint for the registration of a single RCU callback of the special
* kfree() form. The first argument is the RCU type, the second argument
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index 4782cf17bf4f9..036d4abac7c5a 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -316,6 +316,43 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp,
WRITE_ONCE(rsclp->tails[i], &rsclp->head);
}

+/*
+ * Return how many CBs are in each segment, along with their gp_seq values.
+ *
+ * This function is O(N) where N is the number of callbacks. It is only
+ * used from tracing code, which is usually disabled in production.
+ */
+#ifdef CONFIG_RCU_TRACE
+void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
+ int cbcount[RCU_CBLIST_NSEGS],
+ unsigned long gpseq[RCU_CBLIST_NSEGS])
+{
+ struct rcu_head **cur_tail, *h;
+ int i, c;
+
+ for (i = 0; i < RCU_CBLIST_NSEGS; i++)
+ cbcount[i] = 0;
+
+ cur_tail = &(rsclp->head);
+
+ for (i = 0; i < RCU_CBLIST_NSEGS; i++) {
+ c = 0;
+ // Is this segment non-empty?
+ if (rsclp->tails[i] != cur_tail) {
+ // Start at 1: the walk below stops one node short of the tail.
+ c = 1;
+ for (h = *cur_tail; h->next != *(rsclp->tails[i]); h = h->next) {
+ c++;
+ }
+ }
+
+ cbcount[i] = c;
+ gpseq[i] = rsclp->gp_seq[i];
+ cur_tail = rsclp->tails[i];
+ }
+}
+#endif
+
/*
* Extract only those callbacks still pending (not yet ready to be
* invoked) from the specified rcu_segcblist structure and place them in
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 5c293afc07b8e..0a8dbac438529 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -104,3 +104,10 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq);
bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq);
void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp,
struct rcu_segcblist *src_rsclp);
+#ifdef CONFIG_RCU_TRACE
+void rcu_segcblist_countseq(struct rcu_segcblist *rsclp,
+ int cbcount[RCU_CBLIST_NSEGS],
+ unsigned long gpseq[RCU_CBLIST_NSEGS]);
+#else
+#define rcu_segcblist_countseq(...)
+#endif
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ebce14e470d0e..c61af6a33fbfd 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1407,6 +1407,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
{
unsigned long gp_seq_req;
bool ret = false;
+ int cbs[RCU_CBLIST_NSEGS];
+ unsigned long gps[RCU_CBLIST_NSEGS];

rcu_lockdep_assert_cblist_protected(rdp);
raw_lockdep_assert_held_rcu_node(rnp);
@@ -1415,6 +1417,10 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
if (!rcu_segcblist_pend_cbs(&rdp->cblist))
return false;

+ /* Count CBs for tracing. */
+ rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
+ trace_rcu_segcb("SegCbPreAcc", cbs, gps);
+
/*
* Callbacks are often registered with incomplete grace-period
* information. Something about the fact that getting exact
@@ -1434,6 +1440,11 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB"));
else
trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB"));
+
+ /* Count CBs for tracing. */
+ rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
+ trace_rcu_segcb("SegCbPostAcc", cbs, gps);
+
return ret;
}

@@ -2316,6 +2327,8 @@ static void rcu_do_batch(struct rcu_data *rdp)
struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
long bl, count;
long pending, tlimit = 0;
+ int cbs[RCU_CBLIST_NSEGS];
+ unsigned long gps[RCU_CBLIST_NSEGS];

/* If no callbacks are ready, just return. */
if (!rcu_segcblist_ready_cbs(&rdp->cblist)) {
@@ -2350,6 +2363,11 @@ static void rcu_do_batch(struct rcu_data *rdp)
/* Invoke callbacks. */
tick_dep_set_task(current, TICK_DEP_BIT_RCU);
rhp = rcu_cblist_dequeue(&rcl);
+
+ /* Count CBs for tracing. */
+ rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
+ trace_rcu_segcb("SegCbDequeued", cbs, gps);
+
for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
rcu_callback_t f;

@@ -2808,6 +2826,8 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
unsigned long flags;
struct rcu_data *rdp;
bool was_alldone;
+ int cbs[RCU_CBLIST_NSEGS];
+ unsigned long gps[RCU_CBLIST_NSEGS];

/* Misaligned rcu_head! */
WARN_ON_ONCE((unsigned long)head & (sizeof(void *) - 1));
@@ -2852,6 +2872,10 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
trace_rcu_callback(rcu_state.name, head,
rcu_segcblist_n_cbs(&rdp->cblist));

+ /* Count CBs for tracing. */
+ rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
+ trace_rcu_segcb("SegCBQueued", cbs, gps);
+
/* Go handle any RCU core processing required. */
if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
--
2.27.0.111.gc72c7da667-goog

2020-06-19 01:14:03

by Paul E. McKenney

Subject: Re: [PATCH 3/7] rcu/trace: Add name of the source for gp_seq

On Thu, Jun 18, 2020 at 04:29:51PM -0400, Joel Fernandes (Google) wrote:
> The gp_seq value can come from either of rdp, rsp or rnp.
>
> Only the rsp is the global source of truth (most accurate GP info). The
> rnp can be off by ~1 and the rdp can be off by way more. Add some more
> context to traces to clarify where it comes from.

This would be better done in scripting that processes the trace messages.
I must pass on this one.

For future reference, the TPS() around strings is not optional. Without
it, trace messages from crash dumps are garbled, if I remember correctly.

Thanx, Paul

> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> ---
> include/trace/events/rcu.h | 12 ++++++++----
> kernel/rcu/tree.c | 32 ++++++++++++++++----------------
> kernel/rcu/tree_plugin.h | 2 +-
> 3 files changed, 25 insertions(+), 21 deletions(-)
>
> diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> index a6d49864dcc27..cb5363564f7ed 100644
> --- a/include/trace/events/rcu.h
> +++ b/include/trace/events/rcu.h
> @@ -68,24 +68,28 @@ TRACE_EVENT(rcu_utilization,
> */
> TRACE_EVENT_RCU(rcu_grace_period,
>
> - TP_PROTO(const char *rcuname, unsigned long gp_seq, const char *gpevent),
> + TP_PROTO(const char *rcuname, const char *gp_seq_src,
> + unsigned long gp_seq, const char *gpevent),
>
> - TP_ARGS(rcuname, gp_seq, gpevent),
> + TP_ARGS(rcuname, gp_seq_src, gp_seq, gpevent),
>
> TP_STRUCT__entry(
> __field(const char *, rcuname)
> + __field(const char *, gp_seq_src)
> __field(unsigned long, gp_seq)
> __field(const char *, gpevent)
> ),
>
> TP_fast_assign(
> __entry->rcuname = rcuname;
> + __entry->gp_seq_src = gp_seq_src;
> __entry->gp_seq = gp_seq;
> __entry->gpevent = gpevent;
> ),
>
> - TP_printk("%s %lu %s",
> - __entry->rcuname, __entry->gp_seq, __entry->gpevent)
> + TP_printk("%s %s_gp_seq=%lu %s",
> + __entry->rcuname, __entry->gp_seq_src,
> + __entry->gp_seq, __entry->gpevent)
> );
>
> /*
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index c61af6a33fbfd..81df1b837dd9d 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1334,7 +1334,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
> trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
> goto unlock_out;
> }
> - trace_rcu_grace_period(rcu_state.name, data_race(rcu_state.gp_seq), TPS("newreq"));
> + trace_rcu_grace_period(rcu_state.name, "rsp", data_race(rcu_state.gp_seq), TPS("newreq"));
> ret = true; /* Caller must wake GP kthread. */
> unlock_out:
> /* Push furthest requested GP to leaf node and rcu_data structure. */
> @@ -1437,9 +1437,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
>
> /* Trace depending on how much we were able to accelerate. */
> if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
> - trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB"));
> + trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccWaitCB"));
> else
> - trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB"));
> + trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccReadyCB"));
>
> /* Count CBs for tracing. */
> rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> @@ -1543,7 +1543,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> if (!offloaded)
> ret = rcu_advance_cbs(rnp, rdp); /* Advance CBs. */
> rdp->core_needs_qs = false;
> - trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuend"));
> + trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuend"));
> } else {
> if (!offloaded)
> ret = rcu_accelerate_cbs(rnp, rdp); /* Recent CBs. */
> @@ -1559,7 +1559,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> * set up to detect a quiescent state, otherwise don't
> * go looking for one.
> */
> - trace_rcu_grace_period(rcu_state.name, rnp->gp_seq, TPS("cpustart"));
> + trace_rcu_grace_period(rcu_state.name, "rnp", rnp->gp_seq, TPS("cpustart"));
> need_qs = !!(rnp->qsmask & rdp->grpmask);
> rdp->cpu_no_qs.b.norm = need_qs;
> rdp->core_needs_qs = need_qs;
> @@ -1660,7 +1660,7 @@ static bool rcu_gp_init(void)
> /* Record GP times before starting GP, hence rcu_seq_start(). */
> rcu_seq_start(&rcu_state.gp_seq);
> ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
> + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("start"));
> raw_spin_unlock_irq_rcu_node(rnp);
>
> /*
> @@ -1828,7 +1828,7 @@ static void rcu_gp_fqs_loop(void)
> WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
> jiffies + (j ? 3 * j : 2));
> }
> - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> TPS("fqswait"));
> rcu_state.gp_state = RCU_GP_WAIT_FQS;
> ret = swait_event_idle_timeout_exclusive(
> @@ -1843,7 +1843,7 @@ static void rcu_gp_fqs_loop(void)
> /* If time for quiescent-state forcing, do it. */
> if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
> (gf & RCU_GP_FLAG_FQS)) {
> - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> TPS("fqsstart"));
> rcu_gp_fqs(first_gp_fqs);
> gf = 0;
> @@ -1851,7 +1851,7 @@ static void rcu_gp_fqs_loop(void)
> first_gp_fqs = false;
> gf = rcu_state.cbovld ? RCU_GP_FLAG_OVLD : 0;
> }
> - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> TPS("fqsend"));
> cond_resched_tasks_rcu_qs();
> WRITE_ONCE(rcu_state.gp_activity, jiffies);
> @@ -1862,7 +1862,7 @@ static void rcu_gp_fqs_loop(void)
> cond_resched_tasks_rcu_qs();
> WRITE_ONCE(rcu_state.gp_activity, jiffies);
> WARN_ON(signal_pending(current));
> - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> TPS("fqswaitsig"));
> ret = 1; /* Keep old FQS timing. */
> j = jiffies;
> @@ -1945,7 +1945,7 @@ static void rcu_gp_cleanup(void)
> raw_spin_lock_irq_rcu_node(rnp); /* GP before ->gp_seq update. */
>
> /* Declare grace period done, trace first to use old GP number. */
> - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("end"));
> + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("end"));
> rcu_seq_end(&rcu_state.gp_seq);
> ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> rcu_state.gp_state = RCU_GP_IDLE;
> @@ -1962,7 +1962,7 @@ static void rcu_gp_cleanup(void)
> if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
> WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
> WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
> - trace_rcu_grace_period(rcu_state.name,
> + trace_rcu_grace_period(rcu_state.name, "rsp",
> rcu_state.gp_seq,
> TPS("newreq"));
> } else {
> @@ -1982,7 +1982,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
>
> /* Handle grace-period start. */
> for (;;) {
> - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> TPS("reqwait"));
> rcu_state.gp_state = RCU_GP_WAIT_GPS;
> swait_event_idle_exclusive(rcu_state.gp_wq,
> @@ -1996,7 +1996,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
> cond_resched_tasks_rcu_qs();
> WRITE_ONCE(rcu_state.gp_activity, jiffies);
> WARN_ON(signal_pending(current));
> - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> TPS("reqwaitsig"));
> }
>
> @@ -2240,7 +2240,7 @@ int rcutree_dying_cpu(unsigned int cpu)
> return 0;
>
> blkd = !!(rnp->qsmask & rdp->grpmask);
> - trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
> + trace_rcu_grace_period(rcu_state.name, "rsp", READ_ONCE(rnp->gp_seq),
> blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
> return 0;
> }
> @@ -3733,7 +3733,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
> rdp->core_needs_qs = false;
> rdp->rcu_iw_pending = false;
> rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
> - trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
> + trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuonl"));
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> rcu_prepare_kthreads(cpu);
> rcu_spawn_cpu_nocb_kthread(cpu);
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 982fc5be52698..32f761cf16c33 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -262,7 +262,7 @@ static void rcu_qs(void)
> {
> RCU_LOCKDEP_WARN(preemptible(), "rcu_qs() invoked with preemption enabled!!!\n");
> if (__this_cpu_read(rcu_data.cpu_no_qs.s)) {
> - trace_rcu_grace_period(TPS("rcu_preempt"),
> + trace_rcu_grace_period(TPS("rcu_preempt"), "rdp",
> __this_cpu_read(rcu_data.gp_seq),
> TPS("cpuqs"));
> __this_cpu_write(rcu_data.cpu_no_qs.b.norm, false);
> --
> 2.27.0.111.gc72c7da667-goog
>

2020-06-19 02:06:58

by Joel Fernandes

Subject: Re: [PATCH 3/7] rcu/trace: Add name of the source for gp_seq

On Thu, Jun 18, 2020 at 05:12:01PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 07:51:17PM -0400, Joel Fernandes wrote:
> > On Thu, Jun 18, 2020 at 03:19:01PM -0700, Paul E. McKenney wrote:
> > > On Thu, Jun 18, 2020 at 04:29:51PM -0400, Joel Fernandes (Google) wrote:
> > > > The gp_seq value can come from any of rdp, rsp or rnp.
> > > >
> > > > Only the rsp is the global source of truth (most accurate GP info). The
> > > > rnp can be off by ~1 and the rdp can be off by way more. Add some more
> > > > context to traces to clarify where it comes from.
> > >
> > > This would be better done in scripting that processes the trace messages.
> > > I must pass on this one.
> >
> > I don't think so. I think people reading traces would get confused. Recently
> > I had to talk with someone about this as well when going over traces. I
> > myself struggle with it.
> >
> > > For future reference, the TPS() around strings is not optional. Without
> > > it, trace messages from crash dumps are garbled, if I remember correctly.
> >
> > Ok. I will fix that.
>
> Then please sort this patch after the others. I am quite unconvinced
> about this one, but some of the others looked quite valuable.

Sure, sounds good :) will do :) thanks!

- Joel

> Thanx, Paul
>
> > thanks,
> >
> > - Joel
> >
> > >
> > > Thanx, Paul
> > >
> > > > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > > > ---
> > > > include/trace/events/rcu.h | 12 ++++++++----
> > > > kernel/rcu/tree.c | 32 ++++++++++++++++----------------
> > > > kernel/rcu/tree_plugin.h | 2 +-
> > > > 3 files changed, 25 insertions(+), 21 deletions(-)
> > > >
> > > > diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
> > > > index a6d49864dcc27..cb5363564f7ed 100644
> > > > --- a/include/trace/events/rcu.h
> > > > +++ b/include/trace/events/rcu.h
> > > > @@ -68,24 +68,28 @@ TRACE_EVENT(rcu_utilization,
> > > > */
> > > > TRACE_EVENT_RCU(rcu_grace_period,
> > > >
> > > > - TP_PROTO(const char *rcuname, unsigned long gp_seq, const char *gpevent),
> > > > + TP_PROTO(const char *rcuname, const char *gp_seq_src,
> > > > + unsigned long gp_seq, const char *gpevent),
> > > >
> > > > - TP_ARGS(rcuname, gp_seq, gpevent),
> > > > + TP_ARGS(rcuname, gp_seq_src, gp_seq, gpevent),
> > > >
> > > > TP_STRUCT__entry(
> > > > __field(const char *, rcuname)
> > > > + __field(const char *, gp_seq_src)
> > > > __field(unsigned long, gp_seq)
> > > > __field(const char *, gpevent)
> > > > ),
> > > >
> > > > TP_fast_assign(
> > > > __entry->rcuname = rcuname;
> > > > + __entry->gp_seq_src = gp_seq_src;
> > > > __entry->gp_seq = gp_seq;
> > > > __entry->gpevent = gpevent;
> > > > ),
> > > >
> > > > - TP_printk("%s %lu %s",
> > > > - __entry->rcuname, __entry->gp_seq, __entry->gpevent)
> > > > + TP_printk("%s %s_gp_seq=%lu %s",
> > > > + __entry->rcuname, __entry->gp_seq_src,
> > > > + __entry->gp_seq, __entry->gpevent)
> > > > );
> > > >
> > > > /*
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index c61af6a33fbfd..81df1b837dd9d 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -1334,7 +1334,7 @@ static bool rcu_start_this_gp(struct rcu_node *rnp_start, struct rcu_data *rdp,
> > > > trace_rcu_this_gp(rnp, rdp, gp_seq_req, TPS("NoGPkthread"));
> > > > goto unlock_out;
> > > > }
> > > > - trace_rcu_grace_period(rcu_state.name, data_race(rcu_state.gp_seq), TPS("newreq"));
> > > > + trace_rcu_grace_period(rcu_state.name, "rsp", data_race(rcu_state.gp_seq), TPS("newreq"));
> > > > ret = true; /* Caller must wake GP kthread. */
> > > > unlock_out:
> > > > /* Push furthest requested GP to leaf node and rcu_data structure. */
> > > > @@ -1437,9 +1437,9 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
> > > >
> > > > /* Trace depending on how much we were able to accelerate. */
> > > > if (rcu_segcblist_restempty(&rdp->cblist, RCU_WAIT_TAIL))
> > > > - trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccWaitCB"));
> > > > + trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccWaitCB"));
> > > > else
> > > > - trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("AccReadyCB"));
> > > > + trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("AccReadyCB"));
> > > >
> > > > /* Count CBs for tracing. */
> > > > rcu_segcblist_countseq(&rdp->cblist, cbs, gps);
> > > > @@ -1543,7 +1543,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> > > > if (!offloaded)
> > > > ret = rcu_advance_cbs(rnp, rdp); /* Advance CBs. */
> > > > rdp->core_needs_qs = false;
> > > > - trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuend"));
> > > > + trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuend"));
> > > > } else {
> > > > if (!offloaded)
> > > > ret = rcu_accelerate_cbs(rnp, rdp); /* Recent CBs. */
> > > > @@ -1559,7 +1559,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> > > > * set up to detect a quiescent state, otherwise don't
> > > > * go looking for one.
> > > > */
> > > > - trace_rcu_grace_period(rcu_state.name, rnp->gp_seq, TPS("cpustart"));
> > > > + trace_rcu_grace_period(rcu_state.name, "rnp", rnp->gp_seq, TPS("cpustart"));
> > > > need_qs = !!(rnp->qsmask & rdp->grpmask);
> > > > rdp->cpu_no_qs.b.norm = need_qs;
> > > > rdp->core_needs_qs = need_qs;
> > > > @@ -1660,7 +1660,7 @@ static bool rcu_gp_init(void)
> > > > /* Record GP times before starting GP, hence rcu_seq_start(). */
> > > > rcu_seq_start(&rcu_state.gp_seq);
> > > > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > > > - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
> > > > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("start"));
> > > > raw_spin_unlock_irq_rcu_node(rnp);
> > > >
> > > > /*
> > > > @@ -1828,7 +1828,7 @@ static void rcu_gp_fqs_loop(void)
> > > > WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
> > > > jiffies + (j ? 3 * j : 2));
> > > > }
> > > > - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > > TPS("fqswait"));
> > > > rcu_state.gp_state = RCU_GP_WAIT_FQS;
> > > > ret = swait_event_idle_timeout_exclusive(
> > > > @@ -1843,7 +1843,7 @@ static void rcu_gp_fqs_loop(void)
> > > > /* If time for quiescent-state forcing, do it. */
> > > > if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
> > > > (gf & RCU_GP_FLAG_FQS)) {
> > > > - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > > TPS("fqsstart"));
> > > > rcu_gp_fqs(first_gp_fqs);
> > > > gf = 0;
> > > > @@ -1851,7 +1851,7 @@ static void rcu_gp_fqs_loop(void)
> > > > first_gp_fqs = false;
> > > > gf = rcu_state.cbovld ? RCU_GP_FLAG_OVLD : 0;
> > > > }
> > > > - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > > TPS("fqsend"));
> > > > cond_resched_tasks_rcu_qs();
> > > > WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > > > @@ -1862,7 +1862,7 @@ static void rcu_gp_fqs_loop(void)
> > > > cond_resched_tasks_rcu_qs();
> > > > WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > > > WARN_ON(signal_pending(current));
> > > > - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > > TPS("fqswaitsig"));
> > > > ret = 1; /* Keep old FQS timing. */
> > > > j = jiffies;
> > > > @@ -1945,7 +1945,7 @@ static void rcu_gp_cleanup(void)
> > > > raw_spin_lock_irq_rcu_node(rnp); /* GP before ->gp_seq update. */
> > > >
> > > > /* Declare grace period done, trace first to use old GP number. */
> > > > - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("end"));
> > > > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq, TPS("end"));
> > > > rcu_seq_end(&rcu_state.gp_seq);
> > > > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > > > rcu_state.gp_state = RCU_GP_IDLE;
> > > > @@ -1962,7 +1962,7 @@ static void rcu_gp_cleanup(void)
> > > > if ((offloaded || !rcu_accelerate_cbs(rnp, rdp)) && needgp) {
> > > > WRITE_ONCE(rcu_state.gp_flags, RCU_GP_FLAG_INIT);
> > > > WRITE_ONCE(rcu_state.gp_req_activity, jiffies);
> > > > - trace_rcu_grace_period(rcu_state.name,
> > > > + trace_rcu_grace_period(rcu_state.name, "rsp",
> > > > rcu_state.gp_seq,
> > > > TPS("newreq"));
> > > > } else {
> > > > @@ -1982,7 +1982,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
> > > >
> > > > /* Handle grace-period start. */
> > > > for (;;) {
> > > > - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > > TPS("reqwait"));
> > > > rcu_state.gp_state = RCU_GP_WAIT_GPS;
> > > > swait_event_idle_exclusive(rcu_state.gp_wq,
> > > > @@ -1996,7 +1996,7 @@ static int __noreturn rcu_gp_kthread(void *unused)
> > > > cond_resched_tasks_rcu_qs();
> > > > WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > > > WARN_ON(signal_pending(current));
> > > > - trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > + trace_rcu_grace_period(rcu_state.name, "rsp", rcu_state.gp_seq,
> > > > TPS("reqwaitsig"));
> > > > }
> > > >
> > > > @@ -2240,7 +2240,7 @@ int rcutree_dying_cpu(unsigned int cpu)
> > > > return 0;
> > > >
> > > > blkd = !!(rnp->qsmask & rdp->grpmask);
> > > > - trace_rcu_grace_period(rcu_state.name, READ_ONCE(rnp->gp_seq),
> > > > + trace_rcu_grace_period(rcu_state.name, "rsp", READ_ONCE(rnp->gp_seq),
> > > > blkd ? TPS("cpuofl") : TPS("cpuofl-bgp"));
> > > > return 0;
> > > > }
> > > > @@ -3733,7 +3733,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
> > > > rdp->core_needs_qs = false;
> > > > rdp->rcu_iw_pending = false;
> > > > rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
> > > > - trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
> > > > + trace_rcu_grace_period(rcu_state.name, "rdp", rdp->gp_seq, TPS("cpuonl"));
> > > > raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > > rcu_prepare_kthreads(cpu);
> > > > rcu_spawn_cpu_nocb_kthread(cpu);
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index 982fc5be52698..32f761cf16c33 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > @@ -262,7 +262,7 @@ static void rcu_qs(void)
> > > > {
> > > > RCU_LOCKDEP_WARN(preemptible(), "rcu_qs() invoked with preemption enabled!!!\n");
> > > > if (__this_cpu_read(rcu_data.cpu_no_qs.s)) {
> > > > - trace_rcu_grace_period(TPS("rcu_preempt"),
> > > > + trace_rcu_grace_period(TPS("rcu_preempt"), "rdp",
> > > > __this_cpu_read(rcu_data.gp_seq),
> > > > TPS("cpuqs"));
> > > > __this_cpu_write(rcu_data.cpu_no_qs.b.norm, false);
> > > > --
> > > > 2.27.0.111.gc72c7da667-goog
> > > >
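To make the TP_printk() change in the patch quoted above concrete, here is a minimal, hypothetical stand-in (plain Python, not the kernel macro machinery) that renders a trace line the way the new format string "%s %s_gp_seq=%lu %s" would:

```python
# Hypothetical sketch of the patched TP_printk() format string
# "%s %s_gp_seq=%lu %s": rcuname, gp_seq source tag, gp_seq, then the event.
def format_gp_trace(rcuname, gp_seq_src, gp_seq, gpevent):
    """Render one rcu_grace_period trace line with its gp_seq source tag."""
    return f"{rcuname} {gp_seq_src}_gp_seq={gp_seq} {gpevent}"

# With the patch, a "newreq" event whose gp_seq came from rcu_state reads:
print(format_gp_trace("rcu_preempt", "rsp", 1234, "newreq"))
# -> rcu_preempt rsp_gp_seq=1234 newreq
```

The "rsp"/"rnp"/"rdp" tag is exactly the extra context the patch adds, so a reader of the trace knows how stale the gp_seq value may be.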

2020-06-19 02:10:55

by Joel Fernandes

Subject: Re: [PATCH 3/7] rcu/trace: Add name of the source for gp_seq

On Thu, Jun 18, 2020 at 05:12:01PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 18, 2020 at 07:51:17PM -0400, Joel Fernandes wrote:
> > On Thu, Jun 18, 2020 at 03:19:01PM -0700, Paul E. McKenney wrote:
> > > On Thu, Jun 18, 2020 at 04:29:51PM -0400, Joel Fernandes (Google) wrote:
> > > > The gp_seq value can come from any of rdp, rsp or rnp.
> > > >
> > > > Only the rsp is the global source of truth (most accurate GP info). The
> > > > rnp can be off by ~1 and the rdp can be off by way more. Add some more
> > > > context to traces to clarify where it comes from.
> > >
> > > This would be better done in scripting that processes the trace messages.
> > > I must pass on this one.
> >
> > I don't think so. I think people reading traces would get confused. Recently
> > I had to talk with someone about this as well when going over traces. I
> > myself struggle with it.
> >
> > > For future reference, the TPS() around strings is not optional. Without
> > > it, trace messages from crash dumps are garbled, if I remember correctly.
> >
> > Ok. I will fix that.
>
> Then please sort this patch after the others. I am quite unconvinced
> about this one, but some of the others looked quite valuable.

To summarize what's left for me to do right now:

1 and 7 were pulled.
2 - requires the per-cblist counter I'll work on in the near future.
3 - have to add TPS()
4 - The negative number fixes which I'll resend.
5 - changing to gp_seq_req in this tracepoint.
6 - dropping for now and pushing it to a git tag.

So on top of rcu/dev I'll rework 3 and 5, and resend 4.

Please do consider again whether we can find a place for 6 :) thanks!

thanks,

- Joel


>
> Thanx, Paul
>
> > thanks,
> >
> > - Joel
> >
> > >
> > > Thanx, Paul
> > >

2020-06-19 02:53:59

by Paul E. McKenney

Subject: Re: [PATCH 3/7] rcu/trace: Add name of the source for gp_seq

On Thu, Jun 18, 2020 at 07:51:17PM -0400, Joel Fernandes wrote:
> On Thu, Jun 18, 2020 at 03:19:01PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 18, 2020 at 04:29:51PM -0400, Joel Fernandes (Google) wrote:
> > > The gp_seq value can come from any of rdp, rsp or rnp.
> > >
> > > Only the rsp is the global source of truth (most accurate GP info). The
> > > rnp can be off by ~1 and the rdp can be off by way more. Add some more
> > > context to traces to clarify where it comes from.
> >
> > This would be better done in scripting that processes the trace messages.
> > I must pass on this one.
>
> I don't think so. I think people reading traces would get confused. Recently
> I had to talk with someone about this as well when going over traces. I
> myself struggle with it.
>
> > For future reference, the TPS() around strings is not optional. Without
> > it, trace messages from crash dumps are garbled, if I remember correctly.
>
> Ok. I will fix that.

Then please sort this patch after the others. I am quite unconvinced
about this one, but some of the others looked quite valuable.

Thanx, Paul

> thanks,
>
> - Joel
>
> >
> > Thanx, Paul
> >

2020-06-19 04:04:02

by Joel Fernandes

Subject: Re: [PATCH 3/7] rcu/trace: Add name of the source for gp_seq

On Thu, Jun 18, 2020 at 08:01:38PM -0400, Steven Rostedt wrote:
> On Thu, 18 Jun 2020 15:19:01 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
> > For future reference, the TPS() around strings is not optional. Without
> > it, trace messages from crash dumps are garbled, if I remember correctly.
>
> When you pass in a string like this, only the pointer to the string is
> saved in the ring buffer. User space tools have no idea what those
> pointers are. The TPS() around strings maps those pointers to the
> string and shows them in the /sys/kernel/tracing/printk_formats file,
> such that perf and trace-cmd know how to make sense of those strings.

Makes sense. Quite a valuable feature!

thanks,

- Joel
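The pointer-to-string mapping Steve describes can be sketched with a rough userspace analogue. This is illustrative only, not the kernel mechanism: the `printk_formats` dict below stands in for the /sys/kernel/tracing/printk_formats file, and Python `id()` stands in for the kernel address that TPS() arranges to be decodable:

```python
# The ring buffer records only a "pointer" (here, an id); a side table --
# analogous to /sys/kernel/tracing/printk_formats -- lets a decoder such as
# perf or trace-cmd map that pointer back to the original string.
printk_formats = {}  # pointer -> string, exported at tracepoint registration

def TPS(s):
    """Register a trace string; return the 'pointer' actually recorded."""
    printk_formats[id(s)] = s
    return id(s)

ring_buffer = []

def trace(event_ptr):
    ring_buffer.append(event_ptr)  # only the pointer lands in the buffer

def decode(ptr):
    # Without the printk_formats table, a userspace tool would see only a
    # meaningless address here -- the "garbled" output Paul remembers.
    return printk_formats.get(ptr, f"<garbled pointer {ptr:#x}>")

ev = TPS("newreq")
trace(ev)
print([decode(p) for p in ring_buffer])  # ['newreq']
```

A string passed without TPS() never reaches the table, which is why the annotation is not optional for strings that must survive into crash-dump or post-processed traces.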