PREEMPT_RT has made rcu_core() preemptible, making it unsafe against
concurrent NOCB (de-)offloading.
Thomas suggested dropping the local_lock() based solution and simply
checking the offloaded state while the context looks safe, but that's
not enough. Here is a bit of rework.
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
rcu/rt
HEAD: aac1c58961446c731f2e989bd822ca1fd2659bad
Thanks,
Frederic
---
Frederic Weisbecker (10):
rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent deoffloading
rcu/nocb: Prepare state machine for a new step
rcu/nocb: Invoke rcu_core() at the start of deoffloading
rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe
rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check
rcu/nocb: Use appropriate rcu_nocb_lock_irqsave()
rcu/nocb: Limit number of softirq callbacks only on softirq
rcu: Fix callbacks processing time limit retaining cond_resched()
rcu: Apply callbacks processing time limit only on softirq
rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread
Thomas Gleixner (1):
rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe
include/linux/rcu_segcblist.h | 51 +++++++++++++++++++-------
kernel/rcu/rcu_segcblist.c | 10 ++---
kernel/rcu/rcu_segcblist.h | 7 ++--
kernel/rcu/tree.c | 85 ++++++++++++++++++++++++++++++-------------
kernel/rcu/tree.h | 16 +++++---
kernel/rcu/tree_nocb.h | 29 ++++++++++++---
6 files changed, 136 insertions(+), 62 deletions(-)
From: Thomas Gleixner <[email protected]>
While reporting a quiescent state for a given CPU, rcu_core() takes
advantage of the freshly loaded grace period sequence number and the
locked rnp to accelerate the callbacks whose sequence numbers have been
assigned a stale value.
This action is only necessary when the rdp isn't offloaded, otherwise
the NOCB kthreads already take care of the callbacks progression.
However the check for the offloaded state is volatile because it is
performed outside the IRQs disabled section. It's possible for the
offloading process to preempt rcu_core() at that point on PREEMPT_RT.
This is dangerous because rcu_core() may end up accelerating callbacks
concurrently with NOCB kthreads without appropriate locking.
Fix this by moving the offloaded check inside the rnp locking section.
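For illustration, here is a condensed (non-compilable) sketch of the window,
with names taken from rcu_report_qs_rdp() as shown in the diff below; the
raw_spin_lock_irqsave_rcu_node() call is assumed from the surrounding,
unmodified code:

	/* Before: the offloaded state is sampled while still preemptible. */
	const bool offloaded = rcu_rdp_is_offloaded(rdp);
	/* <-- the (de-)offloading process may preempt rcu_core() here on PREEMPT_RT */
	raw_spin_lock_irqsave_rcu_node(rnp, flags);
	if (!offloaded)					/* possibly stale by now */
		needwake = rcu_accelerate_cbs(rnp, rdp);	/* may race with NOCB kthreads */

	/* After: the state is read with IRQs disabled and the rnp locked, so it is stable. */
	raw_spin_lock_irqsave_rcu_node(rnp, flags);
	if (!rcu_rdp_is_offloaded(rdp))
		needwake = rcu_accelerate_cbs(rnp, rdp);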
Reported-by: Valentin Schneider <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/rcu/tree.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1512efc52816..32303070b20b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2288,7 +2288,6 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
unsigned long flags;
unsigned long mask;
bool needwake = false;
- const bool offloaded = rcu_rdp_is_offloaded(rdp);
struct rcu_node *rnp;
WARN_ON_ONCE(rdp->cpu != smp_processor_id());
@@ -2315,8 +2314,10 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
/*
* This GP can't end until cpu checks in, so all of our
* callbacks can be processed during the next GP.
+ *
+ * NOCB kthreads have their own way to deal with that.
*/
- if (!offloaded)
+ if (!rcu_rdp_is_offloaded(rdp))
needwake = rcu_accelerate_cbs(rnp, rdp);
rcu_disable_urgency_upon_qs(rdp);
--
2.25.1
rcu_nocb_lock_irqsave() can be preempted between the call to
rcu_segcblist_is_offloaded() and the actual locking. This matters now
that rcu_core() is preemptible on PREEMPT_RT and the (de-)offloading
process can interrupt the softirq or the rcuc kthread.
As a result we may locklessly call into code that requires nocb locking.
In practice this is a problem when rcu_core() accelerates callbacks.
Simply disabling interrupts before (instead of after) checking the NOCB
offload state fixes the issue.
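To make the window concrete, here is the shape of the macro before and after,
condensed from the diff below:

	/*
	 * Before: the offloaded check runs with IRQs still enabled, so on
	 * PREEMPT_RT the (de-)offloading process can preempt us right here.
	 */
	if (!rcu_segcblist_is_offloaded(&(rdp)->cblist))
		local_irq_save(flags);
	else
		raw_spin_lock_irqsave(&(rdp)->nocb_lock, (flags));

	/*
	 * After: IRQs are disabled first, so the check and the locking
	 * decision can't be disturbed by a concurrent de-offloading.
	 */
	local_irq_save(flags);
	if (rcu_segcblist_is_offloaded(&(rdp)->cblist))
		raw_spin_lock(&(rdp)->nocb_lock);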
Reported-by: Valentin Schneider <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/rcu/tree.h | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 70188cb42473..deeaf2fee714 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -439,12 +439,16 @@ static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp);
#ifdef CONFIG_RCU_NOCB_CPU
static void __init rcu_organize_nocb_kthreads(void);
-#define rcu_nocb_lock_irqsave(rdp, flags) \
-do { \
- if (!rcu_segcblist_is_offloaded(&(rdp)->cblist)) \
- local_irq_save(flags); \
- else \
- raw_spin_lock_irqsave(&(rdp)->nocb_lock, (flags)); \
+
+/*
+ * Disable IRQs before checking offloaded state so that local
+ * locking is safe against concurrent de-offloading.
+ */
+#define rcu_nocb_lock_irqsave(rdp, flags) \
+do { \
+ local_irq_save(flags); \
+ if (rcu_segcblist_is_offloaded(&(rdp)->cblist)) \
+ raw_spin_lock(&(rdp)->nocb_lock); \
} while (0)
#else /* #ifdef CONFIG_RCU_NOCB_CPU */
#define rcu_nocb_lock_irqsave(rdp, flags) local_irq_save(flags)
--
2.25.1
Currently SEGCBLIST_SOFTIRQ_ONLY is a bit of an exception among the
segcblist flags because it is an exclusive state that doesn't mix up
with the other flags. Remove it in favour of:
_ A flag specifying that rcu_core() needs to perform callbacks execution
and acceleration
and
_ A flag specifying we want the nocb lock to be held in any needed
circumstances
This clarifies the code and is more flexible: it allows a state where
rcu_core() runs with locking while offloading hasn't started yet.
This is a necessary step to prepare for triggering rcu_core() at the
very beginning of the de-offloading process so that rcu_core() won't
dismiss work while being preempted by the de-offloading process, at
least not without a pending subsequent rcu_core() that will quickly
catch up.
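As a quick reference, the resulting flag split and the two checks that replace
the SEGCBLIST_SOFTIRQ_ONLY test look as follows (condensed from the diff
below, behaviour-equivalent sketch only):

	#define SEGCBLIST_RCU_CORE	BIT(1)	/* rcu_core() still handles callbacks */
	#define SEGCBLIST_LOCKING	BIT(2)	/* nocb_lock must be held where needed */

	/* "offloaded" now means "nocb locking required"... */
	static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
	{
		return IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
		       rcu_segcblist_test_flags(rsclp, SEGCBLIST_LOCKING);
	}

	/* ...while "completely offloaded" means rcu_core() has nothing left to do. */
	static inline bool rcu_segcblist_completely_offloaded(struct rcu_segcblist *rsclp)
	{
		return IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
		       !rcu_segcblist_test_flags(rsclp, SEGCBLIST_RCU_CORE);
	}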
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
include/linux/rcu_segcblist.h | 37 ++++++++++++++++++++++-------------
kernel/rcu/rcu_segcblist.c | 6 +++---
kernel/rcu/rcu_segcblist.h | 7 +++----
kernel/rcu/tree.c | 2 +-
kernel/rcu/tree_nocb.h | 20 +++++++++++++------
5 files changed, 44 insertions(+), 28 deletions(-)
diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index 3db96c4f45fd..812961b1d064 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -69,7 +69,7 @@ struct rcu_cblist {
*
*
* ----------------------------------------------------------------------------
- * | SEGCBLIST_SOFTIRQ_ONLY |
+ * | SEGCBLIST_RCU_CORE |
* | |
* | Callbacks processed by rcu_core() from softirqs or local |
* | rcuc kthread, without holding nocb_lock. |
@@ -77,7 +77,7 @@ struct rcu_cblist {
* |
* v
* ----------------------------------------------------------------------------
- * | SEGCBLIST_OFFLOADED |
+ * | SEGCBLIST_RCU_CORE | SEGCBLIST_LOCKING | SEGCBLIST_OFFLOADED |
* | |
* | Callbacks processed by rcu_core() from softirqs or local |
* | rcuc kthread, while holding nocb_lock. Waking up CB and GP kthreads, |
@@ -89,7 +89,9 @@ struct rcu_cblist {
* | |
* v v
* --------------------------------------- ----------------------------------|
- * | SEGCBLIST_OFFLOADED | | | SEGCBLIST_OFFLOADED | |
+ * | SEGCBLIST_RCU_CORE | | | SEGCBLIST_RCU_CORE | |
+ * | SEGCBLIST_LOCKING | | | SEGCBLIST_LOCKING | |
+ * | SEGCBLIST_OFFLOADED | | | SEGCBLIST_OFFLOADED | |
* | SEGCBLIST_KTHREAD_CB | | SEGCBLIST_KTHREAD_GP |
* | | | |
* | | | |
@@ -104,9 +106,10 @@ struct rcu_cblist {
* |
* v
* |--------------------------------------------------------------------------|
- * | SEGCBLIST_OFFLOADED | |
- * | SEGCBLIST_KTHREAD_CB | |
- * | SEGCBLIST_KTHREAD_GP |
+ * | SEGCBLIST_LOCKING | |
+ * | SEGCBLIST_OFFLOADED | |
+ * | SEGCBLIST_KTHREAD_GP | |
+ * | SEGCBLIST_KTHREAD_CB |
* | |
* | Kthreads handle callbacks holding nocb_lock, local rcu_core() stops |
* | handling callbacks. Enable bypass queueing. |
@@ -120,7 +123,8 @@ struct rcu_cblist {
*
*
* |--------------------------------------------------------------------------|
- * | SEGCBLIST_OFFLOADED | |
+ * | SEGCBLIST_LOCKING | |
+ * | SEGCBLIST_OFFLOADED | |
* | SEGCBLIST_KTHREAD_CB | |
* | SEGCBLIST_KTHREAD_GP |
* | |
@@ -130,6 +134,8 @@ struct rcu_cblist {
* |
* v
* |--------------------------------------------------------------------------|
+ * | SEGCBLIST_RCU_CORE | |
+ * | SEGCBLIST_LOCKING | |
* | SEGCBLIST_KTHREAD_CB | |
* | SEGCBLIST_KTHREAD_GP |
* | |
@@ -143,7 +149,9 @@ struct rcu_cblist {
* | |
* v v
* ---------------------------------------------------------------------------|
- * | |
+ * | | |
+ * | SEGCBLIST_RCU_CORE | | SEGCBLIST_RCU_CORE | |
+ * | SEGCBLIST_LOCKING | | SEGCBLIST_LOCKING | |
* | SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP |
* | | |
* | GP kthread woke up and | CB kthread woke up and |
@@ -159,7 +167,7 @@ struct rcu_cblist {
* |
* v
* ----------------------------------------------------------------------------
- * | 0 |
+ * | SEGCBLIST_RCU_CORE | SEGCBLIST_LOCKING |
* | |
* | Callbacks processed by rcu_core() from softirqs or local |
* | rcuc kthread, while holding nocb_lock. Forbid nocb_timer to be armed. |
@@ -168,17 +176,18 @@ struct rcu_cblist {
* |
* v
* ----------------------------------------------------------------------------
- * | SEGCBLIST_SOFTIRQ_ONLY |
+ * | SEGCBLIST_RCU_CORE |
* | |
* | Callbacks processed by rcu_core() from softirqs or local |
* | rcuc kthread, without holding nocb_lock. |
* ----------------------------------------------------------------------------
*/
#define SEGCBLIST_ENABLED BIT(0)
-#define SEGCBLIST_SOFTIRQ_ONLY BIT(1)
-#define SEGCBLIST_KTHREAD_CB BIT(2)
-#define SEGCBLIST_KTHREAD_GP BIT(3)
-#define SEGCBLIST_OFFLOADED BIT(4)
+#define SEGCBLIST_RCU_CORE BIT(1)
+#define SEGCBLIST_LOCKING BIT(2)
+#define SEGCBLIST_KTHREAD_CB BIT(3)
+#define SEGCBLIST_KTHREAD_GP BIT(4)
+#define SEGCBLIST_OFFLOADED BIT(5)
struct rcu_segcblist {
struct rcu_head *head;
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index aaa111237b60..c07aab6e39ef 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -261,14 +261,14 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
}
/*
- * Mark the specified rcu_segcblist structure as offloaded.
+ * Mark the specified rcu_segcblist structure as offloaded (or not)
*/
void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
{
if (offload) {
- rcu_segcblist_clear_flags(rsclp, SEGCBLIST_SOFTIRQ_ONLY);
- rcu_segcblist_set_flags(rsclp, SEGCBLIST_OFFLOADED);
+ rcu_segcblist_set_flags(rsclp, SEGCBLIST_LOCKING | SEGCBLIST_OFFLOADED);
} else {
+ rcu_segcblist_set_flags(rsclp, SEGCBLIST_RCU_CORE);
rcu_segcblist_clear_flags(rsclp, SEGCBLIST_OFFLOADED);
}
}
diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index 9a19328ff251..0fc9048c4936 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -84,7 +84,7 @@ static inline bool rcu_segcblist_is_enabled(struct rcu_segcblist *rsclp)
static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
{
if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
- !rcu_segcblist_test_flags(rsclp, SEGCBLIST_SOFTIRQ_ONLY))
+ rcu_segcblist_test_flags(rsclp, SEGCBLIST_LOCKING))
return true;
return false;
@@ -92,9 +92,8 @@ static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
static inline bool rcu_segcblist_completely_offloaded(struct rcu_segcblist *rsclp)
{
- int flags = SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP | SEGCBLIST_OFFLOADED;
-
- if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) && (rsclp->flags & flags) == flags)
+ if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
+ !rcu_segcblist_test_flags(rsclp, SEGCBLIST_RCU_CORE))
return true;
return false;
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 572f644f3240..ef417769bab2 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -79,7 +79,7 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
.dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE,
.dynticks = ATOMIC_INIT(1),
#ifdef CONFIG_RCU_NOCB_CPU
- .cblist.flags = SEGCBLIST_SOFTIRQ_ONLY,
+ .cblist.flags = SEGCBLIST_RCU_CORE,
#endif
};
static struct rcu_state rcu_state = {
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 368ef7b9af4f..71a28f50b40f 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1000,12 +1000,12 @@ static long rcu_nocb_rdp_deoffload(void *arg)
*/
rcu_nocb_lock_irqsave(rdp, flags);
/*
- * Theoretically we could set SEGCBLIST_SOFTIRQ_ONLY after the nocb
+ * Theoretically we could clear SEGCBLIST_LOCKING after the nocb
* lock is released but how about being paranoid for once?
*/
- rcu_segcblist_set_flags(cblist, SEGCBLIST_SOFTIRQ_ONLY);
+ rcu_segcblist_clear_flags(cblist, SEGCBLIST_LOCKING);
/*
- * With SEGCBLIST_SOFTIRQ_ONLY, we can't use
+ * Without SEGCBLIST_LOCKING, we can't use
* rcu_nocb_unlock_irqrestore() anymore.
*/
raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
@@ -1058,14 +1058,14 @@ static long rcu_nocb_rdp_offload(void *arg)
pr_info("Offloading %d\n", rdp->cpu);
/*
- * Can't use rcu_nocb_lock_irqsave() while we are in
- * SEGCBLIST_SOFTIRQ_ONLY mode.
+ * Can't use rcu_nocb_lock_irqsave() before SEGCBLIST_LOCKING
+ * is set.
*/
raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
/*
* We didn't take the nocb lock while working on the
- * rdp->cblist in SEGCBLIST_SOFTIRQ_ONLY mode.
+ * rdp->cblist with SEGCBLIST_LOCKING cleared (pure softirq/rcuc mode).
* Every modifications that have been done previously on
* rdp->cblist must be visible remotely by the nocb kthreads
* upon wake up after reading the cblist flags.
@@ -1084,6 +1084,14 @@ static long rcu_nocb_rdp_offload(void *arg)
rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB) &&
rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
+ /*
+ * All kthreads are ready to work, we can finally relieve rcu_core() and
+ * enable nocb bypass.
+ */
+ rcu_nocb_lock_irqsave(rdp, flags);
+ rcu_segcblist_clear_flags(cblist, SEGCBLIST_RCU_CORE);
+ rcu_nocb_unlock_irqrestore(rdp, flags);
+
return ret;
}
--
2.25.1
Instead of hardcoding IRQ save and nocb lock, use the consolidated
API.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/rcu/tree.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b1fc6e498d90..1971a4e15e96 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2492,8 +2492,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
* races with call_rcu() from interrupt handlers. Leave the
* callback counts, as rcu_barrier() needs to be conservative.
*/
- local_irq_save(flags);
- rcu_nocb_lock(rdp);
+ rcu_nocb_lock_irqsave(rdp, flags);
WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
pending = rcu_segcblist_n_cbs(&rdp->cblist);
div = READ_ONCE(rcu_divisor);
@@ -2556,8 +2555,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
}
}
- local_irq_save(flags);
- rcu_nocb_lock(rdp);
+ rcu_nocb_lock_irqsave(rdp, flags);
rdp->n_cbs_invoked += count;
trace_rcu_batch_end(rcu_state.name, count, !!rcl.head, need_resched(),
is_idle_task(current), rcu_is_callbacks_kthread());
--
2.25.1
The current condition to limit the number of callbacks executed in a
row checks the offloaded state of the rdp. Not only is that state
volatile, it is also misleading: rcu_core() may well be executing
callbacks concurrently with NOCB kthreads, and the offloaded state
would then be true in both cases. As a result the limit would
spuriously stop applying to softirq in the middle of the
(de-)offloading process.
Another issue with the condition is that rcu_is_callbacks_kthread()
doesn't check if we are actually running callbacks from rcuc itself or
from a softirq interrupting rcuc.
Fix and clarify the condition with those constraints in mind:
_ If callbacks are processed either by rcuc or NOCB kthread, the call
to cond_resched_tasks_rcu_qs() is enough to take care of the overload.
_ If instead callbacks are processed by softirqs:
* If need_resched(), exit the callbacks processing
* Otherwise if CPU is idle we can continue
* Otherwise exit because a softirq shouldn't interrupt a task for too
long nor deprive other pending softirq vectors of the CPU.
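Condensed from the diff below, the condition thus moves from an
offloaded-state check to an execution-context check:

	/* Before: keyed off the (volatile) offloaded state. */
	if (count >= bl && !offloaded &&
	    (need_resched() ||
	     (!is_idle_task(current) && !rcu_is_callbacks_kthread())))
		break;

	/* After: keyed off the execution context, which can't change under us. */
	if (count >= bl && in_serving_softirq() &&
	    (need_resched() || !is_idle_task(current)))
		break;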
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/rcu/tree.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1971a4e15e96..35c4abc8df99 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2535,9 +2535,8 @@ static void rcu_do_batch(struct rcu_data *rdp)
/*
* Stop only if limit reached and CPU has something to do.
*/
- if (count >= bl && !offloaded &&
- (need_resched() ||
- (!is_idle_task(current) && !rcu_is_callbacks_kthread())))
+ if (count >= bl && in_serving_softirq() &&
+ (need_resched() || !is_idle_task(current)))
break;
if (unlikely(tlimit)) {
/* only call local_clock() every 32 callbacks */
--
2.25.1
The time limit only makes sense when callbacks are serviced in softirq mode
because:
_ In case we need to get back to the scheduler,
cond_resched_tasks_rcu_qs() is called after each callback.
_ In case some other softirq vector needs the CPU, the call to
local_bh_enable() before cond_resched_tasks_rcu_qs() takes care of
them via a call to do_softirq().
_ The time spent on other tasks after scheduling out, or on softirqs
processing, is spuriously accounted to the time limit.
Therefore, make sure the time limit only applies to softirq mode.
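Condensed from the diff below: the tlimit setup gets gated on
in_serving_softirq() and the check moves into the softirq branch of the loop,
so the rcuc/NOCB path relies solely on cond_resched_tasks_rcu_qs():

	if (in_serving_softirq()) {
		if (count >= bl && (need_resched() || !is_idle_task(current)))
			break;
		/*
		 * Make sure we don't spend too much time here and deprive other
		 * softirq vectors of CPU cycles.
		 */
		if (unlikely(tlimit)) {
			/* only call local_clock() every 32 callbacks */
			if (likely((count & 31) || local_clock() < tlimit))
				continue;
			/* Exceeded the time limit, so leave. */
			break;
		}
	} else {
		local_bh_enable();
		lockdep_assert_irqs_enabled();
		cond_resched_tasks_rcu_qs();
		lockdep_assert_irqs_enabled();
		local_bh_disable();
	}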
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/rcu/tree.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bfc8e2727e35..b538094e21d9 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2498,7 +2498,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
div = READ_ONCE(rcu_divisor);
div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div;
bl = max(rdp->blimit, pending >> div);
- if (unlikely(bl > 100)) {
+ if (in_serving_softirq() && unlikely(bl > 100)) {
long rrn = READ_ONCE(rcu_resched_ns);
rrn = rrn < NSEC_PER_MSEC ? NSEC_PER_MSEC : rrn > NSEC_PER_SEC ? NSEC_PER_SEC : rrn;
@@ -2538,6 +2538,17 @@ static void rcu_do_batch(struct rcu_data *rdp)
if (in_serving_softirq()) {
if (count >= bl && (need_resched() || !is_idle_task(current)))
break;
+ /*
+ * Make sure we don't spend too much time here and deprive other
+ * softirq vectors of CPU cycles.
+ */
+ if (unlikely(tlimit)) {
+ /* only call local_clock() every 32 callbacks */
+ if (likely((count & 31) || local_clock() < tlimit))
+ continue;
+ /* Exceeded the time limit, so leave. */
+ break;
+ }
} else {
local_bh_enable();
lockdep_assert_irqs_enabled();
@@ -2545,18 +2556,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
lockdep_assert_irqs_enabled();
local_bh_disable();
}
-
- /*
- * Make sure we don't spend too much time here and deprive other
- * softirq vectors of CPU cycles.
- */
- if (unlikely(tlimit)) {
- /* only call local_clock() every 32 callbacks */
- if (likely((count & 31) || local_clock() < tlimit))
- continue;
- /* Exceeded the time limit, so leave. */
- break;
- }
}
rcu_nocb_lock_irqsave(rdp, flags);
--
2.25.1
It's unclear why rdp->qlen_last_fqs_check is updated only on an offloaded
rdp. Also rcu_core() and NOCB kthreads can run concurrently; should
we do the update in both cases? Probably not, because one can erase the
initialized value of the other.
For now, shut down the RT warning caused by the volatile offloaded state
check and wait for a debate on this half-baked patch.
Reported-by: Valentin Schneider <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/rcu/tree.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 73971b8024d8..b1fc6e498d90 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2508,7 +2508,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
trace_rcu_batch_start(rcu_state.name,
rcu_segcblist_n_cbs(&rdp->cblist), bl);
rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl);
- if (offloaded)
+ if (rcu_rdp_is_offloaded(rdp))
rdp->qlen_last_fqs_check = rcu_segcblist_n_cbs(&rdp->cblist);
trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbDequeued"));
--
2.25.1
On PREEMPT_RT, if rcu_core() is preempted by the de-offloading process,
some work, such as callbacks acceleration and invocation, may be left
unattended due to the volatile checks on the offloaded state.
In the worst case this work is postponed until the next rcu_pending()
check that can take a jiffy to reach, which can be a problem in case
of callbacks flooding.
Solve that by invoking rcu_core() early in the de-offloading process.
This way any work dismissed by an ongoing rcu_core() call fooled by
a preempting de-offloading process will be caught up by a nearby future
call to rcu_core(), this time fully aware of the de-offloading state.
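Condensed from the tree_nocb.h hunk below, rcu_nocb_rdp_deoffload() now kicks
off the transition with:

	/*
	 * Re-enable rcu_core() processing for this rdp and kick it once, so any
	 * work a preempted rcu_core() dismissed is caught up within the jiffy.
	 */
	rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
	invoke_rcu_core();

	ret = rdp_offload_toggle(rdp, false, flags);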
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
include/linux/rcu_segcblist.h | 14 ++++++++++++++
kernel/rcu/rcu_segcblist.c | 6 ++----
kernel/rcu/tree.c | 17 +++++++++++++++++
kernel/rcu/tree_nocb.h | 9 +++++++++
4 files changed, 42 insertions(+), 4 deletions(-)
diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h
index 812961b1d064..659d13a7ddaa 100644
--- a/include/linux/rcu_segcblist.h
+++ b/include/linux/rcu_segcblist.h
@@ -136,6 +136,20 @@ struct rcu_cblist {
* |--------------------------------------------------------------------------|
* | SEGCBLIST_RCU_CORE | |
* | SEGCBLIST_LOCKING | |
+ * | SEGCBLIST_OFFLOADED | |
+ * | SEGCBLIST_KTHREAD_CB | |
+ * | SEGCBLIST_KTHREAD_GP |
+ * | |
+ * | CB/GP kthreads handle callbacks holding nocb_lock, local rcu_core() |
+ * | handles callbacks concurrently. Bypass enqueue is enabled. |
+ * | Invoke RCU core so we make sure not to preempt it in the middle with |
+ * | leaving some urgent work unattended within a jiffy. |
+ * ----------------------------------------------------------------------------
+ * |
+ * v
+ * |--------------------------------------------------------------------------|
+ * | SEGCBLIST_RCU_CORE | |
+ * | SEGCBLIST_LOCKING | |
* | SEGCBLIST_KTHREAD_CB | |
* | SEGCBLIST_KTHREAD_GP |
* | |
diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
index c07aab6e39ef..81145c3ece25 100644
--- a/kernel/rcu/rcu_segcblist.c
+++ b/kernel/rcu/rcu_segcblist.c
@@ -265,12 +265,10 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp)
*/
void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload)
{
- if (offload) {
+ if (offload)
rcu_segcblist_set_flags(rsclp, SEGCBLIST_LOCKING | SEGCBLIST_OFFLOADED);
- } else {
- rcu_segcblist_set_flags(rsclp, SEGCBLIST_RCU_CORE);
+ else
rcu_segcblist_clear_flags(rsclp, SEGCBLIST_OFFLOADED);
- }
}
/*
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ef417769bab2..1512efc52816 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2717,6 +2717,23 @@ static __latent_entropy void rcu_core(void)
unsigned long flags;
struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
struct rcu_node *rnp = rdp->mynode;
+ /*
+ * On RT rcu_core() can be preempted when IRQs aren't disabled.
+ * Therefore this function can race with concurrent NOCB (de-)offloading
+ * on this CPU and the below condition must be considered volatile.
+ * However if we race with:
+ *
+ * _ Offloading: In the worst case we accelerate or process callbacks
+ * concurrently with NOCB kthreads. We are guaranteed to
+ * call rcu_nocb_lock() if that happens.
+ *
+ * _ Deoffloading: In the worst case we miss callbacks acceleration or
+ * processing. This is fine because the early stage
+ * of deoffloading invokes rcu_core() after setting
+ * SEGCBLIST_RCU_CORE. So we guarantee that we'll process
+ * what could have been dismissed without the need to wait
+ * for the next rcu_pending() check in the next jiffy.
+ */
const bool do_batch = !rcu_segcblist_completely_offloaded(&rdp->cblist);
if (cpu_is_offline(smp_processor_id()))
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 71a28f50b40f..3b470113ae38 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -990,6 +990,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
* will refuse to put anything into the bypass.
*/
WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
+ /*
+ * Start with invoking rcu_core() early. This way if the current thread
+ * happens to preempt an ongoing call to rcu_core() in the middle,
+ * leaving some work dismissed because rcu_core() still thinks the rdp is
+ * completely offloaded, we are guaranteed a nearby future instance of
+ * rcu_core() to catch up.
+ */
+ rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
+ invoke_rcu_core();
ret = rdp_offload_toggle(rdp, false, flags);
swait_event_exclusive(rdp->nocb_state_wq,
!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
--
2.25.1
The callbacks processing time limit makes sure we are not exceeding a
given amount of time executing the queue.
However its "continue" clause bypasses the cond_resched() call on
rcuc and NOCB kthreads, delaying it until we reach the limit, which can
be very long...
Make sure the scheduler has a higher priority than the time limit.
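For reference, the problematic ordering, condensed from the hunk removed in
the diff below; on rcuc and NOCB kthreads the "continue" short-circuits the
rescheduling point entirely:

	if (unlikely(tlimit)) {
		/* only call local_clock() every 32 callbacks */
		if (likely((count & 31) || local_clock() < tlimit))
			continue;	/* <-- skips the cond_resched below */
		/* Exceeded the time limit, so leave. */
		break;
	}
	if (!in_serving_softirq()) {
		local_bh_enable();
		cond_resched_tasks_rcu_qs();
		local_bh_disable();
	}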
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/rcu/tree.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 35c4abc8df99..bfc8e2727e35 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2535,9 +2535,21 @@ static void rcu_do_batch(struct rcu_data *rdp)
/*
* Stop only if limit reached and CPU has something to do.
*/
- if (count >= bl && in_serving_softirq() &&
- (need_resched() || !is_idle_task(current)))
- break;
+ if (in_serving_softirq()) {
+ if (count >= bl && (need_resched() || !is_idle_task(current)))
+ break;
+ } else {
+ local_bh_enable();
+ lockdep_assert_irqs_enabled();
+ cond_resched_tasks_rcu_qs();
+ lockdep_assert_irqs_enabled();
+ local_bh_disable();
+ }
+
+ /*
+ * Make sure we don't spend too much time here and deprive other
+ * softirq vectors of CPU cycles.
+ */
if (unlikely(tlimit)) {
/* only call local_clock() every 32 callbacks */
if (likely((count & 31) || local_clock() < tlimit))
@@ -2545,13 +2557,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
/* Exceeded the time limit, so leave. */
break;
}
- if (!in_serving_softirq()) {
- local_bh_enable();
- lockdep_assert_irqs_enabled();
- cond_resched_tasks_rcu_qs();
- lockdep_assert_irqs_enabled();
- local_bh_disable();
- }
}
rcu_nocb_lock_irqsave(rdp, flags);
--
2.25.1
rcu_core() tries to ensure that its self-invocation in case of callbacks
overload only happen in softirq/rcuc mode. Indeed it doesn't make sense
to trigger local RCU core from nocb_cb kthread since it can execute
on a CPU different from the target rdp. Also in case of overload, the
nocb_cb kthread simply iterates a new loop of callbacks processing.
However the "offloaded" check that aims at preventing that is wrong.
First of all that state is volatile and second: softirq/rcuc can
execute while the target rdp is offloaded. As a result rcu_core() can
Fix that with moving the rcu_core() self-invocation to rcu_core() itself,
irrespective of the rdp offloaded state.
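Condensed from the diff below, the self-invocation now sits right next to the
rcu_do_batch() call in rcu_core(), which only runs from softirq or rcuc:

	/* If there are callbacks ready, invoke them. */
	if (do_batch && rcu_segcblist_ready_cbs(&rdp->cblist) &&
	    likely(READ_ONCE(rcu_scheduler_fully_active))) {
		rcu_do_batch(rdp);
		/* Re-invoke RCU core processing if there are callbacks remaining. */
		if (rcu_segcblist_ready_cbs(&rdp->cblist))
			invoke_rcu_core();
	}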
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/rcu/tree.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b538094e21d9..600bffe4f121 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2470,7 +2470,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
int div;
bool __maybe_unused empty;
unsigned long flags;
- const bool offloaded = rcu_rdp_is_offloaded(rdp);
struct rcu_head *rhp;
struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
long bl, count = 0;
@@ -2592,9 +2591,6 @@ static void rcu_do_batch(struct rcu_data *rdp)
rcu_nocb_unlock_irqrestore(rdp, flags);
- /* Re-invoke RCU core processing if there are callbacks remaining. */
- if (!offloaded && rcu_segcblist_ready_cbs(&rdp->cblist))
- invoke_rcu_core();
tick_dep_clear_task(current, TICK_DEP_BIT_RCU);
}
@@ -2781,8 +2777,12 @@ static __latent_entropy void rcu_core(void)
/* If there are callbacks ready, invoke them. */
if (do_batch && rcu_segcblist_ready_cbs(&rdp->cblist) &&
- likely(READ_ONCE(rcu_scheduler_fully_active)))
+ likely(READ_ONCE(rcu_scheduler_fully_active))) {
rcu_do_batch(rdp);
+ /* Re-invoke RCU core processing if there are callbacks remaining. */
+ if (rcu_segcblist_ready_cbs(&rdp->cblist))
+ invoke_rcu_core();
+ }
/* Do any needed deferred wakeups of rcuo kthreads. */
do_nocb_deferred_wakeup(rdp);
--
2.25.1
When callbacks are offloaded, the NOCB kthreads handle the callbacks
progression on behalf of rcu_core().
However during the (de-)offloading process, the kthread may not be
entirely up to the task. As a result some callbacks' grace period
sequence numbers may remain stale for a while because rcu_core() won't
take care of them either.
Fix this by forcing callbacks acceleration from rcu_core() as long
as the offloading process isn't complete.
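Condensed from the diff below, rcu_report_qs_rdp() grows a third case for
rdps caught in the middle of the transition:

	if (!rcu_rdp_is_offloaded(rdp)) {
		needwake = rcu_accelerate_cbs(rnp, rdp);
	} else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
		/* (De-)offloading in progress: kthreads may miss or delay acceleration. */
		needacc = true;
	}
	...
	/* After rnp->lock has been released, redo the acceleration under the nocb lock. */
	if (needacc) {
		rcu_nocb_lock_irqsave(rdp, flags);
		rcu_accelerate_cbs_unlocked(rnp, rdp);
		rcu_nocb_unlock_irqrestore(rdp, flags);
	}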
Reported-by: Valentin Schneider <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
---
kernel/rcu/tree.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 32303070b20b..73971b8024d8 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2288,6 +2288,7 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
unsigned long flags;
unsigned long mask;
bool needwake = false;
+ bool needacc = false;
struct rcu_node *rnp;
WARN_ON_ONCE(rdp->cpu != smp_processor_id());
@@ -2315,16 +2316,29 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
* This GP can't end until cpu checks in, so all of our
* callbacks can be processed during the next GP.
*
- * NOCB kthreads have their own way to deal with that.
+ * NOCB kthreads have their own way to deal with that...
*/
- if (!rcu_rdp_is_offloaded(rdp))
+ if (!rcu_rdp_is_offloaded(rdp)) {
needwake = rcu_accelerate_cbs(rnp, rdp);
+ } else if (!rcu_segcblist_completely_offloaded(&rdp->cblist)) {
+ /*
+ * ...but NOCB kthreads may miss or delay callbacks acceleration
+ * if in the middle of a (de-)offloading process.
+ */
+ needacc = true;
+ }
rcu_disable_urgency_upon_qs(rdp);
rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
/* ^^^ Released rnp->lock */
if (needwake)
rcu_gp_kthread_wake();
+
+ if (needacc) {
+ rcu_nocb_lock_irqsave(rdp, flags);
+ rcu_accelerate_cbs_unlocked(rnp, rdp);
+ rcu_nocb_unlock_irqrestore(rdp, flags);
+ }
}
}
--
2.25.1
On 2021-09-30 00:10:12 [+0200], Frederic Weisbecker wrote:
> rcu_core() tries to ensure that its self-invocation in case of callbacks
> overload only happen in softirq/rcuc mode. Indeed it doesn't make sense
> to trigger local RCU core from nocb_cb kthread since it can execute
> on a CPU different from the target rdp. Also in case of overload, the
> nocb_cb kthread simply iterates a new loop of callbacks processing.
>
> However the "offloaded" check that aims at preventing that is wrong.
- that?
> First of all that state is volatile and second: softirq/rcuc can
> execute while the target rdp is offloaded. As a result rcu_core() can
can what?
> Fix that with moving the rcu_core() self-invocation to rcu_core() itself,
> irrespective of the rdp offloaded state.
Sebastian
On 30/09/21 00:10, Frederic Weisbecker wrote:
> Currently SEGCBLIST_SOFTIRQ_ONLY is a bit of an exception among the
> segcblist flags because it is an exclusive state that doesn't mix up
> with the other flags. Remove it in favour of:
>
> _ A flag specifying that rcu_core() needs to perform callbacks execution
> and acceleration
>
> and
>
> _ A flag specifying we want the nocb lock to be held in any needed
> circumstances
>
> This clarifies the code and is more flexible: It allows to have a state
> where rcu_core() runs with locking while offloading hasn't started yet.
> This is a necessary step to prepare for triggering rcu_core() at the
> very beginning of the de-offloading process so that rcu_core() won't
> dismiss work while being preempted by the de-offloading process, at
> least not without a pending subsequent rcu_core() that will quickly
> catch up.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Valentin Schneider <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
One question and a comment nit below, other than that:
Reviewed-by: Valentin Schneider <[email protected]>
> @@ -84,7 +84,7 @@ static inline bool rcu_segcblist_is_enabled(struct rcu_segcblist *rsclp)
> static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
It doesn't show up on the diff but there's a SEGCBLIST_SOFTIRQ_ONLY
straggler in the comment above (the last one according to grep).
> {
> if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
> - !rcu_segcblist_test_flags(rsclp, SEGCBLIST_SOFTIRQ_ONLY))
> + rcu_segcblist_test_flags(rsclp, SEGCBLIST_LOCKING))
> return true;
>
> return false;
> @@ -1000,12 +1000,12 @@ static long rcu_nocb_rdp_deoffload(void *arg)
> */
> rcu_nocb_lock_irqsave(rdp, flags);
> /*
> - * Theoretically we could set SEGCBLIST_SOFTIRQ_ONLY after the nocb
> + * Theoretically we could clear SEGCBLIST_LOCKING after the nocb
> * lock is released but how about being paranoid for once?
> */
> - rcu_segcblist_set_flags(cblist, SEGCBLIST_SOFTIRQ_ONLY);
> + rcu_segcblist_clear_flags(cblist, SEGCBLIST_LOCKING);
Thought experiment for me; AFAICT the comment still holds: we can't offload
until deoffload has finished, and we shouldn't be able to preempt
rcu_core() while it holds ->nocb_lock. With that said, I'm all for
paranoia.
> /*
> - * With SEGCBLIST_SOFTIRQ_ONLY, we can't use
> + * Without SEGCBLIST_LOCKING, we can't use
> * rcu_nocb_unlock_irqrestore() anymore.
> */
> raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
On 30/09/21 00:10, Frederic Weisbecker wrote:
> PREEMPT_RT has made rcu_core() preemptible, making it unsafe against
> concurrent NOCB (de-)offloading.
>
> Thomas suggested to drop the local_lock() based solution and simply
> check the offloaded state while context looks safe but that's not
> enough. Here is a bit of rework.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> rcu/rt
>
> HEAD: aac1c58961446c731f2e989bd822ca1fd2659bad
>
> Thanks,
> Frederic
FWIW I've had RCU torture with NOCB toggling running for half a day on my
Arm Juno and nothing to report. I still need to rebase this on an -rt tree
and give it a spin with CONFIG_PREEMPT_RT.
On 30/09/21 00:10, Frederic Weisbecker wrote:
> From: Thomas Gleixner <[email protected]>
>
> While reporting a quiescent state for a given CPU, rcu_core() takes
> advantage of the freshly loaded grace period sequence number and the
> locked rnp to accelerate the callbacks whose sequence number have been
> assigned a stale value.
>
> This action is only necessary when the rdp isn't offloaded, otherwise
> the NOCB kthreads already take care of the callbacks progression.
>
> However the check for the offloaded state is volatile because it is
> performed outside the IRQs disabled section. It's possible for the
> offloading process to preempt rcu_core() at that point on PREEMPT_RT.
>
> This is dangerous because rcu_core() may end up accelerating callbacks
> concurrently with NOCB kthreads without appropriate locking.
>
> Fix this with moving the offloaded check inside the rnp locking section.
>
> Reported-by: Valentin Schneider <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Cc: Valentin Schneider <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
On 30/09/21 00:10, Frederic Weisbecker wrote:
> On PREEMPT_RT, if rcu_core() is preempted by the de-offloading process,
> some work, such as callbacks acceleration and invocation, may be left
> unattended due to the volatile checks on the offloaded state.
>
> In the worst case this work is postponed until the next rcu_pending()
> check that can take a jiffy to reach, which can be a problem in case
> of callbacks flooding.
>
> Solve that with invoking rcu_core() early in the de-offloading process.
> This way any work dismissed by an ongoing rcu_core() call fooled by
> a preempting deoffloading process will be caught up by a nearby future
> recall to rcu_core(), this time fully aware of the de-offloading state.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Valentin Schneider <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
One comment/question below.
> @@ -990,6 +990,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
> * will refuse to put anything into the bypass.
> */
> WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> + /*
> + * Start with invoking rcu_core() early. This way if the current thread
> + * happens to preempt an ongoing call to rcu_core() in the middle,
> + * leaving some work dismissed because rcu_core() still thinks the rdp is
> + * completely offloaded, we are guaranteed a nearby future instance of
> + * rcu_core() to catch up.
> + */
> + rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
> + invoke_rcu_core();
I think your approach is a bit neater, but would there have been any issue
with keeping the setting of SEGCBLIST_RCU_CORE within
rcu_segcblist_offload() and bundling it with an invoke_rcu_core()?
> ret = rdp_offload_toggle(rdp, false, flags);
> swait_event_exclusive(rdp->nocb_state_wq,
> !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
> --
> 2.25.1
On 30/09/21 00:10, Frederic Weisbecker wrote:
> When callbacks are offloaded, the NOCB kthreads handle the callbacks
> progression on behalf of rcu_core().
>
> However during the (de-)offloading process, the kthread may not be
> entirely up to the task. As a result some callbacks grace period
> sequence number may remain stale for a while because rcu_core() won't
> take care of them either.
>
But that should be taken care of at the tail end of the (de)offloading
process, either by rcu_core() or by the NOCB kthreads, no?
Or is it e.g. in the case of offloading, we want to make sure an rcu_core()
invocation runs callback acceleration because even if the NOCB GP/CB
kthreads are being set up, we're not guaranteed it's going to do that
straight away?
IIUC it would be a similar case for deoffload when we stash the NOCB GP/CB
kthreads and get rcu_core() running concurrently.
> Fix this with forcing callbacks acceleration from rcu_core() as long
> as the offloading process isn't complete.
>
On 30/09/21 00:10, Frederic Weisbecker wrote:
> Instead of hardcoding IRQ save and nocb lock, use the consolidated
> API.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Valentin Schneider <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
Just one comment nit below.
Reviewed-by: Valentin Schneider <[email protected]>
> ---
> kernel/rcu/tree.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b1fc6e498d90..1971a4e15e96 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2492,8 +2492,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
While at it:
- * Extract the list of ready callbacks, disabling to prevent
+ * Extract the list of ready callbacks, disabling IRQs to prevent
> * races with call_rcu() from interrupt handlers. Leave the
> * callback counts, as rcu_barrier() needs to be conservative.
> */
> - local_irq_save(flags);
> - rcu_nocb_lock(rdp);
> + rcu_nocb_lock_irqsave(rdp, flags);
> WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
> pending = rcu_segcblist_n_cbs(&rdp->cblist);
> div = READ_ONCE(rcu_divisor);
On 30/09/21 00:10, Frederic Weisbecker wrote:
> The callbacks processing time limit makes sure we are not exceeding a
> given amount of time executing the queue.
>
> However its "continue" clause bypasses the cond_resched() call on
> rcuc and NOCB kthreads, delaying it until we reach the limit, which can
> be very long...
>
> Make sure the scheduler has a higher priority than the time limit.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Valentin Schneider <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
On 30/09/21 00:10, Frederic Weisbecker wrote:
> The current condition to limit the number of callbacks executed in a
> row checks the offloaded state of the rdp. Not only is it volatile
> but it is also misleading: the rcu_core() may well be executing
> callbacks concurrently with NOCB kthreads, and the offloaded state
> would then be verified on both cases. As a result the limit would
> spuriously not apply anymore on softirq while in the middle of
> (de-)offloading process.
>
> Another issue with the condition is that rcu_is_callbacks_kthread()
> doesn't check if we are actually running callbacks from rcuc itself or
> from a softirq interrupting rcuc.
>
Doesn't rcutree.use_softirq imply rcuc is never woken, in which case
RCU_SOFTIRQ can't interrupt rcuc (e.g. while run atop an IRQ exit)?
I suppose during the (de)offload sequence we could have RCU_SOFTIRQ running
atop the NOCB CB kthread, but that's not something
rcu_is_callbacks_kthread() detects.
Also, why is rcu_is_callbacks_kthread() hardcoded to false for
!CONFIG_RCU_BOOST? Isn't it relevant for do_rcu_batch() ratelimiting
regardless (at least before your patches)?
On 30/09/21 00:10, Frederic Weisbecker wrote:
> Time limit only makes sense when callbacks are serviced in softirq mode
> because:
>
> _ In case we need to get back to the scheduler,
> cond_resched_tasks_rcu_qs() is called after each callback.
>
> _ In case some other softirq vector needs the CPU, the call to
> local_bh_enable() before cond_resched_tasks_rcu_qs() takes care about
> them via a call to do_softirq().
>
> _ The time spent on other tasks after scheduling out, or on softirqs
> processing, is spuriously accounted to the time limit.
>
That wasn't the case before ("rcu: Fix callbacks processing time limit
retaining cond_resched()"), though under PREEMPT_RT that *was* true (since
bh-off remains preemptible). So I'd say that's a change we want.
> Therefore, make sure the time limit only applies to softirq mode.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Valentin Schneider <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> Cc: Josh Triplett <[email protected]>
> Cc: Joel Fernandes <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Neeraj Upadhyay <[email protected]>
> Cc: Uladzislau Rezki <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
On Fri, Oct 01, 2021 at 06:48:28PM +0100, Valentin Schneider wrote:
> On 30/09/21 00:10, Frederic Weisbecker wrote:
> > Currently SEGCBLIST_SOFTIRQ_ONLY is a bit of an exception among the
> > segcblist flags because it is an exclusive state that doesn't mix up
> > with the other flags. Remove it in favour of:
> >
> > _ A flag specifying that rcu_core() needs to perform callbacks execution
> > and acceleration
> >
> > and
> >
> > _ A flag specifying we want the nocb lock to be held in any needed
> > circumstances
> >
> > This clarifies the code and is more flexible: It allows to have a state
> > where rcu_core() runs with locking while offloading hasn't started yet.
> > This is a necessary step to prepare for triggering rcu_core() at the
> > very beginning of the de-offloading process so that rcu_core() won't
> > dismiss work while being preempted by the de-offloading process, at
> > least not without a pending subsequent rcu_core() that will quickly
> > catch up.
> >
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > Cc: Valentin Schneider <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Sebastian Andrzej Siewior <[email protected]>
> > Cc: Josh Triplett <[email protected]>
> > Cc: Joel Fernandes <[email protected]>
> > Cc: Boqun Feng <[email protected]>
> > Cc: Neeraj Upadhyay <[email protected]>
> > Cc: Uladzislau Rezki <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
>
> One question and a comment nit below, other than that:
>
> Reviewed-by: Valentin Schneider <[email protected]>
>
> > @@ -84,7 +84,7 @@ static inline bool rcu_segcblist_is_enabled(struct rcu_segcblist *rsclp)
> > static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp)
>
> It doesn't show up on the diff but there's a SEGCBLIST_SOFTIRQ_ONLY
> straggler in the comment above (the last one according to grep).
Ah thanks, I'll remove that.
>
> > {
> > if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
> > - !rcu_segcblist_test_flags(rsclp, SEGCBLIST_SOFTIRQ_ONLY))
> > + rcu_segcblist_test_flags(rsclp, SEGCBLIST_LOCKING))
> > return true;
> >
> > return false;
>
> > @@ -1000,12 +1000,12 @@ static long rcu_nocb_rdp_deoffload(void *arg)
> > */
> > rcu_nocb_lock_irqsave(rdp, flags);
> > /*
> > - * Theoretically we could set SEGCBLIST_SOFTIRQ_ONLY after the nocb
> > + * Theoretically we could clear SEGCBLIST_LOCKING after the nocb
> > * lock is released but how about being paranoid for once?
> > */
> > - rcu_segcblist_set_flags(cblist, SEGCBLIST_SOFTIRQ_ONLY);
> > + rcu_segcblist_clear_flags(cblist, SEGCBLIST_LOCKING);
>
> Thought experiment for me; AFAICT the comment still holds: we can't offload
> until deoffload has finished, and we shouldn't be able to preempt
> rcu_core() while it holds ->nocb_lock. With that said, I'm all for
> paranoia.
Exactly :)
Thanks.
On Fri, Oct 01, 2021 at 06:51:08PM +0100, Valentin Schneider wrote:
> On 30/09/21 00:10, Frederic Weisbecker wrote:
> > The current condition to limit the number of callbacks executed in a
> > row checks the offloaded state of the rdp. Not only is it volatile
> > but it is also misleading: the rcu_core() may well be executing
> > callbacks concurrently with NOCB kthreads, and the offloaded state
> > would then be verified on both cases. As a result the limit would
> > spuriously not apply anymore on softirq while in the middle of
> > (de-)offloading process.
> >
> > Another issue with the condition is that rcu_is_callbacks_kthread()
> > doesn't check if we are actually running callbacks from rcuc itself or
> > from a softirq interrupting rcuc.
> >
>
> Doesn't rcutree.use_softirq imply rcuc is never woken, in which case
> RCU_SOFTIRQ can't interrupt rcuc (e.g. while run atop an IRQ exit)?
> I suppose during the (de)offload sequence we could have RCU_SOFTIRQ running
> atop the NOCB CB kthread, but that's not something
> rcu_is_callbacks_kthread() detects.
Yes good point, I don't know if rcuc can ever be interrupted by
irq_exit() -> do_softirq() -> rcu_core() itself after all.
Paul can probably confirm your point?
>
> Also, why is rcu_is_callbacks_kthread() hardcoded to false for
> !CONFIG_RCU_BOOST? Isn't it relevant for do_rcu_batch() ratelimiting
> regardless (at least before your patches)?
I believe rcuc is only used on CONFIG_RCU_BOOST?
On Fri, Oct 01, 2021 at 06:50:04PM +0100, Valentin Schneider wrote:
> On 30/09/21 00:10, Frederic Weisbecker wrote:
> > On PREEMPT_RT, if rcu_core() is preempted by the de-offloading process,
> > some work, such as callbacks acceleration and invocation, may be left
> > unattended due to the volatile checks on the offloaded state.
> >
> > In the worst case this work is postponed until the next rcu_pending()
> > check that can take a jiffy to reach, which can be a problem in case
> > of callbacks flooding.
> >
> > Solve that with invoking rcu_core() early in the de-offloading process.
> > This way any work dismissed by an ongoing rcu_core() call fooled by
> > a preempting deoffloading process will be caught up by a nearby future
> > recall to rcu_core(), this time fully aware of the de-offloading state.
> >
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > Cc: Valentin Schneider <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Sebastian Andrzej Siewior <[email protected]>
> > Cc: Josh Triplett <[email protected]>
> > Cc: Joel Fernandes <[email protected]>
> > Cc: Boqun Feng <[email protected]>
> > Cc: Neeraj Upadhyay <[email protected]>
> > Cc: Uladzislau Rezki <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
>
> One comment/question below.
>
> > @@ -990,6 +990,15 @@ static long rcu_nocb_rdp_deoffload(void *arg)
> > * will refuse to put anything into the bypass.
> > */
> > WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> > + /*
> > + * Start with invoking rcu_core() early. This way if the current thread
> > + * happens to preempt an ongoing call to rcu_core() in the middle,
> > + * leaving some work dismissed because rcu_core() still thinks the rdp is
> > + * completely offloaded, we are guaranteed a nearby future instance of
> > + * rcu_core() to catch up.
> > + */
> > + rcu_segcblist_set_flags(cblist, SEGCBLIST_RCU_CORE);
> > + invoke_rcu_core();
>
> I think your approach is a bit neater, but would there have been any issue
> with keeping the setting of SEGCBLIST_RCU_CORE within
> rcu_segcblist_offload() and bundling it with an invoke_rcu_core()?
Probably not in practice.
But in theory, it may be more comfortable to read the following in order:
1) Set SEGCBLIST_RCU_CORE so subsequent invocations of rcu_core() handle
callbacks
2) Invoke rcu_core()
3) Only once we achieved the above we can clear SEGCBLIST_OFFLOADED which
will stop the nocb kthreads.
If we did 3) first and only then 1) and 2), there would be a risk that callbacks
get completely ignored in the middle.
That said you have a point in that we could do:
1) Set SEGCBLIST_RCU_CORE and clear SEGCBLIST_OFFLOADED at the _very_ same time
(arrange that with a WRITE_ONCE() I guess).
2) Invoke rcu_core()
But well...arranging for rcu_core() to take over before we even consider
starting the de-offloading process provides some unexplainable relief to the
soul. Some code designs sometimes rely more on faith than logic :)
Thanks.
On Fri, Oct 01, 2021 at 06:50:39PM +0100, Valentin Schneider wrote:
> On 30/09/21 00:10, Frederic Weisbecker wrote:
> > When callbacks are offloaded, the NOCB kthreads handle the callbacks
> > progression on behalf of rcu_core().
> >
> > However during the (de-)offloading process, the kthread may not be
> > entirely up to the task. As a result some callbacks grace period
> > sequence number may remain stale for a while because rcu_core() won't
> > take care of them either.
> >
>
> But that should be taken care of at the tail end of the (de)offloading
> process, either by rcu_core() or by the NOCB kthreads, no?
True but the (de-)offloading process can take a random amount of time to
complete. During this time if the queue of callbacks is already huge, things
can explode.
>
> Or is it e.g. in the case of offloading, we want to make sure an rcu_core()
> invocation runs callback acceleration because even if the NOCB GP/CB
> kthreads are being set up, we're not guaranteed is going to do that
> straight away?
Right.
>
> IIUC it would be a similar case for deoffload when we stash the NOCB GP/CB
> kthreads and get rcu_core() running concurrently.
You got it!
Thanks.
On Fri, Oct 01, 2021 at 06:51:32PM +0100, Valentin Schneider wrote:
> On 30/09/21 00:10, Frederic Weisbecker wrote:
> > Time limit only makes sense when callbacks are serviced in softirq mode
> > because:
> >
> > _ In case we need to get back to the scheduler,
> > cond_resched_tasks_rcu_qs() is called after each callback.
> >
> > _ In case some other softirq vector needs the CPU, the call to
> > local_bh_enable() before cond_resched_tasks_rcu_qs() takes care of
> > them via a call to do_softirq().
> >
> > _ The time spent on other tasks after scheduling out, or on softirq
> > processing, is spuriously accounted to the time limit.
> >
>
> That wasn't the case before ("rcu: Fix callbacks processing time limit
> retaining cond_resched()")
But if cond_resched_tasks_rcu_qs() was called and then on the next iteration
tlimit is checked, the time spent scheduling out is included, right?
Thanks.
> , though under PREEMPT_RT that *was* true (since
> bh-off remains preemptible). So I'd say that's a change we want.
>
> > Therefore, make sure the time limit only applies to softirq mode.
> >
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > Cc: Valentin Schneider <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Sebastian Andrzej Siewior <[email protected]>
> > Cc: Josh Triplett <[email protected]>
> > Cc: Joel Fernandes <[email protected]>
> > Cc: Boqun Feng <[email protected]>
> > Cc: Neeraj Upadhyay <[email protected]>
> > Cc: Uladzislau Rezki <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
>
> Reviewed-by: Valentin Schneider <[email protected]>
On Fri, Oct 01, 2021 at 06:50:55PM +0100, Valentin Schneider wrote:
> On 30/09/21 00:10, Frederic Weisbecker wrote:
> > Instead of hardcoding IRQ save and nocb lock, use the consolidated
> > API.
> >
> > Signed-off-by: Frederic Weisbecker <[email protected]>
> > Cc: Valentin Schneider <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Sebastian Andrzej Siewior <[email protected]>
> > Cc: Josh Triplett <[email protected]>
> > Cc: Joel Fernandes <[email protected]>
> > Cc: Boqun Feng <[email protected]>
> > Cc: Neeraj Upadhyay <[email protected]>
> > Cc: Uladzislau Rezki <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
>
> Just one comment nit below.
>
> Reviewed-by: Valentin Schneider <[email protected]>
>
> > ---
> > kernel/rcu/tree.c | 6 ++----
> > 1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index b1fc6e498d90..1971a4e15e96 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2492,8 +2492,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>
> While at it:
>
> - * Extract the list of ready callbacks, disabling to prevent
> + * Extract the list of ready callbacks, disabling IRQs to prevent
Good catch, applied that in the patch.
Thanks.
On Mon, Oct 04, 2021 at 03:42:27PM +0200, Frederic Weisbecker wrote:
> On Fri, Oct 01, 2021 at 06:51:08PM +0100, Valentin Schneider wrote:
> > On 30/09/21 00:10, Frederic Weisbecker wrote:
> > > The current condition to limit the number of callbacks executed in a
> > > row checks the offloaded state of the rdp. Not only is it volatile
> > > but it is also misleading: rcu_core() may well be executing
> > > callbacks concurrently with NOCB kthreads, and the offloaded state
> > > would then be verified in both cases. As a result the limit would
> > > spuriously not apply anymore on softirq while in the middle of
> > > the (de-)offloading process.
> > >
> > > Another issue with the condition is that rcu_is_callbacks_kthread()
> > > doesn't check if we are actually running callbacks from rcuc itself or
> > > from a softirq interrupting rcuc.
> > >
> >
> > Doesn't rcutree.use_softirq imply rcuc is never woken, in which case
> > RCU_SOFTIRQ can't interrupt rcuc (e.g. while running atop an IRQ exit)?
> > I suppose during the (de)offload sequence we could have RCU_SOFTIRQ running
> > atop the NOCB CB kthread, but that's not something
> > rcu_is_callbacks_kthread() detects.
>
> Yes, good point. I don't know if rcuc can ever be interrupted by
> irq_exit() -> do_softirq() -> rcu_core() itself after all.
>
> Paul can probably confirm your point?
This is supposed to be prohibited by the fact that rcutree.use_softirq
prevents the rcuc kthreads from being spawned in the first place.
That said, I do not recall having thought through the de-offloading
process with the rcuc kthreads in mind. There could easily be bugs
in this case. (I -think- that it is OK because each rcuc kthread is
confined to running on the corresponding CPU and because it disables BH.
But it is the system's opinion that counts.)
> > Also, why is rcu_is_callbacks_kthread() hardcoded to false for
> > !CONFIG_RCU_BOOST? Isn't it relevant for rcu_do_batch() ratelimiting
> > regardless (at least before your patches)?
>
> I believe rcuc is only used on CONFIG_RCU_BOOST?
They are independent, though by default RT arranges for both rcuc kthreads and
CONFIG_RCU_BOOST. Now there are kthreads created for CONFIG_RCU_BOOST,
but these are the rcub kthreads, and they are not per CPU, but rather
per leaf rcu_node structure.
Thanx, Paul
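As an aside on the condition being debated in this exchange, here is a
standalone C sketch of the general idea of keying the batch limit on the
actual execution context rather than on the rdp's (racy) offloaded state. It
is illustrative only: it is not the series' actual code, and the names and
values are stand-ins (in_serving_softirq() is only referenced in a comment as
the kernel's way of asking that question).

/* Standalone sketch: rate-limit a batch based on the context we are
 * actually executing in, not on a flag that can change concurrently. */
#include <stdbool.h>
#include <stdio.h>

#define BATCH_LIMIT 10	/* stand-in for the "bl" batch limit */

/* Stand-in for "are we invoking callbacks from softirq/rcuc?",
 * i.e. roughly what in_serving_softirq() answers in the kernel. */
static bool servicing_from_softirq = true;

static bool should_stop_batch(int invoked)
{
	/*
	 * Checking the offloaded state here would be racy during
	 * (de-)offloading, when softirq and nocb kthreads may both be
	 * running callbacks. The context we execute in is stable.
	 */
	return servicing_from_softirq && invoked >= BATCH_LIMIT;
}

int main(void)
{
	int i;

	for (i = 0; i < 25; i++) {
		printf("callback %d\n", i);
		if (should_stop_batch(i + 1)) {
			printf("batch limit reached, deferring the rest\n");
			break;
		}
	}
	return 0;
}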
On 04/10/21 15:47, Frederic Weisbecker wrote:
> On Fri, Oct 01, 2021 at 06:51:32PM +0100, Valentin Schneider wrote:
>> On 30/09/21 00:10, Frederic Weisbecker wrote:
>> > Time limit only makes sense when callbacks are serviced in softirq mode
>> > because:
>> >
>> > _ In case we need to get back to the scheduler,
>> > cond_resched_tasks_rcu_qs() is called after each callback.
>> >
>> > _ In case some other softirq vector needs the CPU, the call to
>> > local_bh_enable() before cond_resched_tasks_rcu_qs() takes care of
>> > them via a call to do_softirq().
>> >
>> > _ The time spent on other tasks after scheduling out, or on softirq
>> > processing, is spuriously accounted to the time limit.
>> >
>>
>> That wasn't the case before ("rcu: Fix callbacks processing time limit
>> retaining cond_resched()")
>
> But if cond_resched_tasks_rcu_qs() was called and then on the next iteration
> tlimit is checked, the time spent scheduling out is included, right?
>
if tlimit was set, then that branch would either continue or break; both
cases would have skipped over the cond_resched_tasks_rcu_qs() (which the
aforementioned patch addresses).
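To make that concrete, here is a small standalone C model of the loop shape
being discussed. It is not the actual rcu_do_batch() code; the callback count
and the limit are arbitrary stand-ins. It only shows that once tlimit is
armed, both arms of that branch (continue and break) bypass the resched step
below it.

/* Standalone model of the batch loop shape; not the real rcu_do_batch(). */
#include <stdbool.h>
#include <stdio.h>

#define NCBS   6	/* callbacks ready in this batch (arbitrary) */
#define TLIMIT 3	/* pretend the time limit fires after 3 callbacks */

static void invoke_callback(int i)
{
	printf("callback %d invoked\n", i);
}

static void batch_model(bool tlimit_armed)
{
	int i;

	for (i = 0; i < NCBS; i++) {
		invoke_callback(i);

		if (tlimit_armed) {
			/* Both arms skip the resched step below. */
			if (i < TLIMIT)
				continue;	/* limit not hit yet, keep going */
			break;			/* limit hit, leave the batch */
		}

		/* Only reached when no time limit is armed at all. */
		printf("cond_resched_tasks_rcu_qs() equivalent\n");
	}
}

int main(void)
{
	batch_model(true);
	return 0;
}

Which is the gap the aforementioned "retaining cond_resched()" patch closes.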
On Thu, Sep 30, 2021 at 12:10:01AM +0200, Frederic Weisbecker wrote:
> PREEMPT_RT has made rcu_core() preemptible, making it unsafe against
> concurrent NOCB (de-)offloading.
>
> Thomas suggested to drop the local_lock() based solution and simply
> check the offloaded state while context looks safe but that's not
> enough. Here is a bit of rework.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> rcu/rt
>
> HEAD: aac1c58961446c731f2e989bd822ca1fd2659bad
Many of these look quite good, but any chance of getting an official
Tested-by from someone in the -rt community?
Thanx, Paul
> Thanks,
> Frederic
> ---
>
> Frederic Weisbecker (10):
> rcu/nocb: Make local rcu_nocb_lock_irqsave() safe against concurrent deoffloading
> rcu/nocb: Prepare state machine for a new step
> rcu/nocb: Invoke rcu_core() at the start of deoffloading
> rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe
> rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check
> rcu/nocb: Use appropriate rcu_nocb_lock_irqsave()
> rcu/nocb: Limit number of softirq callbacks only on softirq
> rcu: Fix callbacks processing time limit retaining cond_resched()
> rcu: Apply callbacks processing time limit only on softirq
> rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread
>
> Thomas Gleixner (1):
> rcu/nocb: Make rcu_core() callbacks acceleration preempt-safe
>
>
> include/linux/rcu_segcblist.h | 51 +++++++++++++++++++-------
> kernel/rcu/rcu_segcblist.c | 10 ++---
> kernel/rcu/rcu_segcblist.h | 7 ++--
> kernel/rcu/tree.c | 85 ++++++++++++++++++++++++++++++-------------
> kernel/rcu/tree.h | 16 +++++---
> kernel/rcu/tree_nocb.h | 29 ++++++++++++---
> 6 files changed, 136 insertions(+), 62 deletions(-)
On Wed, Oct 06, 2021 at 04:12:12PM +0100, Valentin Schneider wrote:
> On 04/10/21 15:47, Frederic Weisbecker wrote:
> > On Fri, Oct 01, 2021 at 06:51:32PM +0100, Valentin Schneider wrote:
> >> On 30/09/21 00:10, Frederic Weisbecker wrote:
> >> > Time limit only makes sense when callbacks are serviced in softirq mode
> >> > because:
> >> >
> >> > _ In case we need to get back to the scheduler,
> >> > cond_resched_tasks_rcu_qs() is called after each callback.
> >> >
> >> > _ In case some other softirq vector needs the CPU, the call to
> >> > local_bh_enable() before cond_resched_tasks_rcu_qs() takes care of
> >> > them via a call to do_softirq().
> >> >
> >> > _ The time spent on other tasks after scheduling out, or on softirq
> >> > processing, is spuriously accounted to the time limit.
> >> >
> >>
> >> That wasn't the case before ("rcu: Fix callbacks processing time limit
> >> retaining cond_resched()")
> >
> > But if cond_resched_tasks_rcu_qs() was called and then on the next iteration
> > tlimit is checked, the time spent scheduling out is included, right?
> >
>
> if tlimit was set, then that branch would either continue or break; both
> cases would have skipped over the cond_resched_tasks_rcu_qs() (which the
> aforementioned patch addresses).
Duh, right indeed. I need to clarify the changelog.
Thanks!
On 2021-10-06 08:13:39 [-0700], Paul E. McKenney wrote:
> On Thu, Sep 30, 2021 at 12:10:01AM +0200, Frederic Weisbecker wrote:
> > PREEMPT_RT has made rcu_core() preemptible, making it unsafe against
> > concurrent NOCB (de-)offloading.
> >
> > Thomas suggested to drop the local_lock() based solution and simply
> > check the offloaded state while context looks safe but that's not
> > enough. Here is a bit of rework.
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > rcu/rt
> >
> > HEAD: aac1c58961446c731f2e989bd822ca1fd2659bad
>
> Many of these look quite good, but any chance of getting an official
> Tested-by from someone in the -rt community?
I looked over the series, booted it and ran a quick rcutorture
and didn't notice anything.
Tested-by: Sebastian Andrzej Siewior <[email protected]>
> Thanx, Paul
Sebastian
On Thu, Oct 07, 2021 at 10:49:20AM +0200, Sebastian Andrzej Siewior wrote:
> On 2021-10-06 08:13:39 [-0700], Paul E. McKenney wrote:
> > On Thu, Sep 30, 2021 at 12:10:01AM +0200, Frederic Weisbecker wrote:
> > > PREEMPT_RT has made rcu_core() preemptible, making it unsafe against
> > > concurrent NOCB (de-)offloading.
> > >
> > > Thomas suggested to drop the local_lock() based solution and simply
> > > check the offloaded state while context looks safe but that's not
> > > enough. Here is a bit of rework.
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > > rcu/rt
> > >
> > > HEAD: aac1c58961446c731f2e989bd822ca1fd2659bad
> >
> > Many of these look quite good, but any chance of getting an official
> > Tested-by from someone in the -rt community?
>
> I looked over the series, booted it and ran a quick rcutorture
> and didn't notice anything.
>
> Tested-by: Sebastian Andrzej Siewior <[email protected]>
Thank you, Sebastian!!!
Thanx, Paul
On 01/10/21 18:47, Valentin Schneider wrote:
> On 30/09/21 00:10, Frederic Weisbecker wrote:
>> PREEMPT_RT has made rcu_core() preemptible, making it unsafe against
>> concurrent NOCB (de-)offloading.
>>
>> Thomas suggested to drop the local_lock() based solution and simply
>> check the offloaded state while context looks safe but that's not
>> enough. Here is a bit of rework.
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
>> rcu/rt
>>
>> HEAD: aac1c58961446c731f2e989bd822ca1fd2659bad
>>
>> Thanks,
>> Frederic
>
> FWIW I've had RCU torture with NOCB toggling running for half a day on my
> Arm Juno and nothing to report. I still need to rebase this on an -rt tree
> and give it a spin with CONFIG_PREEMPT_RT.
Rebased on top of v5.15-rc4-rt7-rebase with Thomas' patch reverted and ran
the same thing under CONFIG_PREEMPT_RT, nothing seems to catch on fire, so:
Tested-by: Valentin Schneider <[email protected]>
On Fri, Oct 08, 2021 at 03:03:02PM +0100, Valentin Schneider wrote:
> On 01/10/21 18:47, Valentin Schneider wrote:
> > On 30/09/21 00:10, Frederic Weisbecker wrote:
> >> PREEMPT_RT has made rcu_core() preemptible, making it unsafe against
> >> concurrent NOCB (de-)offloading.
> >>
> >> Thomas suggested to drop the local_lock() based solution and simply
> >> check the offloaded state while context looks safe but that's not
> >> enough. Here is a bit of rework.
> >>
> >> git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> >> rcu/rt
> >>
> >> HEAD: aac1c58961446c731f2e989bd822ca1fd2659bad
> >>
> >> Thanks,
> >> Frederic
> >
> > FWIW I've had RCU torture with NOCB toggling running for half a day on my
> > Arm Juno and nothing to report. I still need to rebase this on an -rt tree
> > and give it a spin with CONFIG_PREEMPT_RT.
>
> Rebased on top of v5.15-rc4-rt7-rebase with Thomas' patch reverted and ran
> the same thing under CONFIG_PREEMPT_RT, nothing seems to catch on fire, so:
>
> Tested-by: Valentin Schneider <[email protected]>
Thank you!!!
Thanx, Paul
Phew, the last changelog is the most careless...
On Thu, Sep 30, 2021 at 05:37:37PM +0200, Sebastian Andrzej Siewior wrote:
> On 2021-09-30 00:10:12 [+0200], Frederic Weisbecker wrote:
> > rcu_core() tries to ensure that its self-invocation in case of callbacks
> > overload only happens in softirq/rcuc mode. Indeed it doesn't make sense
> > to trigger the local RCU core from the nocb_cb kthread since it can execute
> > on a CPU different from the target rdp. Also, in case of overload, the
> > nocb_cb kthread simply iterates a new loop of callback processing.
> >
> > However the "offloaded" check that aims at preventing that is wrong.
> - that?
Rephrasing the changelog:
"However the "offloaded" check that aims at preventing misplaced
rcu_core() invocations is wrong."
>
> > First of all that state is volatile and second: softirq/rcuc can
> > execute while the target rdp is offloaded. As a result rcu_core() can
>
> can what?
"As a result rcu_core() can be invoked on the wrong CPU while in the
process of (de-offloading)."
Thanks!
>
> > Fix that with moving the rcu_core() self-invocation to rcu_core() itself,
> > irrespective of the rdp offloaded state.
>
> Sebastian