2020-04-20 15:42:26

by Joel Fernandes

Subject: [PATCH rcu/dev -fixes 0/4]

Hi,
Some of these patches would prevent breakage on PREEMPT_RT. I have marked them
as "rcu/dev fixes". Please consider applying for v5.8. The exceptions are 3/4
and 4/4 which some small clean-ups. Thanks!

Joel Fernandes (Google) (3):
rcu/tree: Keep kfree_rcu() awake during lock contention
rcu/tree: Skip entry into the page allocator for PREEMPT_RT
rcu/tree: Use consistent style for comments

Sebastian Andrzej Siewior (1):
rcu/tree: Avoid using xchg() in kfree_call_rcu_add_ptr_to_bulk()

kernel/rcu/tree.c | 61 ++++++++++++++++++++++++++++-------------------
1 file changed, 37 insertions(+), 24 deletions(-)

--
2.26.1.301.g55bc3eb7cb9-goog


2020-04-20 15:42:59

by Joel Fernandes

Subject: [PATCH rcu/dev -fixes 1/4] rcu/tree: Keep kfree_rcu() awake during lock contention

On PREEMPT_RT kernels, contending on the krcp spinlock can cause
sleeping, as on these kernels the spinlock is converted to an rt-mutex.
To prevent breakage of possible usage of kfree_rcu() now or in the
future, make use of raw spinlocks which are not subject to such
conversions.

Vetting all code paths, there is no reason to believe that the raw
spinlock will be held for a long time, so PREEMPT_RT should not suffer
from lengthy acquisition times for the lock.
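
For illustration, the hot path in kfree_call_rcu() then takes roughly the
following shape (simplified sketch based on the hunks below; queuing
details elided):

	local_irq_save(flags);	// For safely calling this_cpu_ptr().
	krcp = this_cpu_ptr(&krc);
	if (krcp->initialized)
		raw_spin_lock(&krcp->lock);

	// Queue the object; only short, bounded work is done under the lock.

	if (krcp->initialized)
		raw_spin_unlock(&krcp->lock);
	local_irq_restore(flags);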

Cc: [email protected]
Cc: [email protected]
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/rcu/tree.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index f288477ee1c26..cf68d3d9f5b81 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2905,7 +2905,7 @@ struct kfree_rcu_cpu {
struct kfree_rcu_bulk_data *bhead;
struct kfree_rcu_bulk_data *bcached;
struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
- spinlock_t lock;
+ raw_spinlock_t lock;
struct delayed_work monitor_work;
bool monitor_todo;
bool initialized;
@@ -2939,12 +2939,12 @@ static void kfree_rcu_work(struct work_struct *work)
krwp = container_of(to_rcu_work(work),
struct kfree_rcu_cpu_work, rcu_work);
krcp = krwp->krcp;
- spin_lock_irqsave(&krcp->lock, flags);
+ raw_spin_lock_irqsave(&krcp->lock, flags);
head = krwp->head_free;
krwp->head_free = NULL;
bhead = krwp->bhead_free;
krwp->bhead_free = NULL;
- spin_unlock_irqrestore(&krcp->lock, flags);
+ raw_spin_unlock_irqrestore(&krcp->lock, flags);

/* "bhead" is now private, so traverse locklessly. */
for (; bhead; bhead = bnext) {
@@ -3047,14 +3047,14 @@ static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
krcp->monitor_todo = false;
if (queue_kfree_rcu_work(krcp)) {
// Success! Our job is done here.
- spin_unlock_irqrestore(&krcp->lock, flags);
+ raw_spin_unlock_irqrestore(&krcp->lock, flags);
return;
}

// Previous RCU batch still in progress, try again later.
krcp->monitor_todo = true;
schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
- spin_unlock_irqrestore(&krcp->lock, flags);
+ raw_spin_unlock_irqrestore(&krcp->lock, flags);
}

/*
@@ -3067,11 +3067,11 @@ static void kfree_rcu_monitor(struct work_struct *work)
struct kfree_rcu_cpu *krcp = container_of(work, struct kfree_rcu_cpu,
monitor_work.work);

- spin_lock_irqsave(&krcp->lock, flags);
+ raw_spin_lock_irqsave(&krcp->lock, flags);
if (krcp->monitor_todo)
kfree_rcu_drain_unlock(krcp, flags);
else
- spin_unlock_irqrestore(&krcp->lock, flags);
+ raw_spin_unlock_irqrestore(&krcp->lock, flags);
}

static inline bool
@@ -3142,7 +3142,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
local_irq_save(flags); // For safely calling this_cpu_ptr().
krcp = this_cpu_ptr(&krc);
if (krcp->initialized)
- spin_lock(&krcp->lock);
+ raw_spin_lock(&krcp->lock);

// Queue the object but don't yet schedule the batch.
if (debug_rcu_head_queue(head)) {
@@ -3173,7 +3173,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)

unlock_return:
if (krcp->initialized)
- spin_unlock(&krcp->lock);
+ raw_spin_unlock(&krcp->lock);
local_irq_restore(flags);
}
EXPORT_SYMBOL_GPL(kfree_call_rcu);
@@ -3205,11 +3205,11 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

count = krcp->count;
- spin_lock_irqsave(&krcp->lock, flags);
+ raw_spin_lock_irqsave(&krcp->lock, flags);
if (krcp->monitor_todo)
kfree_rcu_drain_unlock(krcp, flags);
else
- spin_unlock_irqrestore(&krcp->lock, flags);
+ raw_spin_unlock_irqrestore(&krcp->lock, flags);

sc->nr_to_scan -= count;
freed += count;
@@ -3236,15 +3236,15 @@ void __init kfree_rcu_scheduler_running(void)
for_each_online_cpu(cpu) {
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

- spin_lock_irqsave(&krcp->lock, flags);
+ raw_spin_lock_irqsave(&krcp->lock, flags);
if (!krcp->head || krcp->monitor_todo) {
- spin_unlock_irqrestore(&krcp->lock, flags);
+ raw_spin_unlock_irqrestore(&krcp->lock, flags);
continue;
}
krcp->monitor_todo = true;
schedule_delayed_work_on(cpu, &krcp->monitor_work,
KFREE_DRAIN_JIFFIES);
- spin_unlock_irqrestore(&krcp->lock, flags);
+ raw_spin_unlock_irqrestore(&krcp->lock, flags);
}
}

@@ -4140,7 +4140,7 @@ static void __init kfree_rcu_batch_init(void)
for_each_possible_cpu(cpu) {
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

- spin_lock_init(&krcp->lock);
+ raw_spin_lock_init(&krcp->lock);
for (i = 0; i < KFREE_N_BATCHES; i++) {
INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
krcp->krw_arr[i].krcp = krcp;
--
2.26.1.301.g55bc3eb7cb9-goog

2020-04-20 16:41:38

by Joel Fernandes

Subject: [PATCH rcu/dev -fixes 2/4] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

To keep the kfree_rcu() path working in raw non-preemptible sections,
prevent the optional entry into the allocator as it uses sleeping locks.
In fact, even if the caller of kfree_rcu() is preemptible, this path
still is not, as krcp->lock is a raw spinlock as done in previous
patches. With additional page pre-allocation in the works, hitting this
return is going to be much less likely soon, so just prevent it for now
so that PREEMPT_RT does not break. Note that page allocation here is an
optimization and skipping it still makes kfree_rcu() work.
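
Skipping the allocation is fine because the caller already falls back to
the existing rcu_head path when this helper returns false, roughly like
so (sketch of the surrounding logic in kfree_call_rcu(); the exact code
in the tree may differ slightly):

	if (!kfree_call_rcu_add_ptr_to_bulk(krcp, head, func)) {
		// Bulk page unavailable (or PREEMPT_RT): queue via the rcu_head.
		head->func = func;
		head->next = krcp->head;
		krcp->head = head;
	}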

Cc: Sebastian Andrzej Siewior <[email protected]>
Co-developed-by: Uladzislau Rezki <[email protected]>
Signed-off-by: Uladzislau Rezki <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/rcu/tree.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cf68d3d9f5b81..cd61649e1b001 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3092,6 +3092,18 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
if (!bnode) {
WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);

+ /*
+ * To keep this path working on raw non-preemptible
+ * sections, prevent the optional entry into the
+ * allocator as it uses sleeping locks. In fact, even
+ * if the caller of kfree_rcu() is preemptible, this
+ * path still is not, as krcp->lock is a raw spinlock.
+ * With additional page pre-allocation in the works,
+ * hitting this return is going to be much less likely.
+ */
+ if (IS_ENABLED(CONFIG_PREEMPT_RT))
+ return false;
+
bnode = (struct kfree_rcu_bulk_data *)
__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
}
--
2.26.1.301.g55bc3eb7cb9-goog

2020-04-20 16:41:42

by Joel Fernandes

Subject: [PATCH rcu/dev -fixes 3/4] rcu/tree: Avoid using xchg() in kfree_call_rcu_add_ptr_to_bulk()

From: Sebastian Andrzej Siewior <[email protected]>

There is no need to use xchg(), the access is serialized by krcp->lock.
The xchg() function adds some atomic barriers which remain hidden in
x86's disassembly but are visible on ARM for instance.

Replace xchg() with a load + store.

Acked-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
kernel/rcu/tree.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cd61649e1b001..f6eb3aee0935e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3088,7 +3088,8 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
/* Check if a new block is required. */
if (!krcp->bhead ||
krcp->bhead->nr_records == KFREE_BULK_MAX_ENTR) {
- bnode = xchg(&krcp->bcached, NULL);
+ bnode = krcp->bcached;
+ krcp->bcached = NULL;
if (!bnode) {
WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);

--
2.26.1.301.g55bc3eb7cb9-goog

2020-04-20 17:23:06

by Uladzislau Rezki

Subject: Re: [PATCH rcu/dev -fixes 3/4] rcu/tree: Avoid using xchg() in kfree_call_rcu_add_ptr_to_bulk()

On Mon, Apr 20, 2020 at 11:38:36AM -0400, Joel Fernandes (Google) wrote:
> From: Sebastian Andrzej Siewior <[email protected]>
>
> There is no need to use xchg(), the access is serialized by krcp->lock.
> The xchg() function adds some atomic barriers which remain hidden in
> x86's disassembly but are visible on ARM for instance.
>
> Replace xchg() with a load + store.
>
> Acked-by: Joel Fernandes (Google) <[email protected]>
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
> ---
> kernel/rcu/tree.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index cd61649e1b001..f6eb3aee0935e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3088,7 +3088,8 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> /* Check if a new block is required. */
> if (!krcp->bhead ||
> krcp->bhead->nr_records == KFREE_BULK_MAX_ENTR) {
> - bnode = xchg(&krcp->bcached, NULL);
> + bnode = krcp->bcached;
> + krcp->bcached = NULL;
> if (!bnode) {
> WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
>
>
But we populate the cache without holding the krcp->lock. That is why
xchg() is used here. See below:

<snip>
if (cmpxchg(&krcp->bcached, NULL, bhead))
free_page((unsigned long) bhead);
<snip>

We do not hold the krcp->lock here, and we do not need it.
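
To make the interplay explicit, the two sides of ->bcached are roughly
(sketch; the consumer runs under krcp->lock, the producer does not):

<snip>
	/* Consumer, kfree_call_rcu_add_ptr_to_bulk(): */
	bnode = xchg(&krcp->bcached, NULL);

	/* Producer, recycling a page without krcp->lock held: */
	if (cmpxchg(&krcp->bcached, NULL, bhead))
		free_page((unsigned long) bhead);
<snip>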

--
Vlad Rezki

2020-04-20 18:22:12

by Joel Fernandes

Subject: Re: [PATCH rcu/dev -fixes 3/4] rcu/tree: Avoid using xchg() in kfree_call_rcu_add_ptr_to_bulk()

On Mon, Apr 20, 2020 at 07:18:29PM +0200, Uladzislau Rezki wrote:
> On Mon, Apr 20, 2020 at 11:38:36AM -0400, Joel Fernandes (Google) wrote:
> > From: Sebastian Andrzej Siewior <[email protected]>
> >
> > There is no need to use xchg(), the access is serialized by krcp->lock.
> > The xchg() function adds some atomic barriers which remain hidden in
> > x86's disassembly but are visible on ARM for instance.
> >
> > Replace xchg() with a load + store.
> >
> > Acked-by: Joel Fernandes (Google) <[email protected]>
> > Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
> > ---
> > kernel/rcu/tree.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index cd61649e1b001..f6eb3aee0935e 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3088,7 +3088,8 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > /* Check if a new block is required. */
> > if (!krcp->bhead ||
> > krcp->bhead->nr_records == KFREE_BULK_MAX_ENTR) {
> > - bnode = xchg(&krcp->bcached, NULL);
> > + bnode = krcp->bcached;
> > + krcp->bcached = NULL;
> > if (!bnode) {
> > WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
> >
> >
> But we populate the cache without holding the krcp->lock. That is why it
> is xchg here. See below:
>
> <snip>
> if (cmpxchg(&krcp->bcached, NULL, bhead))
> free_page((unsigned long) bhead);
> <snip>
>
> we do not hold any krcp->lock here, we do not need it.

You are right. This patch is not helpful in this situation even though it
does not break things. Let us drop this patch. Please review the other 3 and
provide your Reviewed-by tag if they look OK to you, thanks.

For the workqueue one that Paul asked us to look into - we are continuing to
discuss there whether we need to move it outside the lock or not. If we decide
to move it outside the lock, we can add that as one more patch to this series
and I'll send a v2.

thanks,

- Joel

2020-04-21 13:14:12

by Uladzislau Rezki

Subject: Re: [PATCH rcu/dev -fixes 1/4] rcu/tree: Keep kfree_rcu() awake during lock contention

On Mon, Apr 20, 2020 at 11:38:34AM -0400, Joel Fernandes (Google) wrote:
> On PREEMPT_RT kernels, contending on the krcp spinlock can cause
> sleeping as on these kernels, the spinlock is converted to an rt-mutex.
> To prevent breakage of possible usage of kfree_rcu() now or in the
> future, make use of raw spinlocks which are not subject to such
> conversions.
>
> Vetting all code paths, there is no reason to believe that the raw
> spinlock will be held for long time so PREEMPT_RT should not suffer from
> lengthy acquirals of the lock.
>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> ---
> kernel/rcu/tree.c | 30 +++++++++++++++---------------
> 1 file changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index f288477ee1c26..cf68d3d9f5b81 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2905,7 +2905,7 @@ struct kfree_rcu_cpu {
> struct kfree_rcu_bulk_data *bhead;
> struct kfree_rcu_bulk_data *bcached;
> struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
> - spinlock_t lock;
> + raw_spinlock_t lock;
> struct delayed_work monitor_work;
> bool monitor_todo;
> bool initialized;
> @@ -2939,12 +2939,12 @@ static void kfree_rcu_work(struct work_struct *work)
> krwp = container_of(to_rcu_work(work),
> struct kfree_rcu_cpu_work, rcu_work);
> krcp = krwp->krcp;
> - spin_lock_irqsave(&krcp->lock, flags);
> + raw_spin_lock_irqsave(&krcp->lock, flags);
> head = krwp->head_free;
> krwp->head_free = NULL;
> bhead = krwp->bhead_free;
> krwp->bhead_free = NULL;
> - spin_unlock_irqrestore(&krcp->lock, flags);
> + raw_spin_unlock_irqrestore(&krcp->lock, flags);
>
> /* "bhead" is now private, so traverse locklessly. */
> for (; bhead; bhead = bnext) {
> @@ -3047,14 +3047,14 @@ static inline void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
> krcp->monitor_todo = false;
> if (queue_kfree_rcu_work(krcp)) {
> // Success! Our job is done here.
> - spin_unlock_irqrestore(&krcp->lock, flags);
> + raw_spin_unlock_irqrestore(&krcp->lock, flags);
> return;
> }
>
> // Previous RCU batch still in progress, try again later.
> krcp->monitor_todo = true;
> schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
> - spin_unlock_irqrestore(&krcp->lock, flags);
> + raw_spin_unlock_irqrestore(&krcp->lock, flags);
> }
>
> /*
> @@ -3067,11 +3067,11 @@ static void kfree_rcu_monitor(struct work_struct *work)
> struct kfree_rcu_cpu *krcp = container_of(work, struct kfree_rcu_cpu,
> monitor_work.work);
>
> - spin_lock_irqsave(&krcp->lock, flags);
> + raw_spin_lock_irqsave(&krcp->lock, flags);
> if (krcp->monitor_todo)
> kfree_rcu_drain_unlock(krcp, flags);
> else
> - spin_unlock_irqrestore(&krcp->lock, flags);
> + raw_spin_unlock_irqrestore(&krcp->lock, flags);
> }
>
> static inline bool
> @@ -3142,7 +3142,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
> local_irq_save(flags); // For safely calling this_cpu_ptr().
> krcp = this_cpu_ptr(&krc);
> if (krcp->initialized)
> - spin_lock(&krcp->lock);
> + raw_spin_lock(&krcp->lock);
>
> // Queue the object but don't yet schedule the batch.
> if (debug_rcu_head_queue(head)) {
> @@ -3173,7 +3173,7 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>
> unlock_return:
> if (krcp->initialized)
> - spin_unlock(&krcp->lock);
> + raw_spin_unlock(&krcp->lock);
> local_irq_restore(flags);
> }
> EXPORT_SYMBOL_GPL(kfree_call_rcu);
> @@ -3205,11 +3205,11 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>
> count = krcp->count;
> - spin_lock_irqsave(&krcp->lock, flags);
> + raw_spin_lock_irqsave(&krcp->lock, flags);
> if (krcp->monitor_todo)
> kfree_rcu_drain_unlock(krcp, flags);
> else
> - spin_unlock_irqrestore(&krcp->lock, flags);
> + raw_spin_unlock_irqrestore(&krcp->lock, flags);
>
> sc->nr_to_scan -= count;
> freed += count;
> @@ -3236,15 +3236,15 @@ void __init kfree_rcu_scheduler_running(void)
> for_each_online_cpu(cpu) {
> struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>
> - spin_lock_irqsave(&krcp->lock, flags);
> + raw_spin_lock_irqsave(&krcp->lock, flags);
> if (!krcp->head || krcp->monitor_todo) {
> - spin_unlock_irqrestore(&krcp->lock, flags);
> + raw_spin_unlock_irqrestore(&krcp->lock, flags);
> continue;
> }
> krcp->monitor_todo = true;
> schedule_delayed_work_on(cpu, &krcp->monitor_work,
> KFREE_DRAIN_JIFFIES);
> - spin_unlock_irqrestore(&krcp->lock, flags);
> + raw_spin_unlock_irqrestore(&krcp->lock, flags);
> }
> }
>
> @@ -4140,7 +4140,7 @@ static void __init kfree_rcu_batch_init(void)
> for_each_possible_cpu(cpu) {
> struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>
> - spin_lock_init(&krcp->lock);
> + raw_spin_lock_init(&krcp->lock);
> for (i = 0; i < KFREE_N_BATCHES; i++) {
> INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
> krcp->krw_arr[i].krcp = krcp;
> --
> 2.26.1.301.g55bc3eb7cb9-goog
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>

If we decide to move the schedule_delayed_work() outside of the critical
section, I think it would be better to submit a separate patch with a good
explanation of why we do it.
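
If we go that way, I think only the queueing itself needs to move, e.g.
something like below for kfree_rcu_scheduler_running() (untested, just to
show the shape; monitor_todo is still set under the lock):

<snip>
	for_each_online_cpu(cpu) {
		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

		raw_spin_lock_irqsave(&krcp->lock, flags);
		if (!krcp->head || krcp->monitor_todo) {
			raw_spin_unlock_irqrestore(&krcp->lock, flags);
			continue;
		}
		krcp->monitor_todo = true;
		raw_spin_unlock_irqrestore(&krcp->lock, flags);

		schedule_delayed_work_on(cpu, &krcp->monitor_work,
					 KFREE_DRAIN_JIFFIES);
	}
<snip>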

--
Vlad Rezki

2020-04-22 12:22:43

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH rcu/dev -fixes 2/4] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

On Mon, Apr 20, 2020 at 11:38:35AM -0400, Joel Fernandes (Google) wrote:
> To keep kfree_rcu() path working on raw non-preemptible sections,
> prevent the optional entry into the allocator as it uses sleeping locks.
> In fact, even if the caller of kfree_rcu() is preemptible, this path
> still is not, as krcp->lock is a raw spinlock as done in previous
> patches. With additional page pre-allocation in the works, hitting this
> return is going to be much less likely soon so just prevent it for now
> so that PREEMPT_RT does not break. Note that page allocation here is an
> optimization and skipping it still makes kfree_rcu() work.
>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> Co-developed-by: Uladzislau Rezki <[email protected]>
> Signed-off-by: Uladzislau Rezki <[email protected]>
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> ---
> kernel/rcu/tree.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index cf68d3d9f5b81..cd61649e1b001 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3092,6 +3092,18 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> if (!bnode) {
> WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
>
> + /*
> + * To keep this path working on raw non-preemptible
> + * sections, prevent the optional entry into the
> + * allocator as it uses sleeping locks. In fact, even
> + * if the caller of kfree_rcu() is preemptible, this
> + * path still is not, as krcp->lock is a raw spinlock.
> + * With additional page pre-allocation in the works,
> + * hitting this return is going to be much less likely.
> + */
> + if (IS_ENABLED(CONFIG_PREEMPT_RT))
> + return false;
> +
> bnode = (struct kfree_rcu_bulk_data *)
> __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> }
This will not work correctly by just reverting everything to the
"rcu_head path" for CONFIG_PREEMPT_RT kernels. We need to preallocate at
least once. I can add caching on top of this series to address it.
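
As a minimal illustration (not a final approach), pre-filling the per-CPU
cache once at init time would already let the bulk path be used on
PREEMPT_RT, e.g.:

<snip>
	/* In kfree_rcu_batch_init(), per possible CPU: */
	krcp->bcached = (struct kfree_rcu_bulk_data *)
		__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
<snip>

Proper caching that refills pages after use is what I have in mind on top
of this series.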

--
Vlad Rezki

2020-04-22 12:24:25

by Uladzislau Rezki

Subject: Re: [PATCH rcu/dev -fixes 2/4] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

On Wed, Apr 22, 2020 at 12:35:36PM +0200, Uladzislau Rezki wrote:
> On Mon, Apr 20, 2020 at 11:38:35AM -0400, Joel Fernandes (Google) wrote:
> > To keep kfree_rcu() path working on raw non-preemptible sections,
> > prevent the optional entry into the allocator as it uses sleeping locks.
> > In fact, even if the caller of kfree_rcu() is preemptible, this path
> > still is not, as krcp->lock is a raw spinlock as done in previous
> > patches. With additional page pre-allocation in the works, hitting this
> > return is going to be much less likely soon so just prevent it for now
> > so that PREEMPT_RT does not break. Note that page allocation here is an
> > optimization and skipping it still makes kfree_rcu() work.
> >
> > Cc: Sebastian Andrzej Siewior <[email protected]>
> > Co-developed-by: Uladzislau Rezki <[email protected]>
> > Signed-off-by: Uladzislau Rezki <[email protected]>
> > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > ---
> > kernel/rcu/tree.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index cf68d3d9f5b81..cd61649e1b001 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3092,6 +3092,18 @@ kfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp,
> > if (!bnode) {
> > WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
> >
> > + /*
> > + * To keep this path working on raw non-preemptible
> > + * sections, prevent the optional entry into the
> > + * allocator as it uses sleeping locks. In fact, even
> > + * if the caller of kfree_rcu() is preemptible, this
> > + * path still is not, as krcp->lock is a raw spinlock.
> > + * With additional page pre-allocation in the works,
> > + * hitting this return is going to be much less likely.
> > + */
> > + if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > + return false;
> > +
> > bnode = (struct kfree_rcu_bulk_data *)
> > __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> > }
> This will not be correctly working by just reverting everything to the
> "rcu_head path" for CONFIG_PREEMPT_RT kernel. We need to preallocate at
> least once. I can add caching on top of this series to address it.
>
Forgot to add: Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>

Caching should be added as separate patch.

--
Vlad Rezki

2020-04-22 13:30:26

by Paul E. McKenney

Subject: Re: [PATCH rcu/dev -fixes 2/4] rcu/tree: Skip entry into the page allocator for PREEMPT_RT

On Wed, Apr 22, 2020 at 09:18:41AM -0400, [email protected] wrote:
>
>
> On April 22, 2020 6:35:36 AM EDT, Uladzislau Rezki <[email protected]> wrote:
> >On Mon, Apr 20, 2020 at 11:38:35AM -0400, Joel Fernandes (Google)
> >wrote:
> >> To keep kfree_rcu() path working on raw non-preemptible sections,
> >> prevent the optional entry into the allocator as it uses sleeping
> >locks.
> >> In fact, even if the caller of kfree_rcu() is preemptible, this path
> >> still is not, as krcp->lock is a raw spinlock as done in previous
> >> patches. With additional page pre-allocation in the works, hitting
> >this
> >> return is going to be much less likely soon so just prevent it for
> >now
> >> so that PREEMPT_RT does not break. Note that page allocation here is
> >an
> >> optimization and skipping it still makes kfree_rcu() work.
> >>
> >> Cc: Sebastian Andrzej Siewior <[email protected]>
> >> Co-developed-by: Uladzislau Rezki <[email protected]>
> >> Signed-off-by: Uladzislau Rezki <[email protected]>
> >> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> >> ---
> >> kernel/rcu/tree.c | 12 ++++++++++++
> >> 1 file changed, 12 insertions(+)
> >>
> >> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> index cf68d3d9f5b81..cd61649e1b001 100644
> >> --- a/kernel/rcu/tree.c
> >> +++ b/kernel/rcu/tree.c
> >> @@ -3092,6 +3092,18 @@ kfree_call_rcu_add_ptr_to_bulk(struct
> >kfree_rcu_cpu *krcp,
> >> if (!bnode) {
> >> WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
> >>
> >> + /*
> >> + * To keep this path working on raw non-preemptible
> >> + * sections, prevent the optional entry into the
> >> + * allocator as it uses sleeping locks. In fact, even
> >> + * if the caller of kfree_rcu() is preemptible, this
> >> + * path still is not, as krcp->lock is a raw spinlock.
> >> + * With additional page pre-allocation in the works,
> >> + * hitting this return is going to be much less likely.
> >> + */
> >> + if (IS_ENABLED(CONFIG_PREEMPT_RT))
> >> + return false;
> >> +
> >> bnode = (struct kfree_rcu_bulk_data *)
> >> __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> >> }
> >This will not be correctly working by just reverting everything to the
> >"rcu_head path" for CONFIG_PREEMPT_RT kernel. We need to preallocate at
> >least once. I can add caching on top of this series to address it.
>
> I discussed with Vlad offline, this patch is fine for mainline as we
> don't have headless support yet. So this patch is good. Future patches
> adding caching will also add the headless support after additional
> caching, so skipping the allocation here is ok.

Sounds good!

But would one of the realtime guys be willing to give a Tested-by?

Thanx, Paul

> Thanks.
>
> - Joel
>
>
>
>
> >--
> >Vlad Rezki
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.

2020-04-22 14:37:35

by Joel Fernandes

Subject: Re: [PATCH rcu/dev -fixes 2/4] rcu/tree: Skip entry into the page allocator for PREEMPT_RT



On April 22, 2020 6:35:36 AM EDT, Uladzislau Rezki <[email protected]> wrote:
>On Mon, Apr 20, 2020 at 11:38:35AM -0400, Joel Fernandes (Google)
>wrote:
>> To keep kfree_rcu() path working on raw non-preemptible sections,
>> prevent the optional entry into the allocator as it uses sleeping
>locks.
>> In fact, even if the caller of kfree_rcu() is preemptible, this path
>> still is not, as krcp->lock is a raw spinlock as done in previous
>> patches. With additional page pre-allocation in the works, hitting
>this
>> return is going to be much less likely soon so just prevent it for
>now
>> so that PREEMPT_RT does not break. Note that page allocation here is
>an
>> optimization and skipping it still makes kfree_rcu() work.
>>
>> Cc: Sebastian Andrzej Siewior <[email protected]>
>> Co-developed-by: Uladzislau Rezki <[email protected]>
>> Signed-off-by: Uladzislau Rezki <[email protected]>
>> Signed-off-by: Joel Fernandes (Google) <[email protected]>
>> ---
>> kernel/rcu/tree.c | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index cf68d3d9f5b81..cd61649e1b001 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -3092,6 +3092,18 @@ kfree_call_rcu_add_ptr_to_bulk(struct
>kfree_rcu_cpu *krcp,
>> if (!bnode) {
>> WARN_ON_ONCE(sizeof(struct kfree_rcu_bulk_data) > PAGE_SIZE);
>>
>> + /*
>> + * To keep this path working on raw non-preemptible
>> + * sections, prevent the optional entry into the
>> + * allocator as it uses sleeping locks. In fact, even
>> + * if the caller of kfree_rcu() is preemptible, this
>> + * path still is not, as krcp->lock is a raw spinlock.
>> + * With additional page pre-allocation in the works,
>> + * hitting this return is going to be much less likely.
>> + */
>> + if (IS_ENABLED(CONFIG_PREEMPT_RT))
>> + return false;
>> +
>> bnode = (struct kfree_rcu_bulk_data *)
>> __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
>> }
>This will not be correctly working by just reverting everything to the
>"rcu_head path" for CONFIG_PREEMPT_RT kernel. We need to preallocate at
>least once. I can add caching on top of this series to address it.
>

I discussed with Vlad offline; this patch is fine for mainline as we don't have headless support yet. So this patch is good. Future patches adding caching will also add the headless support after additional caching, so skipping the allocation here is ok.

Thanks.

- Joel




>--
>Vlad Rezki

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.