2020-08-14 06:50:19

by Zhang, Qiang

Subject: [PATCH] rcu: shrink each possible cpu krcp

From: Zqiang <[email protected]>

Due to CPU hotplug, a CPU may go offline after kfree_call_rcu() has been
invoked. If the shrinker is triggered at that time, we should drain the
"krcp" of each possible CPU, not only the online ones.

Signed-off-by: Zqiang <[email protected]>
---
kernel/rcu/tree.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8ce77d9ac716..619ccbb3fe4b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
unsigned long count = 0;

/* Snapshot count of all CPUs */
- for_each_online_cpu(cpu) {
+ for_each_possible_cpu(cpu) {
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

count += READ_ONCE(krcp->count);
@@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
int cpu, freed = 0;
unsigned long flags;

- for_each_online_cpu(cpu) {
+ for_each_possible_cpu(cpu) {
int count;
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

@@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
int cpu;
unsigned long flags;

- for_each_online_cpu(cpu) {
+ for_each_possible_cpu(cpu) {
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

raw_spin_lock_irqsave(&krcp->lock, flags);
--
2.17.1


2020-08-14 21:01:36

by Uladzislau Rezki

Subject: Re: [PATCH] rcu: shrink each possible cpu krcp

> From: Zqiang <[email protected]>
>
> Due to cpu hotplug. some cpu may be offline after call "kfree_call_rcu"
> func, if the shrinker is triggered at this time, we should drain each
> possible cpu "krcp".
>
> Signed-off-by: Zqiang <[email protected]>
> ---
> kernel/rcu/tree.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8ce77d9ac716..619ccbb3fe4b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> unsigned long count = 0;
>
> /* Snapshot count of all CPUs */
> - for_each_online_cpu(cpu) {
> + for_each_possible_cpu(cpu) {
> struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>
> count += READ_ONCE(krcp->count);
> @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> int cpu, freed = 0;
> unsigned long flags;
>
> - for_each_online_cpu(cpu) {
> + for_each_possible_cpu(cpu) {
> int count;
> struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>
> @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> int cpu;
> unsigned long flags;
>
> - for_each_online_cpu(cpu) {
> + for_each_possible_cpu(cpu) {
> struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>
> raw_spin_lock_irqsave(&krcp->lock, flags);
>
I agree that it can happen.

Joel, what is your view?

Thanks!

--
Vlad Rezki

2020-08-17 23:16:11

by Joel Fernandes

Subject: Re: [PATCH] rcu: shrink each possible cpu krcp

On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <[email protected]> wrote:
>
> > From: Zqiang <[email protected]>
> >
> > Due to cpu hotplug. some cpu may be offline after call "kfree_call_rcu"
> > func, if the shrinker is triggered at this time, we should drain each
> > possible cpu "krcp".
> >
> > Signed-off-by: Zqiang <[email protected]>
> > ---
> > kernel/rcu/tree.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 8ce77d9ac716..619ccbb3fe4b 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > unsigned long count = 0;
> >
> > /* Snapshot count of all CPUs */
> > - for_each_online_cpu(cpu) {
> > + for_each_possible_cpu(cpu) {
> > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >
> > count += READ_ONCE(krcp->count);
> > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > int cpu, freed = 0;
> > unsigned long flags;
> >
> > - for_each_online_cpu(cpu) {
> > + for_each_possible_cpu(cpu) {
> > int count;
> > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >
> > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> > int cpu;
> > unsigned long flags;
> >
> > - for_each_online_cpu(cpu) {
> > + for_each_possible_cpu(cpu) {
> > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >
> > raw_spin_lock_irqsave(&krcp->lock, flags);
> >
> I agree that it can happen.
>
> Joel, what is your view?

Yes, I also think it is possible. The patch LGTM. Another fix could be
to drain the caches in the CPU offline path and save the memory. But
then it will take a hit during __get_free_page(). If CPU
offlining/onlining is not frequent, then it will save the lost memory.
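
To make that alternative concrete, here is a rough sketch of a CPU-hotplug
teardown callback (illustrative only: the function name is made up, while
krc, krcp->lock, krcp->monitor_todo and kfree_rcu_drain_unlock() are the
existing pieces in kernel/rcu/tree.c):

static int kfree_rcu_cpu_offline(unsigned int cpu)
{
        unsigned long flags;
        struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

        /* Hand any objects cached on this CPU off to RCU. */
        raw_spin_lock_irqsave(&krcp->lock, flags);
        if (krcp->monitor_todo)
                kfree_rcu_drain_unlock(krcp, flags);
        else
                raw_spin_unlock_irqrestore(&krcp->lock, flags);
        return 0;
}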

I wonder how other per-cpu caches in the kernel work in such scenarios.

Thoughts?

thanks,

- Joel

2020-08-18 17:19:20

by Paul E. McKenney

Subject: Re: [PATCH] rcu: shrink each possible cpu krcp

On Mon, Aug 17, 2020 at 06:03:54PM -0400, Joel Fernandes wrote:
> On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <[email protected]> wrote:
> >
> > > From: Zqiang <[email protected]>
> > >
> > > Due to cpu hotplug. some cpu may be offline after call "kfree_call_rcu"
> > > func, if the shrinker is triggered at this time, we should drain each
> > > possible cpu "krcp".
> > >
> > > Signed-off-by: Zqiang <[email protected]>
> > > ---
> > > kernel/rcu/tree.c | 6 +++---
> > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 8ce77d9ac716..619ccbb3fe4b 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > unsigned long count = 0;
> > >
> > > /* Snapshot count of all CPUs */
> > > - for_each_online_cpu(cpu) {
> > > + for_each_possible_cpu(cpu) {
> > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > >
> > > count += READ_ONCE(krcp->count);
> > > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > int cpu, freed = 0;
> > > unsigned long flags;
> > >
> > > - for_each_online_cpu(cpu) {
> > > + for_each_possible_cpu(cpu) {
> > > int count;
> > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > >
> > > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> > > int cpu;
> > > unsigned long flags;
> > >
> > > - for_each_online_cpu(cpu) {
> > > + for_each_possible_cpu(cpu) {
> > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > >
> > > raw_spin_lock_irqsave(&krcp->lock, flags);
> > >
> > I agree that it can happen.
> >
> > Joel, what is your view?
>
> Yes I also think it is possible. The patch LGTM. Another fix could be
> to drain the caches in the CPU offline path and save the memory. But
> then it will take hit during __get_free_page(). If CPU
> offlining/online is not frequent, then it will save the lost memory.
>
> I wonder how other per-cpu caches in the kernel work in such scenarios.
>
> Thoughts?

Do I count this as an ack or a review? If not, what precisely would
you like the submitter to do differently?

Thanx, Paul

2020-08-18 19:02:01

by Joel Fernandes

Subject: Re: [PATCH] rcu: shrink each possible cpu krcp

On Tue, Aug 18, 2020 at 1:18 PM Paul E. McKenney <[email protected]> wrote:
>
> On Mon, Aug 17, 2020 at 06:03:54PM -0400, Joel Fernandes wrote:
> > On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <[email protected]> wrote:
> > >
> > > > From: Zqiang <[email protected]>
> > > >
> > > > Due to cpu hotplug. some cpu may be offline after call "kfree_call_rcu"
> > > > func, if the shrinker is triggered at this time, we should drain each
> > > > possible cpu "krcp".
> > > >
> > > > Signed-off-by: Zqiang <[email protected]>
> > > > ---
> > > > kernel/rcu/tree.c | 6 +++---
> > > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 8ce77d9ac716..619ccbb3fe4b 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > unsigned long count = 0;
> > > >
> > > > /* Snapshot count of all CPUs */
> > > > - for_each_online_cpu(cpu) {
> > > > + for_each_possible_cpu(cpu) {
> > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > >
> > > > count += READ_ONCE(krcp->count);
> > > > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > > int cpu, freed = 0;
> > > > unsigned long flags;
> > > >
> > > > - for_each_online_cpu(cpu) {
> > > > + for_each_possible_cpu(cpu) {
> > > > int count;
> > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > >
> > > > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> > > > int cpu;
> > > > unsigned long flags;
> > > >
> > > > - for_each_online_cpu(cpu) {
> > > > + for_each_possible_cpu(cpu) {
> > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > >
> > > > raw_spin_lock_irqsave(&krcp->lock, flags);
> > > >
> > > I agree that it can happen.
> > >
> > > Joel, what is your view?
> >
> > Yes I also think it is possible. The patch LGTM. Another fix could be
> > to drain the caches in the CPU offline path and save the memory. But
> > then it will take hit during __get_free_page(). If CPU
> > offlining/online is not frequent, then it will save the lost memory.
> >
> > I wonder how other per-cpu caches in the kernel work in such scenarios.
> >
> > Thoughts?
>
> Do I count this as an ack or a review? If not, what precisely would
> you like the submitter to do differently?

Hi Paul,
The patch is correct and is definitely an improvement. I was thinking
about whether we should always do what the patch is doing when
offlining CPUs to save memory but now I feel that may not be that much
of a win to justify more complexity.

You can take it with my ack:

Acked-by: Joel Fernandes <[email protected]>

thanks,

- Joel

2020-08-18 21:25:04

by Paul E. McKenney

Subject: Re: [PATCH] rcu: shrink each possible cpu krcp

On Tue, Aug 18, 2020 at 03:00:35PM -0400, Joel Fernandes wrote:
> On Tue, Aug 18, 2020 at 1:18 PM Paul E. McKenney <[email protected]> wrote:
> >
> > On Mon, Aug 17, 2020 at 06:03:54PM -0400, Joel Fernandes wrote:
> > > On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <[email protected]> wrote:
> > > >
> > > > > From: Zqiang <[email protected]>
> > > > >
> > > > > Due to cpu hotplug. some cpu may be offline after call "kfree_call_rcu"
> > > > > func, if the shrinker is triggered at this time, we should drain each
> > > > > possible cpu "krcp".
> > > > >
> > > > > Signed-off-by: Zqiang <[email protected]>
> > > > > ---
> > > > > kernel/rcu/tree.c | 6 +++---
> > > > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 8ce77d9ac716..619ccbb3fe4b 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > unsigned long count = 0;
> > > > >
> > > > > /* Snapshot count of all CPUs */
> > > > > - for_each_online_cpu(cpu) {
> > > > > + for_each_possible_cpu(cpu) {
> > > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > >
> > > > > count += READ_ONCE(krcp->count);
> > > > > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > > > int cpu, freed = 0;
> > > > > unsigned long flags;
> > > > >
> > > > > - for_each_online_cpu(cpu) {
> > > > > + for_each_possible_cpu(cpu) {
> > > > > int count;
> > > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > >
> > > > > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> > > > > int cpu;
> > > > > unsigned long flags;
> > > > >
> > > > > - for_each_online_cpu(cpu) {
> > > > > + for_each_possible_cpu(cpu) {
> > > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > >
> > > > > raw_spin_lock_irqsave(&krcp->lock, flags);
> > > > >
> > > > I agree that it can happen.
> > > >
> > > > Joel, what is your view?
> > >
> > > Yes I also think it is possible. The patch LGTM. Another fix could be
> > > to drain the caches in the CPU offline path and save the memory. But
> > > then it will take hit during __get_free_page(). If CPU
> > > offlining/online is not frequent, then it will save the lost memory.
> > >
> > > I wonder how other per-cpu caches in the kernel work in such scenarios.
> > >
> > > Thoughts?
> >
> > Do I count this as an ack or a review? If not, what precisely would
> > you like the submitter to do differently?
>
> Hi Paul,
> The patch is correct and is definitely an improvement. I was thinking
> about whether we should always do what the patch is doing when
> offlining CPUs to save memory but now I feel that may not be that much
> of a win to justify more complexity.
>
> You can take it with my ack:
>
> Acked-by: Joel Fernandes <[email protected]>

Thank you all! I wordsmithed a bit as shown below, so please let
me know if I messed anything up.

Thanx, Paul

------------------------------------------------------------------------

commit fe5d89cc025b3efe682cac122bc4d39f4722821e
Author: Zqiang <[email protected]>
Date: Fri Aug 14 14:45:57 2020 +0800

rcu: Shrink each possible cpu krcp

CPUs can go offline shortly after kfree_call_rcu() has been invoked,
which can leave memory stranded until those CPUs come back online.
This commit therefore drains the krcp of each CPU, not just the
ones that happen to be online.

Acked-by: Joel Fernandes <[email protected]>
Signed-off-by: Zqiang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 02ca8e5..d9f90f6 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3500,7 +3500,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
unsigned long count = 0;

/* Snapshot count of all CPUs */
- for_each_online_cpu(cpu) {
+ for_each_possible_cpu(cpu) {
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

count += READ_ONCE(krcp->count);
@@ -3515,7 +3515,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
int cpu, freed = 0;
unsigned long flags;

- for_each_online_cpu(cpu) {
+ for_each_possible_cpu(cpu) {
int count;
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

@@ -3548,7 +3548,7 @@ void __init kfree_rcu_scheduler_running(void)
int cpu;
unsigned long flags;

- for_each_online_cpu(cpu) {
+ for_each_possible_cpu(cpu) {
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

raw_spin_lock_irqsave(&krcp->lock, flags);

2020-08-18 21:56:47

by Uladzislau Rezki

Subject: Re: [PATCH] rcu: shrink each possible cpu krcp

> On Tue, Aug 18, 2020 at 03:00:35PM -0400, Joel Fernandes wrote:
> > On Tue, Aug 18, 2020 at 1:18 PM Paul E. McKenney <[email protected]> wrote:
> > >
> > > On Mon, Aug 17, 2020 at 06:03:54PM -0400, Joel Fernandes wrote:
> > > > On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <[email protected]> wrote:
> > > > >
> > > > > > From: Zqiang <[email protected]>
> > > > > >
> > > > > > Due to cpu hotplug. some cpu may be offline after call "kfree_call_rcu"
> > > > > > func, if the shrinker is triggered at this time, we should drain each
> > > > > > possible cpu "krcp".
> > > > > >
> > > > > > Signed-off-by: Zqiang <[email protected]>
> > > > > > ---
> > > > > > kernel/rcu/tree.c | 6 +++---
> > > > > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 8ce77d9ac716..619ccbb3fe4b 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > unsigned long count = 0;
> > > > > >
> > > > > > /* Snapshot count of all CPUs */
> > > > > > - for_each_online_cpu(cpu) {
> > > > > > + for_each_possible_cpu(cpu) {
> > > > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > >
> > > > > > count += READ_ONCE(krcp->count);
> > > > > > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > int cpu, freed = 0;
> > > > > > unsigned long flags;
> > > > > >
> > > > > > - for_each_online_cpu(cpu) {
> > > > > > + for_each_possible_cpu(cpu) {
> > > > > > int count;
> > > > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > >
> > > > > > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> > > > > > int cpu;
> > > > > > unsigned long flags;
> > > > > >
> > > > > > - for_each_online_cpu(cpu) {
> > > > > > + for_each_possible_cpu(cpu) {
> > > > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > >
> > > > > > raw_spin_lock_irqsave(&krcp->lock, flags);
> > > > > >
> > > > > I agree that it can happen.
> > > > >
> > > > > Joel, what is your view?
> > > >
> > > > Yes I also think it is possible. The patch LGTM. Another fix could be
> > > > to drain the caches in the CPU offline path and save the memory. But
> > > > then it will take hit during __get_free_page(). If CPU
> > > > offlining/online is not frequent, then it will save the lost memory.
> > > >
> > > > I wonder how other per-cpu caches in the kernel work in such scenarios.
> > > >
> > > > Thoughts?
> > >
> > > Do I count this as an ack or a review? If not, what precisely would
> > > you like the submitter to do differently?
> >
> > Hi Paul,
> > The patch is correct and is definitely an improvement. I was thinking
> > about whether we should always do what the patch is doing when
> > offlining CPUs to save memory but now I feel that may not be that much
> > of a win to justify more complexity.
> >
> > You can take it with my ack:
> >
> > Acked-by: Joel Fernandes <[email protected]>
>
> Thank you all! I wordsmithed a bit as shown below, so please let
> me know if I messed anything up.
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit fe5d89cc025b3efe682cac122bc4d39f4722821e
> Author: Zqiang <[email protected]>
> Date: Fri Aug 14 14:45:57 2020 +0800
>
> rcu: Shrink each possible cpu krcp
>
> CPUs can go offline shortly after kfree_call_rcu() has been invoked,
> which can leave memory stranded until those CPUs come back online.
> This commit therefore drains the kcrp of each CPU, not just the
> ones that happen to be online.
>
> Acked-by: Joel Fernandes <[email protected]>
> Signed-off-by: Zqiang <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 02ca8e5..d9f90f6 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3500,7 +3500,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> unsigned long count = 0;
>
> /* Snapshot count of all CPUs */
> - for_each_online_cpu(cpu) {
> + for_each_possible_cpu(cpu) {
> struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>
> count += READ_ONCE(krcp->count);
> @@ -3515,7 +3515,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> int cpu, freed = 0;
> unsigned long flags;
>
> - for_each_online_cpu(cpu) {
> + for_each_possible_cpu(cpu) {
> int count;
> struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>
> @@ -3548,7 +3548,7 @@ void __init kfree_rcu_scheduler_running(void)
> int cpu;
> unsigned long flags;
>
> - for_each_online_cpu(cpu) {
> + for_each_possible_cpu(cpu) {
> struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
>
> raw_spin_lock_irqsave(&krcp->lock, flags);
>

Should we just clean the krc of a CPU when it goes offline?

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b8ccd7b5af82..6decb9ad2421 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
{
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
+ struct kfree_rcu_cpu *krcp;

if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
return 0;

+ /* Drain the krcp of this CPU. IRQs should be disabled? */
+ krcp = this_cpu_ptr(&krc);
+ schedule_delayed_work(&krcp->monitor_work, 0);
+

A CPU can be offlined and its krcp will be stuck until a shrinker is invoked.
Maybe never.

--
Vlad Rezki

2020-08-18 22:04:43

by Paul E. McKenney

Subject: Re: [PATCH] rcu: shrink each possible cpu krcp

On Tue, Aug 18, 2020 at 11:55:11PM +0200, Uladzislau Rezki wrote:
> > On Tue, Aug 18, 2020 at 03:00:35PM -0400, Joel Fernandes wrote:
> > > On Tue, Aug 18, 2020 at 1:18 PM Paul E. McKenney <[email protected]> wrote:
> > > >
> > > > On Mon, Aug 17, 2020 at 06:03:54PM -0400, Joel Fernandes wrote:
> > > > > On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <[email protected]> wrote:
> > > > > >
> > > > > > > From: Zqiang <[email protected]>
> > > > > > >
> > > > > > > Due to cpu hotplug. some cpu may be offline after call "kfree_call_rcu"
> > > > > > > func, if the shrinker is triggered at this time, we should drain each
> > > > > > > possible cpu "krcp".
> > > > > > >
> > > > > > > Signed-off-by: Zqiang <[email protected]>
> > > > > > > ---
> > > > > > > kernel/rcu/tree.c | 6 +++---
> > > > > > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > index 8ce77d9ac716..619ccbb3fe4b 100644
> > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > > unsigned long count = 0;
> > > > > > >
> > > > > > > /* Snapshot count of all CPUs */
> > > > > > > - for_each_online_cpu(cpu) {
> > > > > > > + for_each_possible_cpu(cpu) {
> > > > > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > > >
> > > > > > > count += READ_ONCE(krcp->count);
> > > > > > > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > > int cpu, freed = 0;
> > > > > > > unsigned long flags;
> > > > > > >
> > > > > > > - for_each_online_cpu(cpu) {
> > > > > > > + for_each_possible_cpu(cpu) {
> > > > > > > int count;
> > > > > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > > >
> > > > > > > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> > > > > > > int cpu;
> > > > > > > unsigned long flags;
> > > > > > >
> > > > > > > - for_each_online_cpu(cpu) {
> > > > > > > + for_each_possible_cpu(cpu) {
> > > > > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > > >
> > > > > > > raw_spin_lock_irqsave(&krcp->lock, flags);
> > > > > > >
> > > > > > I agree that it can happen.
> > > > > >
> > > > > > Joel, what is your view?
> > > > >
> > > > > Yes I also think it is possible. The patch LGTM. Another fix could be
> > > > > to drain the caches in the CPU offline path and save the memory. But
> > > > > then it will take hit during __get_free_page(). If CPU
> > > > > offlining/online is not frequent, then it will save the lost memory.
> > > > >
> > > > > I wonder how other per-cpu caches in the kernel work in such scenarios.
> > > > >
> > > > > Thoughts?
> > > >
> > > > Do I count this as an ack or a review? If not, what precisely would
> > > > you like the submitter to do differently?
> > >
> > > Hi Paul,
> > > The patch is correct and is definitely an improvement. I was thinking
> > > about whether we should always do what the patch is doing when
> > > offlining CPUs to save memory but now I feel that may not be that much
> > > of a win to justify more complexity.
> > >
> > > You can take it with my ack:
> > >
> > > Acked-by: Joel Fernandes <[email protected]>
> >
> > Thank you all! I wordsmithed a bit as shown below, so please let
> > me know if I messed anything up.
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit fe5d89cc025b3efe682cac122bc4d39f4722821e
> > Author: Zqiang <[email protected]>
> > Date: Fri Aug 14 14:45:57 2020 +0800
> >
> > rcu: Shrink each possible cpu krcp
> >
> > CPUs can go offline shortly after kfree_call_rcu() has been invoked,
> > which can leave memory stranded until those CPUs come back online.
> > This commit therefore drains the kcrp of each CPU, not just the
> > ones that happen to be online.
> >
> > Acked-by: Joel Fernandes <[email protected]>
> > Signed-off-by: Zqiang <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 02ca8e5..d9f90f6 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3500,7 +3500,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > unsigned long count = 0;
> >
> > /* Snapshot count of all CPUs */
> > - for_each_online_cpu(cpu) {
> > + for_each_possible_cpu(cpu) {
> > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >
> > count += READ_ONCE(krcp->count);
> > @@ -3515,7 +3515,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > int cpu, freed = 0;
> > unsigned long flags;
> >
> > - for_each_online_cpu(cpu) {
> > + for_each_possible_cpu(cpu) {
> > int count;
> > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >
> > @@ -3548,7 +3548,7 @@ void __init kfree_rcu_scheduler_running(void)
> > int cpu;
> > unsigned long flags;
> >
> > - for_each_online_cpu(cpu) {
> > + for_each_possible_cpu(cpu) {
> > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >
> > raw_spin_lock_irqsave(&krcp->lock, flags);
> >
>
> Should we just clean a krc of a CPU when it goes offline?

That is equally valid from my viewpoint.

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b8ccd7b5af82..6decb9ad2421 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> {
> struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> + struct kfree_rcu_cpu *krcp;
>
> if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> return 0;
>
> + /* Drain the kcrp of this CPU. IRQs should be disabled? */
> + krcp = this_cpu_ptr(&krc)
> + schedule_delayed_work(&krcp->monitor_work, 0);
> +
>
> A cpu can be offlined and its krp will be stuck until a shrinker is involved.
> Maybe be never.

Does the same apply to its kmalloc() per-CPU caches? If so, I have a
hard time getting too worried about it. ;-)

Thanx, Paul

2020-08-19 00:06:47

by Joel Fernandes

Subject: Re: [PATCH] rcu: shrink each possible cpu krcp

On Tue, Aug 18, 2020 at 11:55:11PM +0200, Uladzislau Rezki wrote:
> > On Tue, Aug 18, 2020 at 03:00:35PM -0400, Joel Fernandes wrote:
> > > On Tue, Aug 18, 2020 at 1:18 PM Paul E. McKenney <[email protected]> wrote:
> > > >
> > > > On Mon, Aug 17, 2020 at 06:03:54PM -0400, Joel Fernandes wrote:
> > > > > On Fri, Aug 14, 2020 at 2:51 PM Uladzislau Rezki <[email protected]> wrote:
> > > > > >
> > > > > > > From: Zqiang <[email protected]>
> > > > > > >
> > > > > > > Due to cpu hotplug. some cpu may be offline after call "kfree_call_rcu"
> > > > > > > func, if the shrinker is triggered at this time, we should drain each
> > > > > > > possible cpu "krcp".
> > > > > > >
> > > > > > > Signed-off-by: Zqiang <[email protected]>
> > > > > > > ---
> > > > > > > kernel/rcu/tree.c | 6 +++---
> > > > > > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > index 8ce77d9ac716..619ccbb3fe4b 100644
> > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > @@ -3443,7 +3443,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > > unsigned long count = 0;
> > > > > > >
> > > > > > > /* Snapshot count of all CPUs */
> > > > > > > - for_each_online_cpu(cpu) {
> > > > > > > + for_each_possible_cpu(cpu) {
> > > > > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > > >
> > > > > > > count += READ_ONCE(krcp->count);
> > > > > > > @@ -3458,7 +3458,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > > > > > > int cpu, freed = 0;
> > > > > > > unsigned long flags;
> > > > > > >
> > > > > > > - for_each_online_cpu(cpu) {
> > > > > > > + for_each_possible_cpu(cpu) {
> > > > > > > int count;
> > > > > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > > >
> > > > > > > @@ -3491,7 +3491,7 @@ void __init kfree_rcu_scheduler_running(void)
> > > > > > > int cpu;
> > > > > > > unsigned long flags;
> > > > > > >
> > > > > > > - for_each_online_cpu(cpu) {
> > > > > > > + for_each_possible_cpu(cpu) {
> > > > > > > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> > > > > > >
> > > > > > > raw_spin_lock_irqsave(&krcp->lock, flags);
> > > > > > >
> > > > > > I agree that it can happen.
> > > > > >
> > > > > > Joel, what is your view?
> > > > >
> > > > > Yes I also think it is possible. The patch LGTM. Another fix could be
> > > > > to drain the caches in the CPU offline path and save the memory. But
> > > > > then it will take hit during __get_free_page(). If CPU
> > > > > offlining/online is not frequent, then it will save the lost memory.
> > > > >
> > > > > I wonder how other per-cpu caches in the kernel work in such scenarios.
> > > > >
> > > > > Thoughts?
> > > >
> > > > Do I count this as an ack or a review? If not, what precisely would
> > > > you like the submitter to do differently?
> > >
> > > Hi Paul,
> > > The patch is correct and is definitely an improvement. I was thinking
> > > about whether we should always do what the patch is doing when
> > > offlining CPUs to save memory but now I feel that may not be that much
> > > of a win to justify more complexity.
> > >
> > > You can take it with my ack:
> > >
> > > Acked-by: Joel Fernandes <[email protected]>
> >
> > Thank you all! I wordsmithed a bit as shown below, so please let
> > me know if I messed anything up.
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit fe5d89cc025b3efe682cac122bc4d39f4722821e
> > Author: Zqiang <[email protected]>
> > Date: Fri Aug 14 14:45:57 2020 +0800
> >
> > rcu: Shrink each possible cpu krcp
> >
> > CPUs can go offline shortly after kfree_call_rcu() has been invoked,
> > which can leave memory stranded until those CPUs come back online.
> > This commit therefore drains the kcrp of each CPU, not just the
> > ones that happen to be online.
> >
> > Acked-by: Joel Fernandes <[email protected]>
> > Signed-off-by: Zqiang <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 02ca8e5..d9f90f6 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3500,7 +3500,7 @@ kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> > unsigned long count = 0;
> >
> > /* Snapshot count of all CPUs */
> > - for_each_online_cpu(cpu) {
> > + for_each_possible_cpu(cpu) {
> > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >
> > count += READ_ONCE(krcp->count);
> > @@ -3515,7 +3515,7 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> > int cpu, freed = 0;
> > unsigned long flags;
> >
> > - for_each_online_cpu(cpu) {
> > + for_each_possible_cpu(cpu) {
> > int count;
> > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >
> > @@ -3548,7 +3548,7 @@ void __init kfree_rcu_scheduler_running(void)
> > int cpu;
> > unsigned long flags;
> >
> > - for_each_online_cpu(cpu) {
> > + for_each_possible_cpu(cpu) {
> > struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
> >
> > raw_spin_lock_irqsave(&krcp->lock, flags);
> >
>
> Should we just clean a krc of a CPU when it goes offline?
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b8ccd7b5af82..6decb9ad2421 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> {
> struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> + struct kfree_rcu_cpu *krcp;
>
> if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> return 0;
>
> + /* Drain the kcrp of this CPU. IRQs should be disabled? */
> + krcp = this_cpu_ptr(&krc)
> + schedule_delayed_work(&krcp->monitor_work, 0);
> +
>
> A cpu can be offlined and its krp will be stuck until a shrinker is involved.
> Maybe be never.
>

Yes that is a bug as we discussed on IRC, thanks for following up as well.

We need to acquire krcp->lock too; if no monitor is scheduled, there is
nothing to do, so it does not race with kfree_rcu_work. So, the same as what
the shrinker does:

        raw_spin_lock_irqsave(&krcp->lock, flags);
        if (krcp->monitor_todo)
                kfree_rcu_drain_unlock(krcp, flags);
        else
                raw_spin_unlock_irqrestore(&krcp->lock, flags);

thanks!

- Joel

2020-08-19 00:15:44

by Joel Fernandes

Subject: Re: [PATCH] rcu: shrink each possible cpu krcp

On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <[email protected]> wrote:

> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index b8ccd7b5af82..6decb9ad2421 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > {
> > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> > + struct kfree_rcu_cpu *krcp;
> >
> > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > return 0;
> >
> > + /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > + krcp = this_cpu_ptr(&krc)
> > + schedule_delayed_work(&krcp->monitor_work, 0);
> > +
> >
> > A cpu can be offlined and its krp will be stuck until a shrinker is involved.
> > Maybe be never.
>
> Does the same apply to its kmalloc() per-CPU caches? If so, I have a
> hard time getting too worried about it. ;-)

Looking at slab_offline_cpu(), that calls cancel_delayed_work_sync()
on the cache reaper, whose job is to flush the per-cpu caches. So I
believe during CPU offlining, the per-cpu slab caches are flushed.

thanks,

- Joel

2020-08-19 03:02:37

by Zhang, Qiang

Subject: Reply: [PATCH] rcu: shrink each possible cpu krcp



________________________________________
??????: [email protected] <[email protected]> ???? Joel Fernandes <[email protected]>
????ʱ??: 2020??8??19?? 8:04
?ռ???: Paul E. McKenney
????: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
????: Re: [PATCH] rcu: shrink each possible cpu krcp

On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <[email protected]> wrote:

> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index b8ccd7b5af82..6decb9ad2421 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > {
> > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> > + struct kfree_rcu_cpu *krcp;
> >
> > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > return 0;
> >
> > + /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > + krcp = this_cpu_ptr(&krc)
> > + schedule_delayed_work(&krcp->monitor_work, 0);
> > +
> >
> > A cpu can be offlined and its krp will be stuck until a shrinker is involved.
> > Maybe be never.
>
> Does the same apply to its kmalloc() per-CPU caches? If so, I have a
> hard time getting too worried about it. ;-)

>Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
>on the cache reaper who's job is to flush the per-cpu caches. So I
>believe during CPU offlining, the per-cpu slab caches are flushed.
>
>thanks,
>
>- Joel

When a CPU goes offline, slub or slab only flushes the free objects in the
offline CPU's cache, putting those free objects on the node list or returning
them to the buddy system; objects that are still in use stay in the offline
CPU's cache.

If we want to clean the per-cpu "krcp" objects when a CPU goes offline, we
should free the "krcp" cache objects in "rcutree_offline_cpu"; this function
is called before the other RCU CPU-offline functions, and
"rcutree_offline_cpu" runs in the "cpuhp/%u" per-cpu thread.

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8ce77d9ac716..1812d4a1ac1b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
unsigned long flags;
struct rcu_data *rdp;
struct rcu_node *rnp;
+ struct kfree_rcu_cpu *krcp;

rdp = per_cpu_ptr(&rcu_data, cpu);
rnp = rdp->mynode;
@@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)

// nohz_full CPUs need the tick for stop-machine to work quickly
tick_dep_set(TICK_DEP_BIT_RCU);
+
+ krcp = per_cpu_ptr(&krc, cpu);
+ raw_spin_lock_irqsave(&krcp->lock, flags);
+ schedule_delayed_work(&krcp->monitor_work, 0);
+ raw_spin_unlock_irqrestore(&krcp->lock, flags);
return 0;
}

thanks,

Zqiang

2020-08-19 11:25:07

by Uladzislau Rezki

Subject: Re: [PATCH] rcu: shrink each possible cpu krcp

On Tue, Aug 18, 2020 at 08:04:20PM -0400, Joel Fernandes wrote:
> On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <[email protected]> wrote:
>
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index b8ccd7b5af82..6decb9ad2421 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > {
> > > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> > > + struct kfree_rcu_cpu *krcp;
> > >
> > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > return 0;
> > >
> > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > + krcp = this_cpu_ptr(&krc)
> > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > +
> > >
> > > A cpu can be offlined and its krp will be stuck until a shrinker is involved.
> > > Maybe be never.
> >
> > Does the same apply to its kmalloc() per-CPU caches? If so, I have a
> > hard time getting too worried about it. ;-)
>
> Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> on the cache reaper who's job is to flush the per-cpu caches. So I
> believe during CPU offlining, the per-cpu slab caches are flushed.
>
SLAB does it for sure, same as the page allocator. There are special
CPU-offline callbacks for both cases to perform cleanup when a CPU dies.

--
Vlad Rezki

2020-08-19 13:09:01

by Paul E. McKenney

Subject: Re: Reply: [PATCH] rcu: shrink each possible cpu krcp

On Wed, Aug 19, 2020 at 03:00:55AM +0000, Zhang, Qiang wrote:
>
>
> ________________________________________
> From: [email protected] <[email protected]> on behalf of Joel Fernandes <[email protected]>
> Sent: August 19, 2020 8:04
> To: Paul E. McKenney
> Cc: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> Subject: Re: [PATCH] rcu: shrink each possible cpu krcp
>
> On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <[email protected]> wrote:
>
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index b8ccd7b5af82..6decb9ad2421 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > {
> > > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> > > + struct kfree_rcu_cpu *krcp;
> > >
> > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > return 0;
> > >
> > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > + krcp = this_cpu_ptr(&krc)
> > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > +
> > >
> > > A cpu can be offlined and its krp will be stuck until a shrinker is involved.
> > > Maybe be never.
> >
> > Does the same apply to its kmalloc() per-CPU caches? If so, I have a
> > hard time getting too worried about it. ;-)
>
> >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> >on the cache reaper who's job is to flush the per-cpu caches. So I
> >believe during CPU offlining, the per-cpu slab caches are flushed.
> >
> >thanks,
> >
> >- Joel
>
> When cpu going offline, the slub or slab only flush free objects in offline cpu cache, put these free objects in node list or return buddy system, for those who are still in use, they still stay offline cpu cache.
>
> If we want clean per-cpu "krcp" objects when cpu going offline.
> we should free "krcp" cache objects in "rcutree_offline_cpu", this func be called before other rcu cpu offline func. and then "rcutree_offline_cpu" will be called in "cpuhp/%u" per-cpu thread.

If Joel and Uladzislau are good with this, would you be willing to
resend with a commit log and your Signed-off-by?

Thanx, Paul

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8ce77d9ac716..1812d4a1ac1b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> unsigned long flags;
> struct rcu_data *rdp;
> struct rcu_node *rnp;
> + struct kfree_rcu_cpu *krcp;
>
> rdp = per_cpu_ptr(&rcu_data, cpu);
> rnp = rdp->mynode;
> @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
>
> // nohz_full CPUs need the tick for stop-machine to work quickly
> tick_dep_set(TICK_DEP_BIT_RCU);
> +
> + krcp = per_cpu_ptr(&krc, cpu);
> + raw_spin_lock_irqsave(&krcp->lock, flags);
> + schedule_delayed_work(&krcp->monitor_work, 0);
> + raw_spin_unlock_irqrestore(&krcp->lock, flags);
> return 0;
> }
>
> thanks,
>
> Zqiang

2020-08-19 13:29:07

by Joel Fernandes

Subject: Re: [PATCH] rcu: shrink each possible cpu krcp

On Wed, Aug 19, 2020 at 01:22:25PM +0200, Uladzislau Rezki wrote:
> On Tue, Aug 18, 2020 at 08:04:20PM -0400, Joel Fernandes wrote:
> > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <[email protected]> wrote:
> >
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index b8ccd7b5af82..6decb9ad2421 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > > {
> > > > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> > > > + struct kfree_rcu_cpu *krcp;
> > > >
> > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > > return 0;
> > > >
> > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > > + krcp = this_cpu_ptr(&krc)
> > > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > > +
> > > >
> > > > A cpu can be offlined and its krp will be stuck until a shrinker is involved.
> > > > Maybe be never.
> > >
> > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a
> > > hard time getting too worried about it. ;-)
> >
> > Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> > on the cache reaper who's job is to flush the per-cpu caches. So I
> > believe during CPU offlining, the per-cpu slab caches are flushed.
> >
> SLAB does it for sure, same as page allocator. There are special CPU-offline
> callbacks for both cases to perform cleanup when CPU dies.

Got it, thanks for confirming, makes sense.

thanks,

- Joel

2020-08-19 13:58:12

by Joel Fernandes

Subject: Re: Reply: [PATCH] rcu: shrink each possible cpu krcp

On Wed, Aug 19, 2020 at 03:00:55AM +0000, Zhang, Qiang wrote:
>
>
> ________________________________________
> From: [email protected] <[email protected]> on behalf of Joel Fernandes <[email protected]>
> Sent: August 19, 2020 8:04
> To: Paul E. McKenney
> Cc: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> Subject: Re: [PATCH] rcu: shrink each possible cpu krcp
>
> On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <[email protected]> wrote:
>
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index b8ccd7b5af82..6decb9ad2421 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > {
> > > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> > > + struct kfree_rcu_cpu *krcp;
> > >
> > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > return 0;
> > >
> > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > + krcp = this_cpu_ptr(&krc)
> > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > +
> > >
> > > A cpu can be offlined and its krp will be stuck until a shrinker is involved.
> > > Maybe be never.
> >
> > Does the same apply to its kmalloc() per-CPU caches? If so, I have a
> > hard time getting too worried about it. ;-)
>
> >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> >on the cache reaper who's job is to flush the per-cpu caches. So I
> >believe during CPU offlining, the per-cpu slab caches are flushed.
> >
> >thanks,
> >
> >- Joel
>
> When cpu going offline, the slub or slab only flush free objects in offline
> cpu cache, put these free objects in node list or return buddy system,
> for those who are still in use, they still stay offline cpu cache.
>
> If we want clean per-cpu "krcp" objects when cpu going offline. we should
> free "krcp" cache objects in "rcutree_offline_cpu", this func be called
> before other rcu cpu offline func. and then "rcutree_offline_cpu" will be
> called in "cpuhp/%u" per-cpu thread.
>

Could you please wrap text properly when you post to the mailing list? Thanks,
I fixed it for you above.

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8ce77d9ac716..1812d4a1ac1b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> unsigned long flags;
> struct rcu_data *rdp;
> struct rcu_node *rnp;
> + struct kfree_rcu_cpu *krcp;
>
> rdp = per_cpu_ptr(&rcu_data, cpu);
> rnp = rdp->mynode;
> @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
>
> // nohz_full CPUs need the tick for stop-machine to work quickly
> tick_dep_set(TICK_DEP_BIT_RCU);
> +
> + krcp = per_cpu_ptr(&krc, cpu);
> + raw_spin_lock_irqsave(&krcp->lock, flags);
> + schedule_delayed_work(&krcp->monitor_work, 0);
> + raw_spin_unlock_irqrestore(&krcp->lock, flags);
> return 0;

I realized the above is not good enough for what this is trying to do. Unlike
the slab, the new kfree_rcu objects cannot always be drained / submitted to
RCU because the previous batch may still be waiting for a grace period. So
the above code could very well return with the yet-to-be-submitted kfree_rcu
objects still in the cache.

One option is to spin-wait here for monitor_todo to be false and keep calling
kfree_rcu_drain_unlock() till then.

But then that's not good enough either, because if new objects are queued
when interrupts are enabled in the CPU offline path, then the cache will get
new objects after the previous set was drained. Further, spin waiting may
introduce deadlocks.

Another option is to switch the kfree_rcu() path to non-batching (so new
objects cannot be cached in the offline path and are submitted directly to
RCU), wait for a GP and then submit the work. But then not sure if 1-argument
kfree_rcu() will like that.

Probably Qiang's original fix using for_each_possible_cpu() is good enough
for the shrinker case, and then we can tackle the hotplug one.

thanks,

- Joel

2020-08-19 15:24:20

by Paul E. McKenney

Subject: Re: Reply: [PATCH] rcu: shrink each possible cpu krcp

On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote:
> On Wed, Aug 19, 2020 at 03:00:55AM +0000, Zhang, Qiang wrote:
> >
> >
> > ________________________________________
> > From: [email protected] <[email protected]> on behalf of Joel Fernandes <[email protected]>
> > Sent: August 19, 2020 8:04
> > To: Paul E. McKenney
> > Cc: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> > Subject: Re: [PATCH] rcu: shrink each possible cpu krcp
> >
> > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <[email protected]> wrote:
> >
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index b8ccd7b5af82..6decb9ad2421 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > > {
> > > > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> > > > + struct kfree_rcu_cpu *krcp;
> > > >
> > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > > return 0;
> > > >
> > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > > + krcp = this_cpu_ptr(&krc)
> > > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > > +
> > > >
> > > > A cpu can be offlined and its krp will be stuck until a shrinker is involved.
> > > > Maybe be never.
> > >
> > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a
> > > hard time getting too worried about it. ;-)
> >
> > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> > >on the cache reaper who's job is to flush the per-cpu caches. So I
> > >believe during CPU offlining, the per-cpu slab caches are flushed.
> > >
> > >thanks,
> > >
> > >- Joel
> >
> > When cpu going offline, the slub or slab only flush free objects in offline
> > cpu cache, put these free objects in node list or return buddy system,
> > for those who are still in use, they still stay offline cpu cache.
> >
> > If we want clean per-cpu "krcp" objects when cpu going offline. we should
> > free "krcp" cache objects in "rcutree_offline_cpu", this func be called
> > before other rcu cpu offline func. and then "rcutree_offline_cpu" will be
> > called in "cpuhp/%u" per-cpu thread.
> >
>
> Could you please wrap text properly when you post to mailing list, thanks. I
> fixed it for you above.
>
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 8ce77d9ac716..1812d4a1ac1b 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> > unsigned long flags;
> > struct rcu_data *rdp;
> > struct rcu_node *rnp;
> > + struct kfree_rcu_cpu *krcp;
> >
> > rdp = per_cpu_ptr(&rcu_data, cpu);
> > rnp = rdp->mynode;
> > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
> >
> > // nohz_full CPUs need the tick for stop-machine to work quickly
> > tick_dep_set(TICK_DEP_BIT_RCU);
> > +
> > + krcp = per_cpu_ptr(&krc, cpu);
> > + raw_spin_lock_irqsave(&krcp->lock, flags);
> > + schedule_delayed_work(&krcp->monitor_work, 0);
> > + raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > return 0;
>
> I realized the above is not good enough for what this is trying to do. Unlike
> the slab, the new kfree_rcu objects cannot always be drained / submitted to
> RCU because the previous batch may still be waiting for a grace period. So
> the above code could very well return with the yet-to-be-submitted kfree_rcu
> objects still in the cache.
>
> One option is to spin-wait here for monitor_todo to be false and keep calling
> kfree_rcu_drain_unlock() till then.
>
> But then that's not good enough either, because if new objects are queued
> when interrupts are enabled in the CPU offline path, then the cache will get
> new objects after the previous set was drained. Further, spin waiting may
> introduce deadlocks.
>
> Another option is to switch the kfree_rcu() path to non-batching (so new
> objects cannot be cached in the offline path and are submitted directly to
> RCU), wait for a GP and then submit the work. But then not sure if 1-argument
> kfree_rcu() will like that.

Or spawn a workqueue that does something like this:

1. Get any pending kvfree_rcu() requests sent off to RCU.

2. Do an rcu_barrier().

3. Do the cleanup actions.
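
As a sketch only (the work function and the offline_work field are made up
for illustration; krcp->lock, krcp->monitor_todo and kfree_rcu_drain_unlock()
are the existing pieces in kernel/rcu/tree.c):

static void kfree_rcu_offline_work_fn(struct work_struct *work)
{
        unsigned long flags;
        struct kfree_rcu_cpu *krcp =
                container_of(work, struct kfree_rcu_cpu, offline_work);

        /* 1. Get any pending kvfree_rcu() requests sent off to RCU. */
        raw_spin_lock_irqsave(&krcp->lock, flags);
        if (krcp->monitor_todo)
                kfree_rcu_drain_unlock(krcp, flags);
        else
                raw_spin_unlock_irqrestore(&krcp->lock, flags);

        /* 2. Wait for those requests' grace periods to elapse. */
        rcu_barrier();

        /* 3. Do the cleanup actions. */
}

The work item itself would be queued from the CPU-offline path.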

> Probably Qian's original fix for for_each_possible_cpus() is good enough for
> the shrinker case, and then we can tackle the hotplug one.

It might take some experimentation to find the best solution.

Thanx, Paul

2020-08-19 15:56:03

by Joel Fernandes

Subject: Re: Reply: [PATCH] rcu: shrink each possible cpu krcp

On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote:
> > On Wed, Aug 19, 2020 at 03:00:55AM +0000, Zhang, Qiang wrote:
> > >
> > >
> > > ________________________________________
> > > From: [email protected] <[email protected]> on behalf of Joel Fernandes <[email protected]>
> > > Sent: August 19, 2020 8:04
> > > To: Paul E. McKenney
> > > Cc: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> > > Subject: Re: [PATCH] rcu: shrink each possible cpu krcp
> > >
> > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <[email protected]> wrote:
> > >
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index b8ccd7b5af82..6decb9ad2421 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > > > {
> > > > > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> > > > > + struct kfree_rcu_cpu *krcp;
> > > > >
> > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > > > return 0;
> > > > >
> > > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > > > + krcp = this_cpu_ptr(&krc)
> > > > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > > > +
> > > > >
> > > > > A cpu can be offlined and its krp will be stuck until a shrinker is involved.
> > > > > Maybe be never.
> > > >
> > > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a
> > > > hard time getting too worried about it. ;-)
> > >
> > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> > > >on the cache reaper who's job is to flush the per-cpu caches. So I
> > > >believe during CPU offlining, the per-cpu slab caches are flushed.
> > > >
> > > >thanks,
> > > >
> > > >- Joel
> > >
> > > When cpu going offline, the slub or slab only flush free objects in offline
> > > cpu cache, put these free objects in node list or return buddy system,
> > > for those who are still in use, they still stay offline cpu cache.
> > >
> > > If we want clean per-cpu "krcp" objects when cpu going offline. we should
> > > free "krcp" cache objects in "rcutree_offline_cpu", this func be called
> > > before other rcu cpu offline func. and then "rcutree_offline_cpu" will be
> > > called in "cpuhp/%u" per-cpu thread.
> > >
> >
> > Could you please wrap text properly when you post to mailing list, thanks. I
> > fixed it for you above.
> >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 8ce77d9ac716..1812d4a1ac1b 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > unsigned long flags;
> > > struct rcu_data *rdp;
> > > struct rcu_node *rnp;
> > > + struct kfree_rcu_cpu *krcp;
> > >
> > > rdp = per_cpu_ptr(&rcu_data, cpu);
> > > rnp = rdp->mynode;
> > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
> > >
> > > // nohz_full CPUs need the tick for stop-machine to work quickly
> > > tick_dep_set(TICK_DEP_BIT_RCU);
> > > +
> > > + krcp = per_cpu_ptr(&krc, cpu);
> > > + raw_spin_lock_irqsave(&krcp->lock, flags);
> > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > + raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > > return 0;
> >
> > I realized the above is not good enough for what this is trying to do. Unlike
> > the slab, the new kfree_rcu objects cannot always be drained / submitted to
> > RCU because the previous batch may still be waiting for a grace period. So
> > the above code could very well return with the yet-to-be-submitted kfree_rcu
> > objects still in the cache.
> >
> > One option is to spin-wait here for monitor_todo to be false and keep calling
> > kfree_rcu_drain_unlock() till then.
> >
> > But then that's not good enough either, because if new objects are queued
> > when interrupts are enabled in the CPU offline path, then the cache will get
> > new objects after the previous set was drained. Further, spin waiting may
> > introduce deadlocks.
> >
> > Another option is to switch the kfree_rcu() path to non-batching (so new
> > objects cannot be cached in the offline path and are submitted directly to
> > RCU), wait for a GP and then submit the work. But then not sure if 1-argument
> > kfree_rcu() will like that.
>
> Or spawn a workqueue that does something like this:
>
> 1. Get any pending kvfree_rcu() requests sent off to RCU.
>
> 2. Do an rcu_barrier().
>
> 3. Do the cleanup actions.

One more thing I'm not sure about is whether this will even be an issue,
because the delayed work that runs the monitor could be migrated to another
CPU (both timers and workqueues get migrated). So in that case, the draining
of the kfree_rcu objects will happen on another CPU.

Adding Boqun as well, since I believe he knows a lot about wq and hotplug.

Basically my question is: does delayed_work get migrated to another CPU when
one CPU goes offline? If so, I am assuming all races between new
schedule_delayed_work() invocations and CPU offlining are handled as well,
such that at some point those new invocations are queued on another CPU.

> > Probably Qian's original fix for for_each_possible_cpus() is good enough for
> > the shrinker case, and then we can tackle the hotplug one.
>
> It might take some experimentation to find the best solution.

Sure.

thanks,

- Joel

2020-08-19 15:59:35

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: 回复 : [PATCH] rcu: shrink each possible cpu krcp

On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote:
> > On Wed, Aug 19, 2020 at 03:00:55AM +0000, Zhang, Qiang wrote:
> > >
> > >
> > > ________________________________________
> > > 发件人: [email protected] <[email protected]> 代表 Joel Fernandes <[email protected]>
> > > 发送时间: 2020年8月19日 8:04
> > > 收件人: Paul E. McKenney
> > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp
> > >
> > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <[email protected]> wrote:
> > >
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index b8ccd7b5af82..6decb9ad2421 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > > > {
> > > > > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> > > > > + struct kfree_rcu_cpu *krcp;
> > > > >
> > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > > > return 0;
> > > > >
> > > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > > > + krcp = this_cpu_ptr(&krc)
> > > > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > > > +
> > > > >
> > > > > A cpu can be offlined and its krp will be stuck until a shrinker is involved.
> > > > > Maybe be never.
> > > >
> > > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a
> > > > hard time getting too worried about it. ;-)
> > >
> > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> > > >on the cache reaper who's job is to flush the per-cpu caches. So I
> > > >believe during CPU offlining, the per-cpu slab caches are flushed.
> > > >
> > > >thanks,
> > > >
> > > >- Joel
> > >
> > > When cpu going offline, the slub or slab only flush free objects in offline
> > > cpu cache, put these free objects in node list or return buddy system,
> > > for those who are still in use, they still stay offline cpu cache.
> > >
> > > If we want clean per-cpu "krcp" objects when cpu going offline. we should
> > > free "krcp" cache objects in "rcutree_offline_cpu", this func be called
> > > before other rcu cpu offline func. and then "rcutree_offline_cpu" will be
> > > called in "cpuhp/%u" per-cpu thread.
> > >
> >
> > Could you please wrap text properly when you post to mailing list, thanks. I
> > fixed it for you above.
> >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 8ce77d9ac716..1812d4a1ac1b 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > unsigned long flags;
> > > struct rcu_data *rdp;
> > > struct rcu_node *rnp;
> > > + struct kfree_rcu_cpu *krcp;
> > >
> > > rdp = per_cpu_ptr(&rcu_data, cpu);
> > > rnp = rdp->mynode;
> > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
> > >
> > > // nohz_full CPUs need the tick for stop-machine to work quickly
> > > tick_dep_set(TICK_DEP_BIT_RCU);
> > > +
> > > + krcp = per_cpu_ptr(&krc, cpu);
> > > + raw_spin_lock_irqsave(&krcp->lock, flags);
> > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > + raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > > return 0;
> >
> > I realized the above is not good enough for what this is trying to do. Unlike
> > the slab, the new kfree_rcu objects cannot always be drained / submitted to
> > RCU because the previous batch may still be waiting for a grace period. So
> > the above code could very well return with the yet-to-be-submitted kfree_rcu
> > objects still in the cache.
> >
> > One option is to spin-wait here for monitor_todo to be false and keep calling
> > kfree_rcu_drain_unlock() till then.
> >
> > But then that's not good enough either, because if new objects are queued
> > when interrupts are enabled in the CPU offline path, then the cache will get
> > new objects after the previous set was drained. Further, spin waiting may
> > introduce deadlocks.
> >
> > Another option is to switch the kfree_rcu() path to non-batching (so new
> > objects cannot be cached in the offline path and are submitted directly to
> > RCU), wait for a GP and then submit the work. But then not sure if 1-argument
> > kfree_rcu() will like that.
>
> Or spawn a workqueue that does something like this:
>
> 1. Get any pending kvfree_rcu() requests sent off to RCU.
>
> 2. Do an rcu_barrier().
>
> 3. Do the cleanup actions.
>
> > Probably Qian's original fix for for_each_possible_cpus() is good enough for
> > the shrinker case, and then we can tackle the hotplug one.
>
> It might take some experimentation to find the best solution.
>

<snip>
static void do_idle(void)
{
...
while (!need_resched()) {
rmb();

local_irq_disable();

if (cpu_is_offline(cpu)) {
tick_nohz_idle_stop_tick();
cpuhp_report_idle_dead();
-> cpuhp_report_idle_dead(void)
-> rcu_report_dead(smp_processor_id());
arch_cpu_idle_dead();
}
...
<snip>

We have the rcu_report_dead() callback. When it gets called, IRQs are off
and the CPU in question is offline.

krcp = per_cpu_ptr(&krc, cpu);
raw_spin_lock_irqsave(&krcp->lock, flags);
krcp->monitor_todo = true;
schedule_delayed_work(&krcp->monitor_work, 0);
raw_spin_unlock_irqrestore(&krcp->lock, flags);

If there is a batch that is in progress, the job will rearm itself.
But I agree, it requires more experiments.
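
To make that concrete, here is a rough, untested sketch of where such a
drain could sit, simply placing the snippet above inside rcu_report_dead().
The krc, kfree_rcu_cpu, monitor_todo and monitor_work names are the ones
already in kernel/rcu/tree.c; this is only an illustration, not a proposed
patch:

<snip>
void rcu_report_dead(unsigned int cpu)
{
        unsigned long flags;
        struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

        /* Hand any cached kfree_rcu() objects to the monitor work. */
        raw_spin_lock_irqsave(&krcp->lock, flags);
        if (!krcp->monitor_todo) {
                krcp->monitor_todo = true;
                schedule_delayed_work(&krcp->monitor_work, 0);
        }
        raw_spin_unlock_irqrestore(&krcp->lock, flags);

        /* ... existing rcu_report_dead() body continues here ... */
}
<snip>

The monitor_todo check mirrors the one on the kfree_call_rcu() path, so the
work is only queued once.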


--
Vlad Rezki

2020-08-20 22:41:24

by Joel Fernandes

[permalink] [raw]
Subject: Re: 回复 : [PATCH] rcu: shrink each possible cpu krcp

On Wed, Aug 19, 2020 at 05:58:08PM +0200, Uladzislau Rezki wrote:
> On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote:
> > > On Wed, Aug 19, 2020 at 03:00:55AM +0000, Zhang, Qiang wrote:
> > > >
> > > >
> > > > ________________________________________
> > > > 发件人: [email protected] <[email protected]> 代表 Joel Fernandes <[email protected]>
> > > > 发送时间: 2020年8月19日 8:04
> > > > 收件人: Paul E. McKenney
> > > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> > > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp
> > > >
> > > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <[email protected]> wrote:
> > > >
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index b8ccd7b5af82..6decb9ad2421 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > > > > {
> > > > > > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> > > > > > + struct kfree_rcu_cpu *krcp;
> > > > > >
> > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > > > > return 0;
> > > > > >
> > > > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > > > > + krcp = this_cpu_ptr(&krc)
> > > > > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > > > > +
> > > > > >
> > > > > > A cpu can be offlined and its krp will be stuck until a shrinker is involved.
> > > > > > Maybe be never.
> > > > >
> > > > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a
> > > > > hard time getting too worried about it. ;-)
> > > >
> > > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> > > > >on the cache reaper who's job is to flush the per-cpu caches. So I
> > > > >believe during CPU offlining, the per-cpu slab caches are flushed.
> > > > >
> > > > >thanks,
> > > > >
> > > > >- Joel
> > > >
> > > > When cpu going offline, the slub or slab only flush free objects in offline
> > > > cpu cache, put these free objects in node list or return buddy system,
> > > > for those who are still in use, they still stay offline cpu cache.
> > > >
> > > > If we want clean per-cpu "krcp" objects when cpu going offline. we should
> > > > free "krcp" cache objects in "rcutree_offline_cpu", this func be called
> > > > before other rcu cpu offline func. and then "rcutree_offline_cpu" will be
> > > > called in "cpuhp/%u" per-cpu thread.
> > > >
> > >
> > > Could you please wrap text properly when you post to mailing list, thanks. I
> > > fixed it for you above.
> > >
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 8ce77d9ac716..1812d4a1ac1b 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > > unsigned long flags;
> > > > struct rcu_data *rdp;
> > > > struct rcu_node *rnp;
> > > > + struct kfree_rcu_cpu *krcp;
> > > >
> > > > rdp = per_cpu_ptr(&rcu_data, cpu);
> > > > rnp = rdp->mynode;
> > > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > >
> > > > // nohz_full CPUs need the tick for stop-machine to work quickly
> > > > tick_dep_set(TICK_DEP_BIT_RCU);
> > > > +
> > > > + krcp = per_cpu_ptr(&krc, cpu);
> > > > + raw_spin_lock_irqsave(&krcp->lock, flags);
> > > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > > + raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > > > return 0;
> > >
> > > I realized the above is not good enough for what this is trying to do. Unlike
> > > the slab, the new kfree_rcu objects cannot always be drained / submitted to
> > > RCU because the previous batch may still be waiting for a grace period. So
> > > the above code could very well return with the yet-to-be-submitted kfree_rcu
> > > objects still in the cache.
> > >
> > > One option is to spin-wait here for monitor_todo to be false and keep calling
> > > kfree_rcu_drain_unlock() till then.
> > >
> > > But then that's not good enough either, because if new objects are queued
> > > when interrupts are enabled in the CPU offline path, then the cache will get
> > > new objects after the previous set was drained. Further, spin waiting may
> > > introduce deadlocks.
> > >
> > > Another option is to switch the kfree_rcu() path to non-batching (so new
> > > objects cannot be cached in the offline path and are submitted directly to
> > > RCU), wait for a GP and then submit the work. But then not sure if 1-argument
> > > kfree_rcu() will like that.
> >
> > Or spawn a workqueue that does something like this:
> >
> > 1. Get any pending kvfree_rcu() requests sent off to RCU.
> >
> > 2. Do an rcu_barrier().
> >
> > 3. Do the cleanup actions.
> >
> > > Probably Qian's original fix for for_each_possible_cpus() is good enough for
> > > the shrinker case, and then we can tackle the hotplug one.
> >
> > It might take some experimentation to find the best solution.
> >
>
> <snip>
> static void do_idle(void)
> {
> ...
> while (!need_resched()) {
> rmb();
>
> local_irq_disable();
>
> if (cpu_is_offline(cpu)) {
> tick_nohz_idle_stop_tick();
> cpuhp_report_idle_dead();
> -> cpuhp_report_idle_dead(void)
> -> rcu_report_dead(smp_processor_id());
> arch_cpu_idle_dead();
> }
> ...
> <snip>
>
> We have the rcu_report_dead() callback. When it gets called IRQs are off
> and CPU that is in question is offline.
>
> krcp = per_cpu_ptr(&krc, cpu);
> raw_spin_lock_irqsave(&krcp->lock, flags);
> krcp->monotro_todo = true;
> schedule_delayed_work(&krcp->monitor_work, 0);
> raw_spin_unlock_irqrestore(&krcp->lock, flags);
>
> If there is a batch that is in progress, the job will rearm itself.
> But i agree, it requires more experiments.

I chatted with Ulad and we believe the timer and/or (delayed) workqueue will
get migrated during the CPU offline path, so it is not an issue.

In this case, Qiang's initial patch suffices to fix the shrinker issue.

thanks,

- Joel

2020-08-21 15:34:51

by Paul E. McKenney

[permalink] [raw]
Subject: Re: 回复 : [PATCH] rcu: shrink each possible cpu krcp

On Thu, Aug 20, 2020 at 06:39:57PM -0400, Joel Fernandes wrote:
> On Wed, Aug 19, 2020 at 05:58:08PM +0200, Uladzislau Rezki wrote:
> > On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote:
> > > > On Wed, Aug 19, 2020 at 03:00:55AM +0000, Zhang, Qiang wrote:
> > > > >
> > > > >
> > > > > ________________________________________
> > > > > 发件人: [email protected] <[email protected]> 代表 Joel Fernandes <[email protected]>
> > > > > 发送时间: 2020年8月19日 8:04
> > > > > 收件人: Paul E. McKenney
> > > > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> > > > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp
> > > > >
> > > > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <[email protected]> wrote:
> > > > >
> > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > index b8ccd7b5af82..6decb9ad2421 100644
> > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > > > > > {
> > > > > > > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > > > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> > > > > > > + struct kfree_rcu_cpu *krcp;
> > > > > > >
> > > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > > > > > return 0;
> > > > > > >
> > > > > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > > > > > + krcp = this_cpu_ptr(&krc)
> > > > > > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > > > > > +
> > > > > > >
> > > > > > > A cpu can be offlined and its krp will be stuck until a shrinker is involved.
> > > > > > > Maybe be never.
> > > > > >
> > > > > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a
> > > > > > hard time getting too worried about it. ;-)
> > > > >
> > > > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> > > > > >on the cache reaper who's job is to flush the per-cpu caches. So I
> > > > > >believe during CPU offlining, the per-cpu slab caches are flushed.
> > > > > >
> > > > > >thanks,
> > > > > >
> > > > > >- Joel
> > > > >
> > > > > When cpu going offline, the slub or slab only flush free objects in offline
> > > > > cpu cache, put these free objects in node list or return buddy system,
> > > > > for those who are still in use, they still stay offline cpu cache.
> > > > >
> > > > > If we want clean per-cpu "krcp" objects when cpu going offline. we should
> > > > > free "krcp" cache objects in "rcutree_offline_cpu", this func be called
> > > > > before other rcu cpu offline func. and then "rcutree_offline_cpu" will be
> > > > > called in "cpuhp/%u" per-cpu thread.
> > > > >
> > > >
> > > > Could you please wrap text properly when you post to mailing list, thanks. I
> > > > fixed it for you above.
> > > >
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 8ce77d9ac716..1812d4a1ac1b 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > > > unsigned long flags;
> > > > > struct rcu_data *rdp;
> > > > > struct rcu_node *rnp;
> > > > > + struct kfree_rcu_cpu *krcp;
> > > > >
> > > > > rdp = per_cpu_ptr(&rcu_data, cpu);
> > > > > rnp = rdp->mynode;
> > > > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > > >
> > > > > // nohz_full CPUs need the tick for stop-machine to work quickly
> > > > > tick_dep_set(TICK_DEP_BIT_RCU);
> > > > > +
> > > > > + krcp = per_cpu_ptr(&krc, cpu);
> > > > > + raw_spin_lock_irqsave(&krcp->lock, flags);
> > > > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > > > + raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > > > > return 0;
> > > >
> > > > I realized the above is not good enough for what this is trying to do. Unlike
> > > > the slab, the new kfree_rcu objects cannot always be drained / submitted to
> > > > RCU because the previous batch may still be waiting for a grace period. So
> > > > the above code could very well return with the yet-to-be-submitted kfree_rcu
> > > > objects still in the cache.
> > > >
> > > > One option is to spin-wait here for monitor_todo to be false and keep calling
> > > > kfree_rcu_drain_unlock() till then.
> > > >
> > > > But then that's not good enough either, because if new objects are queued
> > > > when interrupts are enabled in the CPU offline path, then the cache will get
> > > > new objects after the previous set was drained. Further, spin waiting may
> > > > introduce deadlocks.
> > > >
> > > > Another option is to switch the kfree_rcu() path to non-batching (so new
> > > > objects cannot be cached in the offline path and are submitted directly to
> > > > RCU), wait for a GP and then submit the work. But then not sure if 1-argument
> > > > kfree_rcu() will like that.
> > >
> > > Or spawn a workqueue that does something like this:
> > >
> > > 1. Get any pending kvfree_rcu() requests sent off to RCU.
> > >
> > > 2. Do an rcu_barrier().
> > >
> > > 3. Do the cleanup actions.
> > >
> > > > Probably Qian's original fix for for_each_possible_cpus() is good enough for
> > > > the shrinker case, and then we can tackle the hotplug one.
> > >
> > > It might take some experimentation to find the best solution.
> > >
> >
> > <snip>
> > static void do_idle(void)
> > {
> > ...
> > while (!need_resched()) {
> > rmb();
> >
> > local_irq_disable();
> >
> > if (cpu_is_offline(cpu)) {
> > tick_nohz_idle_stop_tick();
> > cpuhp_report_idle_dead();
> > -> cpuhp_report_idle_dead(void)
> > -> rcu_report_dead(smp_processor_id());
> > arch_cpu_idle_dead();
> > }
> > ...
> > <snip>
> >
> > We have the rcu_report_dead() callback. When it gets called IRQs are off
> > and CPU that is in question is offline.
> >
> > krcp = per_cpu_ptr(&krc, cpu);
> > raw_spin_lock_irqsave(&krcp->lock, flags);
> > krcp->monotro_todo = true;
> > schedule_delayed_work(&krcp->monitor_work, 0);
> > raw_spin_unlock_irqrestore(&krcp->lock, flags);
> >
> > If there is a batch that is in progress, the job will rearm itself.
> > But i agree, it requires more experiments.
>
> I chatted with Ulad and we believe the timer and/or (delayed) workqueue will
> get migrated during the CPU offline path, so it is not an issue.
>
> In this case, Qiang's initial patch suffices to fix the shrinker issue.

As in the patch that is currently in -rcu, correct?

Thanx, Paul

2020-08-31 09:31:39

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: 回复 : [PATCH] rcu: shrink each possible cpu krcp

On Fri, Aug 21, 2020 at 08:33:28AM -0700, Paul E. McKenney wrote:
> On Thu, Aug 20, 2020 at 06:39:57PM -0400, Joel Fernandes wrote:
> > On Wed, Aug 19, 2020 at 05:58:08PM +0200, Uladzislau Rezki wrote:
> > > On Wed, Aug 19, 2020 at 08:21:59AM -0700, Paul E. McKenney wrote:
> > > > On Wed, Aug 19, 2020 at 09:56:54AM -0400, Joel Fernandes wrote:
> > > > > On Wed, Aug 19, 2020 at 03:00:55AM +0000, Zhang, Qiang wrote:
> > > > > >
> > > > > >
> > > > > > ________________________________________
> > > > > > 发件人: [email protected] <[email protected]> 代表 Joel Fernandes <[email protected]>
> > > > > > 发送时间: 2020年8月19日 8:04
> > > > > > 收件人: Paul E. McKenney
> > > > > > 抄送: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> > > > > > 主题: Re: [PATCH] rcu: shrink each possible cpu krcp
> > > > > >
> > > > > > On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <[email protected]> wrote:
> > > > > >
> > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > > index b8ccd7b5af82..6decb9ad2421 100644
> > > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > > > > > > {
> > > > > > > > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > > > > > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> > > > > > > > + struct kfree_rcu_cpu *krcp;
> > > > > > > >
> > > > > > > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > > > > > > return 0;
> > > > > > > >
> > > > > > > > + /* Drain the kcrp of this CPU. IRQs should be disabled? */
> > > > > > > > + krcp = this_cpu_ptr(&krc)
> > > > > > > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > > > > > > +
> > > > > > > >
> > > > > > > > A cpu can be offlined and its krp will be stuck until a shrinker is involved.
> > > > > > > > Maybe be never.
> > > > > > >
> > > > > > > Does the same apply to its kmalloc() per-CPU caches? If so, I have a
> > > > > > > hard time getting too worried about it. ;-)
> > > > > >
> > > > > > >Looking at slab_offline_cpu() , that calls cancel_delayed_work_sync()
> > > > > > >on the cache reaper who's job is to flush the per-cpu caches. So I
> > > > > > >believe during CPU offlining, the per-cpu slab caches are flushed.
> > > > > > >
> > > > > > >thanks,
> > > > > > >
> > > > > > >- Joel
> > > > > >
> > > > > > When cpu going offline, the slub or slab only flush free objects in offline
> > > > > > cpu cache, put these free objects in node list or return buddy system,
> > > > > > for those who are still in use, they still stay offline cpu cache.
> > > > > >
> > > > > > If we want clean per-cpu "krcp" objects when cpu going offline. we should
> > > > > > free "krcp" cache objects in "rcutree_offline_cpu", this func be called
> > > > > > before other rcu cpu offline func. and then "rcutree_offline_cpu" will be
> > > > > > called in "cpuhp/%u" per-cpu thread.
> > > > > >
> > > > >
> > > > > Could you please wrap text properly when you post to mailing list, thanks. I
> > > > > fixed it for you above.
> > > > >
> > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > index 8ce77d9ac716..1812d4a1ac1b 100644
> > > > > > --- a/kernel/rcu/tree.c
> > > > > > +++ b/kernel/rcu/tree.c
> > > > > > @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > > > > unsigned long flags;
> > > > > > struct rcu_data *rdp;
> > > > > > struct rcu_node *rnp;
> > > > > > + struct kfree_rcu_cpu *krcp;
> > > > > >
> > > > > > rdp = per_cpu_ptr(&rcu_data, cpu);
> > > > > > rnp = rdp->mynode;
> > > > > > @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
> > > > > >
> > > > > > // nohz_full CPUs need the tick for stop-machine to work quickly
> > > > > > tick_dep_set(TICK_DEP_BIT_RCU);
> > > > > > +
> > > > > > + krcp = per_cpu_ptr(&krc, cpu);
> > > > > > + raw_spin_lock_irqsave(&krcp->lock, flags);
> > > > > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > > > > + raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > > > > > return 0;
> > > > >
> > > > > I realized the above is not good enough for what this is trying to do. Unlike
> > > > > the slab, the new kfree_rcu objects cannot always be drained / submitted to
> > > > > RCU because the previous batch may still be waiting for a grace period. So
> > > > > the above code could very well return with the yet-to-be-submitted kfree_rcu
> > > > > objects still in the cache.
> > > > >
> > > > > One option is to spin-wait here for monitor_todo to be false and keep calling
> > > > > kfree_rcu_drain_unlock() till then.
> > > > >
> > > > > But then that's not good enough either, because if new objects are queued
> > > > > when interrupts are enabled in the CPU offline path, then the cache will get
> > > > > new objects after the previous set was drained. Further, spin waiting may
> > > > > introduce deadlocks.
> > > > >
> > > > > Another option is to switch the kfree_rcu() path to non-batching (so new
> > > > > objects cannot be cached in the offline path and are submitted directly to
> > > > > RCU), wait for a GP and then submit the work. But then not sure if 1-argument
> > > > > kfree_rcu() will like that.
> > > >
> > > > Or spawn a workqueue that does something like this:
> > > >
> > > > 1. Get any pending kvfree_rcu() requests sent off to RCU.
> > > >
> > > > 2. Do an rcu_barrier().
> > > >
> > > > 3. Do the cleanup actions.
> > > >
> > > > > Probably Qian's original fix for for_each_possible_cpus() is good enough for
> > > > > the shrinker case, and then we can tackle the hotplug one.
> > > >
> > > > It might take some experimentation to find the best solution.
> > > >
> > >
> > > <snip>
> > > static void do_idle(void)
> > > {
> > > ...
> > > while (!need_resched()) {
> > > rmb();
> > >
> > > local_irq_disable();
> > >
> > > if (cpu_is_offline(cpu)) {
> > > tick_nohz_idle_stop_tick();
> > > cpuhp_report_idle_dead();
> > > -> cpuhp_report_idle_dead(void)
> > > -> rcu_report_dead(smp_processor_id());
> > > arch_cpu_idle_dead();
> > > }
> > > ...
> > > <snip>
> > >
> > > We have the rcu_report_dead() callback. When it gets called IRQs are off
> > > and CPU that is in question is offline.
> > >
> > > krcp = per_cpu_ptr(&krc, cpu);
> > > raw_spin_lock_irqsave(&krcp->lock, flags);
> > > krcp->monotro_todo = true;
> > > schedule_delayed_work(&krcp->monitor_work, 0);
> > > raw_spin_unlock_irqrestore(&krcp->lock, flags);
> > >
> > > If there is a batch that is in progress, the job will rearm itself.
> > > But i agree, it requires more experiments.
> >
> > I chatted with Ulad and we believe the timer and/or (delayed) workqueue will
> > get migrated during the CPU offline path, so it is not an issue.
> >
> > In this case, Qiang's initial patch suffices to fix the shrinker issue.
>
> As in the patch that is currented in -rcu, correct?
>
The "[PATCH] rcu: shrink each possible cpu krcp" will fix the
shrinker "issue" when CPU goes offline. So that was valid concern.

As for "main track", our drain work or a timer that triggers it
will be migrated anyway. Just to repeat what Joel wrote earlier,
we do not have any issues with this part.

--
Vlad Rezki

2020-09-09 06:38:07

by Zhang, Qiang

[permalink] [raw]
Subject:

When preemptible RCU is configured, if a task switch happens inside an RCU read-side critical section,
the current task is added to the "rnp->blkd_tasks" list; these rnp are leaf nodes in the RCU tree.
In the "rcu_gp_fqs_loop" func,

static void rcu_gp_fqs_loop(void)
{

struct rcu_node *rnp = rcu_get_root();
}

2020-09-09 07:05:24

by Zhang, Qiang

[permalink] [raw]
Subject: RCU: Question rcu_preempt_blocked_readers_cgp in rcu_gp_fqs_loop func


When preemptible RCU is configured and there are multiple levels of nodes, a task preempted inside an RCU read-side critical section
is added to the "rnp->blkd_tasks" list, and "rnp->gp_tasks" may be assigned a value. These rnp are leaf nodes in the RCU tree.

But in "rcu_gp_fqs_loop" func, we check blocked readers in root node.

static void rcu_gp_fqs_loop(void)
{
.....
struct rcu_node *rnp = rcu_get_root();
.....
if (!READ_ONCE(rnp->qsmask) &&
!rcu_preempt_blocked_readers_cgp(rnp)) ------> rnp is root node
break;
....
}

The root node's blkd_tasks list never has tasks added to it, and the root "rnp->gp_tasks" is never assigned a value, so this check is invalid.
Should we check the leaf nodes like this instead?

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1846,6 +1846,25 @@ static bool rcu_gp_init(void)
         return true;
 }
 
+static bool rcu_preempt_blocked_readers(void)
+{
+        struct rcu_node *rnp;
+        unsigned long flags;
+        bool ret = false;
+
+        rcu_for_each_leaf_node(rnp) {
+                raw_spin_lock_irqsave_rcu_node(rnp, flags);
+                if (rcu_preempt_blocked_readers_cgp(rnp)) {
+                        ret = true;
+                        raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+                        break;
+                }
+                raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+        }
+
+        return ret;
+}
+
 /*
  * Helper function for swait_event_idle_exclusive() wakeup at force-quiescent-state
  * time.
@@ -1864,7 +1883,7 @@ static bool rcu_gp_fqs_check_wake(int *gfp)
                 return true;
 
         // The current grace period has completed.
-        if (!READ_ONCE(rnp->qsmask) && !rcu_preempt_blocked_readers_cgp(rnp))
+        if (!READ_ONCE(rnp->qsmask) && !rcu_preempt_blocked_readers())
                 return true;
 
         return false;
@@ -1927,7 +1946,7 @@ static void rcu_gp_fqs_loop(void)
                 /* Locking provides needed memory barriers. */
                 /* If grace period done, leave loop. */
                 if (!READ_ONCE(rnp->qsmask) &&
-                    !rcu_preempt_blocked_readers_cgp(rnp))
+                    !rcu_preempt_blocked_readers())
                         break;
                 /* If time for quiescent-state forcing, do it. */
                 if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
--


thanks
Qiang

2020-09-09 15:23:01

by Paul E. McKenney

[permalink] [raw]
Subject: Re: RCU: Question rcu_preempt_blocked_readers_cgp in rcu_gp_fqs_loop func

On Wed, Sep 09, 2020 at 07:03:39AM +0000, Zhang, Qiang wrote:
>
> When config preempt RCU, and then there are multiple levels node, the current task is preempted in rcu read critical region.
> the current task be add to "rnp->blkd_tasks" link list, and the "rnp->gp_tasks" may be assigned a value . these rnp is leaf node in RCU tree.
>
> But in "rcu_gp_fqs_loop" func, we check blocked readers in root node.
>
> static void rcu_gp_fqs_loop(void)
> {
> .....
> struct rcu_node *rnp = rcu_get_root();
> .....
> if (!READ_ONCE(rnp->qsmask) &&
> !rcu_preempt_blocked_readers_cgp(rnp)) ------> rnp is root node
> break;
> ....
> }
>
> the root node's blkd_tasks never add task, the "rnp->gp_tasks" is never be assigned value, this check is invailed.
> Should we check leaf nodes like this

There are two cases:

1. There is only a single rcu_node structure, which is both root
and leaf. In this case, the current check is required: Both
->qsmask and the ->blkd_tasks list must be checked. Your
rcu_preempt_blocked_readers() would work in this case, but
the current code is a bit faster because it does not need
to acquire the ->lock nor does it need the loop overhead.

2. There are multiple levels. In this case, as you say, the root
rcu_node structure's ->blkd_tasks list will always be empty.
But also in this case, the root rcu_node structure's ->qsmask
cannot be zero until all the leaf rcu_node structures' ->qsmask
fields are zero and their ->blkd_tasks lists no longer have
tasks blocking the current grace period. This means that your
rcu_preempt_blocked_readers() function would never return
true in this case.

So the current code is fine.
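
To make case 2 concrete, here is a toy userspace model of that reporting
invariant (purely illustrative: toy_node, leaf_report_qs() and gp_completed()
are made-up names, not the kernel's rcu_node machinery). A leaf clears its
bit in the root only once its own ->qsmask is zero and it no longer has
readers blocking the current grace period, so checking the root alone
suffices in the multi-level case:

<snip>
#include <stdbool.h>
#include <stdio.h>

#define NLEAF 4

struct toy_node {
        unsigned long qsmask;   /* children/CPUs still owing a quiescent state */
        bool gp_blkd_tasks;     /* readers blocking the current grace period */
};

static struct toy_node root;
static struct toy_node leaf[NLEAF];

/* A leaf reports to the root only when it is completely done. */
static void leaf_report_qs(int i)
{
        if (leaf[i].qsmask == 0 && !leaf[i].gp_blkd_tasks)
                root.qsmask &= ~(1UL << i);
}

/* The root's own blkd_tasks list stays empty in the multi-level case. */
static bool gp_completed(void)
{
        return root.qsmask == 0;
}

int main(void)
{
        root.qsmask = (1UL << NLEAF) - 1;

        for (int i = 0; i < NLEAF; i++) {
                leaf[i].qsmask = 0;               /* all CPUs reported a QS */
                leaf[i].gp_blkd_tasks = (i == 2); /* leaf 2 has a blocked reader */
                leaf_report_qs(i);
        }
        printf("GP done: %d\n", gp_completed());  /* prints 0 */

        leaf[2].gp_blkd_tasks = false;            /* blocked reader finishes */
        leaf_report_qs(2);
        printf("GP done: %d\n", gp_completed());  /* prints 1 */

        return 0;
}
<snip>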

Are you seeing failures on mainline kernels? If so, what is the failure
mode?

Thanx, Paul

> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1846,6 +1846,25 @@ static bool rcu_gp_init(void)
> return true;
> }
>
> +static bool rcu_preempt_blocked_readers(void)
> +{
> + struct rcu_node *rnp;
> + unsigned long flags;
> + bool ret = false;
> +
> + rcu_for_each_leaf_node(rnp) {
> + raw_spin_lock_irqsave_rcu_node(rnp, flags);
> + if (rcu_preempt_blocked_readers_cgp(rnp)) {
> + ret = true;
> + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> + break;
> + }
> + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> + }
> +
> + return ret;
> +}
> +
> /*
> * Helper function for swait_event_idle_exclusive() wakeup at force-quiescent-state
> * time.
> @@ -1864,7 +1883,7 @@ static bool rcu_gp_fqs_check_wake(int *gfp)
> return true;
>
> // The current grace period has completed.
> - if (!READ_ONCE(rnp->qsmask) && !rcu_preempt_blocked_readers_cgp(rnp))
> + if (!READ_ONCE(rnp->qsmask) && !rcu_preempt_blocked_readers())
> return true;
>
> return false;
> @@ -1927,7 +1946,7 @@ static void rcu_gp_fqs_loop(void)
> /* Locking provides needed memory barriers. */
> /* If grace period done, leave loop. */
> if (!READ_ONCE(rnp->qsmask) &&
> - !rcu_preempt_blocked_readers_cgp(rnp))
> + !rcu_preempt_blocked_readers())
> break;
> /* If time for quiescent-state forcing, do it. */
> if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
> --
>
>
> thanks
> Qiang

2020-09-10 03:29:47

by Zhang, Qiang

[permalink] [raw]
Subject: 回复: RCU: Question rcu_preempt_blocked_readers_cgp in rcu_gp_fqs_loop func



________________________________________
From: Paul E. McKenney <[email protected]>
Sent: 9 September 2020 19:22
To: Zhang, Qiang
Cc: Joel Fernandes; Uladzislau Rezki; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
Subject: Re: RCU: Question rcu_preempt_blocked_readers_cgp in rcu_gp_fqs_loop func

On Wed, Sep 09, 2020 at 07:03:39AM +0000, Zhang, Qiang wrote:
>
> When config preempt RCU, and then there are multiple levels node, the current task is preempted in rcu read critical region.
> the current task be add to "rnp->blkd_tasks" link list, and the "rnp->gp_tasks" may be assigned a value . these rnp is leaf node in RCU tree.
>
> But in "rcu_gp_fqs_loop" func, we check blocked readers in root node.
>
> static void rcu_gp_fqs_loop(void)
> {
> .....
> struct rcu_node *rnp = rcu_get_root();
> .....
> if (!READ_ONCE(rnp->qsmask) &&
> !rcu_preempt_blocked_readers_cgp(rnp)) ------> rnp is root node
> break;
> ....
> }
>
> the root node's blkd_tasks never add task, the "rnp->gp_tasks" is never be assigned value, this check is invailed.
> Should we check leaf nodes like this

>There are two cases:

>1. There is only a single rcu_node structure, which is both root
> and leaf. In this case, the current check is required: Both
> ->qsmask and the ->blkd_tasks list must be checked. Your
> rcu_preempt_blocked_readers() would work in this case, but
> the current code is a bit faster because it does not need
> to acquire the ->lock nor does it need the loop overhead.

>2. There are multiple levels. In this case, as you say, the root
> rcu_node structure's ->blkd_tasks list will always be empty.
> But also in this case, the root rcu_node structure's ->qsmask
> cannot be zero until all the leaf rcu_node structures' ->qsmask
> fields are zero and their ->blkd_tasks lists no longer have
> tasks blocking the current grace period. This means that your
> rcu_preempt_blocked_readers() function would never return
> true in this case.

>So the current code is fine.

>Are you seeing failures on mainline kernels? If so, what is the failure
>mode?

Yes, that's right. Thank you for your explanation.

thanks
Qiang

> Thanx, Paul

> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1846,6 +1846,25 @@ static bool rcu_gp_init(void)
> return true;
> }
>
> +static bool rcu_preempt_blocked_readers(void)
> +{
> + struct rcu_node *rnp;
> + unsigned long flags;
> + bool ret = false;
> +
> + rcu_for_each_leaf_node(rnp) {
> + raw_spin_lock_irqsave_rcu_node(rnp, flags);
> + if (rcu_preempt_blocked_readers_cgp(rnp)) {
> + ret = true;
> + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> + break;
> + }
> + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> + }
> +
> + return ret;
> +}
> +
> /*
> * Helper function for swait_event_idle_exclusive() wakeup at force-quiescent-state
> * time.
> @@ -1864,7 +1883,7 @@ static bool rcu_gp_fqs_check_wake(int *gfp)
> return true;
>
> // The current grace period has completed.
> - if (!READ_ONCE(rnp->qsmask) && !rcu_preempt_blocked_readers_cgp(rnp))
> + if (!READ_ONCE(rnp->qsmask) && !rcu_preempt_blocked_readers())
> return true;
>
> return false;
> @@ -1927,7 +1946,7 @@ static void rcu_gp_fqs_loop(void)
> /* Locking provides needed memory barriers. */
> /* If grace period done, leave loop. */
> if (!READ_ONCE(rnp->qsmask) &&
> - !rcu_preempt_blocked_readers_cgp(rnp))
> + !rcu_preempt_blocked_readers())
> break;
> /* If time for quiescent-state forcing, do it. */
> if (!time_after(rcu_state.jiffies_force_qs, jiffies) ||
> --
>
>
> thanks
> Qiang

2020-09-14 20:10:06

by Joel Fernandes

[permalink] [raw]
Subject: Re: RCU: Question rcu_preempt_blocked_readers_cgp in rcu_gp_fqs_loop func

On Wed, Sep 09, 2020 at 04:22:41AM -0700, Paul E. McKenney wrote:
> On Wed, Sep 09, 2020 at 07:03:39AM +0000, Zhang, Qiang wrote:
> >
> > When config preempt RCU, and then there are multiple levels node, the current task is preempted in rcu read critical region.
> > the current task be add to "rnp->blkd_tasks" link list, and the "rnp->gp_tasks" may be assigned a value . these rnp is leaf node in RCU tree.
> >
> > But in "rcu_gp_fqs_loop" func, we check blocked readers in root node.
> >
> > static void rcu_gp_fqs_loop(void)
> > {
> > .....
> > struct rcu_node *rnp = rcu_get_root();
> > .....
> > if (!READ_ONCE(rnp->qsmask) &&
> > !rcu_preempt_blocked_readers_cgp(rnp)) ------> rnp is root node
> > break;
> > ....
> > }
> >
> > the root node's blkd_tasks never add task, the "rnp->gp_tasks" is never be assigned value, this check is invailed.
> > Should we check leaf nodes like this
>
> There are two cases:
>
> 1. There is only a single rcu_node structure, which is both root
> and leaf. In this case, the current check is required: Both
> ->qsmask and the ->blkd_tasks list must be checked. Your
> rcu_preempt_blocked_readers() would work in this case, but
> the current code is a bit faster because it does not need
> to acquire the ->lock nor does it need the loop overhead.
>
> 2. There are multiple levels. In this case, as you say, the root
> rcu_node structure's ->blkd_tasks list will always be empty.
> But also in this case, the root rcu_node structure's ->qsmask
> cannot be zero until all the leaf rcu_node structures' ->qsmask
> fields are zero and their ->blkd_tasks lists no longer have
> tasks blocking the current grace period. This means that your
> rcu_preempt_blocked_readers() function would never return
> true in this case.
>
> So the current code is fine.
>
> Are you seeing failures on mainline kernels? If so, what is the failure
> mode?

Nice question! He had me wondering there too for a bit ;-)

Thanks,

- Joel