LinuxLists.cc - [PATCH v2] bpf: Simplify code by using for_each_cpu

2022-09-07 16:16:39

Subject: [PATCH v2] bpf: Simplify code by using for_each_cpu_wrap()

In the percpu freelist code, it is a common pattern to iterate over
the possible CPUs mask starting with the current CPU. The pattern is
implemented using a hand rolled while loop with the loop variable
increment being open-coded.

Simplify the code by using for_each_cpu_wrap() helper to iterate over
the possible cpus starting with the current CPU. As a result, some of
the special-casing in the loop also gets simplified.

No functional change intended.

Signed-off-by: Punit Agrawal <[email protected]>
---
v1 -> v2:
* Fixed the incorrect transformation changing semantics of __pcpu_freelist_push_nmi()

Previous version -
v1: https://lore.kernel.org/all/[email protected]/

kernel/bpf/percpu_freelist.c | 48 ++++++++++++------------------------
1 file changed, 16 insertions(+), 32 deletions(-)

diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c
index 00b874c8e889..b6e7f5c5b9ab 100644
--- a/kernel/bpf/percpu_freelist.c
+++ b/kernel/bpf/percpu_freelist.c
@@ -58,23 +58,21 @@ static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s,
{
int cpu, orig_cpu;

- orig_cpu = cpu = raw_smp_processor_id();
+ orig_cpu = raw_smp_processor_id();
while (1) {
- struct pcpu_freelist_head *head;
+ for_each_cpu_wrap(cpu, cpu_possible_mask, orig_cpu) {
+ struct pcpu_freelist_head *head;

- head = per_cpu_ptr(s->freelist, cpu);
- if (raw_spin_trylock(&head->lock)) {
- pcpu_freelist_push_node(head, node);
- raw_spin_unlock(&head->lock);
- return;
+ head = per_cpu_ptr(s->freelist, cpu);
+ if (raw_spin_trylock(&head->lock)) {
+ pcpu_freelist_push_node(head, node);
+ raw_spin_unlock(&head->lock);
+ return;
+ }
}
- cpu = cpumask_next(cpu, cpu_possible_mask);
- if (cpu >= nr_cpu_ids)
- cpu = 0;

/* cannot lock any per cpu lock, try extralist */
- if (cpu == orig_cpu &&
- pcpu_freelist_try_push_extra(s, node))
+ if (pcpu_freelist_try_push_extra(s, node))
return;
}
}
@@ -125,13 +123,12 @@ static struct pcpu_freelist_node *___pcpu_freelist_pop(struct pcpu_freelist *s)
{
struct pcpu_freelist_head *head;
struct pcpu_freelist_node *node;
- int orig_cpu, cpu;
+ int cpu;

- orig_cpu = cpu = raw_smp_processor_id();
- while (1) {
+ for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) {
head = per_cpu_ptr(s->freelist, cpu);
if (!READ_ONCE(head->first))
- goto next_cpu;
+ continue;
raw_spin_lock(&head->lock);
node = head->first;
if (node) {
@@ -140,12 +137,6 @@ static struct pcpu_freelist_node *___pcpu_freelist_pop(struct pcpu_freelist *s)
return node;
}
raw_spin_unlock(&head->lock);
-next_cpu:
- cpu = cpumask_next(cpu, cpu_possible_mask);
- if (cpu >= nr_cpu_ids)
- cpu = 0;
- if (cpu == orig_cpu)
- break;
}

/* per cpu lists are all empty, try extralist */
@@ -164,13 +155,12 @@ ___pcpu_freelist_pop_nmi(struct pcpu_freelist *s)
{
struct pcpu_freelist_head *head;
struct pcpu_freelist_node *node;
- int orig_cpu, cpu;
+ int cpu;

- orig_cpu = cpu = raw_smp_processor_id();
- while (1) {
+ for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) {
head = per_cpu_ptr(s->freelist, cpu);
if (!READ_ONCE(head->first))
- goto next_cpu;
+ continue;
if (raw_spin_trylock(&head->lock)) {
node = head->first;
if (node) {
@@ -180,12 +170,6 @@ ___pcpu_freelist_pop_nmi(struct pcpu_freelist *s)
}
raw_spin_unlock(&head->lock);
}
-next_cpu:
- cpu = cpumask_next(cpu, cpu_possible_mask);
- if (cpu >= nr_cpu_ids)
- cpu = 0;
- if (cpu == orig_cpu)
- break;
}

/* cannot pop from per cpu lists, try extralist */
--
2.35.1

2022-09-08 01:28:42

by Song Liu

[permalink] [raw]

Subject: Re: [PATCH v2] bpf: Simplify code by using for_each_cpu_wrap()

On Wed, Sep 7, 2022 at 8:58 AM Punit Agrawal
<[email protected]> wrote:
>
> In the percpu freelist code, it is a common pattern to iterate over
> the possible CPUs mask starting with the current CPU. The pattern is
> implemented using a hand rolled while loop with the loop variable
> increment being open-coded.
>
> Simplify the code by using for_each_cpu_wrap() helper to iterate over
> the possible cpus starting with the current CPU. As a result, some of
> the special-casing in the loop also gets simplified.
>
> No functional change intended.
>
> Signed-off-by: Punit Agrawal <[email protected]>
> ---
> v1 -> v2:
> * Fixed the incorrect transformation changing semantics of __pcpu_freelist_push_nmi()
>
> Previous version -
> v1: https://lore.kernel.org/all/[email protected]/
>
> kernel/bpf/percpu_freelist.c | 48 ++++++++++++------------------------
> 1 file changed, 16 insertions(+), 32 deletions(-)
>
> diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c
> index 00b874c8e889..b6e7f5c5b9ab 100644
> --- a/kernel/bpf/percpu_freelist.c
> +++ b/kernel/bpf/percpu_freelist.c
> @@ -58,23 +58,21 @@ static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s,
> {
> int cpu, orig_cpu;
>
> - orig_cpu = cpu = raw_smp_processor_id();
> + orig_cpu = raw_smp_processor_id();
> while (1) {
> - struct pcpu_freelist_head *head;
> + for_each_cpu_wrap(cpu, cpu_possible_mask, orig_cpu) {
> + struct pcpu_freelist_head *head;
>
> - head = per_cpu_ptr(s->freelist, cpu);
> - if (raw_spin_trylock(&head->lock)) {
> - pcpu_freelist_push_node(head, node);
> - raw_spin_unlock(&head->lock);
> - return;
> + head = per_cpu_ptr(s->freelist, cpu);
> + if (raw_spin_trylock(&head->lock)) {
> + pcpu_freelist_push_node(head, node);
> + raw_spin_unlock(&head->lock);
> + return;
> + }
> }
> - cpu = cpumask_next(cpu, cpu_possible_mask);
> - if (cpu >= nr_cpu_ids)
> - cpu = 0;

I personally don't like nested loops here. Maybe we can keep
the original while loop and use cpumask_next_wrap()?

Thanks,
Song

>
> /* cannot lock any per cpu lock, try extralist */
> - if (cpu == orig_cpu &&
> - pcpu_freelist_try_push_extra(s, node))
> + if (pcpu_freelist_try_push_extra(s, node))
> return;
> }
> }
> @@ -125,13 +123,12 @@ static struct pcpu_freelist_node *___pcpu_freelist_pop(struct pcpu_freelist *s)
> {
> struct pcpu_freelist_head *head;
> struct pcpu_freelist_node *node;
> - int orig_cpu, cpu;
> + int cpu;
>
> - orig_cpu = cpu = raw_smp_processor_id();
> - while (1) {
> + for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) {
> head = per_cpu_ptr(s->freelist, cpu);
> if (!READ_ONCE(head->first))
> - goto next_cpu;
> + continue;
> raw_spin_lock(&head->lock);
> node = head->first;
> if (node) {
> @@ -140,12 +137,6 @@ static struct pcpu_freelist_node *___pcpu_freelist_pop(struct pcpu_freelist *s)
> return node;
> }
> raw_spin_unlock(&head->lock);
> -next_cpu:
> - cpu = cpumask_next(cpu, cpu_possible_mask);
> - if (cpu >= nr_cpu_ids)
> - cpu = 0;
> - if (cpu == orig_cpu)
> - break;
> }
>
> /* per cpu lists are all empty, try extralist */
> @@ -164,13 +155,12 @@ ___pcpu_freelist_pop_nmi(struct pcpu_freelist *s)
> {
> struct pcpu_freelist_head *head;
> struct pcpu_freelist_node *node;
> - int orig_cpu, cpu;
> + int cpu;
>
> - orig_cpu = cpu = raw_smp_processor_id();
> - while (1) {
> + for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) {
> head = per_cpu_ptr(s->freelist, cpu);
> if (!READ_ONCE(head->first))
> - goto next_cpu;
> + continue;
> if (raw_spin_trylock(&head->lock)) {
> node = head->first;
> if (node) {
> @@ -180,12 +170,6 @@ ___pcpu_freelist_pop_nmi(struct pcpu_freelist *s)
> }
> raw_spin_unlock(&head->lock);
> }
> -next_cpu:
> - cpu = cpumask_next(cpu, cpu_possible_mask);
> - if (cpu >= nr_cpu_ids)
> - cpu = 0;
> - if (cpu == orig_cpu)
> - break;
> }
>
> /* cannot pop from per cpu lists, try extralist */
> --
> 2.35.1
>

2022-09-08 10:54:34

by Punit Agrawal

[permalink] [raw]

Subject: Re: Re: [PATCH v2] bpf: Simplify code by using for_each_cpu_wrap()

Hi Song,

Thanks for taking a look.

Song Liu <[email protected]> writes:

> On Wed, Sep 7, 2022 at 8:58 AM Punit Agrawal
> <[email protected]> wrote:
>>
>> In the percpu freelist code, it is a common pattern to iterate over
>> the possible CPUs mask starting with the current CPU. The pattern is
>> implemented using a hand rolled while loop with the loop variable
>> increment being open-coded.
>>
>> Simplify the code by using for_each_cpu_wrap() helper to iterate over
>> the possible cpus starting with the current CPU. As a result, some of
>> the special-casing in the loop also gets simplified.
>>
>> No functional change intended.
>>
>> Signed-off-by: Punit Agrawal <[email protected]>
>> ---
>> v1 -> v2:
>> * Fixed the incorrect transformation changing semantics of __pcpu_freelist_push_nmi()
>>
>> Previous version -
>> v1: https://lore.kernel.org/all/[email protected]/
>>
>> kernel/bpf/percpu_freelist.c | 48 ++++++++++++------------------------
>> 1 file changed, 16 insertions(+), 32 deletions(-)
>>
>> diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c
>> index 00b874c8e889..b6e7f5c5b9ab 100644
>> --- a/kernel/bpf/percpu_freelist.c
>> +++ b/kernel/bpf/percpu_freelist.c
>> @@ -58,23 +58,21 @@ static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s,
>> {
>> int cpu, orig_cpu;
>>
>> - orig_cpu = cpu = raw_smp_processor_id();
>> + orig_cpu = raw_smp_processor_id();
>> while (1) {
>> - struct pcpu_freelist_head *head;
>> + for_each_cpu_wrap(cpu, cpu_possible_mask, orig_cpu) {
>> + struct pcpu_freelist_head *head;
>>
>> - head = per_cpu_ptr(s->freelist, cpu);
>> - if (raw_spin_trylock(&head->lock)) {
>> - pcpu_freelist_push_node(head, node);
>> - raw_spin_unlock(&head->lock);
>> - return;
>> + head = per_cpu_ptr(s->freelist, cpu);
>> + if (raw_spin_trylock(&head->lock)) {
>> + pcpu_freelist_push_node(head, node);
>> + raw_spin_unlock(&head->lock);
>> + return;
>> + }
>> }
>> - cpu = cpumask_next(cpu, cpu_possible_mask);
>> - if (cpu >= nr_cpu_ids)
>> - cpu = 0;
>
> I personally don't like nested loops here. Maybe we can keep
> the original while loop and use cpumask_next_wrap()?

Out of curiosity, is there a reason to avoid nesting here? The nested
loop avoids the "cpu == orig_cpu" unnecessary check every iteration.

As suggested, it's possible to use cpumask_next_wrap() like below -

diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c
index 00b874c8e889..19e8eab70c40 100644
--- a/kernel/bpf/percpu_freelist.c
+++ b/kernel/bpf/percpu_freelist.c
@@ -68,9 +68,7 @@ static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s,
raw_spin_unlock(&head->lock);
return;
}
- cpu = cpumask_next(cpu, cpu_possible_mask);
- if (cpu >= nr_cpu_ids)
- cpu = 0;
+ cpu = cpumask_next_wrap(cpu, cpu_possible_mask, orig_cpu, false);

/* cannot lock any per cpu lock, try extralist */
if (cpu == orig_cpu &&

I can send an updated patch if this is preferred.

Thanks,
Punit

2022-09-08 20:34:13

by Song Liu

[permalink] [raw]

Subject: Re: Re: [PATCH v2] bpf: Simplify code by using for_each_cpu_wrap()

On Thu, Sep 8, 2022 at 3:45 AM Punit Agrawal
<[email protected]> wrote:
>
> Hi Song,
>
> Thanks for taking a look.
>
> Song Liu <[email protected]> writes:
>
> > On Wed, Sep 7, 2022 at 8:58 AM Punit Agrawal
> > <[email protected]> wrote:
> >>
> >> In the percpu freelist code, it is a common pattern to iterate over
> >> the possible CPUs mask starting with the current CPU. The pattern is
> >> implemented using a hand rolled while loop with the loop variable
> >> increment being open-coded.
> >>
> >> Simplify the code by using for_each_cpu_wrap() helper to iterate over
> >> the possible cpus starting with the current CPU. As a result, some of
> >> the special-casing in the loop also gets simplified.
> >>
> >> No functional change intended.
> >>
> >> Signed-off-by: Punit Agrawal <[email protected]>
> >> ---
> >> v1 -> v2:
> >> * Fixed the incorrect transformation changing semantics of __pcpu_freelist_push_nmi()
> >>
> >> Previous version -
> >> v1: https://lore.kernel.org/all/[email protected]/
> >>
> >> kernel/bpf/percpu_freelist.c | 48 ++++++++++++------------------------
> >> 1 file changed, 16 insertions(+), 32 deletions(-)
> >>
> >> diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c
> >> index 00b874c8e889..b6e7f5c5b9ab 100644
> >> --- a/kernel/bpf/percpu_freelist.c
> >> +++ b/kernel/bpf/percpu_freelist.c
> >> @@ -58,23 +58,21 @@ static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s,
> >> {
> >> int cpu, orig_cpu;
> >>
> >> - orig_cpu = cpu = raw_smp_processor_id();
> >> + orig_cpu = raw_smp_processor_id();
> >> while (1) {
> >> - struct pcpu_freelist_head *head;
> >> + for_each_cpu_wrap(cpu, cpu_possible_mask, orig_cpu) {
> >> + struct pcpu_freelist_head *head;
> >>
> >> - head = per_cpu_ptr(s->freelist, cpu);
> >> - if (raw_spin_trylock(&head->lock)) {
> >> - pcpu_freelist_push_node(head, node);
> >> - raw_spin_unlock(&head->lock);
> >> - return;
> >> + head = per_cpu_ptr(s->freelist, cpu);
> >> + if (raw_spin_trylock(&head->lock)) {
> >> + pcpu_freelist_push_node(head, node);
> >> + raw_spin_unlock(&head->lock);
> >> + return;
> >> + }
> >> }
> >> - cpu = cpumask_next(cpu, cpu_possible_mask);
> >> - if (cpu >= nr_cpu_ids)
> >> - cpu = 0;
> >
> > I personally don't like nested loops here. Maybe we can keep
> > the original while loop and use cpumask_next_wrap()?
>
> Out of curiosity, is there a reason to avoid nesting here? The nested
> loop avoids the "cpu == orig_cpu" unnecessary check every iteration.

for_each_cpu_wrap is a more complex loop, so we are using some
checks either way.

OTOH, the nesting is not too deep (two loops then one if), so I guess
current version is fine.

Acked-by: Song Liu <[email protected]>

>
> As suggested, it's possible to use cpumask_next_wrap() like below -
>
> diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c
> index 00b874c8e889..19e8eab70c40 100644
> --- a/kernel/bpf/percpu_freelist.c
> +++ b/kernel/bpf/percpu_freelist.c
> @@ -68,9 +68,7 @@ static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s,
> raw_spin_unlock(&head->lock);
> return;
> }
> - cpu = cpumask_next(cpu, cpu_possible_mask);
> - if (cpu >= nr_cpu_ids)
> - cpu = 0;
> + cpu = cpumask_next_wrap(cpu, cpu_possible_mask, orig_cpu, false);
>
> /* cannot lock any per cpu lock, try extralist */
> if (cpu == orig_cpu &&
>
>
> I can send an updated patch if this is preferred.
>
> Thanks,
> Punit

2022-09-09 09:23:58

by Punit Agrawal

[permalink] [raw]

Subject: Re: [External] Re: Re: [PATCH v2] bpf: Simplify code by using for_each_cpu_wrap()

Song Liu <[email protected]> writes:

> On Thu, Sep 8, 2022 at 3:45 AM Punit Agrawal
> <[email protected]> wrote:
>>
>> Hi Song,
>>
>> Thanks for taking a look.
>>
>> Song Liu <[email protected]> writes:
>>
>> > On Wed, Sep 7, 2022 at 8:58 AM Punit Agrawal
>> > <[email protected]> wrote:
>> >>
>> >> In the percpu freelist code, it is a common pattern to iterate over
>> >> the possible CPUs mask starting with the current CPU. The pattern is
>> >> implemented using a hand rolled while loop with the loop variable
>> >> increment being open-coded.
>> >>
>> >> Simplify the code by using for_each_cpu_wrap() helper to iterate over
>> >> the possible cpus starting with the current CPU. As a result, some of
>> >> the special-casing in the loop also gets simplified.
>> >>
>> >> No functional change intended.
>> >>
>> >> Signed-off-by: Punit Agrawal <[email protected]>
>> >> ---
>> >> v1 -> v2:
>> >> * Fixed the incorrect transformation changing semantics of __pcpu_freelist_push_nmi()
>> >>
>> >> Previous version -
>> >> v1: https://lore.kernel.org/all/[email protected]/
>> >>
>> >> kernel/bpf/percpu_freelist.c | 48 ++++++++++++------------------------
>> >> 1 file changed, 16 insertions(+), 32 deletions(-)
>> >>
>> >> diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c
>> >> index 00b874c8e889..b6e7f5c5b9ab 100644
>> >> --- a/kernel/bpf/percpu_freelist.c
>> >> +++ b/kernel/bpf/percpu_freelist.c
>> >> @@ -58,23 +58,21 @@ static inline void ___pcpu_freelist_push_nmi(struct pcpu_freelist *s,
>> >> {
>> >> int cpu, orig_cpu;
>> >>
>> >> - orig_cpu = cpu = raw_smp_processor_id();
>> >> + orig_cpu = raw_smp_processor_id();
>> >> while (1) {
>> >> - struct pcpu_freelist_head *head;
>> >> + for_each_cpu_wrap(cpu, cpu_possible_mask, orig_cpu) {
>> >> + struct pcpu_freelist_head *head;
>> >>
>> >> - head = per_cpu_ptr(s->freelist, cpu);
>> >> - if (raw_spin_trylock(&head->lock)) {
>> >> - pcpu_freelist_push_node(head, node);
>> >> - raw_spin_unlock(&head->lock);
>> >> - return;
>> >> + head = per_cpu_ptr(s->freelist, cpu);
>> >> + if (raw_spin_trylock(&head->lock)) {
>> >> + pcpu_freelist_push_node(head, node);
>> >> + raw_spin_unlock(&head->lock);
>> >> + return;
>> >> + }
>> >> }
>> >> - cpu = cpumask_next(cpu, cpu_possible_mask);
>> >> - if (cpu >= nr_cpu_ids)
>> >> - cpu = 0;
>> >
>> > I personally don't like nested loops here. Maybe we can keep
>> > the original while loop and use cpumask_next_wrap()?
>>
>> Out of curiosity, is there a reason to avoid nesting here? The nested
>> loop avoids the "cpu == orig_cpu" unnecessary check every iteration.
>
> for_each_cpu_wrap is a more complex loop, so we are using some
> checks either way.

That's true, indeed. While putting the patch together I wondering about
the need for a simpler / optimized version of for_each_cpu_wrap().

> OTOH, the nesting is not too deep (two loops then one if), so I guess
> current version is fine.
>
> Acked-by: Song Liu <[email protected]>
>

Thanks!

[...]

2022-09-10 23:53:22

by patchwork-bot+netdevbpf

[permalink] [raw]

Subject: Re: [PATCH v2] bpf: Simplify code by using for_each_cpu_wrap()

Hello:

This patch was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <[email protected]>:

On Wed, 7 Sep 2022 16:57:46 +0100 you wrote:
> In the percpu freelist code, it is a common pattern to iterate over
> the possible CPUs mask starting with the current CPU. The pattern is
> implemented using a hand rolled while loop with the loop variable
> increment being open-coded.
>
> Simplify the code by using for_each_cpu_wrap() helper to iterate over
> the possible cpus starting with the current CPU. As a result, some of
> the special-casing in the loop also gets simplified.
>
> [...]

Here is the summary with links:
- [v2] bpf: Simplify code by using for_each_cpu_wrap()
https://git.kernel.org/bpf/bpf-next/c/57c92f11a215

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html