2023-12-12 04:21:32

by Yury Norov

Subject: [PATCH v3 2/7] lib/group_cpus: optimize inner loop in grp_spread_init_one()

The loop starts from the beginning every time we switch to the next
sibling mask. This is the Schlemiel the Painter's style of coding,
because we know for sure that nmsk is clear up to the current CPU, and
we can just continue from the next CPU.

Also, we can make this neater by leveraging the dedicated for_each()
iterator, which also simplifies the logic of clearing a bit in nmsk.
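
[ Note: for_each_cpu_and_from() is introduced by patch 1 of this
series, which is not included here. A minimal sketch of the intended
semantics (hypothetical, not the actual patch 1 implementation) can
be built on the existing cpumask_next_and():

	/*
	 * Hypothetical sketch: walk the set bits of (*mask1 & *mask2),
	 * starting the scan at the current value of cpu instead of at
	 * bit 0. cpumask_next_and() returns the next CPU present in
	 * both masks after the given one, or >= nr_cpu_ids if none is
	 * left.
	 */
	#define for_each_cpu_and_from(cpu, mask1, mask2)		      \
		for (; (cpu) = cpumask_next_and((cpu) - 1, (mask1), (mask2)), \
		       (cpu) < nr_cpu_ids; (cpu)++)
]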

Signed-off-by: Yury Norov <[email protected]>
---
lib/group_cpus.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index ee272c4cefcc..10dead3ab0e0 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -30,14 +30,13 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
 
 		/* If the cpu has siblings, use them first */
 		siblmsk = topology_sibling_cpumask(cpu);
-		for (sibl = -1; cpus_per_grp > 0; ) {
-			sibl = cpumask_next(sibl, siblmsk);
-			if (sibl >= nr_cpu_ids)
-				break;
-			if (!cpumask_test_and_clear_cpu(sibl, nmsk))
-				continue;
+		sibl = cpu + 1;
+
+		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
+			cpumask_clear_cpu(sibl, nmsk);
 			cpumask_set_cpu(sibl, irqmsk);
-			cpus_per_grp--;
+			if (cpus_per_grp-- == 0)
+				return;
 		}
 	}
 }
--
2.40.1


2023-12-12 09:47:23

by Ming Lei

Subject: Re: [PATCH v3 2/7] lib/group_cpus: optimize inner loop in grp_spread_init_one()

On Mon, Dec 11, 2023 at 08:21:02PM -0800, Yury Norov wrote:
> The loop starts from the beginning every time we switch to the next
> sibling mask. This is the Schlemiel the Painter's style of coding,
> because we know for sure that nmsk is clear up to the current CPU, and
> we can just continue from the next CPU.
>
> Also, we can make this neater by leveraging the dedicated for_each()
> iterator, which also simplifies the logic of clearing a bit in nmsk.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> lib/group_cpus.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> index ee272c4cefcc..10dead3ab0e0 100644
> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -30,14 +30,13 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
>
> 		/* If the cpu has siblings, use them first */
> 		siblmsk = topology_sibling_cpumask(cpu);
> -		for (sibl = -1; cpus_per_grp > 0; ) {
> -			sibl = cpumask_next(sibl, siblmsk);
> -			if (sibl >= nr_cpu_ids)
> -				break;
> -			if (!cpumask_test_and_clear_cpu(sibl, nmsk))
> -				continue;
> +		sibl = cpu + 1;

It doesn't have to be 'cpu + 1'; see the comment below.

> +
> +		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> +			cpumask_clear_cpu(sibl, nmsk);
> 			cpumask_set_cpu(sibl, irqmsk);
> -			cpus_per_grp--;
> +			if (cpus_per_grp-- == 0)

if (--cpus_per_grp == 0)

Otherwise the old value is tested, and the loop assigns one sibling
too many before returning.

> + return;

I think for_each_cpu_and() should work just fine, because cpu has
already been cleared from nmsk, so the change can be something like
the following, and then patch 1 isn't necessary:


	for_each_cpu_and(sibl, siblmsk, nmsk) {
		cpumask_clear_cpu(sibl, nmsk);
		cpumask_set_cpu(sibl, irqmsk);
		if (--cpus_per_grp == 0)
			return;
	}


Thanks,
Ming

2023-12-12 17:04:45

by Yury Norov

Subject: Re: [PATCH v3 2/7] lib/group_cpus: optimize inner loop in grp_spread_init_one()

On Tue, Dec 12, 2023 at 05:46:53PM +0800, Ming Lei wrote:
> On Mon, Dec 11, 2023 at 08:21:02PM -0800, Yury Norov wrote:
> > The loop starts from the beginning every time we switch to the next
> > sibling mask. This is the Schlemiel the Painter's style of coding,
> > because we know for sure that nmsk is clear up to the current CPU, and
> > we can just continue from the next CPU.
> >
> > Also, we can make this neater by leveraging the dedicated for_each()
> > iterator, which also simplifies the logic of clearing a bit in nmsk.
> >
> > Signed-off-by: Yury Norov <[email protected]>
> > ---
> > lib/group_cpus.c | 13 ++++++-------
> > 1 file changed, 6 insertions(+), 7 deletions(-)
> >
> > diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> > index ee272c4cefcc..10dead3ab0e0 100644
> > --- a/lib/group_cpus.c
> > +++ b/lib/group_cpus.c
> > @@ -30,14 +30,13 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> >
> > 		/* If the cpu has siblings, use them first */
> > 		siblmsk = topology_sibling_cpumask(cpu);
> > -		for (sibl = -1; cpus_per_grp > 0; ) {
> > -			sibl = cpumask_next(sibl, siblmsk);
> > -			if (sibl >= nr_cpu_ids)
> > -				break;
> > -			if (!cpumask_test_and_clear_cpu(sibl, nmsk))
> > -				continue;
> > +		sibl = cpu + 1;
>
> It doesn't have to be 'cpu + 1'; see the comment below.
>
> > +
> > +		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> > +			cpumask_clear_cpu(sibl, nmsk);
> > 			cpumask_set_cpu(sibl, irqmsk);
> > -			cpus_per_grp--;
> > +			if (cpus_per_grp-- == 0)
>
> if (--cpus_per_grp == 0)

That's right, I'll send a new version this weekend.

> > + return;
>
> I think for_each_cpu_and() should work just fine, because cpu has
> already been cleared from nmsk, so the change can be something like
> the following, and then patch 1 isn't necessary:

It works just fine, except that it's O(N^2) where O(N) is easily
achievable. Again, it's not about performance; it's about coding
habits.
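
To make that concrete, the surrounding code is roughly this
(paraphrased from lib/group_cpus.c, with the prefix-decrement fix from
above applied):

	while (cpus_per_grp > 0) {
		cpu = cpumask_first(nmsk);	/* lowest bit still set in nmsk */
		if (cpu >= nr_cpu_ids)
			return;

		cpumask_clear_cpu(cpu, nmsk);
		cpumask_set_cpu(cpu, irqmsk);
		cpus_per_grp--;

		/*
		 * Every bit of nmsk below cpu is already clear at this
		 * point, so the sibling scan can safely resume at
		 * cpu + 1. A plain for_each_cpu_and() would restart at
		 * bit 0 on each pass of the outer loop and re-walk
		 * those known-clear bits.
		 */
		siblmsk = topology_sibling_cpumask(cpu);
		sibl = cpu + 1;
		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
			cpumask_clear_cpu(sibl, nmsk);
			cpumask_set_cpu(sibl, irqmsk);
			if (--cpus_per_grp == 0)
				return;
		}
	}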

> 	for_each_cpu_and(sibl, siblmsk, nmsk) {
> 		cpumask_clear_cpu(sibl, nmsk);
> 		cpumask_set_cpu(sibl, irqmsk);
> 		if (--cpus_per_grp == 0)
> 			return;
> 	}
>
>
> Thanks,
> Ming

2023-12-13 00:07:42

by Ming Lei

Subject: Re: [PATCH v3 2/7] lib/group_cpus: optimize inner loop in grp_spread_init_one()

On Tue, Dec 12, 2023 at 09:04:19AM -0800, Yury Norov wrote:
> On Tue, Dec 12, 2023 at 05:46:53PM +0800, Ming Lei wrote:
> > On Mon, Dec 11, 2023 at 08:21:02PM -0800, Yury Norov wrote:
> > > The loop starts from the beginning every time we switch to the next
> > > sibling mask. This is the Schlemiel the Painter's style of coding,
> > > because we know for sure that nmsk is clear up to the current CPU, and
> > > we can just continue from the next CPU.
> > >
> > > Also, we can make this neater by leveraging the dedicated for_each()
> > > iterator, which also simplifies the logic of clearing a bit in nmsk.
> > >
> > > Signed-off-by: Yury Norov <[email protected]>
> > > ---
> > > lib/group_cpus.c | 13 ++++++-------
> > > 1 file changed, 6 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> > > index ee272c4cefcc..10dead3ab0e0 100644
> > > --- a/lib/group_cpus.c
> > > +++ b/lib/group_cpus.c
> > > @@ -30,14 +30,13 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> > >
> > > 		/* If the cpu has siblings, use them first */
> > > 		siblmsk = topology_sibling_cpumask(cpu);
> > > -		for (sibl = -1; cpus_per_grp > 0; ) {
> > > -			sibl = cpumask_next(sibl, siblmsk);
> > > -			if (sibl >= nr_cpu_ids)
> > > -				break;
> > > -			if (!cpumask_test_and_clear_cpu(sibl, nmsk))
> > > -				continue;
> > > +		sibl = cpu + 1;
> >
> > It doesn't have to be 'cpu + 1'; see the comment below.
> >
> > > +
> > > +		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> > > +			cpumask_clear_cpu(sibl, nmsk);
> > > 			cpumask_set_cpu(sibl, irqmsk);
> > > -			cpus_per_grp--;
> > > +			if (cpus_per_grp-- == 0)
> >
> > if (--cpus_per_grp == 0)
>
> That's right, I'll send a new version this weekend.
>
> > > + return;
> >
> > I think for_each_cpu_and() should work just fine, because cpu has
> > already been cleared from nmsk, so the change can be something like
> > the following, and then patch 1 isn't necessary:
>
> It works just fine, except that it's O(N^2) where O(N) is easily
> achievable. Again, it's not about performance; it's about coding
> habits.

Both for_each_cpu_and() and for_each_cpu_and_from() are O(N), aren't
they? Both are based on find_next_and_bit().

for_each_cpu_and() is simpler and more readable, and more importantly,
it saves a public helper that would have only a single user.

>
> > 	for_each_cpu_and(sibl, siblmsk, nmsk) {
> > 		cpumask_clear_cpu(sibl, nmsk);
> > 		cpumask_set_cpu(sibl, irqmsk);
> > 		if (--cpus_per_grp == 0)
> > 			return;
> > 	}


Thanks,
Ming

2023-12-25 17:38:26

by Yury Norov

Subject: Re: [PATCH v3 2/7] lib/group_cpus: optimize inner loop in grp_spread_init_one()

On Wed, Dec 13, 2023 at 08:06:48AM +0800, Ming Lei wrote:
> On Tue, Dec 12, 2023 at 09:04:19AM -0800, Yury Norov wrote:
> > On Tue, Dec 12, 2023 at 05:46:53PM +0800, Ming Lei wrote:
> > > On Mon, Dec 11, 2023 at 08:21:02PM -0800, Yury Norov wrote:
> > > > The loop starts from the beginning every time we switch to the next
> > > > sibling mask. This is the Schlemiel the Painter's style of coding,
> > > > because we know for sure that nmsk is clear up to the current CPU, and
> > > > we can just continue from the next CPU.
> > > >
> > > > Also, we can make this neater by leveraging the dedicated for_each()
> > > > iterator, which also simplifies the logic of clearing a bit in nmsk.
> > > >
> > > > Signed-off-by: Yury Norov <[email protected]>
> > > > ---
> > > > lib/group_cpus.c | 13 ++++++-------
> > > > 1 file changed, 6 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> > > > index ee272c4cefcc..10dead3ab0e0 100644
> > > > --- a/lib/group_cpus.c
> > > > +++ b/lib/group_cpus.c
> > > > @@ -30,14 +30,13 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
> > > >
> > > > 		/* If the cpu has siblings, use them first */
> > > > 		siblmsk = topology_sibling_cpumask(cpu);
> > > > -		for (sibl = -1; cpus_per_grp > 0; ) {
> > > > -			sibl = cpumask_next(sibl, siblmsk);
> > > > -			if (sibl >= nr_cpu_ids)
> > > > -				break;
> > > > -			if (!cpumask_test_and_clear_cpu(sibl, nmsk))
> > > > -				continue;
> > > > +		sibl = cpu + 1;
> > >
> > > It doesn't have to be 'cpu + 1'; see the comment below.
> > >
> > > > +
> > > > +		for_each_cpu_and_from(sibl, siblmsk, nmsk) {
> > > > +			cpumask_clear_cpu(sibl, nmsk);
> > > > 			cpumask_set_cpu(sibl, irqmsk);
> > > > -			cpus_per_grp--;
> > > > +			if (cpus_per_grp-- == 0)
> > >
> > > if (--cpus_per_grp == 0)
> >
> > That's right, I'll send a new version this weekend.
> >
> > > > + return;
> > >
> > > I think for_each_cpu_and() should work just fine, because cpu has
> > > already been cleared from nmsk, so the change can be something like
> > > the following, and then patch 1 isn't necessary:
> >
> > It works just fine, except that it's O(N^2) where O(N) is easily
> > achievable. Again, it's not about performance; it's about coding
> > habits.
>
> Both for_each_cpu_and() and for_each_cpu_and_from() are O(N), aren't
> they? Both are based on find_next_and_bit().

for_each_cpu_and() is the same Schlemiel the Painter's code that the
plain for() was: each invocation is O(N), but it restarts the scan
from bit 0 every time, rescanning bits that are already known to be
clear.
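
Rough arithmetic, assuming SMT2 (N CPUs in N/2 sibling pairs), with
one pair handled per pass of the outer loop: with for_each_cpu_and(),
the k-th sibling scan restarts at bit 0 and walks up to roughly bit
2k, for a total of about 2 + 4 + ... + N = O(N^2) work; with
for_each_cpu_and_from(), each scan resumes at cpu + 1, so every bit of
nmsk is walked at most once and the total is O(N).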

> for_each_cpu_and() is simpler and more readable, and more importantly,
> it saves a public helper that would have only a single user.
>
> >
> > > 	for_each_cpu_and(sibl, siblmsk, nmsk) {
> > > 		cpumask_clear_cpu(sibl, nmsk);
> > > 		cpumask_set_cpu(sibl, irqmsk);
> > > 		if (--cpus_per_grp == 0)
> > > 			return;
> > > 	}
>
>
> Thanks,
> Ming