2024-05-13 22:02:02

by Yury Norov

[permalink] [raw]
Subject: [PATCH 0/6] bitmap: optimize API usage

In a few places bitmap API is called with such a combination of
parameters that makes the call unneeded, or there's a trivial cheaper
alternative.

For example, cpumask_copy(dst, src) where dst == src is simply a no-op.
This series addresses such cases spotted on x86_64 with LTP.

All the patches are independent and may be applied separately in
corresponding subsystems. Or I can take them in bitmap branch, if it's
more convenient.

Yury Norov (6):
smp: optimize smp_call_function_many_cond()
sched/topology: optimize topology_span_sane()
driver core: cpu: optimize print_cpus_isolated()
genirq: optimze irq_do_set_affinity()
cgroup/cpuset: optimize cpuset_mems_allowed_intersects()
tick/common: optimize cpumask_equal() usage

drivers/base/cpu.c | 6 ++++--
kernel/cgroup/cpuset.c | 3 +++
kernel/irq/manage.c | 3 ++-
kernel/sched/topology.c | 2 +-
kernel/smp.c | 5 ++++-
kernel/time/tick-common.c | 15 +++++++++++----
6 files changed, 25 insertions(+), 9 deletions(-)

--
2.40.1



2024-05-13 22:02:18

by Yury Norov

[permalink] [raw]
Subject: [PATCH 1/6] smp: optimize smp_call_function_many_cond()

The function may be called with mask == cpu_online_mask, and in this
case we can use a cheaper cpumask_cooy() instead of cpumask_and().

Signed-off-by: Yury Norov <[email protected]>
---
kernel/smp.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index f085ebcdf9e7..6f41214a1b54 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -798,7 +798,10 @@ static void smp_call_function_many_cond(const struct cpumask *mask,

if (run_remote) {
cfd = this_cpu_ptr(&cfd_data);
- cpumask_and(cfd->cpumask, mask, cpu_online_mask);
+ if (mask == cpu_online_mask)
+ cpumask_copy(cfd->cpumask, mask);
+ else
+ cpumask_and(cfd->cpumask, mask, cpu_online_mask);
__cpumask_clear_cpu(this_cpu, cfd->cpumask);

cpumask_clear(cfd->cpumask_ipi);
--
2.40.1


2024-05-13 22:02:22

by Yury Norov

[permalink] [raw]
Subject: [PATCH 2/6] sched/topology: optimize topology_span_sane()

The function may call cpumask_equal with tl->mask(cpu) == tl->mask(i),
even though cpu != i. In such case, cpumask_equal() would always return
true, and we can proceed to the next CPU immediately.

Signed-off-by: Yury Norov <[email protected]>
---
kernel/sched/topology.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 99ea5986038c..eb9eb17b0efa 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2360,7 +2360,7 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
* breaks the linking done for an earlier span.
*/
for_each_cpu(i, cpu_map) {
- if (i == cpu)
+ if (i == cpu || tl->mask(cpu) == tl->mask(i))
continue;
/*
* We should 'and' all those masks with 'cpu_map' to exactly
--
2.40.1


2024-05-13 22:02:38

by Yury Norov

[permalink] [raw]
Subject: [PATCH 3/6] driver core: cpu: optimize print_cpus_isolated()

The function may be called with housekeeping_cpumask == cpu_possible_mask,
and in such case the 'isolated' cpumask would be just empty.

We can call cpumask_clear() in that case, and save CPU cycles.

Signed-off-by: Yury Norov <[email protected]>
---
drivers/base/cpu.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 56fba44ba391..1322957286dd 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -282,8 +282,10 @@ static ssize_t print_cpus_isolated(struct device *dev,
if (!alloc_cpumask_var(&isolated, GFP_KERNEL))
return -ENOMEM;

- cpumask_andnot(isolated, cpu_possible_mask,
- housekeeping_cpumask(HK_TYPE_DOMAIN));
+ if (cpu_possible_mask != housekeeping_cpumask(HK_TYPE_DOMAIN))
+ cpumask_andnot(isolated, cpu_possible_mask, housekeeping_cpumask(HK_TYPE_DOMAIN));
+ else
+ cpumask_clear(isolated);
len = sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(isolated));

free_cpumask_var(isolated);
--
2.40.1


2024-05-13 22:03:27

by Yury Norov

[permalink] [raw]
Subject: [PATCH 6/6] tick/common: optimize cpumask_equal() usage

Some functions in the file call cpumask_equal() with src1p == src2p.
We can obviously just skip comparison entirely in this case.

This patch fixes cpumask_equal invocations when boot-test or LTP detect
such condition.

Signed-off-by: Yury Norov <[email protected]>
---
kernel/time/tick-common.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index d88b13076b79..b31fef292833 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -253,7 +253,8 @@ static void tick_setup_device(struct tick_device *td,
* When the device is not per cpu, pin the interrupt to the
* current cpu:
*/
- if (!cpumask_equal(newdev->cpumask, cpumask))
+ if (newdev->cpumask != cpumask &&
+ !cpumask_equal(newdev->cpumask, cpumask))
irq_set_affinity(newdev->irq, cpumask);

/*
@@ -288,14 +289,19 @@ static bool tick_check_percpu(struct clock_event_device *curdev,
{
if (!cpumask_test_cpu(cpu, newdev->cpumask))
return false;
- if (cpumask_equal(newdev->cpumask, cpumask_of(cpu)))
+ if (newdev->cpumask == cpumask_of(cpu) ||
+ cpumask_equal(newdev->cpumask, cpumask_of(cpu)))
return true;
/* Check if irq affinity can be set */
if (newdev->irq >= 0 && !irq_can_set_affinity(newdev->irq))
return false;
/* Prefer an existing cpu local device */
- if (curdev && cpumask_equal(curdev->cpumask, cpumask_of(cpu)))
+ if (!curdev)
+ return true;
+ if (curdev->cpumask == cpumask_of(cpu) ||
+ cpumask_equal(curdev->cpumask, cpumask_of(cpu)))
return false;
+
return true;
}

@@ -316,7 +322,8 @@ static bool tick_check_preferred(struct clock_event_device *curdev,
*/
return !curdev ||
newdev->rating > curdev->rating ||
- !cpumask_equal(curdev->cpumask, newdev->cpumask);
+ !(curdev->cpumask == newdev->cpumask &&
+ cpumask_equal(curdev->cpumask, newdev->cpumask));
}

/*
--
2.40.1


2024-05-13 22:04:34

by Yury Norov

[permalink] [raw]
Subject: [PATCH 4/6] genirq: optimize irq_do_set_affinity()

If mask == desc->irq_common_data.affinity, copying one to another is
useless, and we can just skip it.

Signed-off-by: Yury Norov <[email protected]>
---
kernel/irq/manage.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index bf9ae8a8686f..ad9ed9fdf919 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -285,7 +285,8 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
switch (ret) {
case IRQ_SET_MASK_OK:
case IRQ_SET_MASK_OK_DONE:
- cpumask_copy(desc->irq_common_data.affinity, mask);
+ if (desc->irq_common_data.affinity != mask)
+ cpumask_copy(desc->irq_common_data.affinity, mask);
fallthrough;
case IRQ_SET_MASK_OK_NOCOPY:
irq_validate_effective_affinity(data);
--
2.40.1


2024-05-13 22:05:07

by Yury Norov

[permalink] [raw]
Subject: [PATCH 5/6] cgroup/cpuset: optimize cpuset_mems_allowed_intersects()

If the function is called with tsk1 == tsk2, we know for sure that their
mems_allowed nodes do intersect, and so we can return immediately instead
of checking the nodes content.

Signed-off-by: Yury Norov <[email protected]>
---
kernel/cgroup/cpuset.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 4237c8748715..47ed206d4890 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -5010,6 +5010,9 @@ EXPORT_SYMBOL_GPL(cpuset_mem_spread_node);
int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
const struct task_struct *tsk2)
{
+ if (tsk1 == tsk2)
+ return 1;
+
return nodes_intersects(tsk1->mems_allowed, tsk2->mems_allowed);
}

--
2.40.1


2024-05-14 08:31:58

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 6/6] tick/common: optimize cpumask_equal() usage

On Mon, May 13 2024 at 15:01, Yury Norov wrote:
> Some functions in the file call cpumask_equal() with src1p == src2p.
> We can obviously just skip comparison entirely in this case.
>
> This patch fixes cpumask_equal invocations when boot-test or LTP detect
> such condition.

Please write your changelogs in imperative mood.

git grep 'This patch' Documentation/process/

Also please see Documentation/process/maintainer-tip.rst

> Signed-off-by: Yury Norov <[email protected]>
> ---
> kernel/time/tick-common.c | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index d88b13076b79..b31fef292833 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -253,7 +253,8 @@ static void tick_setup_device(struct tick_device *td,
> * When the device is not per cpu, pin the interrupt to the
> * current cpu:
> */
> - if (!cpumask_equal(newdev->cpumask, cpumask))
> + if (newdev->cpumask != cpumask &&
> + !cpumask_equal(newdev->cpumask, cpumask))
> irq_set_affinity(newdev->irq, cpumask);

I'm not seeing the benefit. This is slow path setup code so the extra
comparison does not really buy anything aside of malformatted line
breaks.

Thanks,

tglx

2024-05-14 08:43:16

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 6/6] tick/common: optimize cpumask_equal() usage

On Tue, May 14 2024 at 10:31, Thomas Gleixner wrote:
> On Mon, May 13 2024 at 15:01, Yury Norov wrote:
>> Some functions in the file call cpumask_equal() with src1p == src2p.
>> We can obviously just skip comparison entirely in this case.
>>
>> This patch fixes cpumask_equal invocations when boot-test or LTP detect
>> such condition.
>
> Please write your changelogs in imperative mood.
>
> git grep 'This patch' Documentation/process/
>
> Also please see Documentation/process/maintainer-tip.rst
>
>> Signed-off-by: Yury Norov <[email protected]>
>> ---
>> kernel/time/tick-common.c | 15 +++++++++++----
>> 1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
>> index d88b13076b79..b31fef292833 100644
>> --- a/kernel/time/tick-common.c
>> +++ b/kernel/time/tick-common.c
>> @@ -253,7 +253,8 @@ static void tick_setup_device(struct tick_device *td,
>> * When the device is not per cpu, pin the interrupt to the
>> * current cpu:
>> */
>> - if (!cpumask_equal(newdev->cpumask, cpumask))
>> + if (newdev->cpumask != cpumask &&
>> + !cpumask_equal(newdev->cpumask, cpumask))
>> irq_set_affinity(newdev->irq, cpumask);
>
> I'm not seeing the benefit. This is slow path setup code so the extra
> comparison does not really buy anything aside of malformatted line
> breaks.

Instead of sprinkling these conditional all over the place, can't you
just do the obvious and check for ptr1 == ptr2 in bitmap_copy() and
bitmap_equal()?

Thanks,

tglx

2024-05-14 16:19:01

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH 4/6] genirq: optimize irq_do_set_affinity()

On Tue, May 14, 2024 at 5:51 AM Jinjie Ruan <[email protected]> wrote:
>
>
>
> On 2024/5/14 6:01, Yury Norov wrote:
> > If mask == desc->irq_common_data.affinity, copying one to another is
> > useless, and we can just skip it.
> >
> > Signed-off-by: Yury Norov <[email protected]>
> > ---
> > kernel/irq/manage.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> > index bf9ae8a8686f..ad9ed9fdf919 100644
> > --- a/kernel/irq/manage.c
> > +++ b/kernel/irq/manage.c
> > @@ -285,7 +285,8 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
> > switch (ret) {
> > case IRQ_SET_MASK_OK:
> > case IRQ_SET_MASK_OK_DONE:
> > - cpumask_copy(desc->irq_common_data.affinity, mask);
> > + if (desc->irq_common_data.affinity != mask)
> > + cpumask_copy(desc->irq_common_data.affinity, mask);
>
> It seems that mask is a pointer, shouldn't use "cpumask_equal"?

cpumask_equal() is O(N), just as cpumask_copy(), so we'll have no
benefit if the masks are equal, and will double slow it if they aren't
in the worst case.

On the other hand, pointers comparison is O(1), a very quick tasks,
even more the pointers are already in registers.

> > fallthrough;
> > case IRQ_SET_MASK_OK_NOCOPY:
> > irq_validate_effective_affinity(data);

2024-05-14 16:48:05

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH 6/6] tick/common: optimize cpumask_equal() usage

On Tue, May 14, 2024 at 1:42 AM Thomas Gleixner <[email protected]> wrote:
>
> On Tue, May 14 2024 at 10:31, Thomas Gleixner wrote:
> > On Mon, May 13 2024 at 15:01, Yury Norov wrote:
> >> Some functions in the file call cpumask_equal() with src1p == src2p.
> >> We can obviously just skip comparison entirely in this case.
> >>
> >> This patch fixes cpumask_equal invocations when boot-test or LTP detect
> >> such condition.
> >
> > Please write your changelogs in imperative mood.
> >
> > git grep 'This patch' Documentation/process/
> >
> > Also please see Documentation/process/maintainer-tip.rst
> >
> >> Signed-off-by: Yury Norov <[email protected]>
> >> ---
> >> kernel/time/tick-common.c | 15 +++++++++++----
> >> 1 file changed, 11 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> >> index d88b13076b79..b31fef292833 100644
> >> --- a/kernel/time/tick-common.c
> >> +++ b/kernel/time/tick-common.c
> >> @@ -253,7 +253,8 @@ static void tick_setup_device(struct tick_device *td,
> >> * When the device is not per cpu, pin the interrupt to the
> >> * current cpu:
> >> */
> >> - if (!cpumask_equal(newdev->cpumask, cpumask))
> >> + if (newdev->cpumask != cpumask &&
> >> + !cpumask_equal(newdev->cpumask, cpumask))
> >> irq_set_affinity(newdev->irq, cpumask);
> >
> > I'm not seeing the benefit. This is slow path setup code so the extra
> > comparison does not really buy anything aside of malformatted line
> > breaks.
>
> Instead of sprinkling these conditional all over the place, can't you
> just do the obvious and check for ptr1 == ptr2 in bitmap_copy() and
> bitmap_equal()?

I proposed this a while (few years) ago, and it has been rejected. On
bitmaps level we decided not to do that for the reasons memcpy() and
memcmp() doesn't, and on cpumasks and nodemasks level it hasn't
been discussed at all.

Now that most of bitmap ops have inline and outline implementation,
we technically can move this checks in outline code, as inline bitmap
ops are very lightweight already.

So I see the following options:
- Implement these sanity checks in outline bitmap API (lib/bitmap.c);
- Implement them on cpumask and nodemask level; or
- add a new family of helpers that do this check, like
bitmap_copy_if_needed() (better name appreciated).

The argument against #1 and #2 these days was that memcpy() and
similarly bitmap_copy() with dst == src may be a sign of error, and
we don't want to add a code that optimizes for it.

Now, I ran the kernel through the LTP test and in practice all the
cases that I spot look pretty normal. So I can continue sprinkling
the checks once a few years, or do something like described above.

Can you / everyone please share your opinion?

Thanks,
Yury

2024-05-14 16:48:27

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH 5/6] cgroup/cpuset: optimize cpuset_mems_allowed_intersects()


On 5/13/24 18:01, Yury Norov wrote:
> If the function is called with tsk1 == tsk2, we know for sure that their
> mems_allowed nodes do intersect, and so we can return immediately instead
> of checking the nodes content.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> kernel/cgroup/cpuset.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 4237c8748715..47ed206d4890 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -5010,6 +5010,9 @@ EXPORT_SYMBOL_GPL(cpuset_mem_spread_node);
> int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
> const struct task_struct *tsk2)
> {
> + if (tsk1 == tsk2)
> + return 1;
> +
> return nodes_intersects(tsk1->mems_allowed, tsk2->mems_allowed);
> }
>
Reviewed-by: Waiman Long <[email protected]>


2024-05-14 16:50:35

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH 5/6] cgroup/cpuset: optimize cpuset_mems_allowed_intersects()

On Tue, May 14, 2024 at 12:47:59PM -0400, Waiman Long wrote:
> > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> > index 4237c8748715..47ed206d4890 100644
> > --- a/kernel/cgroup/cpuset.c
> > +++ b/kernel/cgroup/cpuset.c
> > @@ -5010,6 +5010,9 @@ EXPORT_SYMBOL_GPL(cpuset_mem_spread_node);
> > int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
> > const struct task_struct *tsk2)
> > {
> > + if (tsk1 == tsk2)
> > + return 1;
> > +
> > return nodes_intersects(tsk1->mems_allowed, tsk2->mems_allowed);
> > }
> Reviewed-by: Waiman Long <[email protected]>

I'm not sure this is worth the added code.

Thanks.

--
tejun

2024-05-14 16:55:59

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH 5/6] cgroup/cpuset: optimize cpuset_mems_allowed_intersects()


On 5/14/24 12:50, Tejun Heo wrote:
> On Tue, May 14, 2024 at 12:47:59PM -0400, Waiman Long wrote:
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index 4237c8748715..47ed206d4890 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -5010,6 +5010,9 @@ EXPORT_SYMBOL_GPL(cpuset_mem_spread_node);
>>> int cpuset_mems_allowed_intersects(const struct task_struct *tsk1,
>>> const struct task_struct *tsk2)
>>> {
>>> + if (tsk1 == tsk2)
>>> + return 1;
>>> +
>>> return nodes_intersects(tsk1->mems_allowed, tsk2->mems_allowed);
>>> }
>> Reviewed-by: Waiman Long <[email protected]>
> I'm not sure this is worth the added code.

Yes, I do have a second thought afterward. The only caller of
cpuset_mems_allowed_intersects() is oom_cpuset_eligible() in
mm/oom_kill.c. So I believe a better fix is to avoid calling
cpuset_mems_allowed_intersects() there if current == tsk.

Cheers,
Longman


2024-05-14 20:48:10

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 6/6] tick/common: optimize cpumask_equal() usage

On Tue, May 14 2024 at 09:47, Yury Norov wrote:
>> Instead of sprinkling these conditional all over the place, can't you
>> just do the obvious and check for ptr1 == ptr2 in bitmap_copy() and
>> bitmap_equal()?
>
> I proposed this a while (few years) ago, and it has been rejected. On
> bitmaps level we decided not to do that for the reasons memcpy() and
> memcmp() doesn't, and on cpumasks and nodemasks level it hasn't
> been discussed at all.
>
> Now that most of bitmap ops have inline and outline implementation,
> we technically can move this checks in outline code, as inline bitmap
> ops are very lightweight already.
>
> So I see the following options:
> - Implement these sanity checks in outline bitmap API (lib/bitmap.c);
> - Implement them on cpumask and nodemask level; or
> - add a new family of helpers that do this check, like
> bitmap_copy_if_needed() (better name appreciated).
>
> The argument against #1 and #2 these days was that memcpy() and
> similarly bitmap_copy() with dst == src may be a sign of error, and
> we don't want to add a code that optimizes for it.

That's a fair argument.

> Now, I ran the kernel through the LTP test and in practice all the
> cases that I spot look pretty normal. So I can continue sprinkling
> the checks once a few years, or do something like described above.

I don't see these checks as valuable in most cases and I detest them as
they make the code harder to read.

Except for smp_call_function_many_cond() and to a lesser extent
irq_do_set_affinity() none of them you added really matters.

Though it might be worth to have helper functions which make it obvious
that the src == dst case is intentional.

Thanks,

tglx

2024-05-14 21:02:26

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 3/6] driver core: cpu: optimize print_cpus_isolated()

On Mon, May 13 2024 at 15:01, Yury Norov wrote:
> The function may be called with housekeeping_cpumask == cpu_possible_mask,

How so? There is no cpumask argument in the function signature. Can you
please be precise?

> and in such case the 'isolated' cpumask would be just empty.
>
> We can call cpumask_clear() in that case, and save CPU cycles.
>
> @@ -282,8 +282,10 @@ static ssize_t print_cpus_isolated(struct device *dev,
> if (!alloc_cpumask_var(&isolated, GFP_KERNEL))
> return -ENOMEM;
>
> - cpumask_andnot(isolated, cpu_possible_mask,
> - housekeeping_cpumask(HK_TYPE_DOMAIN));
> + if (cpu_possible_mask != housekeeping_cpumask(HK_TYPE_DOMAIN))
> + cpumask_andnot(isolated, cpu_possible_mask, housekeeping_cpumask(HK_TYPE_DOMAIN));
> + else
> + cpumask_clear(isolated);
> len = sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(isolated));
>
> free_cpumask_var(isolated);

Seriously? You need clear() to emit an empty string via %*pbl?

if (cpu_possible_mask != housekeeping_cpumask(HK_TYPE_DOMAIN)) {
if (!alloc_cpumask_var(&isolated, GFP_KERNEL))
return -ENOMEM;
cpumask_andnot(isolated, cpu_possible_mask, housekeeping_cpumask(HK_TYPE_DOMAIN));
len = sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(isolated));
free_cpumask_var(isolated);
} else {
len = sysfs_emit(buf, "\n");
}

That actually would make sense and spare way more CPU cycles, no?

Is it actually worth the larger text size? Not really convinced about that.

Thanks,

tglx

2024-05-14 21:30:10

by Christophe JAILLET

[permalink] [raw]
Subject: Re: [PATCH 2/6] sched/topology: optimize topology_span_sane()

Le 14/05/2024 à 00:01, Yury Norov a écrit :
> The function may call cpumask_equal with tl->mask(cpu) == tl->mask(i),
> even though cpu != i. In such case, cpumask_equal() would always return
> true, and we can proceed to the next CPU immediately.
>
> Signed-off-by: Yury Norov <[email protected]>
> ---
> kernel/sched/topology.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 99ea5986038c..eb9eb17b0efa 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2360,7 +2360,7 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
> * breaks the linking done for an earlier span.
> */
> for_each_cpu(i, cpu_map) {
> - if (i == cpu)
> + if (i == cpu || tl->mask(cpu) == tl->mask(i))
> continue;
> /*
> * We should 'and' all those masks with 'cpu_map' to exactly

Hi,

does it make sense to pre-compute tl->mask(cpu) outside the for_each_cpu()?

CJ