2023-02-13 12:45:53

by Laurent Dufour

[permalink] [raw]
Subject: [PATCH] powerpc/pseries/cpuhp: respect current SMT when adding new CPU

When a new CPU is added, the kernel is activating all its threads. This
leads to weird, but functional, result when adding CPU on a SMT 4 system
for instance.

Here the newly added CPU 1 has 8 threads while the other one has 4 threads
active (system has been booted with the 'smt-enabled=4' kernel option):

ltcden3-lp12:~ # ppc64_cpu --info
Core 0: 0* 1* 2* 3* 4 5 6 7
Core 1: 8* 9* 10* 11* 12* 13* 14* 15*

There is no SMT value in the kernel. It is possible to run unbalanced LPAR
with 2 threads for a CPU, 4 for another one, and 5 on the latest.

To work around this possibility, and assuming that the LPAR run with the
same number of threads for each CPU, which is the common case, the number
of active threads of the CPU doing the hot-plug operation is computed. Only
that number of threads will be activated for the newly added CPU.

This way on a LPAR running in SMT=4, newly added CPU will be running 4
threads, which is what a end user would expect.

Cc: Srikar Dronamraju <[email protected]>
Signed-off-by: Laurent Dufour <[email protected]>
---
arch/powerpc/platforms/pseries/hotplug-cpu.c | 24 ++++++++++++++++----
1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 090ae5a1e0f5..58a7c97fc475 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -382,7 +382,7 @@ static int dlpar_online_cpu(struct device_node *dn)
{
int rc = 0;
unsigned int cpu;
- int len, nthreads, i;
+ int len, nthreads, i, smt;
const __be32 *intserv;
u32 thread;

@@ -392,6 +392,17 @@ static int dlpar_online_cpu(struct device_node *dn)

nthreads = len / sizeof(u32);

+ /*
+ * Compute the number of active threads for the current CPU, assuming
+ * the system is homogeus, we don't want to active more threads than the
+ * current SMT setting.
+ */
+ for (cpu = cpu_first_thread_sibling(raw_smp_processor_id()), smt = 0;
+ cpu <= cpu_last_thread_sibling(raw_smp_processor_id()); cpu++) {
+ if (cpu_online(cpu))
+ smt++;
+ }
+
cpu_maps_update_begin();
for (i = 0; i < nthreads; i++) {
thread = be32_to_cpu(intserv[i]);
@@ -400,10 +411,13 @@ static int dlpar_online_cpu(struct device_node *dn)
continue;
cpu_maps_update_done();
find_and_update_cpu_nid(cpu);
- rc = device_online(get_cpu_device(cpu));
- if (rc) {
- dlpar_offline_cpu(dn);
- goto out;
+ /* Don't active CPU over the current SMT setting */
+ if (smt-- > 0) {
+ rc = device_online(get_cpu_device(cpu));
+ if (rc) {
+ dlpar_offline_cpu(dn);
+ goto out;
+ }
}
cpu_maps_update_begin();

--
2.39.1



2023-02-13 14:47:34

by Nathan Lynch

[permalink] [raw]
Subject: Re: [PATCH] powerpc/pseries/cpuhp: respect current SMT when adding new CPU

Laurent Dufour <[email protected]> writes:
> When a new CPU is added, the kernel is activating all its threads. This
> leads to weird, but functional, result when adding CPU on a SMT 4 system
> for instance.
>
> Here the newly added CPU 1 has 8 threads while the other one has 4 threads
> active (system has been booted with the 'smt-enabled=4' kernel option):
>
> ltcden3-lp12:~ # ppc64_cpu --info
> Core 0: 0* 1* 2* 3* 4 5 6 7
> Core 1: 8* 9* 10* 11* 12* 13* 14* 15*
>
> There is no SMT value in the kernel. It is possible to run unbalanced LPAR
> with 2 threads for a CPU, 4 for another one, and 5 on the latest.
>
> To work around this possibility, and assuming that the LPAR run with the
> same number of threads for each CPU, which is the common case,

I am skeptical at best of baking that assumption into this code. Mixed
SMT modes within a partition doesn't strike me as an unreasonable
possibility for some use cases. And if that's wrong, then we should just
add a global smt value instead of using heuristics.

> the number
> of active threads of the CPU doing the hot-plug operation is computed. Only
> that number of threads will be activated for the newly added CPU.
>
> This way on a LPAR running in SMT=4, newly added CPU will be running 4
> threads, which is what a end user would expect.

I could see why most users would prefer this new behavior. But surely
some users have come to expect the existing behavior, which has been in
place for years, and developed workarounds that might be broken by this
change?

I would suggest that to handle this well, we need to give user space
more ability to tell the kernel what actions to take on added cores, on
an opt-in basis.

This could take the form of extending the DLPAR sysfs command set:

Option 1 - Add a flag that tells the kernel not to online any threads at
all; user space will online the desired threads later.

Option 2 - Add an option that tells the kernel which SMT mode to apply.

2023-02-13 15:04:48

by Michal Suchánek

[permalink] [raw]
Subject: Re: [PATCH] powerpc/pseries/cpuhp: respect current SMT when adding new CPU

Hello,

On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote:
> Laurent Dufour <[email protected]> writes:
> > When a new CPU is added, the kernel is activating all its threads. This
> > leads to weird, but functional, result when adding CPU on a SMT 4 system
> > for instance.
> >
> > Here the newly added CPU 1 has 8 threads while the other one has 4 threads
> > active (system has been booted with the 'smt-enabled=4' kernel option):
> >
> > ltcden3-lp12:~ # ppc64_cpu --info
> > Core 0: 0* 1* 2* 3* 4 5 6 7
> > Core 1: 8* 9* 10* 11* 12* 13* 14* 15*
> >
> > There is no SMT value in the kernel. It is possible to run unbalanced LPAR
> > with 2 threads for a CPU, 4 for another one, and 5 on the latest.
> >
> > To work around this possibility, and assuming that the LPAR run with the
> > same number of threads for each CPU, which is the common case,
>
> I am skeptical at best of baking that assumption into this code. Mixed
> SMT modes within a partition doesn't strike me as an unreasonable
> possibility for some use cases. And if that's wrong, then we should just
> add a global smt value instead of using heuristics.
>
> > the number
> > of active threads of the CPU doing the hot-plug operation is computed. Only
> > that number of threads will be activated for the newly added CPU.
> >
> > This way on a LPAR running in SMT=4, newly added CPU will be running 4
> > threads, which is what a end user would expect.
>
> I could see why most users would prefer this new behavior. But surely
> some users have come to expect the existing behavior, which has been in
> place for years, and developed workarounds that might be broken by this
> change?
>
> I would suggest that to handle this well, we need to give user space
> more ability to tell the kernel what actions to take on added cores, on
> an opt-in basis.
>
> This could take the form of extending the DLPAR sysfs command set:
>
> Option 1 - Add a flag that tells the kernel not to online any threads at
> all; user space will online the desired threads later.
>
> Option 2 - Add an option that tells the kernel which SMT mode to apply.

powerpc-utils grew some drmgr hooks recently so maybe the policy can be
moved to userspace?

Thanks

Michal

2023-02-13 15:41:15

by Nathan Lynch

[permalink] [raw]
Subject: Re: [PATCH] powerpc/pseries/cpuhp: respect current SMT when adding new CPU

Michal Suchánek <[email protected]> writes:
> On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote:
>> Laurent Dufour <[email protected]> writes:
>> > When a new CPU is added, the kernel is activating all its threads. This
>> > leads to weird, but functional, result when adding CPU on a SMT 4 system
>> > for instance.
>> >
>> > Here the newly added CPU 1 has 8 threads while the other one has 4 threads
>> > active (system has been booted with the 'smt-enabled=4' kernel option):
>> >
>> > ltcden3-lp12:~ # ppc64_cpu --info
>> > Core 0: 0* 1* 2* 3* 4 5 6 7
>> > Core 1: 8* 9* 10* 11* 12* 13* 14* 15*
>> >
>> > There is no SMT value in the kernel. It is possible to run unbalanced LPAR
>> > with 2 threads for a CPU, 4 for another one, and 5 on the latest.
>> >
>> > To work around this possibility, and assuming that the LPAR run with the
>> > same number of threads for each CPU, which is the common case,
>>
>> I am skeptical at best of baking that assumption into this code. Mixed
>> SMT modes within a partition doesn't strike me as an unreasonable
>> possibility for some use cases. And if that's wrong, then we should just
>> add a global smt value instead of using heuristics.
>>
>> > the number
>> > of active threads of the CPU doing the hot-plug operation is computed. Only
>> > that number of threads will be activated for the newly added CPU.
>> >
>> > This way on a LPAR running in SMT=4, newly added CPU will be running 4
>> > threads, which is what a end user would expect.
>>
>> I could see why most users would prefer this new behavior. But surely
>> some users have come to expect the existing behavior, which has been in
>> place for years, and developed workarounds that might be broken by this
>> change?
>>
>> I would suggest that to handle this well, we need to give user space
>> more ability to tell the kernel what actions to take on added cores, on
>> an opt-in basis.
>>
>> This could take the form of extending the DLPAR sysfs command set:
>>
>> Option 1 - Add a flag that tells the kernel not to online any threads at
>> all; user space will online the desired threads later.
>>
>> Option 2 - Add an option that tells the kernel which SMT mode to apply.
>
> powerpc-utils grew some drmgr hooks recently so maybe the policy can be
> moved to userspace?

I'm not sure whether the hook mechanism would come into play, but yes, I
am suggesting that user space be given the option of overriding the
kernel's current behavior.

2023-02-14 15:33:35

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH] powerpc/pseries/cpuhp: respect current SMT when adding new CPU

On 13/02/2023 16:40:50, Nathan Lynch wrote:
> Michal Suchánek <[email protected]> writes:
>> On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote:
>>> Laurent Dufour <[email protected]> writes:
>>>> When a new CPU is added, the kernel is activating all its threads. This
>>>> leads to weird, but functional, result when adding CPU on a SMT 4 system
>>>> for instance.
>>>>
>>>> Here the newly added CPU 1 has 8 threads while the other one has 4 threads
>>>> active (system has been booted with the 'smt-enabled=4' kernel option):
>>>>
>>>> ltcden3-lp12:~ # ppc64_cpu --info
>>>> Core 0: 0* 1* 2* 3* 4 5 6 7
>>>> Core 1: 8* 9* 10* 11* 12* 13* 14* 15*
>>>>
>>>> There is no SMT value in the kernel. It is possible to run unbalanced LPAR
>>>> with 2 threads for a CPU, 4 for another one, and 5 on the latest.
>>>>
>>>> To work around this possibility, and assuming that the LPAR run with the
>>>> same number of threads for each CPU, which is the common case,
>>>
>>> I am skeptical at best of baking that assumption into this code. Mixed
>>> SMT modes within a partition doesn't strike me as an unreasonable
>>> possibility for some use cases. And if that's wrong, then we should just
>>> add a global smt value instead of using heuristics.
>>>
>>>> the number
>>>> of active threads of the CPU doing the hot-plug operation is computed. Only
>>>> that number of threads will be activated for the newly added CPU.
>>>>
>>>> This way on a LPAR running in SMT=4, newly added CPU will be running 4
>>>> threads, which is what a end user would expect.
>>>
>>> I could see why most users would prefer this new behavior. But surely
>>> some users have come to expect the existing behavior, which has been in
>>> place for years, and developed workarounds that might be broken by this
>>> change?
>>>
>>> I would suggest that to handle this well, we need to give user space
>>> more ability to tell the kernel what actions to take on added cores, on
>>> an opt-in basis.
>>>
>>> This could take the form of extending the DLPAR sysfs command set:
>>>
>>> Option 1 - Add a flag that tells the kernel not to online any threads at
>>> all; user space will online the desired threads later.
>>>
>>> Option 2 - Add an option that tells the kernel which SMT mode to apply.
>>
>> powerpc-utils grew some drmgr hooks recently so maybe the policy can be
>> moved to userspace?
>
> I'm not sure whether the hook mechanism would come into play, but yes, I
> am suggesting that user space be given the option of overriding the
> kernel's current behavior.

I agree, sounds doable using the new drmgr hook mechanism.

2023-03-30 15:55:14

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH] powerpc/pseries/cpuhp: respect current SMT when adding new CPU

On 13/02/2023 16:40:50, Nathan Lynch wrote:
> Michal Suchánek <[email protected]> writes:
>> On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote:
>>> Laurent Dufour <[email protected]> writes:
>>>> When a new CPU is added, the kernel is activating all its threads. This
>>>> leads to weird, but functional, result when adding CPU on a SMT 4 system
>>>> for instance.
>>>>
>>>> Here the newly added CPU 1 has 8 threads while the other one has 4 threads
>>>> active (system has been booted with the 'smt-enabled=4' kernel option):
>>>>
>>>> ltcden3-lp12:~ # ppc64_cpu --info
>>>> Core 0: 0* 1* 2* 3* 4 5 6 7
>>>> Core 1: 8* 9* 10* 11* 12* 13* 14* 15*
>>>>
>>>> There is no SMT value in the kernel. It is possible to run unbalanced LPAR
>>>> with 2 threads for a CPU, 4 for another one, and 5 on the latest.
>>>>
>>>> To work around this possibility, and assuming that the LPAR run with the
>>>> same number of threads for each CPU, which is the common case,
>>>
>>> I am skeptical at best of baking that assumption into this code. Mixed
>>> SMT modes within a partition doesn't strike me as an unreasonable
>>> possibility for some use cases. And if that's wrong, then we should just
>>> add a global smt value instead of using heuristics.
>>>
>>>> the number
>>>> of active threads of the CPU doing the hot-plug operation is computed. Only
>>>> that number of threads will be activated for the newly added CPU.
>>>>
>>>> This way on a LPAR running in SMT=4, newly added CPU will be running 4
>>>> threads, which is what a end user would expect.
>>>
>>> I could see why most users would prefer this new behavior. But surely
>>> some users have come to expect the existing behavior, which has been in
>>> place for years, and developed workarounds that might be broken by this
>>> change?
>>>
>>> I would suggest that to handle this well, we need to give user space
>>> more ability to tell the kernel what actions to take on added cores, on
>>> an opt-in basis.
>>>
>>> This could take the form of extending the DLPAR sysfs command set:
>>>
>>> Option 1 - Add a flag that tells the kernel not to online any threads at
>>> all; user space will online the desired threads later.
>>>
>>> Option 2 - Add an option that tells the kernel which SMT mode to apply.
>>
>> powerpc-utils grew some drmgr hooks recently so maybe the policy can be
>> moved to userspace?
>
> I'm not sure whether the hook mechanism would come into play, but yes, I
> am suggesting that user space be given the option of overriding the
> kernel's current behavior.

Indeed, that's not so easy. There are multiple ways for the SMT level to be
impacted:
- smt-enabled kernel option
- smtstate systemctl service (if activated), saving SMT level at shutdown
time to restore it a boot time
- pseries-energyd daemon (if activated) could turn off threads
- ppc64_cpu --smt=x user command
- sysfs direct writing to turn off/on specific threads.

There is no SMT level saved, on "disk" or in the kernel, and any of these
options can interact in parallel. So from the user space point of view, the
best we could do is looking for the SMT current values, there could be
multiple values in the case of a mixed SMT state, peek one value and apply it.

Extending the drmgr's hook is still valid, and I sent a patch series on the
powerpc-utils mailing list to achieve that. However, changing the SMT level
in that hook means that newly added CPU will be first turn on and there is
a window where this threads could be seen active. Not a big deal but not
turning on these extra threads looks better to me.

That's being said, I can't see any benefit of a user space implementation
compared to the option I'm proposing in that patch.

Does anyone have a better idea?

Cheers,
Laurent.

2023-03-30 16:28:51

by Michal Suchánek

[permalink] [raw]
Subject: Re: [PATCH] powerpc/pseries/cpuhp: respect current SMT when adding new CPU

On Thu, Mar 30, 2023 at 05:51:57PM +0200, Laurent Dufour wrote:
> On 13/02/2023 16:40:50, Nathan Lynch wrote:
> > Michal Such?nek <[email protected]> writes:
> >> On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote:
> >>> Laurent Dufour <[email protected]> writes:
> >>>> When a new CPU is added, the kernel is activating all its threads. This
> >>>> leads to weird, but functional, result when adding CPU on a SMT 4 system
> >>>> for instance.
> >>>>
> >>>> Here the newly added CPU 1 has 8 threads while the other one has 4 threads
> >>>> active (system has been booted with the 'smt-enabled=4' kernel option):
> >>>>
> >>>> ltcden3-lp12:~ # ppc64_cpu --info
> >>>> Core 0: 0* 1* 2* 3* 4 5 6 7
> >>>> Core 1: 8* 9* 10* 11* 12* 13* 14* 15*
> >>>>
> >>>> There is no SMT value in the kernel. It is possible to run unbalanced LPAR
> >>>> with 2 threads for a CPU, 4 for another one, and 5 on the latest.

> Indeed, that's not so easy. There are multiple ways for the SMT level to be
> impacted:
> - smt-enabled kernel option
> - smtstate systemctl service (if activated), saving SMT level at shutdown
> time to restore it a boot time
> - pseries-energyd daemon (if activated) could turn off threads
> - ppc64_cpu --smt=x user command
> - sysfs direct writing to turn off/on specific threads.
>
> There is no SMT level saved, on "disk" or in the kernel, and any of these
> options can interact in parallel. So from the user space point of view, the
> best we could do is looking for the SMT current values, there could be
> multiple values in the case of a mixed SMT state, peek one value and apply it.
>
> Extending the drmgr's hook is still valid, and I sent a patch series on the
> powerpc-utils mailing list to achieve that. However, changing the SMT level
> in that hook means that newly added CPU will be first turn on and there is
> a window where this threads could be seen active. Not a big deal but not
> turning on these extra threads looks better to me.

Which means

1) add an option to not onlince hotplugged CPUs by default

2) when a tool that wants to manage CPU onlining is active it can set
the option so that no threads are onlined automatically, and online the
desired threads

3) when no such tool is active the default should be to online all
threeads to preserve compatibility with existing behavior

> That's being said, I can't see any benefit of a user space implementation
> compared to the option I'm proposing in that patch.

The userspace implementation can implement arbitrily complex policy,
that's not something that belongs into the kernel.

Thanks

Michal

2023-03-31 15:20:34

by Laurent Dufour

[permalink] [raw]
Subject: Re: [PATCH] powerpc/pseries/cpuhp: respect current SMT when adding new CPU

On 30/03/2023 18:19:38, Michal Suchánek wrote:
> On Thu, Mar 30, 2023 at 05:51:57PM +0200, Laurent Dufour wrote:
>> On 13/02/2023 16:40:50, Nathan Lynch wrote:
>>> Michal Suchánek <[email protected]> writes:
>>>> On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote:
>>>>> Laurent Dufour <[email protected]> writes:
>>>>>> When a new CPU is added, the kernel is activating all its threads. This
>>>>>> leads to weird, but functional, result when adding CPU on a SMT 4 system
>>>>>> for instance.
>>>>>>
>>>>>> Here the newly added CPU 1 has 8 threads while the other one has 4 threads
>>>>>> active (system has been booted with the 'smt-enabled=4' kernel option):
>>>>>>
>>>>>> ltcden3-lp12:~ # ppc64_cpu --info
>>>>>> Core 0: 0* 1* 2* 3* 4 5 6 7
>>>>>> Core 1: 8* 9* 10* 11* 12* 13* 14* 15*
>>>>>>
>>>>>> There is no SMT value in the kernel. It is possible to run unbalanced LPAR
>>>>>> with 2 threads for a CPU, 4 for another one, and 5 on the latest.
>
>> Indeed, that's not so easy. There are multiple ways for the SMT level to be
>> impacted:
>> - smt-enabled kernel option
>> - smtstate systemctl service (if activated), saving SMT level at shutdown
>> time to restore it a boot time
>> - pseries-energyd daemon (if activated) could turn off threads
>> - ppc64_cpu --smt=x user command
>> - sysfs direct writing to turn off/on specific threads.
>>
>> There is no SMT level saved, on "disk" or in the kernel, and any of these
>> options can interact in parallel. So from the user space point of view, the
>> best we could do is looking for the SMT current values, there could be
>> multiple values in the case of a mixed SMT state, peek one value and apply it.
>>
>> Extending the drmgr's hook is still valid, and I sent a patch series on the
>> powerpc-utils mailing list to achieve that. However, changing the SMT level
>> in that hook means that newly added CPU will be first turn on and there is
>> a window where this threads could be seen active. Not a big deal but not
>> turning on these extra threads looks better to me.
>
> Which means
>
> 1) add an option to not onlince hotplugged CPUs by default

After discussing this with Srikar, it happens that exposing the smt-enabled
value set a boot time (or not) in SYS FS and to update it when SMT level is
changed using ppc64_cpu will be better. This will aslo allow the energy
daemon to take this value in account.

> 2) when a tool that wants to manage CPU onlining is active it can set
> the option so that no threads are onlined automatically, and online the
> desired threads

When new CPU are added, the kernel will automatically online the right
number of threads based on that recorded SMT level.

>
> 3) when no such tool is active the default should be to online all
> threeads to preserve compatibility with existing behavior

I don't think we should keep the existing behavior, customers are confused
and some user space tools like lparstart have difficulties to handled mixed
SMT levels.

I'll submit a new series exposing the SMT level and propose a patch for
ppc64_cpu too.

>
>> That's being said, I can't see any benefit of a user space implementation
>> compared to the option I'm proposing in that patch.
>
> The userspace implementation can implement arbitrily complex policy,
> that's not something that belongs into the kernel.
>
> Thanks
>
> Michal