________________________________________
From: Thomas Gleixner <[email protected]>
Sent: October 28, 2020 16:30
To: Zhang, Qiang; [email protected]; [email protected]
Cc: [email protected]; [email protected]; [email protected]
Subject: Re: [PATCH v2] kthread_worker: re-set CPU affinities if CPU come online
On Wed, Oct 28 2020 at 15:30, qiang zhang wrote:
> From: Zqiang <[email protected]>
>
> When a CPU goes offline, a 'kthread_worker' bound to that CPU can run
> anywhere. When the CPU comes back online, restore the 'kthread_worker'
> affinity via a cpuhp notifier.
>
> Signed-off-by: Zqiang <[email protected]>
> ---
> v1->v2:
> rename the variable kworker_online to kthread_worker_online.
> add 'cpuhp_node' and 'bind_cpu' initialization in KTHREAD_WORKER_INIT.
> add a comment explaining the WARN_ON_ONCE.
> How is that addressing any of the comments I made on V1 of this?
Do you mean the following problem:
"The dynamic hotplug states run late. What's preventing work to be queued
on such a worker before it is bound to the CPU again?"
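
If yes: the patch re-binds the worker from a dynamic cpuhp online
callback, roughly like the sketch below (only the cpuhp/kthread APIs
are real; every other name is made up). And since CPUHP_AP_ONLINE_DYN
callbacks run late in the online sequence, nothing stops work from
being queued, or even run, on the wrong CPU before the re-bind happens.

    #include <linux/cpuhotplug.h>
    #include <linux/cpumask.h>
    #include <linux/init.h>
    #include <linux/kthread.h>
    #include <linux/sched.h>

    static struct kthread_worker *my_worker;   /* created on my_cpu */
    static unsigned int my_cpu;

    static int my_worker_online(unsigned int cpu)
    {
            /* Re-bind the worker task when its CPU returns. Work queued
             * between the CPU coming up and this callback running may
             * already have executed on the wrong CPU. */
            if (cpu == my_cpu)
                    set_cpus_allowed_ptr(my_worker->task, cpumask_of(cpu));
            return 0;
    }

    static int __init my_hotplug_init(void)
    {
            int ret;

            ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mydrv:online",
                                    my_worker_online, NULL);
            return ret < 0 ? ret : 0; /* dynamic states return the state id */
    }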
Thanks
Qiang
>
> Thanks,
>
> tglx
Qiang,
On Wed, Oct 28 2020 at 08:45, Qiang Zhang wrote:
> ________________________________________
> From: Thomas Gleixner <[email protected]>
> Sent: October 28, 2020 16:30
> To: Zhang, Qiang; [email protected]; [email protected]
> Cc: [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v2] kthread_worker: re-set CPU affinities if CPU come online
Can you please teach your mail client not to copy the mail headers into
the reply?
> [Please note this e-mail is from an EXTERNAL e-mail address]
And delete this non-information please.
> On Wed, Oct 28 2020 at 15:30, qiang zhang wrote:
>
>> How is that addressing any of the comments I made on V1 of this?
>
> Do you mean the following problem:
>
> "The dynamic hotplug states run late. What's preventing work to be queued
> on such a worker before it is bound to the CPU again?"
This is one problem, but there are more and I explained them at great
length. If there is anything unclear, then please ask.
Thanks,
tglx
________________________________________
From: Thomas Gleixner <[email protected]>
Sent: October 28, 2020 17:23
To: Zhang, Qiang; [email protected]; [email protected]
Cc: [email protected]; [email protected]; [email protected]
Subject: Re: Reply: [PATCH v2] kthread_worker: re-set CPU affinities if CPU come online
>
> On Wed, Oct 28 2020 at 15:30, qiang zhang wrote:
>
>> How is that addressing any of the comments I made on V1 of this?
>
> Do you mean the following problem:
>
> "The dynamic hotplug states run late. What's preventing work to be queued
> on such a worker before it is bound to the CPU again?"
>
> This is one problem, but there are more and I explained them at great
> length. If there is anything unclear, then please ask.
Really, this patch does not consider that work may be queued on the
worker after the bound CPU goes offline. In addition, when the bound
CPU comes online again, work may be queued before the worker's CPU
affinity is restored.

In the (powerclamp) approach that is not a problem: it is solved by
destroying and re-creating the task on CPU hotplug. But when a CPU
goes down, this requires calling 'cancel_work_sync' in the offline
callback, which may block for a long time. These operations are
expensive.

This patch only restores the affinity of worker tasks created by
"kthread_create_worker_on_cpu" when the CPU comes online again,
similar to the per-CPU worker handling on CPU hotplug in "workqueue"
and "io-wq".
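
For comparison, the destroy/re-create pattern looks roughly like this
(a sketch with made-up names, not the actual powerclamp code):

    #include <linux/cpuhotplug.h>
    #include <linux/err.h>
    #include <linux/kthread.h>
    #include <linux/percpu.h>

    static struct kthread_worker *workers[NR_CPUS];      /* made-up storage */
    static DEFINE_PER_CPU(struct kthread_work, my_work); /* made-up work item */

    static int my_cpu_online(unsigned int cpu)
    {
            struct kthread_worker *w;

            w = kthread_create_worker_on_cpu(cpu, 0, "my_worker/%u", cpu);
            if (IS_ERR(w))
                    return PTR_ERR(w);
            workers[cpu] = w;
            return 0;
    }

    static int my_cpu_predown(unsigned int cpu)
    {
            /* The expensive part: may block for a long time until the
             * queued work has finished. */
            kthread_cancel_work_sync(&per_cpu(my_work, cpu));
            kthread_destroy_worker(workers[cpu]);
            workers[cpu] = NULL;
            return 0;
    }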
Thanks
Qiang
>
> Thanks,
>
> tglx
On Thu, Oct 29 2020 at 03:14, Qiang Zhang wrote:
> Really, this patch does not consider that work may be queued on the
> worker after the bound CPU goes offline. In addition, when the bound
> CPU comes online again, work may be queued before the worker's CPU
> affinity is restored.
And how is that supposed to be correct?
> In the (powerclamp) approach that is not a problem: it is solved by
> destroying and re-creating the task on CPU hotplug. But when a CPU
> goes down, this requires calling 'cancel_work_sync' in the offline
> callback, which may block for a long time. These operations are
> expensive.
It does not matter whether it's expensive or not. It's correct and
that's what matters most.
> This patch only restores the affinity of worker tasks created by
> "kthread_create_worker_on_cpu" when the CPU comes online again,
> similar to the per-CPU worker handling on CPU hotplug in "workqueue"
> and "io-wq".
I know what this patch does, but that does not make it any more
correct. It creates a semantically ill-defined illusion of correctness.
We are not "fixing" a problem by making it work for your particular,
and not even explained, use case.
The expected semantics of a cpu bound kthread_worker are completely
unclear and undocumented. This needs to be fixed first and once this is
established and agreed on then the gaps in the implementation can be
closed.
Thanks,
tglx
On Thu 2020-10-29 09:27:26, Thomas Gleixner wrote:
> The expected semantics of a cpu bound kthread_worker are completely
> unclear and undocumented. This needs to be fixed first and once this is
> established and agreed on then the gaps in the implementation can be
> closed.
I thought about some sane semantic and it comes down to
the following problem:
The per-CPU kthread workers are created by explicitly calling
kthread_create_worker_on_cpu() on each CPU.
The API does _not_ store the information needed to start the worker.
As a result, it is not able to start a new one when the CPU
goes online "for the first time", i.e. when the CPU was offline
when the API user created the workers.
It means that the API user is responsible for handling CPU hotplug
on its own. We probably should just document it and do nothing else [*]
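
For illustration, the creation side with a hypothetical user looks
like this; a CPU that is offline during the loop simply never gets a
worker, and the API keeps nothing that would allow creating one later:

    #include <linux/cpumask.h>
    #include <linux/err.h>
    #include <linux/init.h>
    #include <linux/kthread.h>

    static struct kthread_worker *workers[NR_CPUS]; /* hypothetical storage */

    static int __init my_workers_init(void)
    {
            unsigned int cpu;

            for_each_online_cpu(cpu) {
                    workers[cpu] = kthread_create_worker_on_cpu(cpu, 0,
                                            "my_worker/%u", cpu);
                    if (IS_ERR(workers[cpu]))
                            return PTR_ERR(workers[cpu]);
            }
            return 0;
    }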
An alternative solution would be to extend the API and allow creating
a kthread_worker on each online CPU. It would require storing the
parameters needed to create the kthread on newly onlined CPUs.
Then we might think about some sane semantic for CPU hotplug.
Well, it might be hard to define a sane semantic unless there are
more users of the API. So, I tend to keep it simple and just
document the status quo.
Any ideas?
[*] IMHO, it does not even make sense to manipulate the affinity.
It would just give a false feeling that it is enough.
Best Regards,
Petr
On Thu, Oct 29 2020 at 14:08, Petr Mladek wrote:
> On Thu 2020-10-29 09:27:26, Thomas Gleixner wrote:
>> The expected semantics of a cpu bound kthread_worker are completely
>> unclear and undocumented. This needs to be fixed first and once this is
>> established and agreed on then the gaps in the implementation can be
>> closed.
>
> I thought about some sane semantic and it comes down to
> the following problem:
>
> The per-CPU kthread workers are created by explicitly calling
> kthread_create_worker_on_cpu() on each CPU.
>
> The API does _not_ store the information needed to start the worker.
> As a result, it is not able to start a new one when the CPU
> goes online "for the first time", i.e. when the CPU was offline
> when the API user created the workers.
>
> It means that the API user is responsible for handling CPU hotplug
> on its own. We probably should just document it and do nothing else [*]
> [*] IMHO, it does not even make sense to manipulate the affinity.
> It would just give a false feeling that it is enough.
Agreed on both.
> An alternative solution would be to extend the API and allow creating
> a kthread_worker on each online CPU. It would require storing the
> parameters needed to create the kthread on newly onlined CPUs.
> Then we might think about some sane semantic for CPU hotplug.
That facility already exists: smpboot_register_percpu_thread()
So "all" you'd need to do is to provide a kthread_worker variant which
utilizes that. It's straight forward, but not sure whether it's worth
the trouble.
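
Roughly like this (only a sketch, all names made up): smpboot parks
the thread when its CPU goes down and unparks it when the CPU returns,
so affinity is handled by the infrastructure, not by the API user.

    #include <linux/init.h>
    #include <linux/kthread.h>
    #include <linux/percpu.h>
    #include <linux/smpboot.h>

    static DEFINE_PER_CPU(struct task_struct *, my_worker_task);

    static int my_worker_should_run(unsigned int cpu)
    {
            return 0;  /* stub: report whether this CPU's worker has work */
    }

    static void my_worker_fn(unsigned int cpu)
    {
            /* stub: process the queued kthread_work items for this CPU */
    }

    static struct smp_hotplug_thread my_worker_threads = {
            .store             = &my_worker_task,
            .thread_should_run = my_worker_should_run,
            .thread_fn         = my_worker_fn,
            .thread_comm       = "my_worker/%u",
    };

    static int __init my_worker_init(void)
    {
            return smpboot_register_percpu_thread(&my_worker_threads);
    }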
> Well, it might be hard to define a sane semantic unless there are
> more users of the API. So, I tend to keep it simple and just
> document the status quo.
Ack.
Thanks,
tglx