Introduce queue_work_cpumask() to queue work on a "random" CPU within a
given cpumask. This is helpful when devices/modules want to assign works
to different cpusets but do not want to maintain extra workqueues; until
now they had to allocate separate workqueues and set a different
workqueue_attrs->cpumask for each.
For now this is only available for unbound workqueues; we will extend it
in a follow-up patch.
We default to the first CPU in the intersection of the given cpumask and
the online cpumask. The only exception is when the current CPU is in the
given cpumask, in which case we just use it.
The implementation and comments are adapted from
commit 8204e0c1113d ("workqueue: Provide queue_work_node to queue work
near a given NUMA node").
Signed-off-by: Tio Zhang <[email protected]>
Signed-off-by: zzzyhtheonly <[email protected]>
---
include/linux/workqueue.h |  2 +
kernel/workqueue.c        | 77 +++++++++++++++++++++++++++++++++++++++
2 files changed, 79 insertions(+)
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 3992c994787f..61e56f4fcdaa 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -440,6 +440,8 @@ extern bool queue_work_on(int cpu, struct workqueue_struct *wq,
struct work_struct *work);
extern bool queue_work_node(int node, struct workqueue_struct *wq,
struct work_struct *work);
+extern bool queue_work_cpumask(cpumask_var_t cpumask,
+ struct workqueue_struct *wq, struct work_struct *work);
extern bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
struct delayed_work *work, unsigned long delay);
extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4666a1a92a31..80aae9a55829 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1644,6 +1644,83 @@ bool queue_work_node(int node, struct workqueue_struct *wq,
}
EXPORT_SYMBOL_GPL(queue_work_node);
+/**
+ * workqueue_select_cpu_cpumask - Select a CPU based on a given cpumask
+ * @cpumask: cpumask that we want to select a CPU from
+ *
+ * This function will attempt to find a "random" CPU available on the given
+ * cpumask. If there are no CPUs available on the given cpumask it will
+ * return WORK_CPU_UNBOUND, indicating that we should just schedule to any
+ * available CPU if we need to schedule this work.
+ */
+static int workqueue_select_cpu_cpumask(cpumask_var_t cpumask)
+{
+ int cpu;
+
+ /* Use local cpu if we are already there */
+ cpu = raw_smp_processor_id();
+ if (cpumask_test_cpu(cpu, cpumask))
+ return cpu;
+
+ /* Use "random", otherwise known as "first", online CPU of the cpumask */
+ cpu = cpumask_any_and(cpumask, cpu_online_mask);
+
+ /* If CPU is valid return that, otherwise just defer */
+ return cpu < nr_cpu_ids ? cpu : WORK_CPU_UNBOUND;
+}
+
+/**
+ * queue_work_cpumask - queue work on a "random" cpu for a given cpumask
+ * @cpumask: cpumask that we are targeting the work for
+ * @wq: workqueue to use
+ * @work: work to queue
+ *
+ * We queue the work to a "random" CPU within a given cpumask. The basic
+ * idea here is to provide a way to somehow associate work with a given
+ * cpumask.
+ *
+ * This function will only make a best effort attempt at getting this
+ * onto the right cpumask. If none of the CPUs in the requested cpumask
+ * are online then we just fall back to standard queue_work()
+ * behavior.
+ *
+ * Currently the "random" CPU ends up being the first available CPU in the
+ * intersection of cpu_online_mask and the cpumask given, unless we are
+ * already running on a CPU in the cpumask, in which case we use the current CPU.
+ *
+ * Return: %false if @work was already on a queue, %true otherwise.
+ */
+bool queue_work_cpumask(cpumask_var_t cpumask, struct workqueue_struct *wq,
+ struct work_struct *work)
+{
+ unsigned long flags;
+ bool ret = false;
+
+ /*
+ * This current implementation is specific to unbound workqueues.
+ * Specifically we only return the first available CPU for a given
+ * cpumask instead of cycling through individual CPUs within it.
+ *
+ * If this is used with a per-cpu workqueue then the logic in
+ * workqueue_select_cpu_cpumask would need to be updated to allow for
+ * some round robin type logic.
+ */
+ WARN_ON_ONCE(!(wq->flags & WQ_UNBOUND));
+
+ local_irq_save(flags);
+
+ if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
+ int cpu = workqueue_select_cpu_cpumask(cpumask);
+
+ __queue_work(cpu, wq, work);
+ ret = true;
+ }
+
+ local_irq_restore(flags);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(queue_work_cpumask);
+
void delayed_work_timer_fn(struct timer_list *t)
{
struct delayed_work *dwork = from_timer(dwork, t, timer);
--
2.39.2 (Apple Git-143)
On Tue, Jun 6, 2023 at 5:31 PM Tio Zhang <[email protected]> wrote:
>
> Introduce queue_work_cpumask() to queue work on a "random" CPU within a
> given cpumask. This is helpful when devices/modules want to assign works
> to different cpusets but do not want to maintain extra workqueues; until
> now they had to allocate separate workqueues and set a different
> workqueue_attrs->cpumask for each.
>
> For now this is only available for unbound workqueues; we will extend it
> in a follow-up patch.
> We default to the first CPU in the intersection of the given cpumask and
> the online cpumask. The only exception is when the current CPU is in the
> given cpumask, in which case we just use it.
>
> The implementation and comments are adapted from
> commit 8204e0c1113d ("workqueue: Provide queue_work_node to queue work
> near a given NUMA node").
>
The code seems duplicated too. Could you do a little refactoring and make
them (queue_work_cpumask() & queue_work_node()) share some code?
On Tue, Jun 06, 2023 at 05:31:35PM +0800, Tio Zhang wrote:
> Introduce queue_work_cpumask() to queue work on a "random" CPU within a
> given cpumask. This is helpful when devices/modules want to assign works
> to different cpusets but do not want to maintain extra workqueues; until
> now they had to allocate separate workqueues and set a different
> workqueue_attrs->cpumask for each.
>
> For now this is only available for unbound workqueues; we will extend it
> in a follow-up patch.
> We default to the first CPU in the intersection of the given cpumask and
> the online cpumask. The only exception is when the current CPU is in the
> given cpumask, in which case we just use it.
>
> The implementation and comments are adapted from
> commit 8204e0c1113d ("workqueue: Provide queue_work_node to queue work
> near a given NUMA node").
Can you elaborate the intended use cases?
Thanks.
--
tejun
// I resend this to put it into the same thread, sorry for the confusion.
> Can you elaborate the intended use cases?
Hi Tejun,
Thanks for your reply! Please let me use myself as an example to explain this.
In my scenario, I have 7 CPUs on my machine (it is actually UMA, so
queue_work_node or using UNBOUND does not work for me), and for some
unlucky reasons there are always some irqs running on cpu 0 and cpu 6.
Since I'm using arm64 with irqs tuned into FIFO threads, those threaded
irqs always run on cpu 0 and 6 too (for affinity). And this will not be
fixed easily in the short term :(
So in order to help async init for better boot times for my devices, I'd
like to prevent works from running on cpu 0 and 6. With
queue_work_cpumask(), it is simply done by:
...
cpumask_clear_cpu(0, cpumask); // actually I use sysfs to parse my cpumask
cpumask_clear_cpu(6, cpumask);
queue_work_cpumask(cpumask, my_wq, &my_work->work);
...
> The code seems duplicated too. Could you do a little refactoring and make
> them (queue_work_cpumask() & queue_work_node()) share some code?
Hi Lai,
Thanks for your advice!
I did the refactoring in PATCH v2; there are some changes:
1. Removed the WARN_ONCE in the previous code:
1). queue_work_node() works well for UNBOUND since we have
unbound_pwq_by_node() in __queue_work() to choose the right node.
2). queue_work_cpumask() does not work for UNBOUND since the
numa_pwq_tbl list is designed to be per NUMA node. I comment on this in
the patch.
2. Removed the previous workqueue_select_cpu_near and let
queue_work_node() use queue_work_on() and queue_work_cpumask().
I tested this patch with 100,000 calls to queue_work_cpumask() and
queue_work_node() with random input cpumasks and nodes; it works as
expected on my machines (80-core x86_64, 7-core ARM64, and 16-core
ARM64).
Please help review, thanks a lot!
Thanks,
Tio Zhang
Hello,
On Fri, Jun 09, 2023 at 02:28:19PM +0800, Yuanhan Zhang wrote:
> // I resend this to put it into the same thread, sorry for the confusion.
This got resent quite a few times and I don't know which one to reply to.
Just picking the one which seems like the latest.
> > Can you elaborate the intended use cases?
>
> Thanks for your reply! Please let me use myself as an example to explain this.
>
> In my scenario, I have 7 CPUs on my machine (it is actually UMA, so
> queue_work_node or using UNBOUND does not work for me), and for some
> unlucky reasons there are always some irqs running on cpu 0 and cpu 6.
> Since I'm using arm64 with irqs tuned into FIFO threads, those threaded
> irqs always run on cpu 0 and 6 too (for affinity). And this will not be
> fixed easily in the short term :(
>
> So in order to help async init for better boot times for my devices, I'd
> like to prevent works from running on cpu 0 and 6. With
> queue_work_cpumask(), it is simply done by:
>
> ...
> cpumask_clear_cpu(0, cpumask); // actually I use sysfs to parse my cpumask
> cpumask_clear_cpu(6, cpumask);
> queue_work_cpumask(cpumask, my_wq, &my_work->work);
> ...
But this would require explicit code customization on every call site which
doesn't seem ideal given that this is to work around something which is tied
to the specific hardware.
Wouldn't it be better to add a kernel parameter to further constrain
wq_unbound_cpumask? Right now, on boot, it's only determined by isolcpus but
it shouldn't be difficult to add a workqueue parameter to further constrain
it.
Thanks.
--
tejun
Hi Tejun,
Tejun Heo <[email protected]> 于2023年6月13日周二 01:54写道:
>
> Hello,
>
> On Fri, Jun 09, 2023 at 02:28:19PM +0800, Yuanhan Zhang wrote:
> > // I resend this to put it into the same thread, sorry for the confusion.
>
> This got resent quite a few times and I don't know which one to reply to.
> Just picking the one which seems like the latest.
Thanks for your patience.
>
> > > Can you elaborate the intended use cases?
> >
> > Thanks for your reply! Please let me use myself as an example to explain this.
> >
> > In my scenario, I have 7 CPUs on my machine (it is actually UMA, so
> > queue_work_node or using UNBOUND does not work for me), and for some
> > unlucky reasons there are always some irqs running on cpu 0 and cpu 6.
> > Since I'm using arm64 with irqs tuned into FIFO threads, those threaded
> > irqs always run on cpu 0 and 6 too (for affinity). And this will not be
> > fixed easily in the short term :(
> >
> > So in order to help async init for better boot times for my devices, I'd
> > like to prevent works from running on cpu 0 and 6. With
> > queue_work_cpumask(), it is simply done by:
> >
> > ...
> > cpumask_clear_cpu(0, cpumask); // actually I use sysfs to parse my cpumask
> > cpumask_clear_cpu(6, cpumask);
> > queue_work_cpumask(cpumask, my_wq, &my_work->work);
> > ...
>
> But this would require explicit code customization on every call site which
> doesn't seem ideal given that this is to work around something which is tied
> to the specific hardware.
Yes, I agree that using wq_unbound_cpumask would be a great idea and a
substitute solution for device boot. But wq_unbound_cpumask can only
constrain WQ_UNBOUND workqueues, while I'm trying to make each of my
works configurable.
Please let me try to explain this with another example:
Suppose I have several kinds of works and I'd like to make them run on
different cpusets (so it is not ideal to put them all on WQ_UNBOUND).
This would be done like this:
queue_work_cpumask(cpumask_A /* B or C or D, just maintain different cpumasks */,
system_wq, work);
And after that, I don't have to customize my code anymore, since I can
control those cpumasks via procfs or sysfs or ioctl or whatever I like.
So I believe this feature of letting each work choose its cpumask would
be quite convenient sometimes :)
>
> Wouldn't it be better to add a kernel parameter to further constrain
> wq_unbound_cpumask? Right now, on boot, it's only determined by isolcpus but
> it shouldn't be difficult to add a workqueue parameter to further constrain
> it.
Yes, thanks again, this would be a great solution for device boot. And I
have followed your suggestion and submitted another patch, '[PATCH]
sched/isolation: add a workqueue parameter to constrain unbound CPUs'.
That patch simply lets "isolcpus=" take a "workqueue" option, which makes
housekeeping_cpumask(HK_TYPE_WQ) copy a constrained workqueue cpumask to
wq_unbound_cpumask.
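For reference, assuming the proposed flag ends up spelled "workqueue" (the
exact spelling depends on that patch), constraining unbound workqueues away
from CPUs 0 and 6 at boot might then look like:

```
isolcpus=workqueue,0,6
```

following the existing isolcpus=<flag,...><cpulist> convention already used
by flags such as "nohz" and "domain".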
Please help review, and thank you again for your time.
Thanks,
Tio Zhang
>
> Thanks.
>
> --
> tejun
Hello,
On Tue, Jun 13, 2023 at 06:25:44PM +0800, Yuanhan Zhang wrote:
> If I have several kinds of works, and I'd like to make them run on
> different cpusets
> (so it is not ideal to put them on WQ_UNBOUND).
So, you can achieve that by creating a workqueue for each group of work
items. Each workqueue is just a front-end to shared pools of workers and
groups work items according to their attributes and flush grouping needs,
which is what you're asking for here.
Thanks.
--
tejun