2013-03-28 06:43:26

by Tejun Heo

Subject: [PATCHSET v2 wq/for-3.10] workqueue: NUMA affinity for unbound workqueues

Hello,

Changes from the last take[L] are

* Lai pointed out that the previous implementation was broken in that
if a workqueue spans multiple nodes and some of those nodes don't
have any desired online CPUs, work items queued on those nodes would
be spread across all CPUs, violating the configured cpumask.

The patchset is updated such that apply_workqueue_attrs() now only
assigns NUMA-affine pwqs to nodes with desired online CPUs and the
default pwq to all other nodes. To track CPU online state changes,
wq_update_unbound_numa_attrs() is added. The function is called for
each workqueue during CPU hot[un]plug and updates the pwq associations
accordingly.

* Rolled in updated patches from the previous posting.

* More helper routines are factored out such that
apply_workqueue_attrs() is easier to follow and can share code paths
with wq_update_unbound_numa_attrs().

* Various cleanups and fixes.

Patchset description from the original posting follows.

There are two types of workqueues - per-cpu and unbound. The former
is bound to each CPU and the latter isn't bound to any CPU by default.
While the recently added attrs support allows unbound workqueues to be
confined to a subset of CPUs, it is still quite cumbersome for
applications where explicit CPU affinity is too constricting but NUMA
locality still matters.

This patchset tries to solve that issue by automatically making
unbound workqueues affine to NUMA nodes by default. A work item
queued to an unbound workqueue is executed on one of the CPUs allowed
by the workqueue in the same NUMA node. If the node has no allowed
CPU, the work item may be executed on any CPU allowed by the
workqueue. No changes are required on the user side; every workqueue
interface functions the same as before.
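
To illustrate the "no user-side changes" point, here's a minimal
sketch of how an unbound workqueue is typically allocated and used
(the wq/work names and the handler are made up for illustration);
with this patchset the queued work item simply tends to run on a CPU
of the issuing node without any change to the calling code.

    /* hypothetical example - nothing here changes with this patchset */
    #include <linux/workqueue.h>

    static struct workqueue_struct *my_wq;

    static void my_work_fn(struct work_struct *work)
    {
            /* with NUMA affinity enabled, this preferably runs on a
             * CPU of the node the work item was queued from */
    }
    static DECLARE_WORK(my_work, my_work_fn);

    static int __init my_init(void)
    {
            my_wq = alloc_workqueue("my_wq", WQ_UNBOUND, 0);
            if (!my_wq)
                    return -ENOMEM;
            queue_work(my_wq, &my_work);    /* same call as before */
            return 0;
    }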

This would be most helpful to subsystems which use some form of async
execution to process significant amounts of data - e.g. crypto and
btrfs; however, I wanted to find out whether it would make any dent in
much less favorable use cases. The following is the total run time in
seconds of building an allmodconfig kernel w/ -j20 on a dual socket
Opteron machine with the writeback thread pool converted to an unbound
workqueue and thus made NUMA-affine. The file system is ext4 on top
of a WD SSD.

        before conversion   after conversion
             1396.126           1394.763
             1397.621           1394.965
             1399.636           1394.738
             1397.463           1398.162
             1395.543           1393.670

AVG          1397.278           1395.260      DIFF 2.018
STDEV           1.585              1.700

And, yes, it actually made things go faster by about 1.2 sigma, which
isn't completely conclusive but is a pretty good indication that it's
actually faster. Note that this workload is dominated by CPU time,
and while there's writeback going on continuously, it really isn't
touching much data or acting as a dominating factor, so the gain is
understandably small - 0.14% - but hey, it's still a gain, and it
should be much more interesting for crypto and btrfs, which would
actually access the data, or for workloads which are more sensitive to
NUMA affinity.
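
For reference, both figures fall straight out of the table above,
taking the average of the two stdevs as a rough yardstick:

    DIFF / avg(STDEV)  = 2.018 / ((1.585 + 1.700) / 2)  ~= 1.2 sigma
    DIFF / AVG(before) = 2.018 / 1397.278               ~= 0.14%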

The implementation is fairly simple. After the recent attrs support
changes, a lot of the differences in pwq (pool_workqueue) handling
between unbound and per-cpu workqueues are gone. An unbound workqueue
still has one "current" pwq that it uses for queueing any new work
items but can handle multiple pwqs perfectly well while they're
draining, so this patchset adds a pwq dispatch table to unbound
workqueues, indexed by NUMA node and pointing to the matching pwq.
Unbound workqueues now simply have multiple "current" pwqs keyed by
NUMA node.
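
In other words, the issue-path pwq selection ends up looking roughly
like the following (a simplified sketch of what patch 0008 does in
__queue_work(), not the verbatim code):

    /* pick the pwq a new work item will be queued on */
    if (req_cpu == WORK_CPU_UNBOUND)
            cpu = raw_smp_processor_id();

    if (!(wq->flags & WQ_UNBOUND))
            pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);             /* per-cpu wq */
    else
            pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));  /* NUMA-affine */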

NUMA affinity can be turned off system-wide with the
workqueue.disable_numa kernel parameter or per-workqueue via the
"numa" sysfs file.

This patchset contains the following fourteen patches.

0001-workqueue-move-pwq_pool_locking-outside-of-get-put_u.patch
0002-workqueue-add-wq_numa_tbl_len-and-wq_numa_possible_c.patch
0003-workqueue-drop-H-from-kworker-names-of-unbound-worke.patch
0004-workqueue-determine-NUMA-node-of-workers-accourding-.patch
0005-workqueue-add-workqueue-unbound_attrs.patch
0006-workqueue-make-workqueue-name-fixed-len.patch
0007-workqueue-move-hot-fields-of-workqueue_struct-to-the.patch
0008-workqueue-map-an-unbound-workqueues-to-multiple-per-.patch
0009-workqueue-break-init_and_link_pwq-into-two-functions.patch
0010-workqueue-use-NUMA-aware-allocation-for-pool_workque.patch
0011-workqueue-introduce-numa_pwq_tbl_install.patch
0012-workqueue-introduce-put_pwq_unlocked.patch
0013-workqueue-implement-NUMA-affinity-for-unbound-workqu.patch
0014-workqueue-update-sysfs-interface-to-reflect-NUMA-awa.patch

0001-0009 are prep patches.

0010-0013 implement NUMA affinity.

0014 adds control knobs and updates sysfs interface.

This patchset is on top of

wq/for-3.10 b59276054 ("workqueue: remove pwq_lock which is no longer used")
+ [1] ("workqueue: fix race condition in unbound workqueue free path") patch
+ [2] ("workqueue: fix unbound workqueue attrs hashing / comparison") patch

and also available in the following git branch.

git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-numa

diffstat follows.

 Documentation/kernel-parameters.txt |   9
 include/linux/workqueue.h           |   5
 kernel/workqueue.c                  | 640 ++++++++++++++++++++++++++++++------
 3 files changed, 549 insertions(+), 105 deletions(-)

Thanks.

--
tejun

[L] http://thread.gmane.org/gmane.linux.kernel.cryptoapi/8501
[1] http://article.gmane.org/gmane.linux.kernel/1465618
[2] http://article.gmane.org/gmane.linux.kernel/1465619


2013-03-28 06:43:27

by Tejun Heo

Subject: [PATCH 01/14] workqueue: move pwq_pool_locking outside of get/put_unbound_pool()

The scheduled NUMA affinity support for unbound workqueues would need
to walk the workqueues list and perform pool-related operations on
each workqueue.

Move wq_pool_mutex locking out of get/put_unbound_pool() to their
callers so that pool operations can be performed while walking the
workqueues list, which is also protected by wq_pool_mutex.
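
In other words, the calling convention becomes the following sketch
(matching the hunks below); both functions now just assert that the
mutex is held:

    mutex_lock(&wq_pool_mutex);     /* also protects the workqueues list */
    pool = get_unbound_pool(attrs);
    mutex_unlock(&wq_pool_mutex);

    ...

    mutex_lock(&wq_pool_mutex);
    put_unbound_pool(pool);
    mutex_unlock(&wq_pool_mutex);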

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 36 ++++++++++++++++++++++--------------
1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index abe1f0d..26771f4e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3395,31 +3395,28 @@ static void rcu_free_pool(struct rcu_head *rcu)
* safe manner. get_unbound_pool() calls this function on its failure path
* and this function should be able to release pools which went through,
* successfully or not, init_worker_pool().
+ *
+ * Should be called with wq_pool_mutex held.
*/
static void put_unbound_pool(struct worker_pool *pool)
{
struct worker *worker;

- mutex_lock(&wq_pool_mutex);
- if (--pool->refcnt) {
- mutex_unlock(&wq_pool_mutex);
+ lockdep_assert_held(&wq_pool_mutex);
+
+ if (--pool->refcnt)
return;
- }

/* sanity checks */
if (WARN_ON(!(pool->flags & POOL_DISASSOCIATED)) ||
- WARN_ON(!list_empty(&pool->worklist))) {
- mutex_unlock(&wq_pool_mutex);
+ WARN_ON(!list_empty(&pool->worklist)))
return;
- }

/* release id and unhash */
if (pool->id >= 0)
idr_remove(&worker_pool_idr, pool->id);
hash_del(&pool->hash_node);

- mutex_unlock(&wq_pool_mutex);
-
/*
* Become the manager and destroy all workers. Grabbing
* manager_arb prevents @pool's workers from blocking on
@@ -3453,13 +3450,15 @@ static void put_unbound_pool(struct worker_pool *pool)
* reference count and return it. If there already is a matching
* worker_pool, it will be used; otherwise, this function attempts to
* create a new one. On failure, returns NULL.
+ *
+ * Should be called with wq_pool_mutex held.
*/
static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
{
u32 hash = wqattrs_hash(attrs);
struct worker_pool *pool;

- mutex_lock(&wq_pool_mutex);
+ lockdep_assert_held(&wq_pool_mutex);

/* do we already have a matching pool? */
hash_for_each_possible(unbound_pool_hash, pool, hash_node, hash) {
@@ -3490,10 +3489,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
/* install */
hash_add(unbound_pool_hash, &pool->hash_node, hash);
out_unlock:
- mutex_unlock(&wq_pool_mutex);
return pool;
fail:
- mutex_unlock(&wq_pool_mutex);
if (pool)
put_unbound_pool(pool);
return NULL;
@@ -3530,7 +3527,10 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
is_last = list_empty(&wq->pwqs);
mutex_unlock(&wq->mutex);

+ mutex_lock(&wq_pool_mutex);
put_unbound_pool(pool);
+ mutex_unlock(&wq_pool_mutex);
+
call_rcu_sched(&pwq->rcu, rcu_free_pwq);

/*
@@ -3653,13 +3653,21 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
copy_workqueue_attrs(new_attrs, attrs);
cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);

+ mutex_lock(&wq_pool_mutex);
+
pwq = kmem_cache_zalloc(pwq_cache, GFP_KERNEL);
- if (!pwq)
+ if (!pwq) {
+ mutex_unlock(&wq_pool_mutex);
goto enomem;
+ }

pool = get_unbound_pool(new_attrs);
- if (!pool)
+ if (!pool) {
+ mutex_unlock(&wq_pool_mutex);
goto enomem;
+ }
+
+ mutex_unlock(&wq_pool_mutex);

init_and_link_pwq(pwq, wq, pool, &last_pwq);
if (last_pwq) {
--
1.8.1.4

2013-03-28 06:43:29

by Tejun Heo

Subject: [PATCH 03/14] workqueue: drop 'H' from kworker names of unbound worker pools

Currently, all workqueue workers which have a negative nice value have
'H' postfixed to their names. This is necessary for per-cpu workers
as they use the CPU number instead of pool->id to identify the pool
and the 'H' postfix is the only thing distinguishing normal and
highpri workers.

As workers for unbound pools use pool->id, the 'H' postfix is purely
informational. TASK_COMM_LEN is 16 and, after the static part and
delimiters, there are only five characters left for the pool and
worker IDs. We're expecting to have more unbound pools with the
scheduled NUMA awareness support. Let's drop the non-essential 'H'
postfix from unbound kworker names.

While at it, restructure kthread_create*() invocation to help future
NUMA related changes.
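
With the restructuring below, the resulting task comm strings look
like the following (the numbers are made up for illustration):

    kworker/3:0H    /* per-cpu pool on CPU 3, worker 0, highpri - keeps 'H' */
    kworker/u5:2    /* unbound pool 5, worker 2 - no 'H' even if highpri */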

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 54b5048..57ced49 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1644,9 +1644,10 @@ static struct worker *alloc_worker(void)
*/
static struct worker *create_worker(struct worker_pool *pool)
{
- const char *pri = pool->attrs->nice < 0 ? "H" : "";
struct worker *worker = NULL;
+ int node = pool->cpu >= 0 ? cpu_to_node(pool->cpu) : NUMA_NO_NODE;
int id = -1;
+ char id_buf[16];

lockdep_assert_held(&pool->manager_mutex);

@@ -1672,13 +1673,13 @@ static struct worker *create_worker(struct worker_pool *pool)
worker->id = id;

if (pool->cpu >= 0)
- worker->task = kthread_create_on_node(worker_thread,
- worker, cpu_to_node(pool->cpu),
- "kworker/%d:%d%s", pool->cpu, id, pri);
+ snprintf(id_buf, sizeof(id_buf), "%d:%d%s", pool->cpu, id,
+ pool->attrs->nice < 0 ? "H" : "");
else
- worker->task = kthread_create(worker_thread, worker,
- "kworker/u%d:%d%s",
- pool->id, id, pri);
+ snprintf(id_buf, sizeof(id_buf), "u%d:%d", pool->id, id);
+
+ worker->task = kthread_create_on_node(worker_thread, worker, node,
+ "kworker/%s", id_buf);
if (IS_ERR(worker->task))
goto fail;

--
1.8.1.4

2013-03-28 06:43:30

by Tejun Heo

Subject: [PATCH 04/14] workqueue: determine NUMA node of workers according to the allowed cpumask

When worker tasks are created using kthread_create_on_node(),
currently only per-cpu ones have the matching NUMA node specified.
All unbound workers are always created with NUMA_NO_NODE.

Now that an unbound worker pool may have an arbitrary cpumask
associated with it, this isn't optimal. Add pool->node which is
determined by the pool's cpumask. If the pool's cpumask is contained
inside a NUMA node proper, the pool is associated with that node, and
all workers of the pool are created on that node.

This currently only makes a difference for unbound worker pools with
a cpumask contained inside a single NUMA node, but this will serve as
the foundation for making all unbound pools NUMA-affine.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 57ced49..fab2630 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -138,6 +138,7 @@ enum {
struct worker_pool {
spinlock_t lock; /* the pool lock */
int cpu; /* I: the associated cpu */
+ int node; /* I: the associated node ID */
int id; /* I: pool ID */
unsigned int flags; /* X: flags */

@@ -1645,7 +1646,6 @@ static struct worker *alloc_worker(void)
static struct worker *create_worker(struct worker_pool *pool)
{
struct worker *worker = NULL;
- int node = pool->cpu >= 0 ? cpu_to_node(pool->cpu) : NUMA_NO_NODE;
int id = -1;
char id_buf[16];

@@ -1678,7 +1678,7 @@ static struct worker *create_worker(struct worker_pool *pool)
else
snprintf(id_buf, sizeof(id_buf), "u%d:%d", pool->id, id);

- worker->task = kthread_create_on_node(worker_thread, worker, node,
+ worker->task = kthread_create_on_node(worker_thread, worker, pool->node,
"kworker/%s", id_buf);
if (IS_ERR(worker->task))
goto fail;
@@ -3360,6 +3360,7 @@ static int init_worker_pool(struct worker_pool *pool)
spin_lock_init(&pool->lock);
pool->id = -1;
pool->cpu = -1;
+ pool->node = NUMA_NO_NODE;
pool->flags |= POOL_DISASSOCIATED;
INIT_LIST_HEAD(&pool->worklist);
INIT_LIST_HEAD(&pool->idle_list);
@@ -3465,6 +3466,7 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
{
u32 hash = wqattrs_hash(attrs);
struct worker_pool *pool;
+ int node;

lockdep_assert_held(&wq_pool_mutex);

@@ -3487,6 +3489,17 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
lockdep_set_subclass(&pool->lock, 1); /* see put_pwq() */
copy_workqueue_attrs(pool->attrs, attrs);

+ /* if cpumask is contained inside a NUMA node, we belong to that node */
+ if (wq_numa_enabled) {
+ for_each_node(node) {
+ if (cpumask_subset(pool->attrs->cpumask,
+ wq_numa_possible_cpumask[node])) {
+ pool->node = node;
+ break;
+ }
+ }
+ }
+
if (worker_pool_assign_id(pool) < 0)
goto fail;

@@ -4475,6 +4488,7 @@ static int __init init_workqueues(void)
pool->cpu = cpu;
cpumask_copy(pool->attrs->cpumask, cpumask_of(cpu));
pool->attrs->nice = std_nice[i++];
+ pool->node = cpu_to_node(cpu);

/* alloc pool ID */
mutex_lock(&wq_pool_mutex);
--
1.8.1.4

2013-03-28 06:44:35

by Tejun Heo

Subject: [PATCH 07/14] workqueue: move hot fields of workqueue_struct to the end

Move wq->flags and ->cpu_pwqs to the end of workqueue_struct and align
them to the cacheline. These two fields are used in the work item
issue path and thus hot. The scheduled NUMA affinity support will add
a dispatch table at the end of workqueue_struct, and relocating these
two fields will allow hot paths to hit only a single cacheline.

Note that wq->pwqs isn't moved although it currently is being used in
the work item issue path for unbound workqueues. The dispatch table
mentioned above will replace its use in the issue path, so it will
become cold once NUMA support is implemented.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 23c9099..9297ea3 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -227,8 +227,6 @@ struct wq_device;
* the appropriate worker_pool through its pool_workqueues.
*/
struct workqueue_struct {
- unsigned int flags; /* WQ: WQ_* flags */
- struct pool_workqueue __percpu *cpu_pwqs; /* I: per-cpu pwq's */
struct list_head pwqs; /* WR: all pwqs of this wq */
struct list_head list; /* PL: list of all workqueues */

@@ -255,6 +253,10 @@ struct workqueue_struct {
struct lockdep_map lockdep_map;
#endif
char name[WQ_NAME_LEN]; /* I: workqueue name */
+
+ /* hot fields used during command issue, aligned to cacheline */
+ unsigned int flags ____cacheline_aligned; /* WQ: WQ_* flags */
+ struct pool_workqueue __percpu *cpu_pwqs; /* I: per-cpu pwqs */
};

static struct kmem_cache *pwq_cache;
--
1.8.1.4

2013-03-28 06:43:31

by Tejun Heo

Subject: [PATCH 05/14] workqueue: add workqueue->unbound_attrs

Currently, when exposing attrs of an unbound workqueue via sysfs, the
workqueue_attrs of first_pwq() is used as that should equal the
current state of the workqueue.

The planned NUMA affinity support will make unbound workqueues use
multiple pool_workqueues for different NUMA nodes and the above
assumption will no longer hold. Introduce workqueue->unbound_attrs
which records the current attrs in effect and use it for sysfs instead
of first_pwq()->attrs.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 36 ++++++++++++++++++++++++------------
1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index fab2630..65f200c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -244,6 +244,8 @@ struct workqueue_struct {
int nr_drainers; /* WQ: drain in progress */
int saved_max_active; /* WQ: saved pwq max_active */

+ struct workqueue_attrs *unbound_attrs; /* WQ: only for unbound wqs */
+
#ifdef CONFIG_SYSFS
struct wq_device *wq_dev; /* I: for sysfs interface */
#endif
@@ -3088,10 +3090,9 @@ static ssize_t wq_nice_show(struct device *dev, struct device_attribute *attr,
struct workqueue_struct *wq = dev_to_wq(dev);
int written;

- rcu_read_lock_sched();
- written = scnprintf(buf, PAGE_SIZE, "%d\n",
- first_pwq(wq)->pool->attrs->nice);
- rcu_read_unlock_sched();
+ mutex_lock(&wq->mutex);
+ written = scnprintf(buf, PAGE_SIZE, "%d\n", wq->unbound_attrs->nice);
+ mutex_unlock(&wq->mutex);

return written;
}
@@ -3105,9 +3106,9 @@ static struct workqueue_attrs *wq_sysfs_prep_attrs(struct workqueue_struct *wq)
if (!attrs)
return NULL;

- rcu_read_lock_sched();
- copy_workqueue_attrs(attrs, first_pwq(wq)->pool->attrs);
- rcu_read_unlock_sched();
+ mutex_lock(&wq->mutex);
+ copy_workqueue_attrs(attrs, wq->unbound_attrs);
+ mutex_unlock(&wq->mutex);
return attrs;
}

@@ -3138,10 +3139,9 @@ static ssize_t wq_cpumask_show(struct device *dev,
struct workqueue_struct *wq = dev_to_wq(dev);
int written;

- rcu_read_lock_sched();
- written = cpumask_scnprintf(buf, PAGE_SIZE,
- first_pwq(wq)->pool->attrs->cpumask);
- rcu_read_unlock_sched();
+ mutex_lock(&wq->mutex);
+ written = cpumask_scnprintf(buf, PAGE_SIZE, wq->unbound_attrs->cpumask);
+ mutex_unlock(&wq->mutex);

written += scnprintf(buf + written, PAGE_SIZE - written, "\n");
return written;
@@ -3558,8 +3558,10 @@ static void pwq_unbound_release_workfn(struct work_struct *work)
* If we're the last pwq going away, @wq is already dead and no one
* is gonna access it anymore. Free it.
*/
- if (is_last)
+ if (is_last) {
+ free_workqueue_attrs(wq->unbound_attrs);
kfree(wq);
+ }
}

/**
@@ -3634,6 +3636,9 @@ static void init_and_link_pwq(struct pool_workqueue *pwq,
/* link in @pwq */
list_add_rcu(&pwq->pwqs_node, &wq->pwqs);

+ if (wq->flags & WQ_UNBOUND)
+ copy_workqueue_attrs(wq->unbound_attrs, pool->attrs);
+
mutex_unlock(&wq->mutex);
}

@@ -3761,6 +3766,12 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
if (!wq)
return NULL;

+ if (flags & WQ_UNBOUND) {
+ wq->unbound_attrs = alloc_workqueue_attrs(GFP_KERNEL);
+ if (!wq->unbound_attrs)
+ goto err_free_wq;
+ }
+
vsnprintf(wq->name, namelen, fmt, args1);
va_end(args);
va_end(args1);
@@ -3830,6 +3841,7 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
return wq;

err_free_wq:
+ free_workqueue_attrs(wq->unbound_attrs);
kfree(wq);
return NULL;
err_destroy:
--
1.8.1.4

2013-03-28 06:43:35

by Tejun Heo

Subject: [PATCH 09/14] workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq()

Break init_and_link_pwq() into init_pwq() and link_pwq() and move
unbound-workqueue specific handling into apply_workqueue_attrs().
Also, factor out unbound pool and pool_workqueue allocation into
alloc_unbound_pwq().

This reorganization is to prepare for NUMA affinity and doesn't
introduce any functional changes.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 77 ++++++++++++++++++++++++++++++++++--------------------
1 file changed, 49 insertions(+), 28 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 5b53705..58c7663 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3626,13 +3626,10 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
spin_unlock_irq(&pwq->pool->lock);
}

-static void init_and_link_pwq(struct pool_workqueue *pwq,
- struct workqueue_struct *wq,
- struct worker_pool *pool,
- struct pool_workqueue **p_last_pwq)
+/* initialize newly zalloced @pwq which is associated with @wq and @pool */
+static void init_pwq(struct pool_workqueue *pwq, struct workqueue_struct *wq,
+ struct worker_pool *pool)
{
- int node;
-
BUG_ON((unsigned long)pwq & WORK_STRUCT_FLAG_MASK);

pwq->pool = pool;
@@ -3642,8 +3639,15 @@ static void init_and_link_pwq(struct pool_workqueue *pwq,
INIT_LIST_HEAD(&pwq->delayed_works);
INIT_LIST_HEAD(&pwq->mayday_node);
INIT_WORK(&pwq->unbound_release_work, pwq_unbound_release_workfn);
+}

- mutex_lock(&wq->mutex);
+/* sync @pwq with the current state of its associated wq and link it */
+static void link_pwq(struct pool_workqueue *pwq,
+ struct pool_workqueue **p_last_pwq)
+{
+ struct workqueue_struct *wq = pwq->wq;
+
+ lockdep_assert_held(&wq->mutex);

/*
* Set the matching work_color. This is synchronized with
@@ -3658,14 +3662,29 @@ static void init_and_link_pwq(struct pool_workqueue *pwq,

/* link in @pwq */
list_add_rcu(&pwq->pwqs_node, &wq->pwqs);
+}

- if (wq->flags & WQ_UNBOUND) {
- copy_workqueue_attrs(wq->unbound_attrs, pool->attrs);
- for_each_node(node)
- rcu_assign_pointer(wq->numa_pwq_tbl[node], pwq);
+/* obtain a pool matching @attr and create a pwq associating the pool and @wq */
+static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
+ const struct workqueue_attrs *attrs)
+{
+ struct worker_pool *pool;
+ struct pool_workqueue *pwq;
+
+ lockdep_assert_held(&wq_pool_mutex);
+
+ pool = get_unbound_pool(attrs);
+ if (!pool)
+ return NULL;
+
+ pwq = kmem_cache_zalloc(pwq_cache, GFP_KERNEL);
+ if (!pwq) {
+ put_unbound_pool(pool);
+ return NULL;
}

- mutex_unlock(&wq->mutex);
+ init_pwq(pwq, wq, pool);
+ return pwq;
}

/**
@@ -3686,8 +3705,8 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
const struct workqueue_attrs *attrs)
{
struct workqueue_attrs *new_attrs;
- struct pool_workqueue *pwq = NULL, *last_pwq;
- struct worker_pool *pool;
+ struct pool_workqueue *pwq, *last_pwq;
+ int node;

/* only unbound workqueues can change attributes */
if (WARN_ON(!(wq->flags & WQ_UNBOUND)))
@@ -3706,22 +3725,21 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);

mutex_lock(&wq_pool_mutex);
-
- pwq = kmem_cache_zalloc(pwq_cache, GFP_KERNEL);
- if (!pwq) {
- mutex_unlock(&wq_pool_mutex);
+ pwq = alloc_unbound_pwq(wq, new_attrs);
+ mutex_unlock(&wq_pool_mutex);
+ if (!pwq)
goto enomem;
- }

- pool = get_unbound_pool(new_attrs);
- if (!pool) {
- mutex_unlock(&wq_pool_mutex);
- goto enomem;
- }
+ mutex_lock(&wq->mutex);

- mutex_unlock(&wq_pool_mutex);
+ link_pwq(pwq, &last_pwq);
+
+ copy_workqueue_attrs(wq->unbound_attrs, new_attrs);
+ for_each_node(node)
+ rcu_assign_pointer(wq->numa_pwq_tbl[node], pwq);
+
+ mutex_unlock(&wq->mutex);

- init_and_link_pwq(pwq, wq, pool, &last_pwq);
if (last_pwq) {
spin_lock_irq(&last_pwq->pool->lock);
put_pwq(last_pwq);
@@ -3731,7 +3749,6 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
return 0;

enomem:
- kmem_cache_free(pwq_cache, pwq);
free_workqueue_attrs(new_attrs);
return -ENOMEM;
}
@@ -3752,7 +3769,11 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
struct worker_pool *cpu_pools =
per_cpu(cpu_worker_pools, cpu);

- init_and_link_pwq(pwq, wq, &cpu_pools[highpri], NULL);
+ init_pwq(pwq, wq, &cpu_pools[highpri]);
+
+ mutex_lock(&wq->mutex);
+ link_pwq(pwq, NULL);
+ mutex_unlock(&wq->mutex);
}
return 0;
} else {
--
1.8.1.4

2013-03-28 06:44:53

by Tejun Heo

Subject: [PATCH 10/14] workqueue: use NUMA-aware allocation for pool_workqueues

Use kmem_cache_alloc_node() with @pool->node instead of
kmem_cache_zalloc() when allocating a pool_workqueue so that it's
allocated on the same node as the associated worker_pool. As there's
no kmem_cache_zalloc_node(), move zeroing to init_pwq().

This was suggested by Lai Jiangshan.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 58c7663..a4420be 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3626,12 +3626,14 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
spin_unlock_irq(&pwq->pool->lock);
}

-/* initialize newly zalloced @pwq which is associated with @wq and @pool */
+/* initialize newly alloced @pwq which is associated with @wq and @pool */
static void init_pwq(struct pool_workqueue *pwq, struct workqueue_struct *wq,
struct worker_pool *pool)
{
BUG_ON((unsigned long)pwq & WORK_STRUCT_FLAG_MASK);

+ memset(pwq, 0, sizeof(*pwq));
+
pwq->pool = pool;
pwq->wq = wq;
pwq->flush_color = -1;
@@ -3677,7 +3679,7 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
if (!pool)
return NULL;

- pwq = kmem_cache_zalloc(pwq_cache, GFP_KERNEL);
+ pwq = kmem_cache_alloc_node(pwq_cache, GFP_KERNEL, pool->node);
if (!pwq) {
put_unbound_pool(pool);
return NULL;
--
1.8.1.4

2013-03-28 06:43:34

by Tejun Heo

Subject: [PATCH 08/14] workqueue: map an unbound workqueues to multiple per-node pool_workqueues

Currently, an unbound workqueue has only one "current" pool_workqueue
associated with it. It may have multiple pool_workqueues but only the
first pool_workqueue serves new work items. For NUMA affinity, we
want to change this so that there are multiple current pool_workqueues
serving different NUMA nodes.

Introduce workqueue->numa_pwq_tbl[] which is indexed by NUMA node and
points to the pool_workqueue to use for each possible node. This
replaces first_pwq() in __queue_work() and workqueue_congested().

numa_pwq_tbl[] is currently initialized to point to the same
pool_workqueue as first_pwq() so this patch doesn't make any behavior
changes.

v2: Use rcu_dereference_raw() in unbound_pwq_by_node() as the function
may be called with only wq->mutex held.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 48 +++++++++++++++++++++++++++++++++++++-----------
1 file changed, 37 insertions(+), 11 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9297ea3..5b53705 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -257,6 +257,7 @@ struct workqueue_struct {
/* hot fields used during command issue, aligned to cacheline */
unsigned int flags ____cacheline_aligned; /* WQ: WQ_* flags */
struct pool_workqueue __percpu *cpu_pwqs; /* I: per-cpu pwqs */
+ struct pool_workqueue __rcu *numa_pwq_tbl[]; /* FR: unbound pwqs indexed by node */
};

static struct kmem_cache *pwq_cache;
@@ -525,6 +526,22 @@ static struct pool_workqueue *first_pwq(struct workqueue_struct *wq)
pwqs_node);
}

+/**
+ * unbound_pwq_by_node - return the unbound pool_workqueue for the given node
+ * @wq: the target workqueue
+ * @node: the node ID
+ *
+ * This must be called either with pwq_lock held or sched RCU read locked.
+ * If the pwq needs to be used beyond the locking in effect, the caller is
+ * responsible for guaranteeing that the pwq stays online.
+ */
+static struct pool_workqueue *unbound_pwq_by_node(struct workqueue_struct *wq,
+ int node)
+{
+ assert_rcu_or_wq_mutex(wq);
+ return rcu_dereference_raw(wq->numa_pwq_tbl[node]);
+}
+
static unsigned int work_color_to_flags(int color)
{
return color << WORK_STRUCT_COLOR_SHIFT;
@@ -1278,14 +1295,14 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
WARN_ON_ONCE(!is_chained_work(wq)))
return;
retry:
+ if (req_cpu == WORK_CPU_UNBOUND)
+ cpu = raw_smp_processor_id();
+
/* pwq which will be used unless @work is executing elsewhere */
- if (!(wq->flags & WQ_UNBOUND)) {
- if (cpu == WORK_CPU_UNBOUND)
- cpu = raw_smp_processor_id();
+ if (!(wq->flags & WQ_UNBOUND))
pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
- } else {
- pwq = first_pwq(wq);
- }
+ else
+ pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));

/*
* If @work was previously on a different pool, it might still be
@@ -1315,8 +1332,8 @@ retry:
* pwq is determined and locked. For unbound pools, we could have
* raced with pwq release and it could already be dead. If its
* refcnt is zero, repeat pwq selection. Note that pwqs never die
- * without another pwq replacing it as the first pwq or while a
- * work item is executing on it, so the retying is guaranteed to
+ * without another pwq replacing it in the numa_pwq_tbl or while
+ * work items are executing on it, so the retrying is guaranteed to
* make forward-progress.
*/
if (unlikely(!pwq->refcnt)) {
@@ -3614,6 +3631,8 @@ static void init_and_link_pwq(struct pool_workqueue *pwq,
struct worker_pool *pool,
struct pool_workqueue **p_last_pwq)
{
+ int node;
+
BUG_ON((unsigned long)pwq & WORK_STRUCT_FLAG_MASK);

pwq->pool = pool;
@@ -3640,8 +3659,11 @@ static void init_and_link_pwq(struct pool_workqueue *pwq,
/* link in @pwq */
list_add_rcu(&pwq->pwqs_node, &wq->pwqs);

- if (wq->flags & WQ_UNBOUND)
+ if (wq->flags & WQ_UNBOUND) {
copy_workqueue_attrs(wq->unbound_attrs, pool->attrs);
+ for_each_node(node)
+ rcu_assign_pointer(wq->numa_pwq_tbl[node], pwq);
+ }

mutex_unlock(&wq->mutex);
}
@@ -3756,12 +3778,16 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
struct lock_class_key *key,
const char *lock_name, ...)
{
+ size_t tbl_size = 0;
va_list args;
struct workqueue_struct *wq;
struct pool_workqueue *pwq;

/* allocate wq and format name */
- wq = kzalloc(sizeof(*wq), GFP_KERNEL);
+ if (flags & WQ_UNBOUND)
+ tbl_size = wq_numa_tbl_len * sizeof(wq->numa_pwq_tbl[0]);
+
+ wq = kzalloc(sizeof(*wq) + tbl_size, GFP_KERNEL);
if (!wq)
return NULL;

@@ -3989,7 +4015,7 @@ bool workqueue_congested(int cpu, struct workqueue_struct *wq)
if (!(wq->flags & WQ_UNBOUND))
pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
else
- pwq = first_pwq(wq);
+ pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));

ret = !list_empty(&pwq->delayed_works);
rcu_read_unlock_sched();
--
1.8.1.4

2013-03-28 06:43:32

by Tejun Heo

Subject: [PATCH 06/14] workqueue: make workqueue->name[] fixed len

Currently workqueue->name[] is of flexible length. We want to use the
flexible field for something more useful and there isn't much benefit
in allowing arbitrary name lengths anyway. Make it fixed length,
capped at 24 bytes.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 65f200c..23c9099 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -101,6 +101,8 @@ enum {
*/
RESCUER_NICE_LEVEL = -20,
HIGHPRI_NICE_LEVEL = -20,
+
+ WQ_NAME_LEN = 24,
};

/*
@@ -252,7 +254,7 @@ struct workqueue_struct {
#ifdef CONFIG_LOCKDEP
struct lockdep_map lockdep_map;
#endif
- char name[]; /* I: workqueue name */
+ char name[WQ_NAME_LEN]; /* I: workqueue name */
};

static struct kmem_cache *pwq_cache;
@@ -3752,17 +3754,12 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
struct lock_class_key *key,
const char *lock_name, ...)
{
- va_list args, args1;
+ va_list args;
struct workqueue_struct *wq;
struct pool_workqueue *pwq;
- size_t namelen;
-
- /* determine namelen, allocate wq and format name */
- va_start(args, lock_name);
- va_copy(args1, args);
- namelen = vsnprintf(NULL, 0, fmt, args) + 1;

- wq = kzalloc(sizeof(*wq) + namelen, GFP_KERNEL);
+ /* allocate wq and format name */
+ wq = kzalloc(sizeof(*wq), GFP_KERNEL);
if (!wq)
return NULL;

@@ -3772,9 +3769,9 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
goto err_free_wq;
}

- vsnprintf(wq->name, namelen, fmt, args1);
+ va_start(args, lock_name);
+ vsnprintf(wq->name, sizeof(wq->name), fmt, args);
va_end(args);
- va_end(args1);

max_active = max_active ?: WQ_DFL_ACTIVE;
max_active = wq_clamp_max_active(max_active, flags, wq->name);
--
1.8.1.4

2013-03-28 06:43:40

by Tejun Heo

Subject: [PATCH 14/14] workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity

Unbound workqueues are now NUMA aware. Let's add some control knobs
and update the sysfs interface accordingly.

* Add the kernel param workqueue.disable_numa which disables NUMA
affinity globally.

* Replace the sysfs file "pool_id" with "pool_ids" which contains
node:pool_id pairs. This change is userland-visible but "pool_id"
hasn't seen a release yet, so this is okay.

* Add a new sysfs file "numa" which can toggle NUMA affinity on
individual workqueues. This is implemented as attrs->no_numa which
is special in that it isn't part of a pool's attributes. It only
affects how apply_workqueue_attrs() picks which pools to use.

After "pool_ids" change, first_pwq() doesn't have any user left.
Removed.
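
For in-kernel users, toggling the new flag would look roughly like the
sketch below (hedged: it starts from freshly allocated default attrs,
so nice and cpumask are reset to their defaults rather than preserved):

    struct workqueue_attrs *attrs;
    int ret;

    attrs = alloc_workqueue_attrs(GFP_KERNEL);
    if (!attrs)
            return -ENOMEM;
    attrs->no_numa = true;                          /* disable NUMA affinity for @wq */
    ret = apply_workqueue_attrs(wq, attrs);         /* @wq must be WQ_UNBOUND */
    free_workqueue_attrs(attrs);
    return ret;

From userland, the same toggle is just a 0/1 write to the "numa" file
of a workqueue visible under /sys/bus/workqueue/.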

Signed-off-by: Tejun Heo <[email protected]>
---
Documentation/kernel-parameters.txt | 9 ++++
include/linux/workqueue.h | 5 +++
kernel/workqueue.c | 82 ++++++++++++++++++++++++++-----------
3 files changed, 73 insertions(+), 23 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 4609e81..c75ea0b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3222,6 +3222,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
or other driver-specific files in the
Documentation/watchdog/ directory.

+ workqueue.disable_numa
+ By default, all work items queued to unbound
+ workqueues are affine to the NUMA nodes they're
+ issued on, which results in better behavior in
+ general. If NUMA affinity needs to be disabled for
+ whatever reason, this option can be used. Note
+ that this also can be controlled per-workqueue for
+ workqueues visible under /sys/bus/workqueue/.
+
x2apic_phys [X86-64,APIC] Use x2apic physical mode instead of
default x2apic cluster mode on platforms
supporting x2apic.
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 835d12b..7179756 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -119,10 +119,15 @@ struct delayed_work {
/*
* A struct for workqueue attributes. This can be used to change
* attributes of an unbound workqueue.
+ *
+ * Unlike other fields, ->no_numa isn't a property of a worker_pool. It
+ * only modifies how apply_workqueue_attrs() select pools and thus doesn't
+ * participate in pool hash calculations or equality comparisons.
*/
struct workqueue_attrs {
int nice; /* nice level */
cpumask_var_t cpumask; /* allowed CPUs */
+ bool no_numa; /* disable NUMA affinity */
};

static inline struct delayed_work *to_delayed_work(struct work_struct *work)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 637debe..0b6a3b0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -268,6 +268,9 @@ static int wq_numa_tbl_len; /* highest possible NUMA node id + 1 */
static cpumask_var_t *wq_numa_possible_cpumask;
/* possible CPUs of each node */

+static bool wq_disable_numa;
+module_param_named(disable_numa, wq_disable_numa, bool, 0444);
+
static bool wq_numa_enabled; /* unbound NUMA affinity enabled */

/* buf for wq_update_unbound_numa_attrs(), protected by CPU hotplug exclusion */
@@ -517,21 +520,6 @@ static int worker_pool_assign_id(struct worker_pool *pool)
}

/**
- * first_pwq - return the first pool_workqueue of the specified workqueue
- * @wq: the target workqueue
- *
- * This must be called either with wq->mutex held or sched RCU read locked.
- * If the pwq needs to be used beyond the locking in effect, the caller is
- * responsible for guaranteeing that the pwq stays online.
- */
-static struct pool_workqueue *first_pwq(struct workqueue_struct *wq)
-{
- assert_rcu_or_wq_mutex(wq);
- return list_first_or_null_rcu(&wq->pwqs, struct pool_workqueue,
- pwqs_node);
-}
-
-/**
* unbound_pwq_by_node - return the unbound pool_workqueue for the given node
* @wq: the target workqueue
* @node: the node ID
@@ -3114,16 +3102,21 @@ static struct device_attribute wq_sysfs_attrs[] = {
__ATTR_NULL,
};

-static ssize_t wq_pool_id_show(struct device *dev,
- struct device_attribute *attr, char *buf)
+static ssize_t wq_pool_ids_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
{
struct workqueue_struct *wq = dev_to_wq(dev);
- struct worker_pool *pool;
- int written;
+ const char *delim = "";
+ int node, written = 0;

rcu_read_lock_sched();
- pool = first_pwq(wq)->pool;
- written = scnprintf(buf, PAGE_SIZE, "%d\n", pool->id);
+ for_each_node(node) {
+ written += scnprintf(buf + written, PAGE_SIZE - written,
+ "%s%d:%d", delim, node,
+ unbound_pwq_by_node(wq, node)->pool->id);
+ delim = " ";
+ }
+ written += scnprintf(buf + written, PAGE_SIZE - written, "\n");
rcu_read_unlock_sched();

return written;
@@ -3212,10 +3205,46 @@ static ssize_t wq_cpumask_store(struct device *dev,
return ret ?: count;
}

+static ssize_t wq_numa_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct workqueue_struct *wq = dev_to_wq(dev);
+ int written;
+
+ mutex_lock(&wq->mutex);
+ written = scnprintf(buf, PAGE_SIZE, "%d\n",
+ !wq->unbound_attrs->no_numa);
+ mutex_unlock(&wq->mutex);
+
+ return written;
+}
+
+static ssize_t wq_numa_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct workqueue_struct *wq = dev_to_wq(dev);
+ struct workqueue_attrs *attrs;
+ int v, ret;
+
+ attrs = wq_sysfs_prep_attrs(wq);
+ if (!attrs)
+ return -ENOMEM;
+
+ ret = -EINVAL;
+ if (sscanf(buf, "%d", &v) == 1) {
+ attrs->no_numa = !v;
+ ret = apply_workqueue_attrs(wq, attrs);
+ }
+
+ free_workqueue_attrs(attrs);
+ return ret ?: count;
+}
+
static struct device_attribute wq_sysfs_unbound_attrs[] = {
- __ATTR(pool_id, 0444, wq_pool_id_show, NULL),
+ __ATTR(pool_ids, 0444, wq_pool_ids_show, NULL),
__ATTR(nice, 0644, wq_nice_show, wq_nice_store),
__ATTR(cpumask, 0644, wq_cpumask_show, wq_cpumask_store),
+ __ATTR(numa, 0644, wq_numa_show, wq_numa_store),
__ATTR_NULL,
};

@@ -3750,7 +3779,7 @@ static void free_unbound_pwq(struct pool_workqueue *pwq)
static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
int cpu_going_down, cpumask_t *cpumask)
{
- if (!wq_numa_enabled)
+ if (!wq_numa_enabled || attrs->no_numa)
goto use_dfl;

/* does @node have any online CPUs @attrs wants? */
@@ -3940,6 +3969,8 @@ static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
cpumask = target_attrs->cpumask;
retry:
mutex_lock(&wq->mutex);
+ if (wq->unbound_attrs->no_numa)
+ goto out_unlock;

copy_workqueue_attrs(target_attrs, wq->unbound_attrs);
pwq = unbound_pwq_by_node(wq, node);
@@ -4757,6 +4788,11 @@ static void __init wq_numa_init(void)
if (num_possible_nodes() <= 1)
return;

+ if (wq_disable_numa) {
+ pr_info("workqueue: NUMA affinity support disabled\n");
+ return;
+ }
+
wq_update_unbound_numa_attrs_buf = alloc_workqueue_attrs(GFP_KERNEL);
BUG_ON(!wq_update_unbound_numa_attrs_buf);

--
1.8.1.4

2013-03-28 06:44:57

by Tejun Heo

Subject: [PATCH 11/14] workqueue: introduce numa_pwq_tbl_install()

Factor out pool_workqueue linking and installation into numa_pwq_tbl[]
from apply_workqueue_attrs() into numa_pwq_tbl_install(). link_pwq()
is made safe to call multiple times. numa_pwq_tbl_install() links the
pwq, installs it into numa_pwq_tbl[] at the specified node and returns
the old entry.

@last_pwq is removed from link_pwq() as the return value of the new
function can be used instead.

This is to prepare for NUMA affinity support for unbound workqueues.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 35 ++++++++++++++++++++++++++---------
1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a4420be..527dc418 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3639,24 +3639,26 @@ static void init_pwq(struct pool_workqueue *pwq, struct workqueue_struct *wq,
pwq->flush_color = -1;
pwq->refcnt = 1;
INIT_LIST_HEAD(&pwq->delayed_works);
+ INIT_LIST_HEAD(&pwq->pwqs_node);
INIT_LIST_HEAD(&pwq->mayday_node);
INIT_WORK(&pwq->unbound_release_work, pwq_unbound_release_workfn);
}

/* sync @pwq with the current state of its associated wq and link it */
-static void link_pwq(struct pool_workqueue *pwq,
- struct pool_workqueue **p_last_pwq)
+static void link_pwq(struct pool_workqueue *pwq)
{
struct workqueue_struct *wq = pwq->wq;

lockdep_assert_held(&wq->mutex);

+ /* may be called multiple times, ignore if already linked */
+ if (!list_empty(&pwq->pwqs_node))
+ return;
+
/*
* Set the matching work_color. This is synchronized with
* wq->mutex to avoid confusing flush_workqueue().
*/
- if (p_last_pwq)
- *p_last_pwq = first_pwq(wq);
pwq->work_color = wq->work_color;

/* sync max_active to the current setting */
@@ -3689,6 +3691,23 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
return pwq;
}

+/* install @pwq into @wq's numa_pwq_tbl[] for @node and return the old pwq */
+static struct pool_workqueue *numa_pwq_tbl_install(struct workqueue_struct *wq,
+ int node,
+ struct pool_workqueue *pwq)
+{
+ struct pool_workqueue *old_pwq;
+
+ lockdep_assert_held(&wq->mutex);
+
+ /* link_pwq() can handle duplicate calls */
+ link_pwq(pwq);
+
+ old_pwq = rcu_access_pointer(wq->numa_pwq_tbl[node]);
+ rcu_assign_pointer(wq->numa_pwq_tbl[node], pwq);
+ return old_pwq;
+}
+
/**
* apply_workqueue_attrs - apply new workqueue_attrs to an unbound workqueue
* @wq: the target workqueue
@@ -3707,7 +3726,7 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
const struct workqueue_attrs *attrs)
{
struct workqueue_attrs *new_attrs;
- struct pool_workqueue *pwq, *last_pwq;
+ struct pool_workqueue *pwq, *last_pwq = NULL;
int node;

/* only unbound workqueues can change attributes */
@@ -3734,11 +3753,9 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,

mutex_lock(&wq->mutex);

- link_pwq(pwq, &last_pwq);
-
copy_workqueue_attrs(wq->unbound_attrs, new_attrs);
for_each_node(node)
- rcu_assign_pointer(wq->numa_pwq_tbl[node], pwq);
+ last_pwq = numa_pwq_tbl_install(wq, node, pwq);

mutex_unlock(&wq->mutex);

@@ -3774,7 +3791,7 @@ static int alloc_and_link_pwqs(struct workqueue_struct *wq)
init_pwq(pwq, wq, &cpu_pools[highpri]);

mutex_lock(&wq->mutex);
- link_pwq(pwq, NULL);
+ link_pwq(pwq);
mutex_unlock(&wq->mutex);
}
return 0;
--
1.8.1.4

2013-03-28 06:43:38

by Tejun Heo

Subject: [PATCH 12/14] workqueue: introduce put_pwq_unlocked()

Factor out the lock pool, put_pwq(), unlock sequence into
put_pwq_unlocked(). The two existing places are converted and there
will be more with NUMA affinity support.

This is to prepare for NUMA affinity support for unbound workqueues
and doesn't introduce any functional difference.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 36 +++++++++++++++++++++++-------------
1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 527dc418..e656931 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1057,6 +1057,25 @@ static void put_pwq(struct pool_workqueue *pwq)
schedule_work(&pwq->unbound_release_work);
}

+/**
+ * put_pwq_unlocked - put_pwq() with surrounding pool lock/unlock
+ * @pwq: pool_workqueue to put (can be %NULL)
+ *
+ * put_pwq() with locking. This function also allows %NULL @pwq.
+ */
+static void put_pwq_unlocked(struct pool_workqueue *pwq)
+{
+ if (pwq) {
+ /*
+ * As both pwqs and pools are sched-RCU protected, the
+ * following lock operations are safe.
+ */
+ spin_lock_irq(&pwq->pool->lock);
+ put_pwq(pwq);
+ spin_unlock_irq(&pwq->pool->lock);
+ }
+}
+
static void pwq_activate_delayed_work(struct work_struct *work)
{
struct pool_workqueue *pwq = get_work_pwq(work);
@@ -3759,12 +3778,7 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,

mutex_unlock(&wq->mutex);

- if (last_pwq) {
- spin_lock_irq(&last_pwq->pool->lock);
- put_pwq(last_pwq);
- spin_unlock_irq(&last_pwq->pool->lock);
- }
-
+ put_pwq_unlocked(last_pwq);
return 0;

enomem:
@@ -3975,16 +3989,12 @@ void destroy_workqueue(struct workqueue_struct *wq)
} else {
/*
* We're the sole accessor of @wq at this point. Directly
- * access the first pwq and put the base ref. As both pwqs
- * and pools are sched-RCU protected, the lock operations
- * are safe. @wq will be freed when the last pwq is
- * released.
+ * access the first pwq and put the base ref. @wq will be
+ * freed when the last pwq is released.
*/
pwq = list_first_entry(&wq->pwqs, struct pool_workqueue,
pwqs_node);
- spin_lock_irq(&pwq->pool->lock);
- put_pwq(pwq);
- spin_unlock_irq(&pwq->pool->lock);
+ put_pwq_unlocked(pwq);
}
}
EXPORT_SYMBOL_GPL(destroy_workqueue);
--
1.8.1.4

2013-03-28 06:45:06

by Tejun Heo

Subject: [PATCH 13/14] workqueue: implement NUMA affinity for unbound workqueues

Currently, an unbound workqueue has a single current, or first, pwq
(pool_workqueue) to which all new work items are queued. This often
isn't optimal on NUMA machines as workers may jump around across node
boundaries and work items get assigned to workers without any regard
to NUMA affinity.

This patch implements NUMA affinity for unbound workqueues. Instead
of mapping all entries of numa_pwq_tbl[] to the same pwq,
apply_workqueue_attrs() now creates a separate pwq covering the
intersecting CPUs for each NUMA node which has online CPUs in
@attrs->cpumask. Nodes which don't have intersecting possible CPUs
are mapped to the default pwq covering the whole @attrs->cpumask.

As CPUs come up and go down, the pool association is changed
accordingly. Changing the pool association may involve allocating new
pools, which may fail. To avoid failing CPU_DOWN, each workqueue
always keeps a default pwq which covers the whole attrs->cpumask and
is used as a fallback if pool creation fails during a CPU hotplug
operation.

This ensures that all work items issued on a NUMA node are executed
on the same node as long as the workqueue allows execution on the CPUs
of the node.

As this maps a workqueue to multiple pwqs and max_active is per-pwq,
this changes the behavior of max_active. The limit is now per NUMA
node instead of global. While this is an actual change, max_active is
already per-cpu for per-cpu workqueues and is primarily used as a
safety mechanism rather than for active concurrency control.
Concurrency is usually limited by workqueue users through the number
of concurrently active work items, so this change shouldn't matter
much.

v2: Fixed pwq freeing in apply_workqueue_attrs() error path. Spotted
by Lai.

v3: The previous version incorrectly made a workqueue spanning
multiple nodes spread work items over all online CPUs when some of
its nodes don't have any desired cpus. Reimplemented so that NUMA
affinity is properly updated as CPUs go up and down. This problem
was spotted by Lai Jiangshan.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 278 +++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 258 insertions(+), 20 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e656931..637debe 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -45,6 +45,7 @@
#include <linux/hashtable.h>
#include <linux/rculist.h>
#include <linux/nodemask.h>
+#include <linux/moduleparam.h>

#include "workqueue_internal.h"

@@ -245,6 +246,7 @@ struct workqueue_struct {
int saved_max_active; /* WQ: saved pwq max_active */

struct workqueue_attrs *unbound_attrs; /* WQ: only for unbound wqs */
+ struct pool_workqueue *dfl_pwq; /* WQ: only for unbound wqs */

#ifdef CONFIG_SYSFS
struct wq_device *wq_dev; /* I: for sysfs interface */
@@ -268,6 +270,9 @@ static cpumask_var_t *wq_numa_possible_cpumask;

static bool wq_numa_enabled; /* unbound NUMA affinity enabled */

+/* buf for wq_update_unbound_numa_attrs(), protected by CPU hotplug exclusion */
+static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;
+
static DEFINE_MUTEX(wq_pool_mutex); /* protects pools and workqueues list */
static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */

@@ -3710,6 +3715,61 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
return pwq;
}

+/* undo alloc_unbound_pwq(), used only in the error path */
+static void free_unbound_pwq(struct pool_workqueue *pwq)
+{
+ lockdep_assert_held(&wq_pool_mutex);
+
+ if (pwq) {
+ put_unbound_pool(pwq->pool);
+ kfree(pwq);
+ }
+}
+
+/**
+ * wq_calc_node_mask - calculate a wq_attrs' cpumask for the specified node
+ * @attrs: the wq_attrs of interest
+ * @node: the target NUMA node
+ * @cpu_going_down: if >= 0, the CPU to consider as offline
+ * @cpumask: outarg, the resulting cpumask
+ *
+ * Calculate the cpumask a workqueue with @attrs should use on @node. If
+ * @cpu_going_down is >= 0, that cpu is considered offline during
+ * calculation. The result is stored in @cpumask. This function returns
+ * %true if the resulting @cpumask is different from @attrs->cpumask,
+ * %false if equal.
+ *
+ * If NUMA affinity is not enabled, @attrs->cpumask is always used. If
+ * enabled and @node has online CPUs requested by @attrs, the returned
+ * cpumask is the intersection of the possible CPUs of @node and
+ * @attrs->cpumask.
+ *
+ * The caller is responsible for ensuring that the cpumask of @node stays
+ * stable.
+ */
+static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
+ int cpu_going_down, cpumask_t *cpumask)
+{
+ if (!wq_numa_enabled)
+ goto use_dfl;
+
+ /* does @node have any online CPUs @attrs wants? */
+ cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
+ if (cpu_going_down >= 0)
+ cpumask_clear_cpu(cpu_going_down, cpumask);
+
+ if (cpumask_empty(cpumask))
+ goto use_dfl;
+
+ /* yeap, return possible CPUs in @node that @attrs wants */
+ cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
+ return !cpumask_equal(cpumask, attrs->cpumask);
+
+use_dfl:
+ cpumask_copy(cpumask, attrs->cpumask);
+ return false;
+}
+
/* install @pwq into @wq's numa_pwq_tbl[] for @node and return the old pwq */
static struct pool_workqueue *numa_pwq_tbl_install(struct workqueue_struct *wq,
int node,
@@ -3732,11 +3792,12 @@ static struct pool_workqueue *numa_pwq_tbl_install(struct workqueue_struct *wq,
* @wq: the target workqueue
* @attrs: the workqueue_attrs to apply, allocated with alloc_workqueue_attrs()
*
- * Apply @attrs to an unbound workqueue @wq. If @attrs doesn't match the
- * current attributes, a new pwq is created and made the first pwq which
- * will serve all new work items. Older pwqs are released as in-flight
- * work items finish. Note that a work item which repeatedly requeues
- * itself back-to-back will stay on its current pwq.
+ * Apply @attrs to an unbound workqueue @wq. Unless disabled, on NUMA
+ * machines, this function maps a separate pwq to each NUMA node with
+ * possibles CPUs in @attrs->cpumask so that work items are affine to the
+ * NUMA node it was issued on. Older pwqs are released as in-flight work
+ * items finish. Note that a work item which repeatedly requeues itself
+ * back-to-back will stay on its current pwq.
*
* Performs GFP_KERNEL allocations. Returns 0 on success and -errno on
* failure.
@@ -3744,8 +3805,8 @@ static struct pool_workqueue *numa_pwq_tbl_install(struct workqueue_struct *wq,
int apply_workqueue_attrs(struct workqueue_struct *wq,
const struct workqueue_attrs *attrs)
{
- struct workqueue_attrs *new_attrs;
- struct pool_workqueue *pwq, *last_pwq = NULL;
+ struct workqueue_attrs *new_attrs, *tmp_attrs;
+ struct pool_workqueue **pwq_tbl, *dfl_pwq;
int node;

/* only unbound workqueues can change attributes */
@@ -3756,36 +3817,190 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
if (WARN_ON((wq->flags & __WQ_ORDERED) && !list_empty(&wq->pwqs)))
return -EINVAL;

- /* make a copy of @attrs and sanitize it */
+ pwq_tbl = kzalloc(wq_numa_tbl_len * sizeof(pwq_tbl[0]), GFP_KERNEL);
new_attrs = alloc_workqueue_attrs(GFP_KERNEL);
- if (!new_attrs)
+ tmp_attrs = alloc_workqueue_attrs(GFP_KERNEL);
+ if (!pwq_tbl || !new_attrs || !tmp_attrs)
goto enomem;

+ /* make a copy of @attrs and sanitize it */
copy_workqueue_attrs(new_attrs, attrs);
cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);

+ /*
+ * We may create multiple pwqs with differing cpumasks. Make a
+ * copy of @new_attrs which will be modified and used to obtain
+ * pools.
+ */
+ copy_workqueue_attrs(tmp_attrs, new_attrs);
+
+ /*
+ * CPUs should stay stable across pwq creations and installations.
+ * Pin CPUs, determine the target cpumask for each node and create
+ * pwqs accordingly.
+ */
+ get_online_cpus();
+
mutex_lock(&wq_pool_mutex);
- pwq = alloc_unbound_pwq(wq, new_attrs);
+
+ /*
+ * If something goes wrong during CPU up/down, we'll fall back to
+ * the default pwq covering whole @attrs->cpumask. Always create
+ * it even if we don't use it immediately.
+ */
+ dfl_pwq = alloc_unbound_pwq(wq, new_attrs);
+ if (!dfl_pwq)
+ goto enomem_pwq;
+
+ for_each_node(node) {
+ if (wq_calc_node_cpumask(attrs, node, -1, tmp_attrs->cpumask)) {
+ pwq_tbl[node] = alloc_unbound_pwq(wq, tmp_attrs);
+ if (!pwq_tbl[node])
+ goto enomem_pwq;
+ } else {
+ dfl_pwq->refcnt++;
+ pwq_tbl[node] = dfl_pwq;
+ }
+ }
+
mutex_unlock(&wq_pool_mutex);
- if (!pwq)
- goto enomem;

+ /* all pwqs have been created successfully, let's install'em */
mutex_lock(&wq->mutex);

copy_workqueue_attrs(wq->unbound_attrs, new_attrs);
+
+ /* save the previous pwq and install the new one */
for_each_node(node)
- last_pwq = numa_pwq_tbl_install(wq, node, pwq);
+ pwq_tbl[node] = numa_pwq_tbl_install(wq, node, pwq_tbl[node]);
+
+ /* @dfl_pwq might not have been used, ensure it's linked */
+ link_pwq(dfl_pwq);
+ swap(wq->dfl_pwq, dfl_pwq);

mutex_unlock(&wq->mutex);

- put_pwq_unlocked(last_pwq);
+ /* put the old pwqs */
+ for_each_node(node)
+ put_pwq_unlocked(pwq_tbl[node]);
+ put_pwq_unlocked(dfl_pwq);
+
+ put_online_cpus();
return 0;

+enomem_pwq:
+ free_unbound_pwq(dfl_pwq);
+ for_each_node(node)
+ if (pwq_tbl && pwq_tbl[node] != dfl_pwq)
+ free_unbound_pwq(pwq_tbl[node]);
+ mutex_unlock(&wq_pool_mutex);
+ put_online_cpus();
enomem:
+ free_workqueue_attrs(tmp_attrs);
free_workqueue_attrs(new_attrs);
+ kfree(pwq_tbl);
return -ENOMEM;
}

+/**
+ * wq_update_unbound_numa - update NUMA affinity of a wq for CPU hot[un]plug
+ * @wq: the target workqueue
+ * @cpu: the CPU coming up or going down
+ * @online: whether @cpu is coming up or going down
+ *
+ * This function is to be called from %CPU_DOWN_PREPARE, %CPU_ONLINE and
+ * %CPU_DOWN_FAILED. @cpu is being hot[un]plugged, update NUMA affinity of
+ * @wq accordingly.
+ *
+ * If NUMA affinity can't be adjusted due to memory allocation failure, it
+ * falls back to @wq->dfl_pwq which may not be optimal but is always
+ * correct.
+ */
+static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
+ bool online)
+{
+ int node = cpu_to_node(cpu);
+ int cpu_off = online ? -1 : cpu;
+ struct pool_workqueue *old_pwq = NULL, *new_pwq = NULL;
+ struct pool_workqueue *pwq;
+ struct workqueue_attrs *target_attrs;
+ cpumask_t *cpumask;
+
+ lockdep_assert_held(&wq_pool_mutex);
+
+ if (!wq_numa_enabled || !(wq->flags & WQ_UNBOUND))
+ return;
+
+ /*
+ * We don't wanna alloc/free wq_attrs for each wq for each CPU.
+ * Let's use a preallocated one. The following buf is protected by
+ * CPU hotplug exclusion.
+ */
+ target_attrs = wq_update_unbound_numa_attrs_buf;
+ cpumask = target_attrs->cpumask;
+retry:
+ mutex_lock(&wq->mutex);
+
+ copy_workqueue_attrs(target_attrs, wq->unbound_attrs);
+ pwq = unbound_pwq_by_node(wq, node);
+
+ /*
+ * Let's determine what needs to be done. If the target cpumask is
+ * different from wq's, we need to compare it to @pwq's and create
+ * a new one if they don't match. If the target cpumask equals
+ * wq's, the default pwq should be used. If @pwq is already the
+ * default one, nothing to do; otherwise, install the default one.
+ */
+ if (wq_calc_node_cpumask(wq->unbound_attrs, node, cpu_off, cpumask)) {
+ if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
+ goto out_unlock;
+ } else if (pwq != wq->dfl_pwq) {
+ goto use_dfl_pwq;
+ } else {
+ goto out_unlock;
+ }
+
+ /*
+ * Have we already created a new pwq? As we could have raced with
+ * apply_workqueue_attrs(), verify that its attrs match the desired
+ * one before installing.
+ */
+ if (new_pwq && wqattrs_equal(new_pwq->pool->attrs, target_attrs)) {
+ old_pwq = numa_pwq_tbl_install(wq, node, new_pwq);
+ new_pwq = NULL;
+ goto out_unlock;
+ }
+
+ mutex_unlock(&wq->mutex);
+
+ /*
+ * Need to create a new pwq - either this is the first time or we
+ * raced with apply_workqueue_attrs() and our previous new pwq is
+ * no longer valid. Dispose of the previous one if exists and
+ * create a new one.
+ */
+ if (new_pwq)
+ free_unbound_pwq(new_pwq);
+
+ new_pwq = alloc_unbound_pwq(wq, target_attrs);
+ if (new_pwq)
+ goto retry;
+
+ pr_warning("workqueue: allocation failed while updating NUMA affinity of \"%s\"\n",
+ wq->name);
+ mutex_lock(&wq->mutex);
+ /* fall through */
+use_dfl_pwq:
+ spin_lock_irq(&wq->dfl_pwq->pool->lock);
+ get_pwq(wq->dfl_pwq);
+ spin_unlock_irq(&wq->dfl_pwq->pool->lock);
+ old_pwq = numa_pwq_tbl_install(wq, node, wq->dfl_pwq);
+out_unlock:
+ mutex_unlock(&wq->mutex);
+ put_pwq_unlocked(old_pwq);
+ free_unbound_pwq(new_pwq);
+}
+
static int alloc_and_link_pwqs(struct workqueue_struct *wq)
{
bool highpri = wq->flags & WQ_HIGHPRI;
@@ -3938,6 +4153,7 @@ EXPORT_SYMBOL_GPL(__alloc_workqueue_key);
void destroy_workqueue(struct workqueue_struct *wq)
{
struct pool_workqueue *pwq;
+ int node;

/* drain it before proceeding with destruction */
drain_workqueue(wq);
@@ -3989,12 +4205,17 @@ void destroy_workqueue(struct workqueue_struct *wq)
} else {
/*
* We're the sole accessor of @wq at this point. Directly
- * access the first pwq and put the base ref. @wq will be
- * freed when the last pwq is released.
+ * access numa_pwq_tbl[] and dfl_pwq to put the base refs.
+ * @wq will be freed when the last pwq is released.
*/
- pwq = list_first_entry(&wq->pwqs, struct pool_workqueue,
- pwqs_node);
- put_pwq_unlocked(pwq);
+ for_each_node(node) {
+ pwq = rcu_access_pointer(wq->numa_pwq_tbl[node]);
+ RCU_INIT_POINTER(wq->numa_pwq_tbl[node], NULL);
+ put_pwq_unlocked(pwq);
+ }
+
+ put_pwq_unlocked(wq->dfl_pwq);
+ wq->dfl_pwq = NULL;
}
}
EXPORT_SYMBOL_GPL(destroy_workqueue);
@@ -4281,6 +4502,7 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,
{
int cpu = (unsigned long)hcpu;
struct worker_pool *pool;
+ struct workqueue_struct *wq;
int pi;

switch (action & ~CPU_TASKS_FROZEN) {
@@ -4313,6 +4535,10 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,
mutex_unlock(&pool->manager_mutex);
}

+ /* update NUMA affinity of unbound workqueues */
+ list_for_each_entry(wq, &workqueues, list)
+ wq_update_unbound_numa(wq, cpu, true);
+
mutex_unlock(&wq_pool_mutex);
break;
}
@@ -4329,12 +4555,21 @@ static int __cpuinit workqueue_cpu_down_callback(struct notifier_block *nfb,
{
int cpu = (unsigned long)hcpu;
struct work_struct unbind_work;
+ struct workqueue_struct *wq;

switch (action & ~CPU_TASKS_FROZEN) {
case CPU_DOWN_PREPARE:
- /* unbinding should happen on the local CPU */
+ /* unbinding per-cpu workers should happen on the local CPU */
INIT_WORK_ONSTACK(&unbind_work, wq_unbind_fn);
queue_work_on(cpu, system_highpri_wq, &unbind_work);
+
+ /* update NUMA affinity of unbound workqueues */
+ mutex_lock(&wq_pool_mutex);
+ list_for_each_entry(wq, &workqueues, list)
+ wq_update_unbound_numa(wq, cpu, false);
+ mutex_unlock(&wq_pool_mutex);
+
+ /* wait for per-cpu unbinding to finish */
flush_work(&unbind_work);
break;
}
@@ -4522,6 +4757,9 @@ static void __init wq_numa_init(void)
if (num_possible_nodes() <= 1)
return;

+ wq_update_unbound_numa_attrs_buf = alloc_workqueue_attrs(GFP_KERNEL);
+ BUG_ON(!wq_update_unbound_numa_attrs_buf);
+
/*
* We want masks of possible CPUs of each node which isn't readily
* available. Build one from cpu_to_node() which should have been
--
1.8.1.4

2013-03-28 06:44:12

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 02/14] workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]

Unbound workqueues are going to be NUMA-affine. Add wq_numa_tbl_len
and wq_numa_possible_cpumask[] in preparation. The former is the
highest NUMA node ID + 1 and the latter holds the mask of possible
CPUs for each NUMA node.

This patch only introduces these. Future patches will make use of
them.
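
For illustration only, here is a hypothetical sketch (not part of this
patch; the helper name is made up) of how such a per-node table can be
consulted once later patches start using it:

#include <linux/cpumask.h>
#include <linux/topology.h>

/* hypothetical helper, for illustration only */
static const struct cpumask *node_possible_mask_of(int cpu, cpumask_var_t *tbl)
{
	int node = cpu_to_node(cpu);

	/* the mask of possible CPUs on the node @cpu belongs to */
	return tbl[node];
}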

v2: NUMA initialization moved into wq_numa_init(). Also, the possible
cpumask array is not created if there aren't multiple nodes on the
system. wq_numa_enabled bool added.

Signed-off-by: Tejun Heo <[email protected]>
---
kernel/workqueue.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 26771f4e..54b5048 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -44,6 +44,7 @@
#include <linux/jhash.h>
#include <linux/hashtable.h>
#include <linux/rculist.h>
+#include <linux/nodemask.h>

#include "workqueue_internal.h"

@@ -253,6 +254,12 @@ struct workqueue_struct {

static struct kmem_cache *pwq_cache;

+static int wq_numa_tbl_len; /* highest possible NUMA node id + 1 */
+static cpumask_var_t *wq_numa_possible_cpumask;
+ /* possible CPUs of each node */
+
+static bool wq_numa_enabled; /* unbound NUMA affinity enabled */
+
static DEFINE_MUTEX(wq_pool_mutex); /* protects pools and workqueues list */
static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */

@@ -4402,6 +4409,43 @@ out_unlock:
}
#endif /* CONFIG_FREEZER */

+static void __init wq_numa_init(void)
+{
+ cpumask_var_t *tbl;
+ int node, cpu;
+
+ /* determine NUMA pwq table len - highest node id + 1 */
+ for_each_node(node)
+ wq_numa_tbl_len = max(wq_numa_tbl_len, node + 1);
+
+ if (num_possible_nodes() <= 1)
+ return;
+
+ /*
+ * We want masks of possible CPUs of each node which isn't readily
+ * available. Build one from cpu_to_node() which should have been
+ * fully initialized by now.
+ */
+ tbl = kzalloc(wq_numa_tbl_len * sizeof(tbl[0]), GFP_KERNEL);
+ BUG_ON(!tbl);
+
+ for_each_node(node)
+ BUG_ON(!alloc_cpumask_var_node(&tbl[node], GFP_KERNEL, node));
+
+ for_each_possible_cpu(cpu) {
+ node = cpu_to_node(cpu);
+ if (WARN_ON(node == NUMA_NO_NODE)) {
+ pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu);
+ /* happens iff arch is bonkers, let's just proceed */
+ return;
+ }
+ cpumask_set_cpu(cpu, tbl[node]);
+ }
+
+ wq_numa_possible_cpumask = tbl;
+ wq_numa_enabled = true;
+}
+
static int __init init_workqueues(void)
{
int std_nice[NR_STD_WORKER_POOLS] = { 0, HIGHPRI_NICE_LEVEL };
@@ -4418,6 +4462,8 @@ static int __init init_workqueues(void)
cpu_notifier(workqueue_cpu_up_callback, CPU_PRI_WORKQUEUE_UP);
hotcpu_notifier(workqueue_cpu_down_callback, CPU_PRI_WORKQUEUE_DOWN);

+ wq_numa_init();
+
/* initialize CPU pools */
for_each_possible_cpu(cpu) {
struct worker_pool *pool;
--
1.8.1.4

2013-03-29 22:44:49

by Tejun Heo

[permalink] [raw]
Subject: [PATCH v4 13/14] workqueue: implement NUMA affinity for unbound workqueues

From e4bc30ced68420e89d264a26e10d450765a747ed Mon Sep 17 00:00:00 2001
From: Tejun Heo <[email protected]>
Date: Fri, 29 Mar 2013 15:42:27 -0700

Currently, an unbound workqueue has a single current, or first, pwq
(pool_workqueue) to which all new work items are queued. This often
isn't optimal on NUMA machines as workers may jump around across node
boundaries and work items get assigned to workers without any regard
to NUMA affinity.

This patch implements NUMA affinity for unbound workqueues. Instead
of mapping all entries of numa_pwq_tbl[] to the same pwq,
apply_workqueue_attrs() now creates a separate pwq covering the
intersecting CPUs for each NUMA node which has online CPUs in
@attrs->cpumask. Nodes which don't have intersecting possible CPUs
are mapped to pwqs covering whole @attrs->cpumask.
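
As a rough sketch of that mapping rule (illustrative names only; the
patch itself implements this as wq_calc_node_cpumask() below), the
per-node mask is the intersection of @attrs->cpumask and the node's
possible CPUs, with an empty intersection falling back to the mask of
the default pwq:

#include <linux/cpumask.h>

/* illustrative only */
static void node_pwq_mask_example(const struct cpumask *attrs_mask,
				  const struct cpumask *node_possible_mask,
				  struct cpumask *out)
{
	/* per-node pwq mask = attrs->cpumask & possible CPUs of the node */
	if (!cpumask_and(out, attrs_mask, node_possible_mask))
		/* empty intersection - this node uses the default pwq's mask */
		cpumask_copy(out, attrs_mask);
}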

As CPUs come up and go down, the pool association is changed
accordingly. Changing pool association may involve allocating new
pools which may fail. To avoid failing CPU_DOWN, each workqueue
always keeps a default pwq which covers whole attrs->cpumask which is
used as fallback if pool creation fails during a CPU hotplug
operation.

This ensures that all work items issued on a NUMA node are executed on
the same node as long as the workqueue allows execution on the CPUs of
the node.

As this maps a workqueue to multiple pwqs and max_active is per-pwq,
this changes the behavior of max_active. The limit is now per NUMA
node instead of global. While this is an actual change, max_active is
already per-cpu for per-cpu workqueues and primarily used as a safety
mechanism rather than for active concurrency control. Concurrency is
usually limited by workqueue users through the number of concurrently
active work items, so this change shouldn't matter much.
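
A minimal user-side sketch of that note (the workqueue and work item
names below are assumed, not from this patch): the max_active passed to
alloc_workqueue() now caps in-flight work items per NUMA node for an
unbound workqueue instead of workqueue-wide.

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/workqueue.h>

static void my_work_fn(struct work_struct *work)
{
	/* ... process data, ideally resident on the issuing node ... */
}
static DECLARE_WORK(my_work, my_work_fn);

static struct workqueue_struct *my_wq;

static int __init my_init(void)
{
	/* the max_active of 16 now applies per NUMA node, not globally */
	my_wq = alloc_workqueue("my_unbound_wq", WQ_UNBOUND, 16);
	if (!my_wq)
		return -ENOMEM;

	/* executes on a CPU of the issuing node when the cpumask allows */
	queue_work(my_wq, &my_work);
	return 0;
}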

v2: Fixed pwq freeing in apply_workqueue_attrs() error path. Spotted
by Lai.

v3: The previous version incorrectly made a workqueue spanning
multiple nodes spread work items over all online CPUs when some of
its nodes don't have any desired cpus. Reimplemented so that NUMA
affinity is properly updated as CPUs go up and down. This problem
was spotted by Lai Jiangshan.

v4: destroy_workqueue() was putting wq->dfl_pwq and then clearing it;
however, wq may be freed at any time after dfl_pwq is put making
the clearing use-after-free. Clear wq->dfl_pwq before putting it.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Lai Jiangshan <[email protected]>
---
There was a silly bug in destroy_workqueue(). wq/review-numa branch
updated accordingly.

Thanks.

kernel/workqueue.c | 281 +++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 262 insertions(+), 19 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e656931..3b710cd 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -45,6 +45,7 @@
#include <linux/hashtable.h>
#include <linux/rculist.h>
#include <linux/nodemask.h>
+#include <linux/moduleparam.h>

#include "workqueue_internal.h"

@@ -245,6 +246,7 @@ struct workqueue_struct {
int saved_max_active; /* WQ: saved pwq max_active */

struct workqueue_attrs *unbound_attrs; /* WQ: only for unbound wqs */
+ struct pool_workqueue *dfl_pwq; /* WQ: only for unbound wqs */

#ifdef CONFIG_SYSFS
struct wq_device *wq_dev; /* I: for sysfs interface */
@@ -268,6 +270,9 @@ static cpumask_var_t *wq_numa_possible_cpumask;

static bool wq_numa_enabled; /* unbound NUMA affinity enabled */

+/* buf for wq_update_unbound_numa_attrs(), protected by CPU hotplug exclusion */
+static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;
+
static DEFINE_MUTEX(wq_pool_mutex); /* protects pools and workqueues list */
static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */

@@ -3710,6 +3715,61 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
return pwq;
}

+/* undo alloc_unbound_pwq(), used only in the error path */
+static void free_unbound_pwq(struct pool_workqueue *pwq)
+{
+ lockdep_assert_held(&wq_pool_mutex);
+
+ if (pwq) {
+ put_unbound_pool(pwq->pool);
+ kfree(pwq);
+ }
+}
+
+/**
+ * wq_calc_node_mask - calculate a wq_attrs' cpumask for the specified node
+ * @attrs: the wq_attrs of interest
+ * @node: the target NUMA node
+ * @cpu_going_down: if >= 0, the CPU to consider as offline
+ * @cpumask: outarg, the resulting cpumask
+ *
+ * Calculate the cpumask a workqueue with @attrs should use on @node. If
+ * @cpu_going_down is >= 0, that cpu is considered offline during
+ * calculation. The result is stored in @cpumask. This function returns
+ * %true if the resulting @cpumask is different from @attrs->cpumask,
+ * %false if equal.
+ *
+ * If NUMA affinity is not enabled, @attrs->cpumask is always used. If
+ * enabled and @node has online CPUs requested by @attrs, the returned
+ * cpumask is the intersection of the possible CPUs of @node and
+ * @attrs->cpumask.
+ *
+ * The caller is responsible for ensuring that the cpumask of @node stays
+ * stable.
+ */
+static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
+ int cpu_going_down, cpumask_t *cpumask)
+{
+ if (!wq_numa_enabled)
+ goto use_dfl;
+
+ /* does @node have any online CPUs @attrs wants? */
+ cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
+ if (cpu_going_down >= 0)
+ cpumask_clear_cpu(cpu_going_down, cpumask);
+
+ if (cpumask_empty(cpumask))
+ goto use_dfl;
+
+ /* yeap, return possible CPUs in @node that @attrs wants */
+ cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
+ return !cpumask_equal(cpumask, attrs->cpumask);
+
+use_dfl:
+ cpumask_copy(cpumask, attrs->cpumask);
+ return false;
+}
+
/* install @pwq into @wq's numa_pwq_tbl[] for @node and return the old pwq */
static struct pool_workqueue *numa_pwq_tbl_install(struct workqueue_struct *wq,
int node,
@@ -3732,11 +3792,12 @@ static struct pool_workqueue *numa_pwq_tbl_install(struct workqueue_struct *wq,
* @wq: the target workqueue
* @attrs: the workqueue_attrs to apply, allocated with alloc_workqueue_attrs()
*
- * Apply @attrs to an unbound workqueue @wq. If @attrs doesn't match the
- * current attributes, a new pwq is created and made the first pwq which
- * will serve all new work items. Older pwqs are released as in-flight
- * work items finish. Note that a work item which repeatedly requeues
- * itself back-to-back will stay on its current pwq.
+ * Apply @attrs to an unbound workqueue @wq. Unless disabled, on NUMA
+ * machines, this function maps a separate pwq to each NUMA node with
+ * possibles CPUs in @attrs->cpumask so that work items are affine to the
+ * NUMA node it was issued on. Older pwqs are released as in-flight work
+ * items finish. Note that a work item which repeatedly requeues itself
+ * back-to-back will stay on its current pwq.
*
* Performs GFP_KERNEL allocations. Returns 0 on success and -errno on
* failure.
@@ -3744,8 +3805,8 @@ static struct pool_workqueue *numa_pwq_tbl_install(struct workqueue_struct *wq,
int apply_workqueue_attrs(struct workqueue_struct *wq,
const struct workqueue_attrs *attrs)
{
- struct workqueue_attrs *new_attrs;
- struct pool_workqueue *pwq, *last_pwq = NULL;
+ struct workqueue_attrs *new_attrs, *tmp_attrs;
+ struct pool_workqueue **pwq_tbl, *dfl_pwq;
int node;

/* only unbound workqueues can change attributes */
@@ -3756,36 +3817,190 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
if (WARN_ON((wq->flags & __WQ_ORDERED) && !list_empty(&wq->pwqs)))
return -EINVAL;

- /* make a copy of @attrs and sanitize it */
+ pwq_tbl = kzalloc(wq_numa_tbl_len * sizeof(pwq_tbl[0]), GFP_KERNEL);
new_attrs = alloc_workqueue_attrs(GFP_KERNEL);
- if (!new_attrs)
+ tmp_attrs = alloc_workqueue_attrs(GFP_KERNEL);
+ if (!pwq_tbl || !new_attrs || !tmp_attrs)
goto enomem;

+ /* make a copy of @attrs and sanitize it */
copy_workqueue_attrs(new_attrs, attrs);
cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);

+ /*
+ * We may create multiple pwqs with differing cpumasks. Make a
+ * copy of @new_attrs which will be modified and used to obtain
+ * pools.
+ */
+ copy_workqueue_attrs(tmp_attrs, new_attrs);
+
+ /*
+ * CPUs should stay stable across pwq creations and installations.
+ * Pin CPUs, determine the target cpumask for each node and create
+ * pwqs accordingly.
+ */
+ get_online_cpus();
+
mutex_lock(&wq_pool_mutex);
- pwq = alloc_unbound_pwq(wq, new_attrs);
+
+ /*
+ * If something goes wrong during CPU up/down, we'll fall back to
+ * the default pwq covering whole @attrs->cpumask. Always create
+ * it even if we don't use it immediately.
+ */
+ dfl_pwq = alloc_unbound_pwq(wq, new_attrs);
+ if (!dfl_pwq)
+ goto enomem_pwq;
+
+ for_each_node(node) {
+ if (wq_calc_node_cpumask(attrs, node, -1, tmp_attrs->cpumask)) {
+ pwq_tbl[node] = alloc_unbound_pwq(wq, tmp_attrs);
+ if (!pwq_tbl[node])
+ goto enomem_pwq;
+ } else {
+ dfl_pwq->refcnt++;
+ pwq_tbl[node] = dfl_pwq;
+ }
+ }
+
mutex_unlock(&wq_pool_mutex);
- if (!pwq)
- goto enomem;

+ /* all pwqs have been created successfully, let's install'em */
mutex_lock(&wq->mutex);

copy_workqueue_attrs(wq->unbound_attrs, new_attrs);
+
+ /* save the previous pwq and install the new one */
for_each_node(node)
- last_pwq = numa_pwq_tbl_install(wq, node, pwq);
+ pwq_tbl[node] = numa_pwq_tbl_install(wq, node, pwq_tbl[node]);
+
+ /* @dfl_pwq might not have been used, ensure it's linked */
+ link_pwq(dfl_pwq);
+ swap(wq->dfl_pwq, dfl_pwq);

mutex_unlock(&wq->mutex);

- put_pwq_unlocked(last_pwq);
+ /* put the old pwqs */
+ for_each_node(node)
+ put_pwq_unlocked(pwq_tbl[node]);
+ put_pwq_unlocked(dfl_pwq);
+
+ put_online_cpus();
return 0;

+enomem_pwq:
+ free_unbound_pwq(dfl_pwq);
+ for_each_node(node)
+ if (pwq_tbl && pwq_tbl[node] != dfl_pwq)
+ free_unbound_pwq(pwq_tbl[node]);
+ mutex_unlock(&wq_pool_mutex);
+ put_online_cpus();
enomem:
+ free_workqueue_attrs(tmp_attrs);
free_workqueue_attrs(new_attrs);
+ kfree(pwq_tbl);
return -ENOMEM;
}

+/**
+ * wq_update_unbound_numa - update NUMA affinity of a wq for CPU hot[un]plug
+ * @wq: the target workqueue
+ * @cpu: the CPU coming up or going down
+ * @online: whether @cpu is coming up or going down
+ *
+ * This function is to be called from %CPU_DOWN_PREPARE, %CPU_ONLINE and
+ * %CPU_DOWN_FAILED. @cpu is being hot[un]plugged, update NUMA affinity of
+ * @wq accordingly.
+ *
+ * If NUMA affinity can't be adjusted due to memory allocation failure, it
+ * falls back to @wq->dfl_pwq which may not be optimal but is always
+ * correct.
+ */
+static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
+ bool online)
+{
+ int node = cpu_to_node(cpu);
+ int cpu_off = online ? -1 : cpu;
+ struct pool_workqueue *old_pwq = NULL, *new_pwq = NULL;
+ struct pool_workqueue *pwq;
+ struct workqueue_attrs *target_attrs;
+ cpumask_t *cpumask;
+
+ lockdep_assert_held(&wq_pool_mutex);
+
+ if (!wq_numa_enabled || !(wq->flags & WQ_UNBOUND))
+ return;
+
+ /*
+ * We don't wanna alloc/free wq_attrs for each wq for each CPU.
+ * Let's use a preallocated one. The following buf is protected by
+ * CPU hotplug exclusion.
+ */
+ target_attrs = wq_update_unbound_numa_attrs_buf;
+ cpumask = target_attrs->cpumask;
+retry:
+ mutex_lock(&wq->mutex);
+
+ copy_workqueue_attrs(target_attrs, wq->unbound_attrs);
+ pwq = unbound_pwq_by_node(wq, node);
+
+ /*
+ * Let's determine what needs to be done. If the target cpumask is
+ * different from wq's, we need to compare it to @pwq's and create
+ * a new one if they don't match. If the target cpumask equals
+ * wq's, the default pwq should be used. If @pwq is already the
+ * default one, nothing to do; otherwise, install the default one.
+ */
+ if (wq_calc_node_cpumask(wq->unbound_attrs, node, cpu_off, cpumask)) {
+ if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
+ goto out_unlock;
+ } else if (pwq != wq->dfl_pwq) {
+ goto use_dfl_pwq;
+ } else {
+ goto out_unlock;
+ }
+
+ /*
+ * Have we already created a new pwq? As we could have raced with
+ * apply_workqueue_attrs(), verify that its attrs match the desired
+ * one before installing.
+ */
+ if (new_pwq && wqattrs_equal(new_pwq->pool->attrs, target_attrs)) {
+ old_pwq = numa_pwq_tbl_install(wq, node, new_pwq);
+ new_pwq = NULL;
+ goto out_unlock;
+ }
+
+ mutex_unlock(&wq->mutex);
+
+ /*
+ * Need to create a new pwq - either this is the first time or we
+ * raced with apply_workqueue_attrs() and our previous new pwq is
+ * no longer valid. Dispose of the previous one if exists and
+ * create a new one.
+ */
+ if (new_pwq)
+ free_unbound_pwq(new_pwq);
+
+ new_pwq = alloc_unbound_pwq(wq, target_attrs);
+ if (new_pwq)
+ goto retry;
+
+ pr_warning("workqueue: allocation failed while updating NUMA affinity of \"%s\"\n",
+ wq->name);
+ mutex_lock(&wq->mutex);
+ /* fall through */
+use_dfl_pwq:
+ spin_lock_irq(&wq->dfl_pwq->pool->lock);
+ get_pwq(wq->dfl_pwq);
+ spin_unlock_irq(&wq->dfl_pwq->pool->lock);
+ old_pwq = numa_pwq_tbl_install(wq, node, wq->dfl_pwq);
+out_unlock:
+ mutex_unlock(&wq->mutex);
+ put_pwq_unlocked(old_pwq);
+ free_unbound_pwq(new_pwq);
+}
+
static int alloc_and_link_pwqs(struct workqueue_struct *wq)
{
bool highpri = wq->flags & WQ_HIGHPRI;
@@ -3938,6 +4153,7 @@ EXPORT_SYMBOL_GPL(__alloc_workqueue_key);
void destroy_workqueue(struct workqueue_struct *wq)
{
struct pool_workqueue *pwq;
+ int node;

/* drain it before proceeding with destruction */
drain_workqueue(wq);
@@ -3989,11 +4205,21 @@ void destroy_workqueue(struct workqueue_struct *wq)
} else {
/*
* We're the sole accessor of @wq at this point. Directly
- * access the first pwq and put the base ref. @wq will be
- * freed when the last pwq is released.
+ * access numa_pwq_tbl[] and dfl_pwq to put the base refs.
+ * @wq will be freed when the last pwq is released.
+ */
+ for_each_node(node) {
+ pwq = rcu_access_pointer(wq->numa_pwq_tbl[node]);
+ RCU_INIT_POINTER(wq->numa_pwq_tbl[node], NULL);
+ put_pwq_unlocked(pwq);
+ }
+
+ /*
+ * Put dfl_pwq. @wq may be freed any time after dfl_pwq is
+ * put. Don't access it afterwards.
*/
- pwq = list_first_entry(&wq->pwqs, struct pool_workqueue,
- pwqs_node);
+ pwq = wq->dfl_pwq;
+ wq->dfl_pwq = NULL;
put_pwq_unlocked(pwq);
}
}
@@ -4281,6 +4507,7 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,
{
int cpu = (unsigned long)hcpu;
struct worker_pool *pool;
+ struct workqueue_struct *wq;
int pi;

switch (action & ~CPU_TASKS_FROZEN) {
@@ -4313,6 +4540,10 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,
mutex_unlock(&pool->manager_mutex);
}

+ /* update NUMA affinity of unbound workqueues */
+ list_for_each_entry(wq, &workqueues, list)
+ wq_update_unbound_numa(wq, cpu, true);
+
mutex_unlock(&wq_pool_mutex);
break;
}
@@ -4329,12 +4560,21 @@ static int __cpuinit workqueue_cpu_down_callback(struct notifier_block *nfb,
{
int cpu = (unsigned long)hcpu;
struct work_struct unbind_work;
+ struct workqueue_struct *wq;

switch (action & ~CPU_TASKS_FROZEN) {
case CPU_DOWN_PREPARE:
- /* unbinding should happen on the local CPU */
+ /* unbinding per-cpu workers should happen on the local CPU */
INIT_WORK_ONSTACK(&unbind_work, wq_unbind_fn);
queue_work_on(cpu, system_highpri_wq, &unbind_work);
+
+ /* update NUMA affinity of unbound workqueues */
+ mutex_lock(&wq_pool_mutex);
+ list_for_each_entry(wq, &workqueues, list)
+ wq_update_unbound_numa(wq, cpu, false);
+ mutex_unlock(&wq_pool_mutex);
+
+ /* wait for per-cpu unbinding to finish */
flush_work(&unbind_work);
break;
}
@@ -4522,6 +4762,9 @@ static void __init wq_numa_init(void)
if (num_possible_nodes() <= 1)
return;

+ wq_update_unbound_numa_attrs_buf = alloc_workqueue_attrs(GFP_KERNEL);
+ BUG_ON(!wq_update_unbound_numa_attrs_buf);
+
/*
* We want masks of possible CPUs of each node which isn't readily
* available. Build one from cpu_to_node() which should have been
--
1.8.1.4

2013-03-30 17:23:46

by Lai Jiangshan

[permalink] [raw]
Subject: Re: [PATCH v4 13/14] workqueue: implement NUMA affinity for unbound workqueues

On 31/03/13 00:32, Tejun Heo wrote:
> Hello, Lai.
>
>
> On Sat, Mar 30, 2013 at 9:13 AM, Lai Jiangshan <[email protected]> wrote:
>
>
> + /* all pwqs have been created successfully, let's install'em */
> mutex_lock(&wq->mutex);
>
> copy_workqueue_attrs(wq->unbound_attrs, new_attrs);
> +
> + /* save the previous pwq and install the new one */
> for_each_node(node)
> - last_pwq = numa_pwq_tbl_install(wq, node, pwq);
> + pwq_tbl[node] = numa_pwq_tbl_install(wq, node, pwq_tbl[node]);
> +
> + /* @dfl_pwq might not have been used, ensure it's linked */
> + link_pwq(dfl_pwq);
> + swap(wq->dfl_pwq, dfl_pwq);
>
> mutex_unlock(&wq->mutex);
>
> - put_pwq_unlocked(last_pwq);
> + /* put the old pwqs */
> + for_each_node(node)
> + put_pwq_unlocked(pwq_tbl[node]);
> + put_pwq_unlocked(dfl_pwq);
> +
> + put_online_cpus();
> return 0;
>
>
>
> Forgot to free new_attrs in previous patch
> (workqueue: fix unbound workqueue attrs hashing / comparison).
>
> Forgot to free tmp_attrs, pwq_tbl in this patch.
>
>
> Right, will fix.
>
> +retry:
> + mutex_lock(&wq->mutex);
> +
> + copy_workqueue_attrs(target_attrs, wq->unbound_attrs);
> + pwq = unbound_pwq_by_node(wq, node);
> +
> + /*
> + * Let's determine what needs to be done. If the target cpumask is
> + * different from wq's, we need to compare it to @pwq's and create
> + * a new one if they don't match. If the target cpumask equals
> + * wq's, the default pwq should be used. If @pwq is already the
> + * default one, nothing to do; otherwise, install the default one.
> + */
> + if (wq_calc_node_cpumask(wq->unbound_attrs, node, cpu_off, cpumask)) {
> + if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
> + goto out_unlock;
> + } else if (pwq != wq->dfl_pwq) {
> + goto use_dfl_pwq;
> + } else {
> + goto out_unlock;
> + }
> +
> + /*
> + * Have we already created a new pwq? As we could have raced with
> + * apply_workqueue_attrs(), verify that its attrs match the desired
> + * one before installing.
> + */
>
>
> I don't see any race since there is get/put_online_cpu() in apply_workqueue_attrs().
>
>
> I don't know. I kinda want wq exclusion to be self-contained, but yeah the hotplug exclusion here is *almost* explicit so maybe it would be better to depend on it. Will think about it.
>
> + mutex_unlock(&wq->mutex);
> + put_pwq_unlocked(old_pwq);
> + free_unbound_pwq(new_pwq);
> +}
>
>
> OK, your solution is what I suggested: swapping dfl_pwq <-> node pwq.
> But when the last CPU of the node (of the wq) is going offline,
> you need to handle the work items of the node pwq (old_pwq in the code).
>
> You may handle the works which are still queued by migrating them, OR by
> flushing them,
> and you may handle busy works by temporarily changing the cpumask of
> the workers, OR by flushing the busy works.
>
>
> I don't think that's necessary.

Please document it.

> It's not like we have hard guarantee on attr changes anyway.
> Self-requeueing work items can get stuck with old attributes for quite a while,

It is OK as long as it is documented.

> and even per-cpu work items get migrated to other CPUs on CPU DOWN.

It is expected.

But for an unbound wq across CPU hotplug:
w/o NUMA affinity, works always stay on the wq's cpumask as long as some CPU in that cpumask is online;
w/ NUMA affinity, that is NOT always true, even when such a CPU is online.

> Workqueue's affinity guarantee is very specific - the work item owner is
> responsible for flushing the work item during CPU DOWN if it wants
> to guarantee affinity over full execution.

Could you add the comments and add Reviewed-by: Lai Jiangshan <[email protected]>
for the patchset?

Thanks,
Lai

2013-03-31 19:06:38

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v4 13/14] workqueue: implement NUMA affinity for unbound workqueues

Hello, Lai.

On Sun, Mar 31, 2013 at 01:23:46AM +0800, Lai Jiangshan wrote:
> But for an unbound wq across CPU hotplug:
> w/o NUMA affinity, works always stay on the wq's cpumask as long as some CPU in that cpumask is online;
> w/ NUMA affinity, that is NOT always true, even when such a CPU is online.

Yeah, this is rather unfortunate, but the cpumask for unbound workqueues is
a completely new thing anyway and I think providing a guarantee similar
to the per-cpu one should be enough. Things are much simpler that
way, and requiring users which depend on hard affinity to take care of
flushing is reasonable enough and in line with how workqueue has
traditionally been working.
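
For concreteness, a hedged sketch of that user-side responsibility (the
notifier and work item names here are hypothetical): a user that needs
strict affinity over full execution flushes its work item from its own
CPU_DOWN_PREPARE callback.

#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/workqueue.h>

static struct work_struct my_numa_work;		/* assumed to exist */

static int my_cpu_callback(struct notifier_block *nb, unsigned long action,
			   void *hcpu)
{
	/* wait for the work item to finish before the CPU goes away */
	if ((action & ~CPU_TASKS_FROZEN) == CPU_DOWN_PREPARE)
		flush_work(&my_numa_work);
	return NOTIFY_OK;
}

static struct notifier_block my_cpu_nb = {
	.notifier_call	= my_cpu_callback,
};

/* somewhere in init code: register_hotcpu_notifier(&my_cpu_nb); */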

> > Workqueue's affinity guarantee is very specific - the work item owner is
> > responsible for flushing the work item during CPU DOWN if it wants
> > to guarantee affinity over full execution.
>
> Could you add the comments and add Reviewed-by: Lai Jiangshan <[email protected]>
> for the patchset?

Sure thing.

Thanks.

--
tejun

2013-04-01 18:28:07

by Tejun Heo

[permalink] [raw]
Subject: [PATCH 0.5/14] workqueue: fix memory leak in apply_workqueue_attrs()

From 4862125b0256a946d2749a1d5003b0604bc3cb4d Mon Sep 17 00:00:00 2001
From: Tejun Heo <[email protected]>
Date: Mon, 1 Apr 2013 11:23:31 -0700

apply_workqueue_attrs() wasn't freeing temp attrs variable @new_attrs
in its success path. Fix it.
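
The fix below switches the function to a single exit path that frees
@new_attrs on success and failure alike; a generic sketch of that idiom
(names invented for illustration):

#include <linux/errno.h>
#include <linux/slab.h>

static int do_real_work(void *buf);	/* assumed helper */

static int example_apply(void)
{
	void *tmp;
	int ret;

	tmp = kzalloc(16, GFP_KERNEL);
	if (!tmp)
		goto enomem;

	ret = do_real_work(tmp);
	/* fall through */
out_free:
	kfree(tmp);		/* runs on success and on error alike */
	return ret;

enomem:
	ret = -ENOMEM;
	goto out_free;
}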

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Lai Jiangshan <[email protected]>
---
This causes some minor conflicts down the series but nothing
noteworthy. I'm not posting the whole refreshed series. If you want
it, holler.

Thanks.

kernel/workqueue.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index abe1f0d..89480fc 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3636,6 +3636,7 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
struct workqueue_attrs *new_attrs;
struct pool_workqueue *pwq = NULL, *last_pwq;
struct worker_pool *pool;
+ int ret;

/* only unbound workqueues can change attributes */
if (WARN_ON(!(wq->flags & WQ_UNBOUND)))
@@ -3668,12 +3669,16 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
spin_unlock_irq(&last_pwq->pool->lock);
}

- return 0;
+ ret = 0;
+ /* fall through */
+out_free:
+ free_workqueue_attrs(new_attrs);
+ return ret;

enomem:
kmem_cache_free(pwq_cache, pwq);
- free_workqueue_attrs(new_attrs);
- return -ENOMEM;
+ ret = -ENOMEM;
+ goto out_free;
}

static int alloc_and_link_pwqs(struct workqueue_struct *wq)
--
1.8.1.4

2013-04-01 18:28:50

by Tejun Heo

[permalink] [raw]
Subject: [PATCH v5 13/14] workqueue: implement NUMA affinity for unbound workqueues

From 4c16bd327c74d6678858706211a0c6e4e53eb3e6 Mon Sep 17 00:00:00 2001
From: Tejun Heo <[email protected]>
Date: Mon, 1 Apr 2013 11:23:36 -0700

Currently, an unbound workqueue has a single current, or first, pwq
(pool_workqueue) to which all new work items are queued. This often
isn't optimal on NUMA machines as workers may jump around across node
boundaries and work items get assigned to workers without any regard
to NUMA affinity.

This patch implements NUMA affinity for unbound workqueues. Instead
of mapping all entries of numa_pwq_tbl[] to the same pwq,
apply_workqueue_attrs() now creates a separate pwq covering the
intersecting CPUs for each NUMA node which has online CPUs in
@attrs->cpumask. Nodes which don't have intersecting possible CPUs
are mapped to pwqs covering whole @attrs->cpumask.

As CPUs come up and go down, the pool association is changed
accordingly. Changing pool association may involve allocating new
pools which may fail. To avoid failing CPU_DOWN, each workqueue
always keeps a default pwq which covers whole attrs->cpumask which is
used as fallback if pool creation fails during a CPU hotplug
operation.

This ensures that all work items issued on a NUMA node are executed on
the same node as long as the workqueue allows execution on the CPUs of
the node.

As this maps a workqueue to multiple pwqs and max_active is per-pwq,
this changes the behavior of max_active. The limit is now per NUMA
node instead of global. While this is an actual change, max_active is
already per-cpu for per-cpu workqueues and primarily used as a safety
mechanism rather than for active concurrency control. Concurrency is
usually limited by workqueue users through the number of concurrently
active work items, so this change shouldn't matter much.

v2: Fixed pwq freeing in apply_workqueue_attrs() error path. Spotted
by Lai.

v3: The previous version incorrectly made a workqueue spanning
multiple nodes spread work items over all online CPUs when some of
its nodes don't have any desired cpus. Reimplemented so that NUMA
affinity is properly updated as CPUs go up and down. This problem
was spotted by Lai Jiangshan.

v4: destroy_workqueue() was putting wq->dfl_pwq and then clearing it;
however, wq may be freed at any time after dfl_pwq is put making
the clearing use-after-free. Clear wq->dfl_pwq before putting it.

v5: apply_workqueue_attrs() was leaking @tmp_attrs, @new_attrs and
@pwq_tbl after success. Fixed.

Retry loop in wq_update_unbound_numa_attrs() isn't necessary as
application of new attrs is excluded via CPU hotplug. Removed.

Documentation on CPU affinity guarantee on CPU_DOWN added.

All changes are suggested by Lai Jiangshan.

Signed-off-by: Tejun Heo <[email protected]>
Reviewed-by: Lai Jiangshan <[email protected]>
---
kernel/workqueue.c | 278 +++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 259 insertions(+), 19 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index d9a4aeb..57cd77d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -45,6 +45,7 @@
#include <linux/hashtable.h>
#include <linux/rculist.h>
#include <linux/nodemask.h>
+#include <linux/moduleparam.h>

#include "workqueue_internal.h"

@@ -245,6 +246,7 @@ struct workqueue_struct {
int saved_max_active; /* WQ: saved pwq max_active */

struct workqueue_attrs *unbound_attrs; /* WQ: only for unbound wqs */
+ struct pool_workqueue *dfl_pwq; /* WQ: only for unbound wqs */

#ifdef CONFIG_SYSFS
struct wq_device *wq_dev; /* I: for sysfs interface */
@@ -268,6 +270,9 @@ static cpumask_var_t *wq_numa_possible_cpumask;

static bool wq_numa_enabled; /* unbound NUMA affinity enabled */

+/* buf for wq_update_unbound_numa_attrs(), protected by CPU hotplug exclusion */
+static struct workqueue_attrs *wq_update_unbound_numa_attrs_buf;
+
static DEFINE_MUTEX(wq_pool_mutex); /* protects pools and workqueues list */
static DEFINE_SPINLOCK(wq_mayday_lock); /* protects wq->maydays list */

@@ -3710,6 +3715,61 @@ static struct pool_workqueue *alloc_unbound_pwq(struct workqueue_struct *wq,
return pwq;
}

+/* undo alloc_unbound_pwq(), used only in the error path */
+static void free_unbound_pwq(struct pool_workqueue *pwq)
+{
+ lockdep_assert_held(&wq_pool_mutex);
+
+ if (pwq) {
+ put_unbound_pool(pwq->pool);
+ kfree(pwq);
+ }
+}
+
+/**
+ * wq_calc_node_mask - calculate a wq_attrs' cpumask for the specified node
+ * @attrs: the wq_attrs of interest
+ * @node: the target NUMA node
+ * @cpu_going_down: if >= 0, the CPU to consider as offline
+ * @cpumask: outarg, the resulting cpumask
+ *
+ * Calculate the cpumask a workqueue with @attrs should use on @node. If
+ * @cpu_going_down is >= 0, that cpu is considered offline during
+ * calculation. The result is stored in @cpumask. This function returns
+ * %true if the resulting @cpumask is different from @attrs->cpumask,
+ * %false if equal.
+ *
+ * If NUMA affinity is not enabled, @attrs->cpumask is always used. If
+ * enabled and @node has online CPUs requested by @attrs, the returned
+ * cpumask is the intersection of the possible CPUs of @node and
+ * @attrs->cpumask.
+ *
+ * The caller is responsible for ensuring that the cpumask of @node stays
+ * stable.
+ */
+static bool wq_calc_node_cpumask(const struct workqueue_attrs *attrs, int node,
+ int cpu_going_down, cpumask_t *cpumask)
+{
+ if (!wq_numa_enabled)
+ goto use_dfl;
+
+ /* does @node have any online CPUs @attrs wants? */
+ cpumask_and(cpumask, cpumask_of_node(node), attrs->cpumask);
+ if (cpu_going_down >= 0)
+ cpumask_clear_cpu(cpu_going_down, cpumask);
+
+ if (cpumask_empty(cpumask))
+ goto use_dfl;
+
+ /* yeap, return possible CPUs in @node that @attrs wants */
+ cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
+ return !cpumask_equal(cpumask, attrs->cpumask);
+
+use_dfl:
+ cpumask_copy(cpumask, attrs->cpumask);
+ return false;
+}
+
/* install @pwq into @wq's numa_pwq_tbl[] for @node and return the old pwq */
static struct pool_workqueue *numa_pwq_tbl_install(struct workqueue_struct *wq,
int node,
@@ -3732,11 +3792,12 @@ static struct pool_workqueue *numa_pwq_tbl_install(struct workqueue_struct *wq,
* @wq: the target workqueue
* @attrs: the workqueue_attrs to apply, allocated with alloc_workqueue_attrs()
*
- * Apply @attrs to an unbound workqueue @wq. If @attrs doesn't match the
- * current attributes, a new pwq is created and made the first pwq which
- * will serve all new work items. Older pwqs are released as in-flight
- * work items finish. Note that a work item which repeatedly requeues
- * itself back-to-back will stay on its current pwq.
+ * Apply @attrs to an unbound workqueue @wq. Unless disabled, on NUMA
+ * machines, this function maps a separate pwq to each NUMA node with
+ * possibles CPUs in @attrs->cpumask so that work items are affine to the
+ * NUMA node it was issued on. Older pwqs are released as in-flight work
+ * items finish. Note that a work item which repeatedly requeues itself
+ * back-to-back will stay on its current pwq.
*
* Performs GFP_KERNEL allocations. Returns 0 on success and -errno on
* failure.
@@ -3744,8 +3805,8 @@ static struct pool_workqueue *numa_pwq_tbl_install(struct workqueue_struct *wq,
int apply_workqueue_attrs(struct workqueue_struct *wq,
const struct workqueue_attrs *attrs)
{
- struct workqueue_attrs *new_attrs;
- struct pool_workqueue *pwq, *last_pwq = NULL;
+ struct workqueue_attrs *new_attrs, *tmp_attrs;
+ struct pool_workqueue **pwq_tbl, *dfl_pwq;
int node, ret;

/* only unbound workqueues can change attributes */
@@ -3756,40 +3817,191 @@ int apply_workqueue_attrs(struct workqueue_struct *wq,
if (WARN_ON((wq->flags & __WQ_ORDERED) && !list_empty(&wq->pwqs)))
return -EINVAL;

- /* make a copy of @attrs and sanitize it */
+ pwq_tbl = kzalloc(wq_numa_tbl_len * sizeof(pwq_tbl[0]), GFP_KERNEL);
new_attrs = alloc_workqueue_attrs(GFP_KERNEL);
- if (!new_attrs)
+ tmp_attrs = alloc_workqueue_attrs(GFP_KERNEL);
+ if (!pwq_tbl || !new_attrs || !tmp_attrs)
goto enomem;

+ /* make a copy of @attrs and sanitize it */
copy_workqueue_attrs(new_attrs, attrs);
cpumask_and(new_attrs->cpumask, new_attrs->cpumask, cpu_possible_mask);

+ /*
+ * We may create multiple pwqs with differing cpumasks. Make a
+ * copy of @new_attrs which will be modified and used to obtain
+ * pools.
+ */
+ copy_workqueue_attrs(tmp_attrs, new_attrs);
+
+ /*
+ * CPUs should stay stable across pwq creations and installations.
+ * Pin CPUs, determine the target cpumask for each node and create
+ * pwqs accordingly.
+ */
+ get_online_cpus();
+
mutex_lock(&wq_pool_mutex);
- pwq = alloc_unbound_pwq(wq, new_attrs);
+
+ /*
+ * If something goes wrong during CPU up/down, we'll fall back to
+ * the default pwq covering whole @attrs->cpumask. Always create
+ * it even if we don't use it immediately.
+ */
+ dfl_pwq = alloc_unbound_pwq(wq, new_attrs);
+ if (!dfl_pwq)
+ goto enomem_pwq;
+
+ for_each_node(node) {
+ if (wq_calc_node_cpumask(attrs, node, -1, tmp_attrs->cpumask)) {
+ pwq_tbl[node] = alloc_unbound_pwq(wq, tmp_attrs);
+ if (!pwq_tbl[node])
+ goto enomem_pwq;
+ } else {
+ dfl_pwq->refcnt++;
+ pwq_tbl[node] = dfl_pwq;
+ }
+ }
+
mutex_unlock(&wq_pool_mutex);
- if (!pwq)
- goto enomem;

+ /* all pwqs have been created successfully, let's install'em */
mutex_lock(&wq->mutex);

copy_workqueue_attrs(wq->unbound_attrs, new_attrs);
+
+ /* save the previous pwq and install the new one */
for_each_node(node)
- last_pwq = numa_pwq_tbl_install(wq, node, pwq);
+ pwq_tbl[node] = numa_pwq_tbl_install(wq, node, pwq_tbl[node]);
+
+ /* @dfl_pwq might not have been used, ensure it's linked */
+ link_pwq(dfl_pwq);
+ swap(wq->dfl_pwq, dfl_pwq);

mutex_unlock(&wq->mutex);

- put_pwq_unlocked(last_pwq);
+ /* put the old pwqs */
+ for_each_node(node)
+ put_pwq_unlocked(pwq_tbl[node]);
+ put_pwq_unlocked(dfl_pwq);
+
+ put_online_cpus();
ret = 0;
/* fall through */
out_free:
+ free_workqueue_attrs(tmp_attrs);
free_workqueue_attrs(new_attrs);
+ kfree(pwq_tbl);
return ret;

+enomem_pwq:
+ free_unbound_pwq(dfl_pwq);
+ for_each_node(node)
+ if (pwq_tbl && pwq_tbl[node] != dfl_pwq)
+ free_unbound_pwq(pwq_tbl[node]);
+ mutex_unlock(&wq_pool_mutex);
+ put_online_cpus();
enomem:
ret = -ENOMEM;
goto out_free;
}

+/**
+ * wq_update_unbound_numa - update NUMA affinity of a wq for CPU hot[un]plug
+ * @wq: the target workqueue
+ * @cpu: the CPU coming up or going down
+ * @online: whether @cpu is coming up or going down
+ *
+ * This function is to be called from %CPU_DOWN_PREPARE, %CPU_ONLINE and
+ * %CPU_DOWN_FAILED. @cpu is being hot[un]plugged, update NUMA affinity of
+ * @wq accordingly.
+ *
+ * If NUMA affinity can't be adjusted due to memory allocation failure, it
+ * falls back to @wq->dfl_pwq which may not be optimal but is always
+ * correct.
+ *
+ * Note that when the last allowed CPU of a NUMA node goes offline for a
+ * workqueue with a cpumask spanning multiple nodes, the workers which were
+ * already executing the work items for the workqueue will lose their CPU
+ * affinity and may execute on any CPU. This is similar to how per-cpu
+ * workqueues behave on CPU_DOWN. If a workqueue user wants strict
+ * affinity, it's the user's responsibility to flush the work item from
+ * CPU_DOWN_PREPARE.
+ */
+static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
+ bool online)
+{
+ int node = cpu_to_node(cpu);
+ int cpu_off = online ? -1 : cpu;
+ struct pool_workqueue *old_pwq = NULL, *pwq;
+ struct workqueue_attrs *target_attrs;
+ cpumask_t *cpumask;
+
+ lockdep_assert_held(&wq_pool_mutex);
+
+ if (!wq_numa_enabled || !(wq->flags & WQ_UNBOUND))
+ return;
+
+ /*
+ * We don't wanna alloc/free wq_attrs for each wq for each CPU.
+ * Let's use a preallocated one. The following buf is protected by
+ * CPU hotplug exclusion.
+ */
+ target_attrs = wq_update_unbound_numa_attrs_buf;
+ cpumask = target_attrs->cpumask;
+
+ mutex_lock(&wq->mutex);
+
+ copy_workqueue_attrs(target_attrs, wq->unbound_attrs);
+ pwq = unbound_pwq_by_node(wq, node);
+
+ /*
+ * Let's determine what needs to be done. If the target cpumask is
+ * different from wq's, we need to compare it to @pwq's and create
+ * a new one if they don't match. If the target cpumask equals
+ * wq's, the default pwq should be used. If @pwq is already the
+ * default one, nothing to do; otherwise, install the default one.
+ */
+ if (wq_calc_node_cpumask(wq->unbound_attrs, node, cpu_off, cpumask)) {
+ if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
+ goto out_unlock;
+ } else {
+ if (pwq == wq->dfl_pwq)
+ goto out_unlock;
+ else
+ goto use_dfl_pwq;
+ }
+
+ mutex_unlock(&wq->mutex);
+
+ /* create a new pwq */
+ pwq = alloc_unbound_pwq(wq, target_attrs);
+ if (!pwq) {
+ pr_warning("workqueue: allocation failed while updating NUMA affinity of \"%s\"\n",
+ wq->name);
+ goto out_unlock;
+ }
+
+ /*
+ * Install the new pwq. As this function is called only from CPU
+ * hotplug callbacks and applying a new attrs is wrapped with
+ * get/put_online_cpus(), @wq->unbound_attrs couldn't have changed
+ * inbetween.
+ */
+ mutex_lock(&wq->mutex);
+ old_pwq = numa_pwq_tbl_install(wq, node, pwq);
+ goto out_unlock;
+
+use_dfl_pwq:
+ spin_lock_irq(&wq->dfl_pwq->pool->lock);
+ get_pwq(wq->dfl_pwq);
+ spin_unlock_irq(&wq->dfl_pwq->pool->lock);
+ old_pwq = numa_pwq_tbl_install(wq, node, wq->dfl_pwq);
+out_unlock:
+ mutex_unlock(&wq->mutex);
+ put_pwq_unlocked(old_pwq);
+}
+
static int alloc_and_link_pwqs(struct workqueue_struct *wq)
{
bool highpri = wq->flags & WQ_HIGHPRI;
@@ -3942,6 +4154,7 @@ EXPORT_SYMBOL_GPL(__alloc_workqueue_key);
void destroy_workqueue(struct workqueue_struct *wq)
{
struct pool_workqueue *pwq;
+ int node;

/* drain it before proceeding with destruction */
drain_workqueue(wq);
@@ -3993,11 +4206,21 @@ void destroy_workqueue(struct workqueue_struct *wq)
} else {
/*
* We're the sole accessor of @wq at this point. Directly
- * access the first pwq and put the base ref. @wq will be
- * freed when the last pwq is released.
+ * access numa_pwq_tbl[] and dfl_pwq to put the base refs.
+ * @wq will be freed when the last pwq is released.
*/
- pwq = list_first_entry(&wq->pwqs, struct pool_workqueue,
- pwqs_node);
+ for_each_node(node) {
+ pwq = rcu_access_pointer(wq->numa_pwq_tbl[node]);
+ RCU_INIT_POINTER(wq->numa_pwq_tbl[node], NULL);
+ put_pwq_unlocked(pwq);
+ }
+
+ /*
+ * Put dfl_pwq. @wq may be freed any time after dfl_pwq is
+ * put. Don't access it afterwards.
+ */
+ pwq = wq->dfl_pwq;
+ wq->dfl_pwq = NULL;
put_pwq_unlocked(pwq);
}
}
@@ -4285,6 +4508,7 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,
{
int cpu = (unsigned long)hcpu;
struct worker_pool *pool;
+ struct workqueue_struct *wq;
int pi;

switch (action & ~CPU_TASKS_FROZEN) {
@@ -4317,6 +4541,10 @@ static int __cpuinit workqueue_cpu_up_callback(struct notifier_block *nfb,
mutex_unlock(&pool->manager_mutex);
}

+ /* update NUMA affinity of unbound workqueues */
+ list_for_each_entry(wq, &workqueues, list)
+ wq_update_unbound_numa(wq, cpu, true);
+
mutex_unlock(&wq_pool_mutex);
break;
}
@@ -4333,12 +4561,21 @@ static int __cpuinit workqueue_cpu_down_callback(struct notifier_block *nfb,
{
int cpu = (unsigned long)hcpu;
struct work_struct unbind_work;
+ struct workqueue_struct *wq;

switch (action & ~CPU_TASKS_FROZEN) {
case CPU_DOWN_PREPARE:
- /* unbinding should happen on the local CPU */
+ /* unbinding per-cpu workers should happen on the local CPU */
INIT_WORK_ONSTACK(&unbind_work, wq_unbind_fn);
queue_work_on(cpu, system_highpri_wq, &unbind_work);
+
+ /* update NUMA affinity of unbound workqueues */
+ mutex_lock(&wq_pool_mutex);
+ list_for_each_entry(wq, &workqueues, list)
+ wq_update_unbound_numa(wq, cpu, false);
+ mutex_unlock(&wq_pool_mutex);
+
+ /* wait for per-cpu unbinding to finish */
flush_work(&unbind_work);
break;
}
@@ -4526,6 +4763,9 @@ static void __init wq_numa_init(void)
if (num_possible_nodes() <= 1)
return;

+ wq_update_unbound_numa_attrs_buf = alloc_workqueue_attrs(GFP_KERNEL);
+ BUG_ON(!wq_update_unbound_numa_attrs_buf);
+
/*
* We want masks of possible CPUs of each node which isn't readily
* available. Build one from cpu_to_node() which should have been
--
1.8.1.4

2013-04-01 18:29:28

by Tejun Heo

[permalink] [raw]
Subject: Re: Subject: [PATCHSET v2 wq/for-3.10] workqueue: NUMA affinity for unbound workqueues

Applied to wq/for-3.10 with the updated patches and Lai's
Reviewed-by's added.

Thanks.

--
tejun