From: Waiman Long
To: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Lai Jiangshan, Shuah Khan
Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, Peter Hunt, Frederic Weisbecker, Waiman Long
Subject: [PATCH v4 2/5] workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs from wq_unbound_cpumask
Date: Wed, 15 Nov 2023 22:34:02 -0500
Message-Id: <20231116033405.185166-3-longman@redhat.com>
In-Reply-To: <20231116033405.185166-1-longman@redhat.com>
References: <20231116033405.185166-1-longman@redhat.com>

When the "isolcpus" boot command line option is used to add a set of
isolated CPUs, those CPUs will be excluded automatically from
wq_unbound_cpumask to avoid running work functions from unbound
workqueues.

Recently cpuset has been extended to allow the creation of partitions
of isolated CPUs dynamically. To make it closer to "isolcpus" in
functionality, the CPUs in those isolated cpuset partitions should be
excluded from wq_unbound_cpumask as well. This can currently be done by
explicitly writing to the workqueue's cpumask sysfs file after creating
the isolated partitions. However, this process can be error-prone.

Ideally, the cpuset code should be allowed to request the workqueue code
to exclude those isolated CPUs from wq_unbound_cpumask so that this
operation can be done automatically and the isolated CPUs will be
returned to wq_unbound_cpumask after the destruction of the isolated
cpuset partitions.

This patch adds a new workqueue_unbound_exclude_cpumask() function to
enable that. This new function will exclude the specified isolated
CPUs from wq_unbound_cpumask. To be able to restore those isolated
CPUs after the destruction of isolated cpuset partitions, a new
wq_requested_unbound_cpumask is added to store the user-provided unbound
cpumask either from the boot command line options or from writing to
the cpumask sysfs file. This new cpumask provides the basis for CPU
exclusion.

To enable users to understand how wq_unbound_cpumask is being modified
internally, this patch also exposes the newly introduced
wq_requested_unbound_cpumask, as well as a wq_isolated_cpumask that
stores the cpumask to be excluded from wq_unbound_cpumask, as read-only
sysfs files.

Signed-off-by: Waiman Long
---
 include/linux/workqueue.h |  1 +
 kernel/workqueue.c        | 91 +++++++++++++++++++++++++++++++++++----
 2 files changed, 84 insertions(+), 8 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index cf49b467bd57..b0b9604b76b8 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -491,6 +491,7 @@ struct workqueue_attrs *alloc_workqueue_attrs(void);
 void free_workqueue_attrs(struct workqueue_attrs *attrs);
 int apply_workqueue_attrs(struct workqueue_struct *wq,
 			  const struct workqueue_attrs *attrs);
+extern int workqueue_unbound_exclude_cpumask(cpumask_var_t cpumask);
 
 extern bool queue_work_on(int cpu, struct workqueue_struct *wq,
 			struct work_struct *work);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 145159f993e9..2fc585d3d6ca 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -381,6 +381,12 @@ static bool workqueue_freezing;		/* PL: have wqs started freezing? */
 /* PL&A: allowable cpus for unbound wqs and work items */
 static cpumask_var_t wq_unbound_cpumask;
 
+/* PL: user requested unbound cpumask via sysfs */
+static cpumask_var_t wq_requested_unbound_cpumask;
+
+/* PL: isolated cpumask to be excluded from unbound cpumask */
+static cpumask_var_t wq_isolated_cpumask;
+
 /* for further constrain wq_unbound_cpumask by cmdline parameter*/
 static struct cpumask wq_cmdline_cpumask __initdata;
 
@@ -5771,7 +5777,7 @@ void thaw_workqueues(void)
 }
 #endif /* CONFIG_FREEZER */
 
-static int __maybe_unused workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
+static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
 {
 	LIST_HEAD(ctxs);
 	int ret = 0;
@@ -5814,6 +5820,44 @@ static int __maybe_unused workqueue_apply_unbound_cpumask(const cpumask_var_t un
 	return ret;
 }
 
+/**
+ * workqueue_unbound_exclude_cpumask - Exclude given CPUs from unbound cpumask
+ * @exclude_cpumask: the cpumask to be excluded from wq_unbound_cpumask
+ *
+ * This function can be called from cpuset code to provide a set of isolated
+ * CPUs that should be excluded from wq_unbound_cpumask. The caller must hold
+ * either cpus_read_lock or cpus_write_lock.
+ */
+int workqueue_unbound_exclude_cpumask(cpumask_var_t exclude_cpumask)
+{
+	cpumask_var_t cpumask;
+	int ret = 0;
+
+	if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL))
+		return -ENOMEM;
+
+	lockdep_assert_cpus_held();
+	mutex_lock(&wq_pool_mutex);
+
+	/* Save the current isolated cpumask & export it via sysfs */
+	cpumask_copy(wq_isolated_cpumask, exclude_cpumask);
+
+	/*
+	 * If the operation fails, it will fall back to
+	 * wq_requested_unbound_cpumask which is initially set to
+	 * (HK_TYPE_WQ ∩ HK_TYPE_DOMAIN) house keeping mask and rewritten
+	 * by any subsequent write to workqueue/cpumask sysfs file.
+	 */
+	if (!cpumask_andnot(cpumask, wq_requested_unbound_cpumask, exclude_cpumask))
+		cpumask_copy(cpumask, wq_requested_unbound_cpumask);
+	if (!cpumask_equal(cpumask, wq_unbound_cpumask))
+		ret = workqueue_apply_unbound_cpumask(cpumask);
+
+	mutex_unlock(&wq_pool_mutex);
+	free_cpumask_var(cpumask);
+	return ret;
+}
+
 static int parse_affn_scope(const char *val)
 {
 	int i;
@@ -6144,6 +6188,7 @@ static int workqueue_set_unbound_cpumask(cpumask_var_t cpumask)
 	cpumask_and(cpumask, cpumask, cpu_possible_mask);
 	if (!cpumask_empty(cpumask)) {
 		apply_wqattrs_lock();
+		cpumask_copy(wq_requested_unbound_cpumask, cpumask);
 		if (cpumask_equal(cpumask, wq_unbound_cpumask)) {
 			ret = 0;
 			goto out_unlock;
@@ -6158,19 +6203,36 @@ static int workqueue_set_unbound_cpumask(cpumask_var_t cpumask)
 	return ret;
 }
 
-static ssize_t wq_unbound_cpumask_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
+static ssize_t __wq_cpumask_show(struct device *dev,
+		struct device_attribute *attr, char *buf, cpumask_var_t mask)
 {
 	int written;
 
 	mutex_lock(&wq_pool_mutex);
-	written = scnprintf(buf, PAGE_SIZE, "%*pb\n",
-			    cpumask_pr_args(wq_unbound_cpumask));
+	written = scnprintf(buf, PAGE_SIZE, "%*pb\n", cpumask_pr_args(mask));
 	mutex_unlock(&wq_pool_mutex);
 
 	return written;
 }
 
+static ssize_t wq_unbound_cpumask_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	return __wq_cpumask_show(dev, attr, buf, wq_unbound_cpumask);
+}
+
+static ssize_t wq_requested_cpumask_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	return __wq_cpumask_show(dev, attr, buf, wq_requested_unbound_cpumask);
+}
+
+static ssize_t wq_isolated_cpumask_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	return __wq_cpumask_show(dev, attr, buf, wq_isolated_cpumask);
+}
+
 static ssize_t wq_unbound_cpumask_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t count)
 {
@@ -6188,9 +6250,13 @@ static ssize_t wq_unbound_cpumask_store(struct device *dev,
 	return ret ? ret : count;
 }
 
-static struct device_attribute wq_sysfs_cpumask_attr =
+static struct device_attribute wq_sysfs_cpumask_attrs[] = {
 	__ATTR(cpumask, 0644, wq_unbound_cpumask_show,
-	       wq_unbound_cpumask_store);
+	       wq_unbound_cpumask_store),
+	__ATTR(cpumask_requested, 0444, wq_requested_cpumask_show, NULL),
+	__ATTR(cpumask_isolated, 0444, wq_isolated_cpumask_show, NULL),
+	__ATTR_NULL,
+};
 
 static int __init wq_sysfs_init(void)
 {
@@ -6203,7 +6269,13 @@ static int __init wq_sysfs_init(void)
 
 	dev_root = bus_get_dev_root(&wq_subsys);
 	if (dev_root) {
-		err = device_create_file(dev_root, &wq_sysfs_cpumask_attr);
+		struct device_attribute *attr;
+
+		for (attr = wq_sysfs_cpumask_attrs; attr->attr.name; attr++) {
+			err = device_create_file(dev_root, attr);
+			if (err)
+				break;
+		}
 		put_device(dev_root);
 	}
 	return err;
@@ -6534,11 +6606,14 @@ void __init workqueue_init_early(void)
 	BUILD_BUG_ON(__alignof__(struct pool_workqueue) < __alignof__(long long));
 
 	BUG_ON(!alloc_cpumask_var(&wq_unbound_cpumask, GFP_KERNEL));
+	BUG_ON(!alloc_cpumask_var(&wq_requested_unbound_cpumask, GFP_KERNEL));
+	BUG_ON(!zalloc_cpumask_var(&wq_isolated_cpumask, GFP_KERNEL));
 	cpumask_copy(wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_WQ));
 	cpumask_and(wq_unbound_cpumask, wq_unbound_cpumask, housekeeping_cpumask(HK_TYPE_DOMAIN));
 
 	if (!cpumask_empty(&wq_cmdline_cpumask))
 		cpumask_and(wq_unbound_cpumask, wq_unbound_cpumask, &wq_cmdline_cpumask);
+	cpumask_copy(wq_requested_unbound_cpumask, wq_unbound_cpumask);
 
 	pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC);
 
-- 
2.39.3
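
For readers who want to check the exclusion rule outside the kernel, the
mask arithmetic in workqueue_unbound_exclude_cpumask() can be modelled in
user space roughly as below. This is only a sketch under stated
assumptions: cpu_bits_t and effective_unbound_mask() are hypothetical
names that do not appear in the patch, CPU masks are reduced to one 64-bit
word (bit N standing for CPU N), and locking, sysfs export and the call to
workqueue_apply_unbound_cpumask() are left out.

```c
#include <stdint.h>

/*
 * Illustrative user-space model only: a CPU mask is a plain 64-bit word
 * with bit N standing for CPU N. The kernel uses cpumask_var_t instead.
 */
typedef uint64_t cpu_bits_t;

/*
 * Hypothetical helper mirroring the rule in the patch: remove the
 * isolated CPUs from the requested mask, but fall back to the full
 * requested mask if the exclusion would leave no CPU at all.
 */
static cpu_bits_t effective_unbound_mask(cpu_bits_t requested,
					 cpu_bits_t isolated)
{
	cpu_bits_t effective = requested & ~isolated;

	return effective ? effective : requested;
}
```

With a requested mask of 0xff (CPUs 0-7) and CPUs 2-3 isolated (0x0c), the
helper yields 0xf3. If every requested CPU is isolated, it returns the
requested mask unchanged, which corresponds to the fallback that the patch
takes when cpumask_andnot() produces an empty result.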