Message-ID: <0320c5f9-cbda-1652-1f97-24d1a22fb298@gmail.com>
Date: Wed, 13 Jul 2022 17:52:58 +0800
Subject: Re: [PATCH] workqueue: Use active mask for new worker when pool is DISASSOCIATED
To: Schspa Shi, tj@kernel.org
Cc: linux-kernel@vger.kernel.org, zhaohui.shi@horizon.ai, Peter Zijlstra
References: <20220707090501.55483-1-schspa@gmail.com>
From: Lai Jiangshan
In-Reply-To: <20220707090501.55483-1-schspa@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

CC Peter.
Peter has changed the CPU binding code in workqueue.c.

I don't understand the problem well enough. If kthread_bind_mask() is
buggy in workqueue.c, it would be buggy in other places too.

On 2022/7/7 17:05, Schspa Shi wrote:

>
> -	if (worker->rescue_wq)
> -		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +	if (worker->rescue_wq) {
> +		if (pool->flags & POOL_DISASSOCIATED)
> +			set_cpus_allowed_ptr(worker->task, cpu_active_mask);
> +		else
> +			set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +	}
>

For unbound pools (which also have POOL_DISASSOCIATED),
pool->attrs->cpumask should be used if pool->attrs->cpumask
has an active CPU.

> +
> +	mutex_lock(&wq_pool_attach_mutex);
> +	if ((pool->flags & POOL_DISASSOCIATED)) {
> +		/* We can't call get_online_cpus, there will be deadlock
> +		 * cpu_active_mask will no change, because we have
> +		 * wq_pool_attach_mutex hold.
> +		 **/
> +		kthread_bind_mask(worker->task, cpu_active_mask);
> +	} else {
> +		kthread_bind_mask(worker->task, pool->attrs->cpumask);
> +	}
> +	mutex_unlock(&wq_pool_attach_mutex);

For unbound pools, pool->attrs->cpumask should be used if
pool->attrs->cpumask has an active CPU.

wq_pool_attach_mutex is held here and in worker_attach_to_pool(),
which smells bad.

The change is complex. If kthread_bind_mask() can't work as expected
here, the change I would prefer is:

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4056f2a3f9d5..1ad8aef5fe98 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1862,6 +1862,12 @@ static void worker_attach_to_pool(struct worker *worker,
 {
 	mutex_lock(&wq_pool_attach_mutex);

+	/*
+	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
+	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
+	 */
+	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+
 	/*
 	 * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
 	 * stable across this function.  See the comments above the flag
@@ -1872,9 +1877,6 @@ static void worker_attach_to_pool(struct worker *worker,
 	else
 		kthread_set_per_cpu(worker->task, pool->cpu);

-	if (worker->rescue_wq)
-		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
-
 	list_add_tail(&worker->node, &pool->workers);
 	worker->pool = pool;

@@ -1952,7 +1954,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 		goto fail;

 	set_user_nice(worker->task, pool->attrs->nice);
-	kthread_bind_mask(worker->task, pool->attrs->cpumask);
+	worker->task->flags |= PF_NO_SETAFFINITY;

 	/* successful, attach the worker to the pool */
 	worker_attach_to_pool(worker, pool);
@@ -4270,7 +4272,7 @@ static int init_rescuer(struct workqueue_struct *wq)
 	}

 	wq->rescuer = rescuer;
-	kthread_bind_mask(rescuer->task, cpu_possible_mask);
+	rescuer->task->flags |= PF_NO_SETAFFINITY;
 	wake_up_process(rescuer->task);

 	return 0;

It is untested.  It effectively reverts the commit
640f17c82460e ("workqueue: Restrict affinity change to rescuer").
It avoids using kthread_bind_mask().
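
To make the unbound-pool comment concrete, here is a minimal userspace
sketch (not kernel code) that contrasts the mask selection in the quoted
patch with the selection asked for in the review. Everything in it is
an assumption made for illustration: cpumask_model_t, struct pool_model,
mask_per_patch() and mask_per_review() are hypothetical names, a 64-bit
integer stands in for struct cpumask, and the fallback to the active
mask when the pool's cpumask has no active CPU is borrowed from the
quoted patch rather than stated in the review.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: bit N set means CPU N. */
typedef uint64_t cpumask_model_t;

/* Hypothetical model of a worker pool, reduced to the two relevant fields. */
struct pool_model {
	bool disassociated;      /* models POOL_DISASSOCIATED */
	cpumask_model_t cpumask; /* models pool->attrs->cpumask */
};

/*
 * Selection as in the quoted patch: every DISASSOCIATED pool gets the
 * active mask, which also catches unbound pools.
 */
static cpumask_model_t mask_per_patch(const struct pool_model *pool,
				      cpumask_model_t active)
{
	return pool->disassociated ? active : pool->cpumask;
}

/*
 * Selection as requested in the review: keep the pool's own cpumask
 * whenever it contains at least one active CPU. The fallback to the
 * active mask is an assumption taken from the quoted patch.
 */
static cpumask_model_t mask_per_review(const struct pool_model *pool,
				       cpumask_model_t active)
{
	return (pool->cpumask & active) ? pool->cpumask : active;
}
```

With an unbound pool restricted to CPUs 2-3 (cpumask 0xC) and CPUs 0-3
active (0xF), mask_per_patch() widens the worker to 0xF while
mask_per_review() keeps 0xC. Only when none of the pool's CPUs are
active do the two agree on the active mask.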