Subject: Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively
From: Lai Jiangshan
To: Peter Zijlstra, Lai Jiangshan, Tejun Heo
Cc: Valentin Schneider, Thomas Gleixner, LKML, Qian Cai, Vincent Donnefort, Dexuan Cui, Paul McKenney, Vincent Guittot, Steven Rostedt, Jens Axboe
Date: Wed, 13 Jan 2021 20:57:56 +0800
Message-ID: <7e92d3b2-2323-f608-1090-e2c91aa612ce@linux.alibaba.com>
References: <20201226025117.2770-1-jiangshanlai@gmail.com> <87o8hv7pnd.fsf@nanos.tec.linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021/1/13 19:10, Peter Zijlstra wrote:
> On Tue, Jan 12, 2021 at 11:38:12PM +0800, Lai Jiangshan wrote:
>
>> But the hard problem is "how to suppress the warning of
>> online&!active in __set_cpus_allowed_ptr()" for late spawned
>> unbound workers during hotplug.
>
> I cannot see create_worker() go bad like that.
>
> The thing is, it uses:
>
>   kthread_bind_mask(, pool->attr->cpumask)
>   worker_attach_to_pool()
>     set_cpus_allowed_ptr(, pool->attr->cpumask)
>
> which means set_cpus_allowed_ptr() must be a NOP, because the affinity
> is already set by kthread_bind_mask(). Further, the first wakeup of that
> worker will then hit:
>
>   select_task_rq()
>     is_cpu_allowed()
>       is_per_cpu_kthread() -- false
>     select_fallback_rq()
>
>
> So normally that really isn't a problem. I can only see a tiny hole
> there, where someone changes the cpumask between kthread_bind_mask() and
> set_cpus_allowed_ptr(). AFAICT that can be fixed in two ways:
>
>  - add wq_pool_mutex around things in create_worker(), or
>  - move the set_cpus_allowed_ptr() out of worker_attach_to_pool() and
>    into rescuer_thread().
>
> Which then brings us to rescuer_thread... If we manage to trigger the
> rescuer during hotplug, then yes, I think that can go wobbly.
>

How about the following idea (not compiled, not tested)?
It does not call set_cpus_allowed_ptr() for newly created workers.
It does not change the cpumask of a rescuer unless the rescuer is
attaching to a per-cpu pool.

The only problem is that an unbound rescuer worker complies with neither
wq_unbound_cpumask nor wq->unbound_attrs->cpumask. Another 50 lines
of code could make it comply, but I don't want to type them in an email
and complicate the idea.

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9880b6c0e272..df2082283c1e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1849,10 +1849,30 @@ static void worker_attach_to_pool(struct worker *worker,
 	mutex_lock(&wq_pool_attach_mutex);
 
 	/*
-	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
-	 * online CPUs. It'll be re-applied when any of the CPUs come up.
+	 * If we are called from create_worker(), we don't need to call
+	 * set_cpus_allowed_ptr() since we just kthread_bind_mask() it.
+	 *
+	 * The only other path that gets us here is rescuer_thread().
+	 *
+	 * When !(pool->flags & POOL_DISASSOCIATED), it is a per-cpu pool
+	 * and we should rebind the rescuer worker to the target CPU.
+	 *
+	 * When it is a rescuer worker attaching to an unbound pool, we keep
+	 * the affinity for the rescuer worker to be cpu_possible_mask.
+	 *
+	 * Note: unbound rescue worker doesn't comply with wq_unbound_cpumask
+	 * nor wq->unbound_attrs->cpumask. The optimal choice is to keep
+	 * the affinity for the rescuer worker to be
+	 *	wq_unbound_cpumask & wq->unbound_attrs->cpumask
+	 * but there is no reliable way to set it back via
+	 * set_cpus_allowed_ptr() when its affinity is changed by the scheduler
+	 * due to CPU hotplug, so we just use cpu_possible_mask for the rescuer.
+	 *
+	 * set_cpus_allowed_ptr() will not fail since
+	 * !(pool->flags & POOL_DISASSOCIATED)
 	 */
-	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+	if (worker->rescue_wq && !(pool->flags & POOL_DISASSOCIATED))
+		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
 
 	/*
 	 * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
@@ -5043,7 +5063,8 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 
 	/* as we're called from CPU_ONLINE, the following shouldn't fail */
 	for_each_pool_worker(worker, pool)
-		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, &cpumask) < 0);
+		if (!worker->rescue_wq)
+			WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, &cpumask) < 0);
 }
 
 int workqueue_prepare_cpu(unsigned int cpu)
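
For illustration only, the two conditions the patch introduces can be modeled in plain userspace C. This is a rough sketch, not kernel code: struct model_pool, struct model_worker, MODEL_POOL_DISASSOCIATED and all three functions are hypothetical names invented here, and the model assumes (as the patch comment states) that a clear POOL_DISASSOCIATED flag implies a per-cpu pool.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel flag; the bit value is arbitrary here. */
#define MODEL_POOL_DISASSOCIATED (1u << 2)

/* Models only the pool state that the patch's conditions look at. */
struct model_pool {
	unsigned int flags;
};

/* Models only "is this worker a rescuer", i.e. worker->rescue_wq != NULL. */
struct model_worker {
	bool is_rescuer;
};

/*
 * Attach path: true when the patched worker_attach_to_pool() would call
 * set_cpus_allowed_ptr(). Only a rescuer attaching to an associated
 * (per-cpu) pool qualifies; freshly created workers never do.
 */
bool attach_sets_affinity(const struct model_worker *w,
			  const struct model_pool *p)
{
	return w->is_rescuer && !(p->flags & MODEL_POOL_DISASSOCIATED);
}

/*
 * Restore path: true when the patched restore_unbound_workers_cpumask()
 * would call set_cpus_allowed_ptr(). Rescuers are skipped.
 */
bool restore_sets_affinity(const struct model_worker *w)
{
	return !w->is_rescuer;
}

/* Walks the four attach cases and the two restore cases. */
void model_self_check(void)
{
	const struct model_pool percpu = { .flags = 0 };
	const struct model_pool unbound = { .flags = MODEL_POOL_DISASSOCIATED };
	const struct model_worker rescuer = { .is_rescuer = true };
	const struct model_worker normal = { .is_rescuer = false };

	assert(attach_sets_affinity(&rescuer, &percpu));
	assert(!attach_sets_affinity(&rescuer, &unbound));
	assert(!attach_sets_affinity(&normal, &percpu));
	assert(!attach_sets_affinity(&normal, &unbound));

	assert(restore_sets_affinity(&normal));
	assert(!restore_sets_affinity(&rescuer));
}
```

In this model a rescuer's affinity is touched in exactly one case, attaching to an associated pool, which is the case where the patch comment argues set_cpus_allowed_ptr() cannot fail.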