Received: by 2002:a05:6a10:16a7:0:0:0:0 with SMTP id gp39csp2915590pxb; Sun, 8 Nov 2020 19:00:17 -0800 (PST) X-Google-Smtp-Source: ABdhPJzFhys3t02S2NK6y1QN1985n5CT80gBI5RReKR8lSSKEdNgqulG1w/r3MxycISoMm/n4sDt X-Received: by 2002:a50:930a:: with SMTP id m10mr13426128eda.288.1604890816953; Sun, 08 Nov 2020 19:00:16 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1604890816; cv=none; d=google.com; s=arc-20160816; b=QoQAMMCUv4CJB4ZzftbGtciXFy8nyrHpta/TlhQFzuExvq2S80xzkuzMYOzEOWCZlf o+LC17kRmRTu1GZrsmYFjswOi3rmPjm7Vi0OrHFUfxHBC6i/FRAAPjlFrlTlnl+3PzG4 iNV+icKnWp8VesS1ProZL08bKiYAsJuSpGnMFXjMKj4+u/00Bav9RZJJLbgSIDoKkPpc 6vtUaVApfh/0IMj9J7YgR4uJJlrsi/4Mx6wMgN6xKNCjVh2l8VLRKm7U71MZwE8Khv3M 8KontC248B7qnyu+lgvt5Hl1GDL8s33N9W3MCMSx69pxmdMXDWjbRoQVIPXiQVCH5xTV 9/gg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:message-id:cc:cc:subject:date:to:to :from; bh=lIRVIDrUAy6tn+Y2kH4vwiwl6lrmdhWHZZVoDS3bTxs=; b=DIAolLqWIw4qbzKULXNDu1Tub9dxIEurYr4N/LO7v3fSm3a9ib3eqDRe7YbPGgnfSJ d1q8xUp4NExzzRrnKLM+jJAIVPlUHIT/5SRK9dLizQA2+o1McisGVU4QqOvTkXy0zM7m nr/5hWVUca2lDS/CqIrV0M4iwjJKuVy3aU0RhwMKzUwO6VAzkrY0SU+93KNDou37Adgc ZbzlfJVCyTz3ILDqDGI18w7WdKhYbCasQF3Hq/TfTTyh98u8oEXBii1BcjGHtjx9+Fdj WUAIVeeClyj+T+8+SqSMJ8peYUvoZbPaYt7TgN6v75wCGHjvz7WuFavHOoLZAzulGrX5 YQ6w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id gt13si6090048ejb.611.2020.11.08.18.59.37; Sun, 08 Nov 2020 19:00:16 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729050AbgKICzI (ORCPT + 99 others); Sun, 8 Nov 2020 21:55:08 -0500 Received: from mx2.suse.de ([195.135.220.15]:55560 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727979AbgKICzH (ORCPT ); Sun, 8 Nov 2020 21:55:07 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de X-Amavis-Alert: BAD HEADER SECTION, Duplicate header field: "To" Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id D1A38ABF4; Mon, 9 Nov 2020 02:55:05 +0000 (UTC) From: NeilBrown To: Tejun Heo , Lai Jiangshan To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot Date: Mon, 09 Nov 2020 13:54:59 +1100 Subject: [PATCH rfc] workqueue: honour cond_resched() more effectively. cc: Trond Myklebust , Michal Hocko cc: linux-kernel@vger.kernel.org Message-ID: <87v9efp7cs.fsf@notabene.neil.brown.name> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --=-=-= Content-Type: text/plain Content-Transfer-Encoding: quoted-printable If a worker task for a normal (bound, non-CPU-intensive) calls cond_resched(), this allows other non-workqueue processes to run, but does *not* allow other workqueue workers to run. This is because workqueue will only attempt to run one task at a time on each CPU, unless the current task actually sleeps. This is already a problem for should_reclaim_retry() in mm/page_alloc.c, which inserts a small sleep just to convince workqueue to allow other workers to run. It can also be a problem for NFS when closing a very large file (e.g. 100 million pages in memory). NFS can call the final iput() from a workqueue, which can then take long enough to trigger a workqueue-lockup warning, and long enough for performance problems to be observed. I added a WARN_ON_ONCE() in cond_resched() to report when it is run from a workqueue, and got about 20 hits during boot, many of system_wq (aka "events") which suggests there is a real chance that worker are being delayed unnecessarily be tasks which are written to politely relinquish the CPU. I think that once a worker calls cond_resched(), it should be treated as though it was run from a WQ_CPU_INTENSIVE queue, because only cpu-intensive tasks need to call cond_resched(). This would allow other workers to be scheduled. The following patch achieves this I believe. Signed-off-by: NeilBrown =2D-- include/linux/sched.h | 7 ++++++- include/linux/workqueue.h | 2 ++ kernel/sched/core.c | 4 ++++ kernel/workqueue.c | 16 +++++++++++++++- mm/page_alloc.c | 12 +----------- 5 files changed, 28 insertions(+), 13 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 4418f5cb8324..728870965df1 100644 =2D-- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1784,7 +1784,12 @@ static inline int test_tsk_need_resched(struct task_= struct *tsk) #ifndef CONFIG_PREEMPTION extern int _cond_resched(void); #else =2Dstatic inline int _cond_resched(void) { return 0; } +static inline int _cond_resched(void) +{ + if (current->flags & PF_WQ_WORKER) + workqueue_cond_resched(); + return 0; +} #endif =20 #define cond_resched() ({ \ diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 8b505d22fc0e..7bcc5717d80f 100644 =2D-- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -626,6 +626,8 @@ static inline bool schedule_delayed_work(struct delayed= _work *dwork, return queue_delayed_work(system_wq, dwork, delay); } =20 +void workqueue_cond_resched(void); + #ifndef CONFIG_SMP static inline long work_on_cpu(int cpu, long (*fn)(void *), void *arg) { diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 9a2fbf98fd6f..5b2e38567a0c 100644 =2D-- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5620,6 +5620,8 @@ SYSCALL_DEFINE0(sched_yield) #ifndef CONFIG_PREEMPTION int __sched _cond_resched(void) { + if (current->flags & PF_WQ_WORKER) + workqueue_cond_resched(); if (should_resched(0)) { preempt_schedule_common(); return 1; @@ -5643,6 +5645,8 @@ int __cond_resched_lock(spinlock_t *lock) int resched =3D should_resched(PREEMPT_LOCK_OFFSET); int ret =3D 0; =20 + if (current->flags & PF_WQ_WORKER) + workqueue_cond_resched(); lockdep_assert_held(lock); =20 if (spin_needbreak(lock) || resched) { diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 891ccad5f271..fd2e9557b1a6 100644 =2D-- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -2296,7 +2296,7 @@ __acquires(&pool->lock) spin_lock_irq(&pool->lock); =20 /* clear cpu intensive status */ =2D if (unlikely(cpu_intensive)) + if (unlikely(worker->flags & WORKER_CPU_INTENSIVE)) worker_clr_flags(worker, WORKER_CPU_INTENSIVE); =20 /* tag the worker for identification in schedule() */ @@ -5330,6 +5330,20 @@ int workqueue_set_unbound_cpumask(cpumask_var_t cpum= ask) return ret; } =20 +void workqueue_cond_resched(void) +{ + struct worker *worker =3D current_wq_worker(); + + if (worker && !(worker->flags & WORKER_CPU_INTENSIVE)) { + struct worker_pool *pool =3D worker->pool; + + worker_set_flags(worker, WORKER_CPU_INTENSIVE); + if (need_more_worker(pool)) + wake_up_worker(pool); + } +} +EXPORT_SYMBOL(workqueue_cond_resched); + #ifdef CONFIG_SYSFS /* * Workqueues with WQ_SYSFS flag set is visible to userland via diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 13cc653122b7..0d5720ecbfe7 100644 =2D-- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4420,17 +4420,7 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order, } =20 out: =2D /* =2D * Memory allocation/reclaim might be called from a WQ context and the =2D * current implementation of the WQ concurrency control doesn't =2D * recognize that a particular WQ is congested if the worker thread is =2D * looping without ever sleeping. Therefore we have to do a short sleep =2D * here rather than calling cond_resched(). =2D */ =2D if (current->flags & PF_WQ_WORKER) =2D schedule_timeout_uninterruptible(1); =2D else =2D cond_resched(); + cond_resched(); return ret; } =20 =2D-=20 2.29.2 --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQJCBAEBCAAsFiEEG8Yp69OQ2HB7X0l6Oeye3VZigbkFAl+or4MOHG5laWxiQHN1 c2UuZGUACgkQOeye3VZigbm8fw//QQj2OJJmdYIwlK74vrB/MWouaOnNA8fEj1u+ 1K8XzrVpmFFUmZjldX6VKEbubPy6XfTu6+AMKrJakWes+8XNIs4XJRgNXx44ZVdx B8KIrTr3ZoDsb4egwmlwy2LlbpqS5b5HBnDqkeOa6d4HeHXC+0w97/TgE0VRF6EI waWyLVZq7booxf99KASi34nNGcidsVcVFmwV0HVTrgfiVpVP612mb/FliOo7/0it ABs8LEq9dnlNiQhnsSergl0imrKBFY8K1cdxUe0ay4dlAas0X1ksVqDW9ozrLUrb +7N7vpa3VMrPth9mxq3xPYwgJxznxm5j3fBvw5+sxI/zqjpyG5qOPAmnQ2SwpoMF wvc6jLebAEfW0e5P/jv4EIbbRZfhI800kPONPw5NS9PAir3sMBCyAhiC/ks/py5J /n1ij3EmdWN1dWSFX1oKhBrUnFlPJDnrsLgl3OG/xYmVKx1570LUSeIoy4yAdT8u DMBy9CIdt9gefKSKo5ReeHiXHFOsU45ekwttsADUISU7HZa5r0E6qYTJtMgZZhUF uUhJdD5rLUbe/n8BrG7KoesHbCsLZNTU1J5LED+bWvEgOvVHyk50Glp18yPAFoMI gEVDzEc3Y7TFL2Zvz2BP/MUL5UVZRE+TvCJzZ8v9rjN30+HB7sdrsANsFU5YVk/P CkdwRsY= =ONf0 -----END PGP SIGNATURE----- --=-=-=--