Date: Wed, 23 May 2018 10:56:32 -0700
From: Tejun Heo
To: Jens Axboe
Cc: linux-kernel@vger.kernel.org, "Paul E. McKenney", Jan Kara, Andrew Morton, kernel-team@fb.com
Subject: [PATCH] bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue
Message-ID: <20180523175632.GO1718769@devbig577.frc2.facebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From 0aa2e9b921d6db71150633ff290199554f0842a8 Mon Sep 17 00:00:00 2001
From: Tejun Heo
Date: Wed, 23 May 2018 10:29:00 -0700

cgwb_release() punts the actual release to cgwb_release_workfn() on
system_wq.  Depending on the number of cgroups or block devices, there
can be many cgwb_release_workfn() work items in flight at the same
time.

We're periodically seeing close to 256 kworkers getting stuck with the
following stack trace, and over time the entire system gets stuck.

  [] _synchronize_rcu_expedited.constprop.72+0x2fc/0x330
  [] synchronize_rcu_expedited+0x24/0x30
  [] bdi_unregister+0x53/0x290
  [] release_bdi+0x89/0xc0
  [] wb_exit+0x85/0xa0
  [] cgwb_release_workfn+0x54/0xb0
  [] process_one_work+0x150/0x410
  [] worker_thread+0x6d/0x520
  [] kthread+0x12c/0x160
  [] ret_from_fork+0x29/0x40
  [] 0xffffffffffffffff

The events leading to the lockup are...

1. Many cgwb_release_workfn() work items are queued at the same time
   and all system_wq kworkers are assigned to execute them.

2. They all end up calling synchronize_rcu_expedited().  One of them
   wins and tries to perform the expedited synchronization.

3. However, that involves queueing rcu_exp_work to system_wq and
   waiting for it.  Because the work items from #1 are holding all
   available kworkers on system_wq, rcu_exp_work can't be executed.
   cgwb_release_workfn() is waiting for synchronize_rcu_expedited(),
   which in turn is waiting for cgwb_release_workfn() to free up some
   of the kworkers.

We shouldn't be scheduling hundreds of cgwb_release_workfn() work
items at the same time.  There's nothing to be gained from that.

This patch updates the cgwb release path to use a dedicated percpu
workqueue with @max_active of 1.

While this resolves the problem at hand, it might be a good idea to
isolate rcu_exp_work to its own workqueue too, as it can be used from
various paths and is prone to this sort of indirect A-A deadlock.

Signed-off-by: Tejun Heo
Cc: "Paul E. McKenney"
Cc: stable@vger.kernel.org
---
 mm/backing-dev.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 7441bd9..8fe3ebd 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -412,6 +412,7 @@ static void wb_exit(struct bdi_writeback *wb)
  * protected.
  */
 static DEFINE_SPINLOCK(cgwb_lock);
+static struct workqueue_struct *cgwb_release_wq;
 
 /**
  * wb_congested_get_create - get or create a wb_congested
@@ -522,7 +523,7 @@ static void cgwb_release(struct percpu_ref *refcnt)
 {
 	struct bdi_writeback *wb = container_of(refcnt, struct bdi_writeback,
 						refcnt);
-	schedule_work(&wb->release_work);
+	queue_work(cgwb_release_wq, &wb->release_work);
 }
 
 static void cgwb_kill(struct bdi_writeback *wb)
@@ -784,6 +785,21 @@ static void cgwb_bdi_register(struct backing_dev_info *bdi)
 	spin_unlock_irq(&cgwb_lock);
 }
 
+static int __init cgwb_init(void)
+{
+	/*
+	 * There can be many concurrent release work items overwhelming
+	 * system_wq.  Put them in a separate wq and limit concurrency.
+	 * There's no point in executing many of these in parallel.
+	 */
+	cgwb_release_wq = alloc_workqueue("cgwb_release", 0, 1);
+	if (!cgwb_release_wq)
+		return -ENOMEM;
+
+	return 0;
+}
+subsys_initcall(cgwb_init);
+
 #else	/* CONFIG_CGROUP_WRITEBACK */
 
 static int cgwb_bdi_init(struct backing_dev_info *bdi)
-- 
2.9.5
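
For readers who want to see why the lockup in the commit message happens and why a dedicated queue avoids it, here is a rough userspace sketch. It is not kernel code and not part of the patch. The function name simulate_release(), its parameters, and the model itself are hypothetical: a shared queue is reduced to a fixed number of concurrency slots, each running release item blocks until one helper item (standing in for rcu_exp_work) gets a slot on that shared queue, and a dedicated queue is assumed to run one release item at a time without consuming shared slots.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy, single-threaded model of the lockup (hypothetical, not kernel code).
 *
 * shared_slots: how many items the shared queue can run at once
 * releases:     how many release items are queued at the same time
 * dedicated:    true if release items run on their own queue, one at a time
 *
 * Returns how many release items complete before the system gets stuck
 * (or all of them if it never gets stuck).
 */
int simulate_release(int shared_slots, int releases, bool dedicated)
{
	int pending = releases;
	int done = 0;

	assert(shared_slots > 0 && releases >= 0);

	while (pending > 0) {
		/* How many release items are allowed to start now. */
		int limit = dedicated ? 1 : shared_slots;
		int running = pending < limit ? pending : limit;

		/* Shared slots left over for the helper item. */
		int free_slots = dedicated ? shared_slots
					   : shared_slots - running;

		/* The helper can never run, so every waiter is stuck. */
		if (free_slots == 0)
			return done;

		/* The helper ran, so all running release items finish. */
		done += running;
		pending -= running;
	}

	return done;
}
```

Under these assumptions, simulate_release(256, 300, false) completes nothing because the release items take every shared slot, while simulate_release(256, 300, true) completes all 300 because the helper always finds a free slot. The number 256 is only an illustrative slot count borrowed from the kworker count mentioned above.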