Date: Fri, 8 Mar 2019 08:38:43 +0100
From: Andrea Righi
To: Josef Bacik
Cc: Tejun Heo, Li Zefan, Paolo Valente, Johannes Weiner, Jens Axboe, Vivek Goyal, Dennis Zhou, cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/3] blkcg: prevent priority inversion problem during sync()
Message-ID: <20190308073843.GA9732@xps-13>
References: <20190307180834.22008-1-andrea.righi@canonical.com> <20190307180834.22008-2-andrea.righi@canonical.com> <20190307221051.ruhpp73q6ek2at3d@macbook-pro-91.dhcp.thefacebook.com>
In-Reply-To: <20190307221051.ruhpp73q6ek2at3d@macbook-pro-91.dhcp.thefacebook.com>

On Thu, Mar 07, 2019 at 05:10:53PM -0500, Josef Bacik wrote:
> On Thu, Mar 07, 2019 at 07:08:32PM +0100, Andrea Righi wrote:
> > Prevent priority inversion problem when a high-priority blkcg issues a
> > sync() and it is forced to wait the completion of all the writeback I/O
> > generated by any other low-priority blkcg, causing massive latencies to
> > processes that shouldn't be I/O-throttled at all.
> >
> > The idea is to save a list of blkcg's that are waiting for writeback:
> > every time a sync() is executed the current blkcg is added to the list.
> >
> > Then, when I/O is throttled, if there's a blkcg waiting for writeback
> > different than the current blkcg, no throttling is applied (we can
> > probably refine this logic later, i.e., a better policy could be to
> > adjust the throttling I/O rate using the blkcg with the highest speed
> > from the list of waiters - priority inheritance, kinda).
> >
> > Signed-off-by: Andrea Righi
> > ---
> >  block/blk-cgroup.c               | 131 +++++++++++++++++++++++++++++++
> >  block/blk-throttle.c             |  11 ++-
> >  fs/fs-writeback.c                |   5 ++
> >  fs/sync.c                        |   8 +-
> >  include/linux/backing-dev-defs.h |   2 +
> >  include/linux/blk-cgroup.h       |  23 ++++++
> >  mm/backing-dev.c                 |   2 +
> >  7 files changed, 178 insertions(+), 4 deletions(-)
> >
> > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> > index 2bed5725aa03..4305e78d1bb2 100644
> > --- a/block/blk-cgroup.c
> > +++ b/block/blk-cgroup.c
> > @@ -1351,6 +1351,137 @@ struct cgroup_subsys io_cgrp_subsys = {
> >  };
> >  EXPORT_SYMBOL_GPL(io_cgrp_subsys);
> >
> > +#ifdef CONFIG_CGROUP_WRITEBACK
> > +struct blkcg_wb_sleeper {
> > +	struct backing_dev_info *bdi;
> > +	struct blkcg *blkcg;
> > +	refcount_t refcnt;
> > +	struct list_head node;
> > +};
> > +
> > +static DEFINE_SPINLOCK(blkcg_wb_sleeper_lock);
> > +static LIST_HEAD(blkcg_wb_sleeper_list);
> > +
> > +static struct blkcg_wb_sleeper *
> > +blkcg_wb_sleeper_find(struct blkcg *blkcg, struct backing_dev_info *bdi)
> > +{
> > +	struct blkcg_wb_sleeper *bws;
> > +
> > +	list_for_each_entry(bws, &blkcg_wb_sleeper_list, node)
> > +		if (bws->blkcg == blkcg && bws->bdi == bdi)
> > +			return bws;
> > +	return NULL;
> > +}
> > +
> > +static void blkcg_wb_sleeper_add(struct blkcg_wb_sleeper *bws)
> > +{
> > +	list_add(&bws->node, &blkcg_wb_sleeper_list);
> > +}
> > +
> > +static void blkcg_wb_sleeper_del(struct blkcg_wb_sleeper *bws)
> > +{
> > +	list_del_init(&bws->node);
> > +}
> > +
> > +/**
> > + * blkcg_wb_waiters_on_bdi - check for writeback waiters on a block device
> > + * @blkcg: current blkcg cgroup
> > + * @bdi: block device to check
> > + *
> > + * Return true if any other blkcg different than the current one is waiting for
> > + * writeback on the target block device, false otherwise.
> > + */
> > +bool blkcg_wb_waiters_on_bdi(struct blkcg *blkcg, struct backing_dev_info *bdi)
> > +{
> > +	struct blkcg_wb_sleeper *bws;
> > +	bool ret = false;
> > +
> > +	spin_lock(&blkcg_wb_sleeper_lock);
> > +	list_for_each_entry(bws, &blkcg_wb_sleeper_list, node)
> > +		if (bws->bdi == bdi && bws->blkcg != blkcg) {
> > +			ret = true;
> > +			break;
> > +		}
> > +	spin_unlock(&blkcg_wb_sleeper_lock);
> > +
> > +	return ret;
> > +}
>
> No global lock please, add something to the bdi I think?  Also have a fast path
> of

OK, I'll add a list per-bdi and a lock as well.

>
> if (list_empty(blkcg_wb_sleeper_list))
>    return false;

OK.

>
> we don't need to be super accurate here.  Thanks,
>
> Josef

Thanks,
-Andrea
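
For reference, the direction agreed above (a per-bdi waiter list with its own lock, plus a cheap unlocked check before taking it) could look roughly like the following. This is only a user-space model under stated assumptions, not the v3 patch and not kernel code: struct bdi_model, struct wb_sleeper, the nr_sleepers counter and the tiny atomic_flag spinlock are all hypothetical names invented for illustration. The counter stands in for the list_empty() test in the review comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these names are kernel API. */
struct blkcg { int id; };

struct wb_sleeper {
	const struct blkcg *blkcg;
	struct wb_sleeper *next;
};

struct bdi_model {			/* models struct backing_dev_info */
	atomic_flag lock;		/* per-bdi lock, protects 'sleepers' */
	struct wb_sleeper *sleepers;	/* per-bdi list of sync() waiters */
	atomic_int nr_sleepers;		/* read without the lock on the fast path */
};

static void model_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;			/* spin until the flag is released */
}

static void model_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void bdi_model_init(struct bdi_model *bdi)
{
	atomic_flag_clear(&bdi->lock);
	bdi->sleepers = NULL;
	atomic_init(&bdi->nr_sleepers, 0);
}

static void wb_sleeper_add(struct bdi_model *bdi, struct wb_sleeper *ws)
{
	assert(bdi && ws);
	model_lock(&bdi->lock);
	ws->next = bdi->sleepers;
	bdi->sleepers = ws;
	atomic_fetch_add(&bdi->nr_sleepers, 1);
	model_unlock(&bdi->lock);
}

static void wb_sleeper_del(struct bdi_model *bdi, struct wb_sleeper *ws)
{
	struct wb_sleeper **pp;

	model_lock(&bdi->lock);
	for (pp = &bdi->sleepers; *pp; pp = &(*pp)->next) {
		if (*pp == ws) {
			*pp = ws->next;	/* unlink the entry */
			atomic_fetch_sub(&bdi->nr_sleepers, 1);
			break;
		}
	}
	model_unlock(&bdi->lock);
}

/* True if a blkcg other than 'self' is waiting for writeback on 'bdi'. */
static bool wb_waiters_on_bdi(struct bdi_model *bdi, const struct blkcg *self)
{
	struct wb_sleeper *ws;
	bool ret = false;

	/* Fast path: racy on purpose, the answer need not be exact. */
	if (atomic_load_explicit(&bdi->nr_sleepers, memory_order_relaxed) == 0)
		return false;

	model_lock(&bdi->lock);
	for (ws = bdi->sleepers; ws; ws = ws->next) {
		if (ws->blkcg != self) {
			ret = true;
			break;
		}
	}
	model_unlock(&bdi->lock);
	return ret;
}
```

In this model a throttling path would call wb_waiters_on_bdi() with its own blkcg and skip throttling when it returns true. Because the list hangs off the bdi, the bdi comparison from the original loop goes away, and an idle device never touches the lock.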