Date: Fri, 18 Jan 2019 19:44:03 +0100
From: Andrea Righi
To: Josef Bacik
Cc: Tejun Heo, Li Zefan, Johannes Weiner, Jens Axboe, Vivek Goyal, Dennis Zhou, cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/3] cgroup: fsio throttle controller
Message-ID: <20190118184403.GB1535@xps-13>
References: <20190118103127.325-1-righi.andrea@gmail.com> <20190118163530.w5wpzpjkcnkektsp@macbook-pro-91.dhcp.thefacebook.com>
In-Reply-To: <20190118163530.w5wpzpjkcnkektsp@macbook-pro-91.dhcp.thefacebook.com>

On Fri, Jan 18, 2019 at 11:35:31AM -0500, Josef Bacik wrote:
> On Fri, Jan 18, 2019 at 11:31:24AM +0100, Andrea Righi wrote:
> > This is a redesign of my old cgroup-io-throttle controller:
> > https://lwn.net/Articles/330531/
> >
> > I'm resuming this old patch to point out a problem that I think is still
> > not solved completely.
> >
> > = Problem =
> >
> > The io.max controller works really well at limiting synchronous I/O
> > (READs), but a lot of I/O requests are initiated outside the context of
> > the process that is ultimately responsible for its creation (e.g.,
> > WRITEs).
> >
> > Throttling at the block layer in some cases is too late and we may end
> > up slowing down processes that are not responsible for the I/O that
> > is being processed at that level.
>
> How so? The writeback threads are per-cgroup and have the cgroup stuff set
> properly. So if you dirty a bunch of pages, they are associated with your
> cgroup, and then writeback happens and it's done in the writeback thread
> associated with your cgroup and then that is throttled. Then you are throttled
> at balance_dirty_pages() because the writeout is taking longer.

Right, writeback is per-cgroup and slowing down writeback affects only
that specific cgroup, but there are cases where processes in other
cgroups may need to wait for that writeback to complete before doing
I/O (for example, an fsync() on a file shared among different cgroups).
In this case we may end up blocking cgroups that shouldn't be blocked,
which looks like a priority-inversion problem. This is the problem that
I'm trying to address.

>
> I introduced the blk_cgroup_congested() stuff for paths that it's not easy to
> clearly tie IO to the thing generating the IO, such as readahead and such. If
> you are running into this case that may be something worth using. Course it
> only works for io.latency now but there's no reason you can't add support to it
> for io.max or whatever.

IIUC, blk_cgroup_congested() is used for readahead I/O (and for swap
with memcg), roughly like this: if the cgroup is already congested,
don't generate extra I/O due to readahead. Am I right?
>
> >
> > = Proposed solution =
> >
> > The main idea of this controller is to split I/O measurement and I/O
> > throttling: I/O is measured at the block layer for READS, at page cache
> > (dirty pages) for WRITEs, and processes are limited while they're
> > generating I/O at the VFS level, based on the measured I/O.
> >
>
> This is what blk_cgroup_congested() is meant to accomplish, I would suggest
> looking into that route and simply changing the existing io controller you are
> using to take advantage of that so it will actually throttle things. Then just
> sprinkle it around the areas where we indirectly generate IO. Thanks,

Absolutely, I can probably use blk_cgroup_congested() as a method to
determine when a cgroup should be throttled (instead of doing my own
I/O measurement), but to prevent the "slow writeback slowing down other
cgroups" issue I still need to apply throttling when pages are dirtied
in the page cache.

Thanks,
-Andrea