From: Paolo Valente
To: Jan Kara
Cc: Theodore Ts'o, "Srivatsa S. Bhat", linux-fsdevel@vger.kernel.org, linux-block, linux-ext4@vger.kernel.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, axboe@kernel.dk, jmoyer@redhat.com, amakhalov@vmware.com, anishs@vmware.com, srivatsab@vmware.com
Subject: Re: CFQ idling kills I/O performance on ext4 with blkio cgroup controller
Date: Mon, 20 May 2019 12:45:58 +0200
Message-Id: <1C0A2FC8-620C-4AFE-A921-35EDAC377BD4@linaro.org>
In-Reply-To: <20190520091558.GC2172@quack2.suse.cz>
References: <8d72fcf7-bbb4-2965-1a06-e9fc177a8938@csail.mit.edu> <1812E450-14EF-4D5A-8F31-668499E13652@linaro.org> <20190518192847.GB14277@mit.edu> <20190520091558.GC2172@quack2.suse.cz>
X-Mailing-List: linux-ext4@vger.kernel.org

> On 20 May 2019, at 11:15, Jan Kara wrote:
>
> On Sat 18-05-19 15:28:47, Theodore Ts'o wrote:
>> On Sat, May 18, 2019 at 08:39:54PM +0200, Paolo Valente wrote:
>>> I've addressed these issues in my last batch of improvements for
>>> BFQ, which landed in the upcoming 5.2. If you give it a try, and
>>> still see the problem, then I'll be glad to reproduce it, and
>>> hopefully fix it for you.
>>
>> Hi Paolo, I'm curious if you could give a quick summary about what you
>> changed in BFQ?
>>
>> I was considering adding support so that if userspace calls fsync(2)
>> or fdatasync(2), we attach the process's CSS to the transaction, and
>> then charge all of the journal metadata writes to the process's CSS. If
>> there are multiple fsyncs batched into the transaction, the first
>> process which forced the early transaction commit would get charged
>> the entire journal write. OTOH, journal writes are sequential I/O, so
>> the amount of disk time for writing the journal is going to be
>> relatively small, and the share of that work belonging to other
>> cgroups is going to be minimal, especially if they hadn't issued an
>> fsync().
>
> But this makes priority-inversion problems with ext4 journal worse, doesn't
> it?
> If we submit the journal commit in the blkio cgroup of some random process,
> it may get throttled, which then effectively blocks the whole filesystem. Or
> do you want to implement a more complex back-pressure mechanism where you'd
> just account to a different blkio cgroup during journal commit and then
> throttle at a different point where you are not blocking other tasks from
> progressing?
>
>> In the case where you have three cgroups all issuing fsync(2) and they
>> all landed in the same jbd2 transaction thanks to commit batching, in
>> the ideal world we would split up the disk time usage equally across
>> those three cgroups. But it's probably not worth doing that...
>>
>> That being said, we probably do need some BFQ support, since in the
>> case where we have multiple processes doing buffered writes w/o fsync,
>> we do charge the data=ordered writeback to each block cgroup. Worse,
>> the commit can't complete until all of the data integrity
>> writebacks have completed. And if there are N cgroups with dirty
>> inodes, and slice_idle set to 8ms, there is going to be 8*N ms worth
>> of idle time tacked onto the commit time.
>
> Yeah. At least in some cases, we know there won't be any more IO from a
> particular cgroup in the near future (e.g. transaction commit completing,
> or when the layers above IO scheduler already know which IO they are going
> to submit next) and in that case idling is just a waste of time.

Yep. Issues like this are targeted exactly by the improvement I mentioned
in my previous reply.

> But so far I haven't decided what a reasonably clean interface for this,
> one that isn't specific to a particular IO scheduler implementation,
> should look like.
>

That's an interesting point. So far, I've assumed that nobody would tell
BFQ anything. But if you guys think that such communication may be
acceptable to some degree, then I'd be glad to try to come up with some
solution. For instance: some hook that any I/O scheduler may export if
meaningful.
Thanks,
Paolo

> Honza
> --
> Jan Kara
> SUSE Labs, CR