Date: Mon, 20 May 2019 11:15:58 +0200
From: Jan Kara
To: Theodore Ts'o
Cc: Paolo Valente, "Srivatsa S. Bhat", linux-fsdevel@vger.kernel.org,
        linux-block, linux-ext4@vger.kernel.org, cgroups@vger.kernel.org,
        linux-kernel@vger.kernel.org, axboe@kernel.dk, jack@suse.cz,
        jmoyer@redhat.com, amakhalov@vmware.com, anishs@vmware.com,
        srivatsab@vmware.com
Subject: Re: CFQ idling kills I/O performance on ext4 with blkio cgroup controller
Message-ID: <20190520091558.GC2172@quack2.suse.cz>
References: <8d72fcf7-bbb4-2965-1a06-e9fc177a8938@csail.mit.edu>
 <1812E450-14EF-4D5A-8F31-668499E13652@linaro.org>
 <20190518192847.GB14277@mit.edu>
In-Reply-To: <20190518192847.GB14277@mit.edu>

On Sat 18-05-19 15:28:47, Theodore Ts'o wrote:
> On Sat, May 18, 2019 at 08:39:54PM +0200, Paolo Valente wrote:
> > I've addressed these issues in my last batch of improvements for
> > BFQ, which landed in the upcoming 5.2. If you give it a try, and
> > still see the problem, then I'll be glad to reproduce it, and
> > hopefully fix it for you.
>
> Hi Paolo, I'm curious if you could give a quick summary of what you
> changed in BFQ?
>
> I was considering adding support so that if userspace calls fsync(2)
> or fdatasync(2), we attach the process's CSS to the transaction and
> then charge all of the journal metadata writes to that CSS. If there
> are multiple fsyncs batched into the same transaction, the first
> process which forced the early transaction commit would get charged
> for the entire journal write. OTOH, journal writes are sequential
> I/O, so the amount of disk time spent writing the journal is going
> to be relatively small, and the work done on behalf of other cgroups
> is going to be minimal, especially if they hadn't issued an fsync().

But this makes the priority-inversion problems with the ext4 journal
worse, doesn't it?
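(As an aside, the accounting Ted describes can be sketched as a toy
user-space model. The class and names below are invented purely for
illustration; this is not the jbd2 or blk-cgroup API:)

```python
# Toy model of the proposed accounting: the first fsync(2) caller that
# forces the transaction commit has its CSS attached to the transaction
# and is charged for the whole journal write. All identifiers here are
# hypothetical, not the real jbd2/blk-cgroup interfaces.

class Transaction:
    def __init__(self):
        self.forcing_css = None   # CSS of the first fsync caller
        self.waiters = []         # everyone batched into this commit

    def fsync(self, css):
        # Later callers batched into the same transaction ride along
        # for free under the simple "charge the forcer" scheme.
        if self.forcing_css is None:
            self.forcing_css = css
        self.waiters.append(css)

    def commit(self, journal_bytes, charge):
        # The entire journal write is billed to the forcing CSS.
        charge[self.forcing_css] = charge.get(self.forcing_css, 0) + journal_bytes

charge = {}
tx = Transaction()
for css in ("cgA", "cgB", "cgC"):   # three cgroups, one batched commit
    tx.fsync(css)
tx.commit(journal_bytes=4096, charge=charge)
print(charge)   # the first forcer, cgA, pays for the entire journal write
```

An "ideal" fair variant would divide journal_bytes across tx.waiters
instead of charging only the forcer; the simple scheme trades that
fairness for simplicity.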
If we submit the journal commit in the blkio cgroup of some random
process, that cgroup may get throttled, which then effectively blocks
the whole filesystem. Or do you want to implement a more complex
back-pressure mechanism, where you'd merely account the IO to a
different blkio cgroup during the journal commit and then throttle at
a different point, where you are not blocking other tasks from making
progress?

> In the case where you have three cgroups all issuing fsync(2) and
> they all landed in the same jbd2 transaction thanks to commit
> batching, in the ideal world we would split up the disk time usage
> equally across those three cgroups. But it's probably not worth
> doing that...
>
> That being said, we probably do need some BFQ support, since in the
> case where we have multiple processes doing buffered writes w/o
> fsync, we do charge the data=ordered writeback to each block cgroup.
> Worse, the commit can't complete until all of the data integrity
> writebacks have completed. And if there are N cgroups with dirty
> inodes, and slice_idle set to 8ms, there is going to be 8*N ms worth
> of idle time tacked onto the commit time.

Yeah. At least in some cases we know there won't be any more IO from a
particular cgroup in the near future (e.g. when a transaction commit
is completing, or when the layers above the IO scheduler already know
which IO they are going to submit next), and in those cases idling is
just a waste of time. But so far I haven't decided what a reasonably
clean interface for this, one that isn't specific to a particular IO
scheduler implementation, should look like.

								Honza
-- 
Jan Kara
SUSE Labs, CR
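(To illustrate how quickly the idling overhead quoted above adds up, a
back-of-the-envelope calculation; the 8 ms figure is the slice_idle
value from the thread, and the cgroup counts are arbitrary:)

```python
# Worst-case idle time tacked onto a single jbd2 commit when the IO
# scheduler idles for slice_idle after each of N cgroups' data=ordered
# writeback streams goes quiet, as described in the quoted paragraph.
slice_idle_ms = 8   # slice_idle setting discussed in the thread

for n_cgroups in (1, 4, 16, 64):
    added_idle_ms = slice_idle_ms * n_cgroups
    print(f"{n_cgroups:3d} dirty cgroups -> up to {added_idle_ms} ms idling per commit")
```

At 64 dirty cgroups that is already over half a second of pure idling
before the commit can complete, which is why an interface letting upper
layers tell the scheduler "no more IO is coming from this cgroup" looks
attractive.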