Subject: Re: CFQ idling kills I/O performance on ext4 with blkio cgroup controller
From: "Srivatsa S. Bhat"
To: Paolo Valente
Cc: linux-fsdevel@vger.kernel.org, linux-block, linux-ext4@vger.kernel.org,
    cgroups@vger.kernel.org, kernel list, Jens Axboe, Jan Kara,
    jmoyer@redhat.com, Theodore Ts'o, amakhalov@vmware.com,
    anishs@vmware.com, srivatsab@vmware.com
Date: Thu, 30 May 2019 01:38:59 -0700

On 5/23/19 4:32 PM, Srivatsa S. Bhat wrote:
> On 5/22/19 7:30 PM, Srivatsa S. Bhat wrote:
>> On 5/22/19 3:54 AM, Paolo Valente wrote:
>>>
>>>> On 22 May 2019, at 12:01, Srivatsa S. Bhat wrote:
>>>>
>>>> On 5/22/19 2:09 AM, Paolo Valente wrote:
>>>>>
>>>>> First, thank you very much for testing my patches, and, above all, for
>>>>> sharing those huge traces!
>>>>>
>>>>> According to your traces, the residual 20% lower throughput that you
>>>>> record is due to the fact that the BFQ injection mechanism takes a few
>>>>> hundredths of a second to stabilize at the beginning of the workload.
>>>>> During that setup time, the throughput is equal to the dreadful ~60-90 KB/s
>>>>> that you see without this new patch. After that time, there seems to be
>>>>> no loss according to the trace.
>>>>>
>>>>> The problem is that a loss lasting only a few hundredths of a second is,
>>>>> however, not negligible for a write workload that lasts only 3-4
>>>>> seconds. Could you please try writing a larger file?
>>>>>
>>>>
>>>> I tried running dd for longer (about 100 seconds), but still saw around
>>>> 1.4 MB/s throughput with BFQ, and between 1.5 MB/s and 1.6 MB/s with
>>>> mq-deadline and noop.
>>>
>>> Ok, then the cause is now the periodic reset of the mechanism.
>>>
>>> It would be super easy to fill this gap by just gearing the mechanism
>>> toward a very aggressive injection. The problem is maintaining
>>> control. As you can imagine from the performance gap between CFQ (or
>>> BFQ with malfunctioning injection) and BFQ with this fix, it is very
>>> hard to succeed in maximizing the throughput while at the same time
>>> preserving control over per-group I/O.
>>>
>>
>> Ah, I see. Just to make sure that this fix doesn't overly optimize for
>> total throughput (because of the testcase we've been using) and end up
>> causing regressions in per-group I/O control, I ran a test with
>> multiple simultaneous dd instances, each writing to a different
>> portion of the filesystem (well separated, to induce seeks), and each
>> dd task bound to its own blkio cgroup. I saw similar results with and
>> without this patch, and the throughput was equally distributed among
>> all the dd tasks.
>>
> Actually, it turns out that I ran the dd tasks directly on the block
> device for this experiment, and not on top of ext4. I'll redo this on
> ext4 and report back soon.
>
With all your patches applied (including waker detection for the low
latency case), I ran four simultaneous dd instances, each writing to a
different ext4 partition, and each dd task bound to its own blkio
cgroup. The throughput continued to be well distributed among the dd
tasks, as shown below. (I increased dd's block size from 512B to 8KB for
these experiments to get double-digit throughput numbers, so as to make
comparisons easier.)

bfq with low_latency = 1:

819200000 bytes (819 MB, 781 MiB) copied, 16452.6 s, 49.8 kB/s
819200000 bytes (819 MB, 781 MiB) copied, 17139.6 s, 47.8 kB/s
819200000 bytes (819 MB, 781 MiB) copied, 17251.7 s, 47.5 kB/s
819200000 bytes (819 MB, 781 MiB) copied, 17384 s, 47.1 kB/s

bfq with low_latency = 0:

819200000 bytes (819 MB, 781 MiB) copied, 16257.9 s, 50.4 kB/s
819200000 bytes (819 MB, 781 MiB) copied, 17204.5 s, 47.6 kB/s
819200000 bytes (819 MB, 781 MiB) copied, 17220.6 s, 47.6 kB/s
819200000 bytes (819 MB, 781 MiB) copied, 17348.1 s, 47.2 kB/s

Regards,
Srivatsa
VMware Photon OS
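
P.S. In case it helps anyone reproduce numbers along these lines, here is
a rough bash sketch of the kind of setup described above. The device name
(sdX), mount points, and cgroup names are placeholders, it assumes the
legacy cgroup-v1 blkio controller, and oflag=dsync is only an assumed
stand-in for whichever synchronous-write mode the actual runs used:

  #!/bin/bash
  # Pick the scheduler and the BFQ low_latency mode for this run
  # (sdX is a placeholder for the device under test).
  echo bfq > /sys/block/sdX/queue/scheduler
  echo 1 > /sys/block/sdX/queue/iosched/low_latency   # use 0 for the second run

  for i in 1 2 3 4; do
      # One blkio cgroup per writer (cgroup v1).
      mkdir -p /sys/fs/cgroup/blkio/dd$i
      (
          # Move this subshell into the cgroup, then exec dd so it keeps
          # the same PID and runs entirely inside the cgroup.
          echo $BASHPID > /sys/fs/cgroup/blkio/dd$i/cgroup.procs
          exec dd if=/dev/zero of=/mnt/ext4-$i/testfile \
                  bs=8k count=100000 oflag=dsync
      ) &
  done
  wait

The per-group distribution can then be cross-checked against each group's
blkio.throttle.io_service_bytes, in addition to dd's own throughput output.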