From: Andrea Righi
To: Josef Bacik, Tejun Heo
Cc: Li Zefan, Paolo Valente, Johannes Weiner, Jens Axboe, Vivek Goyal,
    Dennis Zhou, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 0/3] blkcg: sync() isolation
Date: Tue, 19 Feb 2019 16:27:09 +0100
Message-Id: <20190219152712.9855-1-righi.andrea@gmail.com>
X-Mailer: git-send-email 2.17.1

= Problem =

When sync() is executed from a high-priority cgroup, the process is forced to
wait for the completion of all outstanding writeback I/O, potentially
including I/O that was originally generated by low-priority cgroups.

This can cause massive latencies for unrelated processes (even those running
in the root cgroup) that shouldn't be I/O-throttled at all, much like a
classic priority inversion problem.

This topic has been previously discussed here:
https://patchwork.kernel.org/patch/10804489/

[ Thanks to Josef for the suggestions ]

= Solution =

Here's a slightly more detailed description of the solution, as suggested by
Josef and Tejun (let me know if I misunderstood or missed anything):

 - track the submitter of wb work (when issuing sync()) and the cgroup that
   originally dirtied any inode, then use this information to determine the
   proper "sync() domain" and decide whether the I/O speed needs to be
   boosted in order to prevent priority-inversion problems

 - by default, when sync() is issued, all the outstanding writeback I/O is
   boosted to maximum speed to prevent priority-inversion problems

 - if sync() is issued by the same throttled cgroup that generated the dirty
   pages, the corresponding writeback I/O is still throttled normally

 - add a new flag to cgroups (io.sync_isolation) that restricts sync()'ers
   in that cgroup to writing out only the dirty pages that belong to their
   own cgroup

= Test =

Here's a trivial example to trigger the problem:

 - create 2 cgroups: cg1 and cg2

 # mkdir /sys/fs/cgroup/unified/cg1
 # mkdir /sys/fs/cgroup/unified/cg2

 - set an I/O limit of 1MB/s on cg1/io.max:

 # echo "8:0 rbps=1048576 wbps=1048576" > /sys/fs/cgroup/unified/cg1/io.max

 - run a write-intensive workload in cg1

 # cat /proc/self/cgroup
 0::/cg1

 # fio --rw=write --bs=1M --size=32M --numjobs=16 --name=writer --time_based --runtime=30

 - run sync in cg2 and measure time

== Vanilla kernel ==

 # cat /proc/self/cgroup
 0::/cg2

 # time sync
 real    9m32,618s
 user    0m0,000s
 sys     0m0,018s

Ideally "sync" should complete almost immediately, because cg2 is unlimited
and is not doing any I/O at all. Instead, the entire system becomes sluggish
while waiting for the throttled writeback I/O to complete, and many hung task
timeout warnings are triggered.

== With this patch set applied and io.sync_isolation=0 (default) ==

 # cat /proc/self/cgroup
 0::/cg2

 # time sync
 real    0m2,044s
 user    0m0,009s
 sys     0m0,000s

[ Time range goes from 2s to 4s ]

== With this patch set applied and io.sync_isolation=1 ==

 # cat /proc/self/cgroup
 0::/cg2

 # time sync
 real    0m0,768s
 user    0m0,001s
 sys     0m0,008s

[ Time range goes from 0.7s to 1.6s ]

Andrea Righi (3):
  blkcg: prevent priority inversion problem during sync()
  blkcg: introduce io.sync_isolation
  blkcg: implement sync() isolation

 Documentation/admin-guide/cgroup-v2.rst |   9 +++
 block/blk-cgroup.c                      | 120 ++++++++++++++++++++++++++++++++
 block/blk-throttle.c                    |  48 ++++++++++++-
 fs/fs-writeback.c                       |  57 ++++++++++++++-
 fs/inode.c                              |   1 +
 fs/sync.c                               |   8 ++-
 include/linux/backing-dev-defs.h        |   2 +
 include/linux/blk-cgroup.h              |  52 ++++++++++++++
 include/linux/fs.h                      |   4 ++
 mm/backing-dev.c                        |   2 +
 mm/page-writeback.c                     |   1 +
 11 files changed, 297 insertions(+), 7 deletions(-)
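
For readers who want the policy from the Solution section in one place, here
is a rough userspace model of the decision it describes. It is only a sketch:
the names (Cgroup, writeback_action, and the BOOSTED/NORMAL/SKIP labels) are
hypothetical and do not come from the patches, and it assumes a flat cgroup
hierarchy, ignoring how descendants would be handled.

```python
from dataclasses import dataclass

# Hypothetical labels for the three outcomes described in the cover letter.
BOOSTED = "boosted"   # writeback runs at maximum speed, ignoring throttling
NORMAL = "normal"     # writeback stays subject to the cgroup's usual limits
SKIP = "skip"         # this sync() does not write these pages out at all


@dataclass(frozen=True)
class Cgroup:
    """Minimal stand-in for a cgroup (assumption: flat hierarchy)."""
    name: str
    sync_isolation: bool = False  # models the proposed io.sync_isolation flag


def writeback_action(syncer: Cgroup, dirtier: Cgroup) -> str:
    """Decide how a sync() issued from `syncer` treats pages dirtied by `dirtier`."""
    same_cgroup = syncer.name == dirtier.name
    if syncer.sync_isolation and not same_cgroup:
        # An isolated sync()'er only writes out its own dirty pages.
        return SKIP
    if same_cgroup:
        # The cgroup that dirtied the pages is still throttled normally.
        return NORMAL
    # Default: boost foreign writeback to avoid priority inversion.
    return BOOSTED


cg1 = Cgroup("cg1")                                # throttled writer
cg2 = Cgroup("cg2")                                # unlimited sync()'er
cg2_isolated = Cgroup("cg2", sync_isolation=True)  # io.sync_isolation=1

print(writeback_action(cg2, cg1))           # boosted
print(writeback_action(cg1, cg1))           # normal
print(writeback_action(cg2_isolated, cg1))  # skip
```

The three calls correspond to the test scenarios above: sync from cg2 with
io.sync_isolation=0, sync from the throttled cgroup itself, and sync from cg2
with io.sync_isolation=1.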