From: Andrea Righi
To: Josef Bacik, Tejun Heo
Cc: Li Zefan, Paolo Valente, Johannes Weiner, Jens Axboe, Vivek Goyal,
    Dennis Zhou, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 0/3] blkcg: sync() isolation
Date: Thu, 7 Mar 2019 19:08:31 +0100
Message-Id: <20190307180834.22008-1-andrea.righi@canonical.com>

= Problem =

When sync() is executed from a high-priority cgroup, the process is forced to
wait for the completion of all outstanding writeback I/O, potentially
including I/O that was originally generated by low-priority cgroups.

This may cause massive latencies for unrelated processes (even those running
in the root cgroup) that shouldn't be I/O-throttled at all, similar to a
classic priority inversion problem.

This topic has been previously discussed here:
https://patchwork.kernel.org/patch/10804489/

[ Thanks to Josef for the suggestions ]

= Solution =

Here's a slightly more detailed description of the solution, as suggested by
Josef and Tejun (let me know if I misunderstood or missed anything):

 - track the submitter of wb work (when issuing sync()) and the cgroup that
   originally dirtied any inode, then use this information to determine the
   proper "sync() domain" and decide whether the I/O speed needs to be
   boosted in order to prevent priority-inversion problems

 - by default, when sync() is issued, all the outstanding writeback I/O is
   boosted to maximum speed to prevent priority-inversion problems

 - if sync() is issued by the same throttled cgroup that generated the dirty
   pages, the corresponding writeback I/O is still throttled normally

 - add a new flag to cgroups (io.sync_isolation) that makes sync()'ers in
   that cgroup write out only the dirty pages that belong to their own cgroup

= Test =

Here's a trivial example to trigger the problem:

 - create 2 cgroups: cg1 and cg2

 # mkdir /sys/fs/cgroup/unified/cg1
 # mkdir /sys/fs/cgroup/unified/cg2

 - set an I/O limit of 1MB/s on cg1/io.max:

 # echo "8:0 rbps=1048576 wbps=1048576" > /sys/fs/cgroup/unified/cg1/io.max

 - run a write-intensive workload in cg1

 # cat /proc/self/cgroup
 0::/cg1
 # fio --rw=write --bs=1M --size=32M --numjobs=16 --name=writer --time_based --runtime=30

 - run sync in cg2 and measure time

== Vanilla kernel ==

 # cat /proc/self/cgroup
 0::/cg2

 # time sync
 real	9m32,618s
 user	0m0,000s
 sys	0m0,018s

Ideally "sync" should complete almost immediately, because cg2 is unlimited
and it's not doing any I/O at all. Instead, the entire system becomes
sluggish while waiting for the throttled writeback I/O to complete, and many
hung task timeout warnings are triggered.
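
For reference, the rules listed under "= Solution =" can be restated as a
small decision table. The following is only an illustrative Python model, not
the kernel implementation: the names Action and sync_action are hypothetical,
cgroups are treated as flat names (the hierarchy is ignored), and "SKIP"
simply means that this sync() does not write out those pages.

```python
from enum import Enum


class Action(Enum):
    NORMAL = "normal"  # writeback stays subject to the dirtier's own limits
    BOOST = "boost"    # writeback is pushed at maximum speed
    SKIP = "skip"      # pages are not written out by this sync()


def sync_action(syncer: str, dirtier: str, sync_isolation: bool = False) -> Action:
    """Model of how sync() from `syncer` treats pages dirtied by `dirtier`.

    Hypothetical helper: it only mirrors the bullet list in the cover
    letter and assumes flat cgroup names.
    """
    if syncer == dirtier:
        # sync() issued by the cgroup that generated the dirty pages:
        # the writeback I/O is still throttled normally.
        return Action.NORMAL
    if sync_isolation:
        # io.sync_isolation=1: the sync()'er may only write out dirty
        # pages that belong to its own cgroup.
        return Action.SKIP
    # Default: boost outstanding writeback to avoid priority inversion.
    return Action.BOOST


if __name__ == "__main__":
    # Scenario from the test: cg1 dirties pages, cg2 runs sync.
    for isolation in (False, True):
        action = sync_action("cg2", "cg1", sync_isolation=isolation)
        print(f"io.sync_isolation={int(isolation)}: {action.value}")
```

In the test scenario (cg1 dirties the pages, cg2 runs sync), this model
yields "boost" with the default setting and "skip" with io.sync_isolation=1.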

== With this patch set applied and io.sync_isolation=0 (default) ==

 # cat /proc/self/cgroup
 0::/cg2

 # time sync
 real	0m2,044s
 user	0m0,009s
 sys	0m0,000s

[ Time range goes from 2s to 4s ]

== With this patch set applied and io.sync_isolation=1 ==

 # cat /proc/self/cgroup
 0::/cg2

 # time sync
 real	0m0,768s
 user	0m0,001s
 sys	0m0,008s

[ Time range goes from 0.7s to 1.6s ]

Changes in v2:
 - fix: properly keep track of sync waiters when a blkcg is writing to
   many block devices at the same time

Andrea Righi (3):
  blkcg: prevent priority inversion problem during sync()
  blkcg: introduce io.sync_isolation
  blkcg: implement sync() isolation

 Documentation/admin-guide/cgroup-v2.rst |   9 +++
 block/blk-cgroup.c                      | 178 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/blk-throttle.c                    |  48 ++++++++++++++-
 fs/fs-writeback.c                       |  57 +++++++++++++++++-
 fs/inode.c                              |   1 +
 fs/sync.c                               |   8 ++-
 include/linux/backing-dev-defs.h        |   2 +
 include/linux/blk-cgroup.h              |  52 +++++++++++++++++
 include/linux/fs.h                      |   4 ++
 mm/backing-dev.c                        |   2 +
 mm/page-writeback.c                     |   1 +
 11 files changed, 355 insertions(+), 7 deletions(-)