Date: Mon, 11 Feb 2019 10:39:34 -0500
From: Josef Bacik
To: Andrea Righi
Cc: Josef Bacik, Paolo Valente, Tejun Heo, Li Zefan, Johannes Weiner, Jens Axboe, Vivek Goyal, Dennis Zhou, cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v2] blkcg: prevent priority inversion problem during sync()
Message-ID: <20190211153933.p26pu5jmbmisbkos@macbook-pro-91.dhcp.thefacebook.com>
In-Reply-To: <20190209140749.GB1910@xps-13>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Feb 09, 2019 at 03:07:49PM +0100, Andrea Righi wrote:
> This is an attempt to mitigate the priority inversion problem of a
> high-priority blkcg issuing a sync() and being forced to wait for the
> completion of all the writeback I/O generated by any other low-priority
> blkcg, causing massive latencies to processes that shouldn't be
> I/O-throttled at all.
>
> The idea is to save a list of blkcgs that are waiting for writeback:
> every time a sync() is executed the current blkcg is added to the list.
>
> Then, when I/O is throttled, if there's a blkcg waiting for writeback
> different than the current blkcg, no throttling is applied (we can
> probably refine this logic later, i.e., a better policy could be to
> adjust the throttling I/O rate using the blkcg with the highest speed
> from the list of waiters - priority inheritance, kinda).
>
> This topic has been discussed here:
> https://lwn.net/ml/cgroups/20190118103127.325-1-righi.andrea@gmail.com/
>
> But we didn't come up with any definitive solution.
>
> This patch is not a definitive solution either, but it's an attempt to
> continue addressing this issue and handling the priority inversion
> problem with sync() in a better way.
>
> Signed-off-by: Andrea Righi

Talked with Tejun about this some and we agreed the following is probably
the best way forward:

1) Pass the submitter of the wb work through to the writeback code.
2) sync() defaults to the root cg, and it writes everything as the root cg.
3) Add a flag to the cgroups that would make sync()'ers in that group only
   be allowed to write out things that belong to their own group.

This way we avoid the priority inversion of having things like systemd or a
random logged-in user run sync() and have to wait, and we keep low-prio
cgroups from causing big IO storms by syncing out stuff and getting upgraded
to root priority just to avoid the inversion.

Obviously by default we want this flag to be off since it's such a big
change, but people/setups really worried about this behavior (Facebook, for
instance, would likely use this flag) can go ahead and set it and be sure
we're getting good isolation while still avoiding the priority inversion
associated with running sync from a high-priority context.

Thanks,

Josef
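As a rough illustration of the throttling rule in the quoted RFC (remember every blkcg blocked in sync(), and skip throttling whenever a *different* blkcg is waiting), here is a small user-space sketch in C. It is not the patch's code: `struct blkcg_model`, `struct wb_waiters`, `wb_waiter_add()`, `wb_waiter_del()`, `should_throttle()` and the fixed `MAX_WAITERS` capacity are hypothetical names chosen for this example, and all locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a blkcg; the real struct blkcg is much richer. */
struct blkcg_model {
    int id;
};

#define MAX_WAITERS 16  /* hypothetical fixed capacity for the sketch */

/* Hypothetical set of blkcgs currently blocked in sync(). */
struct wb_waiters {
    const struct blkcg_model *list[MAX_WAITERS];
    size_t count;
};

/* sync() issued from @cg: record it as a writeback waiter. */
static bool wb_waiter_add(struct wb_waiters *w, const struct blkcg_model *cg)
{
    for (size_t i = 0; i < w->count; i++)
        if (w->list[i] == cg)
            return true;            /* already recorded */
    if (w->count == MAX_WAITERS)
        return false;               /* no room left in this sketch */
    w->list[w->count++] = cg;
    return true;
}

/* sync() from @cg finished: drop it (swap-remove, order not preserved). */
static void wb_waiter_del(struct wb_waiters *w, const struct blkcg_model *cg)
{
    for (size_t i = 0; i < w->count; i++) {
        if (w->list[i] == cg) {
            w->list[i] = w->list[--w->count];
            return;
        }
    }
}

/*
 * RFC rule: if any blkcg other than @cur is waiting for writeback,
 * do not throttle @cur's I/O, so the waiter is not stuck behind it.
 */
static bool should_throttle(const struct wb_waiters *w,
                            const struct blkcg_model *cur)
{
    assert(w != NULL && cur != NULL);
    for (size_t i = 0; i < w->count; i++)
        if (w->list[i] != cur)
            return false;
    return true;
}
```

Once a high-priority group registers as a waiter, I/O from any other group passes unthrottled until that waiter is removed. The priority-inheritance refinement mentioned in the RFC would instead derive a throttling rate from the waiters rather than returning a plain yes/no.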
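The three-step policy above can be modelled roughly as follows. This is a hypothetical sketch, not kernel code: `struct cg`, `struct wb_work`, `sync_work_for()`, `work_covers()` and the flag name `sync_own_only` are invented for illustration (the email does not name the flag). It also assumes that a flagged group's sync() is issued at that group's own priority rather than being upgraded to root, which is how the email describes keeping low-prio cgroups from causing IO storms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cgroup model. */
struct cg {
    int id;
    bool sync_own_only;   /* step 3: proposed opt-in flag, off by default */
};

static struct cg root_cg = { .id = 0, .sync_own_only = false };

/* Step 1: writeback work carries the cgroup that submitted it. */
struct wb_work {
    const struct cg *submitter;  /* cgroup whose priority the I/O runs at */
    const struct cg *scope;      /* NULL: write out everything; else only this cg */
};

/* Build the writeback work for a sync() issued from @caller. */
static struct wb_work sync_work_for(const struct cg *caller)
{
    struct wb_work work;

    assert(caller != NULL);
    if (caller->sync_own_only) {
        /* Step 3: flagged groups write only their own data, as themselves. */
        work.submitter = caller;
        work.scope = caller;
    } else {
        /* Step 2: default, write everything as the root cgroup. */
        work.submitter = &root_cg;
        work.scope = NULL;
    }
    return work;
}

/* Would @work write back a dirty page owned by @owner? */
static bool work_covers(const struct wb_work *work, const struct cg *owner)
{
    return work->scope == NULL || work->scope == owner;
}
```

With the flag off, a sync() from any group is promoted to root and flushes everything, which avoids the inversion; with it on, the group gets isolation but can only flush its own dirty data.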