Date: Wed, 29 Apr 2020 18:21:20 +0200
From: Jan Kara
To: Tejun Heo
Cc: Jan Kara, Dave Chinner, Dan Schatzberg, Jens Axboe, Alexander Viro,
	Amir Goldstein, Li Zefan, Johannes Weiner, Michal Hocko,
	Vladimir Davydov, Andrew Morton, Hugh Dickins, Roman Gushchin,
	Shakeel Butt, Chris Down, Yang Shi, Ingo Molnar,
	"Peter Zijlstra (Intel)", Mathieu Desnoyers, "Kirill A. Shutemov",
	Andrea Arcangeli, Thomas Gleixner, "open list:BLOCK LAYER",
	open list, "open list:FILESYSTEMS (VFS and infrastructure)",
	"open list:CONTROL GROUP (CGROUP)",
	"open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
Subject: Re: [PATCH v5 0/4] Charge loop device i/o to issuing cgroup
Message-ID: <20200429162120.GB12716@quack2.suse.cz>
References: <20200428161355.6377-1-schatzberg.dan@gmail.com>
	<20200428214653.GD2005@dread.disaster.area>
	<20200429102540.GA12716@quack2.suse.cz>
	<20200429142230.GE5462@mtj.thefacebook.com>
In-Reply-To: <20200429142230.GE5462@mtj.thefacebook.com>

On Wed 29-04-20 10:22:30, Tejun Heo wrote:
> Hello,
>
> On Wed, Apr 29, 2020 at 12:25:40PM +0200, Jan Kara wrote:
> > Yeah, I was thinking about the same thing when reading the patch
> > series description. If I remember correctly, we already have some
> > cgroup workarounds for btrfs kthreads, we have cgroup handling for
> > flush workers, now we are adding cgroup handling for loopback device
> > workers, and soon I'd expect someone to come with a need for DM/MD
> > worker processes. IMHO it's getting out of hand, because the
> > complexity spreads through the kernel, with every subsystem coming up
> > with a slightly different solution to the problem, and the number of
> > kthreads gets multiplied by the number of cgroups. So I agree that
> > some generic solution for how to approach IO throttling of kthreads /
> > workers would be desirable.
> >
> > OTOH I don't have a great idea of what the generic infrastructure
> > should look like...
>
> I don't really see a way around that. The only generic solution would
> be letting all IOs through as root and handling everything through
> backcharging, which we can already do, as backcharging is already in
> use to handle metadata updates that can't be controlled directly.
> However, doing that for all IOs would make the control quality a lot
> worse, as all control would be based on first incurring a deficit and
> then trying to punish the issuer after the fact.

Yeah, it would probably be somewhat worse, but OTOH, given that we'd
track the IO balance per cgroup, there would be a deficit only when a
cgroup is starting up, so it could be bearable. I'm more concerned about
the fact that for some IO controllers (e.g. blk-iolatency or the
work-conserving controllers), it is not obvious how to sensibly estimate
a cost to charge to a cgroup, since these controllers are more about
giving priority to one cgroup's IO in the presence of IO from another
cgroup than about enforcing some hard throughput limit.
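To make the backcharging model concrete, here is a minimal userspace C
sketch of the "incur deficit first, settle later" accounting being
discussed; every type and function name in it is hypothetical and
illustrative, not an existing kernel API:

/*
 * Hypothetical sketch of deficit-based backcharging: IO issued as root
 * on a cgroup's behalf is charged to the cgroup after the fact, and the
 * cgroup's new IO is throttled once its deficit exceeds a budget.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct cg_io_balance {
	int64_t charged_bytes;	/* IO already done on the cgroup's behalf */
	int64_t budget_bytes;	/* how far the cgroup may run into deficit */
};

/* Backcharge IO that a worker kthread issued as root. */
static void backcharge(struct cg_io_balance *bal, int64_t bytes)
{
	bal->charged_bytes += bytes;
}

/* Pay the deficit down periodically, e.g. from the controller's timer. */
static void refill(struct cg_io_balance *bal, int64_t bytes)
{
	bal->charged_bytes -= bytes;
	if (bal->charged_bytes < 0)
		bal->charged_bytes = 0;
}

/* Should the cgroup's own IO now be throttled to pay off its deficit? */
static bool must_throttle(const struct cg_io_balance *bal)
{
	return bal->charged_bytes > bal->budget_bytes;
}

int main(void)
{
	struct cg_io_balance bal = { 0, 1 << 20 };	/* 1 MiB budget */

	backcharge(&bal, 4 << 20);	/* 4 MiB already issued as root */
	printf("throttle: %d\n", must_throttle(&bal));	/* 1: in deficit */
	refill(&bal, 4 << 20);
	printf("throttle: %d\n", must_throttle(&bal));	/* 0: paid off */
	return 0;
}

Note this has exactly the "punish after the fact" shape: the first burst
always gets through, which is why the control-quality concern above is
mostly about the period when a cgroup starts up.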
> The infrastructure work done to make IO control work for btrfs is
> generic, and the changes needed on the btrfs side were pretty small.
> Most of the work was identifying non-regular IO pathways (bouncing
> through different kthreads and whatnot) and making sure they annotate
> IO ownership and use the needed mechanisms correctly. The biggest
> challenge probably is ensuring that the filesystem doesn't add ordering
> dependencies between separate data IOs, which is a nice property to
> have with or without cgroup support.
>
> That leaves the nesting drivers, loop and md/dm. Given that they sit
> in the middle of the IO stack and proxy a lot of its roles, they'll
> have to be updated to be transparent in terms of cgroup ownership if
> IO control is gonna work through them. Maybe we can have common
> infrastructure shared between loop, dm and md, but there aren't many
> of them and they may also be sufficiently different. idk
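As an illustration of what "transparent in terms of cgroup ownership"
means for a nesting driver: the request it sends down the stack must
inherit the original issuer's cgroup association rather than being
accounted to the driver's worker. The toy userspace C model below uses
purely illustrative types and names; in-tree, the blk-cgroup code
provides association-cloning helpers along the lines of
bio_clone_blkg_association() for this purpose.

/*
 * Toy model (not kernel code): a transparent nesting driver copies the
 * issuer's cgroup ownership onto the remapped lower-level request.
 */
#include <stdio.h>

struct cgroup {
	const char *name;
};

struct io_request {
	struct cgroup *owner;	/* cgroup the IO is accounted to */
	long sector;
};

/* Remap a request to the lower device, preserving cgroup ownership. */
static void remap_request(const struct io_request *src,
			  struct io_request *dst, long offset)
{
	dst->owner = src->owner;	/* inherit; don't reassign to worker */
	dst->sector = src->sector + offset;
}

int main(void)
{
	struct cgroup app = { "user.slice/app" };
	struct io_request orig = { &app, 128 };
	struct io_request lower;

	remap_request(&orig, &lower, 2048);
	printf("lower-level IO charged to %s\n", lower.owner->name);
	return 0;
}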
Yeah, as I said, I don't really have a better alternative :-|

								Honza
-- 
Jan Kara
SUSE Labs, CR