Date: Wed, 29 Apr 2020 12:25:40 +0200
From: Jan Kara
To: Dave Chinner
Cc: Dan Schatzberg, Jens Axboe, Alexander Viro, Jan Kara, Amir Goldstein,
    Tejun Heo, Li Zefan, Johannes Weiner, Michal Hocko, Vladimir Davydov,
    Andrew Morton, Hugh Dickins, Roman Gushchin, Shakeel Butt, Chris Down,
    Yang Shi, Ingo Molnar, "Peter Zijlstra (Intel)", Mathieu Desnoyers,
    "Kirill A. Shutemov", Andrea Arcangeli, Thomas Gleixner,
    "open list:BLOCK LAYER", open list,
    "open list:FILESYSTEMS (VFS and infrastructure)",
    "open list:CONTROL GROUP (CGROUP)",
    "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
Subject: Re: [PATCH v5 0/4] Charge loop device i/o to issuing cgroup
Message-ID: <20200429102540.GA12716@quack2.suse.cz>
References: <20200428161355.6377-1-schatzberg.dan@gmail.com>
    <20200428214653.GD2005@dread.disaster.area>
In-Reply-To: <20200428214653.GD2005@dread.disaster.area>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 29-04-20 07:47:34, Dave Chinner wrote:
> On Tue, Apr 28, 2020 at 12:13:46PM -0400, Dan Schatzberg wrote:
> > The loop device runs all i/o to the backing file on a separate kworker
> > thread which results in all i/o being charged to the root cgroup. This
> > allows a loop device to be used to trivially bypass resource limits
> > and other policy. This patch series fixes this gap in accounting.
>
> How is this specific to the loop device? Isn't every block device
> that offloads work to a kthread or single worker thread susceptible
> to the same "exploit"?
>
> Or is the problem simply that the loop worker thread is simply not
> taking the IO's associated cgroup and submitting the IO with that
> cgroup associated with it? That seems kinda simple to fix....
>
> > Naively charging cgroups could result in priority inversions through
> > the single kworker thread in the case where multiple cgroups are
> > reading/writing to the same loop device.
>
> And that's where all the complexity and serialisation comes from,
> right?
>
> So, again: how is this unique to the loop device? Other block
> devices also offload IO to kthreads to do blocking work and IO
> submission to lower layers.
> Hence this seems to me like a generic
> "block device does IO submission from different task" issue that
> should be handled by generic infrastructure and not need to be
> reimplemented multiple times in every block device driver that
> offloads work to other threads...

Yeah, I was thinking about the same when reading the patch series
description. We already have some cgroup workarounds for btrfs kthreads if
I remember correctly, we have cgroup handling for flush workers, now we are
adding cgroup handling for loopback device workers, and soon I'd expect
someone to come with a need for DM/MD worker processes. IMHO it's getting
out of hand because the complexity spreads through the kernel with every
subsystem coming up with a slightly different solution to the problem, and
the number of kthreads gets multiplied by the number of cgroups.

So I agree some generic solution for how to approach IO throttling of
kthreads / workers would be desirable. OTOH I don't have a great idea what
the generic infrastructure should look like...

								Honza
-- 
Jan Kara
SUSE Labs, CR