Date: Wed, 29 Apr 2020 07:47:34 +1000
From: Dave Chinner
To: Dan Schatzberg
Cc: Jens Axboe, Alexander Viro, Jan Kara, Amir Goldstein, Tejun Heo,
    Li Zefan, Johannes Weiner, Michal Hocko, Vladimir Davydov,
    Andrew Morton, Hugh Dickins, Roman Gushchin, Shakeel Butt,
    Chris Down, Yang Shi, Ingo Molnar, "Peter Zijlstra (Intel)",
    Mathieu Desnoyers, "Kirill A. Shutemov", Andrea Arcangeli,
    Thomas Gleixner, "open list:BLOCK LAYER", open list,
    "open list:FILESYSTEMS (VFS and infrastructure)",
    "open list:CONTROL GROUP (CGROUP)",
    "open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
Subject: Re: [PATCH v5 0/4] Charge loop device i/o to issuing cgroup
Message-ID: <20200428214653.GD2005@dread.disaster.area>
In-Reply-To: <20200428161355.6377-1-schatzberg.dan@gmail.com>

On Tue, Apr 28, 2020 at 12:13:46PM -0400, Dan Schatzberg wrote:
> The loop device runs all i/o to the backing file on a separate kworker
> thread which results in all i/o being charged to the root cgroup. This
> allows a loop device to be used to trivially bypass resource limits
> and other policy. This patch series fixes this gap in accounting.

How is this specific to the loop device? Isn't every block device
that offloads work to a kthread or single worker thread susceptible
to the same "exploit"?

Or is the problem simply that the loop worker thread is simply not
taking the IO's associated cgroup and submitting the IO with that
cgroup associated with it? That seems kinda simple to fix....

> Naively charging cgroups could result in priority inversions through
> the single kworker thread in the case where multiple cgroups are
> reading/writing to the same loop device.

And that's where all the complexity and serialisation comes from,
right?

So, again: how is this unique to the loop device?
Other block devices also offload IO to kthreads to do blocking work and IO
submission to lower layers. Hence this seems to me like a generic
"block device does IO submission from different task" issue that
should be handled by generic infrastructure and not need to be
reimplemented multiple times in every block device driver that
offloads work to other threads...

> This patch series does some
> minor modification to the loop driver so that each cgroup can make
> forward progress independently to avoid this inversion.
>
> With this patch series applied, the above script triggers OOM kills
> when writing through the loop device as expected.

NACK!

The IO that is disallowed should fail with ENOMEM or some similar
error, not trigger an OOM kill that shoots some innocent bystander
in the head. That's worse than using BUG() to report errors...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com