Received: by 2002:a05:6a10:17d3:0:0:0:0 with SMTP id hz19csp2304864pxb; Mon, 12 Apr 2021 21:30:34 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxQABtsZ9YSVL64LeFhAtgLe+PPdsf0itXpTAueR5qTIMmXaiNWNwt6CuJ+v954lPzHKNEp X-Received: by 2002:a05:6402:1517:: with SMTP id f23mr33465655edw.272.1618288234381; Mon, 12 Apr 2021 21:30:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1618288234; cv=none; d=google.com; s=arc-20160816; b=vBKUdxxp8qTYSmmsBxfpWM0+YIX945Z+G0r357IdrImko15VrcGJop+eLQonMqBHN0 zZcVCMniwsVUZ43Fq3wb4fpfyISFZTGgQ4TRKh7vTJ8v51mDBxggjSQDhexCHMotRygZ qZqFnfwRUVooftrJZ8TxLxADvTd69Bxl2gX8c+mAuElWhUlDdyJqC+6XXuiQmwet03Nv eplrgjlweYxhyulZgM8Jw923TuIDabT+hVTqfFmYyQIkKQmoOmHU/j4x1bnPVnBRk3uV ZmUv13l7MLmzXiaTiXtDQAOt0mHd17mj+1VjLVI1/x/PPE3lv5Lc+s9XX/Q3zAIy6TXR F0Pw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=Kfoth3DUauDj/zrpRCce8PdDzNFJJLR4/XpIPoAKXQU=; b=ceyqYRGz9pL5su/GTiWl42QxOxXmJ2bVuG4751kOG3uo979QwOgL4/B+ma/hkt1riz HydPpVw4T8uLlqJlsiQqZwYL8SKAkdJa1+2DXsNcihIwcFs9Hx/BbFoEkZBqTuq1dPCq CgRoUsX7QcsRyI0aF2F003R+Fd4KmP1iQsmUvfZTbiS3gaSxDDsO6M3g24u4XlMcxppK XIXmXFRivqH4wd5HJ+Kcrol2EBr6F8jixihPrJr2FsnCviDM8HrFBAcjoOoxIQE7bMAS nZ1+f35wBUJDB0wTX9fP1h2qU8IZnqaAssBrchIhCe/dSkKzXEU9pAoclEiv0mn1D8vL Jr4A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=Owzv05Wh; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id ox26si9263553ejb.146.2021.04.12.21.30.08; Mon, 12 Apr 2021 21:30:34 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@cmpxchg-org.20150623.gappssmtp.com header.s=20150623 header.b=Owzv05Wh; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=cmpxchg.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242991AbhDLPqE (ORCPT + 99 others); Mon, 12 Apr 2021 11:46:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43486 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242882AbhDLPqD (ORCPT ); Mon, 12 Apr 2021 11:46:03 -0400 Received: from mail-qt1-x82b.google.com (mail-qt1-x82b.google.com [IPv6:2607:f8b0:4864:20::82b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C3F4C06174A for ; Mon, 12 Apr 2021 08:45:45 -0700 (PDT) Received: by mail-qt1-x82b.google.com with SMTP id y12so10251140qtx.11 for ; Mon, 12 Apr 2021 08:45:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=Kfoth3DUauDj/zrpRCce8PdDzNFJJLR4/XpIPoAKXQU=; b=Owzv05Whms1oDjuZM/6YbmVlFXK8+Ru+WOmEeTUw67LZ9MzyO3tjJcMFUYIgu8JsB/ F+sdUXv/uPOYDFFOuLUXCEhTfHNicqCX2k8/+9mUBv6WL89MISNCnmgx/F/je51Oqvhh ZKfJQzNtFnPaqxkS5robFFOh/ZI2NYKdCGFjCaqgQJrNwko2SThfMxcwbaWyFYSq2cb8 87AjDiof+oM9uMz9xSWz3ehvEmrX6eySasPvyQHIVtg+6Vr8Rtwy44bpcBJoyOZx/5yy YgGWHnrkJw2Q77RhyHZxAINS4Z2N8V8UXge4yxiozkuQWMOPCzhfLxSHpwy4L4YjZVW3 kZLw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=Kfoth3DUauDj/zrpRCce8PdDzNFJJLR4/XpIPoAKXQU=; b=YzLQT4WCLokfCbrXOzOPzE/WeBQ+ex4dgsUuxlJ87gE3hqNVnKwavrka0p9KwWDkJn KgFD8HLD7zsVnNF+NkCOQNJpeol/CX58JLZBn79UKPP3+hmYFbwL+6NbB4+btVeaJ7T+ 4diQrFQYN8tHTld1qbcl3sAdn+gEHM1ecFvfqmAL1N7Xp5ioaNVxScGicc8bOHwyMeNZ WgXr9IGL/vv7Lon2gcYRv9ok0TpuJYId+NXjLKKcIK/w12JCGzGyMIuP83NfQGO6WW3V v/zVBZY7X3l+Ang+XobdzezvrFYyoXF9UIJGbD55yWGKMyzzq9h1LbrPDJPHXL84SDAN c8Iw== X-Gm-Message-State: AOAM531yX7wByrk6Y5q+tRJ6MDhZZMWmIQEK6n5/0UYuZ1oA/YD+Rkvt T711pro9oX4m4g0Q3lI7GUzXzQ== X-Received: by 2002:ac8:4b7b:: with SMTP id g27mr25360040qts.220.1618242344613; Mon, 12 Apr 2021 08:45:44 -0700 (PDT) Received: from localhost (70.44.39.90.res-cmts.bus.ptd.net. [70.44.39.90]) by smtp.gmail.com with ESMTPSA id k17sm1523563qtp.26.2021.04.12.08.45.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 12 Apr 2021 08:45:44 -0700 (PDT) Date: Mon, 12 Apr 2021 11:45:43 -0400 From: Johannes Weiner To: Andrew Morton , Jens Axboe Cc: Dan Schatzberg , Tejun Heo , Zefan Li , Michal Hocko , Vladimir Davydov , Hugh Dickins , Shakeel Butt , Roman Gushchin , Muchun Song , Yang Shi , Alex Shi , Alexander Duyck , Wei Yang , "open list:BLOCK LAYER" , open list , "open list:CONTROL GROUP (CGROUP)" , "open list:MEMORY MANAGEMENT" Subject: Re: [PATCH V12 0/3] Charge loop device i/o to issuing cgroup Message-ID: References: <20210402191638.3249835-1-schatzberg.dan@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210402191638.3249835-1-schatzberg.dan@gmail.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org It looks like all feedback has been addressed and there hasn't been any new activity on it in a while. As per the suggestion last time [1], Andrew, Jens, could this go through the -mm tree to deal with the memcg conflicts? [1] https://lore.kernel.org/lkml/CALvZod6FMQQC17Zsu9xoKs=dFWaJdMC2Qk3YiDPUUQHx8teLYg@mail.gmail.com/ On Fri, Apr 02, 2021 at 12:16:31PM -0700, Dan Schatzberg wrote: > No major changes, rebased on top of latest mm tree > > Changes since V12: > > * Small change to get_mem_cgroup_from_mm to avoid needing > get_active_memcg > > Changes since V11: > > * Removed WQ_MEM_RECLAIM flag from loop workqueue. Technically, this > can be driven by writeback, but this was causing a warning in xfs > and likely other filesystems aren't equipped to be driven by reclaim > at the VFS layer. > * Included a small fix from Colin Ian King. > * reworked get_mem_cgroup_from_mm to institute the necessary charge > priority. > > Changes since V10: > > * Added page-cache charging to mm: Charge active memcg when no mm is set > > Changes since V9: > > * Rebased against linus's branch which now includes Roman Gushchin's > patch this series is based off of > > Changes since V8: > > * Rebased on top of Roman Gushchin's patch > (https://lkml.org/lkml/2020/8/21/1464) which provides the nesting > support for setting active memcg. Dropped the patch from this series > that did the same thing. > > Changes since V7: > > * Rebased against linus's branch > > Changes since V6: > > * Added separate spinlock for worker synchronization > * Minor style changes > > Changes since V5: > > * Fixed a missing css_put when failing to allocate a worker > * Minor style changes > > Changes since V4: > > Only patches 1 and 2 have changed. > > * Fixed irq lock ordering bug > * Simplified loop detach > * Added support for nesting memalloc_use_memcg > > Changes since V3: > > * Fix race on loop device destruction and deferred worker cleanup > * Ensure charge on shmem_swapin_page works just like getpage > * Minor style changes > > Changes since V2: > > * Deferred destruction of workqueue items so in the common case there > is no allocation needed > > Changes since V1: > > * Split out and reordered patches so cgroup charging changes are > separate from kworker -> workqueue change > > * Add mem_css to struct loop_cmd to simplify logic > > The loop device runs all i/o to the backing file on a separate kworker > thread which results in all i/o being charged to the root cgroup. This > allows a loop device to be used to trivially bypass resource limits > and other policy. This patch series fixes this gap in accounting. > > A simple script to demonstrate this behavior on cgroupv2 machine: > > ''' > #!/bin/bash > set -e > > CGROUP=/sys/fs/cgroup/test.slice > LOOP_DEV=/dev/loop0 > > if [[ ! -d $CGROUP ]] > then > sudo mkdir $CGROUP > fi > > grep oom_kill $CGROUP/memory.events > > # Set a memory limit, write more than that limit to tmpfs -> OOM kill > sudo unshare -m bash -c " > echo \$\$ > $CGROUP/cgroup.procs; > echo 0 > $CGROUP/memory.swap.max; > echo 64M > $CGROUP/memory.max; > mount -t tmpfs -o size=512m tmpfs /tmp; > dd if=/dev/zero of=/tmp/file bs=1M count=256" || true > > grep oom_kill $CGROUP/memory.events > > # Set a memory limit, write more than that limit through loopback > # device -> no OOM kill > sudo unshare -m bash -c " > echo \$\$ > $CGROUP/cgroup.procs; > echo 0 > $CGROUP/memory.swap.max; > echo 64M > $CGROUP/memory.max; > mount -t tmpfs -o size=512m tmpfs /tmp; > truncate -s 512m /tmp/backing_file > losetup $LOOP_DEV /tmp/backing_file > dd if=/dev/zero of=$LOOP_DEV bs=1M count=256; > losetup -D $LOOP_DEV" || true > > grep oom_kill $CGROUP/memory.events > ''' > > Naively charging cgroups could result in priority inversions through > the single kworker thread in the case where multiple cgroups are > reading/writing to the same loop device. This patch series does some > minor modification to the loop driver so that each cgroup can make > forward progress independently to avoid this inversion. > > With this patch series applied, the above script triggers OOM kills > when writing through the loop device as expected. > > Dan Schatzberg (3): > loop: Use worker per cgroup instead of kworker > mm: Charge active memcg when no mm is set > loop: Charge i/o to mem and blk cg > > drivers/block/loop.c | 244 ++++++++++++++++++++++++++++++------- > drivers/block/loop.h | 15 ++- > include/linux/memcontrol.h | 6 + > kernel/cgroup/cgroup.c | 1 + > mm/filemap.c | 2 +- > mm/memcontrol.c | 49 +++++--- > mm/shmem.c | 4 +- > 7 files changed, 253 insertions(+), 68 deletions(-) > > -- > 2.30.2 > >