Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp3050514ybt; Mon, 29 Jun 2020 13:59:06 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx6grQvhY7sWIPbAKPWOEgUFptY/d4uQ4PQp4sJfab6QbccmGLJk37UfYBxAtFlb/MXKyuK X-Received: by 2002:a50:d0dc:: with SMTP id g28mr18894050edf.169.1593464346356; Mon, 29 Jun 2020 13:59:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1593464346; cv=none; d=google.com; s=arc-20160816; b=g86db9IuvYNgKN0gjKWbDYZocngUmygLkwOBIbcIJGO856OoQqrYK3ZL1ehtixowj7 CD5sYwP99LUE6pUpU79Mn0T7JtJYXcpfbtshBXmioLZuS01jWm8e0Ff5MOA2bJaNNNFR 6zbTKSnbnwyWYoS1QiHno0C0QRGxGJa/iHX0GgrREEMY5627hHpfBa6OF9K3bkYyB9GC dzGwbMfXURQT4HkSTKpZwz9GJDaz0a06hSaj3yAUeJ81Fm4tDzACt/jqqL46k4ikgSPN iCPhMPYfsPg1IOJDzwJTeHMHcingWNsk6e541T1F/LYfLTfx6ldaW1HXhWRzzEQyy7/9 ft8Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=mPrydynbK5+rnDYJyI/EITf6/MzUF0CRG2o081Vq5dc=; b=KtWAvIvRXh8T+hnKev4YW0THLV6sL4uCKKiL6ScyfeH4hI1zNizZOo981aYHjTReKu i+Ly20rIYNk/M69dgHkyQ9pWdCKAkY/cojUQ7mRrWM7AWtr9oJGzNNsf0rgKGP6/TyiG 7dAn02wipxiRKhRDmrjgWyWPd462zOsmJMppzRmV0XByfJUnJmjeANuQhsy15zBJp/tk vt00uqTOjYf1A+L6zQ/mhLzTvqhF5xtwPBFMVPGwwrKYVDEv9xUkjj6JhQLXrZJo9kvk XZ4aDhkQsmSW/B/JJ8yL2JNOqUC7LkDNTvAagMHWmzmTmC0W/CILKJAdmBDu0fD7QQ3f Y0Mg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id a9si301813edn.530.2020.06.29.13.58.43; Mon, 29 Jun 2020 13:59:06 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2389912AbgF2U5Y (ORCPT + 99 others); Mon, 29 Jun 2020 16:57:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43380 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729386AbgF2TNG (ORCPT ); Mon, 29 Jun 2020 15:13:06 -0400 Received: from mail-ej1-x641.google.com (mail-ej1-x641.google.com [IPv6:2a00:1450:4864:20::641]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 01BD9C0147C0; Mon, 29 Jun 2020 01:22:13 -0700 (PDT) Received: by mail-ej1-x641.google.com with SMTP id dr13so15669757ejc.3; Mon, 29 Jun 2020 01:22:12 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=mPrydynbK5+rnDYJyI/EITf6/MzUF0CRG2o081Vq5dc=; b=cAUwQQoKDC06YT4LHqdA4xDyz4JhAwt/ytb8hC2uFAgm3TaJ0OqssW2rznu5kn46N0 JEfbQYCX8VjtVSYRhGVHgjVYgSFWExMm1vkZXi1UwgkfXGAynHOqVs6b8Uyz4VSmSm7d g8vZqCOtQq89cSyg0hzhyyAdIElZ9ayjmNLD5XwRybW4IiJvn7V3HaL/koa8Rn1Zk4F7 LXKY7Wn3Qug30+dPR0hRk132yk2/ruEAxMLBk/x6JjQrK0mqQawVfXpByLN2YxpGFNJA +CRptDvNTYQkuOkCl7DSICXv2c9+yGZNOBEZtT2ghVeHYlDtKRZ0N+kEKvfD1WUSSt2Q gm9g== X-Gm-Message-State: AOAM533eUkhnbbKD9GNDNyDTvijEnMJmgU0VUOBWOli7pTBGPUWkLY6Y sQL3l0vxf7X3S8m+C5hMF7o= X-Received: by 2002:a17:906:fa9b:: with SMTP id lt27mr12396115ejb.365.1593418931557; Mon, 29 Jun 2020 01:22:11 -0700 (PDT) Received: from localhost (ip-37-188-168-3.eurotel.cz. [37.188.168.3]) by smtp.gmail.com with ESMTPSA id x10sm15592704ejc.46.2020.06.29.01.22.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Jun 2020 01:22:10 -0700 (PDT) Date: Mon, 29 Jun 2020 10:22:09 +0200 From: Michal Hocko To: Dave Chinner Cc: Mikulas Patocka , "Matthew Wilcox (Oracle)" , linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-xfs@vger.kernel.org, dm-devel@redhat.com, Jens Axboe , NeilBrown Subject: Re: [PATCH 0/6] Overhaul memalloc_no* Message-ID: <20200629082209.GC32461@dhcp22.suse.cz> References: <20200625113122.7540-1-willy@infradead.org> <20200626230847.GI2005@dread.disaster.area> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200626230847.GI2005@dread.disaster.area> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat 27-06-20 09:08:47, Dave Chinner wrote: > On Fri, Jun 26, 2020 at 11:02:19AM -0400, Mikulas Patocka wrote: > > Hi > > > > I suggest to join memalloc_noio and memalloc_nofs into just one flag that > > prevents both filesystem recursion and i/o recursion. > > > > Note that any I/O can recurse into a filesystem via the loop device, thus > > it doesn't make much sense to have a context where PF_MEMALLOC_NOFS is set > > and PF_MEMALLOC_NOIO is not set. > > Correct me if I'm wrong, but I think that will prevent swapping from > GFP_NOFS memory reclaim contexts. IOWs, this will substantially > change the behaviour of the memory reclaim system under sustained > GFP_NOFS memory pressure. Sustained GFP_NOFS memory pressure is > quite common, so I really don't think we want to telling memory > reclaim "you can't do IO at all" when all we are trying to do is > prevent recursion back into the same filesystem. > > Given that the loop device IO path already operates under > memalloc_noio context, (i.e. the recursion restriction is applied in > only the context that needs is) I see no reason for making that a > global reclaim limitation.... > > In reality, we need to be moving the other way with GFP_NOFS - to > fine grained anti-recursion contexts, not more broad contexts. Absolutely agreed! It is not really hard to see system struggling due to heavy FS metadata workload while there are objects which could be reclaimed. > That is, GFP_NOFS prevents recursion into any filesystem, not just > the one that we are actively operating on and needing to prevent > recursion back into. We can safely have reclaim do relcaim work on > other filesysetms without fear of recursion deadlocks, but the > memory reclaim infrastructure does not provide that capability.(*) > > e.g. if memalloc_nofs_save() took a reclaim context structure that > the filesystem put the superblock, the superblock's nesting depth > (because layering on loop devices can create cross-filesystem > recursion dependencies), and any other filesyetm private data the > fs wanted to add, we could actually have reclaim only avoid reclaim > from filesytsems where there is a deadlock possiblity. e.g: > > - superblock nesting depth is different, apply GFP_NOFS > reclaim unconditionally > - superblock different apply GFP_KERNEL reclaim > - superblock the same, pass context to filesystem to > decide if reclaim from the sueprblock is safe. > > At this point, we get memory reclaim able to always be able to > reclaim from filesystems that are not at risk of recursion > deadlocks. Direct reclaim is much more likely to be able to make > progress now because it is much less restricted in what it can > reclaim. That's going to make direct relcaim faster and more > efficient, and taht's the ultimate goal we are aiming to acheive > here... Yes, we have discussed something like that few years back at LSFMM IIRC. The scoped NOFS/NOIO api was just a first step to reduce explicit NOFS/NOIO usage with a hope that we will get no-recursion entry points much more well defined and get rid of many instances where "this is a fs code so it has to use NOFS gfp mask". Some of that has happened and that is really great. On the other hand many people still like to use that api as a workaround for an immediate problem because no-recursion scopes are much harder to recognize unless you are supper familiar with the specific fs/IO layer implementation. So this is definitely not a project for somebody to go over all code and just do the clean up. Thanks! -- Michal Hocko SUSE Labs