From: Shakeel Butt
Date: Fri, 6 Dec 2019 09:11:25 -0800
Subject: Re: [PATCH] mm: fix hanging shrinker management on long do_shrink_slab
To: Dave Chinner
Cc: Andrey Ryabinin, Pavel Tikhomirov, Andrew Morton, LKML, Cgroups,
    Linux MM, Johannes Weiner, Michal Hocko, Vladimir Davydov,
    Roman Gushchin, Chris Down, Yang Shi, Tejun Heo, Thomas Gleixner,
    "Kirill A . Shutemov", Konstantin Khorenko, Kirill Tkhai,
    Trond Myklebust, Anna Schumaker, "J. Bruce Fields", Chuck Lever,
    linux-nfs@vger.kernel.org, Alexander Viro, linux-fsdevel
References: <20191129214541.3110-1-ptikhomirov@virtuozzo.com>
    <4e2d959a-0b0e-30aa-59b4-8e37728e9793@virtuozzo.com>
    <20191206020953.GS2695@dread.disaster.area>
In-Reply-To: <20191206020953.GS2695@dread.disaster.area>
X-Mailing-List: linux-nfs@vger.kernel.org

On Thu, Dec 5, 2019 at 6:10 PM Dave Chinner wrote:
>
> [please cc me on future shrinker infrastructure modifications]
>
> On Mon, Dec 02, 2019 at 07:36:03PM +0300, Andrey Ryabinin wrote:
> >
> > On 11/30/19 12:45 AM, Pavel Tikhomirov wrote:
> > > We have a problem that shrinker_rwsem can be held for a long time for
> > > read in shrink_slab, at the same time any process which is trying to
> > > manage shrinkers hangs.
> > >
> > > The shrinker_rwsem is taken in shrink_slab while traversing shrinker_list.
> > > It tries to shrink something on nfs (hard) but nfs server is dead at
> > > this moment already and rpc will never succeed. Generally any shrinker
> > > can take significant time to do_shrink_slab, so it's a bad idea to hold
> > > the list lock here.
>
> registering/unregistering a shrinker is not a performance critical
> task.

Yes, it is not performance critical, but it can cause isolation issues.

> If a shrinker is blocking for a long time, then we need to
> work to fix the shrinker implementation because blocking is a much
> bigger problem than just register/unregister.
>

Yes, we should be fixing the implementations of all shrinkers, and yes,
it is a bigger issue, but we can also fix the register/unregister
isolation issue in parallel. Fixing all shrinkers would be a tedious
and long task, and we should not block fixing the isolation issue on it.

> > > The idea of the patch is to inc a refcount to the chosen shrinker so it
> > > won't disappear and release shrinker_rwsem while we are in
> > > do_shrink_slab, after that we will reacquire shrinker_rwsem, dec
> > > the refcount and continue the traversal.
>
> This is going to cause a *lot* of traffic on the shrinker rwsem.
> It's already a pretty hot lock on large machines under memory
> pressure (think thousands of tasks all doing direct reclaim across
> hundreds of CPUs), and so changing them to cycle the rwsem on every
> shrinker that will only make this worse. Especially when we consider
> that there may be hundreds to thousands of registered shrinker
> instances on large machines.
>
> As an example of how frequent cycling of a global lock in shrinker
> instances causes issues, we used to take references to superblock
> shrinker count invocations to guarantee existence. This was found to
> be a scalability limitation when lots of near-empty superblocks were
> present in a system (see commit d23da150a37c ("fs/superblock: avoid
> locking counting inodes and dentries before reclaiming them")).
>
> This alleviated the problem for a while, but soon we had problems
> with just taking a reference to the superblock in the callbacks that
> did actual work. Hence we changed it to just take a per-superblock
> rwsem to get rid of the global sb_lock spinlock in this path. See
> commit eb6ef3df4faa ("trylock_super(): replacement for
> grab_super_passive()"). Now we don't have a scalability problem.
>
> IOWs, we already know that cycling a global rwsem on every
> individual shrinker invocation is going to cause noticeable
> scalability problems. Hence I don't think that this sort of "cycle
> the global rwsem faster to reduce [un]register latency" solution is
> going to fly because of the runtime performance regressions it will
> introduce....
>

I agree with your scalability concern (though others would argue that
we should first demonstrate the issue before adding more sophisticated
scalable code). Most memory reclaim code is written without performance
or scalability in mind; maybe we should switch our thinking.

> > I don't think this patch solves the problem, it only fixes one minor symptom of it.
> > The actual problem here is the reclaim hang in the nfs.
>
> The nfs client is waiting on the NFS server to respond. It may
> actually be that the server has hung, not the client...
>
> > It means that any process, including kswapd, may go into nfs inode reclaim and get stuck there.
>
> *nod*
>
> > I think this should be handled on nfs/vfs level by making inode eviction during reclaim more asynchronous.
>
> That's what we are trying to do with similar blocking based issues
> in XFS inode reclaim. It's not simple, though, because these days
> memory reclaim is like a bowl full of spaghetti covered with a
> delicious sauce of non-obvious heuristics and broken
> functionality....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
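
For anyone following the thread, the trade-off discussed above can be
made concrete with a small userspace model. This is only a rough
sketch, not kernel code and not the actual patch: it is single-threaded,
the "rwsem" is a pair of counters rather than a real lock, and every
name in it (model_rwsem, model_shrinker, walk_holding_lock,
walk_cycling_lock, acquires_for_walk, scans_under_lock) is hypothetical.
It compares one pass that holds the read lock across the whole list
with one pass that pins each shrinker, drops the lock around the scan,
and reacquires it afterwards.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SHRINKERS 16

/* Toy stand-in for shrinker_rwsem: no real locking, just bookkeeping. */
struct model_rwsem {
    int readers;            /* read holders right now */
    unsigned long acquires; /* total read acquisitions, i.e. lock traffic */
};

/* Hypothetical shrinker; the refcount models what the patch proposes. */
struct model_shrinker {
    int refcount;
    int scanned_under_lock; /* 1 if the lock was held during the scan */
    struct model_shrinker *next;
};

static struct model_rwsem shrinker_sem;

static void model_down_read(void) { shrinker_sem.readers++; shrinker_sem.acquires++; }
static void model_up_read(void) { assert(shrinker_sem.readers > 0); shrinker_sem.readers--; }

/* Stand-in for do_shrink_slab(), which may block for a long time. */
static void model_do_shrink(struct model_shrinker *s)
{
    s->scanned_under_lock = shrinker_sem.readers > 0;
}

/* Existing scheme: one read acquisition held across the whole walk. */
static void walk_holding_lock(struct model_shrinker *head)
{
    model_down_read();
    for (struct model_shrinker *s = head; s; s = s->next)
        model_do_shrink(s);
    model_up_read();
}

/*
 * Scheme described in the patch: pin, drop the lock, scan, reacquire,
 * unpin. A real implementation must also keep s->next valid while the
 * lock is dropped; this model has no concurrent list changes.
 */
static void walk_cycling_lock(struct model_shrinker *head)
{
    model_down_read();
    for (struct model_shrinker *s = head; s; s = s->next) {
        s->refcount++;      /* pin so the shrinker cannot go away */
        model_up_read();    /* register/unregister could run here */
        model_do_shrink(s);
        model_down_read();  /* one extra acquisition per shrinker */
        s->refcount--;
    }
    model_up_read();
}

/* Link n shrinkers into a list, reset the counters, run one walk. */
static void run_walk(struct model_shrinker *pool, int n, int cycling)
{
    assert(n >= 0 && n <= MAX_SHRINKERS);
    for (int i = 0; i < n; i++) {
        pool[i].refcount = 0;
        pool[i].scanned_under_lock = 0;
        pool[i].next = (i + 1 < n) ? &pool[i + 1] : NULL;
    }
    shrinker_sem.readers = 0;
    shrinker_sem.acquires = 0;
    if (cycling)
        walk_cycling_lock(n ? pool : NULL);
    else
        walk_holding_lock(n ? pool : NULL);
}

/* Number of read acquisitions one walk over n shrinkers performs. */
unsigned long acquires_for_walk(int n, int cycling)
{
    struct model_shrinker pool[MAX_SHRINKERS];
    run_walk(pool, n, cycling);
    return shrinker_sem.acquires;
}

/* Number of scans that ran while the lock was still held. */
int scans_under_lock(int n, int cycling)
{
    struct model_shrinker pool[MAX_SHRINKERS];
    int count = 0;
    run_walk(pool, n, cycling);
    for (int i = 0; i < n; i++)
        count += pool[i].scanned_under_lock;
    return count;
}
```

In this model a walk over n shrinkers takes the lock once in the
existing scheme and n + 1 times in the cycling scheme, while the
cycling scheme never scans with the lock held. That is the tension in
the thread: the second property helps register/unregister latency, and
the first is the extra rwsem traffic Dave is worried about when there
are hundreds to thousands of shrinkers.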