Date: Wed, 4 Dec 2019 09:35:14 +0100
From: Michal Hocko
To: Pavel Tikhomirov
Cc: Andrew Morton, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-mm@kvack.org, Johannes Weiner, Vladimir Davydov, Roman Gushchin,
    Shakeel Butt, Chris Down, Yang Shi, Tejun Heo, Thomas Gleixner,
    "Kirill A . Shutemov", Konstantin Khorenko, Kirill Tkhai,
    Andrey Ryabinin
Subject: Re: [PATCH] mm: fix hanging shrinker management on long do_shrink_slab
Message-ID: <20191204083514.GC25242@dhcp22.suse.cz>
References: <20191129214541.3110-1-ptikhomirov@virtuozzo.com>
In-Reply-To: <20191129214541.3110-1-ptikhomirov@virtuozzo.com>

On Sat 30-11-19 00:45:41, Pavel Tikhomirov wrote:
> We have a problem that shrinker_rwsem can be held for a long time for
> read in shrink_slab, at the same time any process which is trying to
> manage shrinkers hangs.
>
> The shrinker_rwsem is taken in shrink_slab while traversing shrinker_list.
> It tries to shrink something on nfs (hard) but nfs server is dead at
> these moment already and rpc will never succeed. Generally any shrinker
> can take significant time to do_shrink_slab, so it's a bad idea to hold
> the list lock here.

Yes, this is a known problem and people have already tried to address it
in the past. Have you checked previous attempts? There is an SRCU based one at
http://lkml.kernel.org/r/153365347929.19074.12509495712735843805.stgit@localhost.localdomain
but I believe there were others (I only had this one in my notes).
Please make sure to Cc Dave Chinner when posting the next version because
he had some concerns about the change in behavior.

> We have a similar problem in shrink_slab_memcg, except that we are
> traversing shrinker_map+shrinker_idr there.
>
> The idea of the patch is to inc a refcount to the chosen shrinker so it
> won't disappear and release shrinker_rwsem while we are in
> do_shrink_slab, after that we will reacquire shrinker_rwsem, dec
> the refcount and continue the traversal.

The reference count part makes sense to me. The role of RCU needs a
better explanation. Also, do you have any reason not to use a completion
for the final step? Open-coding essentially the same concept sounds a bit
awkward to me.

> We also need a wait_queue so that unregister_shrinker can wait for the
> refcnt to become zero. Only after these we can safely remove the
> shrinker from list and idr, and free the shrinker.
[...]
> crash> bt ...
> PID: 18739 TASK: ... CPU: 3 COMMAND: "bash"
> #0 [...] __schedule at ...
> #1 [...] schedule at ...
> #2 [...] rpc_wait_bit_killable at ... [sunrpc]
> #3 [...] __wait_on_bit at ...
> #4 [...] out_of_line_wait_on_bit at ...
> #5 [...] _nfs4_proc_delegreturn at ... [nfsv4]
> #6 [...] nfs4_proc_delegreturn at ... [nfsv4]
> #7 [...] nfs_do_return_delegation at ... [nfsv4]
> #8 [...] nfs4_evict_inode at ... [nfsv4]
> #9 [...] evict at ...
> #10 [...] dispose_list at ...
> #11 [...] prune_icache_sb at ...
> #12 [...] super_cache_scan at ...
> #13 [...] do_shrink_slab at ...

Are NFS people aware of this? Because this is simply not acceptable
behavior. Memory reclaim cannot be blocked indefinitely or for a long
time. There must be a way to simply give up if the underlying inode
cannot be reclaimed.

I still have to think about the proposed solution. It sounds a bit
overcomplicated to me.
-- 
Michal Hocko
SUSE Labs
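
For illustration only, the scheme discussed above (pin the shrinker with a
reference, drop the list lock around the scan, re-take it, unpin, and let
unregistration wait on a completion) can be modelled in userspace roughly
as follows. This is a toy Python sketch, not kernel code and not the patch
under review. Every name in it (Shrinker, Registry, shrink_all, demo) is
hypothetical, a plain threading.Lock stands in for shrinker_rwsem, and a
threading.Event stands in for a completion. It assumes the registry itself
owns one reference, so that dropping the last reference is what signals
the waiter.

```python
import threading


class Shrinker:
    """Toy stand-in for a registered shrinker (hypothetical model)."""

    def __init__(self, name, scan):
        self.name = name
        self.scan = scan                # callable; may block for a long time
        self.refs = 1                   # reference owned by the registry
        self.dying = False              # set once unregistration has started
        self.done = threading.Event()   # plays the role of a completion


class Registry:
    def __init__(self):
        self.lock = threading.Lock()    # plays the role of shrinker_rwsem
        self.shrinkers = []

    def register(self, shrinker):
        with self.lock:
            self.shrinkers.append(shrinker)

    def _put(self, shrinker):
        # Caller must hold self.lock. The last put signals the completion.
        shrinker.refs -= 1
        if shrinker.refs == 0:
            shrinker.done.set()

    def unregister(self, shrinker):
        with self.lock:
            shrinker.dying = True       # new traversals skip it from now on
            self._put(shrinker)         # drop the registry's reference
        shrinker.done.wait()            # sleep without holding the lock
        with self.lock:
            self.shrinkers.remove(shrinker)

    def shrink_all(self):
        freed = 0
        self.lock.acquire()
        try:
            i = 0
            while i < len(self.shrinkers):
                s = self.shrinkers[i]
                if s.dying:
                    i += 1
                    continue
                s.refs += 1             # pin so it stays on the list
                self.lock.release()     # never hold the lock while scanning
                try:
                    freed += s.scan()
                finally:
                    self.lock.acquire()
                    self._put(s)
                # The list may have changed while unlocked, but the pinned
                # entry is still on it, so resume right after that entry.
                i = self.shrinkers.index(s) + 1
        finally:
            self.lock.release()
        return freed


def demo():
    reg = Registry()
    started, unblock = threading.Event(), threading.Event()

    def slow_scan():
        started.set()
        unblock.wait()                  # models a scan stuck on a dead server
        return 3

    slow = Shrinker("slow", slow_scan)
    reg.register(slow)
    reg.register(Shrinker("fast", lambda: 2))

    result = []
    reclaimer = threading.Thread(target=lambda: result.append(reg.shrink_all()))
    reclaimer.start()
    started.wait()

    # The list lock is free while the scan is stuck, so this does not hang.
    reg.register(Shrinker("late", lambda: 1))

    remover = threading.Thread(target=reg.unregister, args=(slow,))
    remover.start()
    remover.join(timeout=0.2)
    blocked = remover.is_alive()        # still waiting for the pinned scan

    unblock.set()
    reclaimer.join()
    remover.join()
    return result[0], blocked, [s.name for s in reg.shrinkers]


if __name__ == "__main__":
    print(demo())
```

In this model, registering a new shrinker succeeds while a scan is stuck,
and only the unregistration of the shrinker that is actually being scanned
has to wait. The model does nothing about the stuck scan itself, which is
the separate concern raised about the NFS path.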