Date: Tue, 11 Feb 2020 14:31:01 -0500
From: Johannes Weiner
To: Rik van Riel
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Dave Chinner, Yafang Shao,
    Michal Hocko, Roman Gushchin, Andrew Morton, Linus Torvalds,
    Al Viro, kernel-team@fb.com
Subject: Re: [PATCH] vfs: keep inodes with page cache off the inode shrinker LRU
Message-ID: <20200211193101.GA178975@cmpxchg.org>
References: <20200211175507.178100-1-hannes@cmpxchg.org>
 <29b6e848ff4ad69b55201751c9880921266ec7f4.camel@surriel.com>
In-Reply-To: <29b6e848ff4ad69b55201751c9880921266ec7f4.camel@surriel.com>
On Tue, Feb 11, 2020 at 02:05:38PM -0500, Rik van Riel wrote:
> On Tue, 2020-02-11 at 12:55 -0500, Johannes Weiner wrote:
> > The VFS inode shrinker is currently allowed to reclaim inodes with
> > populated page cache. As a result it can drop gigabytes of hot and
> > active page cache on the floor without consulting the VM (recorded
> > as "inodesteal" events in /proc/vmstat).
> >
> > This causes real problems in practice. Consider for example how the
> > VM would cache a source tree, such as the Linux git tree. As large
> > parts of the checked out files and the object database are accessed
> > repeatedly, the page cache holding this data gets moved to the
> > active list, where it's fully (and indefinitely) insulated from
> > one-off cache moving through the inactive list.
>
> > This behavior of invalidating page cache from the inode shrinker
> > goes back to even before the git import of the kernel tree. It may
> > have been less noticeable when the VM itself didn't have real
> > workingset protection, and floods of one-off cache would push out
> > any active cache over time anyway. But the VM has come a long way
> > since then and the inode shrinker is now actively subverting its
> > caching strategy.
>
> Two things come to mind when looking at this:
> - highmem
> - NUMA
>
> IIRC one of the reasons reclaim is done in this way is
> because a page cache page in one area of memory (highmem,
> or a NUMA node) can end up pinning inode slab memory in
> another memory area (normal zone, other NUMA node).

That's a good point, highmem does ring a bell now that you mention it.

If we still care, I think this could be solved by doing something
similar to what we do with buffer_heads_over_limit: allow a lowmem
allocation to reclaim page cache inside the highmem zone if the bhs
(or inodes in this case) have accumulated excessively. A rough sketch
of what I mean is at the end of this mail.

AFAICS, we haven't done anything similar for NUMA, so it might not be
much of a problem there. I could imagine this is in part because NUMA
nodes tend to be more balanced in size, and the ratio between cache
memory and inode/bh memory means that these objects won't turn into a
significant externality. Whereas with extreme highmem:lowmem ratios,
they can.
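To make that concrete, here is a minimal sketch, assuming the escape
hatch would sit next to the existing buffer_heads_over_limit check in
mm/vmscan.c:shrink_zones(). The inode_lru_over_limit() helper is
hypothetical and doesn't exist in the tree; the buffer_heads branch is
paraphrased from memory, not copied verbatim:

	static void shrink_zones(struct zonelist *zonelist,
				 struct scan_control *sc)
	{
		gfp_t orig_mask = sc->gfp_mask;

		/*
		 * Existing behavior: when buffer_heads pile up past
		 * the limit, let even a lowmem allocation scan the
		 * highmem zone, since highmem page cache is what pins
		 * the lowmem buffer_heads.
		 */
		if (buffer_heads_over_limit) {
			sc->gfp_mask |= __GFP_HIGHMEM;
			sc->reclaim_idx = gfp_zone(sc->gfp_mask);
		}

		/*
		 * Hypothetical analog: when too many inodes are kept
		 * alive only by their (highmem) page cache, widen the
		 * scan the same way so that reclaiming that cache
		 * makes the lowmem inode slab freeable again.
		 */
		if (inode_lru_over_limit()) {	/* hypothetical */
			sc->gfp_mask |= __GFP_HIGHMEM;
			sc->reclaim_idx = gfp_zone(sc->gfp_mask);
		}

		/* ... per-zone shrink loop unchanged ... */

		sc->gfp_mask = orig_mask;
	}

The point being that reclaim already has a precedent for widening the
zone mask when lowmem objects are pinned by highmem cache, so the
inode case wouldn't need new machinery, just another trigger.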