Date: Mon, 9 Jun 2008 09:44:07 -0400
From: Rik van Riel
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, lee.schermerhorn@hp.com,
    kosaki.motohiro@jp.fujitsu.com, linux-mm@kvack.org, eric.whitney@hp.com
Subject: Re: [PATCH -mm 13/25] Noreclaim LRU Infrastructure
Message-ID: <20080609094407.014fdfd4@bree.surriel.com>
In-Reply-To: <20080608231053.31cfcfeb.akpm@linux-foundation.org>
Organization: Red Hat, Inc.

On Sun, 8 Jun 2008 23:10:53 -0700
Andrew Morton wrote:

> Also, it is incumbent upon us to consider the other design proposals,
> such as removing anon pages from the LRUs, removing mlocked pages from
> the LRUs.

That is certainly an option.
We'll still need to keep track of what kind of page the page is, though,
otherwise we won't know whether or not we can put it back onto the LRU
lists at munlock time.

> > After discussing this for a long time with Larry Woodman,
> > Lee Schermerhorn and others, I am convinced that they can
> > not be fixed by putting a bandaid on the current code.
> >
> > After all, the fundamental problem often is that the file backed
> > and mem/swap backed pages are on the same LRU.
>
> That actually isn't a fundamental problem.
>
> It _becomes_ a problem because we try to treat the two types of pages
> differently.
>
> Stupid question: did anyone try setting swappiness=100?  What happened?

The database shared memory segment got swapped out and the system
crawled to a halt.

Swap IO usually is less efficient than page cache IO, because page
cache IO happens in larger chunks and does not involve a swap-out
first and a swap-in later - the data is just read, which at least
halves the disk IO compared to swap.

Readahead tilts the IO cost even more in favor of evicting page
cache pages, vs. swapping something out.

> > Think of a case that is becoming more and more common: a database
> > server with 128GB of RAM, 2GB of (hardly ever used) swap, 80GB of
> > locked shared memory segment, 30GB of other anonymous memory and
> > 5GB of page cache.
> >
> > Do you think it is reasonable for the VM to have to scan over
> > 110GB of essentially unevictable memory, just to get at the 5GB
> > of page cache?
>
> Well for starters that system was grossly misconfigured.

Swapping out the database shared memory segment is not an option,
because it is mlocked.

Even if it was an option, swapping it out would be a bad idea
because swap IO is simply less efficient than page cache IO (see
above).

> Secondly, I expect that removal of mlocked pages from the LRU (as was
> discussed a year or two ago and perhaps implemented by Andrea) along
> with swappiness=100 might get us towards a fix.  Don't know.
Removing mlocked pages from the LRU can be done, but I suspect
we'll still want to keep track of how many of these pages there
are, right?

> > > Because I guess we should have a think about alternative approaches.
> >
> > We have.  We failed to come up with anything that avoids the
> > problem without actually fixing the fundamental issues.
>
> Unless I missed it, none of your patch descriptions even attempt to
> describe these fundamental issues.  It's all buried in 20-deep email
> threads.

I'll add more problem descriptions to the next patch submission.
I'm halfway through the patch series, making all the cleanups and
changes you suggested.

> One cause of problems is that we attempt to prioritise anon pages over
> file-backed pagecache.  And we prioritise mmapped pages, which your patches
> don't address, do they?  Stopping doing that would, I expect, prevent a
> range of these problems.

It would introduce others, probably.

Try running a database with swappiness=100 and then doing a
backup of the system simultaneously.  The database will end
up being swapped out, which slows down the database, causes
extra IO and ends up slowing down the backup, too.

The backup does not benefit from having its data cached,
since it only reads everything once.

> > Otherwise, please give us a chance to shake things out in -mm.
>
> -mm isn't a very useful testing place any more, I'm afraid.

That's a problem.  I can run tests on the VM patches, but you know
as well as I do that the code needs to be shaken out by lots of
users before we can be truly confident in it...

> > I will prepare kernel RPMs for Fedora so users in the community can
> > easily test these patches too, and help find scenarios where these
> > patches do not perform as well as what the current kernel has.
> >
> > I have time to track down and fix any issues that people find.
>
> That helps.

I sure hope so.

I'll send you a cleaned-up patch series soon.  Hopefully tonight
or tomorrow.

-- 
All rights reversed.
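
A rough sketch of the arithmetic behind the 128GB database server
example in this message.  It assumes, purely for illustration, that
reclaim walks the LRU at uniform cost per page and that anonymous
memory is effectively unevictable because only 2GB of swap exists.
The function name lru_scan_cost and its parameters are hypothetical;
they are not taken from the kernel or from this patch series.

```python
# Illustrative model only: sizes (in GB) come from the example in the
# message; the uniform-cost scan model is a simplifying assumption.

def lru_scan_cost(mlocked_gb, anon_gb, page_cache_gb,
                  mlocked_on_lru=True, anon_on_lru=True):
    """Return (unevictable GB the scanner must walk past,
    fraction of scanned memory that is reclaimable page cache)."""
    unevictable = 0
    if mlocked_on_lru:
        unevictable += mlocked_gb
    if anon_on_lru:
        unevictable += anon_gb
    total_scanned = unevictable + page_cache_gb
    return unevictable, page_cache_gb / total_scanned


scenarios = [
    ("everything on one LRU", {}),
    ("mlocked pages taken off the LRU", {"mlocked_on_lru": False}),
    ("file pages scanned separately",
     {"mlocked_on_lru": False, "anon_on_lru": False}),
]

for label, options in scenarios:
    skipped, useful = lru_scan_cost(80, 30, 5, **options)
    print(f"{label}: {skipped}GB unevictable scanned, "
          f"{useful:.1%} of the scan is reclaimable")
```

Under these assumptions the mixed LRU makes the scanner walk 110GB
of memory it cannot reclaim to reach 5GB of page cache.  Taking the
mlocked pages off the list shrinks that to 30GB, and scanning file
pages on their own removes it.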