Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1761781AbZFRNF1 (ORCPT ); Thu, 18 Jun 2009 09:05:27 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1752614AbZFRNFS (ORCPT ); Thu, 18 Jun 2009 09:05:18 -0400
Received: from cmpxchg.org ([85.214.51.133]:32860 "EHLO cmpxchg.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751755AbZFRNFR (ORCPT ); Thu, 18 Jun 2009 09:05:17 -0400
Date: Thu, 18 Jun 2009 15:01:21 +0200
From: Johannes Weiner
To: Wu Fengguang
Cc: "Barnes, Jesse", Peter Zijlstra, KAMEZAWA Hiroyuki, Andrew Morton,
	Rik van Riel, Hugh Dickins, Andi Kleen, Minchan Kim,
	"linux-mm@kvack.org", "linux-kernel@vger.kernel.org"
Subject: Re: [patch v3] swap: virtual swap readahead
Message-ID: <20090618130121.GA1817@cmpxchg.org>
References: <1244626976.13761.11593.camel@twins>
	<20090610095950.GA514@localhost>
	<1244628314.13761.11617.camel@twins>
	<20090610113214.GA5657@localhost>
	<20090610102516.08f7300f@jbarnes-x200>
	<20090611052228.GA20100@localhost>
	<20090611101741.GA1974@cmpxchg.org>
	<20090612015927.GA6804@localhost>
	<20090615182216.GA1661@cmpxchg.org>
	<20090618091949.GA711@localhost>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20090618091949.GA711@localhost>
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3437
Lines: 79

On Thu, Jun 18, 2009 at 05:19:49PM +0800, Wu Fengguang wrote:
> On Tue, Jun 16, 2009 at 02:22:17AM +0800, Johannes Weiner wrote:
> > On Fri, Jun 12, 2009 at 09:59:27AM +0800, Wu Fengguang wrote:
> > > On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> > > > On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > > > > Unfortunately, after fixing it up the swap readahead patch still
> > > > > performs slow (even worse this time):
> > > >
> > > > Thanks for doing the tests.
> > > > Do you know if the time difference comes from IO or CPU time?
> > > >
> > > > Because one reason I could think of is that the original code walks
> > > > the readaround window in two directions, starting from the target
> > > > each time, but immediately stops when it encounters a hole, where
> > > > the new code just skips holes but doesn't abort readaround and thus
> > > > might indeed read more slots.
> > > >
> > > > I have an old patch flying around that changed the physical ra code
> > > > to use a bitmap that is able to represent holes. If the increased
> > > > time is waiting for IO, I would be interested if that patch has the
> > > > same negative impact.
> > >
> > > You can send me the patch :)
> >
> > Okay, attached is a rebase against latest -mmotm.
> >
> > > But for this patch it is IO bound. The CPU iowait field actually is
> > > going up as the test goes on:
> >
> > It's probably the larger ra window then, which takes away the bandwidth
> > needed to load the new executables. This sucks. It would be nice to
> > have 'optional IO' for readahead that is dropped when normal-priority
> > IO requests are coming in... Oh, we have READA for bios. But it
> > doesn't seem to implement dropping requests on load (or I am blind).
>
> Hi Hannes,
>
> Sorry for the long delay! The bad news is that I get many OOMs with
> this patch:

Okay, evaluating this test-patch any further probably isn't worth it.
It is too aggressive: I think readahead is stealing pages that other
allocations have just reclaimed, which in turn makes those allocations
OOM.

Back to the original problem: you detected increased latency for
launching new applications, meaning they get a smaller share of the IO
bandwidth than without the patch. I can see two reasons for this:

  a) the new heuristics don't work out and we read more unrelated
     pages than before

  b) we read ahead more pages in total, because the old code would
     stop at holes, as described above

We can verify a) by comparing the major fault numbers of the two
kernels under your testload.
If they increase with my patch, we anticipate the wrong slots, and
every fault has to do the reading itself.

b) seems to be a trade-off. After all, the IO bandwidth that new
applications are missing in your test is exactly the bandwidth used by
the swapping applications. My qsbench numbers are a sign of this, as
the only IO going on there is swap.

Of course, the theory is not to improve swap performance by increasing
the readahead window, but to choose better readahead candidates. So I
will run your tests and qsbench with a smaller page cluster and see if
this improves both loads.

Let me know if that doesn't make sense :)

Thanks a lot for all your efforts so far,

	Hannes
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/