From: Nikita Danilov <nikita@clusterfs.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16786.5789.465433.655127@thebsh.namesys.com>
Date: Wed, 10 Nov 2004 16:24:45 +0300
To: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Cc: Andrew Morton <akpm@osdl.org>, 76306.1226@compuserve.com,
       linux-kernel@vger.kernel.org, nickpiggin@yahoo.com.au
Subject: Re: balance_pgdat(): where is total_scanned ever updated?
In-Reply-To: <20041109185221.GA8414@logos.cnet>
References: <200411061418_MC3-1-8E17-8B6C@compuserve.com>
	<20041106161114.1cbb512b.akpm@osdl.org>
	<20041109104220.GB6326@logos.cnet>
	<20041109113620.16b47e28.akpm@osdl.org>
	<20041109180223.GG7632@logos.cnet>
	<20041109134032.124b55fa.akpm@osdl.org>
	<20041109185221.GA8414@logos.cnet>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2444
Lines: 69

Marcelo Tosatti writes:

[...]

 > 
 > Another related thing I noted this afternoon is that right now kswapd will
 > always block on full queues:
 > 
 > static int may_write_to_queue(struct backing_dev_info *bdi)
 > {
 >         if (current_is_kswapd())
 >                 return 1;
 >         if (current_is_pdflush())       /* This is unlikely, but why not... */
 >                 return 1;
 >         if (!bdi_write_congested(bdi))
 >                 return 1;
 >         if (bdi == current->backing_dev_info)
 >                 return 1;
 >         return 0;
 > }
 > 
 > We should make kswapd use the "bdi_write_congested" information and avoid
 > blocking on full queues. It should improve performance on multi-device 
 > systems with intense VM loads.

This will have the following undesirable side effect: if
may_write_to_queue() returns false, the page is not paged out; instead it
is put back at the head of the inactive list, destroying "LRU ordering",
and shrink_list() will dive deeper into the inactive list, reclaiming
hotter pages.
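To make the effect concrete, here is a toy userspace model (not kernel
code; toy_shrink_list() and its arguments are made up for illustration).
Position 0 is the coldest page at the tail of the inactive list. A dirty
page whose write is refused is treated as rotated back to the head, so it
is not revisited in this pass and the scan continues into hotter pages.
Dirty pages that may be written are assumed to be reclaimed after a
blocking write, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model: dirty[0] describes the coldest page (tail of the inactive
 * list), dirty[n - 1] the hottest. Scans from the cold end until
 * nr_to_reclaim pages are reclaimed, storing their positions in
 * reclaimed[]. Returns the number of pages reclaimed.
 */
int toy_shrink_list(const bool dirty[], int n, bool may_write,
		    int nr_to_reclaim, int reclaimed[])
{
	int nr_reclaimed = 0;

	assert(n >= 0 && nr_to_reclaim >= 0);
	for (int i = 0; i < n && nr_reclaimed < nr_to_reclaim; i++) {
		/* Write refused: page goes back to the head, skipped this pass. */
		if (dirty[i] && !may_write)
			continue;
		/* Clean page, or dirty page written out (blocking). */
		reclaimed[nr_reclaimed++] = i;
	}
	return nr_reclaimed;
}
```

With the three coldest pages dirty and three pages to reclaim, allowing
writes reclaims positions 0-2, while refusing writes reclaims positions
3-5, i.e. hotter pages.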

It's OK to accidentally skip pageout in the direct reclaim path, because

 - we hope most pageout is done by kswapd, and

 - we don't want __alloc_pages() to stall

but _something_ in the kernel should take the pain of actually writing
pages out in LRU order.

 > 
 > Maybe something along the lines of:
 > 
 > "if the reclaim ratio is high, do not writepage"
 > "if the reclaim ratio is below high, writepage but not block"
 > "if the reclaim ratio is low, writepage and block"
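Taking "reclaim ratio" to mean pages reclaimed per page scanned (an
assumption; the proposal does not define it), the three tiers could be
sketched as below. The enum, the function name and both thresholds are
hypothetical and not tuned against any workload.

```c
#include <assert.h>

/* Hypothetical actions; none of these names exist in the kernel. */
enum writepage_action {
	WP_SKIP,		/* do not call ->writepage() at all */
	WP_NONBLOCKING,		/* write, but skip congested queues */
	WP_BLOCKING,		/* write and wait on congested queues */
};

/* Hypothetical thresholds: percent of scanned pages that were reclaimed. */
#define RECLAIM_RATIO_HIGH	50
#define RECLAIM_RATIO_LOW	10

enum writepage_action choose_writepage_action(unsigned long nr_reclaimed,
					      unsigned long nr_scanned)
{
	unsigned long ratio;

	assert(nr_reclaimed <= nr_scanned);
	if (nr_scanned == 0)
		return WP_SKIP;	/* nothing scanned yet: no sign of pressure */
	ratio = nr_reclaimed * 100 / nr_scanned;
	if (ratio >= RECLAIM_RATIO_HIGH)
		return WP_SKIP;		/* clean pages are easy to find */
	if (ratio >= RECLAIM_RATIO_LOW)
		return WP_NONBLOCKING;
	return WP_BLOCKING;		/* reclaim is struggling */
}
```

Note that WP_SKIP and WP_NONBLOCKING both lead to the LRU reordering
described above whenever a dirty page is passed over.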

If kswapd blocking is a concern, inactive list scanning should be
decoupled from actual page-out (a la Solaris): kswapd queues pages to
yet another kernel thread that calls pageout().
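A minimal userspace model of that split, using POSIX threads in place of
kernel threads. This is not the async-writepage patch; queue_page(),
pageout_thread() and fake_pageout() are hypothetical names, and "pages"
are plain integers. One design choice is assumed here: queue_page() never
blocks, and when the queue is full it returns false, leaving the caller
to keep the page on the LRU.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_SIZE	16
#define MAX_WRITTEN	256

struct pageout_queue {
	int pages[QUEUE_SIZE];		/* ring buffer of queued pages */
	int head, tail, count;
	bool stop;
	pthread_mutex_t lock;
	pthread_cond_t nonempty;
	int written[MAX_WRITTEN];	/* pages "written", in order */
	int nr_written;
};

void queue_init(struct pageout_queue *q)
{
	q->head = q->tail = q->count = q->nr_written = 0;
	q->stop = false;
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->nonempty, NULL);
}

/* Scanner side (kswapd's role): never waits; false means "queue full". */
bool queue_page(struct pageout_queue *q, int page)
{
	bool queued = false;

	pthread_mutex_lock(&q->lock);
	if (q->count < QUEUE_SIZE) {
		q->pages[q->tail] = page;
		q->tail = (q->tail + 1) % QUEUE_SIZE;
		q->count++;
		queued = true;
		pthread_cond_signal(&q->nonempty);
	}
	pthread_mutex_unlock(&q->lock);
	return queued;
}

/* Stand-in for pageout(); only the writer thread touches written[]. */
static void fake_pageout(struct pageout_queue *q, int page)
{
	assert(q->nr_written < MAX_WRITTEN);
	q->written[q->nr_written++] = page;
}

/* Writer thread: drains the queue in FIFO order until told to stop. */
void *pageout_thread(void *arg)
{
	struct pageout_queue *q = arg;

	for (;;) {
		int page;

		pthread_mutex_lock(&q->lock);
		while (q->count == 0 && !q->stop)
			pthread_cond_wait(&q->nonempty, &q->lock);
		if (q->count == 0) {	/* stop requested, queue drained */
			pthread_mutex_unlock(&q->lock);
			return NULL;
		}
		page = q->pages[q->head];
		q->head = (q->head + 1) % QUEUE_SIZE;
		q->count--;
		pthread_mutex_unlock(&q->lock);

		fake_pageout(q, page);	/* may block; the scanner does not wait */
	}
}

void queue_stop(struct pageout_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->stop = true;
	pthread_cond_signal(&q->nonempty);
	pthread_mutex_unlock(&q->lock);
}
```

Because the queue is FIFO, pages reach pageout() in the order the scanner
found them, but the scanner itself is free to keep going.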

I played with this idea (see
http://nikita.w3.to/code/patches/2-6-10-rc1/async-writepage.txt; note
that async_writepage() has to be adjusted to work for kswapd). While
in some cases (large concurrent builds) it does provide a benefit, in
other cases (heavy writes through mmap) it makes throughput slightly
worse.

Besides, this doesn't completely avoid the problem of destroying LRU
ordering, as kswapd still proceeds further through the inactive list
while pages are sent out asynchronously.

Nikita.
