Date: Tue, 9 Nov 2004 16:52:21 -0200
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Andrew Morton <akpm@osdl.org>
Cc: 76306.1226@compuserve.com, linux-kernel@vger.kernel.org,
       nickpiggin@yahoo.com.au
Subject: Re: balance_pgdat(): where is total_scanned ever updated?
Message-ID: <20041109185221.GA8414@logos.cnet>
References: <200411061418_MC3-1-8E17-8B6C@compuserve.com> <20041106161114.1cbb512b.akpm@osdl.org> <20041109104220.GB6326@logos.cnet> <20041109113620.16b47e28.akpm@osdl.org> <20041109180223.GG7632@logos.cnet> <20041109134032.124b55fa.akpm@osdl.org>
In-Reply-To: <20041109134032.124b55fa.akpm@osdl.org>

On Tue, Nov 09, 2004 at 01:40:32PM -0800, Andrew Morton wrote:
> > > > > I had a patch which fixes it in -mm for a while.  It does increase the
> > > > > number of pages which are reclaimed via direct reclaim and decreases the
> > > > > number of pages which are reclaimed by kswapd.  As one would expect from
> > > > > throttling kswapd.  This seems undesirable.
> > > > 
> > > > Hi Andrew,
> > > > 
> > > > Do you have any numbers to back up the claim "It does increase the
> > > > number of pages which are reclaimed via direct reclaim and decreases the
> > > > number of pages which are reclaimed by kswapd", please?
> > > 
> > > Run a workload and watch /proc/vmstat.  iirc, the one-line total_scanned
> > > fix takes the kswapd-vs-direct reclaim rate from 1:1 to 1:3 or thereabouts.
> > 
> > You're talking about laptop_mode ONLY, then?
> 
> No, not at all.
> 
> If we restore the total_scanned logic then kswapd will throttle itself, as
> designed.  Regardless of laptop_mode.  I did that, and monitored the page
> scanning and reclaim rates under various workloads.  I observed that with
> the fix in place, kswapd performed less page reclaim and direct-reclaim
> performed more reclaim.  And I wasn't able to demonstrate any benchmark
> improvements with the fix in place, so things are left as they are.

Ah, OK, I understand what you mean. I was thinking only about
sc->may_writepage and its effects on shrink_list/pageout.

You reminded me of the self-throttling (blk_congestion_wait).
It makes sense now.

Andrea noted that blk_congestion_wait waits on IO which was not generated
by reclaim, which is indeed a bad thing: it should only wait on IO which
the VM itself has started.

> > How can that have any effect if may_writepage is ignored if !laptop_mode? 
> 
> This is to do with kswapd throttling.  If we put kswapd to sleep more
> often, it does less scanning and reclaiming.

OK! 

> > About /proc/vmstat - each output is huge - do you actually read those?
> 
> yup.
> 
> 	cat /proc/vmstat > /tmp/1
> 	run workload
> 	cat /proc/vmstat > /tmp/2
> 	analyse /tmp/1 and /tmp/2

Will do that more often. :) 
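
The before/after comparison above can be sketched as a small C helper.
This is only an illustration, assuming each snapshot holds one
"name value" pair per line, as /proc/vmstat prints them. The names
read_snapshot, find_counter and vmstat_delta are made up for this example
and do not refer to an existing tool.

```c
#include <stdio.h>
#include <string.h>

#define MAX_COUNTERS 512	/* assumed upper bound, not a kernel limit */
#define NAME_LEN 64

struct counter {
	char name[NAME_LEN];
	unsigned long long value;
};

/*
 * Parse "name value" lines from a saved /proc/vmstat snapshot.
 * Returns the number of counters read, or -1 if the file cannot be opened.
 */
static int read_snapshot(const char *path, struct counter *c, int max)
{
	FILE *f = fopen(path, "r");
	int n = 0;

	if (!f)
		return -1;
	while (n < max && fscanf(f, "%63s %llu", c[n].name, &c[n].value) == 2)
		n++;
	fclose(f);
	return n;
}

/* Look up a counter by name; returns 0 and stores its value on success. */
static int find_counter(const struct counter *c, int n, const char *name,
			unsigned long long *value)
{
	for (int i = 0; i < n; i++) {
		if (strcmp(c[i].name, name) == 0) {
			*value = c[i].value;
			return 0;
		}
	}
	return -1;
}

/*
 * Print every counter whose value differs between the two snapshots.
 * Returns the number of changed counters, or -1 on error.
 */
int vmstat_delta(const char *before_path, const char *after_path, FILE *out)
{
	static struct counter before[MAX_COUNTERS], after[MAX_COUNTERS];
	int nb = read_snapshot(before_path, before, MAX_COUNTERS);
	int na = read_snapshot(after_path, after, MAX_COUNTERS);
	int changed = 0;

	if (nb < 0 || na < 0)
		return -1;
	for (int i = 0; i < na; i++) {
		unsigned long long old;

		if (find_counter(before, nb, after[i].name, &old) != 0)
			continue;	/* not present in the first snapshot */
		if (after[i].value == old)
			continue;	/* unchanged, keep the output short */
		/* Signed delta, since some entries are gauges and can go down. */
		fprintf(out, "%-28s %+lld\n", after[i].name,
			(long long)(after[i].value - old));
		changed++;
	}
	return changed;
}
```

With the snapshots saved as in the quoted steps, calling
vmstat_delta("/tmp/1", "/tmp/2", stdout) prints only the counters that
moved during the workload.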

> > We need a vmstat like tool for that information to be readable.
> 
> Would be nice.

I've been thinking of writing a Python-based tool someday.

> > > > Because linux-2.6.10-rc1-mm2 (and 2.6.9) completely ignores sc->may_writepage
> > > > under normal operation, it is only used when laptop_mode is on:
> > > > 
> > > > 		if (laptop_mode && !sc->may_writepage)
> > > > 			goto keep_locked;
> > > > 
> > > > Is this intentional ???
> > > 
> > > yup.  In laptop mode we try to scan further to find a clean page rather
> > > than spinning up the disk for a writepage.
> > 
> > It might be interesting to use sc->may_writepage independently of
> > laptop mode (ie make kswapd only writeout pages if the reclaim ratio
> > is low).
> 
> sure.

Another related thing I noticed this afternoon is that right now kswapd will
always block on full queues:

static int may_write_to_queue(struct backing_dev_info *bdi)
{
        if (current_is_kswapd())
                return 1;
        if (current_is_pdflush())       /* This is unlikely, but why not... */
                return 1;
        if (!bdi_write_congested(bdi))
                return 1;
        if (bdi == current->backing_dev_info)
                return 1;
        return 0;
}

kswapd returns 1 before bdi_write_congested() is ever checked. We should
make kswapd use the "bdi_write_congested" information and avoid blocking on
full queues. It should improve performance on multi-device systems with
intense VM loads.

Maybe something along these lines:

"if the reclaim ratio is high, do not writepage"
"if the reclaim ratio is below high, writepage but do not block"
"if the reclaim ratio is low, writepage and block"
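
The three rules above could be sketched roughly as follows. The
thresholds, the enum and the function names are all hypothetical and
chosen only for illustration; nothing like this exists in the tree. The
reclaim ratio is assumed here to mean pages reclaimed per 100 pages
scanned.

```c
#include <stdbool.h>

/* Hypothetical thresholds, in pages reclaimed per 100 pages scanned. */
#define RECLAIM_RATIO_HIGH	75
#define RECLAIM_RATIO_LOW	25

enum writeout_policy {
	WRITEOUT_NONE,		/* skip dirty pages, look for clean ones */
	WRITEOUT_NONBLOCKING,	/* write, but avoid congested queues */
	WRITEOUT_BLOCKING,	/* write even if the queue is congested */
};

/* Percentage of scanned pages that were actually reclaimed. */
static unsigned int reclaim_ratio(unsigned long scanned,
				  unsigned long reclaimed)
{
	if (scanned == 0)
		return 100;	/* assumption: nothing scanned means no pressure */
	return (unsigned int)((unsigned long long)reclaimed * 100 / scanned);
}

/* Map the current reclaim ratio onto one of the three proposed rules. */
enum writeout_policy kswapd_writeout_policy(unsigned long scanned,
					    unsigned long reclaimed)
{
	unsigned int ratio = reclaim_ratio(scanned, reclaimed);

	if (ratio >= RECLAIM_RATIO_HIGH)
		return WRITEOUT_NONE;
	if (ratio > RECLAIM_RATIO_LOW)
		return WRITEOUT_NONBLOCKING;
	return WRITEOUT_BLOCKING;
}

/*
 * How a may_write_to_queue()-style check could consult the policy for
 * kswapd instead of returning 1 unconditionally.
 */
bool kswapd_may_write(enum writeout_policy policy, bool queue_congested)
{
	switch (policy) {
	case WRITEOUT_NONE:
		return false;
	case WRITEOUT_NONBLOCKING:
		return !queue_congested;
	case WRITEOUT_BLOCKING:
		return true;
	}
	return false;
}
```

Where the boundaries sit (>= high, <= low) is a guess; the real values
would have to come from measurements.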