Date: Thu, 26 Mar 2009 17:51:44 -0700 (PDT)
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andrew Morton <akpm@linux-foundation.org>
cc: Theodore Tso <tytso@mit.edu>, David Rees <drees76@gmail.com>,
       Jesper Krogh <jesper@krogh.cc>,
       Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Linux 2.6.29
In-Reply-To: <alpine.LFD.2.00.0903261723250.3994@localhost.localdomain>
Message-ID: <alpine.LFD.2.00.0903261742210.3994@localhost.localdomain>
References: <alpine.LFD.2.00.0903231617550.3032@localhost.localdomain> <49C87B87.4020108@krogh.cc> <72dbd3150903232346g5af126d7sb5ad4949a7b5041f@mail.gmail.com> <49C88C80.5010803@krogh.cc> <72dbd3150903241200v38720ca0x392c381f295bdea@mail.gmail.com>
 <20090325183011.GN32307@mit.edu> <alpine.LFD.2.00.0903251139260.3032@localhost.localdomain> <20090325220530.GR32307@mit.edu> <alpine.LFD.2.00.0903251622420.3032@localhost.localdomain> <20090326171148.9bf8f1ec.akpm@linux-foundation.org>
 <alpine.LFD.2.00.0903261723250.3994@localhost.localdomain>
User-Agent: Alpine 2.00 (LFD 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2144
Lines: 48


On Thu, 26 Mar 2009, Linus Torvalds wrote:
> 
> The only times tunables have worked for us is when they auto-tune. 
> 
> IOW, we don't have "use 35% of memory for buffer cache" tunables, we just 
> dynamically auto-tune memory use. And no, we don't expect user space to 
> run some "tuning program for their load" either.

IOW, what we could reasonably do is something along the lines of:

 - start off with a sane default for max background dirty (per block 
   device), quite possibly based simply on memory size.

 - assume that "foreground dirty" is just always 2 * background dirty.

 - if we hit the "max foreground dirty" during memory allocation, then we 
   shrink the background dirty value (logic: we never want to have to wait 
   synchronously)

 - if we hit some maximum latency on writeback, shrink dirty aggressively 
   and based on how long the latency was (because at that point we have a 
   real _measure_ of how costly it is with that load).

 - if we start doing background dirtying, but never hit the foreground 
   dirty limit even in dirty balancing (ie when a writer is actually 
   _writing_, as opposed to a non-writer hitting it when allocating 
   memory), then slowly open up the window - we may be limiting too early.

.. add heuristics to taste. The point being that if we do this based on 
real loads, and based on hitting the real problems, then we might actually 
be getting somewhere. In particular, if the filesystem sucks at writeout 
(ie the limiter is not the _disk_, but the filesystem serialization), then 
it should automatically also shrink the max dirty state.
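
The rules above could be sketched roughly as follows. This is a minimal 
user-space illustration in Python, not kernel code and not part of the 
original message. The class name DirtyTuner, its method names, and every 
constant (1/20 of memory as the starting point, the 3/4 shrink factor, the 
5% grow step, the 256-page floor, the 100 ms latency target) are 
assumptions picked only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass
class DirtyTuner:
    """Hypothetical per-block-device auto-tuner for the background dirty limit."""

    background: int              # pages that may be dirty before background writeback
    floor: int = 256             # assumed lower bound, in pages
    ceiling: int = 1 << 20       # assumed upper bound, in pages
    max_latency_ms: int = 100    # assumed accepted writeback latency

    @classmethod
    def from_memory(cls, total_pages, divisor=20):
        # Starting value based simply on memory size (1/20 is an assumption).
        return cls(background=max(256, total_pages // divisor))

    @property
    def foreground(self):
        # "foreground dirty" is always twice the background value.
        return 2 * self.background

    def _clamp(self):
        self.background = max(self.floor, min(self.ceiling, self.background))

    def hit_foreground_in_alloc(self):
        # A memory allocation ran into the foreground limit, so somebody
        # had to wait synchronously: shrink the background value.
        self.background = self.background * 3 // 4
        self._clamp()

    def writeback_latency(self, observed_ms):
        # Shrink aggressively, scaled by how far the measured latency
        # overshot the target. Latency at or below the target is ignored.
        if observed_ms > self.max_latency_ms:
            self.background = self.background * self.max_latency_ms // observed_ms
            self._clamp()

    def writer_stayed_below_foreground(self):
        # Background writeback ran, but an actual writer never reached the
        # foreground limit: slowly open the window again.
        self.background += max(1, self.background // 20)
        self._clamp()
```

Because the latency rule reacts to measured writeback time rather than to 
the device itself, a filesystem that serializes badly shrinks the limit in 
the same way a slow disk would.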

The tunable then could become the maximum latency we accept or something 
like that. Or the hysteresis limits/rules for the soft "grow" or "shrink" 
events. At that point, maybe we could even find something that works for 
most people.
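
One possible shape for such hysteresis rules, again as a hedged Python 
illustration that is not part of the original message: shrink after a 
short run of bad intervals, but grow only after a longer run of quiet ones. 
The class name GrowShrinkHysteresis and the default thresholds are 
hypothetical.

```python
class GrowShrinkHysteresis:
    """Hypothetical filter that turns per-interval observations into
    soft "grow" / "shrink" events."""

    def __init__(self, grow_after=8, shrink_after=1):
        self.grow_after = grow_after      # quiet intervals needed before growing
        self.shrink_after = shrink_after  # bad intervals needed before shrinking
        self.good = 0
        self.bad = 0

    def observe(self, pressure):
        """Record one interval; return "shrink", "grow" or "hold"."""
        if pressure:
            # A limit or latency target was hit during this interval.
            self.bad += 1
            self.good = 0
            if self.bad >= self.shrink_after:
                self.bad = 0
                return "shrink"
        else:
            self.good += 1
            self.bad = 0
            if self.good >= self.grow_after:
                self.good = 0
                return "grow"
        return "hold"
```

With grow_after larger than shrink_after the limit comes down quickly 
under pressure and creeps back up slowly, which is the asymmetry the 
heuristics above ask for.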

			Linus