Date: Wed, 9 Mar 2005 20:09:36 -0800
From: Andrew Morton <akpm@osdl.org>
To: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Cc: linux-kernel@vger.kernel.org, axboe@suse.de
Subject: Re: Direct io on block device has performance regression on 2.6.x
 kernel
Message-Id: <20050309200936.0b1bea9e.akpm@osdl.org>
In-Reply-To: <200503100347.j2A3lRg28975@unix-os.sc.intel.com>
References: <20050309182550.0291c6fd.akpm@osdl.org>
	<200503100347.j2A3lRg28975@unix-os.sc.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2194
Lines: 57

"Chen, Kenneth W" <kenneth.w.chen@intel.com> wrote:
>
> Andrew Morton wrote Wednesday, March 09, 2005 6:26 PM
> > What does "1/3 of the total benchmark performance regression" mean?  One
> > third of 0.1% isn't very impressive.  You haven't told us anything at all
> > about the magnitude of this regression.
> 
> 2.6.9 kernel is 6% slower compare to distributor's 2.4 kernel (RHEL3).  Roughly
> 2% came from storage driver (I'm not allowed to say anything beyond that, there
> is a fix though).

The codepaths are indeed longer in 2.6.

> 2% came from DIO.

hm, that's not a lot.

Once you redo that patch to use aops and to work with O_DIRECT, the paths
will get a little deeper, but not much.  We really should do this so that
O_DIRECT works, and in case someone has gone and mmapped the blockdev.

Fine-grained alignment is probably too hard, and it should fall back to
__blockdev_direct_IO().

Does it do the right thing with a request which is non-page-aligned, but
512-byte aligned?

readv and writev?

2% is pretty thin :(

> The rest of 2% is still unaccounted for.  We don't know where.

General cache replacement, perhaps.  9MB is a big cache though.

> ...
> Around 2.6.5, we found global plug list is causing huge lock contention on
> 32-way numa box.  That got fixed in 2.6.7.  Then comes 2.6.8 which took a big
> dip at close to 20% regression.  Then we fixed 17% regression in the scheduler
> (fixed with cache_decay_tick).  2.6.9 is the last one we measured and it is 6%
> slower.  It's a constant moving target, a wild goose to chase.
> 

OK.  Seems that the 2.4 O(1) scheduler got it right for that machine.

> haven't got a chance to run transaction processing db workload on 2.6 kernel.
> Perhaps they have not compared, perhaps they are working on the same problem.
> I just don't know.

Maybe there are other factors which drown these little things out:
architecture improvements, choice of architecture, driver changes, etc.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/