Subject: Re: tbench regression - Why process scheduler has impact on tbench and why small per-cpu slab (SLUB) cache creates the scenario?
From: "Zhang, Yanmin"
To: Christoph Lameter
Cc: LKML, mingo@elte.hu
References: <1188953218.26438.34.camel@ymzhang> <1188969725.26438.46.camel@ymzhang>
Date: Wed, 05 Sep 2007 17:13:23 +0800
Message-Id: <1188983603.26438.55.camel@ymzhang>

On Tue, 2007-09-04 at 23:58 -0700, Christoph Lameter wrote:
> On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
>
> > On Tue, 2007-09-04 at 20:59 -0700, Christoph Lameter wrote:
> > > On Wed, 5 Sep 2007, Zhang, Yanmin wrote:
> > >
> > > > 8) kmalloc-4096 order is 1, which means one slab consists of 2 objects. So a
> > >
> > > You can change that by booting with slub_max_order=0. Then we can also use
> > > the per-cpu queues to get these order-0 objects, which may speed up the
> > > allocations because we do not have to take zone locks on slab allocation.
> > >
> > > Note also that Andrew's tree has a page allocator pass-through for SLUB
> > > for 4k kmallocs, bypassing slab completely. That may also address the
> > > issue.
> > >
> > > If you want SLUB to handle more objects in the 4k kmalloc cache
> > > without going to the page allocator, then you can boot, e.g., with
> > >
> > > slub_max_order=3 slub_min_objects=8
> >
> > I tried this approach. The testing result showed 2.6.23-rc4 is about
> > 2.5% better than 2.6.22. It really resolves the issue.
> >
> > However, this approach applies the same policy to all slabs. Could we
> > implement a per-slab specific approach like direct b)?
>
> I am not sure what you mean by same policy. Same configuration for all
> slabs?
Yes.

> > > Try the ways to address the issue that I mentioned above.
> > I really appreciate your kind comments!
>
> Would it be possible to try the two other approaches that I suggested? I
> think both of those may also solve the issue. Try booting with
> slub_max_order=0

1) I tried slub_max_order=0 and the regression becomes 12.5%. That is
still not good.

2) I applied the patch
slub-direct-pass-through-of-page-size-or-higher-kmalloc.patch to kernel
2.6.23-rc4. The new testing result is much better, only 1% less than
2.6.22.

So the best solution is booting the kernel with "slub_max_order=3
slub_min_objects=8".

> and see what effect it has. The queues of the page
> allocator can be much larger than what slab has for 4k pages. There is
> really not much of a point in using a slab allocator for page-sized
> allocations.
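
For reference, the objects-per-slab arithmetic behind these numbers can
be sketched in a few lines of user-space C. This is a minimal,
illustrative sketch, not kernel code: it assumes 4 KiB pages and no
external per-object metadata (SLUB keeps its freelist pointer inside
free objects, so 4096-byte objects pack exactly), and the
objs_per_slab() helper is a hypothetical name used only here.

  #include <stdio.h>

  #define PAGE_SIZE 4096  /* assumed 4 KiB page size */

  /* Number of objects that fit in one slab of 2^order pages,
   * assuming no external per-object metadata (as in SLUB). */
  static unsigned int objs_per_slab(unsigned int order,
                                    unsigned int obj_size)
  {
          return (PAGE_SIZE << order) / obj_size;
  }

  int main(void)
  {
          /* kmalloc-4096 cache */
          printf("order 0: %u\n", objs_per_slab(0, 4096)); /* 1: slub_max_order=0      */
          printf("order 1: %u\n", objs_per_slab(1, 4096)); /* 2: the default above     */
          printf("order 3: %u\n", objs_per_slab(3, 4096)); /* 8: slub_max_order=3,
                                                                 slub_min_objects=8    */
          return 0;
  }

With 8 objects per order-3 slab, SLUB falls back to the page allocator
roughly an eighth as often for kmalloc-4096 as it does at order 0,
which is consistent with the measured results above. The corresponding
boot entry might look something like this (illustrative GRUB line;
kernel image path and root device are assumptions):

  kernel /boot/vmlinuz-2.6.23-rc4 ro root=/dev/sda1 slub_max_order=3 slub_min_objects=8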