Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755731AbYLMCe6 (ORCPT ); Fri, 12 Dec 2008 21:34:58 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754027AbYLMCer (ORCPT ); Fri, 12 Dec 2008 21:34:47 -0500 Received: from nlpi025.sbcis.sbc.com ([207.115.36.54]:47091 "EHLO nlpi025.prodigy.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753611AbYLMCeq (ORCPT ); Fri, 12 Dec 2008 21:34:46 -0500 Date: Fri, 12 Dec 2008 20:34:35 -0600 (CST) From: Christoph Lameter X-X-Sender: cl@quilx.com To: Nick Piggin cc: Linux Kernel Mailing List , Linux Memory Management List , bcrl@kvack.org, list-linux-mm@kvack.org Subject: Re: [rfc][patch] SLQB slab allocator In-Reply-To: <20081212002518.GH8294@wotan.suse.de> Message-ID: References: <20081212002518.GH8294@wotan.suse.de> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Spam-Score: -2.6 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3830 Lines: 115 On Fri, 12 Dec 2008, Nick Piggin wrote: > + * SLQB : A slab allocator with object queues. > + * > + * (C) 2008 Nick Piggin My copyright notice is missing. Most of this is directly copied from SLUB. And I am not CCed? What should I be thinking? > +/* > + * SLQB: A slab allocator that focuses on per-CPU scaling, and good performance > + * with order-0 allocations. Fastpaths emphasis is placed on local allocaiton > + * and freeing, but with a secondary goal of good remote freeing (freeing on > + * another CPU from that which allocated). Good idea. I like new ideas based on SLUB. > + * > + * Using ideas and code from mm/slab.c, mm/slob.c, and mm/slub.c. Thats not right. The code was copied from mm/slub.c and then modified. It seems that the SLUB skeleton is very useful. Maybe we could abstract the common code out in order to make more creative variants possible in the future? > +/* > + * Issues still to be resolved: > + * > + * - Support PAGE_ALLOC_DEBUG. Should be easy to do. > + * > + * - Variable sizing of the per node arrays > + */ These are outdated SLUB comments.... > + /* XXX: might disobey node parameter here */ > + if (!NUMA_BUILD || likely(slqb_page_to_nid(page) == numa_node_id())) { You can only decide not to obey if __GFP_THISNODE is not set. > +static __always_inline void *__slab_alloc(struct kmem_cache *s, > + gfp_t gfpflags, int node) > +{ > + void *object; > + struct kmem_cache_cpu *c; > + struct kmem_cache_list *l; > + > +again: > +#ifdef CONFIG_NUMA > + if (unlikely(node != -1) && unlikely(node != numa_node_id())) { > + object = __remote_slab_alloc(s, node); > + if (unlikely(!object)) > + goto alloc_new; > + return object; > + } > +#endif Does this mean that SLQB is less efficient than SLUB for off node allocations? SLUB can do off node allocations from the per cpu objects. It does not need to make the distinction for allocation. > + c = get_cpu_slab(s, smp_processor_id()); > + VM_BUG_ON(!c); if c = NULL then the deref will fail anyways with a NULL deref error. No need to have a VM_BUG_ON here. > + l = &c->list; > + object = __cache_list_get_object(s, l); Hmmm... __cache_list_get_object() is quite a bunch of heavy processing. This is all inline? > +static __always_inline void __slab_free(struct kmem_cache *s, struct slqb_page *page, void *object) > +{ > + struct kmem_cache_cpu *c; > + struct kmem_cache_list *l; > + int thiscpu = smp_processor_id(); > + > + __count_slqb_event(SLQB_FREE); > + > + c = get_cpu_slab(s, thiscpu); > + l = &c->list; > + > + if (!NUMA_BUILD || !numa_platform || > + likely(slqb_page_to_nid(page) == numa_node_id())) { SLUB does not need a node check. The page struct pointer comparison is enough. This also means that SLQB cannot do fast frees to remote node objects that were just allocated. per cpu queue is bound to the local numa node which is quite bad for memoryless nodes. In that case you always need to do remote allocs. In general I am a bit concerned that you kept the linked object list. You may be able to do much better by creating arrays of object pointers. That way you can allocate and free multiple objects by only accessing the cacheline that contains the object pointers. Only works if the objects are not immediately writen to before and after free. AFAICT this is the special case that matters in terms of the database test you are trying to improve. The case there is likely the result of bad cache unfriendly programming. You may actually improve the benchmark more if the cachelines would be kept hot there in the right way. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/