Date: Tue, 2 Apr 2013 10:37:21 +0900
From: Joonsoo Kim
To: Christoph Lameter
Cc: Paul Gortmaker, Steven Rostedt, LKML, RT, Thomas Gleixner,
 Clark Williams, Pekka Enberg
Subject: Re: [RT LATENCY] 249 microsecond latency caused by slub's
 unfreeze_partials() code.
Message-ID: <20130402013721.GB16699@lge.com>
References: <1364234613.6345.184.camel@gandalf.local.home>
 <0000013da2ce20f8-0e3a64ef-67ed-4ab4-9f20-b77980c876c3-000000@email.amazonses.com>
 <1364236355.6345.185.camel@gandalf.local.home>
 <20130327025957.GA17125@lge.com>
 <1364355032.6345.200.camel@gandalf.local.home>
 <20130327061351.GB17125@lge.com>
 <0000013db20ca149-0064fbb8-2f81-4323-9095-a38f6abb79c5-000000@email.amazonses.com>
 <0000013dc63a9086-7d10c4a8-748c-4e19-829a-856d8d42c8eb-000000@email.amazonses.com>
In-Reply-To: <0000013dc63a9086-7d10c4a8-748c-4e19-829a-856d8d42c8eb-000000@email.amazonses.com>

Hello, Christoph.

On Mon, Apr 01, 2013 at 03:32:43PM +0000, Christoph Lameter wrote:
> On Thu, 28 Mar 2013, Paul Gortmaker wrote:
>
> > > Index: linux/init/Kconfig
> > > ===================================================================
> > > --- linux.orig/init/Kconfig	2013-03-28 12:14:26.958358688 -0500
> > > +++ linux/init/Kconfig	2013-03-28 12:19:46.275866639 -0500
> > > @@ -1514,6 +1514,14 @@ config SLOB
> > >
> > >  endchoice
> > >
> > > +config SLUB_CPU_PARTIAL
> > > +	depends on SLUB
> > > +	bool "SLUB per cpu partial cache"
> > > +	help
> > > +	  Per cpu partial caches accellerate freeing of objects at the
> > > +	  price of more indeterminism in the latency of the free.
> > > +	  Typically one would choose no for a realtime system.
> > Is "batch" a better description than "accelerate" ?  Something like
> Its not a batching but a cache that is going to be mainly used for new
> allocations on the same processor.
>
> > Per cpu partial caches allows batch freeing of objects to maximize
> > throughput.  However, this can increase the length of time spent
> > holding key locks, which can increase latency spikes with respect
> > to responsiveness.  Select yes unless you are tuning for a realtime
> > oriented system.
> >
> > Also, I believe this will cause a behaviour change for people who
> > just run "make oldconfig" -- since there is no default line.  Meaning
> > that it used to be unconditionally on, but now I think it will be off
> > by default, if people just mindlessly hold down Enter key.
>
> Ok.
>
> > For RT, we'll want default N if RT_FULL (RT_BASE?) but for mainline,
> > I expect you'll want default Y in order to be consistent with previous
> > behaviour?
>
> I was not sure exactly how to handle that one yet for realtime. So I need
> two different patches?
>
> > I've not built/booted yet, but I'll follow up if I see anything else in doing
> > that.
>
> Here is an updated patch. I will also send an updated fixup patch.
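Regarding the question above about the default for realtime: one way to
avoid carrying two different patches might be a conditional default. This
is only a sketch, and PREEMPT_RT_FULL is the -rt tree's config symbol
rather than a mainline one, so the exact symbol to key on (and whether to
do this in mainline at all) is an open question:

config SLUB_CPU_PARTIAL
	default y if !PREEMPT_RT_FULL
	depends on SLUB
	bool "SLUB per cpu partial cache"

On a tree where PREEMPT_RT_FULL does not exist, the condition simply
evaluates to true, so the default stays y and "make oldconfig" behaviour
should be unchanged on mainline.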
>
>
> Subject: slub: Make cpu partial slab support configurable V2
>
> cpu partial support can introduce level of indeterminism that is not wanted
> in certain context (like a realtime kernel). Make it configurable.
>
> Signed-off-by: Christoph Lameter
>
> Index: linux/include/linux/slub_def.h
> ===================================================================
> --- linux.orig/include/linux/slub_def.h	2013-04-01 10:27:05.908964674 -0500
> +++ linux/include/linux/slub_def.h	2013-04-01 10:27:19.905178531 -0500
> @@ -47,7 +47,9 @@ struct kmem_cache_cpu {
>  	void **freelist;	/* Pointer to next available object */
>  	unsigned long tid;	/* Globally unique transaction id */
>  	struct page *page;	/* The slab from which we are allocating */
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	struct page *partial;	/* Partially allocated frozen slabs */
> +#endif
>  #ifdef CONFIG_SLUB_STATS
>  	unsigned stat[NR_SLUB_STAT_ITEMS];
>  #endif
> @@ -84,7 +86,9 @@ struct kmem_cache {
>  	int size;		/* The size of an object including meta data */
>  	int object_size;	/* The size of an object without meta data */
>  	int offset;		/* Free pointer offset. */
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	int cpu_partial;	/* Number of per cpu partial objects to keep around */
> +#endif
>  	struct kmem_cache_order_objects oo;
>
>  	/* Allocation and freeing of slabs */

When !CONFIG_SLUB_CPU_PARTIAL, should we really remove these variables?
Without removing them, we could make the code simpler and more maintainable.

> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c	2013-04-01 10:27:05.908964674 -0500
> +++ linux/mm/slub.c	2013-04-01 10:27:19.905178531 -0500
> @@ -1531,7 +1531,9 @@ static inline void *acquire_slab(struct
>  	return freelist;
>  }
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
> +#endif
>  static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
>
>  /*
> @@ -1570,10 +1572,20 @@ static void *get_partial_node(struct kme
>  			object = t;
>  			available = page->objects - (unsigned long)page->lru.next;
>  		} else {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  			available = put_cpu_partial(s, page, 0);
>  			stat(s, CPU_PARTIAL_NODE);
> +#else
> +			BUG();
> +#endif
>  		}
> -		if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
> +		if (kmem_cache_debug(s) ||
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
> +			available > s->cpu_partial / 2
> +#else
> +			available > 0
> +#endif
> +			)
>  			break;
>
>  	}

How about introducing a wrapper function, cpu_partial_enabled(), like
kmem_cache_debug()?

int cpu_partial_enabled(s)
{
	return kmem_cache_debug(s) || blablabla
}

As you already know, when kmem_cache_debug() is enabled, cpu_partial is
kept at zero. How about re-using that property to implement
!CONFIG_SLUB_CPU_PARTIAL?
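Just to illustrate the idea, a rough and untested sketch (the helper name
and the exact condition are placeholders, not something taken from the
patch): if s->cpu_partial stays in struct kmem_cache and is simply forced
to zero whenever the option is off, the check in get_partial_node() would
not need an inline #ifdef at all.

static inline bool cpu_partial_enabled(struct kmem_cache *s)
{
#ifdef CONFIG_SLUB_CPU_PARTIAL
	return !kmem_cache_debug(s);
#else
	return false;
#endif
}

	/*
	 * In get_partial_node(): when the helper returns false,
	 * s->cpu_partial is assumed to be 0, so the comparison
	 * degenerates to "available > 0" and we break immediately.
	 */
	if (!cpu_partial_enabled(s) || available > s->cpu_partial / 2)
		break;

With that, the #else/BUG() branch above should also stay unreachable, the
same way it is today in the kmem_cache_debug() case.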
Thanks.

> @@ -1874,6 +1886,7 @@ redo:
>  	}
>  }
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  /*
>   * Unfreeze all the cpu partial slabs.
>   *
> @@ -1989,6 +2002,7 @@ static int put_cpu_partial(struct kmem_c
>  	} while (this_cpu_cmpxchg(s->cpu_slab->partial, oldpage, page) != oldpage);
>  	return pobjects;
>  }
> +#endif
>
>  static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
>  {
> @@ -2013,7 +2027,9 @@ static inline void __flush_cpu_slab(stru
>  		if (c->page)
>  			flush_slab(s, c);
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  		unfreeze_partials(s, c);
> +#endif
>  	}
>  }
>
> @@ -2029,7 +2045,11 @@ static bool has_cpu_slab(int cpu, void *
>  	struct kmem_cache *s = info;
>  	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	return c->page || c->partial;
> +#else
> +	return c->page;
> +#endif
>  }
>
>  static void flush_all(struct kmem_cache *s)
> @@ -2225,7 +2245,10 @@ static void *__slab_alloc(struct kmem_ca
>  	page = c->page;
>  	if (!page)
>  		goto new_slab;
> +
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  redo:
> +#endif
>
>  	if (unlikely(!node_match(page, node))) {
>  		stat(s, ALLOC_NODE_MISMATCH);
> @@ -2278,6 +2301,7 @@ load_freelist:
>
>  new_slab:
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	if (c->partial) {
>  		page = c->page = c->partial;
>  		c->partial = page->next;
> @@ -2285,6 +2309,7 @@ new_slab:
>  		c->freelist = NULL;
>  		goto redo;
>  	}
> +#endif
>
>  	freelist = new_slab_objects(s, gfpflags, node, &c);
>
> @@ -2491,6 +2516,7 @@ static void __slab_free(struct kmem_cach
>  		new.inuse--;
>  		if ((!new.inuse || !prior) && !was_frozen) {
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  			if (!kmem_cache_debug(s) && !prior)
>
>  				/*
> @@ -2499,7 +2525,9 @@ static void __slab_free(struct kmem_cach
>  				 */
>  				new.frozen = 1;
>
> -			else { /* Needs to be taken off a list */
> +			else
> +#endif
> +			{ /* Needs to be taken off a list */
>
>  				n = get_node(s, page_to_nid(page));
>  				/*
> @@ -2521,6 +2549,7 @@ static void __slab_free(struct kmem_cach
>  		"__slab_free"));
>
>  	if (likely(!n)) {
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>
>  		/*
>  		 * If we just froze the page then put it onto the
> @@ -2530,6 +2559,7 @@ static void __slab_free(struct kmem_cach
>  			put_cpu_partial(s, page, 1);
>  			stat(s, CPU_PARTIAL_FREE);
>  		}
> +#endif
>  		/*
>  		 * The list lock was not taken therefore no list
>  		 * activity can be necessary.
> @@ -3036,7 +3066,7 @@ static int kmem_cache_open(struct kmem_c
>  	 * list to avoid pounding the page allocator excessively.
>  	 */
>  	set_min_partial(s, ilog2(s->size) / 2);
> -
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	/*
>  	 * cpu_partial determined the maximum number of objects kept in the
>  	 * per cpu partial lists of a processor.
> @@ -3064,6 +3094,7 @@ static int kmem_cache_open(struct kmem_c
>  		s->cpu_partial = 13;
>  	else
>  		s->cpu_partial = 30;
> +#endif
>
>  #ifdef CONFIG_NUMA
>  	s->remote_node_defrag_ratio = 1000;
> @@ -4424,13 +4455,14 @@ static ssize_t show_slab_objects(struct
>  			total += x;
>  			nodes[node] += x;
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  			page = ACCESS_ONCE(c->partial);
>  			if (page) {
>  				x = page->pobjects;
>  				total += x;
>  				nodes[node] += x;
>  			}
> -
> +#endif
>  			per_cpu[node]++;
>  		}
>  	}
> @@ -4583,6 +4615,7 @@ static ssize_t min_partial_store(struct
>  }
>  SLAB_ATTR(min_partial);
>
> +#ifdef CONFIG_CPU_PARTIAL
>  static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
>  	return sprintf(buf, "%u\n", s->cpu_partial);
> @@ -4605,6 +4638,7 @@ static ssize_t cpu_partial_store(struct
>  	return length;
>  }
>  SLAB_ATTR(cpu_partial);
> +#endif
>
>  static ssize_t ctor_show(struct kmem_cache *s, char *buf)
>  {
> @@ -4644,6 +4678,7 @@ static ssize_t objects_partial_show(stru
>  }
>  SLAB_ATTR_RO(objects_partial);
>
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf)
>  {
>  	int objects = 0;
> @@ -4674,6 +4709,7 @@ static ssize_t slabs_cpu_partial_show(st
>  	return len + sprintf(buf + len, "\n");
>  }
>  SLAB_ATTR_RO(slabs_cpu_partial);
> +#endif
>
>  static ssize_t reclaim_account_show(struct kmem_cache *s, char *buf)
>  {
> @@ -4997,11 +5033,13 @@ STAT_ATTR(DEACTIVATE_BYPASS, deactivate_
>  STAT_ATTR(ORDER_FALLBACK, order_fallback);
>  STAT_ATTR(CMPXCHG_DOUBLE_CPU_FAIL, cmpxchg_double_cpu_fail);
>  STAT_ATTR(CMPXCHG_DOUBLE_FAIL, cmpxchg_double_fail);
> +#ifdef CONFIG_CPU_PARTIAL
>  STAT_ATTR(CPU_PARTIAL_ALLOC, cpu_partial_alloc);
>  STAT_ATTR(CPU_PARTIAL_FREE, cpu_partial_free);
>  STAT_ATTR(CPU_PARTIAL_NODE, cpu_partial_node);
>  STAT_ATTR(CPU_PARTIAL_DRAIN, cpu_partial_drain);
>  #endif
> +#endif
>
>  static struct attribute *slab_attrs[] = {
>  	&slab_size_attr.attr,
> @@ -5009,7 +5047,9 @@ static struct attribute *slab_attrs[] =
>  	&objs_per_slab_attr.attr,
>  	&order_attr.attr,
>  	&min_partial_attr.attr,
> +#ifdef CONFIG_CPU_PARTIAL
>  	&cpu_partial_attr.attr,
> +#endif
>  	&objects_attr.attr,
>  	&objects_partial_attr.attr,
>  	&partial_attr.attr,
> @@ -5022,7 +5062,9 @@ static struct attribute *slab_attrs[] =
>  	&destroy_by_rcu_attr.attr,
>  	&shrink_attr.attr,
>  	&reserved_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	&slabs_cpu_partial_attr.attr,
> +#endif
>  #ifdef CONFIG_SLUB_DEBUG
>  	&total_objects_attr.attr,
>  	&slabs_attr.attr,
> @@ -5064,11 +5106,13 @@ static struct attribute *slab_attrs[] =
>  	&order_fallback_attr.attr,
>  	&cmpxchg_double_fail_attr.attr,
>  	&cmpxchg_double_cpu_fail_attr.attr,
> +#ifdef CONFIG_SLUB_CPU_PARTIAL
>  	&cpu_partial_alloc_attr.attr,
>  	&cpu_partial_free_attr.attr,
>  	&cpu_partial_node_attr.attr,
>  	&cpu_partial_drain_attr.attr,
>  #endif
> +#endif
>  #ifdef CONFIG_FAILSLAB
>  	&failslab_attr.attr,
>  #endif
> Index: linux/init/Kconfig
> ===================================================================
> --- linux.orig/init/Kconfig	2013-04-01 10:27:05.908964674 -0500
> +++ linux/init/Kconfig	2013-04-01 10:31:46.497863625 -0500
> @@ -1514,6 +1514,17 @@ config SLOB
>
>  endchoice
>
> +config SLUB_CPU_PARTIAL
> +	default y
> +	depends on SLUB
> +	bool "SLUB per cpu partial cache"
> +	help
> +	  Per cpu partial caches accellerate objects allocation and freeing
> +	  that is local to a processor at the price of more indeterminism
> +	  in the latency of the free. On overflow these caches will be cleared
> +	  which requires the taking of locks that may cause latency spikes.
> +	  Typically one would choose no for a realtime system.
> +
>  config MMAP_ALLOW_UNINITIALIZED
>  	bool "Allow mmapped anonymous memory to be uninitialized"
>  	depends on EXPERT && !MMU