2002-09-26 04:04:22

by Andrew Morton

Subject: [patch 3/4] slab reclaim balancing



A patch from Ed Tomlinson which improves the way in which the kernel
reclaims slab objects.

The theory is: a cached object's usefulness is measured in terms of the
number of disk seeks which it saves. Furthermore, we assume that one
dentry or inode saves as many seeks as one pagecache page.

So we reap slab objects at the same rate as we reclaim pages. For each
1% of reclaimed pagecache we reclaim 1% of slab. (Actually, we _scan_
1% of slab for each 1% of scanned pages).

Furthermore we assume that one swapout costs twice as many seeks as one
pagecache page, and twice as many seeks as one slab object. So we
double the pressure on slab when anonymous pages are being considered
for eviction.
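To make the arithmetic concrete (the numbers here are invented for
illustration; the formulas are the ones used in shrink_caches() and
shrink_dcache_memory() in the patch below):

pages   = nr_used_zone_pages();            /* say 100000 LRU pages         */
scanned = total_scanned + nr_mapped;       /* say 1000, i.e. ~1% of them   */
ratio   = pages / (scanned + 1) + 1;       /* -> about 100                 */

entries = dentry_stat.nr_dentry / ratio + 1;   /* ~1% of all dentries      */
prune_dcache(entries);                         /* likewise for inodes etc. */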

The code works nicely, and smoothly. Possibly it does not shrink slab
hard enough, but that is now very easy to tune up and down. It is just:

ratio *= 3;

in shrink_caches().

Slab caches no longer hold onto completely empty pages. Instead, pages
are freed as soon as they have zero objects. This is possibly a
performance hit for slabs which have constructors, but it's doubtful.
Most allocations after a batch of frees are satisfied from inside
internally-fragmented pages and by the time slab gets back onto using
the wholly-empty pages they'll be cache-cold. slab would be better off
going and requesting a new, cache-warm page and reconstructing the
objects therein. (Once we have the per-cpu hot-page allocator in
place. It's happening).

As a consequence of the above, kmem_cache_shrink() is now unused. No
great loss there - the serialising effect of kmem_cache_shrink and its
semaphore in front of page reclaim was measurably bad.


Still todo:

- batch up the shrinking so we don't call into prune_dcache and
friends at high frequency asking for a tiny number of objects.

- Maybe expose the shrink ratio via a tunable.

- clean up slab.c

- highmem page reclaim in prune_icache: highmem pages can pin
inodes.


fs/dcache.c | 30 ++++++----------------------
fs/dquot.c | 19 ++++--------------
fs/inode.c | 29 ++++++++-------------------
include/linux/dcache.h | 2 -
include/linux/mm.h | 1
mm/page_alloc.c | 11 ++++++++++
mm/slab.c | 8 +++++--
mm/vmscan.c | 51 ++++++++++++++++++++++++++++++++++---------------
8 files changed, 76 insertions(+), 75 deletions(-)

--- 2.5.38/fs/dcache.c~slabasap Wed Sep 25 20:15:25 2002
+++ 2.5.38-akpm/fs/dcache.c Wed Sep 25 20:15:25 2002
@@ -329,12 +329,11 @@ static inline void prune_one_dentry(stru
void prune_dcache(int count)
{
spin_lock(&dcache_lock);
- for (;;) {
+ for (; count ; count--) {
struct dentry *dentry;
struct list_head *tmp;

tmp = dentry_unused.prev;
-
if (tmp == &dentry_unused)
break;
list_del_init(tmp);
@@ -349,12 +348,8 @@ void prune_dcache(int count)
dentry_stat.nr_unused--;

/* Unused dentry with a count? */
- if (atomic_read(&dentry->d_count))
- BUG();
-
+ BUG_ON(atomic_read(&dentry->d_count));
prune_one_dentry(dentry);
- if (!--count)
- break;
}
spin_unlock(&dcache_lock);
}
@@ -573,19 +568,11 @@ void shrink_dcache_anon(struct list_head

/*
* This is called from kswapd when we think we need some
- * more memory, but aren't really sure how much. So we
- * carefully try to free a _bit_ of our dcache, but not
- * too much.
- *
- * Priority:
- * 1 - very urgent: shrink everything
- * ...
- * 6 - base-level: try to shrink a bit.
+ * more memory.
*/
-int shrink_dcache_memory(int priority, unsigned int gfp_mask)
+int shrink_dcache_memory(int ratio, unsigned int gfp_mask)
{
- int count = 0;
-
+ int entries = dentry_stat.nr_dentry / ratio + 1;
/*
* Nasty deadlock avoidance.
*
@@ -600,11 +587,8 @@ int shrink_dcache_memory(int priority, u
if (!(gfp_mask & __GFP_FS))
return 0;

- count = dentry_stat.nr_unused / priority;
-
- prune_dcache(count);
- kmem_cache_shrink(dentry_cache);
- return 0;
+ prune_dcache(entries);
+ return entries;
}

#define NAME_ALLOC_LEN(len) ((len+16) & ~15)
--- 2.5.38/fs/dquot.c~slabasap Wed Sep 25 20:15:25 2002
+++ 2.5.38-akpm/fs/dquot.c Wed Sep 25 20:15:25 2002
@@ -480,26 +480,17 @@ static void prune_dqcache(int count)

/*
* This is called from kswapd when we think we need some
- * more memory, but aren't really sure how much. So we
- * carefully try to free a _bit_ of our dqcache, but not
- * too much.
- *
- * Priority:
- * 1 - very urgent: shrink everything
- * ...
- * 6 - base-level: try to shrink a bit.
+ * more memory
*/

-int shrink_dqcache_memory(int priority, unsigned int gfp_mask)
+int shrink_dqcache_memory(int ratio, unsigned int gfp_mask)
{
- int count = 0;
+ int entries = dqstats.allocated_dquots / ratio + 1;

lock_kernel();
- count = dqstats.free_dquots / priority;
- prune_dqcache(count);
+ prune_dqcache(entries);
unlock_kernel();
- kmem_cache_shrink(dquot_cachep);
- return 0;
+ return entries;
}

/*
--- 2.5.38/fs/inode.c~slabasap Wed Sep 25 20:15:25 2002
+++ 2.5.38-akpm/fs/inode.c Wed Sep 25 20:15:25 2002
@@ -386,10 +386,11 @@ void prune_icache(int goal)

count = 0;
entry = inode_unused.prev;
- while (entry != &inode_unused)
- {
+ for(; goal; goal--) {
struct list_head *tmp = entry;

+ if (entry == &inode_unused)
+ break;
entry = entry->prev;
inode = INODE(tmp);
if (inode->i_state & (I_FREEING|I_CLEAR|I_LOCK))
@@ -403,8 +404,6 @@ void prune_icache(int goal)
list_add(tmp, freeable);
inode->i_state |= I_FREEING;
count++;
- if (!--goal)
- break;
}
inodes_stat.nr_unused -= count;
spin_unlock(&inode_lock);
@@ -414,19 +413,11 @@ void prune_icache(int goal)

/*
* This is called from kswapd when we think we need some
- * more memory, but aren't really sure how much. So we
- * carefully try to free a _bit_ of our icache, but not
- * too much.
- *
- * Priority:
- * 1 - very urgent: shrink everything
- * ...
- * 6 - base-level: try to shrink a bit.
+ * more memory.
*/
-int shrink_icache_memory(int priority, int gfp_mask)
+int shrink_icache_memory(int ratio, unsigned int gfp_mask)
{
- int count = 0;
-
+ int entries = inodes_stat.nr_inodes / ratio + 1;
/*
* Nasty deadlock avoidance..
*
@@ -437,12 +428,10 @@ int shrink_icache_memory(int priority, i
if (!(gfp_mask & __GFP_FS))
return 0;

- count = inodes_stat.nr_unused / priority;
-
- prune_icache(count);
- kmem_cache_shrink(inode_cachep);
- return 0;
+ prune_icache(entries);
+ return entries;
}
+EXPORT_SYMBOL(shrink_icache_memory);

/*
* Called with the inode lock held.
--- 2.5.38/include/linux/dcache.h~slabasap Wed Sep 25 20:15:25 2002
+++ 2.5.38-akpm/include/linux/dcache.h Wed Sep 25 20:15:25 2002
@@ -186,7 +186,7 @@ extern int shrink_dcache_memory(int, uns
extern void prune_dcache(int);

/* icache memory management (defined in linux/fs/inode.c) */
-extern int shrink_icache_memory(int, int);
+extern int shrink_icache_memory(int, unsigned int);
extern void prune_icache(int);

/* quota cache memory management (defined in linux/fs/dquot.c) */
--- 2.5.38/include/linux/mm.h~slabasap Wed Sep 25 20:15:25 2002
+++ 2.5.38-akpm/include/linux/mm.h Wed Sep 25 20:15:25 2002
@@ -524,6 +524,7 @@ extern struct vm_area_struct *find_exten

extern struct page * vmalloc_to_page(void *addr);
extern unsigned long get_page_cache_size(void);
+extern unsigned int nr_used_zone_pages(void);

#endif /* __KERNEL__ */

--- 2.5.38/mm/page_alloc.c~slabasap Wed Sep 25 20:15:25 2002
+++ 2.5.38-akpm/mm/page_alloc.c Wed Sep 25 20:15:25 2002
@@ -479,6 +479,17 @@ unsigned int nr_free_pages(void)
return sum;
}

+unsigned int nr_used_zone_pages(void)
+{
+ unsigned int pages = 0;
+ struct zone *zone;
+
+ for_each_zone(zone)
+ pages += zone->nr_active + zone->nr_inactive;
+
+ return pages;
+}
+
static unsigned int nr_free_zone_pages(int offset)
{
pg_data_t *pgdat;
--- 2.5.38/mm/slab.c~slabasap Wed Sep 25 20:15:25 2002
+++ 2.5.38-akpm/mm/slab.c Wed Sep 25 20:15:25 2002
@@ -1496,7 +1496,11 @@ static inline void kmem_cache_free_one(k
if (unlikely(!--slabp->inuse)) {
/* Was partial or full, now empty. */
list_del(&slabp->list);
- list_add(&slabp->list, &cachep->slabs_free);
+/* list_add(&slabp->list, &cachep->slabs_free); */
+ if (unlikely(list_empty(&cachep->slabs_partial)))
+ list_add(&slabp->list, &cachep->slabs_partial);
+ else
+ kmem_slab_destroy(cachep, slabp);
} else if (unlikely(inuse == cachep->num)) {
/* Was full. */
list_del(&slabp->list);
@@ -1970,7 +1974,7 @@ static int s_show(struct seq_file *m, vo
}
list_for_each(q,&cachep->slabs_partial) {
slabp = list_entry(q, slab_t, list);
- if (slabp->inuse == cachep->num || !slabp->inuse)
+ if (slabp->inuse == cachep->num)
BUG();
active_objs += slabp->inuse;
active_slabs++;
--- 2.5.38/mm/vmscan.c~slabasap Wed Sep 25 20:15:25 2002
+++ 2.5.38-akpm/mm/vmscan.c Wed Sep 25 20:15:25 2002
@@ -70,6 +70,10 @@
#define prefetchw_prev_lru_page(_page, _base, _field) do { } while (0)
#endif

+#ifndef CONFIG_QUOTA
+#define shrink_dqcache_memory(ratio, gfp_mask) do { } while (0)
+#endif
+
/* Must be called with page's pte_chain_lock held. */
static inline int page_mapping_inuse(struct page * page)
{
@@ -97,7 +101,7 @@ static inline int is_page_cache_freeable

static /* inline */ int
shrink_list(struct list_head *page_list, int nr_pages,
- unsigned int gfp_mask, int *max_scan)
+ unsigned int gfp_mask, int *max_scan, int *nr_mapped)
{
struct address_space *mapping;
LIST_HEAD(ret_pages);
@@ -116,6 +120,10 @@ shrink_list(struct list_head *page_list,
if (TestSetPageLocked(page))
goto keep;

+ /* Double the slab pressure for mapped and swapcache pages */
+ if (page_mapped(page) || PageSwapCache(page))
+ (*nr_mapped)++;
+
BUG_ON(PageActive(page));
may_enter_fs = (gfp_mask & __GFP_FS) ||
(PageSwapCache(page) && (gfp_mask & __GFP_IO));
@@ -320,7 +328,7 @@ keep:
*/
static /* inline */ int
shrink_cache(int nr_pages, struct zone *zone,
- unsigned int gfp_mask, int max_scan)
+ unsigned int gfp_mask, int max_scan, int *nr_mapped)
{
LIST_HEAD(page_list);
struct pagevec pvec;
@@ -371,7 +379,8 @@ shrink_cache(int nr_pages, struct zone *

max_scan -= nr_scan;
KERNEL_STAT_ADD(pgscan, nr_scan);
- nr_pages = shrink_list(&page_list,nr_pages,gfp_mask,&max_scan);
+ nr_pages = shrink_list(&page_list, nr_pages,
+ gfp_mask, &max_scan, nr_mapped);

if (nr_pages <= 0 && list_empty(&page_list))
goto done;
@@ -522,14 +531,10 @@ refill_inactive_zone(struct zone *zone,

static /* inline */ int
shrink_zone(struct zone *zone, int max_scan,
- unsigned int gfp_mask, int nr_pages)
+ unsigned int gfp_mask, int nr_pages, int *nr_mapped)
{
unsigned long ratio;

- /* This is bogus for ZONE_HIGHMEM? */
- if (kmem_cache_reap(gfp_mask) >= nr_pages)
- return 0;
-
/*
* Try to keep the active list 2/3 of the size of the cache. And
* make sure that refill_inactive is given a decent number of pages.
@@ -547,7 +552,8 @@ shrink_zone(struct zone *zone, int max_s
atomic_sub(SWAP_CLUSTER_MAX, &zone->refill_counter);
refill_inactive_zone(zone, SWAP_CLUSTER_MAX);
}
- nr_pages = shrink_cache(nr_pages, zone, gfp_mask, max_scan);
+ nr_pages = shrink_cache(nr_pages, zone, gfp_mask,
+ max_scan, nr_mapped);
return nr_pages;
}

@@ -557,6 +563,9 @@ shrink_caches(struct zone *classzone, in
{
struct zone *first_classzone;
struct zone *zone;
+ int ratio;
+ int nr_mapped = 0;
+ int pages = nr_used_zone_pages();

first_classzone = classzone->zone_pgdat->node_zones;
for (zone = classzone; zone >= first_classzone; zone--) {
@@ -581,16 +590,28 @@ shrink_caches(struct zone *classzone, in
max_scan = zone->nr_inactive >> priority;
if (max_scan < to_reclaim * 2)
max_scan = to_reclaim * 2;
- unreclaimed = shrink_zone(zone, max_scan, gfp_mask, to_reclaim);
+ unreclaimed = shrink_zone(zone, max_scan,
+ gfp_mask, to_reclaim, &nr_mapped);
nr_pages -= to_reclaim - unreclaimed;
*total_scanned += max_scan;
}

- shrink_dcache_memory(priority, gfp_mask);
- shrink_icache_memory(1, gfp_mask);
-#ifdef CONFIG_QUOTA
- shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
-#endif
+ /*
+ * Here we assume it costs one seek to replace a lru page and that
+ * it also takes a seek to recreate a cache object. With this in
+ * mind we age equal percentages of the lru and ageable caches.
+ * This should balance the seeks generated by these structures.
+ *
+ * NOTE: for now I do this for all zones. If we find this is too
+ * aggressive on large boxes we may want to exclude ZONE_HIGHMEM
+ *
+ * If we're encountering mapped pages on the LRU then increase the
+ * pressure on slab to avoid swapping.
+ */
+ ratio = (pages / (*total_scanned + nr_mapped + 1)) + 1;
+ shrink_dcache_memory(ratio, gfp_mask);
+ shrink_icache_memory(ratio, gfp_mask);
+ shrink_dqcache_memory(ratio, gfp_mask);
return nr_pages;
}




2002-09-26 11:35:41

by Ed Tomlinson

Subject: Re: [patch 3/4] slab reclaim balancing

On September 26, 2002 12:08 am, Andrew Morton wrote:
> A patch from Ed Tomlinson which improves the way in which the kernel
> reclaims slab objects.

Thanks Andrew.

Does this look any better than the last version? I have moved the shrinking
calls from slab to vmscan. The idea being that this is really VM related, not
necessarily linked directly to any one cache, so why should it be in slab? This
eliminated an export too...

As it stands now I have not implemented a delete function. It would be pretty
simple to do so though and may well be needed later.
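
For illustration, a client of the new interface looks roughly like this
(the "foo" cache, its statistics and its prune function are hypothetical,
used only to show the shape of the callback contract documented in the
mm.h hunk of the patch below):

/* hypothetical example of the callback contract -- not part of the patch */
static int shrink_foo_memory(int nr, unsigned int gfp_mask)
{
	if (!nr)			/* nr == 0: report how many objects we hold */
		return foo_stat.nr_objects;
	if (!(gfp_mask & __GFP_FS))	/* cannot age in this context: hand nr back */
		return nr;
	prune_foo_cache(nr);		/* age nr objects */
	return 0;
}

/* at cache creation time: */
set_shrinker(DEFAULT_SEEKS, shrink_foo_memory);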

Thoughts?
Ed

-------- against 38-mm2
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.549 -> 1.554
# include/linux/mm.h 1.81 -> 1.83
# fs/dcache.c 1.30 -> 1.32
# mm/vmscan.c 1.107 -> 1.111
# include/linux/slab.h 1.13 -> 1.16
# fs/dquot.c 1.45 -> 1.47
# mm/slab.c 1.28 -> 1.31
# fs/inode.c 1.69 -> 1.71
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/09/18 [email protected] 1.550
# slab_callbacks_A2
# --------------------------------------------
# 02/09/18 [email protected] 1.551
# convert ageable slab shrinking to use callbacks.
# --------------------------------------------
# 02/09/19 [email protected] 1.552
# move the shrink functions to vmscan which is where the really do belong
# --------------------------------------------
# 02/09/20 [email protected] 1.553
# small improvements
# --------------------------------------------
# 02/09/20 [email protected] 1.554
# cleanup
# --------------------------------------------
#
diff -Nru a/fs/dcache.c b/fs/dcache.c
--- a/fs/dcache.c Fri Sep 20 10:35:51 2002
+++ b/fs/dcache.c Fri Sep 20 10:35:51 2002
@@ -570,9 +570,10 @@
* This is called from kswapd when we think we need some
* more memory.
*/
-int shrink_dcache_memory(int ratio, unsigned int gfp_mask)
+int shrink_dcache_memory(int nr, unsigned int gfp_mask)
{
- int entries = dentry_stat.nr_dentry / ratio + 1;
+ if (!nr)
+ return dentry_stat.nr_dentry;
/*
* Nasty deadlock avoidance.
*
@@ -585,10 +586,10 @@
* block allocations, but for now:
*/
if (!(gfp_mask & __GFP_FS))
- return 0;
+ return nr;

- prune_dcache(entries);
- return entries;
+ prune_dcache(nr);
+ return 0;
}

#define NAME_ALLOC_LEN(len) ((len+16) & ~15)
@@ -1328,6 +1329,8 @@
NULL, NULL);
if (!dentry_cache)
panic("Cannot create dentry cache");
+
+ set_shrinker(DEFAULT_SEEKS, shrink_dcache_memory);

#if PAGE_SHIFT < 13
mempages >>= (13 - PAGE_SHIFT);
@@ -1401,6 +1404,8 @@
SLAB_HWCACHE_ALIGN, NULL, NULL);
if (!dquot_cachep)
panic("Cannot create dquot SLAB cache");
+
+ set_shrinker(DEFAULT_SEEKS, shrink_dquot_memory);
#endif

dcache_init(mempages);
diff -Nru a/fs/dquot.c b/fs/dquot.c
--- a/fs/dquot.c Fri Sep 20 10:35:51 2002
+++ b/fs/dquot.c Fri Sep 20 10:35:51 2002
@@ -55,6 +55,7 @@
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/fs.h>
+#include <linux/mm.h>
#include <linux/time.h>
#include <linux/types.h>
#include <linux/string.h>
@@ -483,14 +484,15 @@
* more memory
*/

-int shrink_dqcache_memory(int ratio, unsigned int gfp_mask)
+int shrink_dqcache_memory(int nr, unsigned int gfp_mask)
{
- int entries = dqstats.allocated_dquots / ratio + 1;
+ if (!nr)
+ return dqstats.allocated_dquots;

lock_kernel();
- prune_dqcache(entries);
+ prune_dqcache(nr);
unlock_kernel();
- return entries;
+ return 0;
}

/*
diff -Nru a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c Fri Sep 20 10:35:51 2002
+++ b/fs/inode.c Fri Sep 20 10:35:51 2002
@@ -419,9 +419,10 @@
* This is called from kswapd when we think we need some
* more memory.
*/
-int shrink_icache_memory(int ratio, unsigned int gfp_mask)
+int shrink_icache_memory(int nr, unsigned int gfp_mask)
{
- int entries = inodes_stat.nr_inodes / ratio + 1;
+ if (!nr)
+ return inodes_stat.nr_inodes;
/*
* Nasty deadlock avoidance..
*
@@ -430,10 +431,10 @@
* in clear_inode() and friends..
*/
if (!(gfp_mask & __GFP_FS))
- return 0;
+ return nr;

- prune_icache(entries);
- return entries;
+ prune_icache(nr);
+ return 0;
}
EXPORT_SYMBOL(shrink_icache_memory);

@@ -1098,4 +1099,6 @@
NULL);
if (!inode_cachep)
panic("cannot create inode slab cache");
+
+ set_shrinker(DEFAULT_SEEKS, shrink_icache_memory);
}
diff -Nru a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h Fri Sep 20 10:35:51 2002
+++ b/include/linux/mm.h Fri Sep 20 10:35:51 2002
@@ -395,6 +395,30 @@


/*
+ * Prototype to add a shrinker callback for ageable caches.
+ *
+ * These functions are passed a count and a gfpmask. They should
+ * return one of three results.
+ *
+ * when nr = 0 return number of entries in the cache(s)
+ * when nr > 0 and we can age return 0
+ * when nr > 0 and we cannot age return nr
+ *
+ * if the cache(s) 'disappears' passing nr = 0 must return 0
+ */
+typedef int (*shrinker_t)(int, unsigned int);
+
+/*
+ * Add an aging callback. The int is the number of 'seeks' it takes
+ * to recreate one of the objects that these functions age.
+ */
+
+#define DEFAULT_SEEKS 2
+
+extern void set_shrinker(int, shrinker_t);
+
+
+/*
* If the mapping doesn't provide a set_page_dirty a_op, then
* just fall through and assume that it wants buffer_heads.
* FIXME: make the method unconditional.
diff -Nru a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c Fri Sep 20 10:35:51 2002
+++ b/mm/vmscan.c Fri Sep 20 10:35:51 2002
@@ -73,10 +73,36 @@
#define prefetchw_prev_lru_page(_page, _base, _field) do { } while (0)
#endif

-#ifndef CONFIG_QUOTA
-#define shrink_dqcache_memory(ratio, gfp_mask) do { } while (0)
-#endif
+/*
+ * The list of shrinker callbacks used by to apply pressure to
+ * ageable caches.
+ */
+struct shrinker_s {
+ shrinker_t shrinker;
+ struct list_head next;
+ int seeks; /* seeks to recreate an obj */
+ int nr; /* objs pending delete */
+};
+
+static LIST_HEAD(shrinker_list);
+static spinlock_t shrinker_lock = SPIN_LOCK_UNLOCKED;

+/*
+ * Add a shrinker to be called from the vm
+ */
+void set_shrinker(int seeks, shrinker_t theshrinker)
+{
+ struct shrinker_s *shrinkerp;
+ shrinkerp = kmalloc(sizeof(struct shrinker_s),GFP_KERNEL);
+ BUG_ON(!shrinkerp);
+ shrinkerp->shrinker = theshrinker;
+ shrinkerp->seeks = seeks;
+ shrinkerp->nr = 0;
+ spin_lock(&shrinker_lock);
+ list_add(&shrinkerp->next, &shrinker_list);
+ spin_unlock(&shrinker_lock);
+}
+
/* Must be called with page's pte_chain_lock held. */
static inline int page_mapping_inuse(struct page * page)
{
@@ -572,32 +598,6 @@
}

/*
- * FIXME: don't do this for ZONE_HIGHMEM
- */
-/*
- * Here we assume it costs one seek to replace a lru page and that it also
- * takes a seek to recreate a cache object. With this in mind we age equal
- * percentages of the lru and ageable caches. This should balance the seeks
- * generated by these structures.
- *
- * NOTE: for now I do this for all zones. If we find this is too aggressive
- * on large boxes we may want to exclude ZONE_HIGHMEM.
- *
- * If we're encountering mapped pages on the LRU then increase the pressure on
- * slab to avoid swapping.
- */
-static void shrink_slab(int total_scanned, int gfp_mask)
-{
- int shrink_ratio;
- int pages = nr_used_zone_pages();
-
- shrink_ratio = (pages / (total_scanned + 1)) + 1;
- shrink_dcache_memory(shrink_ratio, gfp_mask);
- shrink_icache_memory(shrink_ratio, gfp_mask);
- shrink_dqcache_memory(shrink_ratio, gfp_mask);
-}
-
-/*
* This is the direct reclaim path, for page-allocating processes. We only
* try to reclaim pages from zones which will satisfy the caller's allocation
* request.
@@ -638,6 +638,45 @@
break;
}
return ret;
+}
+
+
+#define SHRINK_BATCH 32
+/*
+ * Call the shrink functions to age shrinkable caches
+ *
+ * Here we assume it costs one seek to replace a lru page and that it also
+ * takes a seek to recreate a cache object. With this in mind we age equal
+ * percentages of the lru and ageable caches. This should balance the seeks
+ * generated by these structures.
+ *
+ * If the vm encounted mapped pages on the LRU it increase the pressure on
+ * slab to avoid swapping.
+ *
+ * FIXME: do not do for zone highmem
+ */
+int
+shrink_slab(int scanned, unsigned int gfp_mask)
+{
+ struct list_head *p, *n;
+ int pages = nr_used_zone_pages();
+
+ spin_lock(&shrinker_lock);
+ list_for_each_safe(p, n, &shrinker_list) {
+ struct shrinker_s *shrinkerp = list_entry(p, struct shrinker_s, next);
+ int entries = (*shrinkerp->shrinker)(0, gfp_mask);
+ if (!entries)
+ continue;
+ shrinkerp->nr += ((unsigned long)scanned*shrinkerp->seeks*entries) / pages + 1;
+ if (shrinkerp->nr > SHRINK_BATCH) {
+ spin_unlock(&shrinker_lock);
+ shrinkerp->nr = (*shrinkerp->shrinker)(shrinkerp->nr, gfp_mask);
+ spin_lock(&shrinker_lock);
+ }
+ }
+ spin_unlock(&shrinker_lock);
+
+ return 0;
}

/*

--------
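
A worked example of the shrink_slab() arithmetic above, with invented
numbers (seeks here is DEFAULT_SEEKS = 2):

pages   = nr_used_zone_pages();            /* say 100000                     */
scanned = 500;                             /* LRU pages scanned this pass    */
entries = (*shrinker)(0, gfp_mask);        /* say 40000 dentries             */

nr += scanned * seeks * entries / pages + 1;   /* 500*2*40000/100000 + 1 = 401 */
if (nr > SHRINK_BATCH)                         /* 401 > 32: prune now          */
	nr = (*shrinker)(nr, gfp_mask);

/* with only 10 pages scanned the increment is just 9, so several passes
   accumulate in ->nr before the callback gets invoked at all */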

2002-09-26 14:08:29

by Manfred Spraul

Subject: Re: [patch 3/4] slab reclaim balancing

> Slab caches no longer hold onto completely empty pages. Instead, pages
> are freed as soon as they have zero objects. This is possibly a
> performance hit for slabs which have constructors, but it's doubtful.

It could be a performance hit for slabs with just one object per page - e.g. the
page-sized names cache, used in every syscall that has a path name as a
parameter.

Ed, have you benchmarked that there is no noticeable slowdown?
E.g. test the time needed for stat(".") on UP; otherwise the SMP arrays
would perform the caching.

--
Manfred

2002-09-26 14:15:43

by William Lee Irwin III

Subject: Re: [patch 3/4] slab reclaim balancing

At some point in the past, Ed Tomlinson wrote:
>> Slab caches no longer hold onto completely empty pages. Instead, pages
>> are freed as soon as they have zero objects. This is possibly a
>> performance hit for slabs which have constructors, but it's doubtful.

On Thu, Sep 26, 2002 at 04:13:28PM +0200, Manfred Spraul wrote:
> It could be a performance hit for slab with just one object - e.g the
> page sized names cache, used in every syscall that has a path name as a
> parameter.
> Ed, have you benchmarked that there is no noticable slowdown?
> e.g. test the time needed for stat("."). on UP, otherwise the SMP arrays
> would perform the caching.

This might need testing on large-memory 64-bit boxen for that, since
ZONE_NORMAL pressure outweighs many other considerations on my boxen.


Cheers,
Bill

2002-09-26 15:06:02

by Ed Tomlinson

Subject: Re: [patch 3/4] slab reclaim balancing

Actually, this is not quite as aggressive as mentioned below. There is an
optimization that leaves one slab on the cachep->slabs_partial list...
This should prevent problems with caches with small numbers of objects.

Ed Tomlinson


Andrew Morton wrote:
>
>
> A patch from Ed Tomlinson which improves the way in which the kernel
> reclaims slab objects.
> [...]
>
> Slab caches no longer hold onto completely empty pages. Instead, pages
> are freed as soon as they have zero objects. This is possibly a
> performance hit for slabs which have constructors, but it's doubtful.
> [...]

2002-09-26 15:19:30

by Manfred Spraul

Subject: Re: [patch 3/4] slab reclaim balancing

William Lee Irwin III wrote:
>
> This might need testing on large-memory 64-bit boxen for that, since
> ZONE_NORMAL pressure outweighs many other considerations on my boxen.
>

Most of them will be SMP, correct? For SMP, 95% of the memory operations
occur in the per-cpu arrays. I'm not sure if a slab is really the right
backend behind the per-cpu arrays - perhaps an algorithm that's more
aggressive toward finding freeable pages would be a better choice, even
if it needs more cpu time for sorting/searching through the partially
allocated pages.

--
Manfred

2002-09-26 17:31:59

by Andrew Morton

Subject: Re: [patch 3/4] slab reclaim balancing

Manfred Spraul wrote:
>
> > Slab caches no longer hold onto completely empty pages. Instead, pages
> > are freed as soon as they have zero objects. This is possibly a
> > performance hit for slabs which have constructors, but it's doubtful.
>
> It could be a performance hit for slab with just one object - e.g the
> page sized names cache, used in every syscall that has a path name as a
> parameter.
>
> Ed, have you benchmarked that there is no noticable slowdown?
> e.g. test the time needed for stat("."). on UP, otherwise the SMP arrays
> would perform the caching.
>

(What Ed said - we do hang onto one page. And I _have_ measured
cost in kmem_cache_shrink...)

For those things, the caching should be performed in the page
allocator. This way, when names_cache requests a cache-hot page,
it may get a page which was very recently a (say) pagetable page,
rather than restricting itself only to pages which used to be
a names_cache page.

CPU caches are per-cpu global. So the hot pages list should be
per-cpu global also.

Martin Bligh seems to have the patches up and running. It isn't
very fine-tuned yet, but initial indications are promising:

Before:
Elapsed: 20.18s User: 192.914s System: 48.292s CPU: 1195.6%

After:
Elapsed: 19.798s User: 191.61s System: 43.322s CPU: 1186.4%

That's for a kernel compile.

And from the profiles, it appears that the benefit is coming
from cache locality, not from the buddylist lock amortisation
which we've also designed into those patches.

I need to stop being slack, and get that code into the pipeline.

2002-09-26 18:42:39

by Manfred Spraul

Subject: Re: [patch 3/4] slab reclaim balancing

Andrew Morton wrote:
>
> (What Ed said - we do hang onto one page. And I _have_ measured
> cost in kmem_cache_shrink...)
>
I totally agree about kmem_cache_shrink - it's total abuse that
fs/dcache.c calls it regularly. It was intended to be called before
module unload, or during ifdown, etc.
On NUMA, it's probably worse, because it does an IPI to all cpus.
dcache.c should not call kmem_cache_shrink, and kmem_cache_reap should
be improved.

> Before:
> Elapsed: 20.18s User: 192.914s System: 48.292s CPU: 1195.6%
>
> After:
> Elapsed: 19.798s User: 191.61s System: 43.322s CPU: 1186.4%
>
> That's for a kernel compile.
>
UP or SMP?
And was that the complete patch, or just the modification to slab.c?

I've made a microbenchmark of kmem_cache_alloc/free of 4 kb objects, on
UP, AMD Duron:
              1 object       4 objects
cur           145 cycles     662 cycles
patched       133 cycles    2733 cycles

Summary:
* for one object, the patch is a slight performance improvement. The
reason is that the fallback from partial to free list in
kmem_cache_alloc_one is avoided.
* the overhead of kmem_cache_grow/shrink is around 500 cycles, nearly a
slowdown of factor 4. The cache had no constructor/destructor.
* everything in cache-hot state. [100 runs in a loop, loop overhead
subtracted. 98 or 99 runs completed in the given time, except for
patched-4obj, where 24 runs completed in 2735 cycles, 72 in 2733 cycles]
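
A sketch of the kind of loop being measured (assumptions: get_cycles()
for the counter, a cache without constructor/destructor; the actual
harness may differ):

kmem_cache_t *cp = kmem_cache_create("bench4k", 4096, 0, 0, NULL, NULL);
void *obj[4];
cycles_t t0, t1;
int i, n;			/* n = 1 or 4 objects per run */

t0 = get_cycles();
for (i = 0; i < n; i++)
	obj[i] = kmem_cache_alloc(cp, GFP_KERNEL);
for (i = 0; i < n; i++)
	kmem_cache_free(cp, obj[i]);
t1 = get_cycles();
/* repeat ~100 times, subtract the measured loop overhead */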


For SMP and slabs that are per-cpu cached, the change could be right,
because the arrays should absorb bursts. But I do not think that the
change is the right approach for UP.

--
Manfred

2002-09-26 19:44:02

by Andrew Morton

Subject: Re: [patch 3/4] slab reclaim balancing

Manfred Spraul wrote:
>
> Andrew Morton wrote:
> >
> > (What Ed said - we do hang onto one page. And I _have_ measured
> > cost in kmem_cache_shrink...)
> >
> I totally agree about kmem_cache_shrink - it's total abuse that
> fs/dcache.c calls it regularly. It was intended to be called before
> module unload, or during ifdown, etc.
> On NUMA, it's probably worse, because it does an IPI to all cpus.
> dcache.c should not call kmem_cache_shrink, and kmem_cache_reap should
> be improved.

Sorry, I meant kmem_cache_reap. It was the call into there from
vmscan.c which was the problem. On my 4-way (which is fairly
immune to lock contention for some reason), I was seeing *25%*
contention on the spinlock inside the cache_chain_sem semaphore.
With, of course, tons of context switching going on.

And it was giving a sort of turnstile effect, where each CPU needed
to pass through kmem_cache_reap before entering page reclaim proper.
Which maybe wasn't so bad in the 2.4 setup. But in 2.5 the VM is
basically per-zone, so different CPUs can run reclaim against different
zones with no contention of any form. This is especially important on
NUMA.

But the above could have been solved in different ways, and there are
still global effects with the requirement to prune slab. I expect that
the batching and perhaps a per-slab exclusion mechanism will fix up
any remaining problems.

> > Before:
> > Elapsed: 20.18s User: 192.914s System: 48.292s CPU: 1195.6%
> >
> > After:
> > Elapsed: 19.798s User: 191.61s System: 43.322s CPU: 1186.4%
> >
> > That's for a kernel compile.
> >
> UP or SMP?
> And was that the complete patch, or just the modification to slab.c?

That is the effect of per-cpu hotlist within the page allocator
on a 16- or 32-way NUMAQ.

It shows some of the benefit which we can get by carefully recycling
known-to-be-hot pages back into places where the page is known to
soon be touched by this CPU. Most of the gains were in do_anonymous_page,
copy_foo_user(), zap_pte_range(), etc. Where you'd expect.

I expect to end up with two forms of page allocation and freeing:
alloc_hot_page/alloc_cold_page and free_hot_page/free_cold_page

> I've made a microbenchmark of kmem_cache_alloc/free of 4 kb objects, on
> UP, AMD Duron:
> 1 object 4 objects
> cur 145 cycles 662 cycles
> patched 133 cycles 2733 cycles
>
> Summary:
> * for one object, the patch is a slight performance improvement. The
> reason is that the fallback from partial to free list in
> kmem_cache_alloc_one is avoided.
> * the overhead of kmem_cache_grow/shrink is around 500 cycles, nearly a
> slowdown of factor 4. The cache had no constructor/destructor.
> * everything cache hot state. [100 runs in a loop, loop overhead
> substracted. 98 or 99 runs completed in the given time, except for
> patched-4obj, where 24 runs completed in 2735 cycles, 72 in 2733 cycles]

Was the microbenchmark actually touching the memory which it was
allocating from slab? If so then yes, we'd expect to see cache
misses against those cold pages coming out of the buddy.

> For SMP and slabs that are per-cpu cached, the change could be right,
> because the arrays should absorb bursts. But I do not think that the
> change is the right approach for UP.

I'd suggest that we wait until we have slab freeing its pages into
the hotlists, and allocating from them. That should pull things back.

2002-09-26 20:44:14

by Manfred Spraul

Subject: Re: [patch 3/4] slab reclaim balancing

Andrew Morton wrote:
>
> Was the microbenchmark actually touching the memory which it was
> allocating from slab? If so then yes, we'd expect to see cache
> misses against those cold pages coming out of the buddy.
>

No, it was just measuring the cost of the kmem_cache_grow/shrink.

Btw, 140 cycles for kmem_cache_alloc+free is inflated - someone enabled
kmem_cache_alloc_head() even in the no-debugging version.
As expected, done by Andrea, who neither bothered to cc me, nor actually
understood the code.

>
>>For SMP and slabs that are per-cpu cached, the change could be right,
>>because the arrays should absorb bursts. But I do not think that the
>>change is the right approach for UP.
>
>
> I'd suggest that we wait until we have slab freeing its pages into
> the hotlists, and allocating from them. That should pull things back.
>
You are asking an interesting question:

The slab is by design far from LIFO - it tries to find pages with no
allocated objects that can be returned to the page allocator. It
doesn't try to optimize for cache hit rates.

Is that actually the right approach? For large objects, it would be
possible to cripple the freeable slabs list, and to perform the cache
hit optimization (i.e. per-cpu LIFO) in page_alloc.c, but that doesn't
work with small objects.

On SMP, the per-cpu arrays are the LIFO and should give good cache hit
rates. On UP, I haven't enabled them, because they could increase the
internal fragmentation of the slabs.

Perhaps we should enable the arrays on UP, too, and thus improve the
cache hit rates? If there is no increase in fragmentation, we could
ignore it. Otherwise we could replace the 3-list Bonwick slab with
another backend, something that's stronger at reducing the internal
fragmentation.

--
Manfred


2002-09-26 21:34:08

by Andrew Morton

Subject: Re: [patch 3/4] slab reclaim balancing

Manfred Spraul wrote:
>
> Andrew Morton wrote:
> >
> > Was the microbenchmark actually touching the memory which it was
> > allocating from slab? If so then yes, we'd expect to see cache
> > misses against those cold pages coming out of the buddy.
> >
>
> No, it was just measuring the cost of the kmem_cache_grow/shrink.
>
> Btw, 140 cycles for kmem_cache_alloc+free is inflated - someone enabled
> kmem_cache_alloc_head() even in the no-debugging version.
> As expected, done by Andrea, who neither bothered to cc me, nor actually
> understood the code.

hm, OK. Sorry, I did not realise that you were this closely
interested/involved with slab, so things have been sort of
going on behind your back :(

> >
> >>For SMP and slabs that are per-cpu cached, the change could be right,
> >>because the arrays should absorb bursts. But I do not think that the
> >>change is the right approach for UP.
> >
> >
> > I'd suggest that we wait until we have slab freeing its pages into
> > the hotlists, and allocating from them. That should pull things back.
> >
> You are asking a interesting question:
>
> The slab is by design far from LIFO - it tries to find pages with no
> allocated objects, that are possible to return to the page allocator. It
> doesn't try to optimize for cache hit rates.
>
> Is that actually the right approach? For large objects, it would be
> possible to cripple the freeable slabs list, and to perform the cache
> hit optimization (i.e. per-cpu LIFO) in page_alloc.c, but that doesn't
> work with small objects.

Well with a, what? 100:1 speed ratio, we'll generally get best results
from optimising for locality/recency of reference.

> On SMP, the per-cpu arrays are the LIFO and should give good cache hit
> rates. On UP, I haven't enabled them, because they could increase the
> internal fragmentation of the slabs.
>
> Perhaps we should enable the arrays on UP, too, and thus improve the
> cache hit rates? If there is no increase in fragmentation, we could
> ignore it. Otherwise we could replace the 3-list Bonwick slab with
> another backend, something that's stronger at reducing the internal
> fragmentation.

Definitely worthy of investigation. Memory sizes are increasing,
and the cached-versus-noncached latencies are increasing. Both
these say "optimise for cache hits".

Plus we'd lose a ton of ifdefs if we enabled it on UP as well...

Bill wrote a couple of handy slab-monitoring tools, btw.
http://www.zip.com.au/~akpm/linux/patches/ - I use bloatmeter.

2002-09-27 00:37:37

by Ed Tomlinson

Subject: Re: [patch 3/4] slab reclaim balancing

On September 26, 2002 05:39 pm, Andrew Morton wrote:
> Manfred Spraul wrote:
> > Andrew Morton wrote:
> > Btw, 140 cycles for kmem_cache_alloc+free is inflated - someone enabled
> > kmem_cache_alloc_head() even in the no-debugging version.
> > As expected, done by Andrea, who neither bothered to cc me, nor actually
> > understood the code.
>
> hm, OK. Sorry, I did not realise that you were this closely
> interested/involved with slab, so things have been sort of
> going on behind your back :(

Nor did I realize this... The reasoning behind quickly giving pages back to the
system was fairly simple. Previous VM experiments showed that lazy freeing
of pages from the LRU was not a good performer. When the first versions of
slablru went into mm we noticed that large numbers of pages (thousands) were
free but not reclaimed. This was working as designed - the conclusion was that
slablru was over-designed and that lazy removal was probably not such a good
idea. The slabasap patch that started this thread was the fix for this.

There is no dispute that in some cases it will be slower from a slab perspective. As
Andrew and you have discussed there are things that can be done to speed things
up. Is not the question really, "Are the vm and slab faster together when slab pages
are freed asap?"

As it stands now we could remove quite a bit of code from slab. There is no longer
a need for most of the kmem_cache_shrink family, nor is kmem_cache_reap needed
any more. This simplifies slab. If we also enable the per-cpu arrays for UP the code
is even cleaner (and hopefully faster).

Manfred, slab is currently using typedefs. Andrew has stated that he and Linus are
trying to remove these from the kernel. When I code the cleanup patches for slabasap
(provided it proves itself) shall I clean them out too?

Ed Tomlinson

2002-09-27 15:54:40

by Manfred Spraul

Subject: Re: [patch 3/4] slab reclaim balancing

Andrew Morton wrote:
>>
>>Is that actually the right approach? For large objects, it would be
>>possible to cripple the freeable slabs list, and to perform the cache
>>hit optimization (i.e. per-cpu LIFO) in page_alloc.c, but that doesn't
>>work with small objects.
>
>
> Well with a, what? 100:1 speed ratio, we'll generally get best results
> from optimising for locality/recency of reference.
>
You misunderstood me:

AFAICS slab.c has two weak spots:
* cache hit rates are ignored on UP, and for objects > PAGE_SIZE on both
SMP and UP.
* freeable pages are not returned efficiently to page_alloc.c, neither
on SMP nor on UP. On SMP, this is a big problem, because the
cache_chain_semaphore is overloaded.

I just wanted to say that a hotlist in page_alloc.c is not able to
replace a hotlist in slab.c, because many objects are smaller than page
size. Both lists are needed.

--
Manfred


2002-09-27 17:19:46

by Manfred Spraul

Subject: Re: [patch 3/4] slab reclaim balancing

Ed Tomlinson wrote:
>
> There is no dispute that in some cases it will be slower from a slab perspective. As
> Andrew and you have discussed there are things that can be done to speed things
> up. Is not the question really, "Are the vm and slab faster together when slab pages
> are freed asap?"
>

Some caches are quite bursty - what about the 2 kB generic cache that is
used for the MTU-sized socket buffers? With interrupt mitigation
enabled, I'd expect that a GigE NIC could allocate a few dozen 2 kB
objects in every interrupt, and I don't think it's the right approach to
effectively disable the cache in slab.c for such loads.

I do not have many data points, but in a netbench run on a 4-way Xeon,
kmem_cache_free is called 5 million times/minute, with an additional 4
million calls to kfree - I agree that _reap right now is bad, but IMHO
it's questionable if the fix should be inside the hot path of the allocator.

What about this approach:

* enable batching even on UP, with a LIFO array in front of the lists.

* After flushing a batch back into the lists, the number of free objects
in the lists is calculated. If freeable pages exist and the number
exceeds a target, then the freeable pages above the target are returned
to the page buddy.
* The target of freeable pages is increased by kmem_cache_grow - if we
had to get another page from gfp, then our own cache was too small.

Since the test for the number of freeable objects only happens after
batching, i.e. in the worst case once for every 30 kmem_cache_free
calls, it doesn't matter if it's a bit expensive.
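
A rough sketch of what that free path could look like (the batch fields
and helper names here are invented, purely to illustrate the proposal):

static void flush_batch(kmem_cache_t *cachep)
{
	/* push the batched objects back onto their slabs ... */
	free_batched_objects(cachep);
	/* ... then, at most once per batch, trim freeable pages above target */
	while (cachep->free_objects > cachep->free_target &&
			!list_empty(&cachep->slabs_free))
		return_one_free_slab(cachep);	/* back to the page buddy */
}

void kmem_cache_free(kmem_cache_t *cachep, void *objp)
{
	if (cachep->batch_count == BATCH_SIZE)	/* slow path, ~1 in 30 frees */
		flush_batch(cachep);
	cachep->batch[cachep->batch_count++] = objp;
}

/* and in kmem_cache_grow(): having to ask gfp for a fresh page means the
   reserve was too small, so raise cachep->free_target */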

Open problems:

* What about caches with large objects (>PAGE_SIZE, e.g. the bio
MAX_PAGES object, or the 16 kB socket buffers used over loopback)? Right
now, they are not cached in the per-cpu arrays, to reduce the memory
pressure. If the list processing becomes slower, we would slow down
these slab users. But OTOH if you memcpy 16 kB, then a few cycles in
kmalloc probably won't matter much.

* Where to flush the per-cpu caches? On a 16-way system, they can
contain up to 4000 objects for each cache. Right now that happens in
kmem_cache_reap(). One flush per second would be enough, just to avoid
objects on lightly loaded slabs remaining forever in the per-cpu
arrays and preventing pages from becoming freeable.

* Where is the freeable-pages limit decreased?


--
Manfred

2002-09-27 18:21:36

by Andrew Morton

Subject: Re: [patch 3/4] slab reclaim balancing

Manfred Spraul wrote:
>
> Ed Tomlinson wrote:
> >
> > There is no dispute that in some cases it will be slower from a slab perspective. As
> > Andrew and you have discussed there are things that can be done to speed things
> > up. Is not the question really, "Are the vm and slab faster together when slab pages
> > are freed asap?"
> >
>
> Some caches are quite bursty - what about the 2 kB generic cache that is
> used for the MTU sized socket buffers? With interrupt mitigation
> enabled, I'd expect that a GigE nic could allocate a few dozend 2kb
> objects in every interrupt, and I don't think it's the right approach to
> effectively disable the cache in slab.c for such loads.

Well that's all rather broken at present. Your gigE NIC has just
trashed 60k of cache-warm memory by putting it under busmastering
receive.

It would be better to use separate slabs for Rx and Tx. Rx ones
are backed by the cold page allocator and Tx ones by the hot page
allocator. This is an improvement, but Rx is still doing suboptimal
things because it "warms up" memory and we're not taking advantage
of that. That's starting to get tricky.

There are many things we can do. We need to get the core in place
and start using and tuning it in a few places before deciding
whether to go nuts using the same technique everywhere. I hope we do ;)

> I do not have many data points, but in a netbench run on 4-way Xeon,
> kmem_cache_free is called 5 million times/minute, and additional 4
> million calls to kfree - I agree that _reap right now is bad, but IMHO
> it's questionable if the fix should be inside the hot-path of the allocator.
>
> What about this approach:
>
> * enable batching even on UP, with a LIFO array in front of the lists.

That has to be right.

> * After flushing a batch back into the lists, the number of free objects
> in the lists is calculated. If freeable pages exist and the number
> exceeds a target, then the freeable pages above the target are returned
> to the page buddy.

Probably OK for now. But slab should _not_ hold onto an unused,
cache-warm page. Because do_anonymous_page() may want one.

> * The target of freeable pages is increased by kmem_cache_grow - if we
> had to get another page from gfp, then our own cache was too small.
>
> Since the test for the number of freeable objects only happens after
> batching, i.e. in the worst case once for every 30 kmem_cache_free
> calls, it doesn't matter if it's a bit expensive.
>
> Open problems:
>
> * What about cache with large objects (>PAGE_SIZE, e.g. the bio
> MAX_PAGES object, or the 16 kb socket buffers used over loopback)? Right
> now, they are not cached in the per-cpu arrays, to reduce the memory
> pressure. If the list processing becomes slower, we would slow down
> these slab users. But OTHO if you memcpy 16 kB, then a few cycles in
> kmalloc probably won't matter much.
>
> * Where to flush the per-cpu caches? On a 16-way system, they can
> contain up to 4000 objects, for each cache. Right now that happens in
> kmem_cache_reap(). One flush per second would be enough, just to avoid
> that on lightly loaded slabs, objects remain forever in the per-cpu
> arrays and prevent pages from becoming freeable.

kswapd could do that, or set up a timer and use pdflush_operation.

2002-09-27 19:33:19

by Manfred Spraul

Subject: Re: [patch 3/4] slab reclaim balancing

--- 2.4/mm/slab.c Fri Aug 30 18:39:22 2002
+++ build-2.4/mm/slab.c Fri Sep 27 21:01:31 2002
@@ -1727,6 +1735,9 @@
}
#endif

+unsigned long g_calls = 0;
+unsigned long g_pages = 0;
+unsigned long g_success = 0;
/**
* kmem_cache_reap - Reclaim memory from caches.
* @gfp_mask: the type of memory required.
@@ -1749,6 +1760,7 @@
if (down_trylock(&cache_chain_sem))
return 0;

+g_calls++;
scan = REAP_SCANLEN;
best_len = 0;
best_pages = 0;
@@ -1827,6 +1839,8 @@
perfect:
/* free only 50% of the free slabs */
best_len = (best_len + 1)/2;
+g_success++;
+g_pages+=best_len;
for (scan = 0; scan < best_len; scan++) {
struct list_head *p;

@@ -1907,6 +1921,7 @@
* Output format version, so at least we can change it
* without _too_ many complaints.
*/
+ seq_printf(m, "%lu %lu %lu.\n",g_calls, g_pages, g_success);
seq_puts(m, "slabinfo - version: 1.1"
#if STATS
" (statistics)"


Attachments:
patch-slab-stat (926.00 B)

2002-09-27 19:47:30

by Andrew Morton

Subject: Re: [patch 3/4] slab reclaim balancing

Manfred Spraul wrote:
>
> Andrew Morton wrote:
> >
> >>* After flushing a batch back into the lists, the number of free objects
> >>in the lists is calculated. If freeable pages exist and the number
> >>exceeds a target, then the freeable pages above the target are returned
> >>to the page buddy.
> >
> >
> > Probably OK for now. But slab should _not_ hold onto an unused,
> > cache-warm page. Because do_anonymous_page() may want one.
> >
> If the per-cpu caches are enabled on UP, too, then this is a moot point:
> by the time a batch is freed from the per-cpu array, it will be cache cold.

Well yes, it's all smoke, mirrors and wishful thinking. All we can
do is to apply local knowledge of typical behaviour in deciding whether
a page is likely to be usefully reused.

> And btw, why do you think a page is cache-warm when the last object on a
> page is freed? If the last 32-byte kmalloc is released on a page, 40xx
> bytes are probably cache-cold.

L2 caches are hundreds of K, and a single P4 cacheline is 1/32nd of
a page. Things are tending in that direction.

> Back to your first problem: You've mentioned excess hits on the
> cache_chain_semaphore. Which app did you use for stress testing?

I think it was dd-to-six-disks.

> Could you run a stress test with the applied patch?

Shall try to.

> I've tried dbench 50, with 128 MB RAM, on uniprocessor, with 2.4:
>
> There were 9100 calls to kmem_cache_reap, and in 90% of the calls, no
> freeable memory was found. Alltogether, only 1300 pages were freed from
> the slabs.
>
> Are there just too many calls to kmem_cache_reap()? Perhaps we should
> try to optimize the "nothing freeable exists" logic?

It certainly sounds like it. Some sort of counter which is accessed
outside locks would be appropriate. Test that before deciding to
take the lock.
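
Something along these lines, perhaps (a sketch only; the counter and
threshold are made up):

static atomic_t slab_free_pages = ATOMIC_INIT(0);
/* incremented/decremented (by the slab's page count) wherever a slab
   moves onto or off a cache's slabs_free list */

int kmem_cache_reap(int gfp_mask)
{
	if (atomic_read(&slab_free_pages) < REAP_THRESHOLD)
		return 0;	/* clearly nothing to reap: don't take the semaphore */
	if (down_trylock(&cache_chain_sem))
		return 0;
	/* ... existing scan and freeing ... */
}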