From: Ed Tomlinson
Organization: me
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Rik van Riel
Subject: Re: [PATCH] move slab pages into the lru for rmap
Date: Wed, 5 Jun 2002 07:49:31 -0400
In-Reply-To: <200206032120.23379.tomlins@cam.org>
Message-Id: <200206050749.31475.tomlins@cam.org>

Hi,

Mike Galbraith asked why I was using the inactive clean list.  I did that
because, on UP, it stopped a race.  Then Andrew Morton pointed out that
spinlocks on UP do not do very much, so I taught the cache grow code to
avoid the race itself, but I never removed the now-unneeded inactive clean
logic.  This patch does.  (The grow-path locking that makes this safe is
sketched just after the patch.)

Comments,
Ed Tomlinson
----
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#       ChangeSet    1.425 -> 1.426
#       mm/vmscan.c  1.71  -> 1.72
#       mm/slab.c    1.19  -> 1.20
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/06/05  ed@oscar.et.ca  1.426
# With the race between a cache growing in interrupt context and
# the pagemap_lru_lock fixed we no longer need to use the inactive
# clean list.
# --------------------------------------------
#
diff -Nru a/mm/slab.c b/mm/slab.c
--- a/mm/slab.c  Wed Jun  5 07:47:35 2002
+++ b/mm/slab.c  Wed Jun  5 07:47:35 2002
@@ -579,18 +579,13 @@
 	 * vm_scan(). Shouldn't be a worry.
 	 */
 	while (i--) {
-		if (cachep->flags & SLAB_NO_REAP)
-			PageClearSlab(page);
-		else {
-			if (PageActive(page))
-				del_page_from_active_list(page);
-			ClearPageReferenced(page);
-			add_page_to_inactive_clean_list(page);
-		}
+		PageClearSlab(page);
+		if (PageActive(page))
+			del_page_from_active_list(page);
+		ClearPageReferenced(page);
 		page++;
 	}
-	if (cachep->flags & SLAB_NO_REAP)
-		free_pages((unsigned long)addr, cachep->gfporder);
+	free_pages((unsigned long)addr, cachep->gfporder);
 }
 
 #if DEBUG
diff -Nru a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c  Wed Jun  5 07:47:35 2002
+++ b/mm/vmscan.c  Wed Jun  5 07:47:35 2002
@@ -136,11 +136,9 @@
 			goto found_page;
 		}
 
-		/* page just has the flag, its not in any cache/slab */
-		if (PageSlab(page)) {
-			PageClearSlab(page);
-			goto found_page;
-		}
+		/* should not be on this list... */
+		if (PageSlab(page))
+			BUG();
 
 		/* We should never ever get here. */
 		printk(KERN_ERR "VM: reclaim_page, found unknown page\n");
----
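For reference, here is the shape of that grow-path locking, condensed from
the kmem_cache_grow() hunks in the patch quoted below (a sketch to read,
not code to apply; error paths and the surrounding function are omitted):

	/*
	 * On UP a spinlock is a no-op, so it protects nothing against an
	 * interrupt.  We therefore only put freshly grown slab pages on
	 * the lru when we can take pagemap_lru_lock safely; otherwise the
	 * cache is flagged as holding non-lru pages and kmem_shrink_slab()
	 * falls back to shrinking the whole cache later.
	 */
	if (!(cachep->flags & SLAB_NO_REAP)) {
	#ifdef CONFIG_SMP
		locked = spin_trylock(&pagemap_lru_lock);
	#else
		locked = !in_interrupt() && spin_trylock(&pagemap_lru_lock);
	#endif
		if (!locked && !in_interrupt())
			goto opps1;	/* process context, no lock: back out */
	}
	...
	if (locked)
		add_page_to_active_list(page);	/* new slab page joins the lru */
	...
	if (!locked && !(cachep->flags & SLAB_NO_REAP))
		cachep->dflags |= DFLGS_NONLRU;	/* remember the non-lru pages */
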
On June 3, 2002 09:20 pm, Ed Tomlinson wrote:
> Hi,
>
> This uses aging information to let the vm free slab pages depending on
> page age.  It moves towards having the mapping callbacks do the work.
> I wonder if using mapping callbacks is worth the effort, in that slab
> pages are a bit different from other pages and are treated a little
> differently.  For instance, we free slab in refill_inactive.  Doing this
> prevents caches from growing with no possibility of shrinking under
> light loads.  By allowing freeing we avoid getting into a situation
> where slab pages cause an artificial shortage.
>
> Finding a good method of handling the dcache/icache and dquota caches
> has been fun...  What I do now is factor the pruning and the shrinking
> into different calls.  The pruning, in effect, ages entries in the above
> caches.  The rate at which I prune is simply the rate at which I see
> entries for these slabs in refill_inactive_zone (condensed in the P.S.
> at the end of this mail).  This seems fair and, in my testing, works
> better than anything else I have tried (I have experimented quite a
> bit).  It also avoids using any magic numbers and is self tuning.
>
> The logic has also been improved to (usually) free specific slabs
> instead of shrinking freeable slabs.  To handle slabs allocated in
> interrupt context we omit adding those pages to the lru (always on UP,
> and on SMP when the trylock fails).  For caches with non-lru pages we
> sometimes call kmem_cache_shrink; this stops caches with such pages
> from growing unchecked until kmem_cache_reap is called, which only
> happens when we are close to OOMing (see the P.P.S.).
>
> The patch is against rmap 13a and was developed on pre8-ac5/pre9-ac3.
> There is a bk tree you can pull from at
> 'casa.dyndns.org:3334/linux-2.4-rmap'.
>
> I have tested on UP here and, since Andrew Morton pointed out that
> spinlocks on UP are nops and give no protection in interrupt context,
> it has been working as expected.
>
> Now to look at 2.5...
>
> Comments, feedback etc. appreciated,
>
> Ed Tomlinson
>
> ------------
> # This is a BitKeeper generated patch for the following project:
> # Project Name: Linux kernel tree
> # This patch format is intended for GNU patch command version 2.5 or higher.
> # This patch includes the following deltas:
> #       ChangeSet              1.423 -> 1.425
> #       fs/dcache.c            1.19  -> 1.20
> #       fs/dquot.c             1.19  -> 1.20
> #       mm/vmscan.c            1.69  -> 1.71
> #       mm/slab.c              1.17  -> 1.19
> #       fs/inode.c             1.35  -> 1.36
> #       include/linux/slab.h   1.10  -> 1.12
> #       include/linux/dcache.h 1.11  -> 1.12
> #
> # The following is the BitKeeper ChangeSet Log
> # --------------------------------------------
> # 02/05/31  ed@oscar.et.ca  1.424
> # [PATCH] move slab pages into the lru
> # --------------------------------------------
> # 02/06/03  ed@oscar.et.ca  1.425
> # Various locking improvements and fixes.
> # --------------------------------------------
> #
> diff -Nru a/fs/dcache.c b/fs/dcache.c
> --- a/fs/dcache.c  Mon Jun  3 21:01:57 2002
> +++ b/fs/dcache.c  Mon Jun  3 21:01:57 2002
> @@ -321,7 +321,7 @@
>  void prune_dcache(int count)
>  {
>  	spin_lock(&dcache_lock);
> -	for (;;) {
> +	for (; count ; count--) {
>  		struct dentry *dentry;
>  		struct list_head *tmp;
>
> @@ -345,8 +345,6 @@
>  			BUG();
>
>  		prune_one_dentry(dentry);
> -		if (!--count)
> -			break;
>  	}
>  	spin_unlock(&dcache_lock);
>  }
> @@ -538,19 +536,10 @@
>
>  /*
>   * This is called from kswapd when we think we need some
> - * more memory, but aren't really sure how much. So we
> - * carefully try to free a _bit_ of our dcache, but not
> - * too much.
> - *
> - * Priority:
> - *   0 - very urgent: shrink everything
> - *  ...
> - *   6 - base-level: try to shrink a bit.
> + * more memory.
>   */
> -int shrink_dcache_memory(int priority, unsigned int gfp_mask)
> +int age_dcache_memory(kmem_cache_t *cachep, int entries, int gfp_mask)
>  {
> -	int count = 0;
> -
>  	/*
>  	 * Nasty deadlock avoidance.
>  	 *
> @@ -565,10 +554,11 @@
>  	if (!(gfp_mask & __GFP_FS))
>  		return 0;
>
> -	count = dentry_stat.nr_unused / priority;
> +	if (entries > dentry_stat.nr_unused)
> +		entries = dentry_stat.nr_unused;
>
> -	prune_dcache(count);
> -	return kmem_cache_shrink(dentry_cache);
> +	prune_dcache(entries);
> +	return entries;
>  }
>
>  #define NAME_ALLOC_LEN(len)	((len+16) & ~15)
> @@ -1186,6 +1176,8 @@
>  	if (!dentry_cache)
>  		panic("Cannot create dentry cache");
>
> +	kmem_set_pruner(dentry_cache, (kmem_pruner_t)age_dcache_memory);
> +
>  #if PAGE_SHIFT < 13
>  	mempages >>= (13 - PAGE_SHIFT);
>  #endif
> @@ -1278,6 +1270,9 @@
>  			SLAB_HWCACHE_ALIGN, NULL, NULL);
>  	if (!dquot_cachep)
>  		panic("Cannot create dquot SLAB cache");
> +
> +	kmem_set_pruner(dquot_cachep, (kmem_pruner_t)age_dqcache_memory);
> +
>  #endif
>
>  	dcache_init(mempages);
> diff -Nru a/fs/dquot.c b/fs/dquot.c
> --- a/fs/dquot.c  Mon Jun  3 21:01:57 2002
> +++ b/fs/dquot.c  Mon Jun  3 21:01:57 2002
> @@ -410,10 +410,13 @@
>
>  int shrink_dqcache_memory(int priority, unsigned int gfp_mask)
>  {
> +	if (entries > nr_free_dquots)
> +		entries = nr_free_dquots;
> +
>  	lock_kernel();
> -	prune_dqcache(nr_free_dquots / (priority + 1));
> +	prune_dqcache(entries);
>  	unlock_kernel();
> -	return kmem_cache_shrink(dquot_cachep);
> +	return entries;
>  }
>
>  /* NOTE: If you change this function please check whether dqput_blocks() works right... */
> diff -Nru a/fs/inode.c b/fs/inode.c
> --- a/fs/inode.c  Mon Jun  3 21:01:57 2002
> +++ b/fs/inode.c  Mon Jun  3 21:01:57 2002
> @@ -672,10 +672,11 @@
>
>  	count = 0;
>  	entry = inode_unused.prev;
> -	while (entry != &inode_unused)
> -	{
> +	for (; goal; goal--) {
>  		struct list_head *tmp = entry;
>
> +		if (entry == &inode_unused)
> +			break;
>  		entry = entry->prev;
>  		inode = INODE(tmp);
>  		if (inode->i_state & (I_FREEING|I_CLEAR|I_LOCK))
> @@ -690,8 +691,6 @@
>  		list_add(tmp, freeable);
>  		inode->i_state |= I_FREEING;
>  		count++;
> -		if (!--goal)
> -			break;
>  	}
>  	inodes_stat.nr_unused -= count;
>  	spin_unlock(&inode_lock);
> @@ -708,10 +707,8 @@
>  		schedule_task(&unused_inodes_flush_task);
>  }
>
> -int shrink_icache_memory(int priority, int gfp_mask)
> +int age_icache_memory(kmem_cache_t *cachep, int entries, int gfp_mask)
>  {
> -	int count = 0;
> -
>  	/*
>  	 * Nasty deadlock avoidance..
>  	 *
> @@ -722,10 +719,11 @@
>  	if (!(gfp_mask & __GFP_FS))
>  		return 0;
>
> -	count = inodes_stat.nr_unused / priority;
> +	if (entries > inodes_stat.nr_unused)
> +		entries = inodes_stat.nr_unused;
>
> -	prune_icache(count);
> -	return kmem_cache_shrink(inode_cachep);
> +	prune_icache(entries);
> +	return entries;
>  }
>
>  /*
> @@ -1171,6 +1169,8 @@
>  			NULL);
>  	if (!inode_cachep)
>  		panic("cannot create inode slab cache");
> +
> +	kmem_set_pruner(inode_cachep, (kmem_pruner_t)age_icache_memory);
>
>  	unused_inodes_flush_task.routine = try_to_sync_unused_inodes;
>  }
> diff -Nru a/include/linux/dcache.h b/include/linux/dcache.h
> --- a/include/linux/dcache.h  Mon Jun  3 21:01:57 2002
> +++ b/include/linux/dcache.h  Mon Jun  3 21:01:57 2002
> @@ -171,15 +171,10 @@
>  #define shrink_dcache() prune_dcache(0)
>  struct zone_struct;
>  /* dcache memory management */
> -extern int shrink_dcache_memory(int, unsigned int);
>  extern void prune_dcache(int);
>
>  /* icache memory management (defined in linux/fs/inode.c) */
> -extern int shrink_icache_memory(int, int);
>  extern void prune_icache(int);
> -
> -/* quota cache memory management (defined in linux/fs/dquot.c) */
> -extern int shrink_dqcache_memory(int, unsigned int);
>
>  /* only used at mount-time */
>  extern struct dentry * d_alloc_root(struct inode *);
> diff -Nru a/include/linux/slab.h b/include/linux/slab.h
> --- a/include/linux/slab.h  Mon Jun  3 21:01:57 2002
> +++ b/include/linux/slab.h  Mon Jun  3 21:01:57 2002
> @@ -55,6 +55,26 @@
>  		void (*)(void *, kmem_cache_t *, unsigned long));
>  extern int kmem_cache_destroy(kmem_cache_t *);
>  extern int kmem_cache_shrink(kmem_cache_t *);
> +
> +typedef int (*kmem_pruner_t)(kmem_cache_t *, int, int);
> +
> +extern void kmem_set_pruner(kmem_cache_t *, kmem_pruner_t);
> +extern int kmem_do_prunes(int);
> +extern int kmem_count_page(struct page *, int);
> +#define kmem_touch_page(addr) SetPageReferenced(virt_to_page(addr));
> +
> +/* shrink a slab */
> +extern int kmem_shrink_slab(struct page *);
> +
> +/* dcache prune (defined in linux/fs/dcache.c) */
> +extern int age_dcache_memory(kmem_cache_t *, int, int);
> +
> +/* icache prune (defined in linux/fs/inode.c) */
> +extern int age_icache_memory(kmem_cache_t *, int, int);
> +
> +/* quota cache prune (defined in linux/fs/dquot.c) */
> +extern int age_dqcache_memory(kmem_cache_t *, int, int);
> +
>  extern void *kmem_cache_alloc(kmem_cache_t *, int);
>  extern void kmem_cache_free(kmem_cache_t *, void *);
>
> diff -Nru a/mm/slab.c b/mm/slab.c
> --- a/mm/slab.c  Mon Jun  3 21:01:57 2002
> +++ b/mm/slab.c  Mon Jun  3 21:01:57 2002
> @@ -72,6 +72,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>
>  /*
> @@ -212,6 +213,8 @@
>  	kmem_cache_t *slabp_cache;
>  	unsigned int growing;
>  	unsigned int dflags;		/* dynamic flags */
> +	kmem_pruner_t pruner;		/* shrink callback */
> +	int count;			/* count used to trigger shrink */
>
>  	/* constructor func */
>  	void (*ctor)(void *, kmem_cache_t *, unsigned long);
> @@ -250,10 +253,12 @@
>
>  /* c_dflags (dynamic flags).
>     Need to hold the spinlock to access this member */
>  #define DFLGS_GROWN	0x000001UL	/* don't reap a recently grown */
> +#define DFLGS_NONLRU	0x000002UL	/* there are recently allocated
> +					   non lru pages in this cache */
>
>  #define OFF_SLAB(x)	((x)->flags & CFLGS_OFF_SLAB)
>  #define OPTIMIZE(x)	((x)->flags & CFLGS_OPTIMIZE)
> -#define GROWN(x)	((x)->dlags & DFLGS_GROWN)
> +#define GROWN(x)	((x)->dflags & DFLGS_GROWN)
>
>  #if STATS
>  #define STATS_INC_ACTIVE(x)	((x)->num_active++)
> @@ -381,6 +386,64 @@
>  static void enable_cpucache (kmem_cache_t *cachep);
>  static void enable_all_cpucaches (void);
>  #endif
> +
> +/*
> + * Note: For prunable caches object size must be less than page size.
> + */
> +void kmem_set_pruner(kmem_cache_t *cachep, kmem_pruner_t thepruner)
> +{
> +	if (cachep->objsize > PAGE_SIZE)
> +		BUG();
> +	cachep->pruner = thepruner;
> +}
> +
> +/*
> + * Used by refill_inactive_zone to determine caches that need pruning.
> + */
> +int kmem_count_page(struct page *page, int cold)
> +{
> +	kmem_cache_t *cachep = GET_PAGE_CACHE(page);
> +	slab_t *slabp = GET_PAGE_SLAB(page);
> +	int ret = 0;
> +
> +	spin_lock(&cachep->spinlock);
> +	if (cachep->pruner != NULL) {
> +		cachep->count += slabp->inuse >> cachep->gfporder;
> +		ret = !slabp->inuse;
> +	} else
> +		ret = cold && !slabp->inuse;
> +	spin_unlock(&cachep->spinlock);
> +	return ret;
> +}
> +
> +
> +/* Call the prune functions to age pruneable caches */
> +int kmem_do_prunes(int gfp_mask)
> +{
> +	struct list_head *p;
> +	int nr;
> +
> +	if (gfp_mask & __GFP_WAIT)
> +		down(&cache_chain_sem);
> +	else
> +		if (down_trylock(&cache_chain_sem))
> +			return 0;
> +
> +	list_for_each(p, &cache_chain) {
> +		kmem_cache_t *cachep = list_entry(p, kmem_cache_t, next);
> +		if (cachep->pruner != NULL) {
> +			spin_lock(&cachep->spinlock);
> +			nr = cachep->count;
> +			cachep->count = 0;
> +			spin_unlock(&cachep->spinlock);
> +			if (nr > 0)
> +				(*cachep->pruner)(cachep, nr, gfp_mask);
> +		}
> +	}
> +	up(&cache_chain_sem);
> +	return 1;
> +}
> +
>
>  /* Cal the num objs, wastage, and bytes left over for a given slab size. */
>  static void kmem_cache_estimate (unsigned long gfporder, size_t size,
> @@ -479,7 +542,9 @@
>
>  __initcall(kmem_cpucache_init);
>
> -/* Interface to system's page allocator. No need to hold the cache-lock.
> +/*
> + * Interface to system's page allocator. No need to hold the cache-lock.
> + * Call with pagemap_lru_lock held
>   */
>  static inline void * kmem_getpages (kmem_cache_t *cachep, unsigned long flags)
>  {
> @@ -501,7 +566,8 @@
>  	return addr;
>  }
>
> -/* Interface to system's page release. */
> +/* Interface to system's page release.
> + * Normally called with pagemap_lru_lock held */
>  static inline void kmem_freepages (kmem_cache_t *cachep, void *addr)
>  {
>  	unsigned long i = (1 << cachep->gfporder);
> @@ -513,10 +579,18 @@
>  	 * vm_scan(). Shouldn't be a worry.
>  	 */
>  	while (i--) {
> -		PageClearSlab(page);
> +		if (cachep->flags & SLAB_NO_REAP)
> +			PageClearSlab(page);
> +		else {
> +			if (PageActive(page))
> +				del_page_from_active_list(page);
> +			ClearPageReferenced(page);
> +			add_page_to_inactive_clean_list(page);
> +		}
>  		page++;
>  	}
> -	free_pages((unsigned long)addr, cachep->gfporder);
> +	if (cachep->flags & SLAB_NO_REAP)
> +		free_pages((unsigned long)addr, cachep->gfporder);
>  }
>
>  #if DEBUG
> @@ -546,9 +620,11 @@
>  }
>  #endif
>
> +
>  /* Destroy all the objs in a slab, and release the mem back to the system.
>   * Before calling the slab must have been unlinked from the cache.
>   * The cache-lock is not held/needed.
> + * pagemap_lru_lock should be held for kmem_freepages
>   */
>  static void kmem_slab_destroy (kmem_cache_t *cachep, slab_t *slabp)
>  {
> @@ -780,6 +856,8 @@
>  		flags |= CFLGS_OPTIMIZE;
>
>  	cachep->flags = flags;
> +	cachep->pruner = NULL;
> +	cachep->count = 0;
>  	cachep->gfpflags = 0;
>  	if (flags & SLAB_CACHE_DMA)
>  		cachep->gfpflags |= GFP_DMA;
> @@ -946,11 +1024,13 @@
>
>  	drain_cpu_caches(cachep);
>
> +	spin_lock(&pagemap_lru_lock);
>  	spin_lock_irq(&cachep->spinlock);
>  	__kmem_cache_shrink_locked(cachep);
>  	ret = !list_empty(&cachep->slabs_full) ||
>  		!list_empty(&cachep->slabs_partial);
>  	spin_unlock_irq(&cachep->spinlock);
> +	spin_unlock(&pagemap_lru_lock);
>  	return ret;
>  }
>
> @@ -959,7 +1039,7 @@
>   * @cachep: The cache to shrink.
>   *
>   * Releases as many slabs as possible for a cache.
> - * Returns number of pages released.
> + * Returns number of pages removed from the cache.
>   */
>  int kmem_cache_shrink(kmem_cache_t *cachep)
>  {
> @@ -969,14 +1049,53 @@
>  		BUG();
>
>  	drain_cpu_caches(cachep);
> -
> +
> +	spin_lock(&pagemap_lru_lock);
>  	spin_lock_irq(&cachep->spinlock);
>  	ret = __kmem_cache_shrink_locked(cachep);
>  	spin_unlock_irq(&cachep->spinlock);
> +	spin_unlock(&pagemap_lru_lock);
>
> -	return ret << cachep->gfporder;
> +	return ret << cachep->gfporder;
>  }
>
> +
> +/*
> + * Used by refill_inactive_zone to try to shrink a cache.  The
> + * method we use to shrink depends on if we have added nonlru
> + * pages since the last time we shrunk this cache.
> + *  - shrink works and we return the pages shrunk
> + *  - shrink fails because the slab is in use, we return 0
> + * called with pagemap_lru_lock held.
> + */
> +int kmem_shrink_slab(struct page *page)
> +{
> +	kmem_cache_t *cachep = GET_PAGE_CACHE(page);
> +	slab_t *slabp = GET_PAGE_SLAB(page);
> +
> +	spin_lock_irq(&cachep->spinlock);
> +	if (!slabp->inuse) {
> +		if (!cachep->growing) {
> +			if (cachep->dflags & DFLGS_NONLRU) {
> +				int nr = __kmem_cache_shrink_locked(cachep);
> +				cachep->dflags &= ~DFLGS_NONLRU;
> +				spin_unlock_irq(&cachep->spinlock);
> +				return nr << cachep->gfporder;
> +			} else {
> +				list_del(&slabp->list);
> +				spin_unlock_irq(&cachep->spinlock);
> +				kmem_slab_destroy(cachep, slabp);
> +				return 1 << cachep->gfporder;
> +			}
> +			if (PageActive(page))
> +				BUG();
> +		}
> +	}
> +	spin_unlock_irq(&cachep->spinlock);
> +	return 0;
> +}
> +
> +
>  /**
>   * kmem_cache_destroy - delete a cache
>   * @cachep: the cache to destroy
> @@ -1106,7 +1225,7 @@
>  	struct page *page;
>  	void *objp;
>  	size_t offset;
> -	unsigned int i, local_flags;
> +	unsigned int i, local_flags, locked = 0;
>  	unsigned long ctor_flags;
>  	unsigned long save_flags;
>
> @@ -1163,6 +1282,21 @@
>  	if (!(objp = kmem_getpages(cachep, flags)))
>  		goto failed;
>
> +	/*
> +	 * We want the pagemap_lru_lock, in UP spin locks do not
> +	 * protect us in interrupt context... In SMP they do but,
> +	 * optimizing for speed, we process if we do not get it.
> +	 */
> +	if (!(cachep->flags & SLAB_NO_REAP)) {
> +#ifdef CONFIG_SMP
> +		locked = spin_trylock(&pagemap_lru_lock);
> +#else
> +		locked = !in_interrupt() && spin_trylock(&pagemap_lru_lock);
> +#endif
> +		if (!locked && !in_interrupt())
> +			goto opps1;
> +	}
> +
>  	/* Get slab management.
>  	 */
>  	if (!(slabp = kmem_cache_slabmgmt(cachep, objp, offset, local_flags)))
>  		goto opps1;
> @@ -1174,9 +1308,15 @@
>  		SET_PAGE_CACHE(page, cachep);
>  		SET_PAGE_SLAB(page, slabp);
>  		PageSetSlab(page);
> +		set_page_count(page, 1);
> +		if (locked)
> +			add_page_to_active_list(page);
>  		page++;
>  	} while (--i);
>
> +	if (locked)
> +		spin_unlock(&pagemap_lru_lock);
> +
>  	kmem_cache_init_objs(cachep, slabp, ctor_flags);
>
>  	spin_lock_irqsave(&cachep->spinlock, save_flags);
> @@ -1187,10 +1327,15 @@
>  	STATS_INC_GROWN(cachep);
>  	cachep->failures = 0;
>
> +	/* The pagemap_lru_lock was not quickly/safely available */
> +	if (!locked && !(cachep->flags & SLAB_NO_REAP))
> +		cachep->dflags |= DFLGS_NONLRU;
> +
>  	spin_unlock_irqrestore(&cachep->spinlock, save_flags);
>  	return 1;
>  opps1:
> -	kmem_freepages(cachep, objp);
> +	/* do not use kmem_freepages - we are not in the lru yet... */
> +	free_pages((unsigned long)objp, cachep->gfporder);
>  failed:
>  	spin_lock_irqsave(&cachep->spinlock, save_flags);
>  	cachep->growing--;
> @@ -1255,6 +1400,7 @@
>  		list_del(&slabp->list);
>  		list_add(&slabp->list, &cachep->slabs_full);
>  	}
> +	kmem_touch_page(objp);
>  #if DEBUG
>  	if (cachep->flags & SLAB_POISON)
>  		if (kmem_check_poison_obj(cachep, objp))
> @@ -1816,6 +1962,7 @@
>
>  	spin_lock_irq(&best_cachep->spinlock);
>  perfect:
> +	spin_lock(&pagemap_lru_lock);
>  	/* free only 50% of the free slabs */
>  	best_len = (best_len + 1)/2;
>  	for (scan = 0; scan < best_len; scan++) {
> @@ -1841,6 +1988,7 @@
>  		kmem_slab_destroy(best_cachep, slabp);
>  		spin_lock_irq(&best_cachep->spinlock);
>  	}
> +	spin_unlock(&pagemap_lru_lock);
>  	spin_unlock_irq(&best_cachep->spinlock);
>  	ret = scan * (1 << best_cachep->gfporder);
>  out:
> diff -Nru a/mm/vmscan.c b/mm/vmscan.c
> --- a/mm/vmscan.c  Mon Jun  3 21:01:57 2002
> +++ b/mm/vmscan.c  Mon Jun  3 21:01:57 2002
> @@ -136,6 +136,12 @@
>  			goto found_page;
>  		}
>
> +		/* page just has the flag, its not in any cache/slab */
> +		if (PageSlab(page)) {
> +			PageClearSlab(page);
> +			goto found_page;
> +		}
> +
>  		/* We should never ever get here. */
>  		printk(KERN_ERR "VM: reclaim_page, found unknown page\n");
>  		list_del(page_lru);
> @@ -264,6 +270,10 @@
>  		if (unlikely(TryLockPage(page)))
>  			continue;
>
> +		/* Slab pages should never get here... */
> +		if (PageSlab(page))
> +			BUG();
> +
>  		/*
>  		 * The page is in active use or really unfreeable. Move to
>  		 * the active list and adjust the page age if needed.
> @@ -469,6 +479,7 @@
>   * This function will scan a portion of the active list of a zone to find
>   * unused pages, those pages will then be moved to the inactive list.
>   */
> +
>  int refill_inactive_zone(struct zone_struct * zone, int priority)
>  {
>  	int maxscan = zone->active_pages >> priority;
> @@ -506,7 +517,7 @@
>  	 * both PG_locked and the pte_chain_lock are held.
>  	 */
>  	pte_chain_lock(page);
> -	if (!page_mapping_inuse(page)) {
> +	if (!page_mapping_inuse(page) && !PageSlab(page)) {
>  		pte_chain_unlock(page);
>  		UnlockPage(page);
>  		drop_page(page);
> @@ -523,6 +534,31 @@
>  	}
>
>  	/*
> +	 * For slab pages we count entries for caches with their
> +	 * own pruning/aging method.  If we can count a page or
> +	 * its cold we try to free it.  We only use one aging
> +	 * method otherwise we end up with caches with lots
> +	 * of free pages...  kmem_shrink_slab frees slab(s)
> +	 * and moves the page(s) to the inactive clean list.
> +	 */
> +	if (PageSlab(page)) {
> +		pte_chain_unlock(page);
> +		UnlockPage(page);
> +		if (kmem_count_page(page, !page->age)) {
> +			int pages = kmem_shrink_slab(page);
> +			if (pages) {
> +				nr_deactivated += pages;
> +				if (nr_deactivated > target)
> +					goto done;
> +				continue;
> +			}
> +		}
> +		list_del(page_lru);
> +		list_add(page_lru, &zone->active_list);
> +		continue;
> +	}
> +
> +	/*
>  	 * If the page age is 'hot' and the process using the
>  	 * page doesn't exceed its RSS limit we keep the page.
>  	 * Otherwise we move it to the inactive_dirty list.
> @@ -555,6 +591,7 @@
>  	return nr_deactivated;
>  }
>
> +
>  /**
>   * refill_inactive - checks all zones and refills the inactive list as needed
>   *
> @@ -619,24 +656,15 @@
>
>  	/*
>  	 * Eat memory from filesystem page cache, buffer cache,
> -	 * dentry, inode and filesystem quota caches.
>  	 */
>  	ret += page_launder(gfp_mask);
> -	ret += shrink_dcache_memory(DEF_PRIORITY, gfp_mask);
> -	ret += shrink_icache_memory(1, gfp_mask);
> -#ifdef CONFIG_QUOTA
> -	ret += shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
> -#endif
>
>  	/*
> -	 * Move pages from the active list to the inactive list.
> +	 * Move pages from the active list to the inactive list,
> +	 * then prune the prunable caches, aging them.
>  	 */
>  	refill_inactive();
> -
> -	/*
> -	 * Reclaim unused slab cache memory.
> -	 */
> -	ret += kmem_cache_reap(gfp_mask);
> +	kmem_do_prunes(gfp_mask);
>
>  	refill_freelist();
>
> @@ -645,11 +673,13 @@
>  	run_task_queue(&tq_disk);
>
>  	/*
> -	 * Hmm.. Cache shrink failed - time to kill something?
> +	 * Hmm.. - time to kill something?
>  	 * Mhwahahhaha! This is the part I really like. Giggle.
>  	 */
> -	if (!ret && free_min(ANY_ZONE) > 0)
> -		out_of_memory();
> +	if (!ret && free_min(ANY_ZONE) > 0) {
> +		if (!kmem_cache_reap(gfp_mask))
> +			out_of_memory();
> +	}
>
>  	return ret;
>  }
> @@ -740,6 +770,7 @@
>
>  		/* Do background page aging. */
>  		background_aging(DEF_PRIORITY);
> +		kmem_do_prunes(GFP_KSWAPD);
>  	}
>
>  	wakeup_memwaiters();
>
> ------------
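
P.S.  For anyone skimming the quoted patch, the aging scheme reduces to a
per-cache callback: kmem_count_page() accumulates, per cache, a count
proportional to how often that cache's pages are seen in
refill_inactive_zone(), and kmem_do_prunes() later asks each registered
cache to age that many entries.  Condensed from the dcache pieces above
(locking trimmed; the icache and dquot caches follow the same shape):

	/* the callback type every prunable cache registers */
	typedef int (*kmem_pruner_t)(kmem_cache_t *, int, int);

	/* age_dcache_memory(), condensed: the vm says how many entries
	 * it saw; prune at most that many unused dentries and report
	 * back how many were aged. */
	int age_dcache_memory(kmem_cache_t *cachep, int entries, int gfp_mask)
	{
		if (!(gfp_mask & __GFP_FS))	/* nasty deadlock avoidance */
			return 0;
		if (entries > dentry_stat.nr_unused)
			entries = dentry_stat.nr_unused;
		prune_dcache(entries);
		return entries;
	}

	/* registered once, at cache creation time: */
	kmem_set_pruner(dentry_cache, (kmem_pruner_t)age_dcache_memory);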
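
P.P.S.  The other half is how slabs that never made it onto the lru
(grown in interrupt context) get freed.  Condensed from kmem_shrink_slab()
above: if the cache has grown non-lru pages since the last shrink we
cannot trust the one page we are looking at to be representative, so we
shrink the whole cache; otherwise we free just the one empty slab.

	if (cachep->dflags & DFLGS_NONLRU) {
		/* cache holds pages the lru never saw: shrink everything */
		nr = __kmem_cache_shrink_locked(cachep);
		cachep->dflags &= ~DFLGS_NONLRU;
		return nr << cachep->gfporder;
	} else {
		/* normal case: this slab is empty, destroy just it */
		list_del(&slabp->list);
		kmem_slab_destroy(cachep, slabp);
		return 1 << cachep->gfporder;
	}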