From: Ed Tomlinson
Organization: me
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Rik van Riel
Subject: Re: [PATCH] move slab pages into the lru for rmap
Date: Wed, 5 Jun 2002 07:49:31 -0400
In-Reply-To: <200206032120.23379.tomlins@cam.org>
Message-Id: <200206050749.31475.tomlins@cam.org>

Hi,

Mike Galbraith asked why I was using the inactive clean list.  I did that
because, on UP, it stopped a race.  Then Andrew Morton pointed out that
spinlocks on UP do not do very much, so I taught the cache grow code to
avoid the race itself, but I never removed the now-unneeded inactive clean
logic.  This patch does.  (The grow-path locking that makes this safe is
sketched just after the patch.)

Comments,
Ed Tomlinson
----
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#       ChangeSet    1.425 -> 1.426
#       mm/vmscan.c  1.71  -> 1.72
#       mm/slab.c    1.19  -> 1.20
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 02/06/05  ed@oscar.et.ca  1.426
# With the race between a cache growing in interrupt context and
# the pagemap_lru_lock fixed we no longer need to use the inactive
# clean list.
# --------------------------------------------
#
diff -Nru a/mm/slab.c b/mm/slab.c
--- a/mm/slab.c  Wed Jun  5 07:47:35 2002
+++ b/mm/slab.c  Wed Jun  5 07:47:35 2002
@@ -579,18 +579,13 @@
 	 * vm_scan(). Shouldn't be a worry.
 	 */
 	while (i--) {
-		if (cachep->flags & SLAB_NO_REAP)
-			PageClearSlab(page);
-		else {
-			if (PageActive(page))
-				del_page_from_active_list(page);
-			ClearPageReferenced(page);
-			add_page_to_inactive_clean_list(page);
-		}
+		PageClearSlab(page);
+		if (PageActive(page))
+			del_page_from_active_list(page);
+		ClearPageReferenced(page);
 		page++;
 	}
-	if (cachep->flags & SLAB_NO_REAP)
-		free_pages((unsigned long)addr, cachep->gfporder);
+	free_pages((unsigned long)addr, cachep->gfporder);
 }
 
 #if DEBUG
diff -Nru a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c  Wed Jun  5 07:47:35 2002
+++ b/mm/vmscan.c  Wed Jun  5 07:47:35 2002
@@ -136,11 +136,9 @@
 			goto found_page;
 		}
 
-		/* page just has the flag, its not in any cache/slab */
-		if (PageSlab(page)) {
-			PageClearSlab(page);
-			goto found_page;
-		}
+		/* should not be on this list... */
+		if (PageSlab(page))
+			BUG();
 
 		/* We should never ever get here. */
 		printk(KERN_ERR "VM: reclaim_page, found unknown page\n");
----
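For reference, here is the shape of that grow-path locking, condensed from
the kmem_cache_grow() hunks in the patch quoted below (a sketch to read,
not code to apply; error paths and the surrounding function are omitted):

	/*
	 * On UP a spinlock is a no-op, so it protects nothing against an
	 * interrupt.  We therefore only put freshly grown slab pages on
	 * the lru when we can take pagemap_lru_lock safely; otherwise the
	 * cache is flagged as holding non-lru pages and kmem_shrink_slab()
	 * falls back to shrinking the whole cache later.
	 */
	if (!(cachep->flags & SLAB_NO_REAP)) {
	#ifdef CONFIG_SMP
		locked = spin_trylock(&pagemap_lru_lock);
	#else
		locked = !in_interrupt() && spin_trylock(&pagemap_lru_lock);
	#endif
		if (!locked && !in_interrupt())
			goto opps1;	/* process context, no lock: back out */
	}
	...
	if (locked)
		add_page_to_active_list(page);	/* new slab page joins the lru */
	...
	if (!locked && !(cachep->flags & SLAB_NO_REAP))
		cachep->dflags |= DFLGS_NONLRU;	/* remember the non-lru pages */
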
On June 3, 2002 09:20 pm, Ed Tomlinson wrote:
> Hi,
>
> This uses aging information to let the vm free slab pages depending on
> page age.  It moves towards having the mapping callbacks do the work.
> I wonder if using mapping callbacks is worth the effort, in that slab
> pages are a bit different from other pages and are treated a little
> differently.  For instance, we free slab in refill_inactive.  Doing this
> prevents caches from growing with no possibility of shrinking under
> light loads.  By allowing freeing we avoid getting into a situation
> where slab pages cause an artificial shortage.
>
> Finding a good method of handling the dcache/icache and dquota caches
> has been fun...  What I do now is factor the pruning and the shrinking
> into different calls.  The pruning, in effect, ages entries in the above
> caches.  The rate at which I prune is simply the rate at which I see
> entries for these slabs in refill_inactive_zone (condensed in the P.S.
> at the end of this mail).  This seems fair and, in my testing, works
> better than anything else I have tried (I have experimented quite a
> bit).  It also avoids using any magic numbers and is self tuning.
>
> The logic has also been improved to (usually) free specific slabs
> instead of shrinking freeable slabs.  To handle slabs allocated in
> interrupt context we omit adding those pages to the lru (always on UP,
> and on SMP when the trylock fails).  For caches with non-lru pages we
> sometimes call kmem_cache_shrink; this stops caches with such pages
> from growing unchecked until kmem_cache_reap is called, which only
> happens when we are close to OOMing (see the P.P.S.).
>
> The patch is against rmap 13a and was developed on pre8-ac5/pre9-ac3.
> There is a bk tree you can pull from at
> 'casa.dyndns.org:3334/linux-2.4-rmap'.
>
> I have tested on UP here and, since Andrew Morton pointed out that
> spinlocks on UP are nops and give no protection in interrupt context,
> it has been working as expected.
>
> Now to look at 2.5...
>
> Comments, feedback etc. appreciated,
>
> Ed Tomlinson
>
> ------------
> # This is a BitKeeper generated patch for the following project:
> # Project Name: Linux kernel tree
> # This patch format is intended for GNU patch command version 2.5 or higher.
> # This patch includes the following deltas:
> #       ChangeSet              1.423 -> 1.425
> #       fs/dcache.c            1.19  -> 1.20
> #       fs/dquot.c             1.19  -> 1.20
> #       mm/vmscan.c            1.69  -> 1.71
> #       mm/slab.c              1.17  -> 1.19
> #       fs/inode.c             1.35  -> 1.36
> #       include/linux/slab.h   1.10  -> 1.12
> #       include/linux/dcache.h 1.11  -> 1.12
> #
> # The following is the BitKeeper ChangeSet Log
> # --------------------------------------------
> # 02/05/31  ed@oscar.et.ca  1.424
> # [PATCH] move slab pages into the lru
> # --------------------------------------------
> # 02/06/03  ed@oscar.et.ca  1.425
> # Various locking improvements and fixes.
> # --------------------------------------------
> #
> diff -Nru a/fs/dcache.c b/fs/dcache.c
> --- a/fs/dcache.c  Mon Jun  3 21:01:57 2002
> +++ b/fs/dcache.c  Mon Jun  3 21:01:57 2002
> @@ -321,7 +321,7 @@
>  void prune_dcache(int count)
>  {
>  	spin_lock(&dcache_lock);
> -	for (;;) {
> +	for (; count ; count--) {
>  		struct dentry *dentry;
>  		struct list_head *tmp;
>
> @@ -345,8 +345,6 @@
>  			BUG();
>
>  		prune_one_dentry(dentry);
> -		if (!--count)
> -			break;
>  	}
>  	spin_unlock(&dcache_lock);
>  }
> @@ -538,19 +536,10 @@
>
>  /*
>   * This is called from kswapd when we think we need some
> - * more memory, but aren't really sure how much. So we
> - * carefully try to free a _bit_ of our dcache, but not
> - * too much.
> - *
> - * Priority:
> - *   0 - very urgent: shrink everything
> - *  ...
> - *   6 - base-level: try to shrink a bit.
> + * more memory.
>   */
> -int shrink_dcache_memory(int priority, unsigned int gfp_mask)
> +int age_dcache_memory(kmem_cache_t *cachep, int entries, int gfp_mask)
>  {
> -	int count = 0;
> -
>  	/*
>  	 * Nasty deadlock avoidance.
>  	 *
> @@ -565,10 +554,11 @@
>  	if (!(gfp_mask & __GFP_FS))
>  		return 0;
>
> -	count = dentry_stat.nr_unused / priority;
> +	if (entries > dentry_stat.nr_unused)
> +		entries = dentry_stat.nr_unused;
>
> -	prune_dcache(count);
> -	return kmem_cache_shrink(dentry_cache);
> +	prune_dcache(entries);
> +	return entries;
>  }
>
>  #define NAME_ALLOC_LEN(len)	((len+16) & ~15)
> @@ -1186,6 +1176,8 @@
>  	if (!dentry_cache)
>  		panic("Cannot create dentry cache");
>
> +	kmem_set_pruner(dentry_cache, (kmem_pruner_t)age_dcache_memory);
> +
>  #if PAGE_SHIFT < 13
>  	mempages >>= (13 - PAGE_SHIFT);
>  #endif
> @@ -1278,6 +1270,9 @@
>  			SLAB_HWCACHE_ALIGN, NULL, NULL);
>  	if (!dquot_cachep)
>  		panic("Cannot create dquot SLAB cache");
> +
> +	kmem_set_pruner(dquot_cachep, (kmem_pruner_t)age_dqcache_memory);
> +
>  #endif
>
>  	dcache_init(mempages);
> diff -Nru a/fs/dquot.c b/fs/dquot.c
> --- a/fs/dquot.c  Mon Jun  3 21:01:57 2002
> +++ b/fs/dquot.c  Mon Jun  3 21:01:57 2002
> @@ -410,10 +410,13 @@
>
>  int shrink_dqcache_memory(int priority, unsigned int gfp_mask)
>  {
> +	if (entries > nr_free_dquots)
> +		entries = nr_free_dquots;
> +
>  	lock_kernel();
> -	prune_dqcache(nr_free_dquots / (priority + 1));
> +	prune_dqcache(entries);
>  	unlock_kernel();
> -	return kmem_cache_shrink(dquot_cachep);
> +	return entries;
>  }
>
>  /* NOTE: If you change this function please check whether dqput_blocks() works right... */
> diff -Nru a/fs/inode.c b/fs/inode.c
> --- a/fs/inode.c  Mon Jun  3 21:01:57 2002
> +++ b/fs/inode.c  Mon Jun  3 21:01:57 2002
> @@ -672,10 +672,11 @@
>
>  	count = 0;
>  	entry = inode_unused.prev;
> -	while (entry != &inode_unused)
> -	{
> +	for (; goal; goal--) {
>  		struct list_head *tmp = entry;
>
> +		if (entry == &inode_unused)
> +			break;
>  		entry = entry->prev;
>  		inode = INODE(tmp);
>  		if (inode->i_state & (I_FREEING|I_CLEAR|I_LOCK))
> @@ -690,8 +691,6 @@
>  		list_add(tmp, freeable);
>  		inode->i_state |= I_FREEING;
>  		count++;
> -		if (!--goal)
> -			break;
>  	}
>  	inodes_stat.nr_unused -= count;
>  	spin_unlock(&inode_lock);
> @@ -708,10 +707,8 @@
>  		schedule_task(&unused_inodes_flush_task);
>  }
>
> -int shrink_icache_memory(int priority, int gfp_mask)
> +int age_icache_memory(kmem_cache_t *cachep, int entries, int gfp_mask)
>  {
> -	int count = 0;
> -
>  	/*
>  	 * Nasty deadlock avoidance..
>  	 *
> @@ -722,10 +719,11 @@
>  	if (!(gfp_mask & __GFP_FS))
>  		return 0;
>
> -	count = inodes_stat.nr_unused / priority;
> +	if (entries > inodes_stat.nr_unused)
> +		entries = inodes_stat.nr_unused;
>
> -	prune_icache(count);
> -	return kmem_cache_shrink(inode_cachep);
> +	prune_icache(entries);
> +	return entries;
>  }
>
>  /*
> @@ -1171,6 +1169,8 @@
>  			NULL);
>  	if (!inode_cachep)
>  		panic("cannot create inode slab cache");
> +
> +	kmem_set_pruner(inode_cachep, (kmem_pruner_t)age_icache_memory);
>
>  	unused_inodes_flush_task.routine = try_to_sync_unused_inodes;
>  }
> diff -Nru a/include/linux/dcache.h b/include/linux/dcache.h
> --- a/include/linux/dcache.h  Mon Jun  3 21:01:57 2002
> +++ b/include/linux/dcache.h  Mon Jun  3 21:01:57 2002
> @@ -171,15 +171,10 @@
>  #define shrink_dcache() prune_dcache(0)
>  struct zone_struct;
>  /* dcache memory management */
> -extern int shrink_dcache_memory(int, unsigned int);
>  extern void prune_dcache(int);
>
>  /* icache memory management (defined in linux/fs/inode.c) */
> -extern int shrink_icache_memory(int, int);
>  extern void prune_icache(int);
> -
> -/* quota cache memory management (defined in linux/fs/dquot.c) */
> -extern int shrink_dqcache_memory(int, unsigned int);
>
>  /* only used at mount-time */
>  extern struct dentry * d_alloc_root(struct inode *);
> diff -Nru a/include/linux/slab.h b/include/linux/slab.h
> --- a/include/linux/slab.h  Mon Jun  3 21:01:57 2002
> +++ b/include/linux/slab.h  Mon Jun  3 21:01:57 2002
> @@ -55,6 +55,26 @@
>  		void (*)(void *, kmem_cache_t *, unsigned long));
>  extern int kmem_cache_destroy(kmem_cache_t *);
>  extern int kmem_cache_shrink(kmem_cache_t *);
> +
> +typedef int (*kmem_pruner_t)(kmem_cache_t *, int, int);
> +
> +extern void kmem_set_pruner(kmem_cache_t *, kmem_pruner_t);
> +extern int kmem_do_prunes(int);
> +extern int kmem_count_page(struct page *, int);
> +#define kmem_touch_page(addr) SetPageReferenced(virt_to_page(addr));
> +
> +/* shrink a slab */
> +extern int kmem_shrink_slab(struct page *);
> +
> +/* dcache prune (defined in linux/fs/dcache.c) */
> +extern int age_dcache_memory(kmem_cache_t *, int, int);
> +
> +/* icache prune (defined in linux/fs/inode.c) */
> +extern int age_icache_memory(kmem_cache_t *, int, int);
> +
> +/* quota cache prune (defined in linux/fs/dquot.c) */
> +extern int age_dqcache_memory(kmem_cache_t *, int, int);
> +
>  extern void *kmem_cache_alloc(kmem_cache_t *, int);
>  extern void kmem_cache_free(kmem_cache_t *, void *);
>
> diff -Nru a/mm/slab.c b/mm/slab.c
> --- a/mm/slab.c  Mon Jun  3 21:01:57 2002
> +++ b/mm/slab.c  Mon Jun  3 21:01:57 2002
> @@ -72,6 +72,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>
>  /*
> @@ -212,6 +213,8 @@
>  	kmem_cache_t *slabp_cache;
>  	unsigned int growing;
>  	unsigned int dflags;		/* dynamic flags */
> +	kmem_pruner_t pruner;		/* shrink callback */
> +	int count;			/* count used to trigger shrink */
>
>  	/* constructor func */
>  	void (*ctor)(void *, kmem_cache_t *, unsigned long);
> @@ -250,10 +253,12 @@
>
>  /* c_dflags (dynamic flags).
>     Need to hold the spinlock to access this member */
>  #define DFLGS_GROWN	0x000001UL	/* don't reap a recently grown */
> +#define DFLGS_NONLRU	0x000002UL	/* there are recently allocated
> +					   non lru pages in this cache */
>
>  #define OFF_SLAB(x)	((x)->flags & CFLGS_OFF_SLAB)
>  #define OPTIMIZE(x)	((x)->flags & CFLGS_OPTIMIZE)
> -#define GROWN(x)	((x)->dlags & DFLGS_GROWN)
> +#define GROWN(x)	((x)->dflags & DFLGS_GROWN)
>
>  #if STATS
>  #define STATS_INC_ACTIVE(x)	((x)->num_active++)
> @@ -381,6 +386,64 @@
>  static void enable_cpucache (kmem_cache_t *cachep);
>  static void enable_all_cpucaches (void);
>  #endif
> +
> +/*
> + * Note: For prunable caches object size must be less than page size.
> + */
> +void kmem_set_pruner(kmem_cache_t *cachep, kmem_pruner_t thepruner)
> +{
> +	if (cachep->objsize > PAGE_SIZE)
> +		BUG();
> +	cachep->pruner = thepruner;
> +}
> +
> +/*
> + * Used by refill_inactive_zone to determine caches that need pruning.
> + */
> +int kmem_count_page(struct page *page, int cold)
> +{
> +	kmem_cache_t *cachep = GET_PAGE_CACHE(page);
> +	slab_t *slabp = GET_PAGE_SLAB(page);
> +	int ret = 0;
> +
> +	spin_lock(&cachep->spinlock);
> +	if (cachep->pruner != NULL) {
> +		cachep->count += slabp->inuse >> cachep->gfporder;
> +		ret = !slabp->inuse;
> +	} else
> +		ret = cold && !slabp->inuse;
> +	spin_unlock(&cachep->spinlock);
> +	return ret;
> +}
> +
> +
> +/* Call the prune functions to age pruneable caches */
> +int kmem_do_prunes(int gfp_mask)
> +{
> +	struct list_head *p;
> +	int nr;
> +
> +	if (gfp_mask & __GFP_WAIT)
> +		down(&cache_chain_sem);
> +	else
> +		if (down_trylock(&cache_chain_sem))
> +			return 0;
> +
> +	list_for_each(p, &cache_chain) {
> +		kmem_cache_t *cachep = list_entry(p, kmem_cache_t, next);
> +		if (cachep->pruner != NULL) {
> +			spin_lock(&cachep->spinlock);
> +			nr = cachep->count;
> +			cachep->count = 0;
> +			spin_unlock(&cachep->spinlock);
> +			if (nr > 0)
> +				(*cachep->pruner)(cachep, nr, gfp_mask);
> +		}
> +	}
> +	up(&cache_chain_sem);
> +	return 1;
> +}
> +
>
>  /* Cal the num objs, wastage, and bytes left over for a given slab size. */
>  static void kmem_cache_estimate (unsigned long gfporder, size_t size,
> @@ -479,7 +542,9 @@
>
>  __initcall(kmem_cpucache_init);
>
> -/* Interface to system's page allocator. No need to hold the cache-lock.
> +/*
> + * Interface to system's page allocator. No need to hold the cache-lock.
> + * Call with pagemap_lru_lock held
>   */
>  static inline void * kmem_getpages (kmem_cache_t *cachep, unsigned long flags)
>  {
> @@ -501,7 +566,8 @@
>  	return addr;
>  }
>
> -/* Interface to system's page release. */
> +/* Interface to system's page release.
> + * Normally called with pagemap_lru_lock held */
>  static inline void kmem_freepages (kmem_cache_t *cachep, void *addr)
>  {
>  	unsigned long i = (1 << cachep->gfporder);
> @@ -513,10 +579,18 @@
>  	 * vm_scan(). Shouldn't be a worry.
>  	 */
>  	while (i--) {
> -		PageClearSlab(page);
> +		if (cachep->flags & SLAB_NO_REAP)
> +			PageClearSlab(page);
> +		else {
> +			if (PageActive(page))
> +				del_page_from_active_list(page);
> +			ClearPageReferenced(page);
> +			add_page_to_inactive_clean_list(page);
> +		}
>  		page++;
>  	}
> -	free_pages((unsigned long)addr, cachep->gfporder);
> +	if (cachep->flags & SLAB_NO_REAP)
> +		free_pages((unsigned long)addr, cachep->gfporder);
>  }
>
>  #if DEBUG
> @@ -546,9 +620,11 @@
>  }
>  #endif
>
> +
>  /* Destroy all the objs in a slab, and release the mem back to the system.
>   * Before calling the slab must have been unlinked from the cache.
>   * The cache-lock is not held/needed.
> + * pagemap_lru_lock should be held for kmem_freepages
>   */
>  static void kmem_slab_destroy (kmem_cache_t *cachep, slab_t *slabp)
>  {
> @@ -780,6 +856,8 @@
>  		flags |= CFLGS_OPTIMIZE;
>
>  	cachep->flags = flags;
> +	cachep->pruner = NULL;
> +	cachep->count = 0;
>  	cachep->gfpflags = 0;
>  	if (flags & SLAB_CACHE_DMA)
>  		cachep->gfpflags |= GFP_DMA;
> @@ -946,11 +1024,13 @@
>
>  	drain_cpu_caches(cachep);
>
> +	spin_lock(&pagemap_lru_lock);
>  	spin_lock_irq(&cachep->spinlock);
>  	__kmem_cache_shrink_locked(cachep);
>  	ret = !list_empty(&cachep->slabs_full) ||
>  		!list_empty(&cachep->slabs_partial);
>  	spin_unlock_irq(&cachep->spinlock);
> +	spin_unlock(&pagemap_lru_lock);
>  	return ret;
>  }
>
> @@ -959,7 +1039,7 @@
>   * @cachep: The cache to shrink.
>   *
>   * Releases as many slabs as possible for a cache.
> - * Returns number of pages released.
> + * Returns number of pages removed from the cache.
>   */
>  int kmem_cache_shrink(kmem_cache_t *cachep)
>  {
> @@ -969,14 +1049,53 @@
>  		BUG();
>
>  	drain_cpu_caches(cachep);
> -
> +
> +	spin_lock(&pagemap_lru_lock);
>  	spin_lock_irq(&cachep->spinlock);
>  	ret = __kmem_cache_shrink_locked(cachep);
>  	spin_unlock_irq(&cachep->spinlock);
> +	spin_unlock(&pagemap_lru_lock);
>
> -	return ret << cachep->gfporder;
> +	return ret << cachep->gfporder;
>  }
>
> +
> +/*
> + * Used by refill_inactive_zone to try to shrink a cache.  The
> + * method we use to shrink depends on if we have added nonlru
> + * pages since the last time we shrunk this cache.
> + *  - shrink works and we return the pages shrunk
> + *  - shrink fails because the slab is in use, we return 0
> + * called with pagemap_lru_lock held.
> + */
> +int kmem_shrink_slab(struct page *page)
> +{
> +	kmem_cache_t *cachep = GET_PAGE_CACHE(page);
> +	slab_t *slabp = GET_PAGE_SLAB(page);
> +
> +	spin_lock_irq(&cachep->spinlock);
> +	if (!slabp->inuse) {
> +		if (!cachep->growing) {
> +			if (cachep->dflags & DFLGS_NONLRU) {
> +				int nr = __kmem_cache_shrink_locked(cachep);
> +				cachep->dflags &= ~DFLGS_NONLRU;
> +				spin_unlock_irq(&cachep->spinlock);
> +				return nr << cachep->gfporder;
> +			} else {
> +				list_del(&slabp->list);
> +				spin_unlock_irq(&cachep->spinlock);
> +				kmem_slab_destroy(cachep, slabp);
> +				return 1 << cachep->gfporder;
> +			}
> +			if (PageActive(page))
> +				BUG();
> +		}
> +	}
> +	spin_unlock_irq(&cachep->spinlock);
> +	return 0;
> +}
> +
> +
>  /**
>   * kmem_cache_destroy - delete a cache
>   * @cachep: the cache to destroy
> @@ -1106,7 +1225,7 @@
>  	struct page *page;
>  	void *objp;
>  	size_t offset;
> -	unsigned int i, local_flags;
> +	unsigned int i, local_flags, locked = 0;
>  	unsigned long ctor_flags;
>  	unsigned long save_flags;
>
> @@ -1163,6 +1282,21 @@
>  	if (!(objp = kmem_getpages(cachep, flags)))
>  		goto failed;
>
> +	/*
> +	 * We want the pagemap_lru_lock, in UP spin locks do not
> +	 * protect us in interrupt context... In SMP they do but,
> +	 * optimizing for speed, we process if we do not get it.
> +	 */
> +	if (!(cachep->flags & SLAB_NO_REAP)) {
> +#ifdef CONFIG_SMP
> +		locked = spin_trylock(&pagemap_lru_lock);
> +#else
> +		locked = !in_interrupt() && spin_trylock(&pagemap_lru_lock);
> +#endif
> +		if (!locked && !in_interrupt())
> +			goto opps1;
> +	}
> +
>  	/* Get slab management.
>  	 */
>  	if (!(slabp = kmem_cache_slabmgmt(cachep, objp, offset, local_flags)))
>  		goto opps1;
> @@ -1174,9 +1308,15 @@
>  		SET_PAGE_CACHE(page, cachep);
>  		SET_PAGE_SLAB(page, slabp);
>  		PageSetSlab(page);
> +		set_page_count(page, 1);
> +		if (locked)
> +			add_page_to_active_list(page);
>  		page++;
>  	} while (--i);
>
> +	if (locked)
> +		spin_unlock(&pagemap_lru_lock);
> +
>  	kmem_cache_init_objs(cachep, slabp, ctor_flags);
>
>  	spin_lock_irqsave(&cachep->spinlock, save_flags);
> @@ -1187,10 +1327,15 @@
>  	STATS_INC_GROWN(cachep);
>  	cachep->failures = 0;
>
> +	/* The pagemap_lru_lock was not quickly/safely available */
> +	if (!locked && !(cachep->flags & SLAB_NO_REAP))
> +		cachep->dflags |= DFLGS_NONLRU;
> +
>  	spin_unlock_irqrestore(&cachep->spinlock, save_flags);
>  	return 1;
>  opps1:
> -	kmem_freepages(cachep, objp);
> +	/* do not use kmem_freepages - we are not in the lru yet... */
> +	free_pages((unsigned long)objp, cachep->gfporder);
>  failed:
>  	spin_lock_irqsave(&cachep->spinlock, save_flags);
>  	cachep->growing--;
> @@ -1255,6 +1400,7 @@
>  		list_del(&slabp->list);
>  		list_add(&slabp->list, &cachep->slabs_full);
>  	}
> +	kmem_touch_page(objp);
>  #if DEBUG
>  	if (cachep->flags & SLAB_POISON)
>  		if (kmem_check_poison_obj(cachep, objp))
> @@ -1816,6 +1962,7 @@
>
>  	spin_lock_irq(&best_cachep->spinlock);
>  perfect:
> +	spin_lock(&pagemap_lru_lock);
>  	/* free only 50% of the free slabs */
>  	best_len = (best_len + 1)/2;
>  	for (scan = 0; scan < best_len; scan++) {
> @@ -1841,6 +1988,7 @@
>  		kmem_slab_destroy(best_cachep, slabp);
>  		spin_lock_irq(&best_cachep->spinlock);
>  	}
> +	spin_unlock(&pagemap_lru_lock);
>  	spin_unlock_irq(&best_cachep->spinlock);
>  	ret = scan * (1 << best_cachep->gfporder);
>  out:
> diff -Nru a/mm/vmscan.c b/mm/vmscan.c
> --- a/mm/vmscan.c  Mon Jun  3 21:01:57 2002
> +++ b/mm/vmscan.c  Mon Jun  3 21:01:57 2002
> @@ -136,6 +136,12 @@
>  			goto found_page;
>  		}
>
> +		/* page just has the flag, its not in any cache/slab */
> +		if (PageSlab(page)) {
> +			PageClearSlab(page);
> +			goto found_page;
> +		}
> +
>  		/* We should never ever get here. */
>  		printk(KERN_ERR "VM: reclaim_page, found unknown page\n");
>  		list_del(page_lru);
> @@ -264,6 +270,10 @@
>  		if (unlikely(TryLockPage(page)))
>  			continue;
>
> +		/* Slab pages should never get here... */
> +		if (PageSlab(page))
> +			BUG();
> +
>  		/*
>  		 * The page is in active use or really unfreeable. Move to
>  		 * the active list and adjust the page age if needed.
> @@ -469,6 +479,7 @@
>   * This function will scan a portion of the active list of a zone to find
>   * unused pages, those pages will then be moved to the inactive list.
>   */
> +
>  int refill_inactive_zone(struct zone_struct * zone, int priority)
>  {
>  	int maxscan = zone->active_pages >> priority;
> @@ -506,7 +517,7 @@
>  	 * both PG_locked and the pte_chain_lock are held.
>  	 */
>  	pte_chain_lock(page);
> -	if (!page_mapping_inuse(page)) {
> +	if (!page_mapping_inuse(page) && !PageSlab(page)) {
>  		pte_chain_unlock(page);
>  		UnlockPage(page);
>  		drop_page(page);
> @@ -523,6 +534,31 @@
>  	}
>
>  	/*
> +	 * For slab pages we count entries for caches with their
> +	 * own pruning/aging method.  If we can count a page or
> +	 * its cold we try to free it.  We only use one aging
> +	 * method otherwise we end up with caches with lots
> +	 * of free pages...  kmem_shrink_slab frees slab(s)
> +	 * and moves the page(s) to the inactive clean list.
> +	 */
> +	if (PageSlab(page)) {
> +		pte_chain_unlock(page);
> +		UnlockPage(page);
> +		if (kmem_count_page(page, !page->age)) {
> +			int pages = kmem_shrink_slab(page);
> +			if (pages) {
> +				nr_deactivated += pages;
> +				if (nr_deactivated > target)
> +					goto done;
> +				continue;
> +			}
> +		}
> +		list_del(page_lru);
> +		list_add(page_lru, &zone->active_list);
> +		continue;
> +	}
> +
> +	/*
>  	 * If the page age is 'hot' and the process using the
>  	 * page doesn't exceed its RSS limit we keep the page.
>  	 * Otherwise we move it to the inactive_dirty list.
> @@ -555,6 +591,7 @@
>  	return nr_deactivated;
>  }
>
> +
>  /**
>   * refill_inactive - checks all zones and refills the inactive list as needed
>   *
> @@ -619,24 +656,15 @@
>
>  	/*
>  	 * Eat memory from filesystem page cache, buffer cache,
> -	 * dentry, inode and filesystem quota caches.
>  	 */
>  	ret += page_launder(gfp_mask);
> -	ret += shrink_dcache_memory(DEF_PRIORITY, gfp_mask);
> -	ret += shrink_icache_memory(1, gfp_mask);
> -#ifdef CONFIG_QUOTA
> -	ret += shrink_dqcache_memory(DEF_PRIORITY, gfp_mask);
> -#endif
>
>  	/*
> -	 * Move pages from the active list to the inactive list.
> +	 * Move pages from the active list to the inactive list,
> +	 * then prune the prunable caches, aging them.
>  	 */
>  	refill_inactive();
> -
> -	/*
> -	 * Reclaim unused slab cache memory.
> -	 */
> -	ret += kmem_cache_reap(gfp_mask);
> +	kmem_do_prunes(gfp_mask);
>
>  	refill_freelist();
>
> @@ -645,11 +673,13 @@
>  	run_task_queue(&tq_disk);
>
>  	/*
> -	 * Hmm.. Cache shrink failed - time to kill something?
> +	 * Hmm.. - time to kill something?
>  	 * Mhwahahhaha! This is the part I really like. Giggle.
>  	 */
> -	if (!ret && free_min(ANY_ZONE) > 0)
> -		out_of_memory();
> +	if (!ret && free_min(ANY_ZONE) > 0) {
> +		if (!kmem_cache_reap(gfp_mask))
> +			out_of_memory();
> +	}
>
>  	return ret;
>  }
> @@ -740,6 +770,7 @@
>
>  		/* Do background page aging. */
>  		background_aging(DEF_PRIORITY);
> +		kmem_do_prunes(GFP_KSWAPD);
>  	}
>
>  	wakeup_memwaiters();
>
> ------------
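
P.S.  For anyone skimming the quoted patch, the aging scheme reduces to a
per-cache callback: kmem_count_page() accumulates, per cache, a count
proportional to how often that cache's pages are seen in
refill_inactive_zone(), and kmem_do_prunes() later asks each registered
cache to age that many entries.  Condensed from the dcache pieces above
(locking trimmed; the icache and dquot caches follow the same shape):

	/* the callback type every prunable cache registers */
	typedef int (*kmem_pruner_t)(kmem_cache_t *, int, int);

	/* age_dcache_memory(), condensed: the vm says how many entries
	 * it saw; prune at most that many unused dentries and report
	 * back how many were aged. */
	int age_dcache_memory(kmem_cache_t *cachep, int entries, int gfp_mask)
	{
		if (!(gfp_mask & __GFP_FS))	/* nasty deadlock avoidance */
			return 0;
		if (entries > dentry_stat.nr_unused)
			entries = dentry_stat.nr_unused;
		prune_dcache(entries);
		return entries;
	}

	/* registered once, at cache creation time: */
	kmem_set_pruner(dentry_cache, (kmem_pruner_t)age_dcache_memory);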
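
P.P.S.  The other half is how slabs that never made it onto the lru
(grown in interrupt context) get freed.  Condensed from kmem_shrink_slab()
above: if the cache has grown non-lru pages since the last shrink we
cannot trust the one page we are looking at to be representative, so we
shrink the whole cache; otherwise we free just the one empty slab.

	if (cachep->dflags & DFLGS_NONLRU) {
		/* cache holds pages the lru never saw: shrink everything */
		nr = __kmem_cache_shrink_locked(cachep);
		cachep->dflags &= ~DFLGS_NONLRU;
		return nr << cachep->gfporder;
	} else {
		/* normal case: this slab is empty, destroy just it */
		list_del(&slabp->list);
		kmem_slab_destroy(cachep, slabp);
		return 1 << cachep->gfporder;
	}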