2001-11-01 01:41:30

by Vijay Gadad

Subject: Slab Allocator Leak?

I have a script (below) which creates a file, deletes it, and loops. In
/proc/slabinfo, I see many dentry_cache slabs being created - to the point
where nearly all free memory is consumed. The system is still usable, as
the slab allocator responds to memory pressure and releases some of these
dentry_cache slabs.

If I understand the USENIX slab allocator paper (Bonwick, 1994) correctly, a
single dentry_cache slab buffer should be sufficient in my case, since the
allocator should reuse the freed buffer for each new dentry.
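
To make that expectation concrete, here is a minimal 2.4-style module
sketch (not part of the original report; "demo_cache" and the 128-byte
object size are invented for illustration). After a free, the next
allocation from the same cache should normally be satisfied from the
buffer just vacated, so the cache stays at one slab instead of growing:

/* Hypothetical demo: allocate, free, allocate again from one cache. */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/errno.h>

static kmem_cache_t *demo_cache;

int init_module(void)
{
        void *a, *b;

        demo_cache = kmem_cache_create("demo_cache", 128, 0, 0, NULL, NULL);
        if (!demo_cache)
                return -ENOMEM;

        a = kmem_cache_alloc(demo_cache, GFP_KERNEL);
        if (!a) {
                kmem_cache_destroy(demo_cache);
                return -ENOMEM;
        }
        kmem_cache_free(demo_cache, a);

        /* With no other users of this cache, b normally reuses the
         * buffer a just occupied. */
        b = kmem_cache_alloc(demo_cache, GFP_KERNEL);
        printk("slab buffer reused: %s\n", (b == a) ? "yes" : "no");
        if (b)
                kmem_cache_free(demo_cache, b);
        return 0;
}

void cleanup_module(void)
{
        kmem_cache_destroy(demo_cache);
}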

Furthermore, this memory consumption only occurs with different filenames.
If I simply do a "while [ 1 ] ; do touch myfile ; rm myfile ; done", the
extra dentry_cache slabs do not get created.

I've been able to reproduce this under 2.4.2, 2.4.9, 2.4.13, and someone
else confirmed they saw this behavior under 2.4.13-ac3.

Thanks.


Vijay Gadad
[email protected]


-----CUT-----
#!/bin/sh
# Create and remove a uniquely named file on each iteration, so every
# pass goes through a different dentry.

COUNTER=2000000000

while [ $COUNTER -gt 0 ]
do
        touch testfile.$COUNTER
        rm -f testfile.$COUNTER
        COUNTER=`expr $COUNTER - 1`
done
-----CUT-----


(Slabinfo version 1.1 columns: cache name, active objects, total objects,
object size in bytes, active slabs, total slabs, pages per slab.)

# cat /proc/slabinfo
slabinfo - version: 1.1
kmem_cache 54 78 100 2 2 1
tcp_tw_bucket 0 0 96 0 0 1
tcp_bind_bucket 29 113 32 1 1 1
tcp_open_request 0 0 64 0 0 1
inet_peer_cache 0 0 64 0 0 1
ip_fib_hash 10 113 32 1 1 1
ip_dst_cache 14 24 160 1 1 1
arp_cache 3 30 128 1 1 1
blkdev_requests 256 280 96 7 7 1
dnotify cache 0 0 20 0 0 1
file lock cache 2 42 92 1 1 1
fasync cache 0 0 16 0 0 1
uid_cache 2 113 32 1 1 1
skbuff_head_cache 54 72 160 3 3 1
sock 82 90 832 10 10 2
sigqueue 1 29 132 1 1 1
cdev_cache 495 826 64 12 14 1
bdev_cache 6605 6962 64 114 118 1
mnt_cache 14 59 64 1 1 1
inode_cache 10809 13632 480 1703 1704 1
dentry_cache 253205 253230 128 8441 8441 1
filp 553 560 96 14 14 1
names_cache 0 7 4096 0 7 1
buffer_head 42229 42280 96 1056 1057 1
mm_struct 29 48 160 2 2 1
vm_area_struct 772 1062 64 14 18 1
fs_cache 28 59 64 1 1 1
files_cache 28 36 416 4 4 1
signal_act 32 33 1312 11 11 1
size-131072(DMA) 0 0 131072 0 0 32
size-131072 0 0 131072 0 0 32
size-65536(DMA) 0 0 65536 0 0 16
size-65536 0 0 65536 0 0 16
size-32768(DMA) 0 0 32768 0 0 8
size-32768 0 0 32768 0 0 8
size-16384(DMA) 0 0 16384 0 0 4
size-16384 0 0 16384 0 0 4
size-8192(DMA) 0 0 8192 0 0 2
size-8192 0 0 8192 0 0 2
size-4096(DMA) 0 0 4096 0 0 1
size-4096 75 76 4096 75 76 1
size-2048(DMA) 0 0 2048 0 0 1
size-2048 38 42 2048 20 21 1
size-1024(DMA) 0 0 1024 0 0 1
size-1024 25 28 1024 7 7 1
size-512(DMA) 0 0 512 0 0 1
size-512 29 32 512 4 4 1
size-256(DMA) 0 0 256 0 0 1
size-256 8 15 256 1 1 1
size-128(DMA) 0 0 128 0 0 1
size-128 439 450 128 15 15 1
size-64(DMA) 0 0 64 0 0 1
size-64 114 236 64 3 4 1
size-32(DMA) 0 0 32 0 0 1
size-32 251432 261595 32 2226 2315 1
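
(For scale: 253230 dentry_cache objects at 128 bytes each is about 31 MB,
held in 8441 single-page slabs of 30 dentries apiece. Note that the size-32
cache has grown in step, to 251432 objects; presumably these are the
kmalloc()'ed copies of filenames too long for the dentry's inline name
array.)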


2001-11-01 06:16:06

by Andreas Dilger

Subject: Re: Slab Allocator Leak?

On Oct 31, 2001 17:42 -0800, Vijay Gadad wrote:
> I have a script (below) which creates a file, deletes it, and loops. In
> /proc/slabinfo, I see many dentry_cache slabs being created - to the point
> where nearly all free memory is consumed. The system is still usable, as
> the slab allocator responds to memory pressure and releases some of these
> dentry_cache slabs.

I had created a patch a few months ago to address this problem. It put new
negative dentries at the tail of the dentry_unused list - the end that
prune_dcache scans first - so under memory pressure they would be the first
to be freed.

This means that in a situation such as you describe, the VM can free a large
number of negative dentries quickly, because that end of the list is likely
to hold mostly unreferenced negative dentries. I also changed prune_dcache
to free negative dentries even when __GFP_FS is not set in the gfp_mask
(currently it cannot free anything at all in that case, because of a
potential deadlock).
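
(For reference - paraphrasing 2.4 fs/dcache.c from memory rather than
quoting the exact source - the safety argument rests on dentry_iput()
touching the filesystem only when an inode is attached, roughly:)

/* Paraphrased sketch of 2.4 dentry_iput(): for a negative dentry
 * (d_inode == NULL) only the else branch runs, so pruning it can
 * never re-enter filesystem code. */
static inline void dentry_iput(struct dentry *dentry)
{
        struct inode *inode = dentry->d_inode;

        if (inode) {
                dentry->d_inode = NULL;
                list_del_init(&dentry->d_alias);
                spin_unlock(&dcache_lock);
                if (dentry->d_op && dentry->d_op->d_iput)
                        dentry->d_op->d_iput(dentry, inode);
                else
                        iput(inode);    /* may call into the filesystem */
        } else {
                spin_unlock(&dcache_lock);
        }
}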

The question is, is this really needed? Maybe. Being able to free negative
dentries from the dcache even without __GFP_FS may be useful under memory
pressure, and can't really hurt. Putting negative dentries at the reclaim
end of the list as unreferenced _should_ be OK: under normal circumstances
(e.g. PATH searches) you would likely reference the negative dentry again
fairly quickly, and the next prune_dcache call would then move it back to
the protected end of the list.
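
To make the list mechanics concrete, here is a hedged userspace model of
that policy (invented names; it mimics the kernel's circular list rather
than using list.h): positive entries go to the protected head, negative
entries to the reclaim tail, and pruning scans from the tail, giving
recently referenced entries a second chance at the head:

/* Hypothetical userspace model of the LRU policy in the patch below. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct entry {
        struct entry *prev, *next;
        char name[32];
        int referenced;                 /* DCACHE_REFERENCED analogue */
};

static struct entry lru = { &lru, &lru };       /* circular list head */

static void unlink_entry(struct entry *e)
{
        e->prev->next = e->next;
        e->next->prev = e->prev;
}

static void add_head(struct entry *e)   /* protected end */
{
        e->prev = &lru; e->next = lru.next;
        lru.next->prev = e; lru.next = e;
}

static void add_tail(struct entry *e)   /* reclaimed first */
{
        e->next = &lru; e->prev = lru.prev;
        lru.prev->next = e; lru.prev = e;
}

/* dput() analogue: positive entries to the head, negative to the tail. */
static void insert(const char *name, int positive)
{
        struct entry *e = calloc(1, sizeof(*e));

        if (!e)
                exit(1);
        strncpy(e->name, name, sizeof(e->name) - 1);
        if (positive)
                add_head(e);
        else
                add_tail(e);
}

/* prune_dcache() analogue: scan from the tail; a referenced entry gets a
 * second chance at the head, everything else is freed. */
static void prune(int count)
{
        while (count > 0) {
                struct entry *e = lru.prev;

                if (e == &lru)
                        break;
                unlink_entry(e);
                if (e->referenced) {
                        e->referenced = 0;
                        add_head(e);
                        continue;
                }
                printf("freeing %s\n", e->name);
                free(e);
                count--;
        }
}

int main(void)
{
        insert("real-file", 1);         /* positive entry, protected */
        insert("testfile.2", 0);        /* negative entries, reclaimed */
        insert("testfile.1", 0);        /*   first, newest nearest tail */
        prune(2);                       /* frees testfile.1, testfile.2 */
        return 0;
}

Running it frees the two testfile entries and leaves the positive entry
untouched, which is the behaviour the patch is after.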

The patch below is extracted from my current kernel - I have been running
with it since I created the original against 2.4.7, so it is pretty safe.

Cheers, Andreas
=========================================================================
--- linux.orig/fs/dcache.c	Thu Oct 25 01:50:30 2001
+++ linux/fs/dcache.c	Thu Oct 25 00:02:58 2001
@@ -137,7 +137,16 @@
 	/* Unreachable? Get rid of it */
 	if (list_empty(&dentry->d_hash))
 		goto kill_it;
-	list_add(&dentry->d_lru, &dentry_unused);
+	if (dentry->d_inode) {
+		list_add(&dentry->d_lru, &dentry_unused);
+	} else {
+		/* Put an unused negative dentry at the end of the list. If it
+		 * is not referenced again before we need to free some memory,
+		 * it will be the first to be freed (2Q algorithm, I believe).
+		 */
+		dentry->d_vfs_flags &= ~DCACHE_REFERENCED;
+		list_add_tail(&dentry->d_lru, &dentry_unused);
+	}
 	dentry_stat.nr_unused++;
 	spin_unlock(&dcache_lock);
 	return;
@@ -306,8 +315,9 @@
 }
 
 /**
- * prune_dcache - shrink the dcache
+ * _prune_dcache - shrink the dcache
  * @count: number of entries to try and free
+ * @gfp_mask: context under which we are trying to free memory
  *
  * Shrink the dcache. This is done when we need
  * more memory, or simply when we need to unmount
@@ -318,7 +328,7 @@
  * all the dentries are in use.
  */
 
-void prune_dcache(int count)
+void _prune_dcache(int count, unsigned int gfp_mask)
 {
 	spin_lock(&dcache_lock);
 	for (;;) {
@@ -329,15 +339,32 @@
 
 		if (tmp == &dentry_unused)
 			break;
-		list_del_init(tmp);
 		dentry = list_entry(tmp, struct dentry, d_lru);
 
 		/* If the dentry was recently referenced, don't free it. */
 		if (dentry->d_vfs_flags & DCACHE_REFERENCED) {
+			list_del_init(tmp);
 			dentry->d_vfs_flags &= ~DCACHE_REFERENCED;
 			list_add(&dentry->d_lru, &dentry_unused);
 			continue;
 		}
+
+		/*
+		 * Nasty deadlock avoidance.
+		 *
+		 * ext2_new_block->getblk->GFP->shrink_dcache_memory->
+		 * prune_dcache->prune_one_dentry->dput->dentry_iput->iput->
+		 * inode->i_sb->s_op->put_inode->ext2_discard_prealloc->
+		 * ext2_free_blocks->lock_super->DEADLOCK.
+		 *
+		 * We should make sure we don't hold the superblock lock over
+		 * block allocations, but for now we will only free unused
+		 * negative dentries (which are added at the end of the list).
+		 * It is safe to call prune_one_dentry() on a negative dentry
+		 * even without __GFP_FS, because dentry_iput() is a no-op in
+		 * that case and cannot call into the filesystem.
+		 *
+		 * I'm not sure if the d_release check is necessary to avoid
+		 * deadlock in d_free(), but better to be safe for now.
+		 */
+		if (((dentry->d_op && dentry->d_op->d_release) ||
+		     dentry->d_inode) && !(gfp_mask & __GFP_FS))
+			break;
+
+		list_del_init(tmp);
 		dentry_stat.nr_unused--;
 
 		/* Unused dentry with a count? */
@@ -351,6 +378,11 @@
 	spin_unlock(&dcache_lock);
 }
 
+void prune_dcache(int count)
+{
+	_prune_dcache(count, __GFP_FS);
+}
+
 /*
  * Shrink the dcache for the specified super block.
  * This allows us to unmount a device without disturbing
@@ -549,26 +581,11 @@
  */
 int shrink_dcache_memory(int priority, unsigned int gfp_mask)
 {
-	int count = 0;
-
-	/*
-	 * Nasty deadlock avoidance.
-	 *
-	 * ext2_new_block->getblk->GFP->shrink_dcache_memory->prune_dcache->
-	 * prune_one_dentry->dput->dentry_iput->iput->inode->i_sb->s_op->
-	 * put_inode->ext2_discard_prealloc->ext2_free_blocks->lock_super->
-	 * DEADLOCK.
-	 *
-	 * We should make sure we don't hold the superblock lock over
-	 * block allocations, but for now:
-	 */
-	if (!(gfp_mask & __GFP_FS))
-		return 0;
-
-	count = dentry_stat.nr_unused / priority;
+	int count = dentry_stat.nr_unused / (priority + 1);
 
-	prune_dcache(count);
+	_prune_dcache(count, gfp_mask);
 	kmem_cache_shrink(dentry_cache);
+
 	return 0;
 }

--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/