2005-09-21 16:26:44

by Chris Friesen

Subject: dentry_cache using up all my zone normal memory


I'm running 2.6.10 on a Pentium M system with 3 GB of RAM, with an NFS
root and no swap.

Normally at idle I use about 20MB of memory.

When I run LTP, everything is fine until it hits the rename14 test.
Invariably, the OOM killer kicks in during that test.

With a bit of digging, the culprit appears to be the dentry_cache. The
last log I have shows it using 817 MB of memory; right after that the
OOM killer kicked me off the system. When I logged back in, the cache
usage was back down to normal and everything was fine.

Anyone have any suggestions?

Chris


2005-09-21 16:53:26

by Chris Friesen

Subject: Re: dentry_cache using up all my zone normal memory


On Marcelo's suggestion, I modified the OOM killer code to dump the
stack (roughly, a dump_stack() call in out_of_memory(); a sketch is at
the end of this message). The dmesg output for the first process killed
is as follows:


oom-killer: gfp_mask=0xd0
[<c0137701>] out_of_memory+0x51/0xb0
[<c0138888>] __alloc_pages+0x3d8/0x3f0
[<c01388bb>] __get_free_pages+0x1b/0x40
[<c013bbc2>] kmem_getpages+0x22/0xb0
[<c013c7d9>] cache_grow+0xa9/0x170
[<c013ca4d>] cache_alloc_refill+0x1ad/0x1f0
[<c013cc78>] kmem_cache_alloc+0x38/0x40
[<c01655f1>] d_alloc+0x21/0x170
[<c015b15d>] cached_lookup+0x7d/0x90
[<c015c53d>] __lookup_hash+0x8d/0xe0
[<c015c5ad>] lookup_hash+0x1d/0x30
[<c015e737>] sys_rename+0x137/0x230
[<c01106e0>] recalc_task_prio+0xc0/0x1c0
[<c010250d>] sysenter_past_esp+0x52/0x75
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16

Free pages: 2484556kB (2480576kB HighMem)
Active:1787 inactive:419 dirty:2 writeback:0 unstable:0 free:621139
slab:217642 mapped:1511 pagetables:102
DMA free:68kB min:68kB low:84kB high:100kB active:0kB inactive:0kB
present:16384kB pages_scanned:0 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:3912kB min:3756kB low:4692kB high:5632kB active:0kB
inactive:0kB present:901120kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:2480576kB min:512kB low:640kB high:768kB active:7148kB
inactive:1676kB present:3276800kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 1*4kB 0*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB
0*2048kB 0*4096kB = 68kB
Normal: 2*4kB 0*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB
1*1024kB 1*2048kB 0*4096kB = 3912kB
HighMem: 244*4kB 116*8kB 73*16kB 66*32kB 44*64kB 25*128kB 16*256kB
3*512kB 0*1024kB 1*2048kB 601*4096kB = 2480576kB
Out of Memory: Killed process 595 (portmap).
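
For reference, the stack trace above came from a trivial bit of
instrumentation; roughly, a dump_stack() call added to out_of_memory()
in mm/oom_kill.c right after the "oom-killer:" printk (a sketch from
memory, not the exact change):

	printk("oom-killer: gfp_mask=0x%x\n", gfp_mask);
	dump_stack();		/* debugging: show who hit the OOM path */
	show_free_areas();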

2005-09-21 19:46:50

by Chris Friesen

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2


Just for kicks I tried with 2.6.14-rc2, and got the same behaviour.
/proc/slabinfo gives the following two high-runners within a second of
the oom-killer running:

dentry_cache 3894397 3894961 136 29 1 : tunables 120 60
0 : slabdata 134307 134309 0
filp 1216820 1216980 192 20 1 : tunables 120 60
0 : slabdata 60844 60849 0


The oom killer gave the following output:


oom-killer: gfp_mask=0xd0, order=0
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1 used:2
cpu 0 cold: low 0, high 2, batch 1 used:0
Normal per-cpu:
cpu 0 hot: low 62, high 186, batch 31 used:92
cpu 0 cold: low 0, high 62, batch 31 used:0
HighMem per-cpu:
cpu 0 hot: low 62, high 186, batch 31 used:161
cpu 0 cold: low 0, high 62, batch 31 used:19
Free pages: 2486564kB (2479256kB HighMem)
Active:2064 inactive:392 dirty:2 writeback:0 unstable:0 free:621641
slab:216703 mapped:1715 pagetables:91
DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB
present:16384kB pages_scanned:18 all_unreclaimable? yes
lowmem_reserve[]: 0 880 4080
Normal free:3720kB min:3756kB low:4692kB high:5632kB active:0kB
inactive:0kB present:901120kB pages_scanned:18 all_unreclaimable? yes
lowmem_reserve[]: 0 0 25600
HighMem free:2479256kB min:512kB low:640kB high:768kB active:8256kB
inactive:1568kB present:3276800kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB
1*2048kB 0*4096kB = 3588kB
Normal: 0*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB
1*1024kB 1*2048kB 0*4096kB = 3720kB
HighMem: 270*4kB 156*8kB 108*16kB 92*32kB 61*64kB 28*128kB 4*256kB
0*512kB 0*1024kB 1*2048kB 601*4096kB = 2479256kB
Free swap: 0kB
1048576 pages of RAM
819200 pages of HIGHMEM
205960 reserved pages
4543 pages shared
0 pages swap cached
2 pages dirty
0 pages writeback
1715 pages mapped
216546 pages slab
91 pages pagetables
Out of Memory: Killed process 444 (portmap).
oom-killer: gfp_mask=0xd0, order=0
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1 used:2
cpu 0 cold: low 0, high 2, batch 1 used:0
Normal per-cpu:
cpu 0 hot: low 62, high 186, batch 31 used:92
cpu 0 cold: low 0, high 62, batch 31 used:0
HighMem per-cpu:
cpu 0 hot: low 62, high 186, batch 31 used:182
cpu 0 cold: low 0, high 62, batch 31 used:19
Free pages: 2486564kB (2479256kB HighMem)
Active:2043 inactive:392 dirty:7 writeback:0 unstable:0 free:621641
slab:216707 mapped:1681 pagetables:88
DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB
present:16384kB pages_scanned:34 all_unreclaimable? yes
lowmem_reserve[]: 0 880 4080
Normal free:3720kB min:3756kB low:4692kB high:5632kB active:0kB
inactive:0kB present:901120kB pages_scanned:16 all_unreclaimable? yes
lowmem_reserve[]: 0 0 25600
HighMem free:2479256kB min:512kB low:640kB high:768kB active:8172kB
inactive:1568kB present:3276800kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB
1*2048kB 0*4096kB = 3588kB
Normal: 0*4kB 1*8kB 0*16kB 0*32kB 0*64kB 1*128kB 0*256kB 1*512kB
1*1024kB 1*2048kB 0*4096kB = 3720kB
HighMem: 270*4kB 156*8kB 108*16kB 92*32kB 61*64kB 28*128kB 4*256kB
0*512kB 0*1024kB 1*2048kB 601*4096kB = 2479256kB
Free swap: 0kB
1048576 pages of RAM
819200 pages of HIGHMEM
205960 reserved pages
4479 pages shared
0 pages swap cached
7 pages dirty
0 pages writeback
1681 pages mapped
216521 pages slab
88 pages pagetables
Out of Memory: Killed process 803 (sshd).


Chris

2005-09-21 20:12:48

by Sonny Rao

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

On Wed, Sep 21, 2005 at 01:46:35PM -0600, Christopher Friesen wrote:
>
> Just for kicks I tried with 2.6.14-rc2, and got the same behaviour.
> /proc/slabinfo gives the following two high-runners within a second of
> the oom-killer running:
>
> dentry_cache 3894397 3894961 136 29 1 : tunables 120 60
> 0 : slabdata 134307 134309 0
> filp 1216820 1216980 192 20 1 : tunables 120 60
> 0 : slabdata 60844 60849 0
>
>

If I'm reading this correctly, you seem to have about 1.2 million
files open and about 3.9 million dentry objects in lowmem with almost
no fragmentation. For the files that are open there will certainly
be a dentry attached to the inode (how big is the inode cache?), but
I would think the shrinker should be trying to reclaim memory from
the other 2.7 million objects.

Based on the lack of fragmentation, I would guess that either the
shrinker isn't running or those dentries are otherwise pinned somehow
(parent directories of the open files?). What does the directory
structure look like?

Just for kicks (again), have you tried ratcheting up the
/proc/sys/vm/vfs_cache_pressure tunable by a few orders of magnitude?

Probably go from the default of 100 to, say, 10000 to start.

Over one million files open at once is just asking for trouble on a
lowmem-crippled x86 machine, IMHO.

Sonny

2005-09-21 20:27:56

by Chris Friesen

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

Sonny Rao wrote:

> If I'm reading this correctly, you seem to have about 1.2 million
> files open and about 3.9 million dentrys objects in lowmem with almost
> no fragmentation.. for those files which are open there certainly
> will be a dentry attached to the inode (how big is inode cache?), but
> the shrinker should be trying to reclaim memory from the other 2.7
> million objects I would think.

I don't know what the code is actually doing. This is test case
"rename14" from the LTP suite. It runs fine on ppc, ppc64, dual Xeon,
and XScale.

The inode cache is small...under 300 objects.

> Based on the lack of fragmentation I would guess that either the shrinker isn't
> running or those dentrys are otherwise pinned somehow (parent
> directorys of the open files?) What does the directory structure look
> like?

No idea.

> Just for kicks (again), have you tried ratcheting up the
> /proc/sys/vm/vfs_cache_pressure tunable by a few orders of magnitude ?

Nope. I'm currently rebooting with an instrumentation patch for the
dcache; I may try this too.

Chris

2005-09-21 20:50:15

by Chris Friesen

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

Sonny Rao wrote:

> Just for kicks (again), have you tried ratcheting up the
> /proc/sys/vm/vfs_cache_pressure tunable by a few orders of magnitude ?

Boosting it up to 1000000 had no effect.

Chris

2005-09-21 20:59:51

by Chris Friesen

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

Sonny Rao wrote:

> Over one million files open at once is just asking for trouble on a
> lowmem-crippled x86 machine, IMHO.

I don't think there actually are that many. I ran the test case under
strace, and it appears that there are two threads going at once.

thread 1 spins doing the following:
fd = creat("./rename14", 0666);
unlink("./rename14");
close(fd);

thread 2 spins doing:
rename("./rename14", "./rename14xyz");

Chris

2005-09-21 21:03:45

by Sonny Rao

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

On Wed, Sep 21, 2005 at 02:59:30PM -0600, Christopher Friesen wrote:
> Sonny Rao wrote:
>
> >Over one million files open at once is just asking for trouble on a
> >lowmem-crippled x86 machine, IMHO.
>
> I don't think there actually are. I ran the testcase under strace, and
> it appears that there are two threads going at once.
>
> thread 1 spins doing the following:
> fd = creat("./rename14", 0666);
> unlink("./rename14");
> close(fd);
>
> thread 2 spins doing:
> rename("./rename14", "./rename14xyz");

Try running lsof and grepping for the process name.

2005-09-21 21:06:21

by Dipankar Sarma

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

On Wed, Sep 21, 2005 at 02:59:30PM -0600, Christopher Friesen wrote:
> Sonny Rao wrote:
>
> >Over one million files open at once is just asking for trouble on a
> >lowmem-crippled x86 machine, IMHO.
>
> I don't think there actually are. I ran the testcase under strace, and
> it appears that there are two threads going at once.
>
> thread 1 spins doing the following:
> fd = creat("./rename14", 0666);
> unlink("./rename14");
> close(fd);
>
> thread 2 spins doing:
> rename("./rename14", "./rename14xyz");

Ewww... Looks like a leak due to a race.

Does this happen on a non-NFS filesystem?

Thanks
Dipankar

2005-09-21 21:14:49

by Al Viro

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

On Thu, Sep 22, 2005 at 02:30:20AM +0530, Dipankar Sarma wrote:
> > unlink("./rename14");
> > close(fd);
> >
> > thread 2 spins doing:
> > rename("./rename14", "./rename14xyz");
>
> Ewww.. Looks like a leak due to a race.

Looks like sillyrename being b0rken, if that's NFS...

2005-09-21 21:15:05

by Trond Myklebust

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

On Thu, 22.09.2005 at 02:30 (+0530), Dipankar Sarma wrote:
> On Wed, Sep 21, 2005 at 02:59:30PM -0600, Christopher Friesen wrote:
> > Sonny Rao wrote:
> >
> > >Over one million files open at once is just asking for trouble on a
> > >lowmem-crippled x86 machine, IMHO.
> >
> > I don't think there actually are. I ran the testcase under strace, and
> > it appears that there are two threads going at once.
> >
> > thread 1 spins doing the following:
> > fd = creat("./rename14", 0666);
> > unlink("./rename14");
> > close(fd);
> >
> > thread 2 spins doing:
> > rename("./rename14", "./rename14xyz");
>
> Ewww.. Looks like a leak due to a race.
>
> Does this happen on a non-nfs filesystem ?

...and what is "an NFS filesystem"? v2, v3, v4, ...?

Note also that the above test will be heavily exercising the sillyrename
code. Is Chris perhaps seeing vast numbers of .nfs* files resulting from
this test?

Also note that one change went into 2.6.13 that causes the icache to
evict inodes corresponding to files that end up being deleted by
nfs_rename(). Previously, they could end up floating around in the
icache because the i_nlink count was non-zero.

Cheers,
Trond

2005-09-21 21:25:25

by Chris Friesen

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

Dipankar Sarma wrote:

> Does this happen on a non-nfs filesystem ?

Digging in a bit more, it looks like the files are being
created/destroyed/renamed in /tmp, which is a tmpfs filesystem.

Chris

2005-09-21 21:26:19

by Chris Friesen

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

Trond Myklebust wrote:

> ...and what is "an NFS filesystem"? v2, v3, v4, ...?

I think the root filesystem is v3. But as I mentioned, I just realized
the file manipulation was happening on a tmpfs filesystem.

Chris

2005-09-21 21:29:41

by Chris Friesen

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

Friesen, Christopher [CAR:VC21:EXCH] wrote:
> Trond Myklebust wrote:
>
>> ...and what is "an NFS filesystem"? v2, v3, v4, ...?
>
>
> I think the root filesystem is v3. But as I mentioned, I just realized
> the file manipulation was happening on a tmpfs filesystem.

As a data point, the LTP test does not trigger the OOM killer if /tmp is
part of the root filesystem rather than a separate tmpfs partition.

Chris

2005-09-21 22:03:43

by Roland Dreier

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

Christopher> Digging in a bit more, it looks like the files are
Christopher> being created/destroyed/renamed in /tmp, which is a
Christopher> tmpfs filesystem.

Hmm... could there be a race in shmem_rename()??

- R.

2005-09-22 03:12:38

by Al Viro

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

On Wed, Sep 21, 2005 at 03:03:33PM -0700, Roland Dreier wrote:
> Christopher> Digging in a bit more, it looks like the files are
> Christopher> being created/destroyed/renamed in /tmp, which is a
> Christopher> tmpfs filesystem.
>
> Hmm... could there be a race in shmem_rename()??

Not likely - in that setup all calls of ->unlink() and ->rename()
are completely serialized by ->i_sem on parent. One question:
is it dcache or icache that ends up leaking?

2005-09-22 03:54:42

by Chris Friesen

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

Al Viro wrote:

>>Hmm... could there be a race in shmem_rename()??

> Not likely - in that setup all calls of ->unlink() and ->rename()
> are completely serialized by ->i_sem on parent. One question:
> is it dcache or icache that ends up leaking?

dcache. Here's some information I sent to Dipankar earlier, with his
debug patch applied. This is from within half a second of the OOM
killer kicking in.

/proc/sys/fs/dentry-state:
1185 415 45 0 0 0

/proc/meminfo:
MemTotal: 3366368 kB
MemFree: 2522660 kB
Buffers: 0 kB
Cached: 8040 kB
SwapCached: 0 kB
Active: 9372 kB
Inactive: 2544 kB
HighTotal: 2489836 kB
HighFree: 2477520 kB
LowTotal: 876532 kB
LowFree: 45140 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 4 kB
Writeback: 0 kB
Mapped: 6476 kB
Slab: 824648 kB
CommitLimit: 1683184 kB
Committed_AS: 18624 kB
PageTables: 336 kB
VmallocTotal: 114680 kB
VmallocUsed: 444 kB
VmallocChunk: 114036 kB
pages_with_[ 0]_dentries: 0
pages_with_[ 1]_dentries: 3
pages_with_[ 2]_dentries: 2
pages_with_[ 3]_dentries: 2
pages_with_[ 4]_dentries: 1
pages_with_[ 5]_dentries: 4
pages_with_[ 6]_dentries: 3
pages_with_[ 7]_dentries: 3
pages_with_[ 8]_dentries: 2
pages_with_[ 9]_dentries: 2
pages_with_[10]_dentries: 1
pages_with_[11]_dentries: 0
pages_with_[12]_dentries: 2
pages_with_[13]_dentries: 1
pages_with_[14]_dentries: 0
pages_with_[15]_dentries: 1
pages_with_[16]_dentries: 0
pages_with_[17]_dentries: 0
pages_with_[18]_dentries: 3
pages_with_[19]_dentries: 1
pages_with_[20]_dentries: 2
pages_with_[21]_dentries: 1
pages_with_[22]_dentries: 2
pages_with_[23]_dentries: 1
pages_with_[24]_dentries: 0
pages_with_[25]_dentries: 0
pages_with_[26]_dentries: 0
pages_with_[27]_dentries: 0
pages_with_[28]_dentries: 0
pages_with_[29]_dentries: 139491
dcache_pages total: 139528
prune_dcache: requested 1 freed 1
dcache lru list data:
dentries total: 416
dentries in_use: 36
dentries free: 380
dentries referenced: 416
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 4096 kB

/proc/slabinfo:
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab>
<pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata
<active_slabs> <num_slabs> <sharedavail>
ip_fib_alias 11 113 32 113 1 : tunables 120 60 0
: slabdata 1 1 0
ip_fib_hash 11 113 32 113 1 : tunables 120 60 0
: slabdata 1 1 0
rpc_buffers 8 8 2048 2 1 : tunables 24 12 0
: slabdata 4 4 0
rpc_tasks 20 20 192 20 1 : tunables 120 60 0
: slabdata 1 1 0
rpc_inode_cache 8 9 448 9 1 : tunables 54 27 0
: slabdata 1 1 0
xfrm6_tunnel_spi 0 0 64 59 1 : tunables 120 60 0
: slabdata 0 0 0
fib6_nodes 5 113 32 113 1 : tunables 120 60 0
: slabdata 1 1 0
ip6_dst_cache 4 15 256 15 1 : tunables 120 60 0
: slabdata 1 1 0
ndisc_cache 1 20 192 20 1 : tunables 120 60 0
: slabdata 1 1 0
RAWv6 3 6 640 6 1 : tunables 54 27 0
: slabdata 1 1 0
UDPv6 0 0 576 7 1 : tunables 54 27 0
: slabdata 0 0 0
tw_sock_TCPv6 0 0 128 30 1 : tunables 120 60 0
: slabdata 0 0 0
request_sock_TCPv6 0 0 128 30 1 : tunables 120 60
0 : slabdata 0 0 0
TCPv6 4 7 1088 7 2 : tunables 24 12 0
: slabdata 1 1 0
UNIX 5 20 384 10 1 : tunables 54 27 0
: slabdata 2 2 0
ip_mrt_cache 0 0 128 30 1 : tunables 120 60 0
: slabdata 0 0 0
tcp_bind_bucket 3 203 16 203 1 : tunables 120 60 0
: slabdata 1 1 0
inet_peer_cache 2 59 64 59 1 : tunables 120 60 0
: slabdata 1 1 0
secpath_cache 0 0 128 30 1 : tunables 120 60 0
: slabdata 0 0 0
xfrm_dst_cache 0 0 320 12 1 : tunables 54 27 0
: slabdata 0 0 0
ip_dst_cache 7 15 256 15 1 : tunables 120 60 0
: slabdata 1 1 0
arp_cache 2 30 128 30 1 : tunables 120 60 0
: slabdata 1 1 0
RAW 2 9 448 9 1 : tunables 54 27 0
: slabdata 1 1 0
UDP 1 7 512 7 1 : tunables 54 27 0
: slabdata 1 1 0
tw_sock_TCP 0 0 128 30 1 : tunables 120 60 0
: slabdata 0 0 0
request_sock_TCP 0 0 64 59 1 : tunables 120 60 0
: slabdata 0 0 0
TCP 2 4 960 4 1 : tunables 54 27 0
: slabdata 1 1 0
flow_cache 0 0 128 30 1 : tunables 120 60 0
: slabdata 0 0 0
cfq_ioc_pool 0 0 48 78 1 : tunables 120 60 0
: slabdata 0 0 0
cfq_pool 0 0 96 40 1 : tunables 120 60 0
: slabdata 0 0 0
crq_pool 0 0 44 84 1 : tunables 120 60 0
: slabdata 0 0 0
deadline_drq 0 0 48 78 1 : tunables 120 60 0
: slabdata 0 0 0
as_arq 0 0 60 63 1 : tunables 120 60 0
: slabdata 0 0 0
relayfs_inode_cache 0 0 320 12 1 : tunables 54 27
0 : slabdata 0 0 0
nfs_direct_cache 0 0 40 92 1 : tunables 120 60 0
: slabdata 0 0 0
nfs_write_data 45 45 448 9 1 : tunables 54 27 0
: slabdata 5 5 0
nfs_read_data 33 45 448 9 1 : tunables 54 27 0
: slabdata 5 5 0
nfs_inode_cache 204 228 592 6 1 : tunables 54 27 0
: slabdata 38 38 0
nfs_page 59 59 64 59 1 : tunables 120 60 0
: slabdata 1 1 0
hugetlbfs_inode_cache 1 12 316 12 1 : tunables 54 27
0 : slabdata 1 1 0
ext2_inode_cache 0 0 436 9 1 : tunables 54 27 0
: slabdata 0 0 0
ext2_xattr 0 0 44 84 1 : tunables 120 60 0
: slabdata 0 0 0
dnotify_cache 0 0 20 169 1 : tunables 120 60 0
: slabdata 0 0 0
eventpoll_pwq 0 0 36 101 1 : tunables 120 60 0
: slabdata 0 0 0
eventpoll_epi 0 0 128 30 1 : tunables 120 60 0
: slabdata 0 0 0
inotify_event_cache 0 0 28 127 1 : tunables 120 60
0 : slabdata 0 0 0
inotify_watch_cache 0 0 36 101 1 : tunables 120 60
0 : slabdata 0 0 0
kioctx 0 0 192 20 1 : tunables 120 60 0
: slabdata 0 0 0
kiocb 0 0 128 30 1 : tunables 120 60 0
: slabdata 0 0 0
fasync_cache 0 0 16 203 1 : tunables 120 60 0
: slabdata 0 0 0
shmem_inode_cache 405 405 408 9 1 : tunables 54 27 0
: slabdata 45 45 0
posix_timers_cache 0 0 96 40 1 : tunables 120 60
0 : slabdata 0 0 0
uid_cache 0 59 64 59 1 : tunables 120 60 0
: slabdata 0 1 0
blkdev_ioc 0 0 28 127 1 : tunables 120 60 0
: slabdata 0 0 0
blkdev_queue 24 30 380 10 1 : tunables 54 27 0
: slabdata 3 3 0
blkdev_requests 0 0 152 26 1 : tunables 120 60 0
: slabdata 0 0 0
biovec-(256) 256 256 3072 2 2 : tunables 24 12 0
: slabdata 128 128 0
biovec-128 256 260 1536 5 2 : tunables 24 12 0
: slabdata 52 52 0
biovec-64 256 260 768 5 1 : tunables 54 27 0
: slabdata 52 52 0
biovec-16 256 260 192 20 1 : tunables 120 60 0
: slabdata 13 13 0
biovec-4 256 295 64 59 1 : tunables 120 60 0
: slabdata 5 5 0
biovec-1 256 406 16 203 1 : tunables 120 60 0
: slabdata 2 2 0
bio 256 295 64 59 1 : tunables 120 60 0
: slabdata 5 5 0
file_lock_cache 0 44 88 44 1 : tunables 120 60 0
: slabdata 0 1 0
sock_inode_cache 22 40 384 10 1 : tunables 54 27 0
: slabdata 4 4 0
skbuff_fclone_cache 12 12 320 12 1 : tunables 54 27
0 : slabdata 1 1 0
skbuff_head_cache 266 300 192 20 1 : tunables 120 60 0
: slabdata 15 15 0
proc_inode_cache 140 144 332 12 1 : tunables 54 27 0
: slabdata 12 12 0
sigqueue 1 26 148 26 1 : tunables 120 60 0
: slabdata 1 1 0
radix_tree_node 330 476 276 14 1 : tunables 54 27 0
: slabdata 34 34 0
bdev_cache 2 9 448 9 1 : tunables 54 27 0
: slabdata 1 1 0
sysfs_dir_cache 1356 1380 40 92 1 : tunables 120 60 0
: slabdata 15 15 0
mnt_cache 20 30 128 30 1 : tunables 120 60 0
: slabdata 1 1 0
inode_cache 258 288 316 12 1 : tunables 54 27 0
: slabdata 24 24 0
dentry_cache 4071717 4072354 136 29 1 : tunables 120 60
0 : slabdata 140424 140426 0
filp 1300260 1300500 192 20 1 : tunables 120 60
0 : slabdata 65020 65025 0
names_cache 4 4 4096 1 1 : tunables 24 12 0
: slabdata 4 4 0
idr_layer_cache 69 87 136 29 1 : tunables 120 60 0
: slabdata 3 3 0
buffer_head 0 0 48 78 1 : tunables 120 60 0
: slabdata 0 0 0
mm_struct 35 35 576 7 1 : tunables 54 27 0
: slabdata 5 5 0
vm_area_struct 536 880 88 44 1 : tunables 120 60 0
: slabdata 20 20 0
fs_cache 40 113 32 113 1 : tunables 120 60 0
: slabdata 1 1 0
files_cache 41 54 448 9 1 : tunables 54 27 0
: slabdata 6 6 0
signal_cache 50 50 384 10 1 : tunables 54 27 0
: slabdata 5 5 0
sighand_cache 42 42 1344 3 1 : tunables 24 12 0
: slabdata 14 14 0
task_struct 42 42 1264 3 1 : tunables 24 12 0
: slabdata 14 14 0
anon_vma 278 678 8 339 1 : tunables 120 60 0
: slabdata 2 2 0
pgd 28 28 4096 1 1 : tunables 24 12 0
: slabdata 28 28 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0
: slabdata 0 0 0
size-131072 0 0 131072 1 32 : tunables 8 4 0
: slabdata 0 0 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0
: slabdata 0 0 0
size-65536 0 0 65536 1 16 : tunables 8 4 0
: slabdata 0 0 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0
: slabdata 0 0 0
size-32768 0 0 32768 1 8 : tunables 8 4 0
: slabdata 0 0 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0
: slabdata 0 0 0
size-16384 0 0 16384 1 4 : tunables 8 4 0
: slabdata 0 0 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0
: slabdata 0 0 0
size-8192 38 38 8192 1 2 : tunables 8 4 0
: slabdata 38 38 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0
: slabdata 0 0 0
size-4096 314 314 4096 1 1 : tunables 24 12 0
: slabdata 314 314 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0
: slabdata 0 0 0
size-2048 24 24 2048 2 1 : tunables 24 12 0
: slabdata 12 12 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0
: slabdata 0 0 0
size-1024 84 84 1024 4 1 : tunables 54 27 0
: slabdata 21 21 0
size-512(DMA) 0 0 512 8 1 : tunables 54 27 0
: slabdata 0 0 0
size-512 130 152 512 8 1 : tunables 54 27 0
: slabdata 19 19 0
size-256(DMA) 0 0 256 15 1 : tunables 120 60 0
: slabdata 0 0 0
size-256 75 75 256 15 1 : tunables 120 60 0
: slabdata 5 5 0
size-192(DMA) 0 0 192 20 1 : tunables 120 60 0
: slabdata 0 0 0
size-192 40 40 192 20 1 : tunables 120 60 0
: slabdata 2 2 0
size-128(DMA) 0 0 128 30 1 : tunables 120 60 0
: slabdata 0 0 0
size-128 964 990 128 30 1 : tunables 120 60 0
: slabdata 33 33 0
size-64(DMA) 0 0 64 59 1 : tunables 120 60 0
: slabdata 0 0 0
size-32(DMA) 0 0 32 113 1 : tunables 120 60 0
: slabdata 0 0 0
size-64 503 767 64 59 1 : tunables 120 60 0
: slabdata 13 13 0
size-32 1161 1808 32 113 1 : tunables 120 60 0
: slabdata 16 16 0
kmem_cache 120 120 128 30 1 : tunables 120 60 0
: slabdata 4 4 0

2005-09-22 04:17:42

by Al Viro

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

On Wed, Sep 21, 2005 at 09:54:14PM -0600, Christopher Friesen wrote:
> Al Viro wrote:
>
> >>Hmm... could there be a race in shmem_rename()??
>
> >Not likely - in that setup all calls of ->unlink() and ->rename()
> >are completely serialized by ->i_sem on parent. One question:
> >is it dcache or icache that ends up leaking?
>
> dcache. Here's some information I sent to dipankar earlier, with his
> debug patch applied. This is within half a second of the oom killer
> kicking in.

Umm... How many RCU callbacks are pending? Since past the OOM you get
the sucker back to normal... Sounds like you've got a bunch of dentries
on their way to being freed, but the thing that should have been doing
the final kmem_cache_free() is getting postponed too much.

2005-09-22 14:47:51

by Chris Friesen

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

Al Viro wrote:

> Umm... How many RCU callbacks are pending?

How do I check that?

Chris

2005-09-22 15:17:12

by Chris Friesen

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

Al Viro wrote:

> Umm... How many RCU callbacks are pending?

I added an atomic counter that is incremented just before call_rcu() in
d_free(), and decremented just after kmem_cache_free() in d_callback().
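
Roughly like this (paraphrased from memory; the counter name is mine,
and the exact field/argument details may not match 2.6.14-rc2):

	static atomic_t pending_dentry_rcu = ATOMIC_INIT(0);

	static void d_callback(struct rcu_head *head)
	{
		struct dentry *dentry = container_of(head, struct dentry, d_rcu);
		...
		kmem_cache_free(dentry_cache, dentry);
		atomic_dec(&pending_dentry_rcu);
	}

	static void d_free(struct dentry *dentry)
	{
		...
		atomic_inc(&pending_dentry_rcu);
		call_rcu(&dentry->d_rcu, d_callback);
	}

The counter value is what shows up as "pending dentry rcu callbacks"
in the dump below.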

According to this, we had 4127306 pending RCU callbacks. A few seconds
later it was down to 0.

Full output is below.

Chris




/proc/sys/fs/dentry-state:
1611 838 45 0 0 0

/proc/meminfo:
MemTotal: 3366368 kB
MemFree: 2507296 kB
Buffers: 0 kB
Cached: 7932 kB
SwapCached: 0 kB
Active: 8832 kB
Inactive: 2636 kB
HighTotal: 2489836 kB
HighFree: 2478016 kB
LowTotal: 876532 kB
LowFree: 29280 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
Mapped: 6116 kB
Slab: 840468 kB
CommitLimit: 1683184 kB
Committed_AS: 18096 kB
PageTables: 324 kB
VmallocTotal: 114680 kB
VmallocUsed: 420 kB
VmallocChunk: 114036 kB
pending dentry rcu callbacks: 4127306
pages_with_[ 0]_dentries: 0
pages_with_[ 1]_dentries: 6
pages_with_[ 2]_dentries: 2
pages_with_[ 3]_dentries: 2
pages_with_[ 4]_dentries: 0
pages_with_[ 5]_dentries: 2
pages_with_[ 6]_dentries: 3
pages_with_[ 7]_dentries: 3
pages_with_[ 8]_dentries: 4
pages_with_[ 9]_dentries: 1
pages_with_[10]_dentries: 3
pages_with_[11]_dentries: 3
pages_with_[12]_dentries: 2
pages_with_[13]_dentries: 2
pages_with_[14]_dentries: 1
pages_with_[15]_dentries: 0
pages_with_[16]_dentries: 1
pages_with_[17]_dentries: 1
pages_with_[18]_dentries: 0
pages_with_[19]_dentries: 1
pages_with_[20]_dentries: 2
pages_with_[21]_dentries: 5
pages_with_[22]_dentries: 3
pages_with_[23]_dentries: 1
pages_with_[24]_dentries: 1
pages_with_[25]_dentries: 1
pages_with_[26]_dentries: 2
pages_with_[27]_dentries: 0
pages_with_[28]_dentries: 0
pages_with_[29]_dentries: 142355
dcache_pages total: 142407
prune_dcache: requested 1 freed 1
dcache lru list data:
dentries total: 839
dentries in_use: 43
dentries free: 796
dentries referenced: 839
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 4096 kB

slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab>
<pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <a
ctive_slabs> <num_slabs> <sharedavail>
ip_fib_alias 11 113 32 113 1 : tunables 120 60
0 : slabdata 1 1 0
ip_fib_hash 11 113 32 113 1 : tunables 120 60
0 : slabdata 1 1 0
rpc_buffers 8 8 2048 2 1 : tunables 24 12
0 : slabdata 4 4 0
rpc_tasks 20 20 192 20 1 : tunables 120 60
0 : slabdata 1 1 0
rpc_inode_cache 8 9 448 9 1 : tunables 54 27
0 : slabdata 1 1 0
xfrm6_tunnel_spi 0 0 64 59 1 : tunables 120 60
0 : slabdata 0 0 0
fib6_nodes 5 113 32 113 1 : tunables 120 60
0 : slabdata 1 1 0
ip6_dst_cache 4 15 256 15 1 : tunables 120 60
0 : slabdata 1 1 0
ndisc_cache 1 20 192 20 1 : tunables 120 60
0 : slabdata 1 1 0
RAWv6 3 6 640 6 1 : tunables 54 27
0 : slabdata 1 1 0
UDPv6 0 0 576 7 1 : tunables 54 27
0 : slabdata 0 0 0
tw_sock_TCPv6 0 0 128 30 1 : tunables 120 60
0 : slabdata 0 0 0
request_sock_TCPv6 0 0 128 30 1 : tunables 120 60
0 : slabdata 0 0 0
TCPv6 4 7 1088 7 2 : tunables 24 12
0 : slabdata 1 1 0
UNIX 4 10 384 10 1 : tunables 54 27
0 : slabdata 1 1 0
ip_mrt_cache 0 0 128 30 1 : tunables 120 60
0 : slabdata 0 0 0
tcp_bind_bucket 4 203 16 203 1 : tunables 120 60
0 : slabdata 1 1 0
inet_peer_cache 2 59 64 59 1 : tunables 120 60
0 : slabdata 1 1 0
secpath_cache 0 0 128 30 1 : tunables 120 60
0 : slabdata 0 0 0
xfrm_dst_cache 0 0 320 12 1 : tunables 54 27
0 : slabdata 0 0 0
ip_dst_cache 7 15 256 15 1 : tunables 120 60
0 : slabdata 1 1 0
arp_cache 2 30 128 30 1 : tunables 120 60
0 : slabdata 1 1 0
RAW 2 9 448 9 1 : tunables 54 27
0 : slabdata 1 1 0
UDP 2 7 512 7 1 : tunables 54 27
0 : slabdata 1 1 0
tw_sock_TCP 0 0 128 30 1 : tunables 120 60
0 : slabdata 0 0 0
request_sock_TCP 0 0 64 59 1 : tunables 120 60
0 : slabdata 0 0 0
TCP 3 4 960 4 1 : tunables 54 27
0 : slabdata 1 1 0
flow_cache 0 0 128 30 1 : tunables 120 60
0 : slabdata 0 0 0
cfq_ioc_pool 0 0 48 78 1 : tunables 120 60
0 : slabdata 0 0 0
cfq_pool 0 0 96 40 1 : tunables 120 60
0 : slabdata 0 0 0
crq_pool 0 0 44 84 1 : tunables 120 60
0 : slabdata 0 0 0
deadline_drq 0 0 48 78 1 : tunables 120 60
0 : slabdata 0 0 0
as_arq 0 0 60 63 1 : tunables 120 60
0 : slabdata 0 0 0
relayfs_inode_cache 0 0 320 12 1 : tunables 54 27
0 : slabdata 0 0 0
nfs_direct_cache 0 0 40 92 1 : tunables 120 60
0 : slabdata 0 0 0
nfs_write_data 45 45 448 9 1 : tunables 54 27
0 : slabdata 5 5 0
nfs_read_data 32 36 448 9 1 : tunables 54 27
0 : slabdata 4 4 0
nfs_inode_cache 384 384 592 6 1 : tunables 54 27
0 : slabdata 64 64 0
nfs_page 59 59 64 59 1 : tunables 120 60
0 : slabdata 1 1 0
hugetlbfs_inode_cache 1 12 316 12 1 : tunables 54
27 0 : slabdata 1 1 0
ext2_inode_cache 0 0 436 9 1 : tunables 54 27
0 : slabdata 0 0 0
ext2_xattr 0 0 44 84 1 : tunables 120 60
0 : slabdata 0 0 0
dnotify_cache 0 0 20 169 1 : tunables 120 60
0 : slabdata 0 0 0
eventpoll_pwq 0 0 36 101 1 : tunables 120 60
0 : slabdata 0 0 0
eventpoll_epi 0 0 128 30 1 : tunables 120 60
0 : slabdata 0 0 0
inotify_event_cache 0 0 28 127 1 : tunables 120 60
0 : slabdata 0 0 0
inotify_watch_cache 0 0 36 101 1 : tunables 120 60
0 : slabdata 0 0 0
kioctx 0 0 192 20 1 : tunables 120 60
0 : slabdata 0 0 0
kiocb 0 0 128 30 1 : tunables 120 60
0 : slabdata 0 0 0
fasync_cache 0 0 16 203 1 : tunables 120 60
0 : slabdata 0 0 0
shmem_inode_cache 396 396 408 9 1 : tunables 54 27
0 : slabdata 44 44 0
posix_timers_cache 0 0 96 40 1 : tunables 120 60
0 : slabdata 0 0 0
uid_cache 1 59 64 59 1 : tunables 120 60
0 : slabdata 1 1 0
blkdev_ioc 0 0 28 127 1 : tunables 120 60
0 : slabdata 0 0 0
blkdev_queue 24 30 380 10 1 : tunables 54 27
0 : slabdata 3 3 0
blkdev_requests 0 0 152 26 1 : tunables 120 60
0 : slabdata 0 0 0
biovec-(256) 256 256 3072 2 2 : tunables 24 12
0 : slabdata 128 128 0
biovec-128 256 260 1536 5 2 : tunables 24 12
0 : slabdata 52 52 0
biovec-64 256 260 768 5 1 : tunables 54 27
0 : slabdata 52 52 0
biovec-16 256 260 192 20 1 : tunables 120 60
0 : slabdata 13 13 0
biovec-4 256 295 64 59 1 : tunables 120 60
0 : slabdata 5 5 0
biovec-1 256 406 16 203 1 : tunables 120 60
0 : slabdata 2 2 0
bio 256 295 64 59 1 : tunables 120 60
0 : slabdata 5 5 0
file_lock_cache 0 0 88 44 1 : tunables 120 60
0 : slabdata 0 0 0
sock_inode_cache 24 40 384 10 1 : tunables 54 27
0 : slabdata 4 4 0
skbuff_fclone_cache 12 12 320 12 1 : tunables 54 27
0 : slabdata 1 1 0
skbuff_head_cache 288 320 192 20 1 : tunables 120 60
0 : slabdata 16 16 0
proc_inode_cache 68 72 332 12 1 : tunables 54 27
0 : slabdata 6 6 0
sigqueue 16 26 148 26 1 : tunables 120 60
0 : slabdata 1 1 0
radix_tree_node 420 420 276 14 1 : tunables 54 27
0 : slabdata 30 30 0
bdev_cache 2 9 448 9 1 : tunables 54 27
0 : slabdata 1 1 0
sysfs_dir_cache 1356 1380 40 92 1 : tunables 120 60
0 : slabdata 15 15 0
mnt_cache 20 30 128 30 1 : tunables 120 60
0 : slabdata 1 1 0
inode_cache 532 540 316 12 1 : tunables 54 27
0 : slabdata 45 45 0
dentry_cache 4118471 4119305 136 29 1 : tunables 120 60
0 : slabdata 142042 142045 0
filp 1311720 1312200 192 20 1 : tunables 120 60
0 : slabdata 65605 65610 0
names_cache 3 3 4096 1 1 : tunables 24 12
0 : slabdata 3 3 0
idr_layer_cache 69 87 136 29 1 : tunables 120 60
0 : slabdata 3 3 0
buffer_head 0 0 48 78 1 : tunables 120 60
0 : slabdata 0 0 0
mm_struct 35 35 576 7 1 : tunables 54 27
0 : slabdata 5 5 0
vm_area_struct 528 528 88 44 1 : tunables 120 60
0 : slabdata 12 12 0
fs_cache 39 113 32 113 1 : tunables 120 60
0 : slabdata 1 1 0
files_cache 40 45 448 9 1 : tunables 54 27
0 : slabdata 5 5 0
signal_cache 50 50 384 10 1 : tunables 54 27
0 : slabdata 5 5 0
sighand_cache 39 39 1344 3 1 : tunables 24 12
0 : slabdata 13 13 0
task_struct 36 36 1264 3 1 : tunables 24 12
0 : slabdata 12 12 0
anon_vma 252 678 8 339 1 : tunables 120 60
0 : slabdata 2 2 0
pgd 27 27 4096 1 1 : tunables 24 12
0 : slabdata 27 27 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4
0 : slabdata 0 0 0
size-131072 0 0 131072 1 32 : tunables 8 4
0 : slabdata 0 0 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4
0 : slabdata 0 0 0
size-65536 0 0 65536 1 16 : tunables 8 4
0 : slabdata 0 0 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4
0 : slabdata 0 0 0
size-32768 0 0 32768 1 8 : tunables 8 4
0 : slabdata 0 0 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4
0 : slabdata 0 0 0
size-16384 0 0 16384 1 4 : tunables 8 4
0 : slabdata 0 0 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4
0 : slabdata 0 0 0
size-8192 37 37 8192 1 2 : tunables 8 4
0 : slabdata 37 37 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12
0 : slabdata 0 0 0
size-4096 369 369 4096 1 1 : tunables 24 12
0 : slabdata 369 369 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12
0 : slabdata 0 0 0
size-2048 24 24 2048 2 1 : tunables 24 12
0 : slabdata 12 12 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27
0 : slabdata 0 0 0
size-1024 108 108 1024 4 1 : tunables 54 27
0 : slabdata 27 27 0
size-512(DMA) 0 0 512 8 1 : tunables 54 27
0 : slabdata 0 0 0
size-512 126 152 512 8 1 : tunables 54 27
0 : slabdata 19 19 0
size-256(DMA) 0 0 256 15 1 : tunables 120 60
0 : slabdata 0 0 0
size-256 75 75 256 15 1 : tunables 120 60
0 : slabdata 5 5 0
size-192(DMA) 0 0 192 20 1 : tunables 120 60
0 : slabdata 0 0 0
size-192 40 40 192 20 1 : tunables 120 60
0 : slabdata 2 2 0
size-128(DMA) 0 0 128 30 1 : tunables 120 60
0 : slabdata 0 0 0
size-128 963 990 128 30 1 : tunables 120 60
0 : slabdata 33 33 0
size-64(DMA) 0 0 64 59 1 : tunables 120 60
0 : slabdata 0 0 0
size-32(DMA) 0 0 32 113 1 : tunables 120 60
0 : slabdata 0 0 0
size-64 487 944 64 59 1 : tunables 120 60
0 : slabdata 16 16 0
size-32 1169 1356 32 113 1 : tunables 120 60
0 : slabdata 12 12 0
kmem_cache 120 120 128 30 1 : tunables 120 60
0 : slabdata 4 4 0

2005-09-22 18:12:42

by Chris Friesen

Subject: Re: dentry_cache using up all my zone normal memory -- new data point


I don't know if this will help, but I re-ran the test with our modified
2.6.10 on a dual Xeon, and the dentry cache stays around 600-700 KB the
whole time.

Chris

2005-09-22 18:33:19

by Dipankar Sarma

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

On Thu, Sep 22, 2005 at 09:16:58AM -0600, Christopher Friesen wrote:
> Al Viro wrote:
>
> >Umm... How many RCU callbacks are pending?
>
> According to this we had 4127306 pending rcu callbacks. A few seconds
> later it was down to 0.
>
> Full output is below.
>
> /proc/sys/fs/dentry-state:
> 1611 838 45 0 0 0
>
> pages_with_[29]_dentries: 142355
> dcache_pages total: 142407
> prune_dcache: requested 1 freed 1
> dcache lru list data:
> dentries total: 839
> dentries in_use: 43
> dentries free: 796
> dentries referenced: 839

These two should make it clear that there were only 1611 dentries
in the dcache and 839 in the LRU list. Not too many pinned dentries.
So all the dentries you saw in the slab were dentries already out
of the dcache but waiting for their RCU grace periods to happen.

This can happen if a task runs for too long inside the kernel,
holding up context switches or usermode code running on that
CPU. The fact that the RCU grace period eventually happens
and the dentries are freed means that something intermittently
holds up RCU. Is this 2.6.10 vanilla, or does it have other
patches in there?

Thanks
Dipankar

2005-09-22 19:03:34

by Chris Friesen

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

Dipankar Sarma wrote:

> This can happen if a task runs for too long inside the kernel
> holding up context switches or usermode code running on that
> cpu. The fact that RCU grace period eventually happens
> and the dentries are freed means that something intermittently
> holds up RCU. Is this 2.6.10 vanilla or does it have other
> patches in there ?

The 2.6.10 was modified. All the results with the dcache debugging
patch applied were from vanilla 2.6.14-rc2.

It's perfectly repeatable as well... every single time I run "rename14"
the OOM killer kicks in.

Chris

2005-09-22 19:23:59

by Dipankar Sarma

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

On Thu, Sep 22, 2005 at 01:03:17PM -0600, Christopher Friesen wrote:
> Dipankar Sarma wrote:
>
> >This can happen if a task runs for too long inside the kernel
> >holding up context switches or usermode code running on that
> >cpu. The fact that RCU grace period eventually happens
> >and the dentries are freed means that something intermittently
> >holds up RCU. Is this 2.6.10 vanilla or does it have other
> >patches in there ?
>
> The 2.6.10 was modified. All the results with the dcache debugging
> patch applied were from vanilla 2.6.14-rc2.
>
> It's perfectly repeatable as well...every single time I run "rename14"
> the OOM killer kicks in.

Can you look at what each CPU is running (backtrace) using
sysrq? That may tell us what is holding up RCU. I will look
at it myself later.

Thanks
Dipankar

2005-09-22 21:37:33

by Chris Friesen

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

Dipankar Sarma wrote:

> Can you look at that each cpu is running (backtrace) using
> sysrq ? That may tell us what is holding up RCU. I will look
> at it myself later.

I'm having some trouble with sysrq over the serial console. I can
trigger it, and it dumps the words "SysRq : Show Regs" to the console,
but the actual data only goes to dmesg.

That should work for this, but does anyone have any idea what's going on
there?

Chris

2005-09-22 21:49:31

by Sonny Rao

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

On Thu, Sep 22, 2005 at 03:37:04PM -0600, Christopher Friesen wrote:
> Dipankar Sarma wrote:
>
> >Can you look at that each cpu is running (backtrace) using
> >sysrq ? That may tell us what is holding up RCU. I will look
> >at it myself later.
>
> I'm having some trouble with sysrq over serial console. I can trigger
> it, and it dumps the words "SysRq : Show Regs" to the console, but the
> actual data only goes to dmesg.
>
> That should work for this, but does anyone have any idea whats going on
> there?

I think your loglevel is too low; set it to seven (using sysrq if
necessary) and try again.


2005-09-22 21:55:56

by Chris Friesen

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

Sonny Rao wrote:

> I think your loglevel is too low, set it to seven (using sysrq if
> necessary) and try again.

I thought the following __handle_sysrq() code would take care of that:

spin_lock_irqsave(&sysrq_key_table_lock, flags);
orig_log_level = console_loglevel;
console_loglevel = 7;
printk(KERN_INFO "SysRq : ");

Chris

2005-09-22 22:04:07

by Randy Dunlap

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

On Thu, 22 Sep 2005, Christopher Friesen wrote:

> Sonny Rao wrote:
>
> > I think your loglevel is too low, set it to seven (using sysrq if
> > necessary) and try again.
>
> I thought the following __handle_sysrq() code would take care of that:
>
> spin_lock_irqsave(&sysrq_key_table_lock, flags);
> orig_log_level = console_loglevel;
> console_loglevel = 7;
> printk(KERN_INFO "SysRq : ");

Some code below there is suspicious. Maybe you could modify
some of it...

(cut-and-paste, may have whitespace damage:)

	if (!check_mask || sysrq_enabled == 1 ||
	    (sysrq_enabled & op_p->enable_mask)) {
		printk ("%s\n", op_p->action_msg);
		console_loglevel = orig_log_level;
		op_p->handler(key, pt_regs, tty);
	}

swap those last 2 lines and see what happens...
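
i.e. something like this (untested; just the quoted snippet with the
restore moved to after the handler has produced its output):

	if (!check_mask || sysrq_enabled == 1 ||
	    (sysrq_enabled & op_p->enable_mask)) {
		printk ("%s\n", op_p->action_msg);
		op_p->handler(key, pt_regs, tty);
		console_loglevel = orig_log_level;
	}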

--
~Randy

2005-09-22 22:05:45

by Chris Friesen

Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

SysRq : Show State

sibling
task PC pid father child younger older
init S 00000000 0 1 0 2 (NOTLB)
c353fea0 c34f4a10 c03b7b80 00000000 00000000 00000000 00000001 00000000
c34f4a10 f7ef5560 00002bfa a0dbb75a 00000187 c34f4a10 c34f4b38 0014c7ea
c353feb4 0000000b c353fedc c02fb023 c353feb4 0014c7ea f7f4caec c03bd9a0
Call Trace:
[<c02fb023>] schedule_timeout+0x53/0xb0
[<c015fea6>] do_select+0x256/0x290
[<c01601fc>] sys_select+0x2cc/0x3f0
[<c0102b5f>] sysenter_past_esp+0x54/0x75
ksoftirqd/0 R running 0 2 1 3 (L-TLB)
watchdog/0 S 0000014E 0 3 1 4 2 (L-TLB)
c3543f44 c34f4030 c03b7b80 0000014e 00000000 7c56f42d 0000014e 00000417
00000000 f7ef5560 000002de 8bcab025 00000187 c34f4030 c34f4158 0014b6ea
c3543f58 00000000 c3543f80 c02fb023 c3543f58 0014b6ea c34f4030 c03bdb18
Call Trace:
[<c02fb023>] schedule_timeout+0x53/0xb0
[<c02fb09a>] schedule_timeout_interruptible+0x1a/0x20
[<c011e794>] msleep_interruptible+0x34/0x40
[<c0130b81>] watchdog+0x61/0x90
[<c012867a>] kthread+0x9a/0xe0
[<c0100fc9>] kernel_thread_helper+0x5/0xc
events/0 S 00000246 0 4 1 5 3 (L-TLB)
c3549f38 c34fba50 c03b7b80 00000246 f7d5800c 00000286 00000000 00000092
f7d580d8 f7ef5560 000004f8 995007cf 00000187 c34fba50 c34fbb78 00000286
f7d58000 c34f2848 c3549fb4 c012486c 00000000 c3549f68 00000000 c34f2858
Call Trace:
[<c012486c>] worker_thread+0x1ec/0x220
[<c012867a>] kthread+0x9a/0xe0
[<c0100fc9>] kernel_thread_helper+0x5/0xc
khelper S F74F9D38 0 5 1 6 4 (L-TLB)
c354bf38 c34fb560 c03b7b80 f74f9d38 c354bf20 00000282 00000000 00000092
f74f9d38 f751da90 000000e9 5a1a74e3 00000008 c34fb560 c34fb688 00000292
f74f9d74 c34f27c8 c354bfb4 c012486c 00000000 c354bf68 00000000 c34f27d8
Call Trace:
[<c012486c>] worker_thread+0x1ec/0x220
[<c012867a>] kthread+0x9a/0xe0
[<c0100fc9>] kernel_thread_helper+0x5/0xc
kthread S C3573F1C 0 6 1 8 91 5 (L-TLB)
c3573f38 c34fb070 c03b7b80 c3573f1c 00000282 c353fdd4 00000000 00000092
c353fd7c c34f4a10 0000004f d0f13e25 00000007 c34fb070 c34fb198 00000286
c353fdb8 c3505bc8 c3573fb4 c012486c 00000000 c3573f68 00000000 c3505bd8
Call Trace:
[<c012486c>] worker_thread+0x1ec/0x220
[<c012867a>] kthread+0x9a/0xe0
[<c0100fc9>] kernel_thread_helper+0x5/0xc
kblockd/0 S 2E8A98C2 0 8 6 89 (L-TLB)
c3575f38 c3509a90 c03b7b80 2e8a98c2 c3575f1c c0111c5c 00010000 c3574000
c3509a90 c34f4a10 000002fe 8ecc70a0 00000006 c3509a90 c3509bb8 c3574000
00000000 fffffffb c3575fb4 c012486c 00000011 c3575f68 00000000 c35058d8
Call Trace:
[<c012486c>] worker_thread+0x1ec/0x220
[<c012867a>] kthread+0x9a/0xe0
[<c0100fc9>] kernel_thread_helper+0x5/0xc
pdflush S C373DF50 0 89 6 90 8 (L-TLB)
c373df70 c35095a0 c03b7b80 c373df50 00000001 c373df5c 00000086 c34fb070
b3494352 c35090b0 00000082 a7acf1c2 00000006 c35095a0 c35096c8 c373dfa0
c373df94 00000000 c373df84 c0138a60 c373c000 c373c000 c353ff58 c373dfb4
Call Trace:
[<c0138a60>] __pdflush+0x80/0x170
[<c0138b78>] pdflush+0x28/0x30
[<c012867a>] kthread+0x9a/0xe0
[<c0100fc9>] kernel_thread_helper+0x5/0xc
pdflush S 00000000 0 90 6 92 89 (L-TLB)
c375df70 c35090b0 c03b7b80 00000000 00000000 00000000 00000000 00000000
00000005 f7fc1a90 000008b9 90cbc789 00000187 c35090b0 c35091d8 c375dfa0
c375df94 00000000 c375df84 c0138a60 00000000 c375c000 c353ff58 c375dfb4
Call Trace:
[<c0138a60>] __pdflush+0x80/0x170
[<c0138b78>] pdflush+0x28/0x30
[<c012867a>] kthread+0x9a/0xe0
[<c0100fc9>] kernel_thread_helper+0x5/0xc
aio/0 S 00000006 0 92 6 208 90 (L-TLB)
c379ff38 c34fb560 c03b7bb0 00000006 c379ff1c a948ef6f 00000006 00000195
00000000 c34fb560 00000072 a948f32f 00000006 c359b5e0 c359b708 c379e000
00000000 fffffffb c379ffb4 c012486c 00000011 c379ff68 00000000 c35979d8
Call Trace:
[<c012486c>] worker_thread+0x1ec/0x220
[<c012867a>] kthread+0x9a/0xe0
[<c0100fc9>] kernel_thread_helper+0x5/0xc
kswapd0 S 00000020 0 91 1 427 6 (L-TLB)
c375ff80 c359bad0 c03b7b80 00000020 00000000 00000000 0000052c 00000020
0000000c f7f10ad0 00000065 dde71359 00000133 c359bad0 c359bbf8 00000000
c033c014 00000000 c375ffe4 c013dffa c033bc40 00000000 00000000 00000000
Call Trace:
[<c013dffa>] kswapd+0xda/0xf0
[<c0100fc9>] kernel_thread_helper+0x5/0xc
rpciod/0 S C02E41C5 0 208 6 92 (L-TLB)
f7f71f38 c359b0f0 c03b7b80 c02e41c5 f7dd8ac4 f7dd8b80 00000000 00000092
f7dd8b44 f7f71f38 00000105 ea3d42af 00000182 c359b0f0 c359b218 00000292
f7dd8ac4 f7f1c748 f7f71fb4 c012486c 00000000 f7f71f68 00000000 f7f1c758
Call Trace:
[<c012486c>] worker_thread+0x1ec/0x220
[<c012867a>] kthread+0x9a/0xe0
[<c0100fc9>] kernel_thread_helper+0x5/0xc
syslogd S 00000000 0 427 1 431 91 (NOTLB)
f7f09ea0 f7f5a520 c03b7b80 00000000 f7f5a520 00000010 c033bfc4 00000000
f7f09e90 00000000 00000489 194edd72 00000181 f7f5a520 f7f5a648 00000000
7fffffff 00000001 f7f09edc c02fb06f f7f09ec4 c0243292 f7fd7840 c37979d8
Call Trace:
[<c02fb06f>] schedule_timeout+0x9f/0xb0
[<c015fea6>] do_select+0x256/0x290
[<c01601fc>] sys_select+0x2cc/0x3f0
[<c0102b5f>] sysenter_past_esp+0x54/0x75
klogd R running 0 431 1 502 427 (NOTLB)
sshd S F7412030 0 502 1 823 518 431 (NOTLB)
f7f53ea0 f7412030 c03b7b80 f7412030 00000010 c033bfc4 00000000 00000202
00000000 f7fae5c0 000003d5 3a05cff4 00000136 f7412030 f7412158 00000000
7fffffff 00000004 f7f53edc c02fb06f c0263850 f7fae5c0 c37970d8 f7f53f38
Call Trace:
[<c02fb06f>] schedule_timeout+0x9f/0xb0
[<c015fea6>] do_select+0x256/0x290
[<c01601fc>] sys_select+0x2cc/0x3f0
[<c0102b5f>] sysenter_past_esp+0x54/0x75
xinetd S 00000000 0 518 1 527 502 (NOTLB)
f7e2bea0 f7f105e0 c03b7b80 00000000 00000000 00000000 00000001 00000000
f7f105e0 00000010 00001511 38ccc693 00000008 f7f105e0 f7f10708 00000000
7fffffff 00000007 f7e2bedc c02fb06f c0263850 f7e55a40 f7e58c98 f7e2bf38
Call Trace:
[<c02fb06f>] schedule_timeout+0x9f/0xb0
[<c015fea6>] do_select+0x256/0x290
[<c01601fc>] sys_select+0x2cc/0x3f0
[<c0102b5f>] sysenter_past_esp+0x54/0x75
bash S F7E57E54 0 527 1 528 518 (NOTLB)
f7e57e74 f7412a10 c03b7b80 f7e57e54 c01f9ab0 c03c91e0 00000001 00000007
c03c91e0 f74cd803 000003ad 79dbde27 00000181 f7412a10 f7412b38 f7d58000
7fffffff 00000000 f7e57eb0 c02fb06f f7d58000 00000003 c03c91e0 f74cd803
Call Trace:
[<c02fb06f>] schedule_timeout+0x9f/0xb0
[<c01e7ad2>] read_chan+0x4e2/0x640
[<c01e287c>] tty_read+0xcc/0xe0
[<c014d088>] vfs_read+0x198/0x1a0
[<c014d38b>] sys_read+0x4b/0x80
[<c0102b5f>] sysenter_past_esp+0x54/0x75
mingetty S C341B000 0 528 1 529 527 (NOTLB)
f7f67e74 f7412520 c03b7b80 c341b000 c341b220 0000000b 00000001 00000000
00000000 c341b000 00000ac1 41ee1e5e 00000008 f7412520 f7412648 c37d1000
7fffffff f7f67f00 f7f67eb0 c02fb06f f7f67e98 c01f2b3f c37d1000 c379200b
Call Trace:
[<c02fb06f>] schedule_timeout+0x9f/0xb0
[<c01e7ad2>] read_chan+0x4e2/0x640
[<c01e287c>] tty_read+0xcc/0xe0
[<c014d088>] vfs_read+0x198/0x1a0
[<c014d38b>] sys_read+0x4b/0x80
[<c0102b5f>] sysenter_past_esp+0x54/0x75
mingetty S 00000008 0 529 1 530 528 (NOTLB)
f7f4be74 c34f4a10 c03b8028 00000008 00000020 44d95ace 00000008 000c1ebb
00000000 c34f4a10 00000cee 44f60f61 00000008 f7feaa50 f7feab78 f7ef4000
7fffffff f7f4bf00 f7f4beb0 c02fb06f f7f4be98 c01f2b3f f7ef4000 f7ef8c0b
Call Trace:
[<c02fb06f>] schedule_timeout+0x9f/0xb0
[<c01e7ad2>] read_chan+0x4e2/0x640
[<c01e287c>] tty_read+0xcc/0xe0
[<c014d088>] vfs_read+0x198/0x1a0
[<c014d38b>] sys_read+0x4b/0x80
[<c0102b5f>] sysenter_past_esp+0x54/0x75
mingetty S 00000008 0 530 1 531 529 (NOTLB)
f7e69e74 f7f5a030 c03b8028 00000008 00000020 4cc7c3e5 00000008 0014f81b
00000000 f7f5a030 00002aca 4cf96dd7 00000008 f7fea560 f7fea688 f7dc9000
7fffffff f7e69f00 f7e69eb0 c02fb06f f7e69e98 c01f2b3f f7dc9000 f7eeb80b
Call Trace:
[<c02fb06f>] schedule_timeout+0x9f/0xb0
[<c01e7ad2>] read_chan+0x4e2/0x640
[<c01e287c>] tty_read+0xcc/0xe0
[<c014d088>] vfs_read+0x198/0x1a0
[<c014d38b>] sys_read+0x4b/0x80
[<c0102b5f>] sysenter_past_esp+0x54/0x75
mingetty S F7FD9000 0 531 1 532 530 (NOTLB)
f7f15e74 f7fea070 c03b7b80 f7fd9000 00000020 f7f15f28 f7f15f28 fffffffe
00000000 f7f5a030 00001c1a 51f4ffc2 00000008 f7fea070 f7fea198 f7ec6000
7fffffff f7f15f00 f7f15eb0 c02fb06f f7f15e98 c01f2b3f f7ec6000 f747f40b
Call Trace:
[<c02fb06f>] schedule_timeout+0x9f/0xb0
[<c01e7ad2>] read_chan+0x4e2/0x640
[<c01e287c>] tty_read+0xcc/0xe0
[<c014d088>] vfs_read+0x198/0x1a0
[<c014d38b>] sys_read+0x4b/0x80
[<c0102b5f>] sysenter_past_esp+0x54/0x75
mingetty S F7418800 0 532 1 548 531 (NOTLB)
f7e8be74 f7f5a030 c03b7b80 f7418800 00000020 f7e8bf28 f7e8bf28 fffffffe
00000000 c015b71a 0000318d 567baabc 00000008 f7f5a030 f7f5a158 f7581000
7fffffff f7e8bf00 f7e8beb0 c02fb06f f7e8be98 c01f2b3f f7581000 f754f00b
Call Trace:
[<c02fb06f>] schedule_timeout+0x9f/0xb0
[<c01e7ad2>] read_chan+0x4e2/0x640
[<c01e287c>] tty_read+0xcc/0xe0
[<c014d088>] vfs_read+0x198/0x1a0
[<c014d38b>] sys_read+0x4b/0x80
[<c0102b5f>] sysenter_past_esp+0x54/0x75
udevd S 00000000 0 548 1 573 532 (NOTLB)
f7467ea0 f7ef5070 c03b7b80 00000000 00000000 00000000 00000001 00000000
f7ef5070 00000010 00001e03 b3cffce1 0000000a f7ef5070 f7ef5198 00000000
7fffffff 00000006 f7467edc c02fb06f f7467ec4 c0243292 f7e9eb40 f7fee858
Call Trace:
[<c02fb06f>] schedule_timeout+0x9f/0xb0
[<c015fea6>] do_select+0x256/0x290
[<c01601fc>] sys_select+0x2cc/0x3f0
[<c0102b5f>] sysenter_past_esp+0x54/0x75
mingetty S 00000008 0 573 1 548 (NOTLB)
f74f9e74 f7553520 c03b7bb0 00000008 f74f9eac 5a25a9be 00000008 000024bd
00000000 f7553520 00000c4b 5a25a9be 00000008 f748e560 f748e688 f7498000
7fffffff f74f9f00 f74f9eb0 c02fb06f c37b2380 f74f8000 00000000 00000000
Call Trace:
[<c02fb06f>] schedule_timeout+0x9f/0xb0
[<c01e7ad2>] read_chan+0x4e2/0x640
[<c01e287c>] tty_read+0xcc/0xe0
[<c014d088>] vfs_read+0x198/0x1a0
[<c014d38b>] sys_read+0x4b/0x80
[<c0102b5f>] sysenter_past_esp+0x54/0x75
sshd S 00000187 0 823 502 825 832 (NOTLB)
f7fbfea0 f7f5aa10 c03b8028 00000187 c01e1853 81a0af3c 00000187 0001031e
00000000 f7f5aa10 00006b6c 81a314f0 00000187 f748ea50 f748eb78 00000000
7fffffff 00000008 f7fbfedc c02fb06f 00000286 00000000 f7d9f000 f7d9f00c
Call Trace:
[<c02fb06f>] schedule_timeout+0x9f/0xb0
[<c015fea6>] do_select+0x256/0x290
[<c01601fc>] sys_select+0x2cc/0x3f0
[<c0102b5f>] sysenter_past_esp+0x54/0x75
bash S C01364EA 0 825 823 828 (NOTLB)
f75bdf08 f748e070 c03b7b80 c01364ea f75bdee4 c013bd35 00000002 f75bdee8
c0125211 ffffffff 0000133a b3d1ff7b 0000012c f748e070 f748e198 fffffe00
f748e070 f748e11c f75bdf80 c0118d2b f7f5aa10 00000000 00000000 bf914df8
Call Trace:
[<c0118d2b>] do_wait+0x2bb/0x390
[<c0118ecc>] sys_wait4+0x3c/0x40
[<c0118ef5>] sys_waitpid+0x25/0x27
[<c0102b5f>] sysenter_past_esp+0x54/0x75
slabtop S F74DDEB8 0 828 825 (NOTLB)
f74ddea0 f7f5aa10 c03b7b80 f74ddeb8 c01e1853 f7f30000 00000187 00000000
00000096 c34f4520 000000ac 81a31a51 00000187 f7f5aa10 f7f5ab38 0014be0e
f74ddeb4 00000001 f74ddedc c02fb023 f74ddeb4 0014be0e f7f30000 c03bdb58
Call Trace:
[<c02fb023>] schedule_timeout+0x53/0xb0
[<c015fea6>] do_select+0x256/0x290
[<c01601fc>] sys_select+0x2cc/0x3f0
[<c0102b5f>] sysenter_past_esp+0x54/0x75
sshd S 00000184 0 832 502 834 823 (NOTLB)
f7517ea0 f7f100f0 c03b7bb0 00000184 c01e1853 a47a174b 00000184 000013ac
00000000 f7f100f0 000008bd a47a45e4 00000184 f7f10ad0 f7f10bf8 00000000
7fffffff 00000008 f7517edc c02fb06f 00000286 00000000 f7d7a000 f7d7a00c
Call Trace:
[<c02fb06f>] schedule_timeout+0x9f/0xb0
[<c015fea6>] do_select+0x256/0x290
[<c01601fc>] sys_select+0x2cc/0x3f0
[<c0102b5f>] sysenter_past_esp+0x54/0x75
bash S C01364EA 0 834 832 846 (NOTLB)
f755bf08 f7f100f0 c03b7b80 c01364ea f755bee4 c013bd35 00000002 f755bee8
c0125211 ffffffff 00001e40 a47e8c3d 00000184 f7f100f0 f7f10218 fffffe00
f7f100f0 f7f1019c f755bf80 c0118d2b f7fc10b0 00000000 00000000 bfccc0f8
Call Trace:
[<c0118d2b>] do_wait+0x2bb/0x390
[<c0118ecc>] sys_wait4+0x3c/0x40
[<c0118ef5>] sys_waitpid+0x25/0x27
[<c0102b5f>] sysenter_past_esp+0x54/0x75
rename14 S E3BD3EF0 0 846 834 847 (NOTLB)
e3bd3f20 f7fc10b0 c03b7b80 e3bd3ef0 e3bd3f9c 00000001 e3bd3f24 c0140f36
c2d32c80 f7fc1a90 00002ff6 a6b76d1a 00000184 f7fc10b0 f7fc11d8 fffffe00
f7fc10b0 f7fc115c e3bd3f98 c0118d2b ffffffff 00000004 f7ef5560 b7f9f181
Call Trace:
[<c0118d2b>] do_wait+0x2bb/0x390
[<c0118ecc>] sys_wait4+0x3c/0x40
[<c0102b5f>] sysenter_past_esp+0x54/0x75
rename14 R running 0 847 846 848 (NOTLB)
rename14 R running 0 848 846 847 (NOTLB)
oom-killer: gfp_mask=0xd0, order=0
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1 used:2
cpu 0 cold: low 0, high 2, batch 1 used:0
Normal per-cpu:
cpu 0 hot: low 62, high 186, batch 31 used:92
cpu 0 cold: low 0, high 62, batch 31 used:0
HighMem per-cpu:
cpu 0 hot: low 62, high 186, batch 31 used:77
cpu 0 cold: low 0, high 62, batch 31 used:28
Free pages: 2487360kB (2480124kB HighMem)
Active:1693 inactive:626 dirty:19 writeback:0 unstable:0 free:621840 slab:215739 mapped:1371 pagetables:65
DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16384kB pages_scanned:30 all_unreclaimable? yes
lowmem_reserve[]: 0 880 4080
Normal free:3648kB min:3756kB low:4692kB high:5632kB active:0kB inactive:0kB present:901120kB pages_scanned:16 all_unreclaimable? yes
lowmem_reserve[]: 0 0 25600
HighMem free:2480124kB min:512kB low:640kB high:768kB active:6772kB inactive:2504kB present:3276800kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3648kB
HighMem: 1*4kB 1*8kB 19*16kB 12*32kB 21*64kB 14*128kB 5*256kB 2*512kB 2*1024kB 1*2048kB 603*4096kB = 2480124kB
Free swap: 0kB
1048576 pages of RAM
819200 pages of HIGHMEM
206973 reserved pages
3265 pages shared
0 pages swap cached
19 pages dirty
0 pages writeback
1371 pages mapped
215561 pages slab
65 pages pagetables
oom-killer: gfp_mask=0xd0, order=0
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1 used:2
cpu 0 cold: low 0, high 2, batch 1 used:0
Normal per-cpu:
cpu 0 hot: low 62, high 186, batch 31 used:92
cpu 0 cold: low 0, high 62, batch 31 used:0
HighMem per-cpu:
cpu 0 hot: low 62, high 186, batch 31 used:175
cpu 0 cold: low 0, high 62, batch 31 used:28
Free pages: 2487732kB (2480496kB HighMem)
Active:1524 inactive:604 dirty:21 writeback:0 unstable:0 free:621933 slab:215754 mapped:1142 pagetables:55
DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16384kB pages_scanned:52 all_unreclaimable? yes
lowmem_reserve[]: 0 880 4080
Normal free:3648kB min:3756kB low:4692kB high:5632kB active:0kB inactive:0kB present:901120kB pages_scanned:22 all_unreclaimable? yes
lowmem_reserve[]: 0 0 25600
HighMem free:2480496kB min:512kB low:640kB high:768kB active:6096kB inactive:2416kB present:3276800kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3648kB
HighMem: 60*4kB 18*8kB 19*16kB 12*32kB 21*64kB 14*128kB 5*256kB 2*512kB 2*1024kB 1*2048kB 603*4096kB = 2480496kB
Free swap: 0kB
1048576 pages of RAM
819200 pages of HIGHMEM
206973 reserved pages
2450 pages shared
0 pages swap cached
21 pages dirty
0 pages writeback
1142 pages mapped
215581 pages slab
55 pages pagetables
oom-killer: gfp_mask=0xd0, order=0
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1 used:2
cpu 0 cold: low 0, high 2, batch 1 used:0
Normal per-cpu:
cpu 0 hot: low 62, high 186, batch 31 used:92
cpu 0 cold: low 0, high 62, batch 31 used:0
HighMem per-cpu:
cpu 0 hot: low 62, high 186, batch 31 used:162
cpu 0 cold: low 0, high 62, batch 31 used:28
Free pages: 2487980kB (2480744kB HighMem)
Active:1476 inactive:603 dirty:22 writeback:1 unstable:0 free:621995 slab:215760 mapped:1014 pagetables:51
DMA free:3588kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16384kB pages_scanned:68 all_unreclaimable? yes
lowmem_reserve[]: 0 880 4080
Normal free:3648kB min:3756kB low:4692kB high:5632kB active:0kB inactive:0kB present:901120kB pages_scanned:16 all_unreclaimable? yes
lowmem_reserve[]: 0 0 25600
HighMem free:2480744kB min:512kB low:640kB high:768kB active:5904kB inactive:2412kB present:3276800kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3648kB
HighMem: 86*4kB 26*8kB 20*16kB 14*32kB 21*64kB 14*128kB 5*256kB 2*512kB 2*1024kB 1*2048kB 603*4096kB = 2480744kB
Free swap: 0kB
1048576 pages of RAM
819200 pages of HIGHMEM
206973 reserved pages
2116 pages shared
0 pages swap cached
22 pages dirty
1 pages writeback
1014 pages mapped
215581 pages slab
51 pages pagetables


Attachments:
runningprocesses (17.91 kB)

2005-09-26 04:43:58

by Randy Dunlap

[permalink] [raw]
Subject: [PATCH/RFC] sysrq: updating console_loglevel

Chris,
Here's a patch for you to try. It creates a safe method for
sysrq handlers to modify console_loglevel; the way they modify it
now is, I suspect, why you were not seeing messages on the serial
console.

Please report back on your testing.
Thanks.
---

From: Randy Dunlap <[email protected]>

Some sysrq handlers modify the console_loglevel variable,
but when they do so, some of their own messages may be lost
(not logged, esp. on serial console).

This patch introduces a way for console_loglevel to be
modified safely by the sysrq handlers. The new value is not
set until the sysrq handler completes and returns.

Signed-off-by: Randy Dunlap <[email protected]>
---

drivers/char/sysrq.c | 16 +++++++++-------
1 files changed, 9 insertions(+), 7 deletions(-)

diff -Naurp linux-2613-rc2/drivers/char/sysrq.c~fix_loglevel linux-2613-rc2/drivers/char/sysrq.c
--- linux-2613-rc2/drivers/char/sysrq.c~fix_loglevel 2005-07-09 13:55:01.000000000 -0700
+++ linux-2613-rc2/drivers/char/sysrq.c 2005-09-25 21:31:12.000000000 -0700
@@ -41,6 +41,7 @@

/* Whether we react on sysrq keys or just ignore them */
int sysrq_enabled = 1;
+static int next_log_level;

/* Loglevel sysrq handler */
static void sysrq_handle_loglevel(int key, struct pt_regs *pt_regs,
@@ -50,7 +51,7 @@ static void sysrq_handle_loglevel(int ke
i = key - '0';
console_loglevel = 7;
printk("Loglevel set to %d\n", i);
- console_loglevel = i;
+ next_log_level = i;
}
static struct sysrq_key_op sysrq_loglevel_op = {
.handler = sysrq_handle_loglevel,
@@ -217,7 +218,7 @@ static void sysrq_handle_term(int key, s
struct tty_struct *tty)
{
send_sig_all(SIGTERM);
- console_loglevel = 8;
+ next_log_level = 8;
}
static struct sysrq_key_op sysrq_term_op = {
.handler = sysrq_handle_term,
@@ -248,7 +249,7 @@ static void sysrq_handle_kill(int key, s
struct tty_struct *tty)
{
send_sig_all(SIGKILL);
- console_loglevel = 8;
+ next_log_level = 8;
}
static struct sysrq_key_op sysrq_kill_op = {
.handler = sysrq_handle_kill,
@@ -371,12 +372,11 @@ void __sysrq_put_key_op (int key, struct
void __handle_sysrq(int key, struct pt_regs *pt_regs, struct tty_struct *tty, int check_mask)
{
struct sysrq_key_op *op_p;
- int orig_log_level;
int i, j;
unsigned long flags;

spin_lock_irqsave(&sysrq_key_table_lock, flags);
- orig_log_level = console_loglevel;
+ next_log_level = console_loglevel;
console_loglevel = 7;
printk(KERN_INFO "SysRq : ");

@@ -387,8 +387,10 @@ void __handle_sysrq(int key, struct pt_r
if (!check_mask || sysrq_enabled == 1 ||
(sysrq_enabled & op_p->enable_mask)) {
printk ("%s\n", op_p->action_msg);
- console_loglevel = orig_log_level;
+ /* handlers may change console_loglevel,
+ * but now they do this by setting <next_log_level> */
op_p->handler(key, pt_regs, tty);
+ console_loglevel = next_log_level;
}
else
printk("This sysrq operation is disabled.\n");
@@ -402,7 +404,7 @@ void __handle_sysrq(int key, struct pt_r
printk ("%s ", sysrq_key_table[i]->help_msg);
}
printk ("\n");
- console_loglevel = orig_log_level;
+ console_loglevel = next_log_level;
}
spin_unlock_irqrestore(&sysrq_key_table_lock, flags);
}

2005-09-30 22:04:29

by Chris Friesen

[permalink] [raw]
Subject: Re: dentry_cache using up all my zone normal memory

Friesen, Christopher [CAR:VC21:EXCH] wrote:

> With a bit of digging the culprit appears to be the dentry_cache. The
> last log I have shows it using 817MB of memory. Right after that the
> oom killer kicked me off the system. When I logged back in, the cache
> usage was back down to normal and everything was fine.

There haven't been any new suggestions as to the culprit, so I started
experimenting a bit.

It turns out that if I limit the memory on the system to 896MB, the
dcache slab usage peaks at around 600MB and sits there for the duration
of the test.

When I limit the memory to 1024MB, the dcache slab chews up all my zone
normal memory and the oom killer runs.

It almost seems like the dcache responds to memory pressure on the
system as a whole, but not as well to pressure on zone normal. Would
this make sense?
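
Looking at mm/vmscan.c, shrink_slab() appears to scale the slab
scanning by how many pagecache LRU pages were scanned, so a shortage
confined to zone normal (which has almost no pagecache left to scan)
may simply never translate into dcache pressure. A rough sketch of
that arithmetic (paraphrased from memory, not the literal kernel code):

/*
 * Rough model of the shrink_slab() balancing in mm/vmscan.c (~2.6.14).
 * "scanned" and "lru_pages" come from the pagecache LRUs: if the zone
 * under pressure has almost no pagecache, almost nothing drives the
 * slab shrinkers, even though slab is what is eating the zone.
 */
static unsigned long slab_scan_target(unsigned long scanned,
				      unsigned long lru_pages,
				      unsigned long cache_objects,
				      unsigned int seeks)
{
	unsigned long long delta;

	delta = (4ULL * scanned) / seeks;                  /* more LRU scanning...    */
	delta = (delta * cache_objects) / (lru_pages + 1); /* ...scaled by cache size */

	return (unsigned long)delta;  /* objects the shrinker is asked to scan */
}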

Chris

2005-10-01 23:26:13

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: dentry_cache using up all my zone normal memory


Hi Chris,

On Fri, Sep 30, 2005 at 04:03:55PM -0600, Christopher Friesen wrote:
> Friesen, Christopher [CAR:VC21:EXCH] wrote:
>
> >With a bit of digging the culprit appears to be the dentry_cache. The
> >last log I have shows it using 817MB of memory. Right after that the
> >oom killer kicked me off the system. When I logged back in, the cache
> >usage was back down to normal and everything was fine.
>
> There hasn't been any new suggestions as to the culprit, so I started
> experimenting a bit.
>
> It turns out that if I limit the memory on the system to 896MB, the
> dcache slab usage peaks at around 600MB and sits there for the duration
> of the test.
>
> When I limit the memory to 1024MB, the dcache slab chews up all my zone
> normal memory and the oom killer runs.

The difference between the 896MB and 1024MB behaviour looks like a
problem. I don't see how the addition of 128MB of highmem could
interfere here, other than that highmem creates additional pressure
on lowmem (certain kernel allocations, such as the dcache, can only
be satisfied from lowmem).
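
For the record, the gfp_mask=0xd0 in your OOM reports is plain
GFP_KERNEL. Decoding it with the 2.6-era flag values, from memory:

#define __GFP_WAIT	0x10u	/* allocation may sleep       */
#define __GFP_IO	0x40u	/* may start disk I/O         */
#define __GFP_FS	0x80u	/* may call back into the fs  */
#define GFP_KERNEL	(__GFP_WAIT | __GFP_IO | __GFP_FS)  /* == 0xd0 */

There is no __GFP_HIGHMEM bit in there, so the dentry_cache slab pages
can only come from the DMA and normal zones; highmem cannot absorb
these allocations.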

It seems that your workload makes the allocation paths/OOM killer
behave in a fragile manner.

How can this be reproduced? (point to a URL if you already explained
that in detail).

Another problem is why dentries can't be freed.

> It almost seems like the dcache responds to memory pressure on the
> system as a whole, but not as well to pressure on zone normal. Would
> this make sense?

Dentries can only be freed when their reference count reaches zero.

Someone else on the thread said you had zillions of file descriptors
open?

We need to figure out why they can't be freed. The kernel is surely
trying (it would also be a problem if it were not trying). What does
the "slabs_scanned" field of /proc/vmstat look like?

Bharata maintains a patch to record additional statistics; haven't
you tried it already? You can find it at

http://marc.theaimsgroup.com/?l=linux-mm&m=112660138015732&w=2

And change:

@@ -427,8 +467,13 @@ static void prune_dcache(int count)
continue;
}
prune_one_dentry(dentry);
+ nr_freed++;
}
spin_unlock(&dcache_lock);
+ spin_lock(&prune_dcache_lock);
+ lru_dentry_stat.dprune_req = nr_requested;
+ lru_dentry_stat.dprune_freed = nr_freed;
+ spin_unlock(&prune_dcache_lock);

to use "+=" in nr_requested/nr_freed assignments instead
of "=".

Stats available in /proc/meminfo.

2005-10-02 06:04:52

by Chris Friesen

[permalink] [raw]
Subject: Re: dentry_cache using up all my zone normal memory

Marcelo wrote:

> How can this be reproduced? (point to a URL if you already explained
> that in detail).

I mentioned this earlier in the thread. I'm running 2.6.14-rc4 on a
pentium-M based atca blade with a bit over 3GB of memory. When I run
the "rename14" test from LTP with /tmp mounted on tmpfs, the system runs
out of zone normal memory and the OOM killer starts killing processes.

If I have /tmp mounted nfs, the problem doesn't occur. If I use the
boot args to limit the memory to 896MB the problem doesn't occur. If I
run the testcase on a dual Xeon with multiple gigs of memory, the
problem doesn't occur.

> Someone else on the thread said you had zillions of file descriptors
> open?

This does not appear to be the case. The testcase has two threads.

thread 1 loops doing the following:
fd = creat("./rename14", 0666);
unlink("./rename14");
close(fd);

thread 2 loops doing:
rename("./rename14", "./rename14xyz");
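
For anyone who wants to reproduce this without pulling in all of LTP,
the hot path boils down to something like the sketch below (my own
stripped-down stand-in, not the actual LTP source; run it from a
directory on tmpfs):

/* Minimal stand-in for the hot loop of LTP's rename14.
 * Build with: gcc -O2 -pthread rename14-mini.c -o rename14-mini
 * and run it from a tmpfs-backed directory. */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *create_unlink(void *unused)
{
	(void)unused;
	for (;;) {
		int fd = creat("./rename14", 0666);

		unlink("./rename14");
		if (fd >= 0)
			close(fd);
	}
	return NULL;
}

static void *do_rename(void *unused)
{
	(void)unused;
	for (;;)
		rename("./rename14", "./rename14xyz");
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, create_unlink, NULL);
	pthread_create(&t2, NULL, do_rename, NULL);
	pthread_join(t1, NULL);		/* loops run until interrupted */
	pthread_join(t2, NULL);
	return 0;
}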

> Need to figure out they can't be freed. The kernel is surely trying it
> (also a problem if it is not trying). How does the "slabs_scanned" field
> of /proc/vmstats looks like?

That's something I haven't checked...will have to get back to you.

> Bharata maintains a patch to record additional statistics, haven't
> you tried it already?

Already tried. You should be able to find results earlier in this thread.

Chris

2005-10-02 19:59:48

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: dentry_cache using up all my zone normal memory

On Sun, Oct 02, 2005 at 12:04:39AM -0600, Christopher Friesen wrote:
> Marcelo wrote:
>
> >How can this be reproduced? (point to a URL if you already explained
> >that in detail).
>
> I mentioned this earlier in the thread. I'm running 2.6.14-rc4 on a
> pentium-M based atca blade with a bit over 3GB of memory. When I run
> the "rename14" test from LTP with /tmp mounted on tmpfs, the system runs
> out of zone normal memory and the OOM killer starts killing processes.
>
> If I have /tmp mounted nfs, the problem doesn't occur. If I use the
> boot args to limit the memory to 896MB the problem doesn't occur. If I
> run the testcase on a dual Xeon with multiple gigs of memory, the
> problem doesn't occur.
>
> >Someone else on the thread said you had zillions of file descriptors
> >open?
>
> This does not appear to be the case. The testcase has two threads.
>
> thread 1 loops doing the following:
> fd = creat("./rename14", 0666);
> unlink("./rename14");
> close(fd);
>
> thread 2 loops doing:
> rename("./rename14", "./rename14xyz");
>
> >Need to figure out they can't be freed. The kernel is surely trying it
> >(also a problem if it is not trying). How does the "slabs_scanned" field
> >of /proc/vmstats looks like?
>
> That's something I haven't checked...will have to get back to you.
>
> >Bharata maintains a patch to record additional statistics, haven't
> >you tried it already?
>
> Already tried. You should be able to find results earlier in this thread.

Sorry, the problem is the queueing of large numbers of RCU callbacks
in a short period of time, as already discussed.

Since ksoftirqd is running (as can be seen from the dumps you sent), I
_guess_ that the grace period has happened already, and ksoftirqd is
attempting to execute the RCU callbacks.

An important factor causing the direct reclaim throttling to fail is
the complete lack of pagecache pages in both the DMA and normal zones
(zero in both).
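
To spell out where those callbacks come from: when a dentry is pruned,
d_free() does not hand it straight back to the slab, it queues an RCU
callback, and only the callback does the kmem_cache_free(). Roughly
(a simplified sketch of fs/dcache.c, not the exact code):

/* Simplified: the real d_callback() also frees externally allocated
 * names; locking and statistics are omitted. */
static void d_callback(struct rcu_head *head)
{
	struct dentry *dentry = container_of(head, struct dentry, d_rcu);

	/* only here does the object go back to the dentry_cache slab */
	kmem_cache_free(dentry_cache, dentry);
}

static void d_free(struct dentry *dentry)
{
	if (dentry->d_op && dentry->d_op->d_release)
		dentry->d_op->d_release(dentry);
	/* memory stays charged to dentry_cache until the grace period
	 * ends and ksoftirqd gets around to running the callback */
	call_rcu(&dentry->d_rcu, d_callback);
}

So with rename14 queueing millions of these callbacks in a short time
and essentially no pagecache left to reclaim in lowmem, the slab keeps
growing until the callbacks actually get to run.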

Can you try this hack? (You might need to include linux/interrupt.h
and linux/rcupdate.h.)

--- mm/vmscan.c.orig 2005-10-02 16:10:05.000000000 -0300
+++ mm/vmscan.c 2005-10-02 16:27:25.000000000 -0300
@@ -953,6 +953,12 @@ int try_to_free_pages(struct zone **zone
sc.swap_cluster_max = SWAP_CLUSTER_MAX;
shrink_caches(zones, &sc);
shrink_slab(sc.nr_scanned, gfp_mask, lru_pages);
+ /* throttle RCU deletion of SLAB objects if in trouble */
+ if (priority < DEF_PRIORITY/2) {
+ synchronize_rcu();
+ if (local_softirq_pending())
+ do_softirq();
+ }
if (reclaim_state) {
sc.nr_reclaimed += reclaim_state->reclaimed_slab;
reclaim_state->reclaimed_slab = 0;



2005-10-04 19:49:54

by Dipankar Sarma

[permalink] [raw]
Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

On Thu, Sep 22, 2005 at 09:16:58AM -0600, Christopher Friesen wrote:
> Al Viro wrote:
>
> >Umm... How many RCU callbacks are pending?
>
> I added an atomic counter that is incremented just before call_rcu() in
> d_free(), and decremented just after kmem_cache_free() in d_callback().
>
> According to this we had 4127306 pending rcu callbacks. A few seconds
> later it was down to 0.
>
>
> /proc/sys/fs/dentry-state:
> 1611 838 45 0 0 0
>

Hmm.. This clearly indicates that there are very few allocated dentries
and that they are just not being returned to the slab by RCU.
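
For reference, the six dentry-state numbers map onto the fields below
(as I remember them from include/linux/dcache.h), so the report says
the dcache itself only knows about 1611 dentries, 838 of them unused,
even while millions of already-freed dentries were still queued for
RCU:

struct dentry_stat_t {
	int nr_dentry;		/* dentries the dcache considers allocated  */
	int nr_unused;		/* of those, unused entries on the LRU      */
	int age_limit;		/* age (seconds) before unused dentries may
				   be pruned */
	int want_pages;		/* pages the VM asked the dcache to release */
	int dummy[2];
};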

Since then, I have done some testing myself, but I can't reproduce
this problem on two of my systems, x86 and x86_64. I ran rename14
in a loop too, but after exhausting a lot of free memory the dcache
does get shrunk, and I don't see dentries stuck in RCU queues at all.
I tried a UP kernel too.

So, there must be something else in your system that I
am missing in my setup. Could you please mail me your .config ?

Thanks
Dipankar

2005-10-04 20:03:05

by Chris Friesen

[permalink] [raw]
Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

Dipankar Sarma wrote:

> Since then, I have done some testing myself, but I can't reproduce
> this problem in two of my systems - x86 and x86_64. I ran rename14
> in a loop too, but after exhausting a lot of free memory, dcache
> does get shrunk and I don't see dentries stuck in RCU queues at all.
> I tried UP kernel too.

I've only managed to reproduce it on a UP system with HIGHMEM.

> So, there must be something else in your system that I
> am missing in my setup. Could you please mail me your .config ?

Sent privately.

Chris

2005-10-05 04:06:37

by Dipankar Sarma

[permalink] [raw]
Subject: Re: dentry_cache using up all my zone normal memory -- also seen on 2.6.14-rc2

On Tue, Oct 04, 2005 at 02:02:39PM -0600, Christopher Friesen wrote:
> Dipankar Sarma wrote:
>
> >Since then, I have done some testing myself, but I can't reproduce
> >this problem in two of my systems - x86 and x86_64. I ran rename14
> >in a loop too, but after exhausting a lot of free memory, dcache
> >does get shrunk and I don't see dentries stuck in RCU queues at all.
> >I tried UP kernel too.
>
> I've only managed to reproduce it on a UP system with HIGHMEM.
>
> >So, there must be something else in your system that I
> >am missing in my setup. Could you please mail me your .config ?
>
> Sent privately.

Chris pointed out privately that the LTP rename14 test always
creates a directory in /tmp and runs from there (hidden in an LTP
library call, tst_tmpdir()), not from the current working directory.
So my tests weren't really running on tmpfs. I commented out the
tst_tmpdir() call in the test, forcing it to run on tmpfs, and
immediately got the following errors:

llm22:/test # ./rename14
Bad page state at prep_new_page (in process 'rename14', page ffff810008000000)
flags:0x4000000000000090 mapping:0000000000000000 mapcount:0 count:0
Backtrace:
Trying to fix it up, but a reboot is needed
Bad page state at prep_new_page (in process 'syslogd', page ffff8100080002a0)
flags:0x4000000000000000 mapping:0000000000000000 mapcount:0 count:-26471
Backtrace:
Trying to fix it up, but a reboot is needed

Bad page state at prep_new_page (in process 'rename14', page ffff810008001538)

llm22 kernel: Bflags:0x4000000000000000 mapping:0000005500005555 mapcount:0 count:0
ad page state atBacktrace:
prep_new_page (Trying to fix it up, but a reboot is needed
in process 'renaBad page state at prep_new_page (in process 'rename14', page ffff810008002aa8)
me14', page ffffflags:0x4000000000000090 mapping:0000000000000000 mapcount:0 count:0
810008000000)
Backtrace:
Trying to fix it up, but a reboot is needed

Message from syslogd@llm22 at Tue Oct 4 19:41:42 2005 ...
llm22 kernel: fBad page state at prep_new_page (in process 'rename14', page ffff810008005550)
lags:0x400000000flags:0x4000005500009090 mapping:0000000000000000 mapcount:0 count:0
0000090 mapping:Backtrace:
0000000000000000Trying to fix it up, but a reboot is needed
mapcount:0 count:0

Andi, does this look at all familiar? This is on a 2-CPU x86_64 system
running 2.6.14-rc3 (with a UP kernel, though).

Thanks
Dipankar