2010-11-22 12:47:57

by Christoph Bartoschek

Subject: ext4_alloc_context occupies 150 GiB of memory and makes the system unusable

Hi,

I have the problem that on one machine lots of memory is allocated for
ext4_alloc_context.

I would like to know for what purpose the memory is allocated and why it is
not given to processes that need memory.

The machine normally only uses a local ext4 for booting. The data it is
working on comes from NFS.

Now there are several normally CPU-bound jobs running, but they only get 1-2%
of CPU time because they are constantly swapping. They are swapping because,
of the 192 GiB the machine has, 150 GiB are allocated for ext4_alloc_context.
Here is the output of /proc/meminfo:

MemTotal: 198493288 kB
MemFree: 853372 kB
Buffers: 824 kB
Cached: 26108 kB
SwapCached: 6369336 kB
Active: 37073576 kB
Inactive: 1104932 kB
Active(anon): 37059712 kB
Inactive(anon): 1090980 kB
Active(file): 13864 kB
Inactive(file): 13952 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 209713148 kB
SwapFree: 149362056 kB
Dirty: 16 kB
Writeback: 0 kB
AnonPages: 37642012 kB
Mapped: 13312 kB
Shmem: 0 kB
Slab: 158765316 kB
SReclaimable: 158732380 kB
SUnreclaim: 32936 kB
KernelStack: 2968 kB
PageTables: 202500 kB
NFS_Unstable: 4 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 308959792 kB
Committed_AS: 64376360 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 736572 kB
VmallocChunk: 34358994676 kB
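
As a rough cross-check of the dump above, here is a minimal Python sketch (not part of the original report) that parses meminfo-style text and computes how much of MemTotal sits in the slab allocator. The function names parse_meminfo and slab_share are made up for illustration; the sample values are copied from the dump.

```python
# Values copied from the /proc/meminfo dump above (all in kB).
SAMPLE = """\
MemTotal: 198493288 kB
Slab: 158765316 kB
SReclaimable: 158732380 kB
SUnreclaim: 32936 kB
"""

def parse_meminfo(text):
    """Return a dict mapping each field name to its integer value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

def slab_share(fields):
    """Fraction of MemTotal currently held by the slab allocator."""
    return fields["Slab"] / fields["MemTotal"]

info = parse_meminfo(SAMPLE)
# Slab should be the sum of its reclaimable and unreclaimable parts.
consistent = info["SReclaimable"] + info["SUnreclaim"] == info["Slab"]
print(f"Slab: {info['Slab'] / 2**20:.1f} GiB "
      f"({slab_share(info):.1%} of MemTotal), consistent={consistent}")
```

On a live system the same function could be fed the contents of /proc/meminfo instead of SAMPLE. Note that almost all of the slab memory is accounted as SReclaimable, even though it is not actually being reclaimed.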



We see that Slab uses most of the memory, and within Slab nearly everything is
used for ext4_alloc_context. Here is the output of slabtop:

Active / Total Objects (% used) : 364597 / 1070670469 (0.0%)
Active / Total Slabs (% used) : 52397 / 39688960 (0.1%)
Active / Total Caches (% used) : 107 / 193 (55.4%)
Active / Total Size (% used) : 159579.25K / 150697605.41K (0.1%)
Minimum / Average / Maximum Object : 0.02K / 0.14K / 4096.00K

      OBJS ACTIVE USE OBJ SIZE    SLABS OBJ/SLAB CACHE SIZE NAME
1070187012      0  0%    0.14K 39636556       27 158546224K ext4_alloc_context
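
The slabtop figures for this cache can be checked against each other with a little arithmetic. The Python sketch below (an illustration, not part of the original report) assumes 4 KiB pages with one page per slab, and an object size of 144 bytes, which slabtop rounds to 0.14K; none of these three values is stated by slabtop itself.

```python
# Figures copied from the ext4_alloc_context row of the slabtop output above.
OBJS = 1070187012
SLABS = 39636556
OBJ_PER_SLAB = 27
CACHE_SIZE_KIB = 158546224

# Assumptions, not taken from slabtop: 4 KiB pages, one page per slab,
# and 144-byte objects (shown rounded as 0.14K).
PAGE_KIB = 4
OBJ_SIZE_BYTES = 144

def kib_to_gib(kib):
    """Convert KiB to GiB."""
    return kib / 2**20

objects_from_slabs = SLABS * OBJ_PER_SLAB      # object slots implied by slab count
cache_kib_from_slabs = SLABS * PAGE_KIB        # memory implied by slab count
payload_gib = OBJS * OBJ_SIZE_BYTES / 2**30    # space taken by the object slots alone

print("objects match:", objects_from_slabs == OBJS)
print("cache size matches:", cache_kib_from_slabs == CACHE_SIZE_KIB)
print(f"cache size: {kib_to_gib(CACHE_SIZE_KIB):.1f} GiB")
print(f"object payload: {payload_gib:.1f} GiB")
```

Both checks hold under these assumptions, so the cache consists of about 39.6 million single-page slabs, roughly 151 GiB in total, and according to the ACTIVE column none of them holds a live object.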



I see no reason why ext4 should use so much memory. What is it used for? And
how can I release it so that my processes can use it? The overall system is
very sluggish now. Here is top info for some computing jobs:

top - 13:06:06 up 10 days, 22:04, 5 users, load average: 9.65, 9.74, 9.80
Tasks: 272 total, 1 running, 271 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.4%us, 0.3%sy, 0.0%ni, 46.5%id, 52.8%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 193841M total, 192945M used, 895M free, 0M buffers
Swap: 204797M total, 61718M used, 143079M free, 163113M cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19459 joachimi 20 0 23.6g 11g 4000 D 0 6.1 417:07.29 bonnRoute
9329 bartosch 20 0 11.7g 9g 3436 D 0 5.3 38:55.70 chipbench
28845 bartosch 20 0 10.9g 5.0g 1028 D 0 2.6 28:27.45 chipbench
6505 bartosch 20 0 10.7g 2.8g 976 D 0 1.5 289:24.73 chipbench
11061 bartosch 20 0 9.8g 1.5g 900 D 1 0.8 146:07.40 chipbench
11010 bartosch 20 0 5638m 1.5g 2800 D 0 0.8 82:48.69 chipbench
10946 bartosch 20 0 5952m 1.3g 936 D 0 0.7 80:57.63 chipbench
10976 bartosch 20 0 5563m 1.3g 936 D 1 0.7 77:53.40 chipbench
11030 bartosch 20 0 9807m 1.2g 4272 D 0 0.6 149:40.97 chipbench
9330 bartosch 20 0 69572 7160 376 S 0 0.0 0:33.06 chipbench
10914 bartosch 20 0 81888 4668 480 S 0 0.0 0:48.84 chipbench
17065 bartosch 20 0 99.0m 3408 488 S 0 0.0 0:41.91 chipbench
11031 bartosch 20 0 75724 2988 496 S 0 0.0 0:53.41 chipbench


iotop shows that the jobs, while not doing any regular I/O, cause lots of
disk reads and spend nearly 100% of their time swapping in:

Total DISK READ: 4.91 M/s | Total DISK WRITE: 0 B/s
PID USER DISK READ DISK WRITE SWAPIN IO> COMMAND
79 root 0 B/s 0 B/s 0.00 % 94.34 % [kswapd0]
10946 bartosch 3.14 M/s 0 B/s 65.42 % 1.54 % chipbench
28845 bartosch 334.16 K/s 0 B/s 99.99 % 0.00 % chipbench
6505 bartosch 194.28 K/s 0 B/s 99.99 % 0.00 % chipbench
10976 bartosch 147.65 K/s 0 B/s 99.99 % 0.00 % chipbench
11010 bartosch 170.97 K/s 0 B/s 95.03 % 0.00 % chipbench
11030 bartosch 85.48 K/s 0 B/s 77.11 % 0.00 % chipbench
11061 bartosch 174.85 K/s 0 B/s 99.00 % 0.00 % chipbench
19459 joachimi 155.42 K/s 0 B/s 83.84 % 0.00 % bonnRoute
9329 bartosch 551.75 K/s 0 B/s 99.99 % 0.00 % chipbench


The problem appeared after about a week of uptime. The system is openSUSE
11.3:

Linux euler 2.6.34.7-0.5-desktop #1 SMP PREEMPT 2010-10-25 08:40:12 +0200
x86_64 x86_64 x86_64 GNU/Linux


I would like to avoid a reboot.

Thanks
Christoph


2010-11-22 15:25:05

by Eric Sandeen

Subject: Re: ext4_alloc_context occupies 150 GiB of memory and makes the system unusable

On 11/22/10 6:23 AM, Christoph Bartoschek wrote:
> Hi,
>
> I have the problem that on one machine lots of memory is allocated for
> ext4_alloc_context.
>
> I would like to know for what purpose the memory is allocated and why it is
> not given to processes that need memory.
>
> The machine normally only uses a local ext4 for booting. The data it is
> working on comes from NFS.
>
> Now there are several normally CPU-bound jobs running, but they only get 1-2%
> of CPU time because they are constantly swapping. They are swapping because,
> of the 192 GiB the machine has, 150 GiB are allocated for ext4_alloc_context.
> Here is the output of /proc/meminfo:

You probably want my patch,

commit 3e1e5f501632460184a98237d5460c521510535e
Author: Eric Sandeen <[email protected]>
Date: Wed Oct 27 21:30:07 2010 -0400

ext4: don't use ext4_allocation_contexts for tracing

Many tracepoints were populating an ext4_allocation_context
to pass in, but this requires a slab allocation even when
tracepoints are off. In fact, 4 of 5 of these allocations
were only for tracing. In addition, we were only using a
small fraction of the 144 bytes of this structure for this
purpose.

We can do away with all these alloc/frees of the ac and
simply pass in the bits we care about, instead.

I tested this by turning on tracing and running through
xfstests on x86_64. I did not actually do anything with
the trace output, however.

Signed-off-by: Eric Sandeen <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>

I don't know why the inactive slabs stay around, but there is no
reason to be (ab)using this slab cache for this purpose, and the
above commit just removes most users of the cache.

I -think- I have seen a case where even with this patch alloc_contexts
still hang around, and I can't explain it. But you might start
with the above, as it should at least make things better.

...
>
> We see that Slab uses most of the memory, and within Slab nearly everything is
> used for ext4_alloc_context. Here is the output of slabtop:
>
> Active / Total Objects (% used) : 364597 / 1070670469 (0.0%)
> Active / Total Slabs (% used) : 52397 / 39688960 (0.1%)
> Active / Total Caches (% used) : 107 / 193 (55.4%)
> Active / Total Size (% used) : 159579.25K / 150697605.41K (0.1%)
> Minimum / Average / Maximum Object : 0.02K / 0.14K / 4096.00K
>
>       OBJS ACTIVE USE OBJ SIZE    SLABS OBJ/SLAB CACHE SIZE NAME
> 1070187012      0  0%    0.14K 39636556       27 158546224K ext4_alloc_context
>

and it's all unused... (inactive)

To make matters worse drop_caches doesn't touch the slabs, IIRC, but you
might try: echo 3 > /proc/sys/vm/drop_caches

> I see no reason why ext4 should use so much memory. What is it used for? And
> how can I release it so that my processes can use it?

You may need to reboot, or at best unmount ext4 filesystems and/or rmmod
the ext4 module, if the drop_caches trick doesn't work.

The fact that this doesn't get reclaimed seems to point to a problem
with the vm though, I think (aside from the craziness of ext4 using
this slab so heavily without my patch...)

-Eric

2010-11-22 15:38:00

by Christoph Bartoschek

Subject: Re: ext4_alloc_context occupies 150 GiB of memory and makes the system unusable

On Monday, 22 November 2010, you wrote:

> > We see that Slab uses most of the memory, and within Slab nearly
> > everything is used for ext4_alloc_context. Here is the output of slabtop:
> >
> > Active / Total Objects (% used) : 364597 / 1070670469 (0.0%)
> > Active / Total Slabs (% used) : 52397 / 39688960 (0.1%)
> > Active / Total Caches (% used) : 107 / 193 (55.4%)
> > Active / Total Size (% used) : 159579.25K / 150697605.41K (0.1%)
> > Minimum / Average / Maximum Object : 0.02K / 0.14K / 4096.00K
> >
> >       OBJS ACTIVE USE OBJ SIZE    SLABS OBJ/SLAB CACHE SIZE NAME
> > 1070187012      0  0%    0.14K 39636556       27 158546224K ext4_alloc_context
>
> and it's all unused... (inactive)
>
> To make matters worse drop_caches doesn't touch the slabs, IIRC, but you
> might try: echo 3 > /proc/sys/vm/drop_caches

I tried it and it did not improve anything.


> > I see no reason why ext4 should use so much memory. What is it used for?
> > And how can I release it so that my processes can use it?
>
> You may need to reboot, or at best unmount ext4 filesystems and/or rmmod
> the ext4 module, if the drop_caches trick doesn't work.
>
> The fact that this doesn't get reclaimed seems to point to a problem
> with the vm though, I think (aside from the craziness of ext4 using
> this slab so heavily without my patch...)

This is the first time I have seen the problem, and I do not know whether it is
reproducible. We have several similar machines with similar workloads, but none
has shown such a problem so far.

I'm going to reboot the machine. If it shows the problem again I will try a
newer kernel and then the patch.

Some work will be lost, but the machine has not done anything useful for
three days now :)

Thanks,
Christoph

2010-11-22 15:45:54

by Eric Sandeen

Subject: Re: ext4_alloc_context occupies 150 GiB of memory and makes the system unusable

On 11/22/10 9:37 AM, Christoph Bartoschek wrote:

...

> This is the first time I have seen the problem, and I do not know whether it is
> reproducible. We have several similar machines with similar workloads, but none
> has shown such a problem so far.
>
> I'm going to reboot the machine. If it shows the problem again I will try a
> newer kernel and then the patch.
>
> Some work will be lost, but the machine has not done anything useful for
> three days now :)

At some point somebody needs to look at the slab cache management, I think;
even if ext4 is (ab)using this cache, having that much memory unreclaimable
in inactive caches is clearly a bug somewhere...!

-Eric

2010-12-05 03:11:36

by Theodore Ts'o

Subject: Re: ext4_alloc_context occupies 150 GiB of memory and makes the system unusable

Christoph,

Have you been able to replicate this problem since rebooting your
machine? I've never seen anything quite like your report before.

If you do, a couple of questions. Which slab allocator are you using?
Are you using SLAB or SLUB? (grep for CONFIG_SLAB or CONFIG_SLUB in
your .config file).
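
A minimal Python sketch of that check follows. It works on the text of a kernel config file; on many distributions a copy is installed as /boot/config-<kernel release>, but that location is an assumption and not something stated in this thread. The function name slab_allocator and the sample config fragment are made up for illustration.

```python
import re

def slab_allocator(config_text):
    """Return 'SLAB', 'SLUB' or 'SLOB' if enabled in the config text, else None."""
    for name in ("SLAB", "SLUB", "SLOB"):
        # Match only an enabled option on its own line, e.g. "CONFIG_SLAB=y".
        # "# CONFIG_SLAB is not set" and "CONFIG_SLABINFO=y" must not match.
        if re.search(rf"^CONFIG_{name}=y$", config_text, flags=re.MULTILINE):
            return name
    return None

# Hypothetical config fragment, for illustration only.
SAMPLE_CONFIG = """\
# CONFIG_SLUB is not set
CONFIG_SLAB=y
CONFIG_SLABINFO=y
"""

print(slab_allocator(SAMPLE_CONFIG))
```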

If you are using SLUB, it would be useful to compile slabinfo.c (found
in /usr/src/linux/Documentation/vm/slabinfo.c) and send us the output of
"slabinfo -a ext4_allocation_context" and "slabinfo -r
ext4_allocation_context".

Also, you might try "slabinfo -s" and see if that shrinks the slabs
for you. If this works, why it wasn't doing this automatically is
beyond me.

- Ted


2011-03-15 10:55:09

by Sven

Subject: Re: ext4_alloc_context occupies 150 GiB of memory and makes the system unusable

Ted Ts'o writes:

> Christoph,
>
> Have you been able to replicate this problem since rebooting your
> machine? I've never seen anything quite like your report before.

I have a similar problem here: 2.5 GB of 6 GB used by ext4_allocation_context
according to slabtop.

# uname -a
Linux ... 2.6.34.7-0.7-default #1 SMP 2010-12-13 11:13:53 +0100 x86_64 x86_64
x86_64 GNU/Linux

openSUSE 11.3

> If you do, a couple of questions. Which slab allocator are you using?
> Are you using SLAB or SLUB? (grep for CONFIG_SLAB or CONFIG_SLUB in
> your .config file).

It seems to be SLAB, so I cannot use slabinfo:
# slabinfo -r
SYSFS support for SLUB not active
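
With the SLAB allocator, per-cache numbers are still available from /proc/slabinfo (reading it may require root). The Python sketch below is only an illustration of how one line of that file could be decoded. The sample line is hypothetical: it is built from the slabtop figures earlier in this thread, with made-up tunables, and assumes the version 2.1 column layout and 4 KiB pages.

```python
# Hypothetical /proc/slabinfo line (version 2.1 layout), for illustration only:
# name active_objs num_objs objsize objperslab pagesperslab
#   : tunables limit batchcount sharedfactor
#   : slabdata active_slabs num_slabs sharedavail
SAMPLE_LINE = ("ext4_alloc_context 0 1070187012 144 27 1 "
               ": tunables 120 60 8 : slabdata 0 39636556 0")

PAGE_KIB = 4  # assumption: 4 KiB pages

def parse_slabinfo_line(line):
    """Decode one cache line of /proc/slabinfo into a dict."""
    sections = line.split(":")
    stats = sections[0].split()        # name plus five object statistics
    slabdata = sections[2].split()     # ["slabdata", active, total, sharedavail]
    active_objs, num_objs, objsize, objperslab, pagesperslab = map(int, stats[1:6])
    active_slabs, num_slabs = int(slabdata[1]), int(slabdata[2])
    return {
        "name": stats[0],
        "active_objs": active_objs,
        "num_objs": num_objs,
        "objsize": objsize,
        "objperslab": objperslab,
        "pagesperslab": pagesperslab,
        "active_slabs": active_slabs,
        "num_slabs": num_slabs,
    }

def idle_kib(cache):
    """Memory in KiB held by slabs that contain no active object."""
    idle_slabs = cache["num_slabs"] - cache["active_slabs"]
    return idle_slabs * cache["pagesperslab"] * PAGE_KIB

cache = parse_slabinfo_line(SAMPLE_LINE)
print(f"{cache['name']}: {idle_kib(cache) / 2**20:.1f} GiB in idle slabs")
```

Running this over every line of the real file (skipping the two header lines) and sorting by idle_kib would show which caches hold the most idle memory.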

Any solutions found in the past 3 months?