2009-06-30 10:09:44

by Attila Kinali

Subject: Long lasting MM bug when swap is smaller than RAM

Moin,

A bug from back in the 2.4.17 days, somehow triggered by swap being
smaller than RAM, which I thought had been fixed long ago, has
reappeared on one of the machines I manage.

<history>
Back in 2002, I had a few machines running 2.4.x kernels, which
I upgraded heavily from some 16-64MB RAM to a couple hundred MB
(changing mainboards at times, but keeping the hard disks).
Due to the RAM upgrade, the swap size became a lot smaller
than the RAM size, sometimes not even half as much.
Under most conditions these machines worked fine, but sometimes
they showed a strange behavior: at times, the swap use would grow
(faster or slower depending on the machine and its use, sometimes
at 1MB/minute) until it was full. I couldn't figure out what filled
swap back then, and couldn't find any program that used a lot of memory.
What's more, the portion of RAM used as cache and buffers was
usually still very large, i.e. it didn't look like something was using
a lot of memory.
After swap was full, nothing happened. No programs crashing, no errors
in the logs, nothing... Until later (between hours and a few weeks),
the OOM killer would suddenly kick in and kill applications. This
time, something would use a lot of memory, but I couldn't figure out
what. None of the running applications would use more than usual.
And even killing the usual culprits (Mozilla, X11, ...) wouldn't help.
The only cure was to reboot.

All the machines back then were running Debian with a vanilla kernel,
had more RAM than swap, and were x86 boxes. Other than that,
they didn't have much in common. One was a machine with an Adaptec
2940UW, others had IDE; one had a K6-III CPU, others were Intel.
Some had a lot of disk, others very little. Machine usage was
fileserver, firewall/router, desktop, laptop.

I reported this bug back then but never got an answer, so I used
the only fix I had available: disabling swap completely.
</history>

Now, 7 years later, I have a machine that shows the same behavior.

Some data:

We have an HP DL380 G4 currently running a 2.6.29.4 vanilla kernel,
compiled for 32-bit x86.
It was originally purchased in 2005 with 2GB RAM and, a few weeks
ago, upgraded to 6GB (no other changes besides this and a kernel upgrade).
The machine, being the MPlayer main server, runs lighttpd, svnserve,
mailman, postfix and bind, i.e. nothing unusual, and the applications haven't
changed in the last months (since the update from debian/etch to lenny).

---
root@natsuki:/home/attila# uname -a
Linux natsuki 2.6.29.4 #1 SMP Sun May 31 22:13:21 CEST 2009 i686 GNU/Linux
root@natsuki:/home/attila# uptime
11:41:07 up 29 days, 13:17, 5 users, load average: 0.15, 0.36, 0.54
root@natsuki:/home/attila# free -m
             total       used       free     shared    buffers     cached
Mem:          6023       5919        103          0        415       3873
-/+ buffers/cache:       1630       4393
Swap:         3812        879       2932
---

I want to draw your attention to the fact that the machine now has
more RAM installed than it previously had RAM+swap (i.e. before the upgrade).
So there is no reason it would need to swap out, at least not this much.

What is even more interesting is the amount of swap used over time.
Sampled every day at 10:00 CEST:

---
Date: Wed, 17 Jun 2009 10:00:01 +0200 (CEST)
Mem: 6023 5893 130 0 405 3834
Swap: 3812 190 3622

Date: Thu, 18 Jun 2009 10:00:01 +0200 (CEST)
Mem: 6023 5793 229 0 340 3939
Swap: 3812 225 3586

Date: Fri, 19 Jun 2009 10:00:01 +0200 (CEST)
Mem: 6023 5820 203 0 341 3899
Swap: 3812 275 3536

Date: Sat, 20 Jun 2009 10:00:01 +0200 (CEST)
Mem: 6023 5761 262 0 348 3865
Swap: 3812 297 3514

Date: Sun, 21 Jun 2009 10:00:01 +0200 (CEST)
Mem: 6023 5264 758 0 459 3181
Swap: 3812 325 3486

Date: Mon, 22 Jun 2009 10:00:01 +0200 (CEST)
Mem: 6023 5875 147 0 397 3681
Swap: 3812 353 3458

Date: Tue, 23 Jun 2009 10:00:01 +0200 (CEST)
Mem: 6023 5748 275 0 193 3949
Swap: 3812 415 3396

Date: Wed, 24 Jun 2009 10:00:01 +0200 (CEST)
Mem: 6023 5779 244 0 176 3924
Swap: 3812 519 3292

Date: Thu, 25 Jun 2009 10:00:01 +0200 (CEST)
Mem: 6023 5812 210 0 345 3856
Swap: 3812 611 3200

Date: Fri, 26 Jun 2009 10:00:01 +0200 (CEST)
Mem: 6023 5830 192 0 431 3688
Swap: 3812 682 3129

Date: Sat, 27 Jun 2009 10:00:01 +0200 (CEST)
Mem: 6023 5697 326 0 442 3621
Swap: 3812 719 3093

Date: Sun, 28 Jun 2009 10:00:02 +0200 (CEST)
Mem: 6023 5890 132 0 402 3886
Swap: 3812 784 3028

Date: Mon, 29 Jun 2009 10:00:01 +0200 (CEST)
Mem: 6023 5388 635 0 425 3321
Swap: 3812 826 2985
---

As you can see, although memory usage didn't change much over time,
swap usage increased from 190MB to 826MB in about two weeks.

As I'm pretty much clueless when it comes to how the Linux VM works,
I would appreciate it if someone could give me some pointers on how
to figure out what causes this bug so that it can finally be fixed.

Thanks a lot in advance

Attila Kinali
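
As a rough cross-check of the figures in the message above, the growth
rate can be computed from the daily samples. The following is only a
minimal Python sketch, not something from the original report: the
function names growth_per_day and largest_daily_step are made up for
illustration, and the values are the "used" column of the Swap lines
listed above.

```python
from datetime import date

# Swap "used" column (MB) from the daily 10:00 samples in the message.
SAMPLES = [
    (date(2009, 6, 17), 190), (date(2009, 6, 18), 225),
    (date(2009, 6, 19), 275), (date(2009, 6, 20), 297),
    (date(2009, 6, 21), 325), (date(2009, 6, 22), 353),
    (date(2009, 6, 23), 415), (date(2009, 6, 24), 519),
    (date(2009, 6, 25), 611), (date(2009, 6, 26), 682),
    (date(2009, 6, 27), 719), (date(2009, 6, 28), 784),
    (date(2009, 6, 29), 826),
]


def growth_per_day(samples):
    """Average swap growth in MB/day between the first and last sample."""
    ordered = sorted(samples)
    (first_day, first_used), (last_day, last_used) = ordered[0], ordered[-1]
    return (last_used - first_used) / (last_day - first_day).days


def largest_daily_step(samples):
    """Return (increase in MB, date) for the biggest jump between samples."""
    ordered = sorted(samples)
    steps = [(b[1] - a[1], b[0]) for a, b in zip(ordered, ordered[1:])]
    return max(steps)


if __name__ == "__main__":
    print("average growth: %.1f MB/day" % growth_per_day(SAMPLES))
    step, day = largest_daily_step(SAMPLES)
    print("largest step: %d MB, sampled on %s" % (step, day.isoformat()))
```

On these samples the average works out to 53 MB/day, and the largest
single-day increase (104MB) shows up in the 24 June sample.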


2009-06-30 17:58:55

by Hugh Dickins

Subject: Re: Long lasting MM bug when swap is smaller than RAM

On Tue, 30 Jun 2009, Attila Kinali wrote:
>
> A bug from back in the 2.4.17 days, somehow triggered by swap being
> smaller than RAM, which I thought had been fixed long ago, has
> reappeared on one of the machines I manage.

Snipped <history>, which I hope won't be repeated to the point of OOM.

>
> Now, 7 years later, I have a machine that shows the same behavior.
>
> Some data:
>
> We have an HP DL380 G4 currently running a 2.6.29.4 vanilla kernel,
> compiled for 32-bit x86.
> It was originally purchased in 2005 with 2GB RAM and, a few weeks
> ago, upgraded to 6GB (no other changes besides this and a kernel upgrade).
> The machine, being the MPlayer main server, runs lighttpd, svnserve,
> mailman, postfix and bind, i.e. nothing unusual, and the applications haven't
> changed in the last months (since the update from debian/etch to lenny).
>
> ---
> root@natsuki:/home/attila# uname -a
> Linux natsuki 2.6.29.4 #1 SMP Sun May 31 22:13:21 CEST 2009 i686 GNU/Linux
> root@natsuki:/home/attila# uptime
> 11:41:07 up 29 days, 13:17, 5 users, load average: 0.15, 0.36, 0.54
> root@natsuki:/home/attila# free -m
>              total       used       free     shared    buffers     cached
> Mem:          6023       5919        103          0        415       3873
> -/+ buffers/cache:       1630       4393
> Swap:         3812        879       2932
> ---
>
> I want to draw your attention to the fact that the machine now has
> more RAM installed than it previously had RAM+swap (i.e. before the upgrade).
> So there is no reason it would need to swap out, at least not this much.
>
> What is even more interesting is the amount of swap used over time.
> Sampled every day at 10:00 CEST:
>
> ---
> Date: Wed, 17 Jun 2009 10:00:01 +0200 (CEST)
> Mem: 6023 5893 130 0 405 3834
> Swap: 3812 190 3622
>
> Date: Thu, 18 Jun 2009 10:00:01 +0200 (CEST)
> Mem: 6023 5793 229 0 340 3939
> Swap: 3812 225 3586
>
...
>
> Date: Sun, 28 Jun 2009 10:00:02 +0200 (CEST)
> Mem: 6023 5890 132 0 402 3886
> Swap: 3812 784 3028
>
> Date: Mon, 29 Jun 2009 10:00:01 +0200 (CEST)
> Mem: 6023 5388 635 0 425 3321
> Swap: 3812 826 2985
> ---
>
> As you can see, although memory usage didn't change much over time,
> swap usage increased from 190MB to 826MB in about two weeks.
>
> As I'm pretty much clueless when it comes to how the Linux VM works,
> I would appreciate it if someone could give me some pointers on how
> to figure out what causes this bug so that it can finally be fixed.

I'm not sure that there's any problem here at all. Beyond hibernation
to disk wanting enough swapspace to write its image, I can't think of
any reason why the kernel would misbehave if your swapspace is smaller
than your RAM.

One possibility is that this steady rise in swap usage just reflects
memory pressure (a nightly cron job?) pushing pages out to swap,
slightly different choices each time, and what's not modified later
gets left with a copy on swap. That would tend to rise (at a slower
and slower rate) until swap is 50% full, then other checks should
keep it around that level.

If you do see it at more than 50% full in the morning, then yes,
I think you do have a leak: but it's more likely to be an
application than the kernel itself. When kernel leaks occur,
they're often of "Slab:" memory - is that rising in /proc/meminfo?

Are you sure this steady rise in swap usage wasn't happening before
you added that RAM? It's possible that you have an application which
decides how much memory to use, based on the amount of RAM in the
machine, itself assuming there's more than that of swap.

Do you have unwanted temporary files accumulating in a tmpfs?
Their pages get pushed out to swap. Or a leak in shared memory:
does ipcs show increasing usage of shared memory?

Hugh
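
The two checks suggested in the reply above (is swap more than 50% full,
and is "Slab:" rising in /proc/meminfo) can be scripted for periodic
logging. This is a hedged Python sketch, not a tool mentioned in the
thread: parse_meminfo, swap_used_fraction and report are hypothetical
helper names, and the code assumes the usual "Name:   value kB" layout
of /proc/meminfo.

```python
import os


def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_used_fraction(info):
    """Fraction of swap in use, or 0.0 when no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - info["SwapFree"]) / total


def report(info):
    """One-line summary suitable for appending to a daily log."""
    frac = swap_used_fraction(info)
    verdict = "above" if frac > 0.5 else "at or below"
    return "swap %.0f%% used (%s the 50%% mark), Slab %s kB" % (
        frac * 100, verdict, info.get("Slab", "n/a"))


if __name__ == "__main__":
    path = "/proc/meminfo"  # Linux only; skipped silently elsewhere
    if os.path.exists(path):
        with open(path) as f:
            print(report(parse_meminfo(f.read())))
```

With the free -m figures quoted in this thread (879MB used of 3812MB),
swap is roughly 23% full, i.e. still below the 50% level mentioned above.
Logging the Slab value daily next to the swap figure would show whether
it rises along with swap usage.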

2009-07-01 01:35:40

by Robert Hancock

Subject: Re: Long lasting MM bug when swap is smaller than RAM

On 06/30/2009 03:58 AM, Attila Kinali wrote:
> Moin,
>
> A bug from back in the 2.4.17 days, somehow triggered by swap being
> smaller than RAM, which I thought had been fixed long ago, has
> reappeared on one of the machines I manage.
>
> <history>

It's quite unlikely what you are seeing is at all related to that
problem. The VM subsystem has been hugely changed since then.

> root@natsuki:/home/attila# free -m
>              total       used       free     shared    buffers     cached
> Mem:          6023       5919        103          0        415       3873
> -/+ buffers/cache:       1630       4393
> Swap:         3812        879       2932
> ---
>
> I want to draw your attention to the fact that the machine now has
> more RAM installed than it previously had RAM+swap (i.e. before the upgrade).
> So there is no reason it would need to swap out, at least not this much.
>
> What is even more interesting is the amount of swap used over time.
> Sampled every day at 10:00 CEST:
>
> ---
> Date: Wed, 17 Jun 2009 10:00:01 +0200 (CEST)
> Mem: 6023 5893 130 0 405 3834
> Swap: 3812 190 3622

..

> As you can see, although memory usage didn't change much over time,
> swap usage increased from 190MB to 826MB in about two weeks.
>
> As I'm pretty much clueless when it comes to how the Linux VM works,
> I would appreciate it if someone could give me some pointers on how
> to figure out what causes this bug so that it can finally be fixed.

You didn't post what the swap usage history before the upgrade was. But
swapping does not only occur if memory is running low. If disk usage is
high then non-recently used data may be swapped out to make more room
for disk caching.

Also, by increasing memory from 2GB to 6GB on a 32-bit kernel, some
memory pressure may actually be increased since many kernel data
structures can only be in low memory (the bottom 896MB). The more that
the system memory is increased the more the pressure on low memory can
become. Using a 64-bit kernel avoids this problem.
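
Whether low memory is actually under pressure can be read from the
LowTotal and LowFree fields that 32-bit highmem kernels report in
/proc/meminfo. The Python sketch below is only an illustration under
that assumption; lowmem_summary is a hypothetical helper name, and on
kernels without these fields it simply returns None.

```python
def lowmem_summary(meminfo_text):
    """Return (low_total_kb, low_free_kb, free_fraction).

    Returns None when LowTotal/LowFree are not reported, which is the
    case on kernels built without highmem support (e.g. 64-bit).
    """
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in ("LowTotal", "LowFree"):
            values[name] = int(rest.split()[0])
    if len(values) < 2 or values["LowTotal"] == 0:
        return None
    total, free = values["LowTotal"], values["LowFree"]
    return total, free, free / total


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            summary = lowmem_summary(f.read())
    except OSError:
        summary = None  # not on Linux, or /proc not mounted
    if summary is None:
        print("no LowTotal/LowFree fields reported")
    else:
        print("lowmem: %d kB total, %d kB free (%.0f%% free)"
              % (summary[0], summary[1], summary[2] * 100))
```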

2009-07-01 04:21:18

by Fengguang Wu

Subject: Re: Long lasting MM bug when swap is smaller than RAM

On Tue, Jun 30, 2009 at 11:58:19AM +0200, Attila Kinali wrote:
> Moin,
>
> A bug from back in the 2.4.17 days, somehow triggered by swap being
> smaller than RAM, which I thought had been fixed long ago, has
> reappeared on one of the machines I manage.
>
> <history>
> Back in 2002, I had a few machines running 2.4.x kernels, which
> I upgraded heavily from some 16-64MB RAM to a couple hundred MB
> (changing mainboards at times, but keeping the hard disks).
> Due to the RAM upgrade, the swap size became a lot smaller
> than the RAM size, sometimes not even half as much.
> Under most conditions these machines worked fine, but sometimes
> they showed a strange behavior: at times, the swap use would grow
> (faster or slower depending on the machine and its use, sometimes
> at 1MB/minute) until it was full. I couldn't figure out what filled
> swap back then, and couldn't find any program that used a lot of memory.
> What's more, the portion of RAM used as cache and buffers was
> usually still very large, i.e. it didn't look like something was using
> a lot of memory.
> After swap was full, nothing happened. No programs crashing, no errors
> in the logs, nothing... Until later (between hours and a few weeks),
> the OOM killer would suddenly kick in and kill applications. This
> time, something would use a lot of memory, but I couldn't figure out
> what. None of the running applications would use more than usual.
> And even killing the usual culprits (Mozilla, X11, ...) wouldn't help.
> The only cure was to reboot.
>
> All the machines back then were running Debian with a vanilla kernel,
> had more RAM than swap, and were x86 boxes. Other than that,
> they didn't have much in common. One was a machine with an Adaptec
> 2940UW, others had IDE; one had a K6-III CPU, others were Intel.
> Some had a lot of disk, others very little. Machine usage was
> fileserver, firewall/router, desktop, laptop.
>
> I reported this bug back then but never got an answer, so I used
> the only fix I had available: disabling swap completely.
> </history>
>
> Now, 7 years later, I have a machine that shows the same behavior.
>
> Some data:
>
> We have an HP DL380 G4 currently running a 2.6.29.4 vanilla kernel,
> compiled for 32-bit x86.
> It was originally purchased in 2005 with 2GB RAM and, a few weeks
> ago, upgraded to 6GB (no other changes besides this and a kernel upgrade).
> The machine, being the MPlayer main server, runs lighttpd, svnserve,
> mailman, postfix and bind, i.e. nothing unusual, and the applications haven't
> changed in the last months (since the update from debian/etch to lenny).
>
> ---
> root@natsuki:/home/attila# uname -a
> Linux natsuki 2.6.29.4 #1 SMP Sun May 31 22:13:21 CEST 2009 i686 GNU/Linux
> root@natsuki:/home/attila# uptime
> 11:41:07 up 29 days, 13:17, 5 users, load average: 0.15, 0.36, 0.54
> root@natsuki:/home/attila# free -m
>              total       used       free     shared    buffers     cached
> Mem:          6023       5919        103          0        415       3873
> -/+ buffers/cache:       1630       4393
> Swap:         3812        879       2932
> ---

Hi Attila,

What's your /proc/meminfo and/or /proc/vmstat contents?
Are you making use of tmpfs, or intel graphics devices with drm?

Thanks,
Fengguang


2009-07-01 07:55:18

by Attila Kinali

Subject: Re: Long lasting MM bug when swap is smaller than RAM

Good morning,

On Tue, 30 Jun 2009 18:58:29 +0100 (BST)
Hugh Dickins wrote:

> One possibility is that this steady rise in swap usage just reflects
> memory pressure (a nightly cron job?) pushing pages out to swap,

That might be true if we didn't now have a lot more RAM than we
previously had swap+RAM. I.e. if we had memory pressure now, then
we would have run out of swap space before the upgrade and triggered
the OOM killer, which didn't happen.

Though, I did enable logging of memory usage and swap space in
mrtg: http://natsuki.mplayerhq.hu/cgi-bin/mrtg-rrd.cgi/localmem.html
(blue is free mem w/o buffers/cache, red is used swap space).
The daily/weekly/monthly cron jobs run between 6:25 and 7:00,
but we don't see any increase in memory usage then.
What is interesting is the step at 4:10, which is exactly the time
when an rsync-based backup of the mailman archives (lots of small files)
is started. But swap usage didn't increase at that time.
What is strange, though, is that the backup takes about 13 minutes.
Most of that time is spent traversing the directory tree and
stat'ing files. But the increase in memory usage is a sharp step
at the beginning only.

> slightly different choices each time, and what's not modified later
> gets left with a copy on swap. That would tend to rise (at a slower
> and slower rate) until swap is 50% full, then other checks should
> keep it around that level.

I don't want to wait that long.

>
> If you do see it at more than 50% full in the morning, then yes,
> I think you do have a leak: but it's more likely to be an
> application than the kernel itself. When kernel leaks occur,
> they're often of "Slab:" memory - is that rising in /proc/meminfo?

I haven't monitored /proc/meminfo yet.

> Are you sure this steady rise in swap usage wasn't happening before
> you added that RAM?

Yes. I semi-regularly checked memory usage by hand and we never had
more than a couple MB of swap used.

> It's possible that you have an application which
> decides how much memory to use, based on the amount of RAM in the
> machine, itself assuming there's more than that of swap.

I don't think we have any application that does this. As I said, it's only a
web/mail/dns/svn server. There isn't anything fancy running.
Even the web pages are static only (besides the mailman interface).
The number of users on the machine is limited and none of them runs anything
directly on the machine (they are all just maintenance accounts).

> Do you have unwanted temporary files accumulating in a tmpfs?
> Their pages get pushed out to swap. Or a leak in shared memory:
> does ipcs show increasing usage of shared memory?

/tmp is on a hard disk and thus doesn't add to memory usage.
The two tmpfs mounts (/dev/shm and /lib/init/rw) are completely
unused and empty.


Attila Kinali

2009-07-01 08:04:54

by Attila Kinali

Subject: Re: Long lasting MM bug when swap is smaller than RAM

On Tue, 30 Jun 2009 19:36:15 -0600
Robert Hancock wrote:

> On 06/30/2009 03:58 AM, Attila Kinali wrote:
> > Moin,
> >
> > A bug from back in the 2.4.17 days, somehow triggered by swap being
> > smaller than RAM, which I thought had been fixed long ago, has
> > reappeared on one of the machines I manage.
> >
> > <history>
>
> It's quite unlikely what you are seeing is at all related to that
> problem. The VM subsystem has been hugely changed since then.

That's why I thought this problem was fixed.

> You didn't post what the swap usage history before the upgrade was.

Because I don't have any hard data on this. I checked it by hand
from time to time and we never had more than a few MB of swap used.

> But
> swapping does not only occur if memory is running low. If disk usage is
> high then non-recently used data may be swapped out to make more room
> for disk caching.

Hmm..I didn't know this.. thanks!


> Also, by increasing memory from 2GB to 6GB on a 32-bit kernel, some
> memory pressure may actually be increased since many kernel data
> structures can only be in low memory (the bottom 896MB).

Interesting. But shouldn't memory be "swapped" to highmem first
before going out to disk?

> The more that
> the system memory is increased the more the pressure on low memory can
> become. Using a 64-bit kernel avoids this problem.

Unfortunately, the CPU we have is still a pure 32bit CPU, so this option
cannot be used.

Attila Kinali

2009-07-01 08:08:50

by Attila Kinali

Subject: Re: Long lasting MM bug when swap is smaller than RAM

On Wed, 1 Jul 2009 10:04:32 +0200
Attila Kinali wrote:

> > But
> > swapping does not only occur if memory is running low. If disk usage is
> > high then non-recently used data may be swapped out to make more room
> > for disk caching.
>
> Hmm..I didn't know this.. thanks!

This was the cause of the problem!

I just restarted svnserve, clamav and bind (the three applications
using the most memory) and suddenly swap cleared up.

Now the question is, why did they accumulate so much used swap
space, while before the RAM upgrade, we hardly used the swap space at all?

Attila Kinali
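
Instead of restarting daemons to see which one was holding swap, the
per-process figure can be read directly. The Python sketch below sums
the "Swap:" lines of /proc/<pid>/smaps. It is a hedged illustration,
not something used in this thread: the function names are hypothetical,
and it assumes the running kernel exposes a Swap: field in smaps, which
is not confirmed anywhere above for the 2.6.29.4 kernel on this machine.
It needs root to see other users' processes.

```python
import glob


def swap_kb_from_smaps(smaps_text):
    """Sum the 'Swap:' lines (in kB) of one process's smaps content."""
    total = 0
    for line in smaps_text.splitlines():
        # "SwapPss:" lines on newer kernels do not match this prefix.
        if line.startswith("Swap:"):
            total += int(line.split()[1])
    return total


def process_name(pid):
    """Read the 'Name:' line of /proc/<pid>/status, or '?' on failure."""
    try:
        with open("/proc/%s/status" % pid) as f:
            return f.readline().split(":", 1)[1].strip()
    except (OSError, IndexError):
        return "?"


def swap_by_process():
    """Return [(swap_kb, pid, name), ...] sorted by swap use, largest first."""
    usage = []
    for path in glob.glob("/proc/[0-9]*/smaps"):
        pid = path.split("/")[2]
        try:
            with open(path) as f:
                kb = swap_kb_from_smaps(f.read())
        except OSError:
            continue  # process exited, or no permission to read it
        if kb:
            usage.append((kb, pid, process_name(pid)))
    return sorted(usage, reverse=True)


if __name__ == "__main__":
    for kb, pid, name in swap_by_process()[:10]:
        print("%8d kB  pid %-6s %s" % (kb, pid, name))
```

Run from cron next to the daily free -m sample, a listing like this
would show which of the daemons accumulates swap over time.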

2009-07-01 23:16:21

by Zan Lynx

Subject: Re: Long lasting MM bug when swap is smaller than RAM

Attila Kinali wrote:
> On Wed, 1 Jul 2009 10:04:32 +0200
> Attila Kinali wrote:
>
>>> But
>>> swapping does not only occur if memory is running low. If disk usage is
>>> high then non-recently used data may be swapped out to make more room
>>> for disk caching.
>> Hmm..I didn't know this.. thanks!
>
> This was the cause of the problem!
>
> I just restarted svnserve, clamav and bind (the three applications
> using the most memory) and suddenly swap cleared up.
>
> Now the question is, why did they accumulate so much used swap
> space, while before the RAM upgrade, we hardly used the swap space at all?

I do not know about the others, but ClamAV suffers from pretty serious
memory fragmentation. What it does is load the updated signatures into
a new memory allocation, verify them, then free the old signature
allocation. This results in a large hole in glibc's malloc structures
and because of ClamAV's allocation pattern, this hole is difficult to
reclaim. This ClamAV memory fragmentation will continue to grow worse
until the daemon is completely restarted.

Under memory pressure the kernel pushes least used pages out to swap,
and these unused but still allocated pages of ClamAV are never again
used, so out to swap they go.

I know this because the company I work for tried to fix ClamAV's memory
allocation fragmentation, but the ClamAV developers did not like our
patch and preferred to keep letting the memory allocator fragment in
exchange for simpler code.

--
Zan Lynx

"Knowledge is Power. Power Corrupts. Study Hard. Be Evil."