2001-11-25 14:49:24

by Chris Chabot

Subject: Severe Linux 2.4 kernel memory leakage

Hi, I have a firewall / file server box which is displaying (severe)
memory leakage, presumably by the kernel.

The box has run Red Hat 7.1 and 7.2, with plain vanilla Linux kernels
2.4.9 up to 2.4.15; the same problem appeared in every case.

The problem is that when the box boots up, it uses about 60MB of memory.
However, after only 1 1/2 days, the memory usage is already around 430MB
(!!). (This is of course used minus buffers minus cache, as displayed by
'free'.)

When I do a ps aux and add up the 'resident' memory usage of the
applications, the memory usage should be around 70-80MB (a bit higher
than at boot time, since bind uses more memory for caching). Yet 'free'
happily tells me:

             total       used       free     shared    buffers     cached
Mem:       1029752    1019188      10564          0     130888     456000
-/+ buffers/cache:     432300     597452
Swap:      2104464        996    2103468

When the box keeps on running for about a month, the memory usage gets
so high that it turns into a swap-crazy, low-memory and slow server ;-/
(it does free up cache memory, and swaps stuff out, however the 'leaked'
memory only grows and is never re-claimed).

The box runs dhcpd, bind, fetchmail (cron), pppd (to adsl modem), smb,
nfs, xinetd (imapd mostly) and sshd.

It also has a (custom) iptables firewall script, and a simple ip route
hack to allow 'outbound interface == inbound interface' (using ipmark
based routing) for my cable modem & ADSL modem. It also has a 310GB RAID
0 array on 4 IDE disks.

Since this box has run several kernel versions, Red Hat releases, and
various firewall scripts, I tend to believe this is a more 'structural'
problem within the Linux kernel.

The box firewalls for both my cable modem and my adsl modem, and has 3
network cards (1 direct to cable, one direct to adsl, one to local
network).

The hardware on the box is : Asus p2b-ds, 2x p3-600, 1Gb (ECC) ram, 3
network cards (1x Intel EtherExpressPro, 2x 3c905 tx), internal Adaptec
29xx u2w scsi, internal intel IDE, 2x Seagate Cheetah (u2w) 18 Gb disks
(/ and /var), 4x 80 Gb Maxtor IDE disks (raid 0 array) and a NVidia TNT2
card. This hardware

The kernel is compiled with all network card, SCSI card and raid0 drivers
built in, and nfs + iptables as modules. The machine currently uses ext3
(also built in); however, this problem was also present before I
converted the raid0 volume to ext3, so I do not suspect it of causing
this problem. The kernel is also set for HIGHMEM (4GB) to use the last
MBs of the 1GB of RAM (otherwise 127MB isn't detected).

If there is any additional information I can provide, please feel free
to ask! Also please CC me on replies, since I am not subscribed to
the linux-kernel list.


I do not know which component (iptables / route hack / raid0 / network
cards / highmem) causes this problem. I run several of these components
on other servers without the same problems. However, in this
combination the kernel seems very leaky ;-/ Any and all suggestions or
help are greatly appreciated.


Additional info on the box:

My Routes script (to allow cable and adsl to use the same outbound
interface as inbound, to prevent invalid routing over the default gw):


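#!/bin/bash
# Source-based policy routing: traffic leaves via the provider its source
# address belongs to. The named tables "a2000" and "xs4all" are assumed to
# be defined in /etc/iproute2/rt_tables.

# Flush the kernel routing cache so stale entries are dropped
echo 1 > /proc/sys/net/ipv4/route/flush
echo "Removing old rules"
# Errors are silenced because the rules/routes may not exist yet
ip rule del from 24.132.33.179 table a2000 &>/dev/null
ip rule del from 213.84.192.197 table xs4all &>/dev/null
ip route del table a2000 &>/dev/null
ip route del table xs4all &>/dev/null
echo "Setting routing"
# Pick a per-provider table based on the packet's source address
ip rule add from 24.132.33.179 table a2000 prio 20
ip rule add from 213.84.192.197 table xs4all prio 30
# Each table gets its own default route (presumably cable on eth0, ADSL on ppp0)
ip route add 0/0 table a2000 dev eth0 prio 20
ip route add 0/0 table xs4all dev ppp0 prio 30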


free (this is after 1 1/2 day):

             total       used       free     shared    buffers     cached
Mem:       1029752    1018528      11224          0     131608     454556
-/+ buffers/cache:     432364     597388
Swap:      2104464        996    2103468



ps aux:

USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START  TIME COMMAND
root         1  0.0  0.0  1416  476 ?        S    Nov24  0:04 init [3]
root         2  0.0  0.0     0    0 ?        SW   Nov24  0:00 [keventd]
root         3  0.0  0.0     0    0 ?        SWN  Nov24  0:00 [ksoftirqd_CPU0]
root         4  0.0  0.0     0    0 ?        SWN  Nov24  0:00 [ksoftirqd_CPU1]
root         5  0.0  0.0     0    0 ?        SW   Nov24  0:13 [kswapd]
root         6  0.0  0.0     0    0 ?        SW   Nov24  0:00 [bdflush]
root         7  0.0  0.0     0    0 ?        SW   Nov24  0:03 [kupdated]
root         8  0.0  0.0     0    0 ?        SW   Nov24  0:00 [scsi_eh_0]
root         9  0.0  0.0     0    0 ?        SW<  Nov24  0:00 [mdrecoveryd]
root        10  0.0  0.0     0    0 ?        SW   Nov24  0:01 [kjournald]
root       145  0.0  0.0     0    0 ?        SW   Nov24  0:00 [kjournald]
root       146  0.0  0.0     0    0 ?        SW   Nov24  0:01 [kjournald]
root       147  0.0  0.0     0    0 ?        SW   Nov24  0:18 [kjournald]
root       719  0.0  0.0  1476  604 ?        S    Nov24  0:00 syslogd -m 0 -r
root       724  0.0  0.0  1404  476 ?        S    Nov24  0:00 klogd -2 -x
bin        744  0.0  0.0  1660  764 ?        S    Nov24  0:00 portmap
root       801  0.0  0.0  1792  568 ?        S    Nov24  0:00 rpc.rquotad
root       806  0.0  0.0  1620  716 ?        S    Nov24  0:00 rpc.mountd
root       811  0.0  0.0     0    0 ?        SW   Nov24  0:20 [nfsd]
root       812  0.0  0.0     0    0 ?        SW   Nov24  0:20 [nfsd]
root       813  0.0  0.0     0    0 ?        SW   Nov24  0:20 [nfsd]
root       814  0.0  0.0     0    0 ?        SW   Nov24  0:19 [nfsd]
root       815  0.0  0.0     0    0 ?        SW   Nov24  0:20 [nfsd]
root       816  0.0  0.0     0    0 ?        SW   Nov24  0:19 [nfsd]
root       817  0.0  0.0     0    0 ?        SW   Nov24  0:20 [nfsd]
root       818  0.0  0.0     0    0 ?        SW   Nov24  0:20 [nfsd]
root       819  0.0  0.0     0    0 ?        SW   Nov24  0:00 [lockd]
root       820  0.0  0.0     0    0 ?        SW   Nov24  0:00 [rpciod]
root       892  0.0  0.0  1920  896 ?        S    Nov24  0:00 /usr/sbin/pppd ca
root       911  0.0  0.1  2680 1084 ?        S    Nov24  0:02 /usr/sbin/sshd
root       931  0.0  0.1  2312 1032 ?        S    Nov24  0:00 xinetd -stayalive
root       951  0.0  0.0  1796  648 ?        S    Nov24  0:00 /usr/sbin/dhcpd
named     1009  0.0  0.4 15328 4364 ?        S    Nov24  0:00 named -u named
named     1011  0.0  0.4 15328 4364 ?        S    Nov24  0:01 named -u named
named     1012  0.0  0.4 15328 4364 ?        S    Nov24  0:07 named -u named
named     1013  0.0  0.4 15328 4364 ?        S    Nov24  0:06 named -u named
named     1014  0.0  0.4 15328 4364 ?        S    Nov24  0:04 named -u named
named     1015  0.0  0.4 15328 4364 ?        S    Nov24  0:01 named -u named
root      1033  0.9  0.0  1456  528 ?        S    Nov24 29:38 /usr/sbin/pptp pp
root      1043  0.0  0.1  5684 1384 ?        S    Nov24  0:10 sendmail: accepti
root      1062  0.0  0.0  1648  676 ?        S    Nov24  0:00 crond
root      1541  0.0  0.0  1388  380 tty1     S    Nov24  0:00 /sbin/mingetty tt
root      1542  0.0  0.0  1388  380 tty2     S    Nov24  0:00 /sbin/mingetty tt
root      1545  0.0  0.0  1448  560 ?        S    Nov24  0:00 /usr/sbin/pptp pp
root      7003  0.0  0.1  3260 1132 ?        S    05:00  0:00 smbd -D
root      7008  0.0  0.1  2448 1128 ?        S    05:00  0:00 nmbd -D
root      7609  0.0  0.1  3732 2016 ?        S    14:39  0:00 /usr/sbin/sshd
root      7611  0.0  0.1  2612 1448 pts/1    S    14:39  0:00 -bash
root     13333  0.0  0.0  2656  752 pts/1    R    14:57  0:00 ps aux
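
The "add the resident memory usage" step can be sketched as a small shell helper. This is only an illustration: the function name sum_rss_kb is made up here, and it assumes the usual 'ps aux' layout in which RSS is the sixth column and is reported in KB.

```sh
#!/bin/sh
# sum_rss_kb: read 'ps aux' output on stdin and print the summed RSS in KB.
# Hypothetical helper; assumes RSS is column 6 and the first line is the header.
sum_rss_kb() {
    awk 'NR > 1 { total += $6 } END { print total + 0 }'
}

# Example use on a live system:
#   ps aux | sum_rss_kb
```

The total is an upper bound rather than an exact figure: pages shared between processes are counted once per process, and threads that show up as separate entries (such as the six named lines above) are each counted in full.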





2001-11-25 15:04:15

by Florian Weimer

Subject: Re: Severe Linux 2.4 kernel memory leakage

Chris Chabot <[email protected]> writes:

> When the box keeps on running for about a month,

Which kernels have you run for about a month, and which ones showed
this extreme behavior? Obviously not 2.4.15...

The amount of available memory decreasing is quite normal, due to the
growing cache.

--
Florian Weimer [email protected]
University of Stuttgart http://cert.uni-stuttgart.de/
RUS-CERT +49-711-685-5973/fax +49-711-685-5898

2001-11-25 15:11:15

by James Morris

Subject: Re: Severe Linux 2.4 kernel memory leakage

On 25 Nov 2001, Chris Chabot wrote:

> Also it has a (custom) iptables firewall script

Are you using ipchains emulation?


- James
--
James Morris
<[email protected]>


2001-11-25 15:20:05

by Chris Chabot

Subject: Re: Severe Linux 2.4 kernel memory leakage

Nope, just plain netfilter/iptables. Specifically (lsmod output):

ipt_TOS                  880   8 (autoclean)
ipt_MASQUERADE          1312   4 (autoclean)
ipt_state                576   7 (autoclean)
ipt_REJECT              2816   7 (autoclean)
ipt_LOG                 3408  24 (autoclean)
ipt_limit               1008  26 (autoclean)
ip_nat_ftp              3184   0 (unused)
ip_conntrack_ftp        3536   0 [ip_nat_ftp]
iptable_mangle          1712   0 (autoclean) (unused)
iptable_nat            14448   1 (autoclean) [ipt_MASQUERADE ip_nat_ftp]
ip_conntrack           15056   5 (autoclean) [ipt_MASQUERADE ipt_state ip_nat_ftp ip_conntrack_ftp iptable_nat]
iptable_filter          1680   0 (autoclean) (unused)
ip_tables              11392  11 [ipt_TOS ipt_MASQUERADE ipt_state ipt_REJECT ipt_LOG ipt_limit iptable_mangle iptable_nat iptable_filter]

I've also attached the output of 'iptables -L -n' so you can get an idea
of what it's running.

-- Chris

On Sun, 2001-11-25 at 16:10, James Morris wrote:
> On 25 Nov 2001, Chris Chabot wrote:
>
> > Also it has a (custom) iptables firewall script
>
> Are you using ipchains emulation?
>
>
> - James
> --
> James Morris
> <[email protected]>
>


Attachments:
iptables-output.txt (24.46 kB)

2001-11-25 15:27:50

by Peter T. Breuer

Subject: Re: Severe Linux 2.4 kernel memory leakage

"A month of sundays ago Chris Chabot wrote:"
> The box has ran Redhat 7.1 and 7.2, with plain vanilla linux kernels
> 2.4.9 upto 2.4.15, in all situations the same problem appeared.
>
> The problem is that when the box boots up, it uses about 60Mb of memory.
> However after only 1 1/2 days, the memory usage is already around 430Mb
> (!!). (this is ofcource used - buffers - cache, as displayed by 'free').

I also have this problem. Unknown circumstances provoke it. Kernel
2.4.9 to 2.4.13. When it occurs I lose about 30MB a day.

Dual 500MHz i686, 4 scsi disks (adaptec) under raid5 and raid0
with 2 intelpro's and 1 IDE disk (and xfs and lvm).

Right now I'm on 2.4.9 and it's NOT happening. Doing nothing different
to any other day.

> When the box keeps on running for about a month, the memory usage gets
> so high that it turns into a swap-crazy, low-memory and slow server ;-/
> (it does free up cache memory, and swaps stuff out, however the 'leaked'
> memory only grows and is never re-claimed).

Same.

> The box runs dhcpd, bind, fetchmail (cron), pppd (to adsl modem), smb,
> nfs, xinetd (imapd mostly) and sshd.

Only thing in common with me is nfs. Running X 4.1. glibc 2.1.

> based routing) for my cable modem & adsl modem. Also it has a 310Gb raid
> 0 array on 4 IDE disks.

Could be.

> The hardware on the box is : Asus p2b-ds, 2x p3-600, 1Gb (ECC) ram, 3

My mobo is whatever came from dell, and you also are running 2xP3. My
ram is also ECC but there's only 128MB of it.

> network cards (1x Intel EtherExpressPro, 2x 3c905 tx), Internal adaptect

I have 2 network cards, both EEPRO.

> 29xx u2w scsi, internal intel IDE, 2x Seagate Cheetah (u2w) 18 Gb disks

Yep, I have internal adaptec too. Aic7xxx running ultra 160 at 20MHz
on terminated cable.

Adaptec AIC7xxx driver version: 6.2.1
aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/255 SCBs

4 WD disks:

Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: WDIGTL Model: WDE9100 ULTRA2 Rev: 1.21
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: WDIGTL Model: WDE9100 ULTRA2 Rev: 1.21
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 02 Lun: 00
Vendor: WDIGTL Model: WDE9100 ULTRA2 Rev: 1.21
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 03 Lun: 00
Vendor: WDIGTL Model: WDE9100 ULTRA2 Rev: 1.21
Type: Direct-Access ANSI SCSI revision: 02

> (/ and /var), 4x 80 Gb Maxtor IDE disks (raid 0 array) and a NVidia TNT2
> card. This hardware

Umm .. I think I run ati rage, external card, though there is one on
the mobo.

(--) PCI:*(0:16:0) ATI Mach64 GU rev 154, Mem @ 0xf5000000/24,
0xfe201000/12, I/O @ 0xd400/8
(--) PCI: (1:0:0) ATI Mach64 GW rev 122, Mem @ 0xfc000000/24,
0xfbfff000/12, I/O @ 0xec00/8

> The kernel is compiled with all network- and scsi card and raid0 drivers
> build in, and nfs + iptables as modules. The machine currently uses ext3

I have it all compiled OUT. Including iptables, which I don't use.

> (also build in), however this problem was also present before i
> converted the raid0 volume to ext3, so i do not suspect it to cause this

I am using xfs on top of lvm on top of raid5.

> problem. The kernel is also set for HIGHMEM (4gb) to use the last Mb's
> of the 1Gb of ram (else 127Mb isnt detected).

Mine isn't. Normal setup.

> I do not know which component (iptables / route hack / raid0 / network
> cards / highmem) cause this problem. I run several of these components

Looks from this as though it might be raid5 or 0 + adaptec scsi + SMP.

Peter

2001-11-25 15:30:35

by Chris Chabot

Subject: Re: Severe Linux 2.4 kernel memory leakage

The kernel I ran for about a month was kernel 2.4.11.

Of course I am aware that memory usage grows as more memory is used
for buffers/cache (especially since it's also a large file server).

However, if you check my 'free' output and the ps aux output, you will
notice that the 430MB used is with the cache and buffer usage already
subtracted from the 'total usage' (otherwise usage is just below 1GB).

Of the 430MB, just below 80MB is used by the applications (counting the
ps aux RSS values). The rest is just 'missing'.

So the current memory division is about (sources: applications = summed
ps aux output, buffers/cache/free = 'free' command, SysV shm from 'ipcs'):

Applications:    80MB
Buffers:        127MB
Cache:          460MB
SysV shm:         0
Free:           9.5MB

Memory total:     1GB

Unaccounted: +/- 360MB

PS: yes, I did check /dev/shm and 'ipcs', and no memory is used as SysV
shared memory.
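
The accounting above is plain subtraction, which a rough shell sketch makes explicit. The helper name unaccounted_kb is hypothetical; the first four inputs come from the 'free' output posted at the start of the thread, and 81920 KB stands in for the "just below 80" figure for applications.

```sh
#!/bin/sh
# unaccounted_kb TOTAL FREE BUFFERS CACHED APPS_RSS   (all values in KB)
# Prints the memory that is neither free, buffers, page cache, nor
# attributed to application RSS. Hypothetical helper for illustration.
unaccounted_kb() {
    echo $(( $1 - $2 - $3 - $4 - $5 ))
}

# Values from the posted 'free' output, with 80 * 1024 KB assumed for applications:
unaccounted_kb 1029752 11224 131608 454556 81920   # prints 350444
```

350444 KB is about 342MB, in the same range as the rounded estimate above.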


-- Chris


On Sun, 2001-11-25 at 16:03, Florian Weimer wrote:
> Chris Chabot <[email protected]> writes:
>
> > When the box keeps on running for about a month,
>
> Which kernels have you run for about a month, and which ones showed
> this extreme behavior? Obviously not 2.4.15...
>
> The amount of available memory decreasing is quite normal, due to the
> growing cache.
>
> --
> Florian Weimer [email protected]
> University of Stuttgart http://cert.uni-stuttgart.de/
> RUS-CERT +49-711-685-5973/fax +49-711-685-5898


2001-11-25 15:39:58

by Chris Chabot

Subject: Re: Severe Linux 2.4 kernel memory leakage

Hi, It's almost nice to hear i am not the only person with this problem
;-)


After reading your email, the common factors seem to be:

- Dual P3 Setup (diff mobo, same intel chipset?)
- Software Raid
- Mixed IDE / SCSI
- Internal Adaptec AHA-29xx
(I have the 80MB/s version, you the 160MB/s one)
- Multiple network cards
- Intel EtherExpress Pro 10/100
- ECC checking ram


Since you don't use weird routes or iptables, I think it's possible to
assume these do not cause the problem. Also, since you do not have the
problem under 2.4.9 (ditto for me, if I remember correctly), it is safe to
assume the bug was introduced in kernel 2.4.10 or later.



PS: the reason I immediately switched to 2.4.11 (and up), and don't have
a lot of experience with 2.4.9, is that my Dell servers with Adaptec
hardware RAID are a LOT (> 100%) faster under those newer kernels.
Somehow a block layer change in that kernel sped up the Dells a lot,
and since I love consistent kernel versions across all my machines, I
upgraded my own boxes as well.

-- Chris




2001-11-25 15:42:18

by Phil Sorber

Subject: Re: Severe Linux 2.4 kernel memory leakage

On Sun, 2001-11-25 at 10:30, Chris Chabot wrote:
> The kernel i ran for about a month was kernel 2.4.11.
>

wasn't kernel 2.4.11 labeled "dontuse"?

that had a serious bug in it.

>
--
Phil Sorber
AIM: PSUdaemon
IRC: irc.openprojects.net #psulug PSUdaemon
GnuPG: keyserver - pgp.mit.edu


Attachments:
(No filename) (232.00 B)

2001-11-25 15:51:08

by Chris Chabot

Subject: Re: Severe Linux 2.4 kernel memory leakage

Hi Phil, I think you are right. When I look at /boot (a good way of seeing
which & when), it tells me that my upgrade schedule was:

Aug 22 2001 bzImage-2.4.9
Sep 27 07:17 bzImage-2.4.10
Oct 12 04:27 bzImage-2.4.11
Oct 12 16:58 bzImage-2.4.12
Nov 8 17:26 bzImage-2.4.13
Nov 10 10:41 bzImage-2.4.14
Nov 24 10:46 bzImage-2.4.15

So apparently 2.4.11 was 'dontuse' (2.4.12 followed later the same day in
my upgrade cycle). Then for almost a month no upgrades while running
2.4.12, and from there on following the kernel upgrade cycle again.

-- Chris


On Sun, 2001-11-25 at 16:41, Phil Sorber wrote:
> On Sun, 2001-11-25 at 10:30, Chris Chabot wrote:
> > The kernel i ran for about a month was kernel 2.4.11.
> >
>
> wasn't kernel 2.4.11 labeled "dontuse"?
>
> that had a serious bug in it.
>
> >
> --
> Phil Sorber
> AIM: PSUdaemon
> IRC: irc.openprojects.net #psulug PSUdaemon
> GnuPG: keyserver - pgp.mit.edu


2001-11-25 15:49:50

by FD Cami

Subject: Re: Severe Linux 2.4 kernel memory leakage

Phil Sorber wrote:

> On Sun, 2001-11-25 at 10:30, Chris Chabot wrote:
>
>>The kernel i ran for about a month was kernel 2.4.11.
>>
>>
>
> wasn't kernel 2.4.11 labeled "dontuse"?
>
> that had a serious bug in it.
>
>

I'm wondering why 2.4.15-greased-turkey isn't labelled
2.4.15-bad-turkey, or in other words 2.4.15-dontuse

François

2001-11-25 15:59:19

by Mr. Shannon Aldinger

Subject: Re: Severe Linux 2.4 kernel memory leakage

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 25 Nov 2001, Chris Chabot wrote:

> Of 430Mb, (counting ps aux res values), just below 80 Mb is used by the
> applications. the rest is just 'missing'.
>

Are you using tmpfs, that had problems in the earlier 2.4.x's IIRC.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: Made with pgp4pine 1.76

iEYEARECAAYFAjwBFHwACgkQwtU6L/A4vVDmzgCeITZ6/njcztWClfPfthOGTnfE
io8An0l2BPZIyJGhhXijFfYoTl/OsTyL
=Q2bB
-----END PGP SIGNATURE-----


2001-11-25 16:15:23

by Florian Weimer

Subject: Re: Severe Linux 2.4 kernel memory leakage

"Mr. Shannon Aldinger" <[email protected]> writes:

> Are you using tmpfs, that had problems in the earlier 2.4.x's IIRC.

I've seen tmpfs problems with 2.4.13+xfs, BTW: As soon as /tmp grows
so large that something has to be swapped out, the machine essentially
locks. Known problem?

(I'm going to debug this some day and provide more details, but
currently, I'm busy setting up a new machine, and this one can't be
used for such testing any longer.)

--
Florian Weimer [email protected]
University of Stuttgart http://cert.uni-stuttgart.de/
RUS-CERT +49-711-685-5973/fax +49-711-685-5898

2001-11-25 17:04:07

by Andi Kleen

Subject: Re: Severe Linux 2.4 kernel memory leakage

"Peter T. Breuer" <[email protected]> writes:

> "A month of sundays ago Chris Chabot wrote:"
> > The box has ran Redhat 7.1 and 7.2, with plain vanilla linux kernels
> > 2.4.9 upto 2.4.15, in all situations the same problem appeared.
> >
> > The problem is that when the box boots up, it uses about 60Mb of memory.
> > However after only 1 1/2 days, the memory usage is already around 430Mb
> > (!!). (this is ofcource used - buffers - cache, as displayed by 'free').
>
> I also have this problem. Unknown circumstances provoke it. Kernel
> 2.4.9 to 2.4.13. When it occurs I lose about 30MB a day.

Compare snapshots of /proc/slabinfo before and after.

It may be completely harmless; e.g. a slab cache. free is unfortunately
quite misleading with newer kernels; it doesn't give information about
many important caches (e.g. not about the slab caches)

-Andi
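
Comparing two /proc/slabinfo snapshots can be sketched roughly as follows. The helper name slab_growth and the snapshot file names are made up for illustration, and the sketch assumes that each data line starts with the cache name, active objects, total objects and object size in bytes (columns 1 to 4); header lines beginning with "slabinfo" or "#" are skipped.

```sh
#!/bin/sh
# slab_growth BEFORE AFTER
# Print every slab cache whose approximate footprint (total objects times
# object size) grew between two saved copies of /proc/slabinfo, in KB.
# Hypothetical helper; the column layout is an assumption stated above.
slab_growth() {
    awk '
        /^slabinfo/ || /^#/ { next }                  # skip header lines
        NR == FNR { before[$1] = $3 * $4; next }      # first file: remember sizes
        {
            delta = $3 * $4 - before[$1]              # second file: compare
            if (delta > 0) printf "%s %d\n", $1, delta / 1024
        }
    ' "$1" "$2"
}

# Example use (file names are placeholders):
#   cat /proc/slabinfo > slab.before
#   ... wait a day ...
#   cat /proc/slabinfo > slab.after
#   slab_growth slab.before slab.after | sort -k2 -n -r
```

A cache that keeps growing across several snapshots is a candidate for the missing memory; one that grows and later shrinks under memory pressure is more likely an ordinary cache.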


2001-11-25 17:20:02

by Phil Sorber

Subject: Re: Severe Linux 2.4 kernel memory leakage

On Sun, 2001-11-25 at 10:51, Chris Chabot wrote:
> Nov 24 10:46 bzImage-2.4.15

are you running this now? cause it has a major bug too :) i am running
it, but i patched. just a heads up if you didn't see this on the list
already...

--
Phil Sorber
AIM: PSUdaemon
IRC: irc.openprojects.net #psulug PSUdaemon
GnuPG: keyserver - pgp.mit.edu


Attachments:
(No filename) (232.00 B)

2001-11-25 18:47:19

by Chris Chabot

Subject: Re: Severe Linux 2.4 kernel memory leakage

Thanks for the heads-up. Yeah, I saw the news on ... well, just about every
Linux news site, and a few more ;-) Figured I could wait for 2.4.16, since
I hope it will be released before I have to unmount anything ;-)

-- Chris

On Sun, 2001-11-25 at 18:17, Phil Sorber wrote:
> On Sun, 2001-11-25 at 10:51, Chris Chabot wrote:
> > Nov 24 10:46 bzImage-2.4.15
>
> are you running this now? cause it has a major bug too :) i am running
> it, but i patched. just a heads up if you didn't see this on the list
> already...
>
> --
> Phil Sorber
> AIM: PSUdaemon
> IRC: irc.openprojects.net #psulug PSUdaemon
> GnuPG: keyserver - pgp.mit.edu


2001-11-26 05:50:33

by Mike Galbraith

Subject: Re: Severe Linux 2.4 kernel memory leakage

On 25 Nov 2001, Chris Chabot wrote:

> Hi, I have a firewall / file server box which is displaying (severe)
> memory leakage, presumably by the kernel.
>
> The box has ran Redhat 7.1 and 7.2, with plain vanilla linux kernels
> 2.4.9 upto 2.4.15, in all situations the same problem appeared.

With 2.4.9 as well? I have an IKD patch for 2.4.7 which I could
update to 2.4.9 fairly quickly if you'd like to try memleak on the
thing. It might even go in fairly cleanly as is.

-Mike

2001-11-26 09:50:56

by Chris Chabot

Subject: Re: Severe Linux 2.4 kernel memory leakage

After I received an email from Peter T., who had the same problem but
_not_ under 2.4.9, I re-checked, and indeed the problems don't appear in
kernel versions <= 2.4.9. So the memory leakage was 'introduced' in
either 2.4.10 or 2.4.11.

My current suspicion is that the software RAID layer could be what's
causing the leakage, but it could also be a combination of factors (see
my previous email to Peter T. / lkml about the common factors in the 2
situations). On the other hand, I'm willing to try anything ;-)

-- Chris

On Mon, 2001-11-26 at 07:49, Mike Galbraith wrote:
> On 25 Nov 2001, Chris Chabot wrote:
>
> > Hi, I have a firewall / file server box which is displaying (severe)
> > memory leakage, presumably by the kernel.
> >
> > The box has ran Redhat 7.1 and 7.2, with plain vanilla linux kernels
> > 2.4.9 upto 2.4.15, in all situations the same problem appeared.
>
> With 2.4.9 as well? I have an IKD patch for 2.4.7 which I could
> update to 2.4.9 fairly quickly if you'd like to try memleak on the
> thing. It might even go in fairly cleanly as is.
>
> -Mike


2001-11-26 15:16:32

by Christoph Rohland

Subject: Re: Severe Linux 2.4 kernel memory leakage

Hi Shannon,

On Sun, 25 Nov 2001, Shannon Aldinger wrote:
> Are you using tmpfs, that had problems in the earlier 2.4.x's IIRC.

tmpfs always shows up as cached (stock) or shared (in -ac)

Greetings
Christoph
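
Whether tmpfs is in use at all can be checked from the mount table. A minimal sketch, assuming the standard /proc/mounts layout (device, mount point, filesystem type, options) and that the filesystem type is reported as "tmpfs"; the helper name tmpfs_mounts is hypothetical.

```sh
#!/bin/sh
# tmpfs_mounts: read a mount table in /proc/mounts format on stdin and
# print the mount point of every tmpfs filesystem. Hypothetical helper.
tmpfs_mounts() {
    awk '$3 == "tmpfs" { print $2 }'
}

# Example use on a live system:
#   tmpfs_mounts < /proc/mounts
```

Running 'df -k' on each mount point printed then shows how much of the cached (or shared) figure could be tmpfs contents.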


2001-11-26 20:22:41

by Bill Davidsen

Subject: Re: Severe Linux 2.4 kernel memory leakage

On Sun, 25 Nov 2001, Andi Kleen wrote:

> "Peter T. Breuer" <[email protected]> writes:
>
> > "A month of sundays ago Chris Chabot wrote:"
> > > The box has ran Redhat 7.1 and 7.2, with plain vanilla linux kernels
> > > 2.4.9 upto 2.4.15, in all situations the same problem appeared.
> > >
> > > The problem is that when the box boots up, it uses about 60Mb of memory.
> > > However after only 1 1/2 days, the memory usage is already around 430Mb
> > > (!!). (this is ofcource used - buffers - cache, as displayed by 'free').
> >
> > I also have this problem. Unknown circumstances provoke it. Kernel
> > 2.4.9 to 2.4.13. When it occurs I lose about 30MB a day.
>
> Compare snapshots of /proc/slabinfo before and after.

This may be useful, but I've never seen anything like that magnitude of
usage, either on dns servers (some of mine are up ~150 days), or usenet
servers (several about to hit the 497 day problem). It will be
interesting to see what's reported, though.

> It may be completely harmless; e.g. a slab cache. free is unfortunately
> quite misleading with newer kernels; it doesn't give information about
> many important caches (e.g. not about the slab caches)

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.