2001-07-30 03:27:44

by William M. Shubert

Subject: Leak in network memory?

Hi. I have an application that does a lot of nonblocking networking I/O
and is fairly sensitive to how much data can be held in the output
buffers of sockets. All sockets are set to have 64KB (the default) of
output buffering. This application had been running well with very long
uptimes for over a year in the 2.2 kernels.
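
For reference, the buffer-size setup described above is typically done with SO_SNDBUF. This is an illustrative sketch (not the poster's actual code); note that Linux doubles the requested size internally to account for bookkeeping overhead, so reading the option back usually reports twice the requested value:

```python
import socket

# Create a TCP socket and request a 64 KB send buffer, then read the
# effective value back.  Linux doubles the requested size internally,
# so getsockopt() typically reports 2 * 65536 here (subject to the
# wmem_max sysctl ceiling).
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 64 * 1024)
sndbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print("effective SO_SNDBUF:", sndbuf)
s.close()
```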

A couple of months ago I upgraded my server to RH 7.1 (with the 2.4.2-2 Red
Hat kernel). At first it ran fine, but now after an uptime of 67 days
I'm starting to see strange problems. It seems as if only a very small
amount of memory can be held in the output buffer of each socket, even
though they are still set to 64KB! There isn't a tremendous amount of
network traffic going on (about 30-100 sockets open at a time, but
rather low total bandwidth). The fact that each write to a socket only
writes a few (<8) kbytes is really messing with my performance. I did
not see this problem until the past week. I tried to trace through the
kernel code to see why the kernel would be refusing to give me the
buffering that I ask for, and it looks like if the network code thinks
that it is using too much memory, then it will behave this way. I'm not
100% sure of this, though...which is why I'm posting this message.
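
One way to observe the short-write behavior described above from user space is to count how many bytes a single nonblocking socket will accept before the kernel refuses more. This is a hypothetical measurement sketch, not the poster's code (a Unix-domain socketpair stands in for a real TCP peer; a nonblocking TCP socket signals the same condition with EAGAIN, which Python surfaces as BlockingIOError):

```python
import socket

# Count how much data one side of a connection accepts before the
# kernel pushes back.  On a nonblocking socket, send() raising
# BlockingIOError is the EAGAIN case: the send buffer (plus the peer's
# receive buffer, since nothing is draining it) has filled up.
a, b = socket.socketpair()
a.setblocking(False)
chunk = b"x" * 4096
total = 0
try:
    while True:
        total += a.send(chunk)
except BlockingIOError:
    pass  # buffers full; this is where a real app would wait for POLLOUT
print("bytes accepted before EAGAIN:", total)
a.close()
b.close()
```

If the kernel were honoring a 64 KB send buffer, one would expect `total` to be at least in that neighborhood; a figure of only a few kilobytes would match the symptom above.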

Does anybody have any hints on how I can track down exactly why my
output buffers aren't working? I see lots of /proc info related to
network parameters, but there is little documentation on them. Is there
a known bug like this in the RH 2.4.2-2 kernel? Would a newer kernel
help me? (I know, I could just try upgrading and waiting another 60
days, but 24x7 reliability is very important to my users so I'd rather
not reboot unless I know that it will help). I searched the archives of
this mailing list, and found a few interesting references to network memory
consumption in the changelog of the Alan Cox series, but nothing that
explicitly described a problem like this. Thanks to anybody who can help
me out here.
--

Bill Shubert ([email protected]) <mailto:[email protected]>
http://www.igoweb.org/~wms/ <http://igoweb.org/%7Ewms/>



2001-07-30 14:40:16

by Matthew G. Marsh

Subject: Re: Leak in network memory?

On Sun, 29 Jul 2001, William M. Shubert wrote:

> Hi. I have an application that does a lot of nonblocking networking I/O
> and is fairly sensitive to how much data can be held in the output
> buffers of sockets. All sockets are set to have 64KB (the default) of
> output buffering. This application had been running well with very long
> uptimes for over a year in the 2.2 kernels.

Yes. Same here only using an application that receives data over the
network.

> A couple of months ago I upgraded my server to RH 7.1 (with the 2.4.2-2 Red
> Hat kernel). At first it ran fine, but now after an uptime of 67 days
> I'm starting to see strange problems. It seems as if only a very small
> amount of memory can be held in the output buffer of each socket, even
> though they are still set to 64KB! There isn't a tremendous amount of
> network traffic going on (about 30-100 sockets open at a time, but
> rather low total bandwidth). The fact that each write to a socket only
> writes a few (<8) kbytes is really messing with my performance. I did
> not see this problem until the past week. I tried to trace through the
> kernel code to see why the kernel would be refusing to give me the
> buffering that I ask for, and it looks like if the network code thinks
> that it is using too much memory, then it will behave this way. I'm not
> 100% sure of this, though...which is why I'm posting this message.

Worse here - the app keeps adding memory and the size of the memory is
almost exactly equal to the amount of data transferred in (plus a few
bytes of overhead). This memory is permanently cached and never released.
We have an open case with RH ....

> Does anybody have any hints on how I can track down exactly why my
> output buffers aren't working? I see lots of /proc info related to
> network parameters, but there is little documentation on them. Is there
> a known bug like this in the RH 2.4.2-2 kernel? Would a newer kernel
> help me? (I know, I could just try upgrading and waiting another 60
> days, but 24x7 reliability is very important to my users so I'd rather
> not reboot unless I know that it will help). I searched the archives of
> this mailing list, and found a few interesting references to network memory
> consumption in the changelog of the Alan Cox series, but nothing that
> explicitly described a problem like this. Thanks to anybody who can help
> me out here.

We were using the 2.4.5 kernel and were told to go back to the original
kernel, and it got worse. ?? When I find out more - right now it looks
like a memory leak in glibc, but... - I will let you know.

> Bill Shubert ([email protected]) <mailto:[email protected]>
> http://www.igoweb.org/~wms/ <http://igoweb.org/%7Ewms/>

--------------------------------------------------
Matthew G. Marsh, President
Paktronix Systems LLC
1506 North 59th Street
Omaha NE 68104
Phone: (402) 932-7250 x101
Email: [email protected]
WWW: http://www.paktronix.com
--------------------------------------------------

2001-07-31 01:40:34

by William M. Shubert

Subject: Re: Leak in network memory?

Thanks for your response. I think that we have different problems though
- my application is not growing at all, so it doesn't seem to be a glibc
problem. Instead the kernel is refusing my "write()" calls with EAGAIN
even though I know that I have written only a few kbytes and my output
buffer size is set to 64K. Because this took 60+ days to start
happening, I'm guessing that the kernel network code is either leaking
memory or else is miscounting its memory consumed over time...what I
really need to know is what I can do to confirm or refute this guess. It
is also possible of course that there is no leak but my kernel has the
"network takes too much memory" threshold set too low (and I was just
lucky until now and didn't see the problem). I have looked at
"/proc/sys/net/ipv4/tcp_mem", and it claims that it would take 48640
pages (=200MB) before the TCP stack starts feeling memory pressure. I
know that my TCP stack is not using this much memory, because I have
only 256MB in the system and 150MB is under "active" in /proc/meminfo
(I'm assuming this total does not include TCP data?)...so how can I
check how much the TCP stack thinks it is currently using, and why it is
refusing my "write()" calls?
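
One place to look is /proc/net/sockstat, which on 2.4-era kernels reports (among other counters) a "mem" field: the number of pages the TCP stack currently believes it has allocated. That figure can be compared directly against the three tcp_mem thresholds. The sketch below is illustrative; the sample line mirrors the usual 2.4 format, but field layout may vary by kernel version:

```python
# Pull the TCP "mem" page count out of a /proc/net/sockstat dump, so it
# can be compared against the thresholds in /proc/sys/net/ipv4/tcp_mem.
def tcp_pages(sockstat_text):
    """Return the 'mem' page count from /proc/net/sockstat text."""
    for line in sockstat_text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()
            return int(fields[fields.index("mem") + 1])
    return None

# Sample line in the 2.4 format (illustrative values):
sample = "TCP: inuse 38 orphan 0 tw 2 alloc 41 mem 5"
print("TCP pages in use:", tcp_pages(sample))

# On a live Linux box, read the real counters:
try:
    with open("/proc/net/sockstat") as f:
        print(f.read())
except OSError:
    pass  # /proc unavailable (non-Linux system)
```

If the "mem" value here is anywhere near the 48640-page pressure threshold while the system is otherwise idle, that would support the miscounting/leak theory.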

Matthew G. Marsh wrote:

>On Sun, 29 Jul 2001, William M. Shubert wrote:
>
>>...At first it ran fine, but now after an uptime of 67 days
>>I'm starting to see strange problems. It seems as if only a very small
>>amount of memory can be held in the output buffer of each socket, even
>>though they are still set to 64KB!
>>...
>>I tried to trace through the
>>kernel code to see why the kernel would be refusing to give me the
>>buffering that I ask for, and it looks like if the network code thinks
>>that it is using too much memory, then it will behave this way. I'm not
>>100% sure of this, though...which is why I'm posting this message.
>>
>Worse here - the app keeps adding memory and the size of the memory is
>almost exactly equal to the amount of data transferred in (plus a few
>bytes of overhead). This memory is permanently cached and never released.
>We have an open case with RH ....
>
>>Does anybody have any hints on how I can track down exactly why my
>>output buffers aren't working? I see lots of /proc info related to
>>network parameters, but there is little documentation on them.
>>
>We were using the 2.4.5 kernel and were told to go back to the original
>kernel and it got worse. ?? When I find out more - looks like a memory
>leak in the glibc right now but... - I will let you know.
>
--

Bill Shubert ([email protected]) <mailto:[email protected]>
http://www.igoweb.org/~wms/ <http://igoweb.org/%7Ewms/>