2003-11-22 15:27:53

by anand

Subject: 2.6.0-test9 : bridge freezes

Hi,

I am one of the system administrators who manage a campus network of 5000 users
that is connected to the Internet. We have placed a Linux bridge between the
Internet and the campus. To counter network flooding, we use iptables. The
kernel is 2.6.0-test9, and the ethernet cards are RTL8139.

The problem is: after 3 to 4 hours of operation, the bridge stops working and
the machine becomes unusable. It doesn't respond to the keyboard, and there is
no video display. In simple terms, it freezes. Before moving to 2.6.0-test9 I
tried 2.4.20 with the bridge patches for iptables support. It worked reliably,
except that after a while I could not even log in on the console because the
shell prompt never appeared.

For now I have gone back to 2.4.20 for the sake of robustness. Can someone tell
me what I can do to run a 2.6.x kernel with a good amount of confidence, so
that I can keep the campus users happy? I can only guess whether the problem
lies in the network drivers, in some power management issue, or elsewhere.

Any input from you will be really useful. I am eager to do any amount of
debugging; the trouble is that I don't know what to look for.


Anand


2003-11-22 16:19:28

by Gene Heskett

Subject: Re: 2.6.0-test9 : bridge freezes

On Saturday 22 November 2003 10:27, SVR Anand wrote:
>Hi,
>
>I am one of the system administrators who manage a campus network of
> 5000 users that is connected to Internet. We have placed a Linux
> bridge to isolate the Internet from the campus. To nullify network
> flooding effect, we have used iptables. The kernel is 2.6.0-test9,
> the ethernet cards that are used are RTL8139.
>
>The problem is : After 3 to 4 hours of functioning, the bridge stops
> working and the machine becomes unusable where it doesn't respond
> to keyboard, and there is no video display. In simple terms it
> freezes. Before going in for 2.6.0-test9 I have tried 2.4.20 with
> bridge patches for iptables support. It worked reliably except that
> I cannot even login from the console because I don't get the shell
> prompt after a while.
>
>Presently I have gone back to 2.4.20 for the sake of robustness. Can
> someone let me know what I can do to use 2.6.x kernel with a good
> amount of confidence so that I can keep the campus users happy ? I
> am making guess work as to whether the problem is with the network
> drivers, or some power management issues, and so on.
>
>Any inputs from you will be really useful. I am eager to try out any
> amount of debugging, the thing is I don't know what to look for.

Neither do I, but I can report that iptables is apparently stable,
witness this from my firewall machine:
---------------------
[root@gene root]# uname -a
Linux gene.coyote.den 2.4.21-rc1-ck6 #9 Mon May 5 23:31:30 EDT 2003
i586 unknown
[root@gene root]# uptime
11:14am up 35 days, 3:19, 2 users, load average: 1.00, 1.00, 1.00
---------------------
Now admittedly it doesn't have 5000 users, just one. But every time I've had
problems like the ones you describe at the TV station, where the user count is
about 65, they turned out to be hardware related. I'd start by letting memtest86
run on that box for a couple of days; maybe it will find some flaky memory.

--
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz 512M
99.27% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.

2003-11-22 16:20:51

by Linus Torvalds

Subject: Re: 2.6.0-test9 : bridge freezes


On Sat, 22 Nov 2003, SVR Anand wrote:
>
> The problem is : After 3 to 4 hours of functioning, the bridge stops working
> and the machine becomes unusable where it doesn't respond to keyboard, and
> there is no video display.

Sounds like a memory leak somewhere. It would probably be interesting to
watch /proc/slabinfo every five minutes or so, and see what happens.

Linus
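Linus's suggestion of sampling /proc/slabinfo every five minutes can be
automated. The following is a minimal Python sketch (not something posted in
this thread), assuming the slabinfo layout in which each cache line starts with
name, active_objs, num_objs and objsize, as in the 2.4 (version 1.1) and 2.6
(version 2.0) formats. The function names, the 300-second interval and the
top-10 cutoff are illustrative choices. It prints the caches whose approximate
memory use (num_objs times objsize, ignoring slab overhead) has grown most since
the first sample; a leaking cache should show a figure that keeps climbing.

```python
import sys
import time


def parse_slabinfo(text):
    """Return {cache_name: (active_objs, num_objs, objsize)} from slabinfo text."""
    caches = {}
    for line in text.splitlines():
        # Skip blank lines, the "slabinfo - version" banner and "#" column headers.
        if not line.strip() or line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        try:
            caches[fields[0]] = (int(fields[1]), int(fields[2]), int(fields[3]))
        except (IndexError, ValueError):
            continue  # tolerate lines this sketch does not understand
    return caches


def cache_growth(before, after):
    """Return [(name, extra_bytes)] for caches that grew, largest growth first."""
    growth = []
    for name, (_, num_after, objsize) in after.items():
        # A cache that did not exist in the earlier sample counts from zero.
        num_before = before.get(name, (0, 0, objsize))[1]
        delta = (num_after - num_before) * objsize
        if delta > 0:
            growth.append((name, delta))
    growth.sort(key=lambda item: item[1], reverse=True)
    return growth


def watch(path="/proc/slabinfo", interval=300, top=10):
    """Sample `path` every `interval` seconds; print growth since the first sample."""
    with open(path) as f:
        baseline = parse_slabinfo(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            current = parse_slabinfo(f.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for name, delta in cache_growth(baseline, current)[:top]:
            print("%s %-24s +%d KiB" % (stamp, name, delta // 1024))
        sys.stdout.flush()
```

Call watch() as root (recent kernels restrict /proc/slabinfo to root). Since the
box freezes hard, sending the output to another machine is safer than a local
file, which may lose whatever had not yet reached the disk.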

2003-11-22 19:19:07

by Markus Hästbacka

Subject: Re: 2.6.0-test9 : bridge freezes

Hi!
I had this problem on my router too: the computer froze after about 3-4 hours.
In my case 2.6.0-test4 worked, but test8 locked up (I didn't test anything
between test4 and test8).

Regards,
Markus

On Sat, 2003-11-22 at 17:27, SVR Anand wrote:
> The problem is : After 3 to 4 hours of functioning, the bridge stops working
> and the machine becomes unusable where it doesn't respond to keyboard, and
> there is no video display.
--
"Software is like sex, it's better when it's free."
Markus Hästbacka <midian at ihme.org>



2003-11-23 23:26:24

by David Miller

Subject: Re: 2.6.0-test9 : bridge freezes

On Sat, 22 Nov 2003 08:20:40 -0800 (PST)
Linus Torvalds wrote:

> On Sat, 22 Nov 2003, SVR Anand wrote:
> >
> > The problem is : After 3 to 4 hours of functioning, the bridge stops working
> > and the machine becomes unusable where it doesn't respond to keyboard, and
> > there is no video display.
>
> Sounds like a memory leak somewhere. It would probably be interesting to
> watch /proc/slabinfo every five minutes or so, and see what happens..

Also, we've certainly fixed some serious networking bugs since test9
came out.

2003-11-24 00:03:05

by Markus Hästbacka

Subject: Re: 2.6.0-test9 : bridge freezes

I wonder how it's possible that test4 worked fine and then something
like this comes up? (I DID report this earlier, but who would care?)

Also, it's been too long since test9, and there aren't many people testing the
bk's at all.

There may be a reason people don't test the bk's: perhaps their experience with
the 2.4 bk's, yes, the ones that won't even compile or boot. So I'd suggest
removing the 2.4 bk's from kernel.org entirely.

Regards,
Markus

On Mon, 2003-11-24 at 01:26, David S. Miller wrote:
> Also, we've certainly fixed some serious networking bugs since test9
> came out.
--
"Software is like sex, it's better when it's free."
Markus Hästbacka <midian at ihme dot org>



2003-11-24 19:09:18

by Stephen Hemminger

Subject: Re: 2.6.0-test9 : bridge freezes

On Sat, 22 Nov 2003 20:57:44 +0530 (GMT+05:30)
SVR Anand wrote:

> Hi,
>
> I am one of the system administrators who manage a campus network of 5000 users
> that is connected to Internet. We have placed a Linux bridge to isolate the
> Internet from the campus. To nullify network flooding effect, we have used
> iptables. The kernel is 2.6.0-test9, the ethernet cards that are used are
> RTL8139.
>
> The problem is : After 3 to 4 hours of functioning, the bridge stops working
> and the machine becomes unusable where it doesn't respond to keyboard, and
> there is no video display. In simple terms it freezes. Before going in for
> 2.6.0-test9 I have tried 2.4.20 with bridge patches for iptables support. It
> worked reliably except that I cannot even login from the console because
> I don't get the shell prompt after a while.
>
> Presently I have gone back to 2.4.20 for the sake of robustness. Can someone
> let me know what I can do to use 2.6.x kernel with a good amount of confidence
> so that I can keep the campus users happy ? I am making guess work as
> to whether the problem is with the network drivers, or some power management
> issues, and so on.
>
> Any inputs from you will be really useful. I am eager to try out any amount
> of debugging, the thing is I don't know what to look for.
>
>
> Anand

Linus is right, this is probably a memory leak. There are several areas
that could be the problem:
- core networking
- iptables
- iptables filter
- ethernet bridging
- ethernet driver (rtl8139)

To find and fix the problem, we need to narrow down the scope. It would help
to know: what iptables rules are you using? Are any errors showing up on the
ethernet devices? Also, what does the bridge forwarding table look like? Are
there lots of entries? Are you running spanning tree?


2003-11-25 06:40:21

by anand

Subject: Re: 2.6.0-test9 : bridge freezes

Hi,

To begin with, thanks a lot for the concern you have all shown in addressing
my problem.

This morning I put in a test9-bk25 image to see whether the problem disappears.
The result should be known within the next few hours. If the problem persists,
I hope it is OK to send you the slabinfo output.

I plan to test in stages.

i) Just bridging, no iptables
ii) With iptables.

I have a very limited set of iptables rules; in fact, they do little more than
block ICMP. The ethernet devices report no errors.

Anand

PS: The latest test10 stops during boot while initialising my aic7xxx SCSI
controller, so I had to use bk25.
>
> Linus is right, this is probably a memory leak issue. There are several areas
> that could be the problem:
> - core networking
> - iptables
> - iptables filter
> - ethernet bridging
> - ethernet driver (rtl8139)
>
> To find/fix the problem, we need to narrow down the scope.
> Things that would help are, what are the iptables rules you are using?
> Are there any errors showing up on the ethernet devices?
> Also what does the bridge forwarding table look like? are there lots of entries, are
> you running spanning tree?
>
>
>

2003-11-25 17:21:45

by anand

Subject: Re: 2.6.0-test9-bk25 : bridge works fine

Hi,

With test9-bk25 I have not had any problems for many hours now, which was not
the case with test9. I hope it keeps working that way.

Thanks a lot for all the help. Next time I will make a point of trying the very
latest kernel before firing off a mail :)

Anand
>
> On Sat, 22 Nov 2003 08:20:40 -0800 (PST)
> Linus Torvalds wrote:
>
> > On Sat, 22 Nov 2003, SVR Anand wrote:
> > >
> > > The problem is : After 3 to 4 hours of functioning, the bridge stops working
> > > and the machine becomes unusable where it doesn't respond to keyboard, and
> > > there is no video display.
> >
> > Sounds like a memory leak somewhere. It would probably be interesting to
> > watch /proc/slabinfo every five minutes or so, and see what happens..
>
> Also, we've certainly fixed some serious networking bugs since test9
> came out.
>