2004-03-09 17:09:21

by Felix von Leitner

Subject: tg3 error

A machine at a customer's site (running kernel 2.4.21) has stopped
answering over Ethernet today. The machine itself was still there and
the customer could log in at the console. A reboot fixed the problem.

The machine has had these error messages in the syslog about once per
hour for about 24 hours:

Mar 9 16:17:38 mail2 kernel: tg3: eth0: transmit timed out, resetting
Mar 9 16:17:38 mail2 kernel: tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
Mar 9 16:17:39 mail2 kernel: tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
Mar 9 16:17:39 mail2 kernel: tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
Mar 9 16:17:39 mail2 kernel: tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2

The hardware is the on-board NIC of an IBM eServer pizza box, a dual
Broadcom Gigabit Ethernet NIC. Any ideas what might have caused this and
what we should do to prevent it from happening again?
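
For anyone wanting to check how often this happens on their own box, here is a minimal Python sketch that tallies the tg3 messages in a syslog. It assumes the exact line layout quoted above (month, day, time, host, "kernel: tg3: ..."); the function name `summarize_tg3` and both regular expressions are made up for illustration and are not part of the driver or any tool.

```python
import re
from collections import Counter

# Assumed layout, taken from the lines quoted above, e.g.
# "Mar 9 16:17:38 mail2 kernel: tg3: eth0: transmit timed out, resetting"
TG3_LINE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+"
    r"(?P<host>\S+)\s+kernel:\s+tg3:\s+(?P<msg>.*)$"
)
STOP_BLOCK = re.compile(
    r"tg3_stop_block timed out, ofs=(?P<ofs>[0-9a-f]+) enable_bit=(?P<bit>\d+)"
)


def summarize_tg3(lines):
    """Count transmit timeouts per (month, day, hour) and stop_block
    timeouts per register offset. Lines that do not match are skipped."""
    resets = Counter()   # (month, day, hour) -> transmit timeouts
    offsets = Counter()  # offset string, e.g. "3400" -> stop_block timeouts
    for line in lines:
        m = TG3_LINE.match(line.strip())
        if not m:
            continue
        msg = m.group("msg")
        if "transmit timed out" in msg:
            key = (m.group("month"), int(m.group("day")), int(m.group("hour")))
            resets[key] += 1
        sb = STOP_BLOCK.search(msg)
        if sb:
            offsets[sb.group("ofs")] += 1
    return resets, offsets
```

Feeding it the lines of the syslog file (for example `summarize_tg3(open(path))`, with `path` being wherever your syslog lives) gives one counter keyed by hour and one keyed by the `ofs=` value, which makes a "once per hour" pattern easy to confirm or rule out.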

Thanks,

Felix


2004-03-09 17:14:07

by Jeff Garzik

Subject: Re: tg3 error

Felix von Leitner wrote:
> A machine at a customer's site (running kernel 2.4.21) has stopped
> answering over Ethernet today. The machine itself was still there and
> the customer could log in at the console. A reboot fixed the problem.
>
> The machine has had these error messages in the syslog about once per
> hour for about 24 hours:
>
> Mar 9 16:17:38 mail2 kernel: tg3: eth0: transmit timed out, resetting
> Mar 9 16:17:38 mail2 kernel: tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
> Mar 9 16:17:39 mail2 kernel: tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
> Mar 9 16:17:39 mail2 kernel: tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
> Mar 9 16:17:39 mail2 kernel: tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2


AFAIK this is fixed in the latest upstream tg3...

Jeff



2004-04-03 09:13:29

by Kamil Srot

Subject: Re: tg3 error

> Felix von Leitner wrote:
> > A machine at a customer's site (running kernel 2.4.21) has stopped
> > answering over Ethernet today. The machine itself was still there and
> > the customer could log in at the console. A reboot fixed the problem.
> >
> > The machine has had these error messages in the syslog about once per
> > hour for about 24 hours:
> >
> > Mar 9 16:17:38 mail2 kernel: tg3: eth0: transmit timed out, resetting
> > Mar 9 16:17:38 mail2 kernel: tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
> > Mar 9 16:17:39 mail2 kernel: tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
> > Mar 9 16:17:39 mail2 kernel: tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
> > Mar 9 16:17:39 mail2 kernel: tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
>
>
> AFAIK this is fixed in the latest upstream tg3...

I have exactly the same problem on 2.4.25; the log shows exactly the same
messages that Felix reported.
I'm running two identical HP ProLiant servers but have this problem only on
one of them.
It's happening approximately twice a week.

Any ideas?

Thank you,
--
C.


2004-04-03 19:37:48

by Alexander Hoogerhuis

Subject: Re: tg3 error

"Kamil Srot" writes:

>> Felix von Leitner wrote:
>> > A machine at a customer's site (running kernel 2.4.21) has stopped
>> > answering over Ethernet today. The machine itself was still there and
>> > the customer could log in at the console. A reboot fixed the problem.
>> >
>> > The machine has had these error messages in the syslog about once per
>> > hour for about 24 hours:
>> >
>> > Mar 9 16:17:38 mail2 kernel: tg3: eth0: transmit timed out, resetting
>> > Mar 9 16:17:38 mail2 kernel: tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
>> > Mar 9 16:17:39 mail2 kernel: tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
>> > Mar 9 16:17:39 mail2 kernel: tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
>> > Mar 9 16:17:39 mail2 kernel: tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
>>
>>
>> AFAIK this is fixed in the latest upstream tg3...
>
> I have exactly the same problem on 2.4.25; the log shows exactly the same
> messages that Felix reported.
> I'm running two identical HP ProLiant servers but have this problem only on
> one of them.
> It's happening approximately twice a week.
>
> Any ideas?
>

Just to pipe in; I have a machine (HP 530cmt) running RHEL3 on the
latest kernel, running with the tg3-driver for the onboard eth, and it
requires the i/f to be taken down and up again to start speaking, or
even negotiating an eth-link. Hardware at the other end is a Cisco switch.
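
For reference, the down/up workaround could be scripted roughly like this. It is only a sketch: it assumes the iproute2 `ip` tool is installed (on older systems `ifconfig eth0 down` / `ifconfig eth0 up` would be the equivalent), the helper names `bounce_commands` and `bounce` are made up here, and it defaults to a dry run so nothing is touched unless you ask for it.

```python
import subprocess


def bounce_commands(iface):
    """Return the two commands that take an interface down and up again.
    Assumes iproute2's `ip link set dev <iface> down|up` is available."""
    return [
        ["ip", "link", "set", "dev", iface, "down"],
        ["ip", "link", "set", "dev", iface, "up"],
    ]


def bounce(iface, dry_run=True):
    """Print the commands (dry_run=True) or actually run them.
    Running them needs root and will drop connectivity on that interface."""
    cmds = bounce_commands(iface)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if the command fails
    return cmds
```

Calling `bounce("eth0")` only prints the two commands; `bounce("eth0", dry_run=False)` executes them, so do that from the console rather than over the link you are resetting.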

mvh,
A
--
Alexander Hoogerhuis | [email protected]
CCNP - CCDP - MCNE - CCSE | +47 908 21 485
"You have zero privacy anyway. Get over it." --Scott McNealy

2004-04-04 21:11:40

by walt

Subject: Re: tg3 error

Alexander Hoogerhuis wrote:

> Just to pipe in; I have a machine (HP 530cmt) running RHEL3 on the
> latest kernel, running with the tg3-driver for the onboard eth, and it
> requires the i/f to be taken down and up again to start speaking...

Dave Miller and I are actively trying to debug this problem now. If
Dave has no objections I could CC you if you want to join in.

2004-04-04 21:19:12

by Alexander Hoogerhuis

Subject: Re: tg3 error

walt writes:

> Alexander Hoogerhuis wrote:
>
>> Just to pipe in; I have a machine (HP 530cmt) running RHEL3 on the
>> latest kernel, running with the tg3-driver for the onboard eth, and it
>> requires the i/f to be taken down and up again to start speaking...
>
> Dave Miller and I are actively trying to debug this problem now. If
> Dave has no objections I could CC you if you want to join in.
>

Sure, if I can help; I could possibly give him access to the machine
if that would help, too.

mvh,
A
--
Alexander Hoogerhuis | [email protected]
CCNP - CCDP - MCNE - CCSE | +47 908 21 485
"You have zero privacy anyway. Get over it." --Scott McNealy