2013-03-27 15:34:29

by Fabio Coatti

Subject: 3.7.10 kernel crash

Hi all,
we are experiencing crashes on some servers currently running 3.7.10.
So far I have only been able to get screenshots from a dying server,
attached below. We can probably exclude hardware issues, as the crash
happened on two different servers.
Attached you can find config.gz and the screenshot. The machine is a
dual AMD Opteron(TM) Processor 6272, with this Ethernet controller:
Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
(I'm adding the Ethernet info because the first line of the dump reads
"tg3" :) )
Of course I'm available for further information; just cc: me, because
I'm not subscribed.

Many thanks for any answer.



--
Fabio


Attachments:
crash.jpg (111.03 kB)
config.gz (15.70 kB)

2013-03-28 08:31:48

by Fabio Coatti

Subject: Re: 3.7.10 kernel crash

2013/3/27 Fabio Coatti <[email protected]>:
> Hi all,
> we are experiencing crashes on some servers, right now running 3.7.10;
> I've been able to get only screenshots from dying server that I
> attached below. Probably we can exclude hardware issues, as it
> happened on two different servers.

Further information: these crashes seem to happen only when the
machine is heavily loaded (processes, network and so on). We have seen
this pattern several times.



--
Fabio

2013-03-28 10:33:27

by Fabio Coatti

Subject: Re: 3.7.10 kernel crash

Well, according to the kernel source (drivers/net/ethernet/broadcom/tg3.c),
the in-tree driver is version 3.125 (September 26, 2012), while the latest
source downloadable from the Broadcom site is 3.124c (Aug 14, 2012), so I
guess that the driver in the vanilla kernel is the latest available.


> On Thursday 28 March 2013 11:09:50, 王金浦 wrote:
>
> Hi Fabio,
>
>
> Have you tried the latest tg3 driver from Broadcom? The backtrace shows
> that tg3 may be the cause of the panic.
>
>
> Jack
>
>
> 2013/3/27 Fabio Coatti <[email protected]>:
>
> > Hi all,
> > we are experiencing crashes on some servers, right now running 3.7.10;
> > I've been able to get only screenshots from dying server that I
> > attached below. Probably we can exclude hardware issues, as it
> > happened on two different servers.
>
>
>
> --
> Fabio

2013-03-28 12:35:54

by Peter Hurley

Subject: Re: 3.7.10 kernel crash


[ +cc Matt Carlson, Michael Chan, netdev because this is a tg3-related oops]

On Thu, 2013-03-28 at 09:31 +0100, Fabio Coatti wrote:
> 2013/3/27 Fabio Coatti <[email protected]>:
> > Hi all,
> > we are experiencing crashes on some servers, right now running 3.7.10;
> > I've been able to get only screenshots from dying server that I
> > attached below. Probably we can exclude hardware issues, as it
> > happened on two different servers.
>
> Further information: these crashes seem to happen only when the
> machine is heavily loaded (processes, network and so on). We have seen
> this pattern several times.

I would recommend capturing the entire oops text (it will likely be
necessary anyway for someone to properly identify and fix the cause).

If the machine has a 2nd network port, then use netconsole on that
interface. If not, set up a serial console or try to get 50-line VGA
working.

Regards,
Peter Hurley
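
To illustrate the netconsole suggestion above, here is a minimal Python
sketch, not a definitive setup. It builds a netconsole= parameter in the
format given in the kernel's netconsole documentation
([src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]) and provides a
small UDP listener for the receiving host. The function names
netconsole_param and listen are hypothetical helpers, and every address,
port and interface name is a placeholder to adapt to the actual machines.

```python
import socket


def netconsole_param(src_ip, dev, tgt_ip, tgt_mac=None,
                     src_port=6665, tgt_port=6666):
    """Build a netconsole= boot/module parameter string.

    Layout: netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
    6665 and 6666 are the documented default source and target ports.
    An empty target MAC makes the kernel fall back to broadcast.
    """
    mac = tgt_mac or ""
    return f"netconsole={src_port}@{src_ip}/{dev},{tgt_port}@{tgt_ip}/{mac}"


def listen(port=6666, bind_addr="0.0.0.0"):
    """Print netconsole datagrams on the log host until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        while True:
            data, (host, _) = sock.recvfrom(65535)
            # Oops text arrives as plain text, one or more lines per datagram.
            print(f"[{host}] {data.decode(errors='replace')}", end="")


# Placeholder values: the crashing server logs via eth1 to a log host.
param = netconsole_param("10.0.0.1", "eth1", "10.0.0.2",
                         tgt_mac="12:34:56:78:9a:bc")
```

The resulting string would go on the kernel command line of the crashing
server (or be passed to "modprobe netconsole"), while listen() runs on the
second machine. Any UDP listener, such as netcat, would serve equally well.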

2013-04-02 09:24:07

by Fabio Coatti

Subject: Re: 3.7.10 kernel crash

On Thursday 28 March 2013 08:35:47, Peter Hurley wrote:
> [ +cc Matt Carlson, Michael Chan, netdev because this is a tg3-related oops]
> On Thu, 2013-03-28 at 09:31 +0100, Fabio Coatti wrote:
> > 2013/3/27 Fabio Coatti <[email protected]>:
> > > Hi all,
> > > we are experiencing crashes on some servers, right now running 3.7.10;
> > > I've been able to get only screenshots from dying server that I
> > > attached below. Probably we can exclude hardware issues, as it
> > > happened on two different servers.
> >
> > Further information: these crashes seem to happen only when the
> > machine is heavily loaded (processes, network and so on). We have seen
> > this pattern several times.
>
> I would recommend capturing the entire oops text (it will likely be
> necessary anyway for someone to properly identify and fix the cause).
>
> If the machine has a 2nd network port, then use netconsole on that
> interface. If not, set up a serial console or try to get 50-line VGA
> working.


Ok, I'll try to get better oopses. However, this is going to be tricky, as the
machine is remotely administered (via HP iLO) and uses all of its network
interfaces. (BTW, I'm not even sure I would be able to capture a crash related
to the network driver using the netconsole approach.) So far, I have had no
success in using a different resolution for the boot console.
Anyway, I'll try to find a way to capture all the messages.

Many thanks for the answer!



--
Fabio

2013-04-03 11:49:09

by Peter Hurley

Subject: Re: 3.7.10 kernel crash

On Tue, 2013-04-02 at 11:16 +0200, Fabio Coatti wrote:
> Ok, I'll try to get better oopses. However, this is going to be tricky, as the
> machine is remotely administered (via HP iLO) and uses all network interfaces
> (BTW, I'm not even sure to be able to get a network driver related crash using
> netconsole approach)

I use a separate network adapter specifically dedicated for netconsole
(but then I'm trying to crash the machine :)

> So far, no success in using a different resolution for
> boot console.

What distribution is this and what bootloader are you using?