2013-07-27 20:38:08

by Samuel Williams

[permalink] [raw]
Subject: [PATCH] sky2: detect and prevent kernel panic from a possible faulty hardware

A possible faulty hardware might interrupt with a status of 0xffffffff which
may kernel panic if sky2 driver tries to handle it. Detecting this problem
may avoid kernel panic.

Signed-off-by: Samuel Williams <[email protected]>
---
drivers/net/ethernet/marvell/sky2.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index e09a8c6..43fe2f3 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -3049,8 +3049,17 @@ static int sky2_poll(struct napi_struct *napi, int work_limit)
int work_done = 0;
u16 idx;

- if (unlikely(status & Y2_IS_ERROR))
+ if (unlikely(status & Y2_IS_ERROR)) {
+ if (status == 0xFFFFFFFF) {
+ dev_err(&hw->pdev->dev,
+ "fatal hardware inturrupt error\n");
+ napi_complete(napi);
+ napi_disable(&hw->napi);
+ return 0;
+ }
sky2_err_intr(hw, status);
+ }
+

if (status & Y2_IS_IRQ_PHY1)
sky2_phy_intr(hw, 0);
--
1.7.10.4


2013-07-30 23:19:08

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] sky2: detect and prevent kernel panic from a possible faulty hardware

From: Samuel Williams <[email protected]>
Date: Sat, 27 Jul 2013 15:35:54 -0500

> A possible faulty hardware might interrupt with a status of 0xffffffff which
> may kernel panic if sky2 driver tries to handle it. Detecting this problem
> may avoid kernel panic.
>
> Signed-off-by: Samuel Williams <[email protected]>

Sure, but then the device is basically just going to stop.

I think this change needs to be more comprehensive, for example it
needs to fully reset the device and try to get it operational again.

2013-07-30 23:32:23

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH] sky2: detect and prevent kernel panic from a possible faulty hardware

On Tue, 30 Jul 2013 16:19:06 -0700 (PDT)
David Miller <[email protected]> wrote:

> From: Samuel Williams <[email protected]>
> Date: Sat, 27 Jul 2013 15:35:54 -0500
>
> > A possible faulty hardware might interrupt with a status of 0xffffffff which
> > may kernel panic if sky2 driver tries to handle it. Detecting this problem
> > may avoid kernel panic.
> >
> > Signed-off-by: Samuel Williams <[email protected]>
>
> Sure, but then the device is basically just going to stop.
>
> I think this change needs to be more comprehensive, for example it
> needs to fully reset the device and try to get it operational again.

Not only that it is going to assert IRQ until the device is hard
reset. Plus it is not clear that cleaning up after sick hardware
is possible at all.

2013-07-31 08:18:20

by Oliver Neukum

[permalink] [raw]
Subject: Re: [PATCH] sky2: detect and prevent kernel panic from a possible faulty hardware

On Tue, 2013-07-30 at 16:19 -0700, David Miller wrote:
> From: Samuel Williams <[email protected]>
> Date: Sat, 27 Jul 2013 15:35:54 -0500
>
> > A possible faulty hardware might interrupt with a status of 0xffffffff which
> > may kernel panic if sky2 driver tries to handle it. Detecting this problem
> > may avoid kernel panic.
> >
> > Signed-off-by: Samuel Williams <[email protected]>
>
> Sure, but then the device is basically just going to stop.
>
> I think this change needs to be more comprehensive, for example it
> needs to fully reset the device and try to get it operational again.

But sky2 is used in some hotremovable cards. You can hit the window
for this race. It is unlikely but still this simple check fixes a bug.

Regards
Oliver