On Wed, Aug 06, 2003 at 02:39:03PM -0300, Marcelo Tosatti wrote:
>
> ---------- Forwarded message ----------
> Date: Sat, 2 Aug 2003 16:09:30 -0500 (CDT)
> From: "Hmamouche, Youssef" <[email protected]>
> To: [email protected]
> Subject: [PROBLEM] xircom CBE2-100(faulty) hangs kernel 2.4.{21, 22-pre8}
>
>
> Hi,
>
> I have a xircom CBE2-100 ethernet card that I know (as a matter of fact) is
> faulty. The warranty on the card expired and I couldn't take it back to
> the manufacturer. Anyway, I hotplugged it into the socket with no problem at
> all. However, when I try to bring up the interface, the kernel hangs. If I
> unplug the card, the kernel comes back to life and resumes.
Uhh... let me get this straight... the card is known for a fact to be
faulty.
> The symptoms of the problem show at
> drivers/net/pcmcia/xircom_tulip_cb: xircom_interrupt() where the interrupt
> is never acknowledged (due to flawed hardware).
Perhaps the driver does not ack the interrupt, because the device
registers do not indicate that it requires service, and the interrupt
pin is just stuck. Or perhaps the driver does ack and the card is
immediately re-triggering or ignoring the ack.
Drivers cannot in general diagnose hardware faults. Perhaps, if
someone had a card broken the same way your card is broken, and they
knew the specific reason for the breakage, they could design a test
for that particular hardware fault. But your card might be the only
one in the known universe with this particular failure mode.
-- Dave
I'm a user. When I insert a card "into my laptop" I'd like it to work as
advertised. If it doesn't work as advertised (because of some hardware
failure in this case), I'd like the kernel to more or less let me know
that something went wrong so I can return it. I wouldn't expect the kernel
to freeze.
Faulty hardware is very common in the PC era. I agree that it is hard to
pin down hardware malfunctions when you don't know what to check
for. However, there should be concern when it takes your whole system
down.
I guess this issue can be disregarded, but then the kernel will only be as
strong as its weakest link.
Youssef
On Wed, 6 Aug 2003, David Hinds wrote:
> > Date: Sat, 2 Aug 2003 16:09:30 -0500 (CDT)
> > From: "Hmamouche, Youssef" <[email protected]>
> > Hi,
> >
> > I have a xircom CBE2-100 ethernet card that I know (as a matter of fact) is
> > faulty. The warranty on the card expired and I couldn't take it back to
> > the manufacturer. Anyway, I hotplugged it into the socket with no problem at
> > all. However, when I try to bring up the interface, the kernel hangs. If I
> > unplug the card, the kernel comes back to life and resumes.
>
> Uhh... let me get this straight... the card is known for a fact to be
> faulty.
>
> > The symptoms of the problem show at
> > drivers/net/pcmcia/xircom_tulip_cb: xircom_interrupt() where the interrupt
> > is never acknowledged (due to flawed hardware).
>
> Perhaps the driver does not ack the interrupt, because the device
> registers do not indicate that it requires service, and the interrupt
> pin is just stuck. Or perhaps the driver does ack and the card is
> immediately re-triggering or ignoring the ack.
>
> Drivers cannot in general diagnose hardware faults. Perhaps, if
> someone had a card broken the same way your card is broken, and they
> knew the specific reason for the breakage, they could design a test
> for that particular hardware fault. But your card might be the only
> one in the known universe with this particular failure mode.
>
> -- Dave
>
On Wed, Aug 06, 2003 at 03:55:02PM -0500, Hmamouche, Youssef wrote:
>
> I'm a user. When I insert a card "into my laptop" I'd like it to
> work as advertised. If it doesn't work as advertised (because of some
> hardware failure in this case), I'd like the kernel to more or less
> let me know that something went wrong so I can return it. I wouldn't
> expect the kernel to freeze.
I accept this...
> Faulty hardware is very common in the PC era. I agree that it is
> hard to pin down hardware malfunctions when you don't know what to
> check for. However, there should be concern when it takes your whole
> system down.
I'd agree that drivers should be made not to screw up when an
unexpected condition arises, where that's possible. Like, not
crashing the OS if a device returns an unexpected value.
This particular problem (what seems to be an unacknowledged interrupt,
but that could be a symptom of something else) is troublesome and
likely impossible for the driver to detect and handle sanely, because
PCI interrupts are shared, and a driver cannot assume that its device
was responsible for any particular interrupt.
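To illustrate the shared-interrupt point, here is a minimal user-space
sketch, not code from xircom_tulip_cb or any real driver. The names
fake_dev, fake_handler and dispatch_shared are hypothetical, and the
"status register" is a plain struct field. It only shows why a handler
that sees nothing pending in its own device has to report "not mine",
which is also exactly what a stuck interrupt pin would look like:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device model: one status register, where a nonzero
 * value means the device is requesting service. */
struct fake_dev {
    uint32_t status;
};

enum irq_result { IRQ_NOT_MINE = 0, IRQ_SERVICED = 1 };

/* Handler on a shared line: it can only inspect its own device. */
enum irq_result fake_handler(struct fake_dev *dev)
{
    assert(dev != NULL);
    if (dev->status == 0)
        return IRQ_NOT_MINE;   /* nothing pending; a stuck pin looks the same */
    dev->status = 0;           /* acknowledge by clearing the status */
    return IRQ_SERVICED;
}

/* Call every handler registered on the line; true if any claimed it. */
bool dispatch_shared(struct fake_dev **devs, size_t n)
{
    bool handled = false;

    assert(devs != NULL);
    for (size_t i = 0; i < n; i++)
        if (fake_handler(devs[i]) == IRQ_SERVICED)
            handled = true;
    return handled;
}
```

If dispatch_shared() returns false, none of the handlers can tell whose
device raised the line, so no individual driver is in a position to
react to it.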
I believe that the 2.6 kernel provides a general central mechanism for
detecting and throttling unacknowledged interrupts, if that really is
the problem. That's where this particular fix belongs, not in the
driver (and every other driver).
-- Dave
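A central throttle of the kind described above could be sketched
roughly as follows. This is a user-space illustration under stated
assumptions, not the actual 2.6 code: the names irq_line and note_irq,
and the WINDOW and MAX_UNHANDLED thresholds, are all made up for the
example. The idea is to count interrupts per line and disable the line
when nearly all of them in a window went unclaimed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WINDOW        1000  /* illustrative: interrupts per window */
#define MAX_UNHANDLED  990  /* illustrative: tolerated unhandled per window */

/* Hypothetical per-line bookkeeping kept by the central IRQ code. */
struct irq_line {
    unsigned int count;      /* interrupts seen in the current window */
    unsigned int unhandled;  /* of those, how many no handler claimed */
    bool disabled;           /* set once the line has been shut off */
};

/* Record one interrupt; returns true if the line is (now) disabled. */
bool note_irq(struct irq_line *line, bool handled)
{
    assert(line != NULL);
    if (line->disabled)
        return true;
    if (!handled)
        line->unhandled++;
    if (++line->count < WINDOW)
        return false;        /* window not complete yet */
    if (line->unhandled > MAX_UNHANDLED)
        line->disabled = true;   /* almost nobody claimed it: shut it off */
    line->count = 0;
    line->unhandled = 0;
    return line->disabled;
}
```

Because the check lives in one place and only looks at whether any
handler claimed the interrupt, it needs no knowledge of a particular
card's failure mode, which is the argument for keeping it out of the
individual drivers.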