2000-12-23 18:12:09

by Manfred Spraul

Subject: Q: natsemi.c spinlocks

Hi Jeff, Tjeerd,

I spotted the spin_lock in natsemi.c, and I think it's bogus.

The "simultaneous interrupt entry" is a bug in some 2.0 and 2.1 kernels
(even Alan didn't remember it exactly when I asked him), so a sane
driver can assume that an interrupt handler is never reentered.

Donald often uses dev->interrupt to hide other races, but I don't see
any such race in this driver (tx_timeout and netdev_timer are both trivial).
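As a rough illustration of why a reentrancy guard in the handler is redundant, here is a userspace model of a dispatcher that never nests a handler: an interrupt that arrives while the handler is running is only recorded as pending, and the running instance loops to service it. The flags are loosely modeled on the IRQ_INPROGRESS/IRQ_PENDING handling in 2.4's do_IRQ(); struct irq_line, dispatch_irq() and driver_isr() are hypothetical names, not kernel code.

```c
#include <stdbool.h>

/* Hypothetical model of one interrupt line; not real kernel code. */
struct irq_line {
    bool in_progress;   /* handler is currently running */
    bool pending;       /* an IRQ arrived while the handler ran */
    int depth;          /* current nesting depth of the handler */
    int max_depth;      /* deepest nesting ever observed */
    int handled;        /* number of handler invocations */
    int reraise;        /* extra IRQs to "arrive" from inside the handler */
};

static void dispatch_irq(struct irq_line *l);

/* The driver's handler. A new IRQ on the same line arriving while it
 * runs is simulated by calling dispatch_irq() from inside it. */
static void driver_isr(struct irq_line *l)
{
    l->depth++;
    if (l->depth > l->max_depth)
        l->max_depth = l->depth;
    l->handled++;
    if (l->reraise > 0) {
        l->reraise--;
        dispatch_irq(l);        /* another IRQ on the same line */
    }
    l->depth--;
}

/* Dispatcher: never nests the handler. A second IRQ is only marked
 * pending, and the instance already running loops to service it. */
static void dispatch_irq(struct irq_line *l)
{
    if (l->in_progress) {
        l->pending = true;
        return;
    }
    l->in_progress = true;
    do {
        l->pending = false;
        driver_isr(l);
    } while (l->pending);
    l->in_progress = false;
}
```

With three IRQs arriving during handling, the handler runs four times in total but the nesting depth never exceeds one, which is the property a driver can rely on.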


--
Manfred


2000-12-24 01:20:18

by Andrew Morton

Subject: Re: Q: natsemi.c spinlocks

Manfred wrote:
>
> Hi Jeff, Tjeerd,
>
> I spotted the spin_lock in natsemi.c, and I think it's bogus.
>
> The "simultaneous interrupt entry" is a bug in some 2.0 and 2.1 kernels
> (even Alan didn't remember it exactly when I asked him), so a sane
> driver can assume that an interrupt handler is never reentered.
>
> Donald often uses dev->interrupt to hide other races, but I don't see
> any such race in this driver (tx_timeout and netdev_timer are both trivial).

Hi, Manfred.

I think you're right. 2.4's interrupt handling prevents
simultaneous entry of the same ISR.

However, natsemi.c's spinlock needs to be retained, and
extended into start_tx(), because this driver has
a race which has cropped up in a few others:

Current code:

start_tx()
{
    ...
    if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) {
        /* WINDOW HERE */
        np->tx_full = 1;
        netif_stop_queue(dev);
    }
    ...
}

If the ring is currently full and an interrupt comes in
at the indicated window and reaps ALL the packets in the
ring, the driver ends up with `tx_full = 1' and transmit
disabled, but with no outstanding transmit interrupts.

It's screwed. You need another interrupt so tx_full
can be cleared and the queue can be restarted, but you can't
*get* another interrupt because there are no Tx packets outstanding.
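The interleaving can be replayed deterministically. In this userspace sketch the window callback stands for the interrupt arriving between the ring-full test and the tx_full/netif_stop_queue() update; struct nic, tx_done_irq(), start_tx_unlocked(), wedged() and the value TX_QUEUE_LEN = 4 are all hypothetical simplifications, not the driver's code.

```c
#include <stdbool.h>
#include <stddef.h>

#define TX_QUEUE_LEN 4   /* hypothetical, small for illustration */

struct nic {
    unsigned cur_tx;      /* packets queued by start_tx */
    unsigned dirty_tx;    /* packets reaped by the Tx-done interrupt */
    bool tx_full;
    bool queue_stopped;   /* stands in for the netif queue state */
};

/* Tx-done interrupt: reap everything the "hardware" has sent
 * and restart the queue if it was marked full. */
static void tx_done_irq(struct nic *np)
{
    np->dirty_tx = np->cur_tx;
    if (np->tx_full) {
        np->tx_full = false;
        np->queue_stopped = false;   /* stands in for netif_wake_queue() */
    }
}

/* start_tx without locking; `window` runs between the test and the
 * update, where the real interrupt could arrive. */
static void start_tx_unlocked(struct nic *np, void (*window)(struct nic *))
{
    np->cur_tx++;                    /* queue one packet */
    if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) {
        if (window)
            window(np);              /* WINDOW HERE */
        np->tx_full = true;
        np->queue_stopped = true;    /* stands in for netif_stop_queue() */
    }
}

/* Wedged: queue stopped but nothing in flight, so no further Tx
 * interrupt can ever arrive to wake it. */
static bool wedged(const struct nic *np)
{
    return np->queue_stopped && np->cur_tx == np->dirty_tx;
}
```

Passing tx_done_irq as the window leaves the model stopped with nothing in flight; passing NULL and delivering the interrupt afterwards wakes the queue normally.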

It's very unlikely to happen with this particular driver
because it's also polling the transmit queue within
receive interrupts. Receiving a packet will clear
the condition.

If you were madly hosing out UDP packets and receiving nothing
then this could occur. It was certainly triggerable in 3c59x.c,
which doesn't test the Tx queue state in Rx interrupts.
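A sketch of why polling the Tx ring from the Rx path hides the bug: starting from the wedged state, an Rx-only interrupt recovers a handler that always reaps the Tx ring, but not one that reaps it only on Tx interrupts. Both handlers and all names are hypothetical simplifications of the behavior described above, not the actual natsemi.c or 3c59x.c code.

```c
#include <stdbool.h>

/* Hypothetical, simplified ring state. */
struct nic {
    unsigned cur_tx;        /* packets queued */
    unsigned dirty_tx;      /* packets reaped */
    bool tx_full;
    bool queue_stopped;     /* stands in for the netif queue state */
    unsigned rx_packets;
};

enum irq_cause { IRQ_RX, IRQ_TX };

/* Tx completion: reap everything sent and wake a stopped queue. */
static void reap_tx(struct nic *np)
{
    np->dirty_tx = np->cur_tx;
    if (np->tx_full) {
        np->tx_full = false;
        np->queue_stopped = false;   /* stands in for netif_wake_queue() */
    }
}

/* Style described for natsemi.c: Tx ring checked on every interrupt. */
static void isr_polls_tx(struct nic *np, enum irq_cause cause)
{
    if (cause == IRQ_RX)
        np->rx_packets++;
    reap_tx(np);
}

/* Style described for 3c59x.c: Tx ring checked only on Tx interrupts. */
static void isr_tx_only(struct nic *np, enum irq_cause cause)
{
    if (cause == IRQ_RX)
        np->rx_packets++;
    else
        reap_tx(np);
}
```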

I currently have natsemi.c lying in pieces on my garage floor,
so is it OK with everyone if I put this locking in?
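What the proposed locking buys can be shown with the same kind of deterministic userspace model. While start_tx holds the lock, the Tx interrupt cannot run its body (in the kernel, spin_lock_irqsave() masks it on the local CPU, and the handler's own spin_lock() on the driver's spinlock makes it spin on another CPU), so the reap is deferred until the unlock. The lock is modeled here as a flag plus a deferred-interrupt bit; struct nic, nic_lock(), nic_unlock(), raise_tx_irq() and start_tx_locked() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define TX_QUEUE_LEN 4   /* hypothetical, small for illustration */

struct nic {
    unsigned cur_tx, dirty_tx;
    bool tx_full, queue_stopped;
    bool locked;          /* models the driver's spinlock being held */
    bool irq_deferred;    /* a Tx interrupt arrived while locked */
};

/* Body of the Tx-done interrupt (the real handler would hold the lock). */
static void tx_done_body(struct nic *np)
{
    np->dirty_tx = np->cur_tx;
    if (np->tx_full) {
        np->tx_full = false;
        np->queue_stopped = false;   /* stands in for netif_wake_queue() */
    }
}

/* An interrupt arriving: its body cannot run while the lock is held. */
static void raise_tx_irq(struct nic *np)
{
    if (np->locked) {
        np->irq_deferred = true;
        return;
    }
    tx_done_body(np);
}

/* Roughly spin_lock_irqsave(). */
static void nic_lock(struct nic *np)
{
    np->locked = true;
}

/* Roughly spin_unlock_irqrestore(): the held-off interrupt now runs. */
static void nic_unlock(struct nic *np)
{
    np->locked = false;
    if (np->irq_deferred) {
        np->irq_deferred = false;
        tx_done_body(np);
    }
}

/* start_tx with the ring-full test and the stop done under the lock. */
static void start_tx_locked(struct nic *np, void (*window)(struct nic *))
{
    nic_lock(np);
    np->cur_tx++;
    if (np->cur_tx - np->dirty_tx >= TX_QUEUE_LEN - 1) {
        if (window)
            window(np);              /* the interrupt tries to run here */
        np->tx_full = true;
        np->queue_stopped = true;    /* stands in for netif_stop_queue() */
    }
    nic_unlock(np);
}
```

The interrupt can no longer observe the half-updated state: it either runs before the ring-full test or after the queue is stopped, and in the latter case it wakes the queue.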



2000-12-24 11:41:59

by Manfred Spraul

Subject: Re: Q: natsemi.c spinlocks

Andrew Morton wrote:
>
> start_tx()
> {

Yes, I overlooked start_tx.

Hmm. start_tx also assumes that the CPU commits writes in order; I'm
sure the driver is unreliable on RISC CPUs.
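A sketch of the kind of ordering start_tx depends on, written with C11 atomics rather than kernel primitives: the descriptor's buffer address must be visible before the ownership bit that hands the descriptor to the NIC. Here a release store publishes cmd_status and the simulated NIC reads it with an acquire load. In driver code the usual equivalent is a wmb() between filling the descriptor and setting the owner bit; ordering between C11 threads is only an analogy for CPU-to-device ordering. struct tx_desc, DESC_OWN, publish_desc() and nic_fetch() are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_OWN 0x80000000u   /* hypothetical "owned by NIC" bit */

/* Hypothetical Tx descriptor shared between CPU and NIC. */
struct tx_desc {
    uint32_t buf_addr;              /* DMA address of the packet */
    _Atomic uint32_t cmd_status;    /* OWN bit + length; written last */
};

/* CPU side: fill the payload field first, then publish ownership with
 * a release store so buf_addr cannot become visible after OWN. */
static void publish_desc(struct tx_desc *d, uint32_t addr, uint32_t len)
{
    d->buf_addr = addr;
    atomic_store_explicit(&d->cmd_status, DESC_OWN | len,
                          memory_order_release);
}

/* "NIC" side: only read buf_addr after seeing OWN via an acquire load. */
static bool nic_fetch(struct tx_desc *d, uint32_t *addr, uint32_t *len)
{
    uint32_t st = atomic_load_explicit(&d->cmd_status, memory_order_acquire);
    if (!(st & DESC_OWN))
        return false;
    *addr = d->buf_addr;
    *len = st & ~DESC_OWN;
    return true;
}
```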

Perhaps the driver should use pci_alloc_consistent and pci_map_single?

--
Manfred