2001-03-06 14:06:13

by Manoj Sontakke

Subject: spinlock help

Hi,
Thanks in advance for the help.

1. When the spin_lock_irqsave() function is called, the subsequent code is
executed until spin_unlock_irqrestore() is called. Is this right?
2. Is this sequence valid?
spin_lock_irqsave(a,b);
spin_lock_irqsave(c,d);

Manoj


2001-03-06 23:54:49

by Nigel Gamble

Subject: Re: spinlock help

On Tue, 6 Mar 2001, Manoj Sontakke wrote:
> 1. When the spin_lock_irqsave() function is called, the subsequent code is
> executed until spin_unlock_irqrestore() is called. Is this right?

Yes. The protected code will not be interrupted, or simultaneously
executed by another CPU.

> 2. Is this sequence valid?
> spin_lock_irqsave(a,b);
> spin_lock_irqsave(c,d);

Yes, as long as it is followed by:

spin_unlock_irqrestore(c, d);
spin_unlock_irqrestore(a, b);

Nigel Gamble [email protected]
Mountain View, CA, USA. http://www.nrg.org/

2001-03-07 09:30:48

by Shmuel Hen

Subject: RE: spinlock help

What if the same sequence occurred, but from two different drivers?

We've had some bad experience with this stuff. Our driver, which acts as an
intermediate net driver, would call the hard_start_xmit in the base driver.
The base driver, wanting to block receive interrupts would issue a
'spin_lock_irqsave(a,b)' and process the packet. If the TX queue is full, it
could call an indication entry point in our intermediate driver to signal it
to stop sending more packets. Since our indication function handles many
types of indications but can process them only one at a time, we wanted to
block other indications while queuing the request.

The whole sequence would look like this:

[our driver]
ans_send() {
.
.
e100_hard_start_xmit(dev, skb);
.
.
}

[e100.o]
e100_hard_start_xmit() {
.
.
spin_lock_irqsave(a,b);
.
.
if(tx_queue_full)
ans_notify(TX_QUEUE_FULL); <--
.
.
spin_unlock_irqrestore(a,b);
}

[our driver]
ans_notify() {
.
.
spin_lock_irqsave(c,d);
queue_request(req_type);
spin_unlock_irqrestore(c,d); <--
.
.
}

At that point, for some reason, interrupts were back on and e100.o would
hang in an infinite loop (we verified on kernel 2.4.0-test10 + kdb that
the processor was enabling interrupts and that e100_isr was being called
to process an Rx interrupt).

How is it possible that a 'spin_unlock_irqrestore(c,d)' would also restore
what should have been restored only by a 'spin_unlock_irqrestore(a,b)'?


Thanks in advance,
Shmulik Hen
Software Engineer
Linux Advanced Networking Services
Intel Network Communications Group
Jerusalem, Israel.


2001-03-07 10:27:56

by Manoj Sontakke

Subject: Re: spinlock help

Hi,

Can spin_lock_irq() or spin_lock_bh() be of any use to you?

"Hen, Shmulik" wrote:
>
> How about if the same sequence occurred, but from two different drivers ?
>
> We've had some bad experience with this stuff. Our driver, which acts as an
> intermediate net driver, would call the hard_start_xmit in the base driver.
> The base driver, wanting to block receive interrupts would issue a
> 'spin_lock_irqsave(a,b)' and process the packet. If the TX queue is full, it
> could call an indication entry point in our intermediate driver to signal it
> to stop sending more packets. Since our indication function handles many
> types of indications but can process them only one at a time, we wanted to
> block other indications while queuing the request.
>
> The whole sequence would look like that:
>
> [our driver]
> ans_send() {
> .
> .
> e100_hard_start_xmit(dev, skb);
> .
> .
> }
>
> [e100.o]
> e100_hard_start_xmit() {
> .
> .
> spin_lock_irqsave(a,b);
> .
> .
> if(tx_queue_full)
> ans_notify(TX_QUEUE_FULL); <--
> .
> .
> spin_unlock_irqrestore(a,b);
> }
>
> [our driver]
> ans_notify() {
> .
> .
> spin_lock_irqsave(c,d);
> queue_request(req_type);
> spin_unlock_irqrestore(c,d); <--
> .
> .
> }
>
> At that point, for some reason, interrupts were back and the e100.o would
> hang in an infinite loop (we verified it on kernel 2.4.0-test10 +kdb that
> the processor was enabling interrupts and that the e100_isr was called for
> processing an Rx int.).
>
> How is that possible that a 'spin_unlock_irqrestore(c,d)' would also restore
> what should have been restored only with a 'spin_unlock_irqrestore(a,b)' ?
>
> Thanks in advance,
> Shmulik Hen
> Software Engineer
> Linux Advanced Networking Services
> Intel Network Communications Group
> Jerusalem, Israel.
>
> -----Original Message-----
> From: Nigel Gamble [mailto:[email protected]]
> Sent: Wednesday, March 07, 2001 1:54 AM
> To: Manoj Sontakke
> Cc: [email protected]
> Subject: Re: spinlock help
>
> On Tue, 6 Mar 2001, Manoj Sontakke wrote:
> > 1. when spin_lock_irqsave() function is called the subsequent code is
> > executed untill spin_unloc_irqrestore()is called. is this right?
>
> Yes. The protected code will not be interrupted, or simultaneously
> executed by another CPU.
>
> > 2. is this sequence valid?
> > spin_lock_irqsave(a,b);
> > spin_lock_irqsave(c,d);
>
> Yes, as long as it is followed by:
>
> spin_unlock_irqrestore(c, d);
> spin_unlock_irqrestore(a, b);
>
> Nigel Gamble [email protected]
> Mountain View, CA, USA. http://www.nrg.org/
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Regards,
Manoj Sontakke

2001-03-07 10:27:57

by Ofer Fryman

Subject: RE: spinlock help

Did you try looking at Becker's eepro100 driver? It seems to be simpler,
with no unnecessary spin_lock_irqsave().


2001-03-07 10:47:19

by Shmuel Hen

Subject: RE: spinlock help

spin_lock_bh() won't block interrupts, and we need them blocked to prevent
more indications.
spin_lock_irq() could do the trick, but its counterpart spin_unlock_irq()
enables all interrupts by calling sti(), and that is even worse for us.


Shmulik.


2001-03-07 10:48:50

by Shmuel Hen

Subject: RE: spinlock help

e100 implements all sorts of hooks for our intermediate driver (kind of a
co-development effort), so eepro100 is out of the question for us.


Shmulik.


2001-03-07 12:27:49

by Shmuel Hen

Subject: RE: spinlock help

The kdb trace was accurate; we could actually see the e100 ISR pop out of
nowhere right in the middle of our ans_notify() every time the TX queue
would fill up. When we commented out the calls to spin_*_irqsave(), it
worked fine under heavy stress for days.

Is it possible it was something wrong with 2.4.0-test10 specifically?

We had to drop the locks in the final release and never got around to
checking it on other kernel releases (it went on the TO_DO list ;-).

Shmulik.

-----Original Message-----
From: Andrew Morton [mailto:[email protected]]
Sent: Wednesday, March 07, 2001 1:43 PM
To: Hen, Shmulik
Subject: Re: spinlock help


"Hen, Shmulik" wrote:
> At that point, for some reason, interrupts were back

Sorry, that can't happen.

Really, you must have made a mistake somewhere.

2001-03-07 12:54:02

by Andrew Morton

Subject: Re: spinlock help

"Hen, Shmulik" wrote:
>
> The kdb trace was accurate; we could actually see the e100 ISR pop out of
> nowhere right in the middle of our ans_notify every time the TX queue would
> fill up. When we commented out the call to spin_*_irqsave(), it worked fine
> under heavy stress for days.
>
> Is it possible it was something wrong with 2.4.0-test10 specifically ?
>

Sorry, no. If spin_lock_irqsave()/spin_unlock_irqrestore()
were accidentally reenabling interrupts then it would be
the biggest, ugliest catastrophe since someone put out a kernel
which forgot to flush dirty inodes to disk :)

Conceivably it was a compiler bug. Were you using egcs-1.1.2/x86?


2001-03-07 13:17:42

by Alan

Subject: Re: spinlock help

> spin_lock_bh() won't block interrupts and we need them blocked to prevent
> more indications.
> spin_lock_irq() could do the trick but its counterpart spin_unlock_irq()
> enables all interrupts by calling sti(), and this is even worse for us.

Why don't you queue your indications, then? The eepro100 driver doesn't end
up with large locked sections, so it's obviously possible to handle this
sanely.

2001-03-07 16:50:13

by Ofer Fryman

Subject: RE: spinlock help

You can clear interrupts at the beginning of the routine and use
test_and_set_bit() and clear_bit() inside it, or you can replace the device
xmit pointer (dev->start_xmit) with a pointer to your own routine. Anyway,
eepro100 still sounds better even for your purpose.

Ofer.


2001-03-08 11:09:02

by Shmuel Hen

Subject: RE: spinlock help

OK guys, you were right. The bug was in our code; sorry for the trouble.
Turns out that while I was away, the problem was solved by someone else. The
problem is probably related to the fact that when we did
'spin_lock_irqsave(c,d)', 'd' was a global variable. The fix was to wrap the
call with another function and declare 'd' as a local. I can't quite explain
it, but I think that changing from a static to an automatic variable made
the difference. My best guess is that since 'd' is passed by value and not
by reference, the macro expansion of spin_lock_irqsave() relies on 'd'
living on the stack, and if 'd' lives elsewhere instead, it might get
trashed.

I would really like to hear your expert opinion on my assumption.


Thanks,
Shmulik.


2001-03-08 11:34:52

by Andrew Morton

Subject: Re: spinlock help

"Hen, Shmulik" wrote:
>
> OK guys, you were right. The bug was in our code; sorry for the trouble.
> Turns out that while I was away, the problem was solved by someone else. The
> problem is probably related to the fact that when we did
> 'spin_lock_irqsave(c,d)', 'd' was a global variable. The fix was to wrap the
> call with another function and declare 'd' as local. I can't quite explain,
> but I think that changing from a static to automatic variable made the
> difference. My best guess is that since 'd' is passed by value and not by
> reference, the macro expansion of spin_lock_irqsave() relies on the location
> of 'd' in the stack and if 'd' was on the heap instead, it might get
> trashed.
>

Yes, that makes sense.

spin_lock_irqsave() really means "save the current irq mask
on the stack, then disable interrupts". spin_unlock_irqrestore()
means "restore the interrupt mask that was saved on the stack". So they
nest, and spin_lock_irqsave() doesn't have to care whether or
not interrupts are currently enabled.

Using a global variable you could get something like:

CPU0:                                    CPU1:

__cli();
spin_lock_irqsave(lock, global);
                                         __sti();
                                         spin_lock_irqsave(lock2, global);
                                         spin_unlock_irqrestore(lock2, global);
spin_unlock_irqrestore(lock, global);
/* interrupts should be disabled */

Here, CPU1 will set `global' to "interrupts enabled". So when
CPU0 restores its flags from `global' it will be picking up
CPU1's flags, not its own!

There are probably less subtle failure modes than this.
