So I got bold and enabled CONFIG_PREEMPT in 2.5.39, and got the
following message at boot time:
Sleeping function called from illegal context at slab.c:1374
c12a5ea8 c0117a26 c0296260 c0298202 0000055e c1283060 c013ab4f c0298202
0000055e c03b668c 00000002 00000003 c01ff5ec c03b668c 00000042 c12838e0
c12838e0 c12838e0 00000246 cfdee214 c0207830 04000000 c03b65f0 c0109be2
Call Trace:
[<c0117a26>]__might_sleep+0x56/0x5d
[<c013ab4f>]kmalloc+0x4f/0x330
[<c01ff5ec>]piix_tune_chipset+0x33c/0x350
[<c0207830>]ide_intr+0x0/0x320
[<c0109be2>]request_irq+0x52/0xa0
[<c0200a33>]init_irq+0x263/0x400
[<c0207830>]ide_intr+0x0/0x320
[<c0200edc>]hwif_init+0x10c/0x260
[<c02006ad>]probe_hwif_init+0x1d/0x70
[<c02121d1>]ide_setup_pci_device+0x41/0x70
[<c01ff7a5>]piix_init_one+0x35/0x40
[<c010511b>]init+0x8b/0x250
[<c0105090>]init+0x0/0x250
[<c01055f9>]kernel_thread_helper+0x5/0xc
kksymoops is very nice :)
The system still seems to be running ok, but I think I'll turn off
CONFIG_PREEMPT just to be sure.
This is a SMP kernel running on a UP box. Other configuration options
available if needed.
thanks,
greg k-h
Greg KH wrote:
>
> So I got bold and enabled CONFIG_PREEMPT in 2.5.39, and got the
> following message at boot time:
>
> Sleeping function called from illegal context at slab.c:1374
> c12a5ea8 c0117a26 c0296260 c0298202 0000055e c1283060 c013ab4f c0298202
> 0000055e c03b668c 00000002 00000003 c01ff5ec c03b668c 00000042 c12838e0
> c12838e0 c12838e0 00000246 cfdee214 c0207830 04000000 c03b65f0 c0109be2
> Call Trace:
> [<c0117a26>]__might_sleep+0x56/0x5d
> [<c013ab4f>]kmalloc+0x4f/0x330
> [<c01ff5ec>]piix_tune_chipset+0x33c/0x350
> [<c0207830>]ide_intr+0x0/0x320
> [<c0109be2>]request_irq+0x52/0xa0
> [<c0200a33>]init_irq+0x263/0x400
> [<c0207830>]ide_intr+0x0/0x320
> [<c0200edc>]hwif_init+0x10c/0x260
> [<c02006ad>]probe_hwif_init+0x1d/0x70
> [<c02121d1>]ide_setup_pci_device+0x41/0x70
> [<c01ff7a5>]piix_init_one+0x35/0x40
> [<c010511b>]init+0x8b/0x250
> [<c0105090>]init+0x0/0x250
> [<c01055f9>]kernel_thread_helper+0x5/0xc
>
Everyone will get this. It's IDE's init_irq() function doing
unsafe things inside ide_lock.
It'll be quite harmless at boot-time, but it'd be nice to get
it fixed up.
On Fri, 2002-09-27 at 19:30, Greg KH wrote:
> The system still seems to be running ok, but I think I'll turn off
> CONFIG_PREEMPT just to be sure.
Note this has nothing to do with kernel preemption. IDE explicitly
sleeps while purposely holding a lock.
It is just we do not have the ability to measure atomicity w/o
preemption enabled - e.g. the debugging only works when it is enabled.
Robert Love
No SLEEPING under a KMALLOC on a spinlock while calling request_irq is not
the brightest thing ever done!
So your finger pointing stinks!
See in 2.4-ac there is a stub to indicate the brokeness.
int init_irq (ide_hwif_t *hwif)
{
unsigned long flags;
unsigned int index;
ide_hwgroup_t *hwgroup, *new_hwgroup;
ide_hwif_t *match = NULL;
#if 0
/* Allocate the buffer and no sleep allowed */
new_hwgroup = kmalloc(sizeof(ide_hwgroup_t),GFP_ATOMIC);
#else
/* Allocate the buffer and potentially sleep first */
new_hwgroup = kmalloc(sizeof(ide_hwgroup_t),GFP_KERNEL);
#endif
#ifndef __IRQ_HELL_SPIN
save_and_cli(flags);
#else
spin_lock_irqsave(&io_request_lock, flags);
#endif
<snip>
if (request_irq(hwif->irq,&ide_intr,sa,hwif->name,hwgroup)) {
if (!match)
kfree(hwgroup);
#ifndef __IRQ_HELL_SPIN
restore_flags(flags);
#else
spin_unlock_irqrestore(&io_request_lock, flags);
#endif
return 1;
}
<snip>
/* all CPUs; safe now that hwif->hwgroup is set up */
#ifndef __IRQ_HELL_SPIN
restore_flags(flags);
#else
spin_unlock_irqrestore(&io_request_lock, flags);
#endif
See in trying to move to a spinlock it goes totally south.
So now that you know the where, and why ... please go fix.
See I am off working with AC on the issues for 2.4.
Also with PREMPT, bah never mind.
Regards,
Andre Hedrick
LAD Storage Consulting Group
On Fri, 27 Sep 2002, Andrew Morton wrote:
> Greg KH wrote:
> >
> > So I got bold and enabled CONFIG_PREEMPT in 2.5.39, and got the
> > following message at boot time:
> >
> > Sleeping function called from illegal context at slab.c:1374
> > c12a5ea8 c0117a26 c0296260 c0298202 0000055e c1283060 c013ab4f c0298202
> > 0000055e c03b668c 00000002 00000003 c01ff5ec c03b668c 00000042 c12838e0
> > c12838e0 c12838e0 00000246 cfdee214 c0207830 04000000 c03b65f0 c0109be2
> > Call Trace:
> > [<c0117a26>]__might_sleep+0x56/0x5d
> > [<c013ab4f>]kmalloc+0x4f/0x330
> > [<c01ff5ec>]piix_tune_chipset+0x33c/0x350
> > [<c0207830>]ide_intr+0x0/0x320
> > [<c0109be2>]request_irq+0x52/0xa0
> > [<c0200a33>]init_irq+0x263/0x400
> > [<c0207830>]ide_intr+0x0/0x320
> > [<c0200edc>]hwif_init+0x10c/0x260
> > [<c02006ad>]probe_hwif_init+0x1d/0x70
> > [<c02121d1>]ide_setup_pci_device+0x41/0x70
> > [<c01ff7a5>]piix_init_one+0x35/0x40
> > [<c010511b>]init+0x8b/0x250
> > [<c0105090>]init+0x0/0x250
> > [<c01055f9>]kernel_thread_helper+0x5/0xc
> >
>
> Everyone will get this. It's IDE's init_irq() function doing
> unsafe things inside ide_lock.
>
> It'll be quite harmless at boot-time, but it'd be nice to get
> it fixed up.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
On Fri, Sep 27, 2002 at 08:51:30PM -0400, Robert Love wrote:
> On Fri, 2002-09-27 at 19:30, Greg KH wrote:
>
> > The system still seems to be running ok, but I think I'll turn off
> > CONFIG_PREEMPT just to be sure.
>
> Note this has nothing to do with kernel preemption. IDE explicitly
> sleeps while purposely holding a lock.
>
> It is just we do not have the ability to measure atomicity w/o
> preemption enabled - e.g. the debugging only works when it is enabled.
Yes, you are correct. Sorry for stating that. It's shaking out lots of
potential proplems :)
greg k-h
On Fri, 2002-09-27 at 22:04, Andre Hedrick wrote:
> See in trying to move to a spinlock it goes totally south.
> So now that you know the where, and why ... please go fix.
> See I am off working with AC on the issues for 2.4.
>
> Also with PREMPT, bah never mind.
Sigh... I do not want to start this but this problem has nothing to do
with preemption and everything to do with you sleeping while holding a
lock. It exists whether preempt is on or off.
Robert Love
On 27 Sep 2002, Robert Love wrote:
> On Fri, 2002-09-27 at 22:04, Andre Hedrick wrote:
>
> > See in trying to move to a spinlock it goes totally south.
> > So now that you know the where, and why ... please go fix.
> > See I am off working with AC on the issues for 2.4.
> >
> > Also with PREMPT, bah never mind.
>
> Sigh... I do not want to start this but this problem has nothing to do
> with preemption and everything to do with you sleeping while holding a
> lock. It exists whether preempt is on or off.
Robert,
Glad we agree on the lock issue, thanks for confirming the point!
There is an issue of interrupt acknowledgement and when one can pre-empt.
I would like to resolve the issue, but I need a global caller/notifier api
from you so I can block IO in a safe spot on the 'data transfer' state
bar. Yeah, blah blah on underfined terms.
Some how I need to figure out how to address the pre-empt and keep the
driver data stable. Initially I would suggest throttling back on the
request size to maybe 4k or 8k regardless. I may not sound right but it
will serve the purpose.
Cheers,
Andre Hedrick
LAD Storage Consulting Group
On Fri, 2002-09-27 at 23:21, Andre Hedrick wrote:
> Glad we agree on the lock issue, thanks for confirming the point!
Great!
> There is an issue of interrupt acknowledgement and when one can pre-empt.
> I would like to resolve the issue, but I need a global caller/notifier api
> from you so I can block IO in a safe spot on the 'data transfer' state
> bar. Yeah, blah blah on underfined terms.
Well, I do not know what the problem is (or what the hell you are
talking about, to be honest). You really should not have any problems
with preemption over regular SMP issues. If your code has a problem
with other code running concurrently, then it should already hold a lock
and thus be non-preemptive?
Also note we do not preempt interrupt handlers (obviously).
If you have a critical section in which you do not want to be preempted,
do a:
preempt_disable();
/* critical section ... */
preempt_enable();
This would have to be code that is in user-context and does not already
hold a lock. There are very few explicit places that need this. You
would be the first block driver, I believe.
Whatever this issue is, note it is entirely separate from the above
locking issue. I also want to iterate that the locking problem
(rescheduling while holding a lock) is a problem on UP even. Yes, think
about it. Assuming the lock really needs to be held, it is protecting a
critical region. If we reschedule, we can enter that region (or another
one of the same data protected hopefully by the same lock). On SMP, we
would deadlock. But on UP we will just silently corrupt the data.
I.e., we can race on UP here.
Robert Love
On Sat, 2002-09-28 at 04:21, Andre Hedrick wrote:
> There is an issue of interrupt acknowledgement and when one can pre-empt.
> I would like to resolve the issue, but I need a global caller/notifier api
> from you so I can block IO in a safe spot on the 'data transfer' state
> bar. Yeah, blah blah on underfined terms.
>
> Some how I need to figure out how to address the pre-empt and keep the
> driver data stable. Initially I would suggest throttling back on the
> request size to maybe 4k or 8k regardless. I may not sound right but it
> will serve the purpose.
For things like old old broken PIO where interrupting the data stream
screws up the data thats actually already covered. Pre-empt does
actually do some things sensibly, and one of them is that when you hold
a lock or disable irq you also disable pre-empt. That means hdparm -u0
PIO interface code is still going to do the right thing
Reminds me though Robert (and Jeff)
drivers/net/8390.c still needs ei_start_xmit fixing
pre-emption should be disabled between
/* Mask interrupts from the ethercard.
SMP: We have to grab the lock here otherwise the IRQ handler
and
disable_irq_nosync(dev->irq);
spin_lock(&ei_local->page_lock);
So that we don't leave the IRQ disabled due to pre-emption
(that code is wonderfully deranged but its the only way to make 8390
based chips not screw up things like serial I/O on a SMP box)
On Fri, Sep 27, 2002 at 08:51:30PM -0400, Robert Love wrote:
> Note this has nothing to do with kernel preemption. IDE explicitly
> sleeps while purposely holding a lock.
>
> It is just we do not have the ability to measure atomicity w/o
> preemption enabled - e.g. the debugging only works when it is enabled.
Would it be particularly difficult to separate this debug tool from the
feature ? Surely we could make it so that CONFIG_PREEMPT depends on
CONFIG_MIGHT_SLEEP or whatever, and just adds the actual ability to
reschedule.
I have a bit of a problem with __might_sleep because I call sleepable
stuff holding a spinlock (yes, it is justified, and yes, it is safe
afaics, at least with PREEMPT=n)
regards
john
--
"When your name is Winner, that's it. You don't need a nickname."
- Loser Lane
John Levon wrote:
>
> On Fri, Sep 27, 2002 at 08:51:30PM -0400, Robert Love wrote:
>
> > Note this has nothing to do with kernel preemption. IDE explicitly
> > sleeps while purposely holding a lock.
> >
> > It is just we do not have the ability to measure atomicity w/o
> > preemption enabled - e.g. the debugging only works when it is enabled.
>
> Would it be particularly difficult to separate this debug tool from the
> feature ? Surely we could make it so that CONFIG_PREEMPT depends on
> CONFIG_MIGHT_SLEEP or whatever, and just adds the actual ability to
> reschedule.
We need a standalone CONFIG_MIGHT_SLEEP. I sinfully hooked it
to CONFIG_DEBUG_KERNEL (it's not obvious why CONFIG_DEBUG_KERNEL
exists actually).
So yes, you could make CONFIG_MIGHT_SLEEP mutually exclusive
with CONFIG_OPROFILE. But that would make people look at you
suspiciously.
> I have a bit of a problem with __might_sleep because I call sleepable
> stuff holding a spinlock (yes, it is justified, and yes, it is safe
> afaics, at least with PREEMPT=n)
I'm looking at you suspiciously. How come?
On Sat, 2002-09-28 at 06:06, Alan Cox wrote:
> Reminds me though Robert (and Jeff)
>
> drivers/net/8390.c still needs ei_start_xmit fixing
>
> pre-emption should be disabled between
>
> /* Mask interrupts from the ethercard.
> SMP: We have to grab the lock here otherwise the IRQ handler
>
> and
> disable_irq_nosync(dev->irq);
>
> spin_lock(&ei_local->page_lock);
Sounds reasonable enough. What about the attached patch? If we flip
the order of the disable_irq and spin_lock, we do not need to actually
explicitly disable preemption... the lock will do that for us.
Is this safe?
This also has the general benefit of not spinning on the lock with the
irq disabled (same sort of downside has preempting with the irq
disabled).
Robert Love
diff -urN linux-2.5.39/drivers/net/8390.c linux/drivers/net/8390.c
--- linux-2.5.39/drivers/net/8390.c Fri Sep 27 17:49:05 2002
+++ linux/drivers/net/8390.c Sat Sep 28 13:02:47 2002
@@ -243,9 +243,9 @@
}
/* Ugly but a reset can be slow, yet must be protected */
-
- disable_irq_nosync(dev->irq);
+
spin_lock(&ei_local->page_lock);
+ disable_irq_nosync(dev->irq);
/* Try to restart the card. Perhaps the user has fixed something. */
ei_reset_8390(dev);
@@ -286,10 +286,9 @@
/*
* Slow phase with lock held.
*/
-
- disable_irq_nosync(dev->irq);
-
+
spin_lock(&ei_local->page_lock);
+ disable_irq_nosync(dev->irq);
ei_local->irqlock = 1;
On Sat, Sep 28, 2002 at 10:05:17AM -0700, Andrew Morton wrote:
> We need a standalone CONFIG_MIGHT_SLEEP. I sinfully hooked it
> to CONFIG_DEBUG_KERNEL (it's not obvious why CONFIG_DEBUG_KERNEL
> exists actually).
Ah, OK.
> > I have a bit of a problem with __might_sleep because I call sleepable
> > stuff holding a spinlock (yes, it is justified, and yes, it is safe
> > afaics, at least with PREEMPT=n)
>
> I'm looking at you suspiciously. How come?
NMI interrupt handler cannot block so it trylocks against a spinlock
instead. Buffer processing code needs to block against concurrent NMI
interrupts so takes the spinlock for them. All actual blocks on the
spinlock are beneath a down() on another semaphore, so a sleep whilst
holding the spinlock won't actually cause deadlock.
I don't know a way out of this that can safely ensure we've finished
processing an NMI on the remote CPU the buffer processing code is about
to look at.
[I'll post a new patch against 2.5.39 in a bit so you can see in
context]
regards
john
--
"When your name is Winner, that's it. You don't need a nickname."
- Loser Lane
On Sat, 2002-09-28 at 13:24, John Levon wrote:
> NMI interrupt handler cannot block so it trylocks against a spinlock
> instead. Buffer processing code needs to block against concurrent NMI
> interrupts so takes the spinlock for them. All actual blocks on the
> spinlock are beneath a down() on another semaphore, so a sleep whilst
> holding the spinlock won't actually cause deadlock.
If all accesses to the spinlock are taken under a semaphore, then the
spinlock is not needed (i.e. the down'ed semaphore provides the same
protection), or am I missing something?
If this is not the case - e.g. there are other accesses to these locks -
then you cannot sleep, no?
I really can think of no case in which it is safe to sleep while holding
a spinlock or otherwise atomic. If it is, then the atomicity is not
needed, sort of by definition.
Robert Love
On Sat, Sep 28, 2002 at 02:27:44PM -0400, Robert Love wrote:
> > NMI interrupt handler cannot block so it trylocks against a spinlock
> > instead. Buffer processing code needs to block against concurrent NMI
> > interrupts so takes the spinlock for them. All actual blocks on the
> > spinlock are beneath a down() on another semaphore, so a sleep whilst
> > holding the spinlock won't actually cause deadlock.
>
> If all accesses to the spinlock are taken under a semaphore, then the
> spinlock is not needed (i.e. the down'ed semaphore provides the same
> protection), or am I missing something?
>
> If this is not the case - e.g. there are other accesses to these locks -
> then you cannot sleep, no?
The other accessors are spin_trylock()ers, as I mentioned. They will not
block but they are not under the semaphore.
The spinlock cannot be a semaphore because NMI interrupts do not take to
kindly to up()
regards
john
--
"When your name is Winner, that's it. You don't need a nickname."
- Loser Lane
On Sat, 2002-09-28 at 13:24, John Levon wrote:
>> NMI interrupt handler cannot block so it trylocks against a spinlock
>> instead. Buffer processing code needs to block against concurrent NMI
>> interrupts so takes the spinlock for them. All actual blocks on the
>> spinlock are beneath a down() on another semaphore, so a sleep whilst
>> holding the spinlock won't actually cause deadlock.
On Sat, Sep 28, 2002 at 02:27:44PM -0400, Robert Love wrote:
> If all accesses to the spinlock are taken under a semaphore, then the
> spinlock is not needed (i.e. the down'ed semaphore provides the same
> protection), or am I missing something?
> If this is not the case - e.g. there are other accesses to these locks -
> then you cannot sleep, no?
> I really can think of no case in which it is safe to sleep while holding
> a spinlock or otherwise atomic. If it is, then the atomicity is not
> needed, sort of by definition.
Actually, though he may be using a spinlock_t, when used this way, it
is not a spinlock, but rather a semaphore-like construct like PG_locked.
Spinlocks include blocking via busywait semantics, which this usage
does not have. It just happens to use the same data type. There are
other interesting abuses of spinlock-like constructs in "advanced"
locks, for instance, in non-sleeping handoff-scheduled queueing locks
(e.g. MCS spinlocks and rwlocks) it's a common idiom for one waiter to
set a "blocked" bit or lock word and then spin on it until another
waiter and/or cpu manipulating the lock clears it.
Bill