2002-10-17 02:29:12

by Steve Parker

Subject: 2.5.41 still not testable by end users

I've been trying to test the 2.5 kernel since 2.5.39, but these warnings
really scare me off... if there's to be a freeze at the end of October,
these really need to be fixed before I (as a maybe-typical user) feel
comfortable testing it....

I can take Linus' assurances that there is little chance of data
corruption, but while I'm getting messages like these, I can't use it as
my regular system, and I don't have any "Use regularly but don't depend
on" boxes lying around - it's rather oxymoronic.

This machine has run 2.0, 2.2, and 2.4 perfectly happily; if 2.5 can cope
with a pretty bog-standard hard disk (2.4.18 reports: HITACHI_DK239A-65)
then I'll happily test it (2.5 feels faster than 2.4 as a desktop system
from what I've seen) but until it boots without grief, I don't feel safe
testing it.

From what I've seen, this has been, "Yeah, dealt with" since 2.5.39, but
it still isn't fixed - I don't know whose problem it is, but until it's
done, I can't give any feedback.

Thanks for an excellent kernel so far guys,

Steve.... here's the dmesg output:

Oct 16 21:40:59 declan kernel: Debug: sleeping function called from illegal context at mm/slab.c:1374
Oct 16 21:40:59 declan kernel: Call Trace:
Oct 16 21:40:59 declan kernel: [__might_sleep+84/96] __might_sleep+0x54/0x60
Oct 16 21:40:59 declan kernel: [kmem_cache_alloc+38/432] kmem_cache_alloc+0x26/0x1b0
Oct 16 21:40:59 declan kernel: [startup_8259A_irq+10/16] startup_8259A_irq+0xa/0x10
Oct 16 21:40:59 declan kernel: [blk_init_free_list+76/208] blk_init_free_list+0x4c/0xd0
Oct 16 21:40:59 declan kernel: [request_irq+140/168] request_irq+0x8c/0xa8
Oct 16 21:40:59 declan kernel: [blk_init_queue+12/212] blk_init_queue+0xc/0xd4
Oct 16 21:40:59 declan kernel: [ide_init_queue+40/104] ide_init_queue+0x28/0x68
Oct 16 21:41:00 declan kernel: [do_ide_request+0/24] do_ide_request+0x0/0x18
Oct 16 21:41:00 declan kernel: [init_irq+637/820] init_irq+0x27d/0x334
Oct 16 21:41:00 declan kernel: [hwif_init+274/600] hwif_init+0x112/0x258
Oct 16 21:41:00 declan kernel: [probe_hwif_init+28/108] probe_hwif_init+0x1c/0x6c
Oct 16 21:41:00 declan kernel: [ide_setup_pci_device+61/104] ide_setup_pci_device+0x3d/0x68
Oct 16 21:41:00 declan kernel: [piix_init_one+55/64] piix_init_one+0x37/0x40
Oct 16 21:41:00 declan kernel: [init+46/376] init+0x2e/0x178
Oct 16 21:41:00 declan kernel: [init+0/376] init+0x0/0x178
Oct 16 21:41:00 declan kernel: [kernel_thread_helper+5/12] kernel_thread_helper+0x5/0xc
Oct 16 21:41:00 declan kernel:



2002-10-17 02:49:26

by Andrew Morton

Subject: Re: 2.5.41 still not testable by end users

Steve Parker wrote:
>
> I've been trying to test the 2.5 kernel since 2.5.39, but these warnings
> really scare me off...
> ...
> Oct 16 21:40:59 declan kernel: Debug: sleeping function called from
> illegal context at mm/slab.c:1374

It's just debug. Everyone gets it. Don't worry about it.

It's there to remind the IDE developers to fix it.

> Oct 16 21:40:59 declan kernel: Call Trace:
> ...
> Oct 16 21:40:59 declan kernel: [__might_sleep+84/96]
> ...
> Oct 16 21:41:00 declan kernel: [init_irq+637/820] init_irq+0x27d/0x334
>

One day. Before we all die. Please.

2002-10-17 22:18:16

by Thomas Molina

Subject: Re: 2.5.41 still not testable by end users

On Wed, 16 Oct 2002, Andrew Morton wrote:

> Steve Parker wrote:
> >
> > I've been trying to test the 2.5 kernel since 2.5.39, but these warnings
> > really scare me off...
> > ...
> > Oct 16 21:40:59 declan kernel: Debug: sleeping function called from
> > illegal context at mm/slab.c:1374
>
> It's just debug. Everyone gets it. Don't worry about it.
>
> It's there to remind the IDE developers to fix it.
>
> > Oct 16 21:40:59 declan kernel: Call Trace:
> > ...
> > Oct 16 21:40:59 declan kernel: [__might_sleep+84/96]
> > ...
> > Oct 16 21:41:00 declan kernel: [init_irq+637/820] init_irq+0x27d/0x334
> >
>
> One day. Before we all die. Please.

I had that as fixed in my problem list. It should have been integrated by
2.5.42, certainly 2.5.43. I'm not seeing any additional reports since
then.

2002-10-17 22:33:59

by Andrew Morton

Subject: Re: 2.5.41 still not testable by end users

Thomas Molina wrote:
>
> ...
> > > Oct 16 21:40:59 declan kernel: [__might_sleep+84/96]
> > > ...
> > > Oct 16 21:41:00 declan kernel: [init_irq+637/820] init_irq+0x27d/0x334
> > >
> >
> > One day. Before we all die. Please.
>
> I had that as fixed in my problem list. It should have been integrated by
> 2.5.42, certainly 2.5.43. I'm not seeing any additional reports since
> then.

Oh. We still have:

	if (request_irq(hwif->irq,&ide_intr,sa,hwif->name,hwgroup)) {
		if (!match)
			kfree(hwgroup);
		spin_unlock_irqrestore(&ide_lock, flags);

request_irq() was changed to use GFP_ATOMIC, so it's "fixed".

But only for i386.

request_irq() inside spinlock is a *very* common bug. Moreso
as people move cli()-using code across to use spinlocks.

And we've just lost our ability to detect this bug.

request_irq() needs to take the allocation mode as an argument.
Should always have. Sigh. I'll fix it up sometime.
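
[Editor's sketch] The point above can be modelled in user space. This is not kernel code: `model_alloc`, `model_request_irq`, the `alloc_mode` enum and the lock counter are hypothetical stand-ins invented for illustration. It only shows why hard-wiring the atomic mode silences the warning even when the caller still holds a lock, and what a caller-supplied allocation mode might look like.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's allocation modes. */
enum alloc_mode { MODE_MAY_SLEEP, MODE_ATOMIC };

static int lock_depth;     /* > 0 models "a spinlock is held" */
static int sleep_warnings; /* counts would-be "sleeping function" warnings */

static void model_lock(void)
{
	lock_depth++;
}

static void model_unlock(void)
{
	assert(lock_depth > 0);
	lock_depth--;
}

/* Model allocator: a sleeping allocation under a lock triggers the warning. */
static void *model_alloc(size_t size, enum alloc_mode mode)
{
	if (mode == MODE_MAY_SLEEP && lock_depth > 0) {
		sleep_warnings++;
		fprintf(stderr, "Debug: sleeping allocation while a lock is held\n");
	}
	return malloc(size);
}

/* Hypothetical request_irq() variant that takes the mode from its caller. */
static int model_request_irq(int irq, enum alloc_mode mode)
{
	void *action = model_alloc(64, mode);

	if (!action)
		return -1;
	printf("irq %d registered\n", irq);
	free(action); /* model only: real code would keep this on the IRQ's list */
	return 0;
}

/* Registers one IRQ under the model lock; returns the warnings it caused. */
static int warnings_when_locked(enum alloc_mode mode)
{
	int before = sleep_warnings;

	model_lock();
	model_request_irq(14, mode);
	model_unlock();
	return sleep_warnings - before;
}
```

In this model, `warnings_when_locked(MODE_MAY_SLEEP)` reports one warning and `warnings_when_locked(MODE_ATOMIC)` reports none, although the caller holds the lock in both cases. With the mode as an explicit argument, at least the caller has to state which context it believes it is in.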

2002-10-17 23:12:05

by Steve Parker

Subject: Re: 2.5.41 still not testable by end users

On Thu, 17 Oct 2002, Thomas Molina wrote:

> On Wed, 16 Oct 2002, Andrew Morton wrote:
> > It's just debug. Everyone gets it. Don't worry about it.
> > It's there to remind the IDE developers to fix it.
> > One day. Before we all die. Please.
>
> I had that as fixed in my problem list. It should have been integrated by
> 2.5.42, certainly 2.5.43. I'm not seeing any additional reports since
> then.
>

Thanks, using 2.5.43 and not getting the warnings.
It makes it difficult for an unknowledgeable user (such as myself) to test
the kernel when some messages matter and others don't.

Here goes ....

2002-10-18 02:43:01

by Andi Kleen

Subject: Re: 2.5.41 still not testable by end users

Andrew Morton writes:
>
> request_irq() needs to take the allocation mode as an argument.
> Should always have. Sigh. I'll fix it up sometime.

If you change it I would change it to let the caller pass it in. Then
it's explicit and most drivers can just slab it somewhere in their
private structures without any allocation.

-Andi
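
[Editor's sketch] One reading of this suggestion is that the driver owns the per-handler record and hands it to the IRQ layer, so registration itself allocates nothing. The user-space model below assumes that reading; `model_irqaction`, `model_request_irq_prealloc` and `model_driver` are hypothetical names, not existing kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*irq_handler_fn)(int irq);

/* Hypothetical per-handler record, owned by the caller. */
struct model_irqaction {
	irq_handler_fn handler;
	const char *name;
	struct model_irqaction *next;
};

#define MODEL_NR_IRQS 16
static struct model_irqaction *irq_table[MODEL_NR_IRQS];

/* Registration only links caller-provided storage: no allocation, no sleeping. */
static int model_request_irq_prealloc(int irq, struct model_irqaction *action)
{
	if (irq < 0 || irq >= MODEL_NR_IRQS || !action || !action->handler)
		return -1;
	action->next = irq_table[irq];
	irq_table[irq] = action;
	return 0;
}

/* The driver embeds the record in its private structure. */
struct model_driver {
	int port;
	struct model_irqaction irq_action;
};

static void model_driver_intr(int irq)
{
	(void)irq; /* nothing to do in the model */
}

static int model_driver_attach(struct model_driver *drv, int irq)
{
	drv->irq_action.handler = model_driver_intr;
	drv->irq_action.name = "model-driver";
	drv->irq_action.next = NULL;
	return model_request_irq_prealloc(irq, &drv->irq_action);
}
```

The cost is visible in the model: the driver has to know the layout of the record, which ties it to how a given architecture represents its handlers.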

2002-10-18 03:29:59

by Andrew Morton

Subject: Re: 2.5.41 still not testable by end users

Andi Kleen wrote:
>
> Andrew Morton writes:
> >
> > request_irq() needs to take the allocation mode as an argument.
> > Should always have. Sigh. I'll fix it up sometime.
>
> If you change it I would change it to let the caller pass it in. Then
> it's explicit and most drivers can just slab it somewhere in their
> private structures without any allocation.
>

Well that would require that the drivers become aware of struct irqaction,
and make assumptions about how the particular arch handles interrupts.

But it's a bit academic. ia32's request_irq() calls proc_mkdir() and
create_proc_entry(). So there's no point in feeding gfp_flags down
into request_irq or whatever. We need to fix IDE still.

2002-10-18 08:41:32

by Arjan van de Ven

Subject: Re: 2.5.41 still not testable by end users


> request_irq() was changed to use GFP_ATOMIC, so it's "fixed".

Are you sure? Afaik request_irq also changes some files in proc.... are
you sure that is all done atomically ?



2002-10-21 10:34:53

by Alan

Subject: Re: 2.5.41 still not testable by end users

On Thu, 2002-10-17 at 23:39, Andrew Morton wrote:
> request_irq() was changed to use GFP_ATOMIC, so it's "fixed".
>
> But only for i386.
>
> request_irq() inside spinlock is a *very* common bug. Moreso
> as people move cli()-using code across to use spinlocks.
>
> And we've just lost our ability to detect this bug.
>
> request_irq() needs to take the allocation mode as an argument.
> Should always have. Sigh. I'll fix it up sometime.

Many of the people who use request_irq out of spinlocks are actually the
buggy ones, especially in the PCI world. I'm not sure passing the
argument is the real fix. I'd like to be able to write code that was
more of the form

irqptr = allocate_irq(irqnum, flags, handler, &err);

install_irq(irqptr);

That would clean up vast amounts of locking, if I can allocate, check I
can obtain and handle all the setup -before- I turn the IRQ on.
enable/disable_irq doesn't really cut it for granularity and has the same
can't-sleep issue.
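
[Editor's sketch] The two-step form proposed above can be modelled in user space as follows. `model_allocate_irq` and `model_install_irq` mirror the call shapes in the message, but their bodies, the `model_irq` struct and the lock counter are assumptions made for illustration; no such kernel API is implied to exist.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef void (*irq_handler_fn)(int irq);

/* Hypothetical handle returned by the allocation step. */
struct model_irq {
	int irq;
	unsigned long flags;
	irq_handler_fn handler;
	int installed;
};

static int lock_depth;     /* > 0 models "a spinlock is held" */
static int sleep_warnings; /* sleeping calls made while a lock was held */

/* Step 1: may sleep and may fail, so it runs before any lock is taken. */
static struct model_irq *model_allocate_irq(int irq, unsigned long flags,
					    irq_handler_fn handler, int *err)
{
	struct model_irq *p;

	if (lock_depth > 0)
		sleep_warnings++;
	p = malloc(sizeof(*p));
	if (!p) {
		*err = -1;
		return NULL;
	}
	p->irq = irq;
	p->flags = flags;
	p->handler = handler;
	p->installed = 0;
	*err = 0;
	return p;
}

/* Step 2: no allocation and no failure path, so it is safe under a lock. */
static void model_install_irq(struct model_irq *p)
{
	p->installed = 1;
}

static void model_handler(int irq)
{
	printf("handling irq %d\n", irq);
}

/* Allocate and check everything first, then install under the lock. */
static int model_setup_device(int irq)
{
	int err;
	struct model_irq *p = model_allocate_irq(irq, 0, model_handler, &err);

	if (!p)
		return err;

	lock_depth++; /* take the model lock */
	model_install_irq(p);
	lock_depth--; /* drop it again */

	err = p->installed ? 0 : -1;
	free(p); /* model only: a real handle would live until the IRQ is released */
	return err;
}
```

Everything that can sleep or fail happens in the first step, so `model_setup_device()` never records a warning; the locked section contains only the install, which cannot fail.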

2002-10-21 10:38:15

by Alan

Subject: Re: 2.5.41 still not testable by end users

On Fri, 2002-10-18 at 04:35, Andrew Morton wrote:
> But it's a bit academic. ia32's request_irq() calls proc_mkdir() and
> create_proc_entry(). So there's no point in feeding gfp_flags down
> into request_irq or whatever. We need to fix IDE still.

I don't plan to fix the IDE layer, or the sound drivers, or the hundred
odd other drivers where this is close to impossible to fix. What is
broken is that IRQ setup is still done the way it was in 0.12, not
split into separate steps.