2008-02-06 11:20:41

by Pavel Machek

Subject: Re: 2.6.26-git0: IDE oops during boot

On Wed 2008-02-06 11:53:34, Pavel Machek wrote:
> Hi!
>
> Trying to boot 2.6.25-git0 (few days old), I get
>
> BUG: unable to handle kernel paging request at ffff..ffb0
> IP at init_irq+0x42e
>
> Call trace:
> ide_device_add_all
> ide_generic_init
> kernel_init
> child_rip
> vgacon_cursor
> kernel_init
> child_rip
>
> Excerpt from config:
>
> CONFIG_IDE=y
> CONFIG_BLK_DEV_IDE=y

Disabling CONFIG_IDE made my machine boot, as it was using libata
anyway.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Subject: Re: 2.6.26-git0: IDE oops during boot


Hi,

On Wednesday 06 February 2008, Pavel Machek wrote:
> On Wed 2008-02-06 11:53:34, Pavel Machek wrote:
> > Hi!
> >
> > Trying to boot 2.6.25-git0 (few days old), I get
> >
> > BUG: unable to handle kernel paging request at ffff..ffb0
> > IP at init_irq+0x42e

init_irq? hmm...

> > Call trace:
> > ide_device_add_all

this comes from ide-generic
(Generic IDE host driver)

> > ide_generic_init
> > kernel_init
> > child_rip
> > vgacon_cursor
> > kernel_init
> > child_rip
> >
> > Excerpt from config:
> >
> > CONFIG_IDE=y
> > CONFIG_BLK_DEV_IDE=y
>
> Disabling CONFIG_IDE made my machine boot, as it was using libata
> anyway.

Kamalesh/Pavel:

Could you try latest git and see if the OOPS is still there?

[ Yeah, I'm unable to reproduce it. :( ]

Thanks,
Bart

2008-02-07 09:35:35

by Kamalesh Babulal

Subject: Re: 2.6.26-git0: IDE oops during boot

Bartlomiej Zolnierkiewicz wrote:
> Hi,
>
> On Wednesday 06 February 2008, Pavel Machek wrote:
>> On Wed 2008-02-06 11:53:34, Pavel Machek wrote:
>>> Hi!
>>>
>>> Trying to boot 2.6.25-git0 (few days old), I get
>>>
>>> BUG: unable to handle kernel paging request at ffff..ffb0
>>> IP at init_irq+0x42e
>
> init_irq? hmm...
>
>>> Call trace:
>>> ide_device_add_all
>
> this comes from ide-generic
> (Generic IDE host driver)
>
>>> ide_generic_init
>>> kernel_init
>>> child_rip
>>> vgacon_cursor
>>> kernel_init
>>> child_rip
>>>
>>> Excerpt from config:
>>>
>>> CONFIG_IDE=y
>>> CONFIG_BLK_DEV_IDE=y
>> Disabling CONFIG_IDE made my machine boot, as it was using libata
>> anyway.
>
> Kamalesh/Pavel:
>
> Could you try latest git and see if the OOPS is still there?
>
> [ Yeah, I'm unable to reproduce it. :( ]
>
> Thanks,
> Bart
Hi Bart,

The panic is reproducible with the 2.6.24-git16 kernel; the call trace is
similar to the previous one:

BUG: unable to handle kernel paging request at ffffffffffffffa0
IP: [<ffffffff80415673>] init_irq+0x188/0x444
PGD 203067 PUD 204067 PMD 0
Oops: 0000 [1] SMP
CPU 3
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-git16 #1
RIP: 0010:[<ffffffff80415673>] [<ffffffff80415673>] init_irq+0x188/0x444
RSP: 0000:ffff81022f093e00 EFLAGS: 00010282
RAX: ffffffffffffff80 RBX: ffffffff808ad200 RCX: 0000000000000000
RDX: 00000000ffffffff RSI: ffff81022fc039c0 RDI: ffffffff807512c0
RBP: ffff81022f093e30 R08: ffff81022f093d70 R09: 0000000000000002
R10: 0000000000000001 R11: ffff81022f093c00 R12: ffffffff808b4500
R13: ffffffff808b4510 R14: 0000000000000000 R15: ffffffffffffffff
FS: 0000000000000000(0000) GS:ffff81022f0e7ac0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffffffffffa0 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff81022f092000, task ffff81022f0797e0)
Stack: ffff81022f093e30 0000000000000000 ffffffff808ad200 ffffffff808ad220
ffffffff808add80 0000000000000000 ffff81022f093eb0 ffffffff8041648f
ffff81022f093ec0 0000000000000000 0000000080751ee0 0000000000000246
Call Trace:
[<ffffffff8041648f>] ide_device_add_all+0xb60/0xe54
[<ffffffff807d6d48>] ide_generic_init+0x46/0x4a
[<ffffffff807b873b>] kernel_init+0x175/0x2e7
[<ffffffff8020bff8>] child_rip+0xa/0x12
[<ffffffff8037476c>] acpi_ds_init_one_object+0x0/0x88
[<ffffffff807b85c6>] kernel_init+0x0/0x2e7
[<ffffffff8020bfee>] child_rip+0x0/0x12


Code: 89 03 49 8b 45 18 48 89 18 48 39 1b 75 04 0f 0b eb fe fe 05 20 71 38 00 fb eb 5b 48 8b 83 20 07 00 00 83 ca ff 48 83 c0 80 74 0e <48> 8b 40 20 48 8b 80 88 00 00 00 8b 50 04 48 8b 3d 48 11 30 00
RIP [<ffffffff80415673>] init_irq+0x188/0x444
RSP <ffff81022f093e00>
CR2: ffffffffffffffa0
---[ end trace 165798c72d52c3e3 ]---
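
A hedged reading of the dump above: the bytes just before the marked
instruction in the Code: line, "48 83 c0 80", decode to add $-0x80,%rax,
and the marked bytes "<48> 8b 40 20" decode to mov 0x20(%rax),%rax. If that
decoding is right, RAX held a pointer that was zero before the 0x80
adjustment, and the load address equals CR2. The arithmetic can be checked
with a small C sketch; the macro and function names are illustrative only,
and the values are copied from the register dump.

```c
#include <stdint.h>

/* Values copied from the register dump in the oops. */
#define OOPS_RAX UINT64_C(0xffffffffffffff80)
#define OOPS_CR2 UINT64_C(0xffffffffffffffa0)

/*
 * "48 83 c0 80" is add $-0x80,%rax. Undoing it recovers the pointer value
 * that was loaded before the adjustment (unsigned arithmetic wraps).
 */
static uint64_t pointer_before_adjust(uint64_t rax)
{
    return rax + UINT64_C(0x80);
}

/*
 * "<48> 8b 40 20" is mov 0x20(%rax),%rax, so the address the CPU tried to
 * read is rax + 0x20.
 */
static uint64_t faulting_address(uint64_t rax)
{
    return rax + UINT64_C(0x20);
}
```

With these values, pointer_before_adjust(OOPS_RAX) is 0 and
faulting_address(OOPS_RAX) equals OOPS_CR2, which fits a NULL pointer being
adjusted by a constant offset and then dereferenced.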


--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

Subject: Re: 2.6.26-git0: IDE oops during boot


On Thursday 07 February 2008, Kamalesh Babulal wrote:
> Bartlomiej Zolnierkiewicz wrote:
> > Hi,
> >
> > On Wednesday 06 February 2008, Pavel Machek wrote:
> >> On Wed 2008-02-06 11:53:34, Pavel Machek wrote:
> >>> Hi!
> >>>
> >>> Trying to boot 2.6.25-git0 (few days old), I get
> >>>
> >>> BUG: unable to handle kernel paging request at ffff..ffb0
> >>> IP at init_irq+0x42e
> >
> > init_irq? hmm...
> >
> >>> Call trace:
> >>> ide_device_add_all
> >
> > this comes from ide-generic
> > (Generic IDE host driver)
> >
> >>> ide_generic_init
> >>> kernel_init
> >>> child_rip
> >>> vgacon_cursor
> >>> kernel_init
> >>> child_rip
> >>>
> >>> Excerpt from config:
> >>>
> >>> CONFIG_IDE=y
> >>> CONFIG_BLK_DEV_IDE=y
> >> Disabling CONFIG_IDE made my machine boot, as it was using libata
> >> anyway.
> >
> > Kamalesh/Pavel:
> >
> > Could you try latest git and see if the OOPS is still there?
> >
> > [ Yeah, I'm unable to reproduce it. :( ]
> >
> > Thanks,
> > Bart
> Hi Bart,
>
> The panic is reproducible with the 2.6.24-git16 kernel, the call trace is
> similar to the previous one

Thanks, I again reviewed ide-probe.c changes but nothing seems wrong...

Could you please bisect it down to the guilty commit?

> BUG: unable to handle kernel paging request at ffffffffffffffa0
> IP: [<ffffffff80415673>] init_irq+0x188/0x444

Please also try disassembling init_irq using gdb so we see where it fails.

Bart

> PGD 203067 PUD 204067 PMD 0
> Oops: 0000 [1] SMP
> CPU 3
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.24-git16 #1
> RIP: 0010:[<ffffffff80415673>] [<ffffffff80415673>] init_irq+0x188/0x444
> RSP: 0000:ffff81022f093e00 EFLAGS: 00010282
> RAX: ffffffffffffff80 RBX: ffffffff808ad200 RCX: 0000000000000000
> RDX: 00000000ffffffff RSI: ffff81022fc039c0 RDI: ffffffff807512c0
> RBP: ffff81022f093e30 R08: ffff81022f093d70 R09: 0000000000000002
> R10: 0000000000000001 R11: ffff81022f093c00 R12: ffffffff808b4500
> R13: ffffffff808b4510 R14: 0000000000000000 R15: ffffffffffffffff
> FS: 0000000000000000(0000) GS:ffff81022f0e7ac0(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: ffffffffffffffa0 CR3: 0000000000201000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 1, threadinfo ffff81022f092000, task ffff81022f0797e0)
> Stack: ffff81022f093e30 0000000000000000 ffffffff808ad200 ffffffff808ad220
> ffffffff808add80 0000000000000000 ffff81022f093eb0 ffffffff8041648f
> ffff81022f093ec0 0000000000000000 0000000080751ee0 0000000000000246
> Call Trace:
> [<ffffffff8041648f>] ide_device_add_all+0xb60/0xe54
> [<ffffffff807d6d48>] ide_generic_init+0x46/0x4a
> [<ffffffff807b873b>] kernel_init+0x175/0x2e7
> [<ffffffff8020bff8>] child_rip+0xa/0x12
> [<ffffffff8037476c>] acpi_ds_init_one_object+0x0/0x88
> [<ffffffff807b85c6>] kernel_init+0x0/0x2e7
> [<ffffffff8020bfee>] child_rip+0x0/0x12
>
>
> Code: 89 03 49 8b 45 18 48 89 18 48 39 1b 75 04 0f 0b eb fe fe 05 20 71 38 00 fb eb 5b 48 8b 83 20 07 00 00 83 ca ff 48 83 c0 80 74 0e <48> 8b 40 20 48 8b 80 88 00 00 00 8b 50 04 48 8b 3d 48 11 30 00
> RIP [<ffffffff80415673>] init_irq+0x188/0x444
> RSP <ffff81022f093e00>
> CR2: ffffffffffffffa0
> ---[ end trace 165798c72d52c3e3 ]---
>
>
> --
> Thanks & Regards,
> Kamalesh Babulal,
> Linux Technology Center,
> IBM, ISTL.

2008-02-10 21:33:22

by Nish Aravamudan

Subject: Re: 2.6.26-git0: IDE oops during boot

On 2/7/08, Bartlomiej Zolnierkiewicz <[email protected]> wrote:
>
> On Thursday 07 February 2008, Kamalesh Babulal wrote:
> > Bartlomiej Zolnierkiewicz wrote:
> > > Hi,
> > >
> > > On Wednesday 06 February 2008, Pavel Machek wrote:
> > >> On Wed 2008-02-06 11:53:34, Pavel Machek wrote:
> > >>> Hi!
> > >>>
> > >>> Trying to boot 2.6.25-git0 (few days old), I get
> > >>>
> > >>> BUG: unable to handle kernel paging request at ffff..ffb0
> > >>> IP at init_irq+0x42e
> > >
> > > init_irq? hmm...
> > >
> > >>> Call trace:
> > >>> ide_device_add_all
> > >
> > > this comes from ide-generic
> > > (Generic IDE host driver)
> > >
> > >>> ide_generic_init
> > >>> kernel_init
> > >>> child_rip
> > >>> vgacon_cursor
> > >>> kernel_init
> > >>> child_rip
> > >>>
> > >>> Excerpt from config:
> > >>>
> > >>> CONFIG_IDE=y
> > >>> CONFIG_BLK_DEV_IDE=y
> > >> Disabling CONFIG_IDE made my machine boot, as it was using libata
> > >> anyway.
> > >
> > > Kamalesh/Pavel:
> > >
> > > Could you try latest git and see if the OOPS is still there?
> > >
> > > [ Yeah, I'm unable to reproduce it. :( ]
> > >
> > > Thanks,
> > > Bart
> > Hi Bart,
> >
> > The panic is reproducible with the 2.6.24-git16 kernel, the call trace is
> > similar to the previous one
>
> Thanks, I again reviewed ide-probe.c changes but nothing seems wrong...
>
> Could you please bisect it down to the guilty commit?

Kamalesh, were you able to bisect this down? I just got hit by the
same panic on a 4-way x86_64, with 2.6.24-git22.

Thanks,
Nish

2008-02-11 07:54:40

by Kamalesh Babulal

Subject: Re: 2.6.26-git0: IDE oops during boot

Nish Aravamudan wrote:
> On 2/7/08, Bartlomiej Zolnierkiewicz <[email protected]> wrote:
>> On Thursday 07 February 2008, Kamalesh Babulal wrote:
>>> Bartlomiej Zolnierkiewicz wrote:
>>>> Hi,
>>>>
>>>> On Wednesday 06 February 2008, Pavel Machek wrote:
>>>>> On Wed 2008-02-06 11:53:34, Pavel Machek wrote:
>>>>>> Hi!
>>>>>>
>>>>>> Trying to boot 2.6.25-git0 (few days old), I get
>>>>>>
>>>>>> BUG: unable to handle kernel paging request at ffff..ffb0
>>>>>> IP at init_irq+0x42e
>>>> init_irq? hmm...
>>>>
>>>>>> Call trace:
>>>>>> ide_device_add_all
>>>> this comes from ide-generic
>>>> (Generic IDE host driver)
>>>>
>>>>>> ide_generic_init
>>>>>> kernel_init
>>>>>> child_rip
>>>>>> vgacon_cursor
>>>>>> kernel_init
>>>>>> child_rip
>>>>>>
>>>>>> Excerpt from config:
>>>>>>
>>>>>> CONFIG_IDE=y
>>>>>> CONFIG_BLK_DEV_IDE=y
>>>>> Disabling CONFIG_IDE made my machine boot, as it was using libata
>>>>> anyway.
>>>> Kamalesh/Pavel:
>>>>
>>>> Could you try latest git and see if the OOPS is still there?
>>>>
>>>> [ Yeah, I'm unable to reproduce it. :( ]
>>>>
>>>> Thanks,
>>>> Bart
>>> Hi Bart,
>>>
>>> The panic is reproducible with the 2.6.24-git16 kernel, the call trace is
>>> similar to the previous one
>> Thanks, I again reviewed ide-probe.c changes but nothing seems wrong...
>>
>> Could you please bisect it down to the guilty commit?
>
> Kamalesh, were you able to bisect this down? I just got hit by the
> same panic on a 4-way x86_64, with 2.6.24-git22.
>
> Thanks,
> Nish

Hi Nish,

I tried bisecting and the guilty patch seems to be

36501650ec45b1db308c3b51886044863be2d762 is first bad commit
commit 36501650ec45b1db308c3b51886044863be2d762
Author: Bartlomiej Zolnierkiewicz <[email protected]>
Date: Fri Feb 1 23:09:31 2008 +0100

ide: keep pointer to struct device instead of struct pci_dev in ide_hwif_t


The gdb output also points to the changes made by the guilty patch:

(gdb) p ide_device_add_all
$1 = {int (u8 *, const struct ide_port_info *)} 0xffffffff804176ac <ide_device_add_all>
(gdb) p/x 0xffffffff804176ac+0xb60
$2 = 0xffffffff8041820c
(gdb) l *0xffffffff8041820c
0xffffffff8041820c is in ide_device_add_all (drivers/ide/ide-probe.c:1249).
1244 goto out;
1245 }
1246
1247 sg_init_table(hwif->sg_table, hwif->sg_max_nents);
1248
1249 if (init_irq(hwif) == 0)
1250 goto done;
1251
1252 old_irq = hwif->irq;
1253 /*
(gdb)


(gdb) p init_irq
$1 = {int (ide_hwif_t *)} 0xffffffff8041721f <init_irq>
(gdb) p/x 0xffffffff8041721f+0x1a4
$2 = 0xffffffff804173c3
(gdb) l *0xffffffff804173c3
0xffffffff804173c3 is in init_irq (include/asm/pci.h:101).
96 /* Returns the node based on pci bus */
97 static inline int __pcibus_to_node(struct pci_bus *bus)
98 {
99 struct pci_sysdata *sd = bus->sysdata;
100
101 return sd->node;
102 }
103
104 static inline cpumask_t __pcibus_to_cpumask(struct pci_bus *bus)
105 {
(gdb)


--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

Subject: Re: 2.6.26-git0: IDE oops during boot


Hi,

On Monday 11 February 2008, Kamalesh Babulal wrote:
> Nish Aravamudan wrote:
> > On 2/7/08, Bartlomiej Zolnierkiewicz <[email protected]> wrote:
> >> On Thursday 07 February 2008, Kamalesh Babulal wrote:
> >>> Bartlomiej Zolnierkiewicz wrote:
> >>>> Hi,
> >>>>
> >>>> On Wednesday 06 February 2008, Pavel Machek wrote:
> >>>>> On Wed 2008-02-06 11:53:34, Pavel Machek wrote:
> >>>>>> Hi!
> >>>>>>
> >>>>>> Trying to boot 2.6.25-git0 (few days old), I get
> >>>>>>
> >>>>>> BUG: unable to handle kernel paging request at ffff..ffb0
> >>>>>> IP at init_irq+0x42e
> >>>> init_irq? hmm...
> >>>>
> >>>>>> Call trace:
> >>>>>> ide_device_add_all
> >>>> this comes from ide-generic
> >>>> (Generic IDE host driver)
> >>>>
> >>>>>> ide_generic_init
> >>>>>> kernel_init
> >>>>>> child_rip
> >>>>>> vgacon_cursor
> >>>>>> kernel_init
> >>>>>> child_rip
> >>>>>>
> >>>>>> Excerpt from config:
> >>>>>>
> >>>>>> CONFIG_IDE=y
> >>>>>> CONFIG_BLK_DEV_IDE=y
> >>>>> Disabling CONFIG_IDE made my machine boot, as it was using libata
> >>>>> anyway.
> >>>> Kamalesh/Pavel:
> >>>>
> >>>> Could you try latest git and see if the OOPS is still there?
> >>>>
> >>>> [ Yeah, I'm unable to reproduce it. :( ]
> >>>>
> >>>> Thanks,
> >>>> Bart
> >>> Hi Bart,
> >>>
> >>> The panic is reproducible with the 2.6.24-git16 kernel, the call trace is
> >>> similar to the previous one
> >> Thanks, I again reviewed ide-probe.c changes but nothing seems wrong...
> >>
> >> Could you please bisect it down to the guilty commit?
> >
> > Kamalesh, were you able to bisect this down? I just got hit by the
> > same panic on a 4-way x86_64, with 2.6.24-git22.
> >
> > Thanks,
> > Nish
>
> Hi Nish,
>
> I tried bisecting and the guilty patch seems to be
>
> 36501650ec45b1db308c3b51886044863be2d762 is first bad commit
> commit 36501650ec45b1db308c3b51886044863be2d762
> Author: Bartlomiej Zolnierkiewicz <[email protected]>
> Date: Fri Feb 1 23:09:31 2008 +0100
>
> ide: keep pointer to struct device instead of struct pci_dev in ide_hwif_t
>
>
> the gdb output, also points to the changes made by the guilty patch
>
> (gdb) p ide_device_add_all
> $1 = {int (u8 *, const struct ide_port_info *)} 0xffffffff804176ac <ide_device_add_all>
> (gdb) p/x 0xffffffff804176ac+0xb60
> $2 = 0xffffffff8041820c
> (gdb) l *0xffffffff8041820c
> 0xffffffff8041820c is in ide_device_add_all (drivers/ide/ide-probe.c:1249).
> 1244 goto out;
> 1245 }
> 1246
> 1247 sg_init_table(hwif->sg_table, hwif->sg_max_nents);
> 1248
> 1249 if (init_irq(hwif) == 0)
> 1250 goto done;
> 1251
> 1252 old_irq = hwif->irq;
> 1253 /*
> (gdb)
>
>
> (gdb) p init_irq
> $1 = {int (ide_hwif_t *)} 0xffffffff8041721f <init_irq>
> (gdb) p/x 0xffffffff8041721f+0x1a4
> $2 = 0xffffffff804173c3
> (gdb) l *0xffffffff804173c3
> 0xffffffff804173c3 is in init_irq (include/asm/pci.h:101).
> 96 /* Returns the node based on pci bus */
> 97 static inline int __pcibus_to_node(struct pci_bus *bus)
> 98 {
> 99 struct pci_sysdata *sd = bus->sysdata;
> 100
> 101 return sd->node;
> 102 }
> 103
> 104 static inline cpumask_t __pcibus_to_cpumask(struct pci_bus *bus)
> 105 {
> (gdb)

Thanks for the detailed analysis and sorry for the bug.

I think that this may have just been fixed by Andi's recent hwif_to_node()
fix (patch below, it is already in Linus' tree). Could you please verify this?

commit 1f07e988290fc45932f5028c9e2a862c37a57336
Author: Andi Kleen <[email protected]>
Date: Mon Feb 11 01:35:20 2008 +0100

Prevent IDE boot ops on NUMA system

Without this patch a Opteron test system here oopses at boot with
current git.

Calling to_pci_dev() on a NULL pointer gives a negative value so the
following NULL pointer check never triggers and then an illegal address
is referenced. Check the unadjusted original device pointer for NULL
instead.

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

diff --git a/include/linux/ide.h b/include/linux/ide.h
index 23fad89..a3b69c1 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -1295,7 +1295,7 @@ static inline void ide_dump_identify(u8 *id)
 static inline int hwif_to_node(ide_hwif_t *hwif)
 {
 	struct pci_dev *dev = to_pci_dev(hwif->dev);
-	return dev ? pcibus_to_node(dev->bus) : -1;
+	return hwif->dev ? pcibus_to_node(dev->bus) : -1;
 }

 static inline ide_drive_t *ide_get_paired_drive(ide_drive_t *drive)
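
The failure mode described in the commit message can be illustrated in
userspace. The following is a minimal sketch, not the kernel code:
struct fake_pci_dev, its 0x80 bytes of padding, and all helper names are
hypothetical stand-ins, and the address arithmetic is done on integers so
the demo never performs pointer arithmetic on NULL. It shows why testing the
adjusted pointer misses a NULL input, while testing the original pointer
catches it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real kernel structures are much larger. */
struct device {
    int id;
};

struct fake_pci_dev {
    char pad[0x80];       /* assumed offset of the embedded device */
    struct device dev;    /* generic device embedded in the PCI device */
};

/* The demo relies on the embedded member sitting at a nonzero offset. */
static_assert(offsetof(struct fake_pci_dev, dev) == 0x80,
              "demo assumes dev is at offset 0x80");

/*
 * Address arithmetic of a container_of()-style conversion: subtract the
 * member offset from the member address to get the container address.
 */
static uintptr_t to_fake_pci_dev_addr(const struct device *d)
{
    return (uintptr_t)d - offsetof(struct fake_pci_dev, dev);
}

/* Pre-fix style check: tests the adjusted address. Returns 1 if the
 * caller would go on to dereference it. */
static int node_check_old(const struct device *d)
{
    return to_fake_pci_dev_addr(d) != 0;
}

/* Post-fix style check: tests the original, unadjusted pointer. */
static int node_check_new(const struct device *d)
{
    return d != NULL;
}
```

For a NULL input, node_check_old() returns 1 because the subtraction wraps
to a large nonzero value, while node_check_new() returns 0. For a device
that really is embedded in a struct fake_pci_dev, both checks pass and
to_fake_pci_dev_addr() yields the address of the containing structure.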

2008-02-12 09:05:20

by Kamalesh Babulal

Subject: Re: 2.6.26-git0: IDE oops during boot

Bartlomiej Zolnierkiewicz wrote:
> Hi,
>
> On Monday 11 February 2008, Kamalesh Babulal wrote:
>> Nish Aravamudan wrote:
>>> On 2/7/08, Bartlomiej Zolnierkiewicz <[email protected]> wrote:
>>>> On Thursday 07 February 2008, Kamalesh Babulal wrote:
>>>>> Bartlomiej Zolnierkiewicz wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On Wednesday 06 February 2008, Pavel Machek wrote:
>>>>>>> On Wed 2008-02-06 11:53:34, Pavel Machek wrote:
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> Trying to boot 2.6.25-git0 (few days old), I get
>>>>>>>>
>>>>>>>> BUG: unable to handle kernel paging request at ffff..ffb0
>>>>>>>> IP at init_irq+0x42e
>>>>>> init_irq? hmm...
>>>>>>
>>>>>>>> Call trace:
>>>>>>>> ide_device_add_all
>>>>>> this comes from ide-generic
>>>>>> (Generic IDE host driver)
>>>>>>
>>>>>>>> ide_generic_init
>>>>>>>> kernel_init
>>>>>>>> child_rip
>>>>>>>> vgacon_cursor
>>>>>>>> kernel_init
>>>>>>>> child_rip
>>>>>>>>
>>>>>>>> Excerpt from config:
>>>>>>>>
>>>>>>>> CONFIG_IDE=y
>>>>>>>> CONFIG_BLK_DEV_IDE=y
>>>>>>> Disabling CONFIG_IDE made my machine boot, as it was using libata
>>>>>>> anyway.
>>>>>> Kamalesh/Pavel:
>>>>>>
>>>>>> Could you try latest git and see if the OOPS is still there?
>>>>>>
>>>>>> [ Yeah, I'm unable to reproduce it. :( ]
>>>>>>
>>>>>> Thanks,
>>>>>> Bart
>>>>> Hi Bart,
>>>>>
>>>>> The panic is reproducible with the 2.6.24-git16 kernel, the call trace is
>>>>> similar to the previous one
>>>> Thanks, I again reviewed ide-probe.c changes but nothing seems wrong...
>>>>
>>>> Could you please bisect it down to the guilty commit?
>>> Kamalesh, were you able to bisect this down? I just got hit by the
>>> same panic on a 4-way x86_64, with 2.6.24-git22.
>>>
>>> Thanks,
>>> Nish
>> Hi Nish,
>>
>> I tried bisecting and the guilty patch seems to be
>>
>> 36501650ec45b1db308c3b51886044863be2d762 is first bad commit
>> commit 36501650ec45b1db308c3b51886044863be2d762
>> Author: Bartlomiej Zolnierkiewicz <[email protected]>
>> Date: Fri Feb 1 23:09:31 2008 +0100
>>
>> ide: keep pointer to struct device instead of struct pci_dev in ide_hwif_t
>>
>>
>> the gdb output, also points to the changes made by the guilty patch
>>
>> (gdb) p ide_device_add_all
>> $1 = {int (u8 *, const struct ide_port_info *)} 0xffffffff804176ac <ide_device_add_all>
>> (gdb) p/x 0xffffffff804176ac+0xb60
>> $2 = 0xffffffff8041820c
>> (gdb) l *0xffffffff8041820c
>> 0xffffffff8041820c is in ide_device_add_all (drivers/ide/ide-probe.c:1249).
>> 1244 goto out;
>> 1245 }
>> 1246
>> 1247 sg_init_table(hwif->sg_table, hwif->sg_max_nents);
>> 1248
>> 1249 if (init_irq(hwif) == 0)
>> 1250 goto done;
>> 1251
>> 1252 old_irq = hwif->irq;
>> 1253 /*
>> (gdb)
>>
>>
>> (gdb) p init_irq
>> $1 = {int (ide_hwif_t *)} 0xffffffff8041721f <init_irq>
>> (gdb) p/x 0xffffffff8041721f+0x1a4
>> $2 = 0xffffffff804173c3
>> (gdb) l *0xffffffff804173c3
>> 0xffffffff804173c3 is in init_irq (include/asm/pci.h:101).
>> 96 /* Returns the node based on pci bus */
>> 97 static inline int __pcibus_to_node(struct pci_bus *bus)
>> 98 {
>> 99 struct pci_sysdata *sd = bus->sysdata;
>> 100
>> 101 return sd->node;
>> 102 }
>> 103
>> 104 static inline cpumask_t __pcibus_to_cpumask(struct pci_bus *bus)
>> 105 {
>> (gdb)
>
> Thanks for the detailed analysis and sorry for the bug.
>
> I think that this may have just been fixed by Andi's recent hwif_to_node()
> fix (patch below, it is already in Linus' tree). Could you please verify this?
>
> commit 1f07e988290fc45932f5028c9e2a862c37a57336
> Author: Andi Kleen <[email protected]>
> Date: Mon Feb 11 01:35:20 2008 +0100
>
> Prevent IDE boot ops on NUMA system
>
> Without this patch a Opteron test system here oopses at boot with
> current git.
>
> Calling to_pci_dev() on a NULL pointer gives a negative value so the
> following NULL pointer check never triggers and then an illegal address
> is referenced. Check the unadjusted original device pointer for NULL
> instead.
>
> Signed-off-by: Andi Kleen <[email protected]>
> Signed-off-by: Linus Torvalds <[email protected]>
>
> diff --git a/include/linux/ide.h b/include/linux/ide.h
> index 23fad89..a3b69c1 100644
> --- a/include/linux/ide.h
> +++ b/include/linux/ide.h
> @@ -1295,7 +1295,7 @@ static inline void ide_dump_identify(u8 *id)
> static inline int hwif_to_node(ide_hwif_t *hwif)
> {
> struct pci_dev *dev = to_pci_dev(hwif->dev);
> - return dev ? pcibus_to_node(dev->bus) : -1;
> + return hwif->dev ? pcibus_to_node(dev->bus) : -1;
> }
>
> static inline ide_drive_t *ide_get_paired_drive(ide_drive_t *drive)
Hi Bart,
Thanks! The patch solves the original kernel panic, but after applying it the kernel is not
able to mount the root filesystem and panics. I am not sure what is causing this panic.

Creating root device.
Mounting root filesystem.
mount: could not find filesystem
Kernel panic - not syncing: Attempted to kill init!


--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

Subject: Re: 2.6.26-git0: IDE oops during boot


Hi,

On Tuesday 12 February 2008, Kamalesh Babulal wrote:
> Bartlomiej Zolnierkiewicz wrote:
> > Hi,
> >
> > On Monday 11 February 2008, Kamalesh Babulal wrote:
> >> Nish Aravamudan wrote:
> >>> On 2/7/08, Bartlomiej Zolnierkiewicz <[email protected]> wrote:
> >>>> On Thursday 07 February 2008, Kamalesh Babulal wrote:
> >>>>> Bartlomiej Zolnierkiewicz wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> On Wednesday 06 February 2008, Pavel Machek wrote:
> >>>>>>> On Wed 2008-02-06 11:53:34, Pavel Machek wrote:
> >>>>>>>> Hi!
> >>>>>>>>
> >>>>>>>> Trying to boot 2.6.25-git0 (few days old), I get
> >>>>>>>>
> >>>>>>>> BUG: unable to handle kernel paging request at ffff..ffb0
> >>>>>>>> IP at init_irq+0x42e
> >>>>>> init_irq? hmm...
> >>>>>>
> >>>>>>>> Call trace:
> >>>>>>>> ide_device_add_all
> >>>>>> this comes from ide-generic
> >>>>>> (Generic IDE host driver)
> >>>>>>
> >>>>>>>> ide_generic_init
> >>>>>>>> kernel_init
> >>>>>>>> child_rip
> >>>>>>>> vgacon_cursor
> >>>>>>>> kernel_init
> >>>>>>>> child_rip
> >>>>>>>>
> >>>>>>>> Excerpt from config:
> >>>>>>>>
> >>>>>>>> CONFIG_IDE=y
> >>>>>>>> CONFIG_BLK_DEV_IDE=y
> >>>>>>> Disabling CONFIG_IDE made my machine boot, as it was using libata
> >>>>>>> anyway.
> >>>>>> Kamalesh/Pavel:
> >>>>>>
> >>>>>> Could you try latest git and see if the OOPS is still there?
> >>>>>>
> >>>>>> [ Yeah, I'm unable to reproduce it. :( ]
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Bart
> >>>>> Hi Bart,
> >>>>>
> >>>>> The panic is reproducible with the 2.6.24-git16 kernel, the call trace is
> >>>>> similar to the previous one
> >>>> Thanks, I again reviewed ide-probe.c changes but nothing seems wrong...
> >>>>
> >>>> Could you please bisect it down to the guilty commit?
> >>> Kamalesh, were you able to bisect this down? I just got hit by the
> >>> same panic on a 4-way x86_64, with 2.6.24-git22.
> >>>
> >>> Thanks,
> >>> Nish
> >> Hi Nish,
> >>
> >> I tried bisecting and the guilty patch seems to be
> >>
> >> 36501650ec45b1db308c3b51886044863be2d762 is first bad commit
> >> commit 36501650ec45b1db308c3b51886044863be2d762
> >> Author: Bartlomiej Zolnierkiewicz <[email protected]>
> >> Date: Fri Feb 1 23:09:31 2008 +0100
> >>
> >> ide: keep pointer to struct device instead of struct pci_dev in ide_hwif_t
> >>
> >>
> >> the gdb output, also points to the changes made by the guilty patch
> >>
> >> (gdb) p ide_device_add_all
> >> $1 = {int (u8 *, const struct ide_port_info *)} 0xffffffff804176ac <ide_device_add_all>
> >> (gdb) p/x 0xffffffff804176ac+0xb60
> >> $2 = 0xffffffff8041820c
> >> (gdb) l *0xffffffff8041820c
> >> 0xffffffff8041820c is in ide_device_add_all (drivers/ide/ide-probe.c:1249).
> >> 1244 goto out;
> >> 1245 }
> >> 1246
> >> 1247 sg_init_table(hwif->sg_table, hwif->sg_max_nents);
> >> 1248
> >> 1249 if (init_irq(hwif) == 0)
> >> 1250 goto done;
> >> 1251
> >> 1252 old_irq = hwif->irq;
> >> 1253 /*
> >> (gdb)
> >>
> >>
> >> (gdb) p init_irq
> >> $1 = {int (ide_hwif_t *)} 0xffffffff8041721f <init_irq>
> >> (gdb) p/x 0xffffffff8041721f+0x1a4
> >> $2 = 0xffffffff804173c3
> >> (gdb) l *0xffffffff804173c3
> >> 0xffffffff804173c3 is in init_irq (include/asm/pci.h:101).
> >> 96 /* Returns the node based on pci bus */
> >> 97 static inline int __pcibus_to_node(struct pci_bus *bus)
> >> 98 {
> >> 99 struct pci_sysdata *sd = bus->sysdata;
> >> 100
> >> 101 return sd->node;
> >> 102 }
> >> 103
> >> 104 static inline cpumask_t __pcibus_to_cpumask(struct pci_bus *bus)
> >> 105 {
> >> (gdb)
> >
> > Thanks for the detailed analysis and sorry for the bug.
> >
> > I think that this may have just been fixed by Andi's recent hwif_to_node()
> > fix (patch below, it is already in Linus' tree). Could you please verify this?
> >
> > commit 1f07e988290fc45932f5028c9e2a862c37a57336
> > Author: Andi Kleen <[email protected]>
> > Date: Mon Feb 11 01:35:20 2008 +0100
> >
> > Prevent IDE boot ops on NUMA system
> >
> > Without this patch a Opteron test system here oopses at boot with
> > current git.
> >
> > Calling to_pci_dev() on a NULL pointer gives a negative value so the
> > following NULL pointer check never triggers and then an illegal address
> > is referenced. Check the unadjusted original device pointer for NULL
> > instead.
> >
> > Signed-off-by: Andi Kleen <[email protected]>
> > Signed-off-by: Linus Torvalds <[email protected]>
> >
> > diff --git a/include/linux/ide.h b/include/linux/ide.h
> > index 23fad89..a3b69c1 100644
> > --- a/include/linux/ide.h
> > +++ b/include/linux/ide.h
> > @@ -1295,7 +1295,7 @@ static inline void ide_dump_identify(u8 *id)
> > static inline int hwif_to_node(ide_hwif_t *hwif)
> > {
> > struct pci_dev *dev = to_pci_dev(hwif->dev);
> > - return dev ? pcibus_to_node(dev->bus) : -1;
> > + return hwif->dev ? pcibus_to_node(dev->bus) : -1;
> > }
> >
> > static inline ide_drive_t *ide_get_paired_drive(ide_drive_t *drive)
> Hi Bart,
> Thanks! The patch solves the original kernel panic, but after applying it the kernel is not
> able to mount the root filesystem and panics. I am not sure what is causing this panic.

Is

- the commit 36501650ec45b1db308c3b51886044863be2d762 with Andi's fix applied

or

- the commit f6fb786d6dcdd7d730e4fba620b071796f487e1b
(the one before commit 36501650ec45b1db308c3b51886044863be2d762)

working for you?

> Creating root device.
> Mounting root filesystem.
> mount: could not find filesystem
> Kernel panic - not syncing: Attempted to kill init!

Is IDE actually used for the boot device?

[ Please send a dmesg output from the working system. ]

Thanks,
Bart

2008-02-14 09:47:06

by Kamalesh Babulal

Subject: Re: 2.6.26-git0: IDE oops during boot

Bartlomiej Zolnierkiewicz wrote:
> Hi,
>
> On Tuesday 12 February 2008, Kamalesh Babulal wrote:
>> Bartlomiej Zolnierkiewicz wrote:
>>> Hi,
>>>
>>> On Monday 11 February 2008, Kamalesh Babulal wrote:
>>>> Nish Aravamudan wrote:
>>>>> On 2/7/08, Bartlomiej Zolnierkiewicz <[email protected]> wrote:
>>>>>> On Thursday 07 February 2008, Kamalesh Babulal wrote:
>>>>>>> Bartlomiej Zolnierkiewicz wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On Wednesday 06 February 2008, Pavel Machek wrote:
>>>>>>>>> On Wed 2008-02-06 11:53:34, Pavel Machek wrote:
>>>>>>>>>> Hi!
>>>>>>>>>>
>>>>>>>>>> Trying to boot 2.6.25-git0 (few days old), I get
>>>>>>>>>>
>>>>>>>>>> BUG: unable to handle kernel paging request at ffff..ffb0
>>>>>>>>>> IP at init_irq+0x42e
>>>>>>>> init_irq? hmm...
>>>>>>>>
>>>>>>>>>> Call trace:
>>>>>>>>>> ide_device_add_all
>>>>>>>> this comes from ide-generic
>>>>>>>> (Generic IDE host driver)
>>>>>>>>
>>>>>>>>>> ide_generic_init
>>>>>>>>>> kernel_init
>>>>>>>>>> child_rip
>>>>>>>>>> vgacon_cursor
>>>>>>>>>> kernel_init
>>>>>>>>>> child_rip
>>>>>>>>>>
>>>>>>>>>> Excerpt from config:
>>>>>>>>>>
>>>>>>>>>> CONFIG_IDE=y
>>>>>>>>>> CONFIG_BLK_DEV_IDE=y
>>>>>>>>> Disabling CONFIG_IDE made my machine boot, as it was using libata
>>>>>>>>> anyway.
>>>>>>>> Kamalesh/Pavel:
>>>>>>>>
>>>>>>>> Could you try latest git and see if the OOPS is still there?
>>>>>>>>
>>>>>>>> [ Yeah, I'm unable to reproduce it. :( ]
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Bart
>>>>>>> Hi Bart,
>>>>>>>
>>>>>>> The panic is reproducible with the 2.6.24-git16 kernel, the call trace is
>>>>>>> similar to the previous one
>>>>>> Thanks, I again reviewed ide-probe.c changes but nothing seems wrong...
>>>>>>
>>>>>> Could you please bisect it down to the guilty commit?
>>>>> Kamalesh, were you able to bisect this down? I just got hit by the
>>>>> same panic on a 4-way x86_64, with 2.6.24-git22.
>>>>>
>>>>> Thanks,
>>>>> Nish
>>>> Hi Nish,
>>>>
>>>> I tried bisecting and the guilty patch seems to be
>>>>
>>>> 36501650ec45b1db308c3b51886044863be2d762 is first bad commit
>>>> commit 36501650ec45b1db308c3b51886044863be2d762
>>>> Author: Bartlomiej Zolnierkiewicz <[email protected]>
>>>> Date: Fri Feb 1 23:09:31 2008 +0100
>>>>
>>>> ide: keep pointer to struct device instead of struct pci_dev in ide_hwif_t
>>>>
>>>>
>>>> the gdb output, also points to the changes made by the guilty patch
>>>>
>>>> (gdb) p ide_device_add_all
>>>> $1 = {int (u8 *, const struct ide_port_info *)} 0xffffffff804176ac <ide_device_add_all>
>>>> (gdb) p/x 0xffffffff804176ac+0xb60
>>>> $2 = 0xffffffff8041820c
>>>> (gdb) l *0xffffffff8041820c
>>>> 0xffffffff8041820c is in ide_device_add_all (drivers/ide/ide-probe.c:1249).
>>>> 1244 goto out;
>>>> 1245 }
>>>> 1246
>>>> 1247 sg_init_table(hwif->sg_table, hwif->sg_max_nents);
>>>> 1248
>>>> 1249 if (init_irq(hwif) == 0)
>>>> 1250 goto done;
>>>> 1251
>>>> 1252 old_irq = hwif->irq;
>>>> 1253 /*
>>>> (gdb)
>>>>
>>>>
>>>> (gdb) p init_irq
>>>> $1 = {int (ide_hwif_t *)} 0xffffffff8041721f <init_irq>
>>>> (gdb) p/x 0xffffffff8041721f+0x1a4
>>>> $2 = 0xffffffff804173c3
>>>> (gdb) l *0xffffffff804173c3
>>>> 0xffffffff804173c3 is in init_irq (include/asm/pci.h:101).
>>>> 96 /* Returns the node based on pci bus */
>>>> 97 static inline int __pcibus_to_node(struct pci_bus *bus)
>>>> 98 {
>>>> 99 struct pci_sysdata *sd = bus->sysdata;
>>>> 100
>>>> 101 return sd->node;
>>>> 102 }
>>>> 103
>>>> 104 static inline cpumask_t __pcibus_to_cpumask(struct pci_bus *bus)
>>>> 105 {
>>>> (gdb)
>>> Thanks for the detailed analysis and sorry for the bug.
>>>
>>> I think that this may have just been fixed by Andi's recent hwif_to_node()
>>> fix (patch below, it is already in Linus' tree). Could you please verify this?
>>>
>>> commit 1f07e988290fc45932f5028c9e2a862c37a57336
>>> Author: Andi Kleen <[email protected]>
>>> Date: Mon Feb 11 01:35:20 2008 +0100
>>>
>>> Prevent IDE boot ops on NUMA system
>>>
>>> Without this patch a Opteron test system here oopses at boot with
>>> current git.
>>>
>>> Calling to_pci_dev() on a NULL pointer gives a negative value so the
>>> following NULL pointer check never triggers and then an illegal address
>>> is referenced. Check the unadjusted original device pointer for NULL
>>> instead.
>>>
>>> Signed-off-by: Andi Kleen <[email protected]>
>>> Signed-off-by: Linus Torvalds <[email protected]>
>>>
>>> diff --git a/include/linux/ide.h b/include/linux/ide.h
>>> index 23fad89..a3b69c1 100644
>>> --- a/include/linux/ide.h
>>> +++ b/include/linux/ide.h
>>> @@ -1295,7 +1295,7 @@ static inline void ide_dump_identify(u8 *id)
>>> static inline int hwif_to_node(ide_hwif_t *hwif)
>>> {
>>> struct pci_dev *dev = to_pci_dev(hwif->dev);
>>> - return dev ? pcibus_to_node(dev->bus) : -1;
>>> + return hwif->dev ? pcibus_to_node(dev->bus) : -1;
>>> }
>>>
>>> static inline ide_drive_t *ide_get_paired_drive(ide_drive_t *drive)
>> Hi Bart,
>> Thanks! The patch solves the kernel panic, but after applying it the kernel is not
>> able to mount the filesystem and panics. I am not sure what is causing this panic.
>
> Is
>
> - the commit 36501650ec45b1db308c3b51886044863be2d762 with Andi's fix applied
>
> or
>
> - the commit f6fb786d6dcdd7d730e4fba620b071796f487e1b
> (the one before commit 36501650ec45b1db308c3b51886044863be2d762)
>
> working for you?

No, the commit before commit 36501650ec45b1db308c3b51886044863be2d762 does not work either; I
get the same kernel panic.

>
>> Creating root device.
>> Mounting root filesystem.
>> mount: could not find filesystem
>> Kernel panic - not syncing: Attempted to kill init!
>
> Is IDE actually used for the boot device?
>
> [ Please send a dmesg output from the working system. ]
>


--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


Attachments:
dmesg_feb_14 (18.84 kB)
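
[ Archive note: the hwif_to_node() fix quoted in this thread relies on the
fact that to_pci_dev() is pointer arithmetic, not a lookup, so a NULL
struct device pointer comes back as a non-NULL (wrapped-around) struct
pci_dev pointer. The following is a minimal userspace sketch of that effect,
not kernel code. The structure layouts, to_pci_dev_sketch() and the two
would_deref_*() helpers are hypothetical names made up for illustration, and
it assumes the usual case where a null pointer converts to integer 0. ]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; only the layout matters. */
struct device {
    int id;
};

struct pci_dev {
    long other_fields[8];   /* placeholder for members that precede 'dev' */
    struct device dev;      /* embedded device, at a non-zero offset */
};

/*
 * Sketch of what to_pci_dev() does: step back from the embedded member to
 * the start of the containing structure. Integer arithmetic is used so the
 * NULL case can be shown without doing pointer arithmetic on a null pointer.
 */
struct pci_dev *to_pci_dev_sketch(struct device *dev)
{
    return (struct pci_dev *)((uintptr_t)dev - offsetof(struct pci_dev, dev));
}

/*
 * Pre-fix logic: tests the adjusted pointer. Returns 1 where the real code
 * would go on to dereference the pci_dev, -1 where it would bail out.
 */
int would_deref_before_fix(struct device *dev)
{
    struct pci_dev *pdev = to_pci_dev_sketch(dev);
    return pdev ? 1 : -1;
}

/* Post-fix logic: tests the original, unadjusted pointer instead. */
int would_deref_after_fix(struct device *dev)
{
    return dev ? 1 : -1;
}

/* Small self-check tying the three helpers together. */
void self_check(void)
{
    struct pci_dev p = { .dev = { .id = 1 } };

    /* A valid embedded device maps back to its container. */
    assert(to_pci_dev_sketch(&p.dev) == &p);
    /* NULL minus a non-zero offset is not NULL, so the old test passes. */
    assert(would_deref_before_fix(NULL) == 1);
    /* Testing the original pointer catches the NULL case. */
    assert(would_deref_after_fix(NULL) == -1);
}
```

[ In the sketch, the pre-fix check lets a NULL device through because the
adjusted pointer is no longer zero, which matches the description in Andi's
commit message; the post-fix check rejects it. ]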

2008-02-14 10:28:38

by Yinghai Lu

Subject: Re: 2.6.26-git0: IDE oops during boot

On Thu, Feb 14, 2008 at 1:46 AM, Kamalesh Babulal
<[email protected]> wrote:
>
> Bartlomiej Zolnierkiewicz wrote:
> >> Creating root device.
> >> Mounting root filesystem.
> >> mount: could not find filesystem
> >> Kernel panic - not syncing: Attempted to kill init!
> >
> > Is IDE actually used for the boot device?
> >
> > [ Please send a dmesg output from the working system. ]

It seems you have an enclosure connected.

Please check whether SES is enabled in your .config.

If so, please try

http://lkml.org/lkml/2008/2/13/673

YH
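
[ Archive note: the gdb session quoted in this thread turns a backtrace entry
of the form symbol+offset into a source line by adding the offset to the
symbol's start address and passing the result to "l *ADDR". The arithmetic
behind the two "p/x" commands can be sketched as below. This is only an
illustration; resolve_symbol_offset() and check_thread_addresses() are
hypothetical helper names, and the addresses are the ones printed by gdb in
the quoted session. ]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Add a backtrace offset to a symbol's start address. This is the same
 * value gdb prints for "p/x <symbol address>+<offset>".
 */
uint64_t resolve_symbol_offset(uint64_t symbol_start, uint64_t offset)
{
    return symbol_start + offset;
}

/* Re-check the two addresses from the quoted gdb session. */
void check_thread_addresses(void)
{
    /* ide_device_add_all at 0xffffffff804176ac, offset 0xb60 */
    assert(resolve_symbol_offset(0xffffffff804176acULL, 0xb60)
           == 0xffffffff8041820cULL);
    /* init_irq at 0xffffffff8041721f, offset 0x1a4 */
    assert(resolve_symbol_offset(0xffffffff8041721fULL, 0x1a4)
           == 0xffffffff804173c3ULL);
}
```

[ The resulting addresses are what the session then feeds to "l *ADDR" to
reach drivers/ide/ide-probe.c:1249 and include/asm/pci.h:101. ]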

Subject: "mount: could not find filesystem" - aacraid? (was: Re: 2.6.26-git0: IDE oops during boot)


Hi,

On Thursday 14 February 2008, Kamalesh Babulal wrote:
> Bartlomiej Zolnierkiewicz wrote:
> >> Creating root device.
> >> Mounting root filesystem.
> >> mount: could not find filesystem
> >> Kernel panic - not syncing: Attempted to kill init!
> >
> > Is IDE actually used for the boot device?
> >
> > [ Please send a dmesg output from the working system. ]

Hmm, it is not (from dmesg):

Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
Probing IDE interface ide0...
hda: HL-DT-STCD-RW/DVD DRIVE GCC-4244N, ATAPI CD/DVD-ROM drive
Probing IDE interface ide1...
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 24X DVD-ROM CD-R/RW drive, 2048kB Cache
Uniform CD-ROM driver Revision: 3.20

[...]

Adaptec aacraid driver 1.1-5[2449]-ms
ACPI: PCI Interrupt 0000:01:02.0[A] -> GSI 25 (level, low) -> IRQ 25
AAC0: kernel 5.2-0[11835] Jan 9 2007
AAC0: monitor 5.2-0[11835]
AAC0: bios 5.2-0[11835]
AAC0: serial 1625D1
AAC0: 64bit support enabled.
AAC0: 64 Bit DAC enabled
scsi0 : ServeRAID
scsi 0:0:0:0: Direct-Access IBM x366 V1.0 PQ: 0 ANSI: 2
scsi 0:1:0:0: Direct-Access IBM-ESXS ST973401SS B519 PQ: 0 ANSI: 5
scsi 0:1:1:0: Direct-Access IBM-ESXS ST973401SS B519 PQ: 0 ANSI: 5
scsi 0:1:2:0: Direct-Access IBM-ESXS ST973401SS B519 PQ: 0 ANSI: 5
scsi 0:3:0:0: Enclosure IBM SAS SES-2 DEVICE 0.09 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 429459456 512-byte hardware sectors (219883 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 06 00 10 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
sd 0:0:0:0: [sda] 429459456 512-byte hardware sectors (219883 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 06 00 10 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
sd 0:0:0:0: [sda] Attached SCSI removable disk
sd 0:0:0:0: Attached scsi generic sg0 type 0
scsi 0:1:0:0: Attached scsi generic sg1 type 0
scsi 0:1:1:0: Attached scsi generic sg2 type 0
scsi 0:1:2:0: Attached scsi generic sg3 type 0
scsi 0:3:0:0: Attached scsi generic sg4 type 13

[...]

kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.

[...]

EXT3 FS on sda1, internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.

I worry that another git-bisect session will be needed unless SCSI
developers are already aware of the problem source.

Thanks,
Bart

Subject: Re: "mount: could not find filesystem" - aacraid? (was: Re: 2.6.26-git0: IDE oops during boot)

On Thursday 14 February 2008, Bartlomiej Zolnierkiewicz wrote:
>
> Hi,
>
> On Thursday 14 February 2008, Kamalesh Babulal wrote:
> > Bartlomiej Zolnierkiewicz wrote:
> > >> Creating root device.
> > >> Mounting root filesystem.
> > >> mount: could not find filesystem
> > >> Kernel panic - not syncing: Attempted to kill init!
> > >
> > > Is IDE actually used for the boot device?
> > >
> > > [ Please send a dmesg output from the working system. ]
>
> Hmm, it is not (from dmesg):
>
> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> Probing IDE interface ide0...
> hda: HL-DT-STCD-RW/DVD DRIVE GCC-4244N, ATAPI CD/DVD-ROM drive
> Probing IDE interface ide1...
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> hda: ATAPI 24X DVD-ROM CD-R/RW drive, 2048kB Cache
> Uniform CD-ROM driver Revision: 3.20
>
> [...]
>
> Adaptec aacraid driver 1.1-5[2449]-ms
> ACPI: PCI Interrupt 0000:01:02.0[A] -> GSI 25 (level, low) -> IRQ 25
> AAC0: kernel 5.2-0[11835] Jan 9 2007
> AAC0: monitor 5.2-0[11835]
> AAC0: bios 5.2-0[11835]
> AAC0: serial 1625D1
> AAC0: 64bit support enabled.
> AAC0: 64 Bit DAC enabled
> scsi0 : ServeRAID
> scsi 0:0:0:0: Direct-Access IBM x366 V1.0 PQ: 0 ANSI: 2
> scsi 0:1:0:0: Direct-Access IBM-ESXS ST973401SS B519 PQ: 0 ANSI: 5
> scsi 0:1:1:0: Direct-Access IBM-ESXS ST973401SS B519 PQ: 0 ANSI: 5
> scsi 0:1:2:0: Direct-Access IBM-ESXS ST973401SS B519 PQ: 0 ANSI: 5
> scsi 0:3:0:0: Enclosure IBM SAS SES-2 DEVICE 0.09 PQ: 0 ANSI: 5
> sd 0:0:0:0: [sda] 429459456 512-byte hardware sectors (219883 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 06 00 10 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
> sd 0:0:0:0: [sda] 429459456 512-byte hardware sectors (219883 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 06 00 10 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
> sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
> sd 0:0:0:0: [sda] Attached SCSI removable disk
> sd 0:0:0:0: Attached scsi generic sg0 type 0
> scsi 0:1:0:0: Attached scsi generic sg1 type 0
> scsi 0:1:1:0: Attached scsi generic sg2 type 0
> scsi 0:1:2:0: Attached scsi generic sg3 type 0
> scsi 0:3:0:0: Attached scsi generic sg4 type 13
>
> [...]
>
> kjournald starting. Commit interval 5 seconds
> EXT3-fs: mounted filesystem with ordered data mode.
>
> [...]
>
> EXT3 FS on sda1, internal journal
> kjournald starting. Commit interval 5 seconds
> EXT3 FS on sda2, internal journal
> EXT3-fs: mounted filesystem with ordered data mode.
>
> I worry that another git-bisect session will be needed unless SCSI
> developers are already aware of the problem source.

Yinghai Lu noticed that it may actually be an SES problem:

http://lkml.org/lkml/2008/2/14/88

[ I overlooked the above mail, sorry ]

2008-02-14 15:47:57

by James Bottomley

Subject: Re: "mount: could not find filesystem" - aacraid? (was: Re: 2.6.26-git0: IDE oops during boot)

On Thu, 2008-02-14 at 13:07 +0100, Bartlomiej Zolnierkiewicz wrote:
> > I worry that another git-bisect session will be needed unless SCSI
> > developers are already aware of the problem source.
>
> Yinghai Lu noticed that it may be actually a SES problem:
>
> http://lkml.org/lkml/2008/2/14/88
>
> [ I overlooked the above mail, sorry ]

That applies only if SES is enabled (CONFIG_SCSI_ENCLOSURE); is it? Also, is
there actually a dmesg of the failing system somewhere? I couldn't find it in
the (somewhat long) thread.

James



2008-02-15 11:15:30

by Kamalesh Babulal

Subject: Re: 2.6.26-git0: IDE oops during boot

Yinghai Lu wrote:
> On Thu, Feb 14, 2008 at 1:46 AM, Kamalesh Babulal
> <[email protected]> wrote:
>> Bartlomiej Zolnierkiewicz wrote:
>> > Hi,
>> >
>> > On Tuesday 12 February 2008, Kamalesh Babulal wrote:
>> >> Bartlomiej Zolnierkiewicz wrote:
>> >>> Hi,
>> >>>
>> >>> On Monday 11 February 2008, Kamalesh Babulal wrote:
>> >>>> Nish Aravamudan wrote:
>> >>>>> On 2/7/08, Bartlomiej Zolnierkiewicz <[email protected]> wrote:
>> >>>>>> On Thursday 07 February 2008, Kamalesh Babulal wrote:
>> >>>>>>> Bartlomiej Zolnierkiewicz wrote:
>> >>>>>>>> Hi,
>> >>>>>>>>
>> >>>>>>>> On Wednesday 06 February 2008, Pavel Machek wrote:
>> >>>>>>>>> On Wed 2008-02-06 11:53:34, Pavel Machek wrote:
>> >>>>>>>>>> Hi!
>> >>>>>>>>>>
>> >>>>>>>>>> Trying to boot 2.6.25-git0 (few days old), I get
>> >>>>>>>>>>
>> >>>>>>>>>> BUG: unable to handle kernel paging request at ffff..ffb0
>> >>>>>>>>>> IP at init_irq+0x42e
>> >>>>>>>> init_irq? hmm...
>> >>>>>>>>
>> >>>>>>>>>> Call trace:
>> >>>>>>>>>> ide_device_add_all
>> >>>>>>>> this comes from ide-generic
>> >>>>>>>> (Generic IDE host driver)
>> >>>>>>>>
>> >>>>>>>>>> ide_generic_init
>> >>>>>>>>>> kernel_init
>> >>>>>>>>>> child_rip
>> >>>>>>>>>> vgacon_cursor
>> >>>>>>>>>> kernel_init
>> >>>>>>>>>> child_rip
>> >>>>>>>>>>
>> >>>>>>>>>> Excerpt from config:
>> >>>>>>>>>>
>> >>>>>>>>>> CONFIG_IDE=y
>> >>>>>>>>>> CONFIG_BLK_DEV_IDE=y
>> >>>>>>>>> Disabling CONFIG_IDE made my machine boot, as it was using libata
>> >>>>>>>>> anyway.
>> >>>>>>>> Kamalesh/Pavel:
>> >>>>>>>>
>> >>>>>>>> Could you try latest git and see if the OOPS is still there?
>> >>>>>>>>
>> >>>>>>>> [ Yeah, I'm unable to reproduce it. :( ]
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>> Bart
>> >>>>>>> Hi Bart,
>> >>>>>>>
>> >>>>>>> The panic is reproducible with the 2.6.24-git16 kernel, the call trace is
>> >>>>>>> similar to the previous one
>> >>>>>> Thanks, I again reviewed ide-probe.c changes but nothing seems wrong...
>> >>>>>>
>> >>>>>> Could you please bisect it down to the guilty commit?
>> >>>>> Kamalesh, were you able to bisect this down? I just got hit by the
>> >>>>> same panic on a 4-way x86_64, with 2.6.24-git22.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Nish
>> >>>> Hi Nish,
>> >>>>
>> >>>> I tried bisecting and the guilty patch seems to be
>> >>>>
>> >>>> 36501650ec45b1db308c3b51886044863be2d762 is first bad commit
>> >>>> commit 36501650ec45b1db308c3b51886044863be2d762
>> >>>> Author: Bartlomiej Zolnierkiewicz <[email protected]>
>> >>>> Date: Fri Feb 1 23:09:31 2008 +0100
>> >>>>
>> >>>> ide: keep pointer to struct device instead of struct pci_dev in ide_hwif_t
>> >>>>
>> >>>>
>> >>>> the gdb output, also points to the changes made by the guilty patch
>> >>>>
>> >>>> (gdb) p ide_device_add_all
>> >>>> $1 = {int (u8 *, const struct ide_port_info *)} 0xffffffff804176ac <ide_device_add_all>
>> >>>> (gdb) p/x 0xffffffff804176ac+0xb60
>> >>>> $2 = 0xffffffff8041820c
>> >>>> (gdb) l *0xffffffff8041820c
>> >>>> 0xffffffff8041820c is in ide_device_add_all (drivers/ide/ide-probe.c:1249).
>> >>>> 1244 goto out;
>> >>>> 1245 }
>> >>>> 1246
>> >>>> 1247 sg_init_table(hwif->sg_table, hwif->sg_max_nents);
>> >>>> 1248
>> >>>> 1249 if (init_irq(hwif) == 0)
>> >>>> 1250 goto done;
>> >>>> 1251
>> >>>> 1252 old_irq = hwif->irq;
>> >>>> 1253 /*
>> >>>> (gdb)
>> >>>>
>> >>>>
>> >>>> (gdb) p init_irq
>> >>>> $1 = {int (ide_hwif_t *)} 0xffffffff8041721f <init_irq>
>> >>>> (gdb) p/x 0xffffffff8041721f+0x1a4
>> >>>> $2 = 0xffffffff804173c3
>> >>>> (gdb) l *0xffffffff804173c3
>> >>>> 0xffffffff804173c3 is in init_irq (include/asm/pci.h:101).
>> >>>> 96 /* Returns the node based on pci bus */
>> >>>> 97 static inline int __pcibus_to_node(struct pci_bus *bus)
>> >>>> 98 {
>> >>>> 99 struct pci_sysdata *sd = bus->sysdata;
>> >>>> 100
>> >>>> 101 return sd->node;
>> >>>> 102 }
>> >>>> 103
>> >>>> 104 static inline cpumask_t __pcibus_to_cpumask(struct pci_bus *bus)
>> >>>> 105 {
>> >>>> (gdb)
>> >>> Thanks for the detailed analysis and sorry for the bug.
>> >>>
>> >>> I think that this may has been just fixed by Andi's recent hwif_to_node()
>> >>> fix (patch below, it is in Linus' tree already), could please verify this?
>> >>>
>> >>> commit 1f07e988290fc45932f5028c9e2a862c37a57336
>> >>> Author: Andi Kleen <[email protected]>
>> >>> Date: Mon Feb 11 01:35:20 2008 +0100
>> >>>
>> >>> Prevent IDE boot ops on NUMA system
>> >>>
>> >>> Without this patch a Opteron test system here oopses at boot with
>> >>> current git.
>> >>>
>> >>> Calling to_pci_dev() on a NULL pointer gives a negative value so the
>> >>> following NULL pointer check never triggers and then an illegal address
>> >>> is referenced. Check the unadjusted original device pointer for NULL
>> >>> instead.
>> >>>
>> >>> Signed-off-by: Andi Kleen <[email protected]>
>> >>> Signed-off-by: Linus Torvalds <[email protected]>
>> >>>
>> >>> diff --git a/include/linux/ide.h b/include/linux/ide.h
>> >>> index 23fad89..a3b69c1 100644
>> >>> --- a/include/linux/ide.h
>> >>> +++ b/include/linux/ide.h
>> >>> @@ -1295,7 +1295,7 @@ static inline void ide_dump_identify(u8 *id)
>> >>> static inline int hwif_to_node(ide_hwif_t *hwif)
>> >>> {
>> >>> struct pci_dev *dev = to_pci_dev(hwif->dev);
>> >>> - return dev ? pcibus_to_node(dev->bus) : -1;
>> >>> + return hwif->dev ? pcibus_to_node(dev->bus) : -1;
>> >>> }
>> >>>
>> >>> static inline ide_drive_t *ide_get_paired_drive(ide_drive_t *drive)
>> >> Hi Bart,
>> >> Thanks !! the patch solves the kernel panic but when after applying the patch,kernel is not
>> >> able to mount the filesystem and panics, am i not sure what is likely causing the panic.
>> >
>> > Is
>> >
>> > - the commit 36501650ec45b1db308c3b51886044863be2d762 with Andi's fix applied
>> >
>> > or
>> >
>> > - the commit f6fb786d6dcdd7d730e4fba620b071796f487e1b
>> > (the one before commit 36501650ec45b1db308c3b51886044863be2d762)
>> >
>> > working for you?
>>
>> No, the commit before commit 36501650ec45b1db308c3b51886044863be2d762 did not work either; I
>> get the same kernel panic.
>>
>>
>> >
>> >> Creating root device.
>> >> Mounting root filesystem.
>> >> mount: could not find filesystem
>> >> Kernel panic - not syncing: Attempted to kill init!
>> >
>> > Is IDE actually used for the boot device?
>> >
>> > [ Please send a dmesg output from the working system. ]
>
> It seems you have an enclosure connected.
>
> Please check whether SES is enabled in your .config.
>
> If so, please try
>
> http://lkml.org/lkml/2008/2/13/673
>
> YH
> --
Hi,

Thanks for pointing out the patch. I do not have the SES config option enabled,
but I tried your patch anyway, and it does not solve the panic. The kernel
panics with the same message as before. I have attached the .config file
I am using; please let me know if I am missing an option or have set one
incorrectly.



--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
---





Attachments:
config (35.49 kB)
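
The failure mode described in Andi Kleen's commit message quoted above
(to_pci_dev() applied to a NULL pointer yields a non-NULL result, so a
NULL check on the result never fires) can be illustrated in user space.
This is only a sketch under stated assumptions: struct mock_device,
struct mock_pci_dev, mock_to_pci_dev() and both helper functions are
hypothetical stand-ins, not the kernel's definitions, and the macro does
its arithmetic on integers so the example does not rely on pointer
arithmetic on NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct device and struct pci_dev. */
struct mock_device {
    int id;
};

struct mock_pci_dev {
    int bus_node;           /* stand-in for what dev->bus leads to */
    struct mock_device dev; /* embedded at a non-zero offset */
};

/*
 * Same idea as container_of(): subtract the member offset from the
 * member pointer to get the enclosing structure. Done on uintptr_t
 * here so that a NULL input is handled without pointer arithmetic.
 */
#define mock_to_pci_dev(d) \
    ((struct mock_pci_dev *)((uintptr_t)(d) - \
                             offsetof(struct mock_pci_dev, dev)))

/*
 * Pre-fix check: tests the adjusted pointer. Returns 1 when that test
 * passes, i.e. when the old code would go on to dereference pdev.
 */
static int buggy_check_passes(struct mock_device *d)
{
    struct mock_pci_dev *pdev = mock_to_pci_dev(d);

    return pdev != NULL;
}

/* Post-fix check: tests the original, unadjusted pointer. */
static int node_fixed(struct mock_device *d)
{
    struct mock_pci_dev *pdev = mock_to_pci_dev(d);

    return d ? pdev->bus_node : -1;
}
```

On common platforms, where NULL converts to integer 0,
buggy_check_passes(NULL) reports that the old check would let the NULL
case through, while node_fixed(NULL) returns -1 without touching pdev.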

2008-02-25 07:05:37

by Yinghai Lu

Subject: Re: 2.6.26-git0: IDE oops during boot

On Fri, Feb 15, 2008 at 3:15 AM, Kamalesh Babulal
<[email protected]> wrote:
>
>
> Thanks for pointing out the patch. I do not have the SES config option enabled,
> but I tried your patch anyway, and it does not solve the panic. The kernel
> panics with the same message as before. I have attached the .config file
> I am using; please let me know if I am missing an option or have set one
> incorrectly.

can you try x86.git#testing?

http://people.redhat.com/mingo/x86.git/README

YH

2008-02-25 07:23:22

by Yinghai Lu

Subject: Re: 2.6.26-git0: IDE oops during boot

On Sun, Feb 24, 2008 at 11:05 PM, Yinghai Lu <[email protected]> wrote:
> On Fri, Feb 15, 2008 at 3:15 AM, Kamalesh Babulal
>
> <[email protected]> wrote:
> >
> >
>
> > Thanks for pointing out the patch. I do not have the SES config option enabled,
> > but I tried your patch anyway, and it does not solve the panic. The kernel
> > panics with the same message as before. I have attached the .config file
> > I am using; please let me know if I am missing an option or have set one
> > incorrectly.
>
> can you try x86.git#testing?
>
> http://people.redhat.com/mingo/x86.git/README
>

Please also try the attached patch.

YH


Attachments:
(No filename) (661.00 B)
fix_intel_numa.patch (495.00 B)