2009-06-25 10:25:16

by Alessandro Suardi

Subject: [2.6.31-rc1] oops in acpi_get_pci_dev

On boot:

[snip]

kernel: ricoh-mmc: Controller is now disabled.
kernel: firewire_ohci 0000:03:01.0: PCI INT B -> GSI 17 (level, low) -> IRQ 17
kernel: BUG: unable to handle kernel NULL pointer dereference at
0000000000000018
kernel: IP: [<ffffffff8121b556>] acpi_get_pci_dev+0x113/0x179
kernel: PGD 11b0f1067 PUD 11b0e5067 PMD 0
kernel: Oops: 0000 [#1] SMP
kernel: last sysfs file: /sys/devices/virtual/misc/rfkill/dev
kernel: CPU 0
kernel: Modules linked in: dell_laptop(+) snd firewire_ohci(+)
ricoh_mmc firewire_core soundcore sdhci_pci snd_page_alloc rfkill
i2c_i801 sdhci pcspkr mmc_core joydev dcdbas i2c_core video output
crc_itu_t battery ac [last unloaded: scsi_wait_scan]
kernel: Pid: 1258, comm: modprobe Not tainted 2.6.31-rc1 #1 Latitude
E6400
kernel: RIP: 0010:[<ffffffff8121b556>] [<ffffffff8121b556>]
acpi_get_pci_dev+0x113/0x179
kernel: RSP: 0018:ffff88011b0f3bf8 EFLAGS: 00010287
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000824bb959
kernel: RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffffffff815a0f74
kernel: RBP: ffff88011b0f3c68 R08: ffff88011b7396f0 R09: 0000000000000000
kernel: R10: 0000000000000052 R11: ffff88011b176338 R12: ffff88011b427560
kernel: R13: ffff88011f9a4000 R14: ffff88011f814d60 R15: ffff88011f814d20
kernel: FS: 00007f715ffc06f0(0000) GS:ffff88002801f000(0000)
knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
kernel: CR2: 0000000000000018 CR3: 000000011b0e6000 CR4: 00000000000006f0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
kernel: Process modprobe (pid: 1258, threadinfo ffff88011b0f2000, task
ffff88011b7396f0)
kernel: Stack:
kernel: ffff88011b0f3c38 ffff88011b0f3c08 ffff88011b427560 ffff88011b427e60
kernel: <0> ffff88011b0f3c68 ffff88011f812020 0000000000010000 00000000824bb959
kernel: <0> ffff88011b0f3c68 ffff88011f814d60 ffff88011b0f3de8 ffff88011f814d20
kernel: Call Trace:
kernel: [<ffffffff8121ef98>] find_video+0x62/0xa2
kernel: [<ffffffff81233240>] acpi_ns_walk_namespace+0xc4/0x14d
kernel: [<ffffffff8121ef36>] ? find_video+0x0/0xa2
kernel: [<ffffffff8121ef36>] ? find_video+0x0/0xa2
kernel: [<ffffffff81230b0a>] acpi_walk_namespace+0x82/0xd0
kernel: [<ffffffff8121ecdd>] acpi_video_get_capabilities+0x61/0xe2
kernel: [<ffffffffa000a060>] ? dell_send_request+0x60/0x82 [dell_laptop]
kernel: [<ffffffff8121ede9>] acpi_video_backlight_support+0x27/0x64
kernel: [<ffffffffa001c1db>] dell_init+0x1db/0x329 [dell_laptop]
kernel: [<ffffffff811d9db9>] ? __up_read+0x9c/0xbb
kernel: [<ffffffffa001c000>] ? dell_init+0x0/0x329 [dell_laptop]
kernel: [<ffffffff81009092>] do_one_initcall+0x65/0x153
kernel: [<ffffffff8106cf69>] ? __blocking_notifier_call_chain+0x63/0x83
kernel: [<ffffffff8107ed7e>] sys_init_module+0xe0/0x22e
kernel: [<ffffffff8100be2b>] system_call_fastpath+0x16/0x1b
kernel: Code: 75 4a 48 8b 45 c0 4c 89 ef 48 89 c6 83 e0 07 48 c1 ee 10
c1 e6 03 81 e6 f8 00 00 00 09 c6 e8 29 3e fd ff 4d 39 f7 48 89 c3 74
21 <4c> 8b 68 18 48 89 c7 e8 d0 34 fd ff 4d 8b 24 24 49 8b 04 24 4c
kernel: RIP [<ffffffff8121b556>] acpi_get_pci_dev+0x113/0x179
kernel: RSP <ffff88011b0f3bf8>
kernel: CR2: 0000000000000018
kernel: ---[ end trace 96ab648bd362f7da ]---
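
A note on reading the fault address above: CR2 is 0x18 and RAX is 0, which is consistent with loading a structure member at offset 0x18 through a NULL pointer. The sketch below is only an illustration under stated assumptions. The mock_* types are hypothetical stand-ins, and the member order (a list head followed by two bus pointers) is assumed to match the start of struct pci_dev in this kernel. With that order, the second bus pointer occupies the fourth pointer-sized slot, which is offset 0x18 when pointers are 8 bytes wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock types; the layout is an assumption, not kernel code. */
struct mock_list_head {
    struct mock_list_head *next, *prev;
};

struct mock_pci_bus;

struct mock_pci_dev {
    struct mock_list_head bus_list;    /* two pointers */
    struct mock_pci_bus *bus;          /* third pointer-sized slot */
    struct mock_pci_bus *subordinate;  /* fourth pointer-sized slot */
};

/* Compile-time check that three pointer-sized slots precede the member. */
static_assert(offsetof(struct mock_pci_dev, subordinate) == 3 * sizeof(void *),
              "subordinate follows three pointer-sized slots");

/* Address the CPU would touch when reading pdev->subordinate. */
static uintptr_t faulting_address(uintptr_t pdev)
{
    return pdev + offsetof(struct mock_pci_dev, subordinate);
}
```

Calling faulting_address(0) models the NULL case: the result is just the member offset, which is why a NULL struct pointer shows up as a small non-zero fault address rather than 0.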


Dell Latitude E6400 x86_64 SMP with Fedora 11 userspace.

Box continues working after Oops, but locks up when exiting X session.

2.6.30-git22 works fine (probably because acpi_get_pci_dev is
introduced in -rc1)


Thanks,

--alessandro

"And if a God will lay to rest anywhere we want to go
In your house I long to be, room by room, patiently"

(Audioslave, "Like A Stone")


2009-06-25 15:38:39

by Troy Moure

Subject: Re: [2.6.31-rc1] oops in acpi_get_pci_dev


Alessandro Suardi wrote:

> On boot:
...
> kernel: BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000018
> kernel: IP: [<ffffffff8121b556>] acpi_get_pci_dev+0x113/0x17
...
> Dell Latitude E6400 x86_64 SMP with Fedora 11 userspace.

> Box continues working after Oops, but locks up when exiting X session.

> 2.6.30-git22 works fine (probably because acpi_get_pci_dev is
> introduced in -rc1)

I've encountered what seems to be the same issue (a NULL pointer
dereference in acpi_get_pci_dev()). In my case, it caused a kernel panic
during boot (so I don't have any text logs to attach).

In my case, pci_get_slot() is returning a NULL pointer that
acpi_get_pci_dev() doesn't check for. The following patch fixes things
for me. Does it work for you, Alessandro?

(I don't know if it's the "right" fix or not, not being familiar with the
system. If it is, I can send it in as a proper patch.)

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 8a5bf3b..55b5b90 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -395,7 +395,7 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
 		fn  = adr & 0xffff;

 		pdev = pci_get_slot(pbus, PCI_DEVFN(dev, fn));
-		if (hnd == handle)
+		if (!pdev || hnd == handle)
 			break;

 		pbus = pdev->subordinate;
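
The pattern behind this one-line change can be shown outside the kernel. What follows is a minimal userspace sketch, not the kernel code: toy_bus, toy_dev, toy_get_slot() and toy_walk() are hypothetical names invented for illustration. The lookup helper returns NULL when no device sits at the requested slot, the way pci_get_slot() is described above, and the walk stops instead of dereferencing the result. The extra check on the bus pointer is an addition of this sketch and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

struct toy_bus;

/* Hypothetical stand-in for a PCI device. */
struct toy_dev {
    unsigned int devfn;           /* slot/function number on its bus */
    struct toy_bus *subordinate;  /* bus behind this device, NULL if not a bridge */
};

/* Hypothetical stand-in for a PCI bus. */
struct toy_bus {
    struct toy_dev *devices;      /* devices present on this bus */
    size_t ndevices;
};

/* Look up a device by devfn; returns NULL when the slot is empty. */
static struct toy_dev *toy_get_slot(struct toy_bus *bus, unsigned int devfn)
{
    for (size_t i = 0; i < bus->ndevices; i++)
        if (bus->devices[i].devfn == devfn)
            return &bus->devices[i];
    return NULL;
}

/*
 * Follow path[0..len-1] downward from root, one devfn per hop.
 * Returns the device found at the last hop, or NULL if any hop is missing.
 */
static struct toy_dev *toy_walk(struct toy_bus *root,
                                const unsigned int *path, size_t len)
{
    struct toy_bus *bus = root;
    struct toy_dev *dev = NULL;

    assert(root != NULL);
    for (size_t i = 0; i < len; i++) {
        if (!bus)                 /* previous hop was not a bridge */
            return NULL;
        dev = toy_get_slot(bus, path[i]);
        if (!dev)                 /* the check the unpatched loop lacks */
            return NULL;
        bus = dev->subordinate;   /* safe: dev is known to be non-NULL */
    }
    return dev;
}
```

A caller then only has to handle a NULL return, which is the same contract the patched loop gives its callers when a hop cannot be resolved.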

2009-06-25 16:09:16

by Jeff Chua

Subject: Re: [2.6.31-rc1] oops in acpi_get_pci_dev

On Thu, Jun 25, 2009 at 11:13 PM, Troy Moure<[email protected]> wrote:
> (I don't know if it's the "right" fix or not, not being familiar with the
> system. If it is, I can send it in as a proper patch.)
>
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index 8a5bf3b..55b5b90 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -395,7 +395,7 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
>                fn  = adr & 0xffff;
>
>                pdev = pci_get_slot(pbus, PCI_DEVFN(dev, fn));
> -               if (hnd == handle)
> +               if (!pdev || hnd == handle)
>                        break;
>
>                pbus = pdev->subordinate;

I have the same problem (hangs while booting), and your patch fixed it
on my ThinkPad X61.


Thanks,
Jeff.

2009-06-25 16:41:18

by Alex Chiang

Subject: Re: [2.6.31-rc1] oops in acpi_get_pci_dev

* Troy Moure <[email protected]>:
> Alessandro Suardi wrote:
> > On boot:
> ...
> > kernel: BUG: unable to handle kernel NULL pointer dereference at
> > 0000000000000018
> > kernel: IP: [<ffffffff8121b556>] acpi_get_pci_dev+0x113/0x17
> ...
> > Dell Latitude E6400 x86_64 SMP with Fedora 11 userspace.
>
> > Box continues working after Oops, but locks up when exiting X session.
>
> > 2.6.30-git22 works fine (probably because acpi_get_pci_dev is
> > introduced in -rc1)

Sorry about this panic. I was nervous about touching the ACPI
backlight stuff, and with good reason, it seems.

> I've encountered what seems to be the same issue (a NULL pointer
> dereference in acpi_get_pci_dev()). In my case, it caused a kernel panic
> during boot (so I don't have any text logs to attach).
>
> In my case, pci_get_slot() is returning a NULL pointer that
> acpi_get_pci_dev() doesn't check for. The following patch fixes things
> for me. Does it work for you, Alessandro?
>
> (I don't know if it's the "right" fix or not, not being familiar with the
> system. If it is, I can send it in as a proper patch.)

Let me have a think about this.

Thanks.

/ac

>
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index 8a5bf3b..55b5b90 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -395,7 +395,7 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
>  		fn  = adr & 0xffff;
>
>  		pdev = pci_get_slot(pbus, PCI_DEVFN(dev, fn));
> -		if (hnd == handle)
> +		if (!pdev || hnd == handle)
>  			break;
>
>  		pbus = pdev->subordinate;
>

2009-06-25 18:59:38

by Alex Chiang

Subject: Re: [2.6.31-rc1] oops in acpi_get_pci_dev

Hi Jeff, Alessandro,

First, thanks for reporting this bug, and apologies for the
inconvenience.

* Jeff Chua <[email protected]>:
> On Thu, Jun 25, 2009 at 11:13 PM, Troy Moure<[email protected]> wrote:
> > (I don't know if it's the "right" fix or not, not being familiar with the
> > system. If it is, I can send it in as a proper patch.)
> >
> > diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> > index 8a5bf3b..55b5b90 100644
> > --- a/drivers/acpi/pci_root.c
> > +++ b/drivers/acpi/pci_root.c
> > @@ -395,7 +395,7 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
> >                fn  = adr & 0xffff;
> >
> >                pdev = pci_get_slot(pbus, PCI_DEVFN(dev, fn));
> > -               if (hnd == handle)
> > +               if (!pdev || hnd == handle)
> >                        break;
> >
> >                pbus = pdev->subordinate;

I'm a little hesitant to do this (yet), because it means one of
my assumptions was wrong.

Can you please try this debug patch and send me the dmesg output?
Please boot with 'debug'. I did add the same NULL check so you
shouldn't crash, and can send me the output after you're done
booting up.

Also, if you could include the output of 'lspci -v', that would
be great too.

Thanks.

/ac

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 8a5bf3b..7674987 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -355,12 +355,20 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
 	struct pci_dev *pdev = NULL;
 	struct acpi_handle_node *node, *tmp;
 	struct acpi_pci_root *root;
+	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
 	LIST_HEAD(device_list);

 	/*
 	 * Walk up the ACPI CA namespace until we reach a PCI root bridge.
 	 */
 	phandle = handle;
+
+	acpi_get_name(phandle, ACPI_FULL_PATHNAME, &buffer);
+	printk("Starting root bridge search from %s\n", (char *)buffer.pointer);
+	kfree(buffer.pointer);
+	buffer.pointer = NULL;
+	buffer.length = 0;
+
 	while (!acpi_is_root_bridge(phandle)) {
 		node = kzalloc(sizeof(struct acpi_handle_node), GFP_KERNEL);
 		if (!node)
@@ -370,6 +378,12 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
 		node->handle = phandle;
 		list_add(&node->node, &device_list);

+		acpi_get_name(phandle, ACPI_FULL_PATHNAME, &buffer);
+		printk("+ Adding %s\n", (char *)buffer.pointer);
+		kfree(buffer.pointer);
+		buffer.pointer = NULL;
+		buffer.length = 0;
+
 		status = acpi_get_parent(phandle, &phandle);
 		if (ACPI_FAILURE(status))
 			goto out;
@@ -380,6 +394,7 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
 		goto out;

 	pbus = root->bus;
+	dev_info(&pbus->dev, "I'm a little pci_bus, short and stout...\n");

 	/*
 	 * Now, walk back down the PCI device tree until we return to our
@@ -394,7 +409,16 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
 		dev = (adr >> 16) & 0xffff;
 		fn  = adr & 0xffff;

+		printk("Searching for %04x:%02x:%02x.%d\n",
+			pci_domain_nr(pbus), pbus->number,
+			PCI_SLOT(PCI_DEVFN(dev, fn)),
+			PCI_FUNC(PCI_DEVFN(dev, fn)));
+
 		pdev = pci_get_slot(pbus, PCI_DEVFN(dev, fn));
+		if (!pdev) {
+			printk("Ouch.\n");
+			break;
+		}
 		if (hnd == handle)
 			break;
2009-06-25 21:32:19

by Alessandro Suardi

Subject: Re: [2.6.31-rc1] oops in acpi_get_pci_dev

On Thu, Jun 25, 2009 at 8:59 PM, Alex Chiang<[email protected]> wrote:
> Hi Jeff, Alessandro,
>
> First, thanks for reporting this bug, and apologies for the
> inconvenience.

No problem - that's why we're trying out prerelease kernels :)

> * Jeff Chua <[email protected]>:
>> On Thu, Jun 25, 2009 at 11:13 PM, Troy Moure<[email protected]> wrote:
>> > (I don't know if it's the "right" fix or not, not being familiar with the
>> > system. If it is, I can send it in as a proper patch.)
>> >
>> > diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
>> > index 8a5bf3b..55b5b90 100644
>> > --- a/drivers/acpi/pci_root.c
>> > +++ b/drivers/acpi/pci_root.c
>> > @@ -395,7 +395,7 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
>> >                fn  = adr & 0xffff;
>> >
>> >                pdev = pci_get_slot(pbus, PCI_DEVFN(dev, fn));
>> > -               if (hnd == handle)
>> > +               if (!pdev || hnd == handle)
>> >                        break;
>> >
>> >                pbus = pdev->subordinate;
>
> I'm a little hesitant to do this (yet), because it means one of
> my assumptions was wrong.
>
> Can you please try this debug patch and send me the dmesg output?
> Please boot with 'debug'. I did add the same NULL check so you
> shouldn't crash, and can send me the output after you're done
> booting up.
>
> Also, if you could include the output of 'lspci -v', that would
> be great too.
>
> Thanks.
>
> /ac
>
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index 8a5bf3b..7674987 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -355,12 +355,20 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
>        struct pci_dev *pdev = NULL;
>        struct acpi_handle_node *node, *tmp;
>        struct acpi_pci_root *root;
> +       struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
>        LIST_HEAD(device_list);
>
>        /*
>         * Walk up the ACPI CA namespace until we reach a PCI root bridge.
>         */
>        phandle = handle;
> +
> +       acpi_get_name(phandle, ACPI_FULL_PATHNAME, &buffer);
> +       printk("Starting root bridge search from %s\n", (char *)buffer.pointer);
> +       kfree(buffer.pointer);
> +       buffer.pointer = NULL;
> +       buffer.length = 0;
> +
>        while (!acpi_is_root_bridge(phandle)) {
>                node = kzalloc(sizeof(struct acpi_handle_node), GFP_KERNEL);
>                if (!node)
> @@ -370,6 +378,12 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
>                node->handle = phandle;
>                list_add(&node->node, &device_list);
>
> +               acpi_get_name(phandle, ACPI_FULL_PATHNAME, &buffer);
> +               printk("+ Adding %s\n", (char *)buffer.pointer);
> +               kfree(buffer.pointer);
> +               buffer.pointer = NULL;
> +               buffer.length = 0;
> +
>                status = acpi_get_parent(phandle, &phandle);
>                if (ACPI_FAILURE(status))
>                        goto out;
> @@ -380,6 +394,7 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
>                goto out;
>
>        pbus = root->bus;
> +       dev_info(&pbus->dev, "I'm a little pci_bus, short and stout...\n");
>
>        /*
>         * Now, walk back down the PCI device tree until we return to our
> @@ -394,7 +409,16 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
>                dev = (adr >> 16) & 0xffff;
>                fn  = adr & 0xffff;
>
> +               printk("Searching for %04x:%02x:%02x.%d\n",
> +                       pci_domain_nr(pbus), pbus->number,
> +                       PCI_SLOT(PCI_DEVFN(dev, fn)),
> +                       PCI_FUNC(PCI_DEVFN(dev, fn)));
> +
>                pdev = pci_get_slot(pbus, PCI_DEVFN(dev, fn));
> +               if (!pdev) {
> +                       printk("Ouch.\n");
> +                       break;
> +               }
>                if (hnd == handle)
>                        break;
>
>

I'm not sure what you mean by "boot with debug", so I'm attaching the
dmesg output from an ordinary boot of 2.6.31-rc1 plus your patch
(which doesn't crash) and the output of lspci -v, hoping that's enough.
If it isn't, please explain what I should do (and I'll do that tomorrow).

Thanks,

--alessandro

"And if a God will lay to rest anywhere we want to go
In your house I long to be, room by room, patiently"

(Audioslave, "Like A Stone")


Attachments:
dmesg-2631rc1-acpidebug (44.11 kB)
lspci-2631rc1 (10.15 kB)

2009-06-27 11:26:53

by Pavel Machek

Subject: Re: [2.6.31-rc1] oops in acpi_get_pci_dev

Hi!

I had a similar oops here, too, on a ThinkPad X60. It is interesting: most
breakage seems to happen just before -rc1...

Will configuring out acpi_video help?

Pavel

> kernel: [<ffffffff8121ef98>] find_video+0x62/0xa2
> kernel: [<ffffffff81233240>] acpi_ns_walk_namespace+0xc4/0x14d
> kernel: [<ffffffff8121ef36>] ? find_video+0x0/0xa2
> kernel: [<ffffffff8121ef36>] ? find_video+0x0/0xa2
> kernel: [<ffffffff81230b0a>] acpi_walk_namespace+0x82/0xd0
> kernel: [<ffffffff8121ecdd>] acpi_video_get_capabilities+0x61/0xe2
> kernel: [<ffffffffa000a060>] ? dell_send_request+0x60/0x82 [dell_laptop]
> kernel: [<ffffffff8121ede9>] acpi_video_backlight_support+0x27/0x64

Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2009-06-27 18:55:27

by Alex Chiang

Subject: Re: [2.6.31-rc1] oops in acpi_get_pci_dev

* Pavel Machek <[email protected]>:
>
> I had a similar oops here, too, on a ThinkPad X60. It is interesting: most
> breakage seems to happen just before -rc1...

Already fixed in Linus's tree.

Thanks.

/ac