commit 275582 introduces acpi_get_pci_dev(), but pdev->subordinate
can be NULL, then a NULL was passed to pci_get_slot, this results
the kernel oops when resume from suspend.
This patch resolves following kernel oops:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
Signed-off-by: Xiaotian Feng <[email protected]>
---
drivers/acpi/pci_root.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 3112221..3c35144 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -387,7 +387,11 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
if (!pdev || hnd == handle)
break;
- pbus = pdev->subordinate;
+ if (pdev->subordinate)
+ pbus = pdev->subordinate;
+ else
+ pbus = pdev->bus;
+
pci_dev_put(pdev);
}
out:
--
1.6.2.5
Hi Xiaotian,
Thanks for the bug report.
* Xiaotian Feng <[email protected]>:
> commit 275582 introduces acpi_get_pci_dev(), but pdev->subordinate
> can be NULL, then a NULL was passed to pci_get_slot, this results
> the kernel oops when resume from suspend.
>
> This patch resolves following kernel oops:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> IP: [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
>
> Signed-off-by: Xiaotian Feng <[email protected]>
> ---
> drivers/acpi/pci_root.c | 6 +++++-
> 1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> index 3112221..3c35144 100644
> --- a/drivers/acpi/pci_root.c
> +++ b/drivers/acpi/pci_root.c
> @@ -387,7 +387,11 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
> if (!pdev || hnd == handle)
> break;
>
> - pbus = pdev->subordinate;
> + if (pdev->subordinate)
> + pbus = pdev->subordinate;
> + else
> + pbus = pdev->bus;
> +
I'm a little confused by this. If we start from the PCI root
bridge and walk back down the hierarchy, shouldn't everything
between the root and the device be a P2P bridge?
What is special about suspend/resume that causes the subordinate
bus to become NULL?
Can you send the full stacktrace?
Thanks.
/ac
On Monday 28 September 2009, Alex Chiang wrote:
> Hi Xiaotian,
>
> Thanks for the bug report.
>
> * Xiaotian Feng <[email protected]>:
> > commit 275582 introduces acpi_get_pci_dev(), but pdev->subordinate
> > can be NULL, then a NULL was passed to pci_get_slot, this results
> > the kernel oops when resume from suspend.
> >
> > This patch resolves following kernel oops:
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> > IP: [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
> >
> > Signed-off-by: Xiaotian Feng <[email protected]>
> > ---
> > drivers/acpi/pci_root.c | 6 +++++-
> > 1 files changed, 5 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> > index 3112221..3c35144 100644
> > --- a/drivers/acpi/pci_root.c
> > +++ b/drivers/acpi/pci_root.c
> > @@ -387,7 +387,11 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
> > if (!pdev || hnd == handle)
> > break;
> >
> > - pbus = pdev->subordinate;
> > + if (pdev->subordinate)
> > + pbus = pdev->subordinate;
> > + else
> > + pbus = pdev->bus;
> > +
>
> I'm a little confused by this. If we start from the PCI root
> bridge and walk back down the hierarchy, shouldn't everything
> between the root and the device be a P2P bridge?
Well, if my reading of the code is correct, there's no guarantee that
pci_get_slot() will always return either the right device or a bridge. If it
returns anything that's not a bridge or the device we're looking for, we'll hit
the NULL pointer deref.
In fact it looks like we should also clear pdev if no device was found.
Thanks,
Rafael
On Monday 28 September 2009, Rafael J. Wysocki wrote:
> On Monday 28 September 2009, Alex Chiang wrote:
> > Hi Xiaotian,
> >
> > Thanks for the bug report.
> >
> > * Xiaotian Feng <[email protected]>:
> > > commit 275582 introduces acpi_get_pci_dev(), but pdev->subordinate
> > > can be NULL, then a NULL was passed to pci_get_slot, this results
> > > the kernel oops when resume from suspend.
> > >
> > > This patch resolves following kernel oops:
> > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> > > IP: [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
> > >
> > > Signed-off-by: Xiaotian Feng <[email protected]>
> > > ---
> > > drivers/acpi/pci_root.c | 6 +++++-
> > > 1 files changed, 5 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> > > index 3112221..3c35144 100644
> > > --- a/drivers/acpi/pci_root.c
> > > +++ b/drivers/acpi/pci_root.c
> > > @@ -387,7 +387,11 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
> > > if (!pdev || hnd == handle)
> > > break;
> > >
> > > - pbus = pdev->subordinate;
> > > + if (pdev->subordinate)
> > > + pbus = pdev->subordinate;
> > > + else
> > > + pbus = pdev->bus;
> > > +
> >
> > I'm a little confused by this. If we start from the PCI root
> > bridge and walk back down the hierarchy, shouldn't everything
> > between the root and the device be a P2P bridge?
>
> Well, if my reading of the code is correct, there's no guarantee that
> pci_get_slot() will always return either the right device or a bridge.
I should have been more precise.
If devfn of node happens to be the same as devfn of a non-bridge device on
pbus, the pci_get_slot() will return a valid pointer to it, but
pdev->subordinate will be NULL. Is it impossible for some reason?
Thanks,
Rafael
* Rafael J. Wysocki <[email protected]>:
> On Monday 28 September 2009, Rafael J. Wysocki wrote:
> > On Monday 28 September 2009, Alex Chiang wrote:
> > > * Xiaotian Feng <[email protected]>:
> > > > --- a/drivers/acpi/pci_root.c
> > > > +++ b/drivers/acpi/pci_root.c
> > > > @@ -387,7 +387,11 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
> > > > if (!pdev || hnd == handle)
> > > > break;
> > > >
> > > > - pbus = pdev->subordinate;
> > > > + if (pdev->subordinate)
> > > > + pbus = pdev->subordinate;
> > > > + else
> > > > + pbus = pdev->bus;
> > > > +
> > >
> > > I'm a little confused by this. If we start from the PCI root
> > > bridge and walk back down the hierarchy, shouldn't everything
> > > between the root and the device be a P2P bridge?
> >
> > Well, if my reading of the code is correct, there's no guarantee that
> > pci_get_slot() will always return either the right device or a bridge.
>
> I should have been more precise.
>
> If devfn of node happens to be the same as devfn of a non-bridge device on
> pbus, the pci_get_slot() will return a valid pointer to it, but
> pdev->subordinate will be NULL. Is it impossible for some reason?
Hm, that's a good thought, but I'm still confused. Here's the
first part of the full function (acpi_get_pci_dev):
phandle = handle;
while (!acpi_is_root_bridge(phandle)) {
node = kzalloc(sizeof(struct acpi_handle_node), GFP_KERNEL);
if (!node)
goto out;
INIT_LIST_HEAD(&node->node);
node->handle = phandle;
list_add(&node->node, &device_list);
status = acpi_get_parent(phandle, &phandle);
if (ACPI_FAILURE(status))
goto out;
}
phandle starts off as the input parameter, and we make successive
calls to acpi_get_parent() to walk up the ACPI device tree until
we get to a root bridge.
My assumption here is that all those ACPI devices must be P2P
bridges.
root = acpi_pci_find_root(phandle);
if (!root)
goto out;
pbus = root->bus;
Now we've got an acpi_pci_root() which has a struct pci_bus, and
we can start walking back down the PCI tree. Except what we're
really doing is iterating across the device_list which we created
above.
device_list should only contain P2P bridges, based on my
assumption above.
list_for_each_entry(node, &device_list, node) {
acpi_handle hnd = node->handle;
status = acpi_evaluate_integer(hnd, "_ADR", NULL, &adr);
if (ACPI_FAILURE(status))
goto out;
dev = (adr >> 16) & 0xffff;
fn = adr & 0xffff;
pdev = pci_get_slot(pbus, PCI_DEVFN(dev, fn));
if (!pdev || hnd == handle)
break;
pbus = pdev->subordinate;
pci_dev_put(pdev);
}
The point you raise about collision between the devfn of 'node'
and some non-bridge device returned by pci_get_slot() seems like
it really shouldn't happen, because we evaluate _ADR for each
node on device_list, in the reverse order that we found them, and
based on my assumption, all those nodes should be bridges.
I'm not saying that Xiaotian's patch is wrong. I'm saying I'd
like to be educated as to why my basic assumption was wrong,
because now you're making me think that this code is pretty
fragile. :-/
Thanks.
/ac
On Tuesday 29 September 2009, Alex Chiang wrote:
> * Rafael J. Wysocki <[email protected]>:
> > On Monday 28 September 2009, Rafael J. Wysocki wrote:
> > > On Monday 28 September 2009, Alex Chiang wrote:
> > > > * Xiaotian Feng <[email protected]>:
> > > > > --- a/drivers/acpi/pci_root.c
> > > > > +++ b/drivers/acpi/pci_root.c
> > > > > @@ -387,7 +387,11 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
> > > > > if (!pdev || hnd == handle)
> > > > > break;
> > > > >
> > > > > - pbus = pdev->subordinate;
> > > > > + if (pdev->subordinate)
> > > > > + pbus = pdev->subordinate;
> > > > > + else
> > > > > + pbus = pdev->bus;
> > > > > +
> > > >
> > > > I'm a little confused by this. If we start from the PCI root
> > > > bridge and walk back down the hierarchy, shouldn't everything
> > > > between the root and the device be a P2P bridge?
> > >
> > > Well, if my reading of the code is correct, there's no guarantee that
> > > pci_get_slot() will always return either the right device or a bridge.
> >
> > I should have been more precise.
> >
> > If devfn of node happens to be the same as devfn of a non-bridge device on
> > pbus, the pci_get_slot() will return a valid pointer to it, but
> > pdev->subordinate will be NULL. Is it impossible for some reason?
>
> Hm, that's a good thought, but I'm still confused. Here's the
> first part of the full function (acpi_get_pci_dev):
>
> phandle = handle;
> while (!acpi_is_root_bridge(phandle)) {
> node = kzalloc(sizeof(struct acpi_handle_node), GFP_KERNEL);
> if (!node)
> goto out;
>
> INIT_LIST_HEAD(&node->node);
> node->handle = phandle;
> list_add(&node->node, &device_list);
>
> status = acpi_get_parent(phandle, &phandle);
> if (ACPI_FAILURE(status))
> goto out;
> }
>
> phandle starts off as the input parameter, and we make successive
> calls to acpi_get_parent() to walk up the ACPI device tree until
> we get to a root bridge.
>
> My assumption here is that all those ACPI devices must be P2P
> bridges.
>
> root = acpi_pci_find_root(phandle);
> if (!root)
> goto out;
>
> pbus = root->bus;
>
> Now we've got an acpi_pci_root() which has a struct pci_bus, and
> we can start walking back down the PCI tree. Except what we're
> really doing is iterating across the device_list which we created
> above.
>
> device_list should only contain P2P bridges, based on my
> assumption above.
>
> list_for_each_entry(node, &device_list, node) {
> acpi_handle hnd = node->handle;
> status = acpi_evaluate_integer(hnd, "_ADR", NULL, &adr);
> if (ACPI_FAILURE(status))
> goto out;
> dev = (adr >> 16) & 0xffff;
> fn = adr & 0xffff;
>
> pdev = pci_get_slot(pbus, PCI_DEVFN(dev, fn));
> if (!pdev || hnd == handle)
> break;
>
> pbus = pdev->subordinate;
> pci_dev_put(pdev);
> }
>
> The point you raise about collision between the devfn of 'node'
> and some non-bridge device returned by pci_get_slot() seems like
> it really shouldn't happen, because we evaluate _ADR for each
> node on device_list, in the reverse order that we found them, and
> based on my assumption, all those nodes should be bridges.
You seem to be right, but if the Xiaotian's patch actually fixes the NULL
pointer deref, one of the assumptions is clearly wrong.
> I'm not saying that Xiaotian's patch is wrong. I'm saying I'd
> like to be educated as to why my basic assumption was wrong,
> because now you're making me think that this code is pretty
> fragile. :-/
Perhaps Xiaotian can add some printk()s on top of his patch that will show us
exactly in what conditions pbus becomes NULL.
Thanks,
Rafael
On 09/29/2009 01:38 AM, Alex Chiang wrote:
> Hi Xiaotian,
>
> Thanks for the bug report.
>
> * Xiaotian Feng<[email protected]>:
>
>> commit 275582 introduces acpi_get_pci_dev(), but pdev->subordinate
>> can be NULL, then a NULL was passed to pci_get_slot, this results
>> the kernel oops when resume from suspend.
>>
>> This patch resolves following kernel oops:
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
>> IP: [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
>>
>> Signed-off-by: Xiaotian Feng<[email protected]>
>> ---
>> drivers/acpi/pci_root.c | 6 +++++-
>> 1 files changed, 5 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
>> index 3112221..3c35144 100644
>> --- a/drivers/acpi/pci_root.c
>> +++ b/drivers/acpi/pci_root.c
>> @@ -387,7 +387,11 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
>> if (!pdev || hnd == handle)
>> break;
>>
>> - pbus = pdev->subordinate;
>> + if (pdev->subordinate)
>> + pbus = pdev->subordinate;
>> + else
>> + pbus = pdev->bus;
>> +
>>
> I'm a little confused by this. If we start from the PCI root
> bridge and walk back down the hierarchy, shouldn't everything
> between the root and the device be a P2P bridge?
>
> What is special about suspend/resume that causes the subordinate
> bus to become NULL?
>
> Can you send the full stacktrace?
>
> Thanks.
>
> /ac
>
>
>
the full call trace is here:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
PGD 208b9d067 PUD 208a89067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/power/state
CPU 0
Modules linked in: fuse radeon ttm drm_kms_helper drm i2c_algo_bit sco
bridge stp llc bnep l2cap bluetooth sunrpc ip6t_REJECT nf_conntrack_ipv6
ip6table_filter ip6_tables ipv6 dm_multipath uinput snd_hda_codec_analog
snd_hda_intel snd_hda_codec snd_hwdep e1000e snd_pcm snd_timer i2c_i801
i2c_core snd soundcore snd_page_alloc iTCO_wdt iTCO_vendor_support
serio_raw ppdev parport_pc parport pcspkr dcdbas ata_generic pata_acpi
[last unloaded: speedstep_lib]
Pid: 35, comm: kacpi_hotplug Not tainted 2.6.32-rc2 #3 OptiPlex 760
RIP: 0010:[<ffffffff812217e7>] [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
RSP: 0018:ffff88022ee69aa0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88022e9b1090 RCX: 00000000000000a0
RDX: 000000000000002f RSI: ffffffff8168ab38 RDI: ffffffff8168ab38
RBP: ffff88022ee69ac0 R08: ffffffff8168ab30 R09: ffff880100000000
R10: ffffffff8168ab50 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000001 R14: ffff88022f712000 R15: ffff88022f710dd0
FS: 0000000000000000(0000) GS:ffff880028200000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000028 CR3: 00000001fc298000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kacpi_hotplug (pid: 35, threadinfo ffff88022ee68000, task
ffff88022eefc120)
Stack:
0000000000000018 ffff88022e9b1090 ffff88020880e9c0 0000000000000000
<0> ffff88022ee69b30 ffffffff81254193 0000000000000000 ffff88022ee69ae0
<0> ffff88020880e340 ffff88020880ee38 ffff88022f710208 0000000000000001
Call Trace:
[<ffffffff81254193>] acpi_get_pci_dev+0x106/0x167
[<ffffffff8125545a>] acpi_pci_bind+0x1c/0x86
[<ffffffff8116230a>] ? sysfs_create_file+0x2a/0x2c
[<ffffffff8125141f>] acpi_add_single_object+0x964/0xa0c
[<ffffffff812515a7>] acpi_bus_check_add+0xe0/0x138
[<ffffffff81251667>] acpi_bus_scan+0x68/0xa0
[<ffffffff812516f4>] acpi_bus_add+0x2a/0x2e
[<ffffffff81252c59>] hotplug_dock_devices+0x114/0x13e
[<ffffffff8125301a>] acpi_dock_deferred_cb+0xbf/0x192
[<ffffffff8124d6ca>] acpi_os_execute_deferred+0x29/0x36
[<ffffffff8106a244>] worker_thread+0x251/0x347
[<ffffffff8106a1ef>] ? worker_thread+0x1fc/0x347
[<ffffffff8124d6a1>] ? acpi_os_execute_deferred+0x0/0x36
[<ffffffff8106e426>] ? autoremove_wake_function+0x0/0x39
[<ffffffff81069ff3>] ? worker_thread+0x0/0x347
[<ffffffff8106e0e0>] kthread+0x7f/0x87
[<ffffffff81012cea>] child_rip+0xa/0x20
[<ffffffff81012650>] ? restore_args+0x0/0x30
[<ffffffff8106e061>] ? kthread+0x0/0x87
[<ffffffff81012ce0>] ? child_rip+0x0/0x20
Code: ff 49 89 fc 41 89 f5 a9 00 ff ff 07 74 11 be 87 00 00 00 48 c7 c7
45 6d 5a 81 e8 f6 2b e3 ff 48 c7 c7 30 ab 68 81 e8 29 77 20 00 <49> 8b
5c 24 28 49 83 c4 28 eb 09 44 39 6b 38 74 10 48 89 c3 48
RIP [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
RSP <ffff88022ee69aa0>
CR2: 0000000000000028
---[ end trace b5a7793bd9db2a4d ]---
On 09/29/2009 06:50 AM, Rafael J. Wysocki wrote:
> On Tuesday 29 September 2009, Alex Chiang wrote:
>> * Rafael J. Wysocki<[email protected]>:
>>> On Monday 28 September 2009, Rafael J. Wysocki wrote:
>>>> On Monday 28 September 2009, Alex Chiang wrote:
>>>>> * Xiaotian Feng<[email protected]>:
>>>>>> --- a/drivers/acpi/pci_root.c
>>>>>> +++ b/drivers/acpi/pci_root.c
>>>>>> @@ -387,7 +387,11 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
>>>>>> if (!pdev || hnd == handle)
>>>>>> break;
>>>>>>
>>>>>> - pbus = pdev->subordinate;
>>>>>> + if (pdev->subordinate)
>>>>>> + pbus = pdev->subordinate;
>>>>>> + else
>>>>>> + pbus = pdev->bus;
>>>>>> +
>>>>>
>>>>> I'm a little confused by this. If we start from the PCI root
>>>>> bridge and walk back down the hierarchy, shouldn't everything
>>>>> between the root and the device be a P2P bridge?
>>>>
>>>> Well, if my reading of the code is correct, there's no guarantee that
>>>> pci_get_slot() will always return either the right device or a bridge.
>>>
>>> I should have been more precise.
>>>
>>> If devfn of node happens to be the same as devfn of a non-bridge device on
>>> pbus, the pci_get_slot() will return a valid pointer to it, but
>>> pdev->subordinate will be NULL. Is it impossible for some reason?
>>
>> Hm, that's a good thought, but I'm still confused. Here's the
>> first part of the full function (acpi_get_pci_dev):
>>
>> phandle = handle;
>> while (!acpi_is_root_bridge(phandle)) {
>> node = kzalloc(sizeof(struct acpi_handle_node), GFP_KERNEL);
>> if (!node)
>> goto out;
>>
>> INIT_LIST_HEAD(&node->node);
>> node->handle = phandle;
>> list_add(&node->node,&device_list);
>>
>> status = acpi_get_parent(phandle,&phandle);
>> if (ACPI_FAILURE(status))
>> goto out;
>> }
>>
>> phandle starts off as the input parameter, and we make successive
>> calls to acpi_get_parent() to walk up the ACPI device tree until
>> we get to a root bridge.
>>
>> My assumption here is that all those ACPI devices must be P2P
>> bridges.
>>
>> root = acpi_pci_find_root(phandle);
>> if (!root)
>> goto out;
>>
>> pbus = root->bus;
>>
>> Now we've got an acpi_pci_root() which has a struct pci_bus, and
>> we can start walking back down the PCI tree. Except what we're
>> really doing is iterating across the device_list which we created
>> above.
>>
>> device_list should only contain P2P bridges, based on my
>> assumption above.
>>
>> list_for_each_entry(node,&device_list, node) {
>> acpi_handle hnd = node->handle;
>> status = acpi_evaluate_integer(hnd, "_ADR", NULL,&adr);
>> if (ACPI_FAILURE(status))
>> goto out;
>> dev = (adr>> 16)& 0xffff;
>> fn = adr& 0xffff;
>>
>> pdev = pci_get_slot(pbus, PCI_DEVFN(dev, fn));
>> if (!pdev || hnd == handle)
>> break;
>>
>> pbus = pdev->subordinate;
>> pci_dev_put(pdev);
>> }
>>
>> The point you raise about collision between the devfn of 'node'
>> and some non-bridge device returned by pci_get_slot() seems like
>> it really shouldn't happen, because we evaluate _ADR for each
>> node on device_list, in the reverse order that we found them, and
>> based on my assumption, all those nodes should be bridges.
>
> You seem to be right, but if the Xiaotian's patch actually fixes the NULL
> pointer deref, one of the assumptions is clearly wrong.
>
>> I'm not saying that Xiaotian's patch is wrong. I'm saying I'd
>> like to be educated as to why my basic assumption was wrong,
>> because now you're making me think that this code is pretty
>> fragile. :-/
>
> Perhaps Xiaotian can add some printk()s on top of his patch that will show us
> exactly in what conditions pbus becomes NULL.
>
> Thanks,
> Rafael
>
Is there any cases that pdev->subordinate is NULL while pdev is bridge
device?
From pci_slot.c::walk_p2p_bridge, there's code like following:
dev = pci_get_slot(pci_bus, PCI_DEVFN(device, function));
if (!dev || !dev->subordinate)
goto out;
It looks like dev->subordinate can be NULL even if in p2p bridge, right?
Thanks
Xiaotian
On Tuesday 29 September 2009, Danny Feng wrote:
> On 09/29/2009 06:50 AM, Rafael J. Wysocki wrote:
> > On Tuesday 29 September 2009, Alex Chiang wrote:
> >> * Rafael J. Wysocki<[email protected]>:
> >>> On Monday 28 September 2009, Rafael J. Wysocki wrote:
> >>>> On Monday 28 September 2009, Alex Chiang wrote:
> >>>>> * Xiaotian Feng<[email protected]>:
> >>>>>> --- a/drivers/acpi/pci_root.c
> >>>>>> +++ b/drivers/acpi/pci_root.c
> >>>>>> @@ -387,7 +387,11 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
> >>>>>> if (!pdev || hnd == handle)
> >>>>>> break;
> >>>>>>
> >>>>>> - pbus = pdev->subordinate;
> >>>>>> + if (pdev->subordinate)
> >>>>>> + pbus = pdev->subordinate;
> >>>>>> + else
> >>>>>> + pbus = pdev->bus;
> >>>>>> +
> >>>>>
> >>>>> I'm a little confused by this. If we start from the PCI root
> >>>>> bridge and walk back down the hierarchy, shouldn't everything
> >>>>> between the root and the device be a P2P bridge?
> >>>>
> >>>> Well, if my reading of the code is correct, there's no guarantee that
> >>>> pci_get_slot() will always return either the right device or a bridge.
> >>>
> >>> I should have been more precise.
> >>>
> >>> If devfn of node happens to be the same as devfn of a non-bridge device on
> >>> pbus, the pci_get_slot() will return a valid pointer to it, but
> >>> pdev->subordinate will be NULL. Is it impossible for some reason?
> >>
> >> Hm, that's a good thought, but I'm still confused. Here's the
> >> first part of the full function (acpi_get_pci_dev):
> >>
> >> phandle = handle;
> >> while (!acpi_is_root_bridge(phandle)) {
> >> node = kzalloc(sizeof(struct acpi_handle_node), GFP_KERNEL);
> >> if (!node)
> >> goto out;
> >>
> >> INIT_LIST_HEAD(&node->node);
> >> node->handle = phandle;
> >> list_add(&node->node,&device_list);
> >>
> >> status = acpi_get_parent(phandle,&phandle);
> >> if (ACPI_FAILURE(status))
> >> goto out;
> >> }
> >>
> >> phandle starts off as the input parameter, and we make successive
> >> calls to acpi_get_parent() to walk up the ACPI device tree until
> >> we get to a root bridge.
> >>
> >> My assumption here is that all those ACPI devices must be P2P
> >> bridges.
> >>
> >> root = acpi_pci_find_root(phandle);
> >> if (!root)
> >> goto out;
> >>
> >> pbus = root->bus;
> >>
> >> Now we've got an acpi_pci_root() which has a struct pci_bus, and
> >> we can start walking back down the PCI tree. Except what we're
> >> really doing is iterating across the device_list which we created
> >> above.
> >>
> >> device_list should only contain P2P bridges, based on my
> >> assumption above.
> >>
> >> list_for_each_entry(node,&device_list, node) {
> >> acpi_handle hnd = node->handle;
> >> status = acpi_evaluate_integer(hnd, "_ADR", NULL,&adr);
> >> if (ACPI_FAILURE(status))
> >> goto out;
> >> dev = (adr>> 16)& 0xffff;
> >> fn = adr& 0xffff;
> >>
> >> pdev = pci_get_slot(pbus, PCI_DEVFN(dev, fn));
> >> if (!pdev || hnd == handle)
> >> break;
> >>
> >> pbus = pdev->subordinate;
> >> pci_dev_put(pdev);
> >> }
> >>
> >> The point you raise about collision between the devfn of 'node'
> >> and some non-bridge device returned by pci_get_slot() seems like
> >> it really shouldn't happen, because we evaluate _ADR for each
> >> node on device_list, in the reverse order that we found them, and
> >> based on my assumption, all those nodes should be bridges.
> >
> > You seem to be right, but if the Xiaotian's patch actually fixes the NULL
> > pointer deref, one of the assumptions is clearly wrong.
> >
> >> I'm not saying that Xiaotian's patch is wrong. I'm saying I'd
> >> like to be educated as to why my basic assumption was wrong,
> >> because now you're making me think that this code is pretty
> >> fragile. :-/
> >
> > Perhaps Xiaotian can add some printk()s on top of his patch that will show us
> > exactly in what conditions pbus becomes NULL.
> >
> > Thanks,
> > Rafael
> >
> Is there any cases that pdev->subordinate is NULL while pdev is bridge
> device?
> From pci_slot.c::walk_p2p_bridge, there's code like following:
>
> dev = pci_get_slot(pci_bus, PCI_DEVFN(device, function));
> if (!dev || !dev->subordinate)
> goto out;
>
> It looks like dev->subordinate can be NULL even if in p2p bridge, right?
Right, in general, but in this particular case each device we inspect is
supposed to be a parent of another device, which implies that there's a bus
below it (given that it's a PCI device).
Thanks,
Rafael
On Tuesday 29 September 2009, Danny Feng wrote:
> On 09/29/2009 01:38 AM, Alex Chiang wrote:
> > Hi Xiaotian,
> >
> > Thanks for the bug report.
> >
> > * Xiaotian Feng<[email protected]>:
> >
> >> commit 275582 introduces acpi_get_pci_dev(), but pdev->subordinate
> >> can be NULL, then a NULL was passed to pci_get_slot, this results
> >> the kernel oops when resume from suspend.
> >>
> >> This patch resolves following kernel oops:
> >> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> >> IP: [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
> >>
> >> Signed-off-by: Xiaotian Feng<[email protected]>
> >> ---
> >> drivers/acpi/pci_root.c | 6 +++++-
> >> 1 files changed, 5 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> >> index 3112221..3c35144 100644
> >> --- a/drivers/acpi/pci_root.c
> >> +++ b/drivers/acpi/pci_root.c
> >> @@ -387,7 +387,11 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
> >> if (!pdev || hnd == handle)
> >> break;
> >>
> >> - pbus = pdev->subordinate;
> >> + if (pdev->subordinate)
> >> + pbus = pdev->subordinate;
> >> + else
> >> + pbus = pdev->bus;
> >> +
> >>
> > I'm a little confused by this. If we start from the PCI root
> > bridge and walk back down the hierarchy, shouldn't everything
> > between the root and the device be a P2P bridge?
> >
> > What is special about suspend/resume that causes the subordinate
> > bus to become NULL?
> >
> > Can you send the full stacktrace?
> >
> > Thanks.
> >
> > /ac
> >
> >
> >
> the full call trace is here:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> IP: [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
> PGD 208b9d067 PUD 208a89067 PMD 0
> Oops: 0000 [#1] SMP
> last sysfs file: /sys/power/state
> CPU 0
> Modules linked in: fuse radeon ttm drm_kms_helper drm i2c_algo_bit sco
> bridge stp llc bnep l2cap bluetooth sunrpc ip6t_REJECT nf_conntrack_ipv6
> ip6table_filter ip6_tables ipv6 dm_multipath uinput snd_hda_codec_analog
> snd_hda_intel snd_hda_codec snd_hwdep e1000e snd_pcm snd_timer i2c_i801
> i2c_core snd soundcore snd_page_alloc iTCO_wdt iTCO_vendor_support
> serio_raw ppdev parport_pc parport pcspkr dcdbas ata_generic pata_acpi
> [last unloaded: speedstep_lib]
> Pid: 35, comm: kacpi_hotplug Not tainted 2.6.32-rc2 #3 OptiPlex 760
> RIP: 0010:[<ffffffff812217e7>] [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
> RSP: 0018:ffff88022ee69aa0 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: ffff88022e9b1090 RCX: 00000000000000a0
> RDX: 000000000000002f RSI: ffffffff8168ab38 RDI: ffffffff8168ab38
> RBP: ffff88022ee69ac0 R08: ffffffff8168ab30 R09: ffff880100000000
> R10: ffffffff8168ab50 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000001 R14: ffff88022f712000 R15: ffff88022f710dd0
> FS: 0000000000000000(0000) GS:ffff880028200000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000028 CR3: 00000001fc298000 CR4: 00000000000406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process kacpi_hotplug (pid: 35, threadinfo ffff88022ee68000, task
> ffff88022eefc120)
> Stack:
> 0000000000000018 ffff88022e9b1090 ffff88020880e9c0 0000000000000000
> <0> ffff88022ee69b30 ffffffff81254193 0000000000000000 ffff88022ee69ae0
> <0> ffff88020880e340 ffff88020880ee38 ffff88022f710208 0000000000000001
> Call Trace:
> [<ffffffff81254193>] acpi_get_pci_dev+0x106/0x167
Have you checked (using gdb) which source code line this corresponds to?
> [<ffffffff8125545a>] acpi_pci_bind+0x1c/0x86
> [<ffffffff8116230a>] ? sysfs_create_file+0x2a/0x2c
> [<ffffffff8125141f>] acpi_add_single_object+0x964/0xa0c
> [<ffffffff812515a7>] acpi_bus_check_add+0xe0/0x138
> [<ffffffff81251667>] acpi_bus_scan+0x68/0xa0
> [<ffffffff812516f4>] acpi_bus_add+0x2a/0x2e
This looks like a device has just been discovered.
> [<ffffffff81252c59>] hotplug_dock_devices+0x114/0x13e
> [<ffffffff8125301a>] acpi_dock_deferred_cb+0xbf/0x192
Have the machine been docked while suspended?
> [<ffffffff8124d6ca>] acpi_os_execute_deferred+0x29/0x36
> [<ffffffff8106a244>] worker_thread+0x251/0x347
> [<ffffffff8106a1ef>] ? worker_thread+0x1fc/0x347
> [<ffffffff8124d6a1>] ? acpi_os_execute_deferred+0x0/0x36
> [<ffffffff8106e426>] ? autoremove_wake_function+0x0/0x39
> [<ffffffff81069ff3>] ? worker_thread+0x0/0x347
> [<ffffffff8106e0e0>] kthread+0x7f/0x87
> [<ffffffff81012cea>] child_rip+0xa/0x20
> [<ffffffff81012650>] ? restore_args+0x0/0x30
> [<ffffffff8106e061>] ? kthread+0x0/0x87
> [<ffffffff81012ce0>] ? child_rip+0x0/0x20
> Code: ff 49 89 fc 41 89 f5 a9 00 ff ff 07 74 11 be 87 00 00 00 48 c7 c7
> 45 6d 5a 81 e8 f6 2b e3 ff 48 c7 c7 30 ab 68 81 e8 29 77 20 00 <49> 8b
> 5c 24 28 49 83 c4 28 eb 09 44 39 6b 38 74 10 48 89 c3 48
> RIP [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
> RSP <ffff88022ee69aa0>
> CR2: 0000000000000028
> ---[ end trace b5a7793bd9db2a4d ]---
Thanks,
Rafael
* Rafael J. Wysocki <[email protected]>:
> On Tuesday 29 September 2009, Danny Feng wrote:
> > Is there any cases that pdev->subordinate is NULL while pdev is bridge
> > device?
> > From pci_slot.c::walk_p2p_bridge, there's code like following:
> >
> > dev = pci_get_slot(pci_bus, PCI_DEVFN(device, function));
> > if (!dev || !dev->subordinate)
> > goto out;
> >
> > It looks like dev->subordinate can be NULL even if in p2p
> > bridge, right?
In the code you post above, that results from doing an ACPI
namespace walk, which will definitely find non-bridge devices.
> Right, in general, but in this particular case each device we
> inspect is supposed to be a parent of another device, which
> implies that there's a bus below it (given that it's a PCI
> device).
Right, that's why I'm surprised that my assumption broke. But now
that I see that a dock device is involved, maybe that's not so
surprising.
/ac
On Tuesday 29 September 2009, Alex Chiang wrote:
> * Rafael J. Wysocki <[email protected]>:
> > On Tuesday 29 September 2009, Danny Feng wrote:
> > > Is there any cases that pdev->subordinate is NULL while pdev is bridge
> > > device?
> > > From pci_slot.c::walk_p2p_bridge, there's code like following:
> > >
> > > dev = pci_get_slot(pci_bus, PCI_DEVFN(device, function));
> > > if (!dev || !dev->subordinate)
> > > goto out;
> > >
> > > It looks like dev->subordinate can be NULL even if in p2p
> > > bridge, right?
>
> In the code you post above, that results from doing an ACPI
> namespace walk, which will definitely find non-bridge devices.
>
> > Right, in general, but in this particular case each device we
> > inspect is supposed to be a parent of another device, which
> > implies that there's a bus below it (given that it's a PCI
> > device).
>
> Right, that's why I'm surprised that my assumption broke. But now
> that I see that a dock device is involved, maybe that's not so
> surprising.
Yeah. Still, I'd like to know the root cause.
Thanks,
Rafael
On 09/30/2009 04:12 AM, Rafael J. Wysocki wrote:
> On Tuesday 29 September 2009, Danny Feng wrote:
>> On 09/29/2009 01:38 AM, Alex Chiang wrote:
>>> Hi Xiaotian,
>>>
>>> Thanks for the bug report.
>>>
>>> * Xiaotian Feng<[email protected]>:
>>>
>>>> commit 275582 introduces acpi_get_pci_dev(), but pdev->subordinate
>>>> can be NULL, then a NULL was passed to pci_get_slot, this results
>>>> the kernel oops when resume from suspend.
>>>>
>>>> This patch resolves following kernel oops:
>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
>>>> IP: [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
>>>>
>>>> Signed-off-by: Xiaotian Feng<[email protected]>
>>>> ---
>>>> drivers/acpi/pci_root.c | 6 +++++-
>>>> 1 files changed, 5 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
>>>> index 3112221..3c35144 100644
>>>> --- a/drivers/acpi/pci_root.c
>>>> +++ b/drivers/acpi/pci_root.c
>>>> @@ -387,7 +387,11 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
>>>> if (!pdev || hnd == handle)
>>>> break;
>>>>
>>>> - pbus = pdev->subordinate;
>>>> + if (pdev->subordinate)
>>>> + pbus = pdev->subordinate;
>>>> + else
>>>> + pbus = pdev->bus;
>>>> +
>>>>
>>> I'm a little confused by this. If we start from the PCI root
>>> bridge and walk back down the hierarchy, shouldn't everything
>>> between the root and the device be a P2P bridge?
>>>
>>> What is special about suspend/resume that causes the subordinate
>>> bus to become NULL?
>>>
>>> Can you send the full stacktrace?
>>>
>>> Thanks.
>>>
>>> /ac
>>>
>>>
>>>
>> the full call trace is here:
>>
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
>> IP: [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
>> PGD 208b9d067 PUD 208a89067 PMD 0
>> Oops: 0000 [#1] SMP
>> last sysfs file: /sys/power/state
>> CPU 0
>> Modules linked in: fuse radeon ttm drm_kms_helper drm i2c_algo_bit sco
>> bridge stp llc bnep l2cap bluetooth sunrpc ip6t_REJECT nf_conntrack_ipv6
>> ip6table_filter ip6_tables ipv6 dm_multipath uinput snd_hda_codec_analog
>> snd_hda_intel snd_hda_codec snd_hwdep e1000e snd_pcm snd_timer i2c_i801
>> i2c_core snd soundcore snd_page_alloc iTCO_wdt iTCO_vendor_support
>> serio_raw ppdev parport_pc parport pcspkr dcdbas ata_generic pata_acpi
>> [last unloaded: speedstep_lib]
>> Pid: 35, comm: kacpi_hotplug Not tainted 2.6.32-rc2 #3 OptiPlex 760
>> RIP: 0010:[<ffffffff812217e7>] [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
>> RSP: 0018:ffff88022ee69aa0 EFLAGS: 00010286
>> RAX: 0000000000000000 RBX: ffff88022e9b1090 RCX: 00000000000000a0
>> RDX: 000000000000002f RSI: ffffffff8168ab38 RDI: ffffffff8168ab38
>> RBP: ffff88022ee69ac0 R08: ffffffff8168ab30 R09: ffff880100000000
>> R10: ffffffff8168ab50 R11: 0000000000000000 R12: 0000000000000000
>> R13: 0000000000000001 R14: ffff88022f712000 R15: ffff88022f710dd0
>> FS: 0000000000000000(0000) GS:ffff880028200000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>> CR2: 0000000000000028 CR3: 00000001fc298000 CR4: 00000000000406f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Process kacpi_hotplug (pid: 35, threadinfo ffff88022ee68000, task
>> ffff88022eefc120)
>> Stack:
>> 0000000000000018 ffff88022e9b1090 ffff88020880e9c0 0000000000000000
>> <0> ffff88022ee69b30 ffffffff81254193 0000000000000000 ffff88022ee69ae0
>> <0> ffff88020880e340 ffff88020880ee38 ffff88022f710208 0000000000000001
>> Call Trace:
>> [<ffffffff81254193>] acpi_get_pci_dev+0x106/0x167
>
> Have you checked (using gdb) which source code line this corresponds to?
>
Yep, the code line corresponds to
pdev = pci_get_slot(pbus, PCI_DEVFN(dev, fn));
Also gdb shows pci_bus->devices has offset of 0x28.
I've put some check in acpi_get_pci_dev, it shows that pbus is NULL when
the panic happens.
>> [<ffffffff8125545a>] acpi_pci_bind+0x1c/0x86
>> [<ffffffff8116230a>] ? sysfs_create_file+0x2a/0x2c
>> [<ffffffff8125141f>] acpi_add_single_object+0x964/0xa0c
>> [<ffffffff812515a7>] acpi_bus_check_add+0xe0/0x138
>> [<ffffffff81251667>] acpi_bus_scan+0x68/0xa0
>> [<ffffffff812516f4>] acpi_bus_add+0x2a/0x2e
>
> This looks like a device has just been discovered.
>
>> [<ffffffff81252c59>] hotplug_dock_devices+0x114/0x13e
>> [<ffffffff8125301a>] acpi_dock_deferred_cb+0xbf/0x192
>
> Have the machine been docked while suspended?
I was confused too..I didn't touch anything just suspend and then power
up. Are there some devices unplugged or ejected at suspend stage?
>
>> [<ffffffff8124d6ca>] acpi_os_execute_deferred+0x29/0x36
>> [<ffffffff8106a244>] worker_thread+0x251/0x347
>> [<ffffffff8106a1ef>] ? worker_thread+0x1fc/0x347
>> [<ffffffff8124d6a1>] ? acpi_os_execute_deferred+0x0/0x36
>> [<ffffffff8106e426>] ? autoremove_wake_function+0x0/0x39
>> [<ffffffff81069ff3>] ? worker_thread+0x0/0x347
>> [<ffffffff8106e0e0>] kthread+0x7f/0x87
>> [<ffffffff81012cea>] child_rip+0xa/0x20
>> [<ffffffff81012650>] ? restore_args+0x0/0x30
>> [<ffffffff8106e061>] ? kthread+0x0/0x87
>> [<ffffffff81012ce0>] ? child_rip+0x0/0x20
>> Code: ff 49 89 fc 41 89 f5 a9 00 ff ff 07 74 11 be 87 00 00 00 48 c7 c7
>> 45 6d 5a 81 e8 f6 2b e3 ff 48 c7 c7 30 ab 68 81 e8 29 77 20 00<49> 8b
>> 5c 24 28 49 83 c4 28 eb 09 44 39 6b 38 74 10 48 89 c3 48
>> RIP [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
>> RSP<ffff88022ee69aa0>
>> CR2: 0000000000000028
>> ---[ end trace b5a7793bd9db2a4d ]---
>
> Thanks,
> Rafael
>
On Wednesday 30 September 2009, Danny Feng wrote:
> On 09/30/2009 04:12 AM, Rafael J. Wysocki wrote:
> > On Tuesday 29 September 2009, Danny Feng wrote:
> >> On 09/29/2009 01:38 AM, Alex Chiang wrote:
> >>> Hi Xiaotian,
> >>>
> >>> Thanks for the bug report.
> >>>
> >>> * Xiaotian Feng<[email protected]>:
> >>>
> >>>> commit 275582 introduces acpi_get_pci_dev(), but pdev->subordinate
> >>>> can be NULL, then a NULL was passed to pci_get_slot, this results
> >>>> the kernel oops when resume from suspend.
> >>>>
> >>>> This patch resolves following kernel oops:
> >>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> >>>> IP: [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
> >>>>
> >>>> Signed-off-by: Xiaotian Feng<[email protected]>
> >>>> ---
> >>>> drivers/acpi/pci_root.c | 6 +++++-
> >>>> 1 files changed, 5 insertions(+), 1 deletions(-)
> >>>>
> >>>> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
> >>>> index 3112221..3c35144 100644
> >>>> --- a/drivers/acpi/pci_root.c
> >>>> +++ b/drivers/acpi/pci_root.c
> >>>> @@ -387,7 +387,11 @@ struct pci_dev *acpi_get_pci_dev(acpi_handle handle)
> >>>> if (!pdev || hnd == handle)
> >>>> break;
> >>>>
> >>>> - pbus = pdev->subordinate;
> >>>> + if (pdev->subordinate)
> >>>> + pbus = pdev->subordinate;
> >>>> + else
> >>>> + pbus = pdev->bus;
> >>>> +
> >>>>
> >>> I'm a little confused by this. If we start from the PCI root
> >>> bridge and walk back down the hierarchy, shouldn't everything
> >>> between the root and the device be a P2P bridge?
> >>>
> >>> What is special about suspend/resume that causes the subordinate
> >>> bus to become NULL?
> >>>
> >>> Can you send the full stacktrace?
> >>>
> >>> Thanks.
> >>>
> >>> /ac
> >>>
> >>>
> >>>
> >> the full call trace is here:
> >>
> >> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> >> IP: [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
> >> PGD 208b9d067 PUD 208a89067 PMD 0
> >> Oops: 0000 [#1] SMP
> >> last sysfs file: /sys/power/state
> >> CPU 0
> >> Modules linked in: fuse radeon ttm drm_kms_helper drm i2c_algo_bit sco
> >> bridge stp llc bnep l2cap bluetooth sunrpc ip6t_REJECT nf_conntrack_ipv6
> >> ip6table_filter ip6_tables ipv6 dm_multipath uinput snd_hda_codec_analog
> >> snd_hda_intel snd_hda_codec snd_hwdep e1000e snd_pcm snd_timer i2c_i801
> >> i2c_core snd soundcore snd_page_alloc iTCO_wdt iTCO_vendor_support
> >> serio_raw ppdev parport_pc parport pcspkr dcdbas ata_generic pata_acpi
> >> [last unloaded: speedstep_lib]
> >> Pid: 35, comm: kacpi_hotplug Not tainted 2.6.32-rc2 #3 OptiPlex 760
> >> RIP: 0010:[<ffffffff812217e7>] [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
> >> RSP: 0018:ffff88022ee69aa0 EFLAGS: 00010286
> >> RAX: 0000000000000000 RBX: ffff88022e9b1090 RCX: 00000000000000a0
> >> RDX: 000000000000002f RSI: ffffffff8168ab38 RDI: ffffffff8168ab38
> >> RBP: ffff88022ee69ac0 R08: ffffffff8168ab30 R09: ffff880100000000
> >> R10: ffffffff8168ab50 R11: 0000000000000000 R12: 0000000000000000
> >> R13: 0000000000000001 R14: ffff88022f712000 R15: ffff88022f710dd0
> >> FS: 0000000000000000(0000) GS:ffff880028200000(0000)
> >> knlGS:0000000000000000
> >> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> >> CR2: 0000000000000028 CR3: 00000001fc298000 CR4: 00000000000406f0
> >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >> Process kacpi_hotplug (pid: 35, threadinfo ffff88022ee68000, task
> >> ffff88022eefc120)
> >> Stack:
> >> 0000000000000018 ffff88022e9b1090 ffff88020880e9c0 0000000000000000
> >> <0> ffff88022ee69b30 ffffffff81254193 0000000000000000 ffff88022ee69ae0
> >> <0> ffff88020880e340 ffff88020880ee38 ffff88022f710208 0000000000000001
> >> Call Trace:
> >> [<ffffffff81254193>] acpi_get_pci_dev+0x106/0x167
> >
> > Have you checked (using gdb) which source code line this corresponds to?
> >
> Yep, the code line corresponds to
> pdev = pci_get_slot(pbus, PCI_DEVFN(dev, fn));
>
> Also gdb shows pci_bus->devices has offset of 0x28.
>
> I've put some check in acpi_get_pci_dev, it shows that pbus is NULL when
> the panic happens.
OK, thanks.
> >> [<ffffffff8125545a>] acpi_pci_bind+0x1c/0x86
> >> [<ffffffff8116230a>] ? sysfs_create_file+0x2a/0x2c
> >> [<ffffffff8125141f>] acpi_add_single_object+0x964/0xa0c
> >> [<ffffffff812515a7>] acpi_bus_check_add+0xe0/0x138
> >> [<ffffffff81251667>] acpi_bus_scan+0x68/0xa0
> >> [<ffffffff812516f4>] acpi_bus_add+0x2a/0x2e
> >
> > This looks like a device has just been discovered.
> >
> >> [<ffffffff81252c59>] hotplug_dock_devices+0x114/0x13e
> >> [<ffffffff8125301a>] acpi_dock_deferred_cb+0xbf/0x192
> >
> > Have the machine been docked while suspended?
> I was confused too..I didn't touch anything just suspend and then power
> up. Are there some devices unplugged or ejected at suspend stage?
Well, that's what I'd like to find out.
Can you please build the kernel with CONFIG_PM_VERBOSE set and with the
$subject patch applied and post a dmesg output from it containing at least one
suspend-resume cycle?
Best,
Rafael
Hi Danny,
* Danny Feng <[email protected]>:
> Call Trace:
> [<ffffffff81254193>] acpi_get_pci_dev+0x106/0x167
> [<ffffffff8125545a>] acpi_pci_bind+0x1c/0x86
> [<ffffffff8116230a>] ? sysfs_create_file+0x2a/0x2c
> [<ffffffff8125141f>] acpi_add_single_object+0x964/0xa0c
> [<ffffffff812515a7>] acpi_bus_check_add+0xe0/0x138
> [<ffffffff81251667>] acpi_bus_scan+0x68/0xa0
> [<ffffffff812516f4>] acpi_bus_add+0x2a/0x2e
> [<ffffffff81252c59>] hotplug_dock_devices+0x114/0x13e
> [<ffffffff8125301a>] acpi_dock_deferred_cb+0xbf/0x192
> [<ffffffff8124d6ca>] acpi_os_execute_deferred+0x29/0x36
> [<ffffffff8106a244>] worker_thread+0x251/0x347
> [<ffffffff8106a1ef>] ? worker_thread+0x1fc/0x347
> [<ffffffff8124d6a1>] ? acpi_os_execute_deferred+0x0/0x36
> [<ffffffff8106e426>] ? autoremove_wake_function+0x0/0x39
> [<ffffffff81069ff3>] ? worker_thread+0x0/0x347
> [<ffffffff8106e0e0>] kthread+0x7f/0x87
> [<ffffffff81012cea>] child_rip+0xa/0x20
> [<ffffffff81012650>] ? restore_args+0x0/0x30
> [<ffffffff8106e061>] ? kthread+0x0/0x87
> [<ffffffff81012ce0>] ? child_rip+0x0/0x20
> Code: ff 49 89 fc 41 89 f5 a9 00 ff ff 07 74 11 be 87 00 00 00 48 c7 c7
> 45 6d 5a 81 e8 f6 2b e3 ff 48 c7 c7 30 ab 68 81 e8 29 77 20 00 <49> 8b
> 5c 24 28 49 83 c4 28 eb 09 44 39 6b 38 74 10 48 89 c3 48
> RIP [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
> RSP <ffff88022ee69aa0>
> CR2: 0000000000000028
> ---[ end trace b5a7793bd9db2a4d ]---
Can you please reproduce with this debug patch? I'm guessing that
we're dying because we have a NULL parent device, but I'm curious
as to what causes this situation to occur.
Thanks.
/ac
---
diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c
index 7338b6a..4c1b128 100644
--- a/drivers/acpi/dock.c
+++ b/drivers/acpi/dock.c
@@ -126,6 +126,7 @@ add_dock_dependent_device(struct dock_station *ds,
{
spin_lock(&ds->dd_lock);
list_add_tail(&dd->list, &ds->dependent_devices);
+ printk("%s adding handle %p\n", __func__, dd->handle);
spin_unlock(&ds->dd_lock);
}
@@ -142,6 +143,8 @@ dock_add_hotplug_device(struct dock_station *ds,
{
mutex_lock(&ds->hp_lock);
list_add_tail(&dd->hotplug_list, &ds->hotplug_devices);
+ dump_stack();
+ printk("%s adding handle %p\n", __func__, dd->handle);
mutex_unlock(&ds->hp_lock);
}
@@ -325,14 +328,17 @@ static struct acpi_device * dock_create_acpi_device(acpi_handle handle)
acpi_handle parent;
int ret;
+ printk("%s handle %p\n", __func__, handle);
if (acpi_bus_get_device(handle, &device)) {
/*
* no device created for this object,
* so we should create one.
*/
acpi_get_parent(handle, &parent);
- if (acpi_bus_get_device(parent, &parent_device))
+ if (acpi_bus_get_device(parent, &parent_device)) {
parent_device = NULL;
+ printk("%s no parent, setting NULL\n", __func__);
+ }
ret = acpi_bus_add(&device, parent_device, handle,
ACPI_BUS_TYPE_DEVICE);
@@ -385,8 +391,10 @@ static void hotplug_dock_devices(struct dock_station *ds, u32 event)
* First call driver specific hotplug functions
*/
list_for_each_entry(dd, &ds->hotplug_devices, hotplug_list) {
- if (dd->ops && dd->ops->handler)
+ if (dd->ops && dd->ops->handler) {
+ printk("%s handle %p\n", __func__, dd->handle);
dd->ops->handler(dd->handle, event, dd->context);
+ }
}
/*
@@ -1041,6 +1049,7 @@ static int dock_add(acpi_handle handle)
ret = -ENOMEM;
goto dock_add_err_unregister;
}
+ printk("%s adding self as dependent %p)\n", __func__, dd->handle);
add_dock_dependent_device(dock_station, dd);
dock_station_count++;
On Thursday 01 October 2009, Alex Chiang wrote:
> Hi Danny,
>
> * Danny Feng <[email protected]>:
> > Call Trace:
> > [<ffffffff81254193>] acpi_get_pci_dev+0x106/0x167
> > [<ffffffff8125545a>] acpi_pci_bind+0x1c/0x86
> > [<ffffffff8116230a>] ? sysfs_create_file+0x2a/0x2c
> > [<ffffffff8125141f>] acpi_add_single_object+0x964/0xa0c
> > [<ffffffff812515a7>] acpi_bus_check_add+0xe0/0x138
> > [<ffffffff81251667>] acpi_bus_scan+0x68/0xa0
> > [<ffffffff812516f4>] acpi_bus_add+0x2a/0x2e
> > [<ffffffff81252c59>] hotplug_dock_devices+0x114/0x13e
> > [<ffffffff8125301a>] acpi_dock_deferred_cb+0xbf/0x192
> > [<ffffffff8124d6ca>] acpi_os_execute_deferred+0x29/0x36
> > [<ffffffff8106a244>] worker_thread+0x251/0x347
> > [<ffffffff8106a1ef>] ? worker_thread+0x1fc/0x347
> > [<ffffffff8124d6a1>] ? acpi_os_execute_deferred+0x0/0x36
> > [<ffffffff8106e426>] ? autoremove_wake_function+0x0/0x39
> > [<ffffffff81069ff3>] ? worker_thread+0x0/0x347
> > [<ffffffff8106e0e0>] kthread+0x7f/0x87
> > [<ffffffff81012cea>] child_rip+0xa/0x20
> > [<ffffffff81012650>] ? restore_args+0x0/0x30
> > [<ffffffff8106e061>] ? kthread+0x0/0x87
> > [<ffffffff81012ce0>] ? child_rip+0x0/0x20
> > Code: ff 49 89 fc 41 89 f5 a9 00 ff ff 07 74 11 be 87 00 00 00 48 c7 c7
> > 45 6d 5a 81 e8 f6 2b e3 ff 48 c7 c7 30 ab 68 81 e8 29 77 20 00 <49> 8b
> > 5c 24 28 49 83 c4 28 eb 09 44 39 6b 38 74 10 48 89 c3 48
> > RIP [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
> > RSP <ffff88022ee69aa0>
> > CR2: 0000000000000028
> > ---[ end trace b5a7793bd9db2a4d ]---
>
> Can you please reproduce with this debug patch? I'm guessing that
> we're dying because we have a NULL parent device, but I'm curious
> as to what causes this situation to occur.
If we had a NULL parent, acpi_get_parent() would return an error. Also, if we
one of the devices is NULL at the PCI level, pci_get_slot() will return NULL.
The only possibility left is that one of the buses we find in the ACPI tables
doesn't have a secondary PCI bus.
I think what happens is that on resume we get a dock notification
(via dock_acpi_notifier registered in dock_init()) for a dock station device
that is present in the ACPI tables, but not physically accessible at the moment
(I guess that falls into the "BIOS bug" category, but we can fix this easily in
the kernel).
So, IMO, the appended patch is the right fix.
Danny, please test it and report back (in particular, please tell us if you see
the "Secondary bus not present" message in dmesg).
Thanks,
Rafael
---
drivers/acpi/pci_root.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
Index: linux-2.6/drivers/acpi/pci_root.c
===================================================================
--- linux-2.6.orig/drivers/acpi/pci_root.c
+++ linux-2.6/drivers/acpi/pci_root.c
@@ -389,6 +389,18 @@ struct pci_dev *acpi_get_pci_dev(acpi_ha
pbus = pdev->subordinate;
pci_dev_put(pdev);
+
+ /*
+ * During resume from a sleep state we can get a dock
+ * notification for a device that is present in ACPI tables,
+ * but not physically accessible at the moment, so tell the
+ * caller it's not present.
+ */
+ if (!pbus) {
+ dev_info(&pdev->dev, "Secondary bus not present\n");
+ pdev = NULL;
+ break;
+ }
}
out:
list_for_each_entry_safe(node, tmp, &device_list, node)
On 10/02/2009 04:05 AM, Alex Chiang wrote:
> Hi Danny,
>
> * Danny Feng<[email protected]>:
>
>> Call Trace:
>> [<ffffffff81254193>] acpi_get_pci_dev+0x106/0x167
>> [<ffffffff8125545a>] acpi_pci_bind+0x1c/0x86
>> [<ffffffff8116230a>] ? sysfs_create_file+0x2a/0x2c
>> [<ffffffff8125141f>] acpi_add_single_object+0x964/0xa0c
>> [<ffffffff812515a7>] acpi_bus_check_add+0xe0/0x138
>> [<ffffffff81251667>] acpi_bus_scan+0x68/0xa0
>> [<ffffffff812516f4>] acpi_bus_add+0x2a/0x2e
>> [<ffffffff81252c59>] hotplug_dock_devices+0x114/0x13e
>> [<ffffffff8125301a>] acpi_dock_deferred_cb+0xbf/0x192
>> [<ffffffff8124d6ca>] acpi_os_execute_deferred+0x29/0x36
>> [<ffffffff8106a244>] worker_thread+0x251/0x347
>> [<ffffffff8106a1ef>] ? worker_thread+0x1fc/0x347
>> [<ffffffff8124d6a1>] ? acpi_os_execute_deferred+0x0/0x36
>> [<ffffffff8106e426>] ? autoremove_wake_function+0x0/0x39
>> [<ffffffff81069ff3>] ? worker_thread+0x0/0x347
>> [<ffffffff8106e0e0>] kthread+0x7f/0x87
>> [<ffffffff81012cea>] child_rip+0xa/0x20
>> [<ffffffff81012650>] ? restore_args+0x0/0x30
>> [<ffffffff8106e061>] ? kthread+0x0/0x87
>> [<ffffffff81012ce0>] ? child_rip+0x0/0x20
>> Code: ff 49 89 fc 41 89 f5 a9 00 ff ff 07 74 11 be 87 00 00 00 48 c7 c7
>> 45 6d 5a 81 e8 f6 2b e3 ff 48 c7 c7 30 ab 68 81 e8 29 77 20 00<49> 8b
>> 5c 24 28 49 83 c4 28 eb 09 44 39 6b 38 74 10 48 89 c3 48
>> RIP [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
>> RSP<ffff88022ee69aa0>
>> CR2: 0000000000000028
>> ---[ end trace b5a7793bd9db2a4d ]---
>>
> Can you please reproduce with this debug patch? I'm guessing that
> we're dying because we have a NULL parent device, but I'm curious
> as to what causes this situation to occur.
>
> Thanks.
> /ac
> ---
> diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c
> index 7338b6a..4c1b128 100644
> --- a/drivers/acpi/dock.c
> +++ b/drivers/acpi/dock.c
> @@ -126,6 +126,7 @@ add_dock_dependent_device(struct dock_station *ds,
> {
> spin_lock(&ds->dd_lock);
> list_add_tail(&dd->list,&ds->dependent_devices);
> + printk("%s adding handle %p\n", __func__, dd->handle);
> spin_unlock(&ds->dd_lock);
> }
>
> @@ -142,6 +143,8 @@ dock_add_hotplug_device(struct dock_station *ds,
> {
> mutex_lock(&ds->hp_lock);
> list_add_tail(&dd->hotplug_list,&ds->hotplug_devices);
> + dump_stack();
> + printk("%s adding handle %p\n", __func__, dd->handle);
> mutex_unlock(&ds->hp_lock);
> }
>
> @@ -325,14 +328,17 @@ static struct acpi_device * dock_create_acpi_device(acpi_handle handle)
> acpi_handle parent;
> int ret;
>
> + printk("%s handle %p\n", __func__, handle);
> if (acpi_bus_get_device(handle,&device)) {
> /*
> * no device created for this object,
> * so we should create one.
> */
> acpi_get_parent(handle,&parent);
> - if (acpi_bus_get_device(parent,&parent_device))
> + if (acpi_bus_get_device(parent,&parent_device)) {
> parent_device = NULL;
> + printk("%s no parent, setting NULL\n", __func__);
> + }
>
> ret = acpi_bus_add(&device, parent_device, handle,
> ACPI_BUS_TYPE_DEVICE);
> @@ -385,8 +391,10 @@ static void hotplug_dock_devices(struct dock_station *ds, u32 event)
> * First call driver specific hotplug functions
> */
> list_for_each_entry(dd,&ds->hotplug_devices, hotplug_list) {
> - if (dd->ops&& dd->ops->handler)
> + if (dd->ops&& dd->ops->handler) {
> + printk("%s handle %p\n", __func__, dd->handle);
> dd->ops->handler(dd->handle, event, dd->context);
> + }
> }
>
> /*
> @@ -1041,6 +1049,7 @@ static int dock_add(acpi_handle handle)
> ret = -ENOMEM;
> goto dock_add_err_unregister;
> }
> + printk("%s adding self as dependent %p)\n", __func__, dd->handle);
> add_dock_dependent_device(dock_station, dd);
>
> dock_station_count++;
>
>
Sorry for the late response, just back from holidays. Will do it soon,
thanks.
On 10/04/2009 06:56 AM, Rafael J. Wysocki wrote:
> On Thursday 01 October 2009, Alex Chiang wrote:
>
>> Hi Danny,
>>
>> * Danny Feng<[email protected]>:
>>
>>> Call Trace:
>>> [<ffffffff81254193>] acpi_get_pci_dev+0x106/0x167
>>> [<ffffffff8125545a>] acpi_pci_bind+0x1c/0x86
>>> [<ffffffff8116230a>] ? sysfs_create_file+0x2a/0x2c
>>> [<ffffffff8125141f>] acpi_add_single_object+0x964/0xa0c
>>> [<ffffffff812515a7>] acpi_bus_check_add+0xe0/0x138
>>> [<ffffffff81251667>] acpi_bus_scan+0x68/0xa0
>>> [<ffffffff812516f4>] acpi_bus_add+0x2a/0x2e
>>> [<ffffffff81252c59>] hotplug_dock_devices+0x114/0x13e
>>> [<ffffffff8125301a>] acpi_dock_deferred_cb+0xbf/0x192
>>> [<ffffffff8124d6ca>] acpi_os_execute_deferred+0x29/0x36
>>> [<ffffffff8106a244>] worker_thread+0x251/0x347
>>> [<ffffffff8106a1ef>] ? worker_thread+0x1fc/0x347
>>> [<ffffffff8124d6a1>] ? acpi_os_execute_deferred+0x0/0x36
>>> [<ffffffff8106e426>] ? autoremove_wake_function+0x0/0x39
>>> [<ffffffff81069ff3>] ? worker_thread+0x0/0x347
>>> [<ffffffff8106e0e0>] kthread+0x7f/0x87
>>> [<ffffffff81012cea>] child_rip+0xa/0x20
>>> [<ffffffff81012650>] ? restore_args+0x0/0x30
>>> [<ffffffff8106e061>] ? kthread+0x0/0x87
>>> [<ffffffff81012ce0>] ? child_rip+0x0/0x20
>>> Code: ff 49 89 fc 41 89 f5 a9 00 ff ff 07 74 11 be 87 00 00 00 48 c7 c7
>>> 45 6d 5a 81 e8 f6 2b e3 ff 48 c7 c7 30 ab 68 81 e8 29 77 20 00<49> 8b
>>> 5c 24 28 49 83 c4 28 eb 09 44 39 6b 38 74 10 48 89 c3 48
>>> RIP [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
>>> RSP<ffff88022ee69aa0>
>>> CR2: 0000000000000028
>>> ---[ end trace b5a7793bd9db2a4d ]---
>>>
>> Can you please reproduce with this debug patch? I'm guessing that
>> we're dying because we have a NULL parent device, but I'm curious
>> as to what causes this situation to occur.
>>
> If we had a NULL parent, acpi_get_parent() would return an error. Also, if we
> one of the devices is NULL at the PCI level, pci_get_slot() will return NULL.
> The only possibility left is that one of the buses we find in the ACPI tables
> doesn't have a secondary PCI bus.
>
> I think what happens is that on resume we get a dock notification
> (via dock_acpi_notifier registered in dock_init()) for a dock station device
> that is present in the ACPI tables, but not physically accessible at the moment
> (I guess that falls into the "BIOS bug" category, but we can fix this easily in
> the kernel).
>
> So, IMO, the appended patch is the right fix.
>
> Danny, please test it and report back (in particular, please tell us if you see
> the "Secondary bus not present" message in dmesg).
>
> Thanks,
> Rafael
>
>
> ---
> drivers/acpi/pci_root.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> Index: linux-2.6/drivers/acpi/pci_root.c
> ===================================================================
> --- linux-2.6.orig/drivers/acpi/pci_root.c
> +++ linux-2.6/drivers/acpi/pci_root.c
> @@ -389,6 +389,18 @@ struct pci_dev *acpi_get_pci_dev(acpi_ha
>
> pbus = pdev->subordinate;
> pci_dev_put(pdev);
> +
> + /*
> + * During resume from a sleep state we can get a dock
> + * notification for a device that is present in ACPI tables,
> + * but not physically accessible at the moment, so tell the
> + * caller it's not present.
> + */
> + if (!pbus) {
> + dev_info(&pdev->dev, "Secondary bus not present\n");
> + pdev = NULL;
> + break;
> + }
> }
> out:
> list_for_each_entry_safe(node, tmp,&device_list, node)
>
>
Sorry for the late response, just back from holidays. Will test it soon,
thanks.
On 10/04/2009 06:56 AM, Rafael J. Wysocki wrote:
> On Thursday 01 October 2009, Alex Chiang wrote:
>> Hi Danny,
>>
>> * Danny Feng<[email protected]>:
>>> Call Trace:
>>> [<ffffffff81254193>] acpi_get_pci_dev+0x106/0x167
>>> [<ffffffff8125545a>] acpi_pci_bind+0x1c/0x86
>>> [<ffffffff8116230a>] ? sysfs_create_file+0x2a/0x2c
>>> [<ffffffff8125141f>] acpi_add_single_object+0x964/0xa0c
>>> [<ffffffff812515a7>] acpi_bus_check_add+0xe0/0x138
>>> [<ffffffff81251667>] acpi_bus_scan+0x68/0xa0
>>> [<ffffffff812516f4>] acpi_bus_add+0x2a/0x2e
>>> [<ffffffff81252c59>] hotplug_dock_devices+0x114/0x13e
>>> [<ffffffff8125301a>] acpi_dock_deferred_cb+0xbf/0x192
>>> [<ffffffff8124d6ca>] acpi_os_execute_deferred+0x29/0x36
>>> [<ffffffff8106a244>] worker_thread+0x251/0x347
>>> [<ffffffff8106a1ef>] ? worker_thread+0x1fc/0x347
>>> [<ffffffff8124d6a1>] ? acpi_os_execute_deferred+0x0/0x36
>>> [<ffffffff8106e426>] ? autoremove_wake_function+0x0/0x39
>>> [<ffffffff81069ff3>] ? worker_thread+0x0/0x347
>>> [<ffffffff8106e0e0>] kthread+0x7f/0x87
>>> [<ffffffff81012cea>] child_rip+0xa/0x20
>>> [<ffffffff81012650>] ? restore_args+0x0/0x30
>>> [<ffffffff8106e061>] ? kthread+0x0/0x87
>>> [<ffffffff81012ce0>] ? child_rip+0x0/0x20
>>> Code: ff 49 89 fc 41 89 f5 a9 00 ff ff 07 74 11 be 87 00 00 00 48 c7 c7
>>> 45 6d 5a 81 e8 f6 2b e3 ff 48 c7 c7 30 ab 68 81 e8 29 77 20 00<49> 8b
>>> 5c 24 28 49 83 c4 28 eb 09 44 39 6b 38 74 10 48 89 c3 48
>>> RIP [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
>>> RSP<ffff88022ee69aa0>
>>> CR2: 0000000000000028
>>> ---[ end trace b5a7793bd9db2a4d ]---
>>
>> Can you please reproduce with this debug patch? I'm guessing that
>> we're dying because we have a NULL parent device, but I'm curious
>> as to what causes this situation to occur.
>
> If we had a NULL parent, acpi_get_parent() would return an error. Also, if we
> one of the devices is NULL at the PCI level, pci_get_slot() will return NULL.
> The only possibility left is that one of the buses we find in the ACPI tables
> doesn't have a secondary PCI bus.
>
> I think what happens is that on resume we get a dock notification
> (via dock_acpi_notifier registered in dock_init()) for a dock station device
> that is present in the ACPI tables, but not physically accessible at the moment
> (I guess that falls into the "BIOS bug" category, but we can fix this easily in
> the kernel).
>
> So, IMO, the appended patch is the right fix.
>
> Danny, please test it and report back (in particular, please tell us if you see
> the "Secondary bus not present" message in dmesg).
Yes, this patch works. I got "ata_piix 0000:00:1f.2: Secondary bus not
present". Thanks.
>
> Thanks,
> Rafael
>
>
> ---
> drivers/acpi/pci_root.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> Index: linux-2.6/drivers/acpi/pci_root.c
> ===================================================================
> --- linux-2.6.orig/drivers/acpi/pci_root.c
> +++ linux-2.6/drivers/acpi/pci_root.c
> @@ -389,6 +389,18 @@ struct pci_dev *acpi_get_pci_dev(acpi_ha
>
> pbus = pdev->subordinate;
> pci_dev_put(pdev);
> +
> + /*
> + * During resume from a sleep state we can get a dock
> + * notification for a device that is present in ACPI tables,
> + * but not physically accessible at the moment, so tell the
> + * caller it's not present.
> + */
> + if (!pbus) {
> + dev_info(&pdev->dev, "Secondary bus not present\n");
> + pdev = NULL;
> + break;
> + }
> }
> out:
> list_for_each_entry_safe(node, tmp,&device_list, node)
>
On 10/02/2009 04:05 AM, Alex Chiang wrote:
> Hi Danny,
>
> * Danny Feng<[email protected]>:
>> Call Trace:
>> [<ffffffff81254193>] acpi_get_pci_dev+0x106/0x167
>> [<ffffffff8125545a>] acpi_pci_bind+0x1c/0x86
>> [<ffffffff8116230a>] ? sysfs_create_file+0x2a/0x2c
>> [<ffffffff8125141f>] acpi_add_single_object+0x964/0xa0c
>> [<ffffffff812515a7>] acpi_bus_check_add+0xe0/0x138
>> [<ffffffff81251667>] acpi_bus_scan+0x68/0xa0
>> [<ffffffff812516f4>] acpi_bus_add+0x2a/0x2e
>> [<ffffffff81252c59>] hotplug_dock_devices+0x114/0x13e
>> [<ffffffff8125301a>] acpi_dock_deferred_cb+0xbf/0x192
>> [<ffffffff8124d6ca>] acpi_os_execute_deferred+0x29/0x36
>> [<ffffffff8106a244>] worker_thread+0x251/0x347
>> [<ffffffff8106a1ef>] ? worker_thread+0x1fc/0x347
>> [<ffffffff8124d6a1>] ? acpi_os_execute_deferred+0x0/0x36
>> [<ffffffff8106e426>] ? autoremove_wake_function+0x0/0x39
>> [<ffffffff81069ff3>] ? worker_thread+0x0/0x347
>> [<ffffffff8106e0e0>] kthread+0x7f/0x87
>> [<ffffffff81012cea>] child_rip+0xa/0x20
>> [<ffffffff81012650>] ? restore_args+0x0/0x30
>> [<ffffffff8106e061>] ? kthread+0x0/0x87
>> [<ffffffff81012ce0>] ? child_rip+0x0/0x20
>> Code: ff 49 89 fc 41 89 f5 a9 00 ff ff 07 74 11 be 87 00 00 00 48 c7 c7
>> 45 6d 5a 81 e8 f6 2b e3 ff 48 c7 c7 30 ab 68 81 e8 29 77 20 00<49> 8b
>> 5c 24 28 49 83 c4 28 eb 09 44 39 6b 38 74 10 48 89 c3 48
>> RIP [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
>> RSP<ffff88022ee69aa0>
>> CR2: 0000000000000028
>> ---[ end trace b5a7793bd9db2a4d ]---
>
> Can you please reproduce with this debug patch? I'm guessing that
> we're dying because we have a NULL parent device, but I'm curious
> as to what causes this situation to occur.
I got following call trace at the first boot stage.
Pid: 1, comm: swapper Not tainted 2.6.32-rc2 #16
Call Trace:
[<ffffffff81253538>] register_hotplug_dock_device+0x92/0xf7
[<ffffffff812f1622>] ata_acpi_associate+0x144/0x196
[<ffffffff812dfb29>] ata_host_register+0xc5/0x1fb
[<ffffffff812ee9b1>] ? ata_sff_interrupt+0x0/0x96
[<ffffffff812eca12>] ata_pci_sff_activate_host+0x19b/0x1d1
[<ffffffff8137216e>] ? pcibios_set_master+0x9b/0xa9
[<ffffffff8141d878>] piix_init_one+0x760/0x780
[<ffffffff81220325>] local_pci_probe+0x17/0x1b
[<ffffffff812210fd>] pci_device_probe+0xca/0xfa
[<ffffffff812b93aa>] ? driver_sysfs_add+0x4c/0x71
[<ffffffff812b94f2>] driver_probe_device+0xa2/0x127
[<ffffffff812b95d4>] __driver_attach+0x5d/0x81
[<ffffffff812b9577>] ? __driver_attach+0x0/0x81
[<ffffffff812b8ae9>] bus_for_each_dev+0x59/0x8e
[<ffffffff812b935c>] driver_attach+0x1e/0x20
[<ffffffff812b8fa7>] bus_add_driver+0xb9/0x202
[<ffffffff812b98c7>] driver_register+0x9d/0x10e
[<ffffffff81221343>] __pci_register_driver+0x68/0xd8
[<ffffffff81728e98>] ? piix_init+0x0/0x29
[<ffffffff81728eb1>] piix_init+0x19/0x29
[<ffffffff8100a069>] do_one_initcall+0x5e/0x15e
[<ffffffff816f86c7>] kernel_init+0x170/0x1ca
[<ffffffff81012cea>] child_rip+0xa/0x20
[<ffffffff81012650>] ? restore_args+0x0/0x30
[<ffffffff816f8557>] ? kernel_init+0x0/0x1ca
[<ffffffff81012ce0>] ? child_rip+0x0/0x20
dock_add_hotplug_device adding handle ffff88022f712000
Thanks,
Danny
>
> Thanks.
> /ac
> ---
> diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c
> index 7338b6a..4c1b128 100644
> --- a/drivers/acpi/dock.c
> +++ b/drivers/acpi/dock.c
> @@ -126,6 +126,7 @@ add_dock_dependent_device(struct dock_station *ds,
> {
> spin_lock(&ds->dd_lock);
> list_add_tail(&dd->list,&ds->dependent_devices);
> + printk("%s adding handle %p\n", __func__, dd->handle);
> spin_unlock(&ds->dd_lock);
> }
>
> @@ -142,6 +143,8 @@ dock_add_hotplug_device(struct dock_station *ds,
> {
> mutex_lock(&ds->hp_lock);
> list_add_tail(&dd->hotplug_list,&ds->hotplug_devices);
> + dump_stack();
> + printk("%s adding handle %p\n", __func__, dd->handle);
> mutex_unlock(&ds->hp_lock);
> }
>
> @@ -325,14 +328,17 @@ static struct acpi_device * dock_create_acpi_device(acpi_handle handle)
> acpi_handle parent;
> int ret;
>
> + printk("%s handle %p\n", __func__, handle);
> if (acpi_bus_get_device(handle,&device)) {
> /*
> * no device created for this object,
> * so we should create one.
> */
> acpi_get_parent(handle,&parent);
> - if (acpi_bus_get_device(parent,&parent_device))
> + if (acpi_bus_get_device(parent,&parent_device)) {
> parent_device = NULL;
> + printk("%s no parent, setting NULL\n", __func__);
> + }
>
> ret = acpi_bus_add(&device, parent_device, handle,
> ACPI_BUS_TYPE_DEVICE);
> @@ -385,8 +391,10 @@ static void hotplug_dock_devices(struct dock_station *ds, u32 event)
> * First call driver specific hotplug functions
> */
> list_for_each_entry(dd,&ds->hotplug_devices, hotplug_list) {
> - if (dd->ops&& dd->ops->handler)
> + if (dd->ops&& dd->ops->handler) {
> + printk("%s handle %p\n", __func__, dd->handle);
> dd->ops->handler(dd->handle, event, dd->context);
> + }
> }
>
> /*
> @@ -1041,6 +1049,7 @@ static int dock_add(acpi_handle handle)
> ret = -ENOMEM;
> goto dock_add_err_unregister;
> }
> + printk("%s adding self as dependent %p)\n", __func__, dd->handle);
> add_dock_dependent_device(dock_station, dd);
>
> dock_station_count++;
>
On Friday 09 October 2009, Danny Feng wrote:
> On 10/04/2009 06:56 AM, Rafael J. Wysocki wrote:
> > On Thursday 01 October 2009, Alex Chiang wrote:
> >> Hi Danny,
> >>
> >> * Danny Feng<[email protected]>:
> >>> Call Trace:
> >>> [<ffffffff81254193>] acpi_get_pci_dev+0x106/0x167
> >>> [<ffffffff8125545a>] acpi_pci_bind+0x1c/0x86
> >>> [<ffffffff8116230a>] ? sysfs_create_file+0x2a/0x2c
> >>> [<ffffffff8125141f>] acpi_add_single_object+0x964/0xa0c
> >>> [<ffffffff812515a7>] acpi_bus_check_add+0xe0/0x138
> >>> [<ffffffff81251667>] acpi_bus_scan+0x68/0xa0
> >>> [<ffffffff812516f4>] acpi_bus_add+0x2a/0x2e
> >>> [<ffffffff81252c59>] hotplug_dock_devices+0x114/0x13e
> >>> [<ffffffff8125301a>] acpi_dock_deferred_cb+0xbf/0x192
> >>> [<ffffffff8124d6ca>] acpi_os_execute_deferred+0x29/0x36
> >>> [<ffffffff8106a244>] worker_thread+0x251/0x347
> >>> [<ffffffff8106a1ef>] ? worker_thread+0x1fc/0x347
> >>> [<ffffffff8124d6a1>] ? acpi_os_execute_deferred+0x0/0x36
> >>> [<ffffffff8106e426>] ? autoremove_wake_function+0x0/0x39
> >>> [<ffffffff81069ff3>] ? worker_thread+0x0/0x347
> >>> [<ffffffff8106e0e0>] kthread+0x7f/0x87
> >>> [<ffffffff81012cea>] child_rip+0xa/0x20
> >>> [<ffffffff81012650>] ? restore_args+0x0/0x30
> >>> [<ffffffff8106e061>] ? kthread+0x0/0x87
> >>> [<ffffffff81012ce0>] ? child_rip+0x0/0x20
> >>> Code: ff 49 89 fc 41 89 f5 a9 00 ff ff 07 74 11 be 87 00 00 00 48 c7 c7
> >>> 45 6d 5a 81 e8 f6 2b e3 ff 48 c7 c7 30 ab 68 81 e8 29 77 20 00<49> 8b
> >>> 5c 24 28 49 83 c4 28 eb 09 44 39 6b 38 74 10 48 89 c3 48
> >>> RIP [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
> >>> RSP<ffff88022ee69aa0>
> >>> CR2: 0000000000000028
> >>> ---[ end trace b5a7793bd9db2a4d ]---
> >>
> >> Can you please reproduce with this debug patch? I'm guessing that
> >> we're dying because we have a NULL parent device, but I'm curious
> >> as to what causes this situation to occur.
> >
> > If we had a NULL parent, acpi_get_parent() would return an error. Also, if we
> > one of the devices is NULL at the PCI level, pci_get_slot() will return NULL.
> > The only possibility left is that one of the buses we find in the ACPI tables
> > doesn't have a secondary PCI bus.
> >
> > I think what happens is that on resume we get a dock notification
> > (via dock_acpi_notifier registered in dock_init()) for a dock station device
> > that is present in the ACPI tables, but not physically accessible at the moment
> > (I guess that falls into the "BIOS bug" category, but we can fix this easily in
> > the kernel).
> >
> > So, IMO, the appended patch is the right fix.
> >
> > Danny, please test it and report back (in particular, please tell us if you see
> > the "Secondary bus not present" message in dmesg).
> Yes, this patch works. I got "ata_piix 0000:00:1f.2: Secondary bus not
> present".
Now that's a puzzle!
Can you please attach the output of acpidump from this machine?
Rafael
On 10/10/2009 05:46 AM, Rafael J. Wysocki wrote:
> On Friday 09 October 2009, Danny Feng wrote:
>> On 10/04/2009 06:56 AM, Rafael J. Wysocki wrote:
>>> On Thursday 01 October 2009, Alex Chiang wrote:
>>>> Hi Danny,
>>>>
>>>> * Danny Feng<[email protected]>:
>>>>> Call Trace:
>>>>> [<ffffffff81254193>] acpi_get_pci_dev+0x106/0x167
>>>>> [<ffffffff8125545a>] acpi_pci_bind+0x1c/0x86
>>>>> [<ffffffff8116230a>] ? sysfs_create_file+0x2a/0x2c
>>>>> [<ffffffff8125141f>] acpi_add_single_object+0x964/0xa0c
>>>>> [<ffffffff812515a7>] acpi_bus_check_add+0xe0/0x138
>>>>> [<ffffffff81251667>] acpi_bus_scan+0x68/0xa0
>>>>> [<ffffffff812516f4>] acpi_bus_add+0x2a/0x2e
>>>>> [<ffffffff81252c59>] hotplug_dock_devices+0x114/0x13e
>>>>> [<ffffffff8125301a>] acpi_dock_deferred_cb+0xbf/0x192
>>>>> [<ffffffff8124d6ca>] acpi_os_execute_deferred+0x29/0x36
>>>>> [<ffffffff8106a244>] worker_thread+0x251/0x347
>>>>> [<ffffffff8106a1ef>] ? worker_thread+0x1fc/0x347
>>>>> [<ffffffff8124d6a1>] ? acpi_os_execute_deferred+0x0/0x36
>>>>> [<ffffffff8106e426>] ? autoremove_wake_function+0x0/0x39
>>>>> [<ffffffff81069ff3>] ? worker_thread+0x0/0x347
>>>>> [<ffffffff8106e0e0>] kthread+0x7f/0x87
>>>>> [<ffffffff81012cea>] child_rip+0xa/0x20
>>>>> [<ffffffff81012650>] ? restore_args+0x0/0x30
>>>>> [<ffffffff8106e061>] ? kthread+0x0/0x87
>>>>> [<ffffffff81012ce0>] ? child_rip+0x0/0x20
>>>>> Code: ff 49 89 fc 41 89 f5 a9 00 ff ff 07 74 11 be 87 00 00 00 48 c7 c7
>>>>> 45 6d 5a 81 e8 f6 2b e3 ff 48 c7 c7 30 ab 68 81 e8 29 77 20 00<49> 8b
>>>>> 5c 24 28 49 83 c4 28 eb 09 44 39 6b 38 74 10 48 89 c3 48
>>>>> RIP [<ffffffff812217e7>] pci_get_slot+0x4c/0x8c
>>>>> RSP<ffff88022ee69aa0>
>>>>> CR2: 0000000000000028
>>>>> ---[ end trace b5a7793bd9db2a4d ]---
>>>>
>>>> Can you please reproduce with this debug patch? I'm guessing that
>>>> we're dying because we have a NULL parent device, but I'm curious
>>>> as to what causes this situation to occur.
>>>
>>> If we had a NULL parent, acpi_get_parent() would return an error. Also, if we
>>> one of the devices is NULL at the PCI level, pci_get_slot() will return NULL.
>>> The only possibility left is that one of the buses we find in the ACPI tables
>>> doesn't have a secondary PCI bus.
>>>
>>> I think what happens is that on resume we get a dock notification
>>> (via dock_acpi_notifier registered in dock_init()) for a dock station device
>>> that is present in the ACPI tables, but not physically accessible at the moment
>>> (I guess that falls into the "BIOS bug" category, but we can fix this easily in
>>> the kernel).
>>>
>>> So, IMO, the appended patch is the right fix.
>>>
>>> Danny, please test it and report back (in particular, please tell us if you see
>>> the "Secondary bus not present" message in dmesg).
>> Yes, this patch works. I got "ata_piix 0000:00:1f.2: Secondary bus not
>> present".
>
> Now that's a puzzle!
>
> Can you please attach the output of acpidump from this machine?
Hi Rafael, please check the attachment, thanks.
>
> Rafael
>