2008-06-03 08:42:18

by Zdenek Kabelac

[permalink] [raw]
Subject: BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334

Hello


Another backtrace from suspend code path:
(T61, 2GB, C2D, no SD card)
kernel from git 20080603, commit 1beee8dc8cf58e3f605bd7b34d7a39939be7d8d2

----
agpgart-intel 0000:00:00.0: LATE suspend
platform bay.0: LATE suspend
platform dock.0: LATE suspend
Extended CMOS year: 2000
hwsleep-0324 [00] enter_sleep_state : Entering sleep state [S3]
Back to C!
BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334
caller is do_machine_check+0xa9/0x500
Pid: 17334, comm: pm-suspend Not tainted 2.6.26-rc4 #31

Call Trace:
[<ffffffff8118347c>] debug_smp_processor_id+0xcc/0xd0
[<ffffffff810184d9>] do_machine_check+0xa9/0x500
[<ffffffff81010e7b>] ? init_8259A+0x1b/0x120
[<ffffffff810189d6>] mce_init+0x56/0xf0
[<ffffffff81018a7b>] mce_resume+0xb/0x10
[<ffffffff81204fd0>] __sysdev_resume+0x20/0x60
[<ffffffff81205068>] sysdev_resume+0x58/0x90
[<ffffffff8120aac9>] device_power_up+0x9/0x10
[<ffffffff8106f4f7>] suspend_devices_and_enter+0x147/0x1a0
[<ffffffff8106f6c6>] enter_state+0x146/0x1d0
[<ffffffff8106f80a>] state_store+0xba/0x100
[<ffffffff81177ae7>] kobj_attr_store+0x17/0x20
[<ffffffff81110fea>] sysfs_write_file+0xca/0x140
[<ffffffff810ba00b>] vfs_write+0xcb/0x190
[<ffffffff810ba1c0>] sys_write+0x50/0x90
[<ffffffff8100c4fb>] system_call_after_swapgs+0x7b/0x80

Extended CMOS year: 2000
BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334
caller is retrigger_next_event+0x1c/0x100
Pid: 17334, comm: pm-suspend Not tainted 2.6.26-rc4 #31

Call Trace:
[<ffffffff8118347c>] debug_smp_processor_id+0xcc/0xd0
[<ffffffff81055d3c>] retrigger_next_event+0x1c/0x100
[<ffffffff812f5710>] ? _spin_unlock+0x30/0x60
[<ffffffff810565ae>] hres_timers_resume+0x1e/0x50
[<ffffffff81059657>] timekeeping_resume+0x137/0x170
[<ffffffff81204fd0>] __sysdev_resume+0x20/0x60
[<ffffffff81205068>] sysdev_resume+0x58/0x90
[<ffffffff8120aac9>] device_power_up+0x9/0x10
[<ffffffff8106f4f7>] suspend_devices_and_enter+0x147/0x1a0
[<ffffffff8106f6c6>] enter_state+0x146/0x1d0
[<ffffffff8106f80a>] state_store+0xba/0x100
[<ffffffff81177ae7>] kobj_attr_store+0x17/0x20
[<ffffffff81110fea>] sysfs_write_file+0xca/0x140
[<ffffffff810ba00b>] vfs_write+0xcb/0x190
[<ffffffff810ba1c0>] sys_write+0x50/0x90
[<ffffffff8100c4fb>] system_call_after_swapgs+0x7b/0x80

BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334
caller is retrigger_next_event+0x8b/0x100
Pid: 17334, comm: pm-suspend Not tainted 2.6.26-rc4 #31

Call Trace:
[<ffffffff8118347c>] debug_smp_processor_id+0xcc/0xd0
[<ffffffff81055dab>] retrigger_next_event+0x8b/0x100
[<ffffffff810565ae>] hres_timers_resume+0x1e/0x50
[<ffffffff81059657>] timekeeping_resume+0x137/0x170
[<ffffffff81204fd0>] __sysdev_resume+0x20/0x60
[<ffffffff81205068>] sysdev_resume+0x58/0x90
[<ffffffff8120aac9>] device_power_up+0x9/0x10
[<ffffffff8106f4f7>] suspend_devices_and_enter+0x147/0x1a0
[<ffffffff8106f6c6>] enter_state+0x146/0x1d0
[<ffffffff8106f80a>] state_store+0xba/0x100
[<ffffffff81177ae7>] kobj_attr_store+0x17/0x20
[<ffffffff81110fea>] sysfs_write_file+0xca/0x140
[<ffffffff810ba00b>] vfs_write+0xcb/0x190
[<ffffffff810ba1c0>] sys_write+0x50/0x90
[<ffffffff8100c4fb>] system_call_after_swapgs+0x7b/0x80

platform dock.0: EARLY resume
platform bay.0: EARLY resume
agpgart-intel 0000:00:00.0: EARLY resume
pci 0000:00:02.0: EARLY resume
pci 0000:00:02.1: EARLY resume
e1000e 0000:00:19.0: EARLY resume
----

Zdenek


2008-06-03 11:51:31

by Jiri Kosina

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334

On Tue, 3 Jun 2008, Zdenek Kabelac wrote:

> Another backtrace from suspend code path:
> (T61, 2GB, C2D, no SD card)
> kernel from git 20080603, commit 1beee8dc8cf58e3f605bd7b34d7a39939be7d8d2
> ----
> agpgart-intel 0000:00:00.0: LATE suspend
> platform bay.0: LATE suspend
> platform dock.0: LATE suspend
> Extended CMOS year: 2000
> hwsleep-0324 [00] enter_sleep_state : Entering sleep state [S3]
> Back to C!
> BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334
> caller is do_machine_check+0xa9/0x500
> Pid: 17334, comm: pm-suspend Not tainted 2.6.26-rc4 #31
> Call Trace:
> [<ffffffff8118347c>] debug_smp_processor_id+0xcc/0xd0
> [<ffffffff810184d9>] do_machine_check+0xa9/0x500
> [<ffffffff81010e7b>] ? init_8259A+0x1b/0x120
> [<ffffffff810189d6>] mce_init+0x56/0xf0
> [<ffffffff81018a7b>] mce_resume+0xb/0x10
> [<ffffffff81204fd0>] __sysdev_resume+0x20/0x60
> [<ffffffff81205068>] sysdev_resume+0x58/0x90
> [<ffffffff8120aac9>] device_power_up+0x9/0x10
> [<ffffffff8106f4f7>] suspend_devices_and_enter+0x147/0x1a0
> [<ffffffff8106f6c6>] enter_state+0x146/0x1d0
> [<ffffffff8106f80a>] state_store+0xba/0x100
> [<ffffffff81177ae7>] kobj_attr_store+0x17/0x20
> [<ffffffff81110fea>] sysfs_write_file+0xca/0x140
> [<ffffffff810ba00b>] vfs_write+0xcb/0x190
> [<ffffffff810ba1c0>] sys_write+0x50/0x90
> [<ffffffff8100c4fb>] system_call_after_swapgs+0x7b/0x80

This looks very much like the oops you reported here:
http://lkml.org/lkml/2008/4/7/130

Is this also a virtual machine run under KVM, as it has been in the
aforementioned thread?

Thanks,

--
Jiri Kosina
SUSE Labs

2008-06-03 12:05:28

by Jiri Kosina

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334

[ re-introduced LKML to CC, and also added KVM CCs]

On Tue, 3 Jun 2008, Zdenek Kabelac wrote:

> 2008/6/3 Jiri Kosina <[email protected]>:
> > On Tue, 3 Jun 2008, Zdenek Kabelac wrote:
> >
> >> Another backtrace from suspend code path:
> >> (T61, 2GB, C2D, no SD card)
> >> kernel from git 20080603, commit 1beee8dc8cf58e3f605bd7b34d7a39939be7d8d2
> >> ----
> >> agpgart-intel 0000:00:00.0: LATE suspend
> >> platform bay.0: LATE suspend
> >> platform dock.0: LATE suspend
> >> Extended CMOS year: 2000
> >> hwsleep-0324 [00] enter_sleep_state : Entering sleep state [S3]
> >> Back to C!
> >> BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334
> >> caller is do_machine_check+0xa9/0x500
> >> Pid: 17334, comm: pm-suspend Not tainted 2.6.26-rc4 #31
> >> Call Trace:
> >> [<ffffffff8118347c>] debug_smp_processor_id+0xcc/0xd0
> >> [<ffffffff810184d9>] do_machine_check+0xa9/0x500
> >> [<ffffffff81010e7b>] ? init_8259A+0x1b/0x120
> >> [<ffffffff810189d6>] mce_init+0x56/0xf0
> >> [<ffffffff81018a7b>] mce_resume+0xb/0x10
> >> [<ffffffff81204fd0>] __sysdev_resume+0x20/0x60
> >> [<ffffffff81205068>] sysdev_resume+0x58/0x90
> >> [<ffffffff8120aac9>] device_power_up+0x9/0x10
> >> [<ffffffff8106f4f7>] suspend_devices_and_enter+0x147/0x1a0
> >> [<ffffffff8106f6c6>] enter_state+0x146/0x1d0
> >> [<ffffffff8106f80a>] state_store+0xba/0x100
> >> [<ffffffff81177ae7>] kobj_attr_store+0x17/0x20
> >> [<ffffffff81110fea>] sysfs_write_file+0xca/0x140
> >> [<ffffffff810ba00b>] vfs_write+0xcb/0x190
> >> [<ffffffff810ba1c0>] sys_write+0x50/0x90
> >> [<ffffffff8100c4fb>] system_call_after_swapgs+0x7b/0x80
> >
> > This looks very much like the oops you reported here:
> > http://lkml.org/lkml/2008/4/7/130
> >
> > Is this also a virtual machine run under KVM, as it has been in the
> > aforementioned thread?
>
>
> Ahh yes - you are right , I've completely forget about that old post -
> I've thought that my post are usually getting fixed sooner :)
> So yes - this is actually the same bug which is still not fixed within
> the latest kernel - the machine is running qemu guest (which seems to
> me now somehow also slower)

OK, so it looks like KVM could be wrongly enabling IRQs/preemption on the
resume path. The original bug-report is on
http://lkml.org/lkml/2008/4/7/130

--
Jiri Kosina
SUSE Labs

2008-06-04 09:41:47

by Avi Kivity

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334

Jiri Kosina wrote:
> [ re-introduced LKML to CC, and also added KVM CCs]
>
> On Tue, 3 Jun 2008, Zdenek Kabelac wrote:
>
>
>> 2008/6/3 Jiri Kosina <[email protected]>:
>>
>>> On Tue, 3 Jun 2008, Zdenek Kabelac wrote:
>>>
>>>
>>>> Another backtrace from suspend code path:
>>>> (T61, 2GB, C2D, no SD card)
>>>> kernel from git 20080603, commit 1beee8dc8cf58e3f605bd7b34d7a39939be7d8d2
>>>> ----
>>>> agpgart-intel 0000:00:00.0: LATE suspend
>>>> platform bay.0: LATE suspend
>>>> platform dock.0: LATE suspend
>>>> Extended CMOS year: 2000
>>>> hwsleep-0324 [00] enter_sleep_state : Entering sleep state [S3]
>>>> Back to C!
>>>> BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334
>>>> caller is do_machine_check+0xa9/0x500
>>>> Pid: 17334, comm: pm-suspend Not tainted 2.6.26-rc4 #31
>>>> Call Trace:
>>>> [<ffffffff8118347c>] debug_smp_processor_id+0xcc/0xd0
>>>> [<ffffffff810184d9>] do_machine_check+0xa9/0x500
>>>> [<ffffffff81010e7b>] ? init_8259A+0x1b/0x120
>>>> [<ffffffff810189d6>] mce_init+0x56/0xf0
>>>> [<ffffffff81018a7b>] mce_resume+0xb/0x10
>>>> [<ffffffff81204fd0>] __sysdev_resume+0x20/0x60
>>>> [<ffffffff81205068>] sysdev_resume+0x58/0x90
>>>> [<ffffffff8120aac9>] device_power_up+0x9/0x10
>>>> [<ffffffff8106f4f7>] suspend_devices_and_enter+0x147/0x1a0
>>>> [<ffffffff8106f6c6>] enter_state+0x146/0x1d0
>>>> [<ffffffff8106f80a>] state_store+0xba/0x100
>>>> [<ffffffff81177ae7>] kobj_attr_store+0x17/0x20
>>>> [<ffffffff81110fea>] sysfs_write_file+0xca/0x140
>>>> [<ffffffff810ba00b>] vfs_write+0xcb/0x190
>>>> [<ffffffff810ba1c0>] sys_write+0x50/0x90
>>>> [<ffffffff8100c4fb>] system_call_after_swapgs+0x7b/0x80
>>>>
>>> This looks very much like the oops you reported here:
>>> http://lkml.org/lkml/2008/4/7/130
>>>
>>> Is this also a virtual machine run under KVM, as it has been in the
>>> aforementioned thread?
>>>
>> Ahh yes - you are right , I've completely forget about that old post -
>> I've thought that my post are usually getting fixed sooner :)
>> So yes - this is actually the same bug which is still not fixed within
>> the latest kernel - the machine is running qemu guest (which seems to
>> me now somehow also slower)
>>
>
> OK, so it looks like KVM could be wrongly enabling IRQs/preemption on the
> resume path. The original bug-report is on
> http://lkml.org/lkml/2008/4/7/130
>
>

Wait, is this in a virtual machine, or on a host that's also running a
virtual machine (or has the kvm modules loaded)?

I looked at the kvm host resume path, and it doesn't touch interrupts.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2008-06-04 09:57:22

by Zdenek Kabelac

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334

2008/6/4 Avi Kivity <[email protected]>:
> Jiri Kosina wrote:
>>
>> [ re-introduced LKML to CC, and also added KVM CCs]
>>
>> On Tue, 3 Jun 2008, Zdenek Kabelac wrote:
>>
>>
>>>
>>> 2008/6/3 Jiri Kosina <[email protected]>:
>>>
>>>>
>>>> On Tue, 3 Jun 2008, Zdenek Kabelac wrote:
>>>>
>>>>
>>>>>
>>>>> Another backtrace from suspend code path:
>>>>> (T61, 2GB, C2D, no SD card)
>>>>> kernel from git 20080603, commit
>>>>> 1beee8dc8cf58e3f605bd7b34d7a39939be7d8d2
>>>>> ----
>>>>> agpgart-intel 0000:00:00.0: LATE suspend
>>>>> platform bay.0: LATE suspend
>>>>> platform dock.0: LATE suspend
>>>>> Extended CMOS year: 2000
>>>>> hwsleep-0324 [00] enter_sleep_state : Entering sleep state [S3]
>>>>> Back to C!
>>>>> BUG: using smp_processor_id() in preemptible [00000000] code:
>>>>> pm-suspend/17334
>>>>> caller is do_machine_check+0xa9/0x500
>>>>> Pid: 17334, comm: pm-suspend Not tainted 2.6.26-rc4 #31
>>>>> Call Trace:
>>>>> [<ffffffff8118347c>] debug_smp_processor_id+0xcc/0xd0
>>>>> [<ffffffff810184d9>] do_machine_check+0xa9/0x500
>>>>> [<ffffffff81010e7b>] ? init_8259A+0x1b/0x120
>>>>> [<ffffffff810189d6>] mce_init+0x56/0xf0
>>>>> [<ffffffff81018a7b>] mce_resume+0xb/0x10
>>>>> [<ffffffff81204fd0>] __sysdev_resume+0x20/0x60
>>>>> [<ffffffff81205068>] sysdev_resume+0x58/0x90
>>>>> [<ffffffff8120aac9>] device_power_up+0x9/0x10
>>>>> [<ffffffff8106f4f7>] suspend_devices_and_enter+0x147/0x1a0
>>>>> [<ffffffff8106f6c6>] enter_state+0x146/0x1d0
>>>>> [<ffffffff8106f80a>] state_store+0xba/0x100
>>>>> [<ffffffff81177ae7>] kobj_attr_store+0x17/0x20
>>>>> [<ffffffff81110fea>] sysfs_write_file+0xca/0x140
>>>>> [<ffffffff810ba00b>] vfs_write+0xcb/0x190
>>>>> [<ffffffff810ba1c0>] sys_write+0x50/0x90
>>>>> [<ffffffff8100c4fb>] system_call_after_swapgs+0x7b/0x80
>>>>>
>>>>
>>>> This looks very much like the oops you reported here:
>>>> http://lkml.org/lkml/2008/4/7/130
>>>>
>>>> Is this also a virtual machine run under KVM, as it has been in the
>>>> aforementioned thread?
>>>>
>>>
>>> Ahh yes - you are right , I've completely forget about that old post -
>>> I've thought that my post are usually getting fixed sooner :)
>>> So yes - this is actually the same bug which is still not fixed within
>>> the latest kernel - the machine is running qemu guest (which seems to
>>> me now somehow also slower)
>>>
>>
>> OK, so it looks like KVM could be wrongly enabling IRQs/preemption on the
>> resume path. The original bug-report is on http://lkml.org/lkml/2008/4/7/130
>>
>>
>
> Wait, is this in a virtual machine, or on a host that's also running a
> virtual machine (or has the kvm modules loaded)?
>
> I looked at the kvm host resume path, and it doesn't touch interrupts.

Oops is from my real-hardware T61 (host) running the kvm-qemu (guest)
and was noticed after suspend.
But everything seemed to work just fine - host&guest continued to
operate normally after resume.

Zdenek

2008-06-04 10:36:32

by Avi Kivity

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334

Zdenek Kabelac wrote:
> Oops is from my real-hardware T61 (host) running the kvm-qemu (guest)
> and was noticed after suspend.
> But everything seemed to work just fine - host&guest continued to
> operate normally after resume.
>

Can you reproduce this, with and without the kvm modules loaded?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2008-06-04 10:54:52

by Thomas Gleixner

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334

On Tue, 3 Jun 2008, Jiri Kosina wrote:
> > Ahh yes - you are right , I've completely forget about that old post -
> > I've thought that my post are usually getting fixed sooner :)
> > So yes - this is actually the same bug which is still not fixed within
> > the latest kernel - the machine is running qemu guest (which seems to
> > me now somehow also slower)
>
> OK, so it looks like KVM could be wrongly enabling IRQs/preemption on the
> resume path. The original bug-report is on
> http://lkml.org/lkml/2008/4/7/130

sysdev_resume() is supposed to run with interrupts disabled, at least
it was that way when the timekeeping_resume code was written.

Thanks,
tglx

2008-06-04 11:13:23

by Jiri Kosina

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334

On Wed, 4 Jun 2008, Thomas Gleixner wrote:

> > OK, so it looks like KVM could be wrongly enabling IRQs/preemption on
> > the resume path. The original bug-report is on
> > http://lkml.org/lkml/2008/4/7/130
> sysdev_resume() is supposed to run with interrupts disabled, at least
> it was that way when the timekeeping_resume code was written.

True. However in Zdenek's case, interrupts/preemption gets enabled
somewhere during the resume, which correctly triggers the warning.

Thanks,

--
Jiri Kosina
SUSE Labs

2008-06-04 11:16:47

by Avi Kivity

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334

Jiri Kosina wrote:
> On Wed, 4 Jun 2008, Thomas Gleixner wrote:
>
>
>>> OK, so it looks like KVM could be wrongly enabling IRQs/preemption on
>>> the resume path. The original bug-report is on
>>> http://lkml.org/lkml/2008/4/7/130
>>>
>> sysdev_resume() is supposed to run with interrupts disabled, at least
>> it was that way when the timekeeping_resume code was written.
>>
>
> True. However in Zdenek's case, interrupts/preemption gets enabled
> somewhere during the resume, which correctly triggers the warning.
>
>

We might add a check to the generic resume controller, to check after
each ->resume() method and warn in case of interrupts enabled. That
will pinpoint the offending driver immediately.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.