LinuxLists.cc - oprofile + hibernation = badness

2008-08-18 20:32:48

Subject: oprofile + hibernation = badness

Hi,

I'm probably crazy to do it, but...

$ opcontrol --start
$ echo disk > /sys/power/state

...leads to lots of badness/strangeness. First, lots of APIC errors:

APIC error on CPU0: 40(40)
[repeat 8 times]
APIC error on CPU1: 40(40)
APIC error on CPU1: 40(40)
APIC error on CPU0: 40(40)

These keep on coming all through the suspend/shutdown sequence, also
intermixing with other messages. I'm guessing oprofile is trying to
NMI CPUs that have been brought down?

Now I get some ACPI Exceptions, but I think that these are unrelated
to starting oprofile, because I have seen them on regular shutdowns as
well:

PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.00 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
PM: Shrinking memory... done (0 pages freed)
PM: Freed 0 kbytes in 0.44 seconds (0.00 MB/s)
Suspending console(s) (use no_console_suspend to debug)
ACPI Exception (exoparg2-0444): AE_AML_PACKAGE_LIMIT, Index
(000000007) is beyond end of object [20080609]
ACPI Error (psparse-0530): Method parse/execution failed
[\_SB_.PCI0.IDE0.GTM_] (Node f783bfc0), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0530): Method parse/execution failed
[\_SB_.PCI0.IDE0.CHN0._GTM] (Node f783bb40), AE_AML_PACKAGE_LIMIT
ACPI handle has no context!
serial 00:0d: disabled
serial 00:06: disabled
ehci_hcd 0000:00:1d.7: PCI INT A disabled
uhci_hcd 0000:00:1d.3: PCI INT D disabled
uhci_hcd 0000:00:1d.2: PCI INT C disabled
uhci_hcd 0000:00:1d.1: PCI INT B disabled
uhci_hcd 0000:00:1d.0: PCI INT A disabled
ACPI: Preparing to enter system sleep state S4

After that, I see the message "WQ on CPU0, prefer CPU1" a few times.
This must also be bad.

Now some warnings:

------------[ cut here ]------------
WARNING: at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/kernel/smp.c:328 s
mp_call_function_mask+0x194/0x1a0()
Pid: 3711, comm: bash Not tainted 2.6.27-rc3-00415-g122c9e0 #10
[<c013584f>] warn_on_slowpath+0x4f/0x80
[<c0585ff8>] ? log_sample+0xa8/0xc0
[<c03e4e8d>] ? device_power_down+0x5d/0x110
[<c0586140>] ? oprofile_add_ext_sample+0x40/0xc0
[<c05861f4>] ? oprofile_add_sample+0x34/0x50
[<c0588f5c>] ? p4_check_ctr+0x13c/0x160
[<c0160c54>] smp_call_function_mask+0x194/0x1a0
[<c0587940>] ? nmi_cpu_stop+0x0/0x30
[<c014e04c>] ? __atomic_notifier_call_chain+0x3c/0x50
[<c014e07a>] ? atomic_notifier_call_chain+0x1a/0x20
[<c014e0ad>] ? notify_die+0x2d/0x30
[<c0587940>] ? nmi_cpu_stop+0x0/0x30
[<c0160c90>] smp_call_function+0x30/0x60
[<c0587940>] ? nmi_cpu_stop+0x0/0x30
[<c013a16c>] on_each_cpu+0x2c/0x80
[<c05878d4>] nmi_stop+0x14/0x20
[<c05878f5>] nmi_suspend+0x15/0x20
[<c03dff63>] sysdev_suspend+0xa3/0x1f0
[<c03e4f0c>] device_power_down+0xdc/0x110
[<c016355b>] hibernation_snapshot+0x15b/0x220
[<c013644b>] ? printk+0x1b/0x20
[<c0162742>] ? freeze_processes+0x52/0xa0
[<c0163700>] hibernate+0xe0/0x180
[<c01621b0>] ? state_store+0x0/0xd0
[<c016226f>] state_store+0xbf/0xd0
[<c01621b0>] ? state_store+0x0/0xd0
[<c03618d4>] kobj_attr_store+0x24/0x30
[<c01e66c2>] sysfs_write_file+0xa2/0x100
[<c01a85b6>] vfs_write+0x96/0x130
[<c01e6620>] ? sysfs_write_file+0x0/0x100
[<c01a8afd>] sys_write+0x3d/0x70
[<c01040db>] sysenter_do_call+0x12/0x3f
[<c0670000>] ? _read_unlock_irqrestore+0x60/0x70
=======================
---[ end trace df36f449a70f1c8e ]---
PM: Creating hibernation image:
PM: Need to copy 54048 pages
PM: Hibernation image created (54048 pages copied)
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Intel machine check architecture supported.
BUG: using smp_processor_id() in preemptible [00000000] code: bash/3711
caller is intel_p4_mcheck_init+0xb9/0x2b0
Pid: 3711, comm: bash Tainted: G W 2.6.27-rc3-00415-g122c9e0 #10
[<c037664e>] debug_smp_processor_id+0xce/0xd0
[<c010e919>] intel_p4_mcheck_init+0xb9/0x2b0
[<c010dcd9>] mcheck_init+0x59/0x80
[<c0589636>] restore_processor_state+0x1f6/0x230
[<c01635d1>] hibernation_snapshot+0x1d1/0x220
[<c013644b>] ? printk+0x1b/0x20
[<c0162742>] ? freeze_processes+0x52/0xa0
[<c0163700>] hibernate+0xe0/0x180
[<c01621b0>] ? state_store+0x0/0xd0
[<c016226f>] state_store+0xbf/0xd0
[<c01621b0>] ? state_store+0x0/0xd0
[<c03618d4>] kobj_attr_store+0x24/0x30
[<c01e66c2>] sysfs_write_file+0xa2/0x100
[<c01a85b6>] vfs_write+0x96/0x130
[<c01e6620>] ? sysfs_write_file+0x0/0x100
[<c01a8afd>] sys_write+0x3d/0x70
[<c01040db>] sysenter_do_call+0x12/0x3f
[<c0670000>] ? _read_unlock_irqrestore+0x60/0x70
=======================
Intel machine check reporting enabled on CPU#0.
BUG: using smp_processor_id() in preemptible [00000000] code: bash/3711
caller is intel_p4_mcheck_init+0x100/0x2b0
Pid: 3711, comm: bash Tainted: G W 2.6.27-rc3-00415-g122c9e0 #10
[<c037664e>] debug_smp_processor_id+0xce/0xd0
[<c010e960>] intel_p4_mcheck_init+0x100/0x2b0
[<c010dcd9>] mcheck_init+0x59/0x80
[<c0589636>] restore_processor_state+0x1f6/0x230
[<c01635d1>] hibernation_snapshot+0x1d1/0x220
[<c013644b>] ? printk+0x1b/0x20
[<c0162742>] ? freeze_processes+0x52/0xa0
[<c0163700>] hibernate+0xe0/0x180
[<c01621b0>] ? state_store+0x0/0xd0
[<c016226f>] state_store+0xbf/0xd0
[<c01621b0>] ? state_store+0x0/0xd0
[<c03618d4>] kobj_attr_store+0x24/0x30
[<c01e66c2>] sysfs_write_file+0xa2/0x100
[<c01a85b6>] vfs_write+0x96/0x130
[<c01e6620>] ? sysfs_write_file+0x0/0x100
[<c01a8afd>] sys_write+0x3d/0x70
[<c01040db>] sysenter_do_call+0x12/0x3f
[<c0670000>] ? _read_unlock_irqrestore+0x60/0x70
=======================
CPU0: Intel P4/Xeon Extended MCE MSRs (24) available
BUG: using smp_processor_id() in preemptible [00000000] code: bash/3711
caller is intel_p4_mcheck_init+0x119/0x2b0
Pid: 3711, comm: bash Tainted: G W 2.6.27-rc3-00415-g122c9e0 #10
[<c037664e>] debug_smp_processor_id+0xce/0xd0
[<c010e979>] intel_p4_mcheck_init+0x119/0x2b0
[<c010dcd9>] mcheck_init+0x59/0x80
[<c0589636>] restore_processor_state+0x1f6/0x230
[<c01635d1>] hibernation_snapshot+0x1d1/0x220
[<c013644b>] ? printk+0x1b/0x20
[<c0162742>] ? freeze_processes+0x52/0xa0
[<c0163700>] hibernate+0xe0/0x180
[<c01621b0>] ? state_store+0x0/0xd0
[<c016226f>] state_store+0xbf/0xd0
[<c01621b0>] ? state_store+0x0/0xd0
[<c03618d4>] kobj_attr_store+0x24/0x30
[<c01e66c2>] sysfs_write_file+0xa2/0x100
[<c01a85b6>] vfs_write+0x96/0x130
[<c01e6620>] ? sysfs_write_file+0x0/0x100
[<c01a8afd>] sys_write+0x3d/0x70
[<c01040db>] sysenter_do_call+0x12/0x3f
[<c0670000>] ? _read_unlock_irqrestore+0x60/0x70
=======================
CPU0: Thermal monitoring enabled
Force enabled HPET at resume
BUG: using smp_processor_id() in preemptible [00000000] code: bash/3711
caller is retrigger_next_event+0x15/0xb0
Pid: 3711, comm: bash Tainted: G W 2.6.27-rc3-00415-g122c9e0 #10
[<c037664e>] debug_smp_processor_id+0xce/0xd0
[<c014c6c5>] retrigger_next_event+0x15/0xb0
[<c066d3bf>] ? preempt_schedule+0x3f/0x50
[<c066fe1c>] ? _spin_unlock+0x3c/0x50
[<c014c76a>] hres_timers_resume+0xa/0x10
[<c01509bc>] timekeeping_resume+0x12c/0x180
[<c03df894>] __sysdev_resume+0x14/0x50
[<c03dfdae>] sysdev_resume+0x3e/0x80
[<c03e4f4b>] device_power_up+0xb/0x20
[<c01635b0>] hibernation_snapshot+0x1b0/0x220
[<c013644b>] ? printk+0x1b/0x20
[<c0162742>] ? freeze_processes+0x52/0xa0
[<c0163700>] hibernate+0xe0/0x180
[<c01621b0>] ? state_store+0x0/0xd0
[<c016226f>] state_store+0xbf/0xd0
[<c01621b0>] ? state_store+0x0/0xd0
[<c03618d4>] kobj_attr_store+0x24/0x30
[<c01e66c2>] sysfs_write_file+0xa2/0x100
[<c01a85b6>] vfs_write+0x96/0x130
[<c01e6620>] ? sysfs_write_file+0x0/0x100
[<c01a8afd>] sys_write+0x3d/0x70
[<c01040db>] sysenter_do_call+0x12/0x3f
[<c0670000>] ? _read_unlock_irqrestore+0x60/0x70
=======================
Enabling non-boot CPUs ...

...and then, while writing the image, it looks something like this:

APIC error on CPU1: 40(40)
^H^H^H^H 3%<7>APIC error on CPU1: 40(40)
WQ on CPU0, prefer CPU1
APIC error on CPU1: 40(40)
[8 times]
APIC error on CPU1: 40(40)
WQ on CPU0, prefer CPU1
...

The machine comes back with this spamming the logs (rate of about 2 KB/sec):

WQ on CPU0, prefer CPU1
APIC error on CPU1: 40(40)
[9 times, repeat]

I can still log in with ssh and reboot the machine, which finally
gives this warning:

BUG: sleeping function called from invalid context at kernel/sched.c:4625
in_atomic():1, irqs_disabled():0
INFO: lockdep is turned off.
Pid: 4027, comm: oprofiled Tainted: G W 2.6.27-rc3-00415-g122c9e0 #10
[<c012c98e>] __might_sleep+0xee/0x140
[<c066c8d8>] wait_for_common+0x28/0x130
[<c01595bb>] ? trace_hardirqs_on+0xb/0x10
[<c066ff13>] ? _spin_unlock_irqrestore+0x43/0x70
[<c066ca72>] wait_for_completion+0x12/0x20
[<c014715d>] synchronize_rcu+0x2d/0x40
[<c01471f0>] ? wakeme_after_rcu+0x0/0x10
[<c01595bb>] ? trace_hardirqs_on+0xb/0x10
[<c014e156>] atomic_notifier_chain_unregister+0x36/0x50
[<c014e17f>] unregister_die_notifier+0xf/0x20
[<c0587b84>] nmi_shutdown+0x44/0x80
[<c0585d4a>] oprofile_shutdown+0x2a/0x60
[<c0586b50>] event_buffer_release+0x10/0x40
[<c01a8f26>] __fput+0xb6/0x180
[<c01a92a9>] fput+0x19/0x20
[<c01a63c7>] filp_close+0x47/0x70
[<c013741b>] put_files_struct+0x9b/0xb0
[<c0137472>] exit_files+0x42/0x60
[<c01386a2>] do_exit+0x192/0x880
[<c014d3b6>] ? up_read+0x16/0x30
[<c0120493>] ? do_page_fault+0x2e3/0x700
[<c01a7cc0>] ? do_sync_write+0x0/0x110
[<c0138dc1>] do_group_exit+0x31/0x90
[<c0138e2f>] sys_exit_group+0xf/0x20
[<c01040db>] sysenter_do_call+0x12/0x3f
=======================
Sending all processes the KILL signal...

The last time I did this, the machine also came up with a corrupt ext3
root filesystem. But it didn't happen this time, so I guess it could
also be unrelated.

What would be the proper way to fix all this? Should oprofile be able
to profile the suspend/resume code as well? Or should "echo disk" give
-EBUSY if oprofile is running? Or
do we simply insert a huge kludge in the form of mutual exclusion
between oprofile and suspend in the Kconfig?

Thanks,

Vegard

PS: This is not a regression. It's probably always been like that.

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-08-18 20:48:30

by Rafael J. Wysocki

[permalink] [raw]

Subject: Re: oprofile + hibernation = badness

On Monday, 18 of August 2008, Vegard Nossum wrote:
> Hi,
>
> I'm probably crazy to do it, but...
>
> $ opcontrol --start
> $ echo disk > /sys/power/state
>
> ...leads to lots of badness/strangeness. First, lots of APIC errors:
>
> APIC error on CPU0: 40(40)
> [repeat 8 times]
> APIC error on CPU1: 40(40)
> APIC error on CPU1: 40(40)
> APIC error on CPU0: 40(40)
>
> These keep on coming all through the suspend/shutdown sequence, also
> intermixing with other messages. I'm guessing oprofile is trying to
> NMI CPUs that have been brought down?

They CPUs are taken down after suspending devices. If you boot the
kernel with 'no_console_suspend' in the command line, there will be more
debug information to have a look at.

Also, you can do

# echo core > /sys/power/pm_test

before the '$ echo disk > /sys/power/state' (that only tests the suspend
sequence without actually hibernating) and see what symptoms will be
reproduced.

> Now I get some ACPI Exceptions, but I think that these are unrelated
> to starting oprofile, because I have seen them on regular shutdowns as
> well:
>
> PM: Syncing filesystems ... done.
> Freezing user space processes ... (elapsed 0.00 seconds) done.
> Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
> PM: Shrinking memory... done (0 pages freed)
> PM: Freed 0 kbytes in 0.44 seconds (0.00 MB/s)
> Suspending console(s) (use no_console_suspend to debug)
> ACPI Exception (exoparg2-0444): AE_AML_PACKAGE_LIMIT, Index
> (000000007) is beyond end of object [20080609]
> ACPI Error (psparse-0530): Method parse/execution failed
> [\_SB_.PCI0.IDE0.GTM_] (Node f783bfc0), AE_AML_PACKAGE_LIMIT
> ACPI Error (psparse-0530): Method parse/execution failed
> [\_SB_.PCI0.IDE0.CHN0._GTM] (Node f783bb40), AE_AML_PACKAGE_LIMIT
> ACPI handle has no context!
> serial 00:0d: disabled
> serial 00:06: disabled
> ehci_hcd 0000:00:1d.7: PCI INT A disabled
> uhci_hcd 0000:00:1d.3: PCI INT D disabled
> uhci_hcd 0000:00:1d.2: PCI INT C disabled
> uhci_hcd 0000:00:1d.1: PCI INT B disabled
> uhci_hcd 0000:00:1d.0: PCI INT A disabled
> ACPI: Preparing to enter system sleep state S4
>
> After that, I see the message "WQ on CPU0, prefer CPU1" a few times.
> This must also be bad.
>
> Now some warnings:
>
> ------------[ cut here ]------------
> WARNING: at /uio/arkimedes/s29/vegardno/git-working/linux-2.6/kernel/smp.c:328 s
> mp_call_function_mask+0x194/0x1a0()
> Pid: 3711, comm: bash Not tainted 2.6.27-rc3-00415-g122c9e0 #10
> [<c013584f>] warn_on_slowpath+0x4f/0x80
> [<c0585ff8>] ? log_sample+0xa8/0xc0
> [<c03e4e8d>] ? device_power_down+0x5d/0x110
> [<c0586140>] ? oprofile_add_ext_sample+0x40/0xc0
> [<c05861f4>] ? oprofile_add_sample+0x34/0x50
> [<c0588f5c>] ? p4_check_ctr+0x13c/0x160
> [<c0160c54>] smp_call_function_mask+0x194/0x1a0
> [<c0587940>] ? nmi_cpu_stop+0x0/0x30
> [<c014e04c>] ? __atomic_notifier_call_chain+0x3c/0x50
> [<c014e07a>] ? atomic_notifier_call_chain+0x1a/0x20
> [<c014e0ad>] ? notify_die+0x2d/0x30
> [<c0587940>] ? nmi_cpu_stop+0x0/0x30
> [<c0160c90>] smp_call_function+0x30/0x60
> [<c0587940>] ? nmi_cpu_stop+0x0/0x30
> [<c013a16c>] on_each_cpu+0x2c/0x80
> [<c05878d4>] nmi_stop+0x14/0x20
> [<c05878f5>] nmi_suspend+0x15/0x20
> [<c03dff63>] sysdev_suspend+0xa3/0x1f0
> [<c03e4f0c>] device_power_down+0xdc/0x110
> [<c016355b>] hibernation_snapshot+0x15b/0x220
> [<c013644b>] ? printk+0x1b/0x20
> [<c0162742>] ? freeze_processes+0x52/0xa0
> [<c0163700>] hibernate+0xe0/0x180
> [<c01621b0>] ? state_store+0x0/0xd0
> [<c016226f>] state_store+0xbf/0xd0
> [<c01621b0>] ? state_store+0x0/0xd0
> [<c03618d4>] kobj_attr_store+0x24/0x30
> [<c01e66c2>] sysfs_write_file+0xa2/0x100
> [<c01a85b6>] vfs_write+0x96/0x130
> [<c01e6620>] ? sysfs_write_file+0x0/0x100
> [<c01a8afd>] sys_write+0x3d/0x70
> [<c01040db>] sysenter_do_call+0x12/0x3f
> [<c0670000>] ? _read_unlock_irqrestore+0x60/0x70
> =======================
> ---[ end trace df36f449a70f1c8e ]---
> PM: Creating hibernation image:
> PM: Need to copy 54048 pages
> PM: Hibernation image created (54048 pages copied)
> x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
> Intel machine check architecture supported.
> BUG: using smp_processor_id() in preemptible [00000000] code: bash/3711
> caller is intel_p4_mcheck_init+0xb9/0x2b0
> Pid: 3711, comm: bash Tainted: G W 2.6.27-rc3-00415-g122c9e0 #10
> [<c037664e>] debug_smp_processor_id+0xce/0xd0
> [<c010e919>] intel_p4_mcheck_init+0xb9/0x2b0
> [<c010dcd9>] mcheck_init+0x59/0x80
> [<c0589636>] restore_processor_state+0x1f6/0x230
> [<c01635d1>] hibernation_snapshot+0x1d1/0x220
> [<c013644b>] ? printk+0x1b/0x20
> [<c0162742>] ? freeze_processes+0x52/0xa0
> [<c0163700>] hibernate+0xe0/0x180
> [<c01621b0>] ? state_store+0x0/0xd0
> [<c016226f>] state_store+0xbf/0xd0
> [<c01621b0>] ? state_store+0x0/0xd0
> [<c03618d4>] kobj_attr_store+0x24/0x30
> [<c01e66c2>] sysfs_write_file+0xa2/0x100
> [<c01a85b6>] vfs_write+0x96/0x130
> [<c01e6620>] ? sysfs_write_file+0x0/0x100
> [<c01a8afd>] sys_write+0x3d/0x70
> [<c01040db>] sysenter_do_call+0x12/0x3f
> [<c0670000>] ? _read_unlock_irqrestore+0x60/0x70
> =======================
> Intel machine check reporting enabled on CPU#0.
> BUG: using smp_processor_id() in preemptible [00000000] code: bash/3711
> caller is intel_p4_mcheck_init+0x100/0x2b0
> Pid: 3711, comm: bash Tainted: G W 2.6.27-rc3-00415-g122c9e0 #10
> [<c037664e>] debug_smp_processor_id+0xce/0xd0
> [<c010e960>] intel_p4_mcheck_init+0x100/0x2b0
> [<c010dcd9>] mcheck_init+0x59/0x80
> [<c0589636>] restore_processor_state+0x1f6/0x230
> [<c01635d1>] hibernation_snapshot+0x1d1/0x220
> [<c013644b>] ? printk+0x1b/0x20
> [<c0162742>] ? freeze_processes+0x52/0xa0
> [<c0163700>] hibernate+0xe0/0x180
> [<c01621b0>] ? state_store+0x0/0xd0
> [<c016226f>] state_store+0xbf/0xd0
> [<c01621b0>] ? state_store+0x0/0xd0
> [<c03618d4>] kobj_attr_store+0x24/0x30
> [<c01e66c2>] sysfs_write_file+0xa2/0x100
> [<c01a85b6>] vfs_write+0x96/0x130
> [<c01e6620>] ? sysfs_write_file+0x0/0x100
> [<c01a8afd>] sys_write+0x3d/0x70
> [<c01040db>] sysenter_do_call+0x12/0x3f
> [<c0670000>] ? _read_unlock_irqrestore+0x60/0x70
> =======================
> CPU0: Intel P4/Xeon Extended MCE MSRs (24) available
> BUG: using smp_processor_id() in preemptible [00000000] code: bash/3711
> caller is intel_p4_mcheck_init+0x119/0x2b0
> Pid: 3711, comm: bash Tainted: G W 2.6.27-rc3-00415-g122c9e0 #10
> [<c037664e>] debug_smp_processor_id+0xce/0xd0
> [<c010e979>] intel_p4_mcheck_init+0x119/0x2b0
> [<c010dcd9>] mcheck_init+0x59/0x80
> [<c0589636>] restore_processor_state+0x1f6/0x230
> [<c01635d1>] hibernation_snapshot+0x1d1/0x220
> [<c013644b>] ? printk+0x1b/0x20
> [<c0162742>] ? freeze_processes+0x52/0xa0
> [<c0163700>] hibernate+0xe0/0x180
> [<c01621b0>] ? state_store+0x0/0xd0
> [<c016226f>] state_store+0xbf/0xd0
> [<c01621b0>] ? state_store+0x0/0xd0
> [<c03618d4>] kobj_attr_store+0x24/0x30
> [<c01e66c2>] sysfs_write_file+0xa2/0x100
> [<c01a85b6>] vfs_write+0x96/0x130
> [<c01e6620>] ? sysfs_write_file+0x0/0x100
> [<c01a8afd>] sys_write+0x3d/0x70
> [<c01040db>] sysenter_do_call+0x12/0x3f
> [<c0670000>] ? _read_unlock_irqrestore+0x60/0x70
> =======================
> CPU0: Thermal monitoring enabled
> Force enabled HPET at resume
> BUG: using smp_processor_id() in preemptible [00000000] code: bash/3711
> caller is retrigger_next_event+0x15/0xb0
> Pid: 3711, comm: bash Tainted: G W 2.6.27-rc3-00415-g122c9e0 #10
> [<c037664e>] debug_smp_processor_id+0xce/0xd0
> [<c014c6c5>] retrigger_next_event+0x15/0xb0
> [<c066d3bf>] ? preempt_schedule+0x3f/0x50
> [<c066fe1c>] ? _spin_unlock+0x3c/0x50
> [<c014c76a>] hres_timers_resume+0xa/0x10
> [<c01509bc>] timekeeping_resume+0x12c/0x180
> [<c03df894>] __sysdev_resume+0x14/0x50
> [<c03dfdae>] sysdev_resume+0x3e/0x80
> [<c03e4f4b>] device_power_up+0xb/0x20
> [<c01635b0>] hibernation_snapshot+0x1b0/0x220
> [<c013644b>] ? printk+0x1b/0x20
> [<c0162742>] ? freeze_processes+0x52/0xa0
> [<c0163700>] hibernate+0xe0/0x180
> [<c01621b0>] ? state_store+0x0/0xd0
> [<c016226f>] state_store+0xbf/0xd0
> [<c01621b0>] ? state_store+0x0/0xd0
> [<c03618d4>] kobj_attr_store+0x24/0x30
> [<c01e66c2>] sysfs_write_file+0xa2/0x100
> [<c01a85b6>] vfs_write+0x96/0x130
> [<c01e6620>] ? sysfs_write_file+0x0/0x100
> [<c01a8afd>] sys_write+0x3d/0x70
> [<c01040db>] sysenter_do_call+0x12/0x3f
> [<c0670000>] ? _read_unlock_irqrestore+0x60/0x70
> =======================
> Enabling non-boot CPUs ...
>
> ...and then, while writing the image, it looks something like this:
>
> APIC error on CPU1: 40(40)
> ^H^H^H^H 3%<7>APIC error on CPU1: 40(40)
> WQ on CPU0, prefer CPU1
> APIC error on CPU1: 40(40)
> [8 times]
> APIC error on CPU1: 40(40)
> WQ on CPU0, prefer CPU1
> ...
>
> The machine comes back with this spamming the logs (rate of about 2 KB/sec):
>
> WQ on CPU0, prefer CPU1
> APIC error on CPU1: 40(40)
> [9 times, repeat]
>
> I can still log in with ssh and reboot the machine, which finally
> gives this warning:
>
> BUG: sleeping function called from invalid context at kernel/sched.c:4625
> in_atomic():1, irqs_disabled():0
> INFO: lockdep is turned off.
> Pid: 4027, comm: oprofiled Tainted: G W 2.6.27-rc3-00415-g122c9e0 #10
> [<c012c98e>] __might_sleep+0xee/0x140
> [<c066c8d8>] wait_for_common+0x28/0x130
> [<c01595bb>] ? trace_hardirqs_on+0xb/0x10
> [<c066ff13>] ? _spin_unlock_irqrestore+0x43/0x70
> [<c066ca72>] wait_for_completion+0x12/0x20
> [<c014715d>] synchronize_rcu+0x2d/0x40
> [<c01471f0>] ? wakeme_after_rcu+0x0/0x10
> [<c01595bb>] ? trace_hardirqs_on+0xb/0x10
> [<c014e156>] atomic_notifier_chain_unregister+0x36/0x50
> [<c014e17f>] unregister_die_notifier+0xf/0x20
> [<c0587b84>] nmi_shutdown+0x44/0x80
> [<c0585d4a>] oprofile_shutdown+0x2a/0x60
> [<c0586b50>] event_buffer_release+0x10/0x40
> [<c01a8f26>] __fput+0xb6/0x180
> [<c01a92a9>] fput+0x19/0x20
> [<c01a63c7>] filp_close+0x47/0x70
> [<c013741b>] put_files_struct+0x9b/0xb0
> [<c0137472>] exit_files+0x42/0x60
> [<c01386a2>] do_exit+0x192/0x880
> [<c014d3b6>] ? up_read+0x16/0x30
> [<c0120493>] ? do_page_fault+0x2e3/0x700
> [<c01a7cc0>] ? do_sync_write+0x0/0x110
> [<c0138dc1>] do_group_exit+0x31/0x90
> [<c0138e2f>] sys_exit_group+0xf/0x20
> [<c01040db>] sysenter_do_call+0x12/0x3f
> =======================
> Sending all processes the KILL signal...
>
> The last time I did this, the machine also came up with a corrupt ext3
> root filesystem. But it didn't happen this time, so I guess it could
> also be unrelated.
>
> What would be the proper way to fix all this? Should oprofile be able
> to profile the suspend/resume code as well? Or should "echo disk" give
> -EBUSY if oprofile is running? Or
> do we simply insert a huge kludge in the form of mutual exclusion
> between oprofile and suspend in the Kconfig?

That last thing is probably the safest to do at the moment.

Apparently nmi_suspend() conflicts with oprofile somehow. Also, the offlining
of non-boot CPUs may confuse it. It would be helpful to check if the CPU
hotplug works with oprofile.

Thanks,
Rafael

2008-08-18 21:08:56

by Vegard Nossum

[permalink] [raw]

Subject: Re: oprofile + hibernation = badness

On Mon, Aug 18, 2008 at 10:51 PM, Rafael J. Wysocki <[email protected]> wrote:
> Apparently nmi_suspend() conflicts with oprofile somehow. Also, the offlining
> of non-boot CPUs may confuse it. It would be helpful to check if the CPU
> hotplug works with oprofile.

That is a good suggestion :-)

Here is offlining:

CPU 1 is now offline
lockdep: fixing up alternatives.
SMP alternatives: switching to UP code
CPU0 attaching NULL sched-domain.
WQ on CPU0, prefer CPU1
CPU1 attaching NULL sched-domain.
CPU0 attaching sched-domain:
domain 0: span 0 level CPU
groups: 0
WQ on CPU0, prefer CPU1
WQ on CPU0, prefer CPU1
WQ on CPU0, prefer CPU1
[repeat last message indefinitely]

Here is onlining:

Booting processor 1/1 ip 6000
Initializing CPU#1
WQ on CPU0, prefer CPU1
WQ on CPU0, prefer CPU1
Calibrating delay using timer specific routine.. 5986.15 BogoMIPS (lpj=29930790)
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 2048K
CPU: Physical Processor ID: 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (24) available
CPU1: Thermal monitoring enabled
x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
CPU1: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 05
checking TSC synchronization [CPU#0 -> CPU#1]:
Measured 120 cycles TSC warp between CPUs, turning off TSC clock.
Marking TSC unstable due to check_tsc_sync_source failed
APIC error on CPU1: 00(40)
Clockevents: could not switch to one-shot mode:<7>APIC error on CPU1: 40(40)
lapic is not functional.
Could not switch to high resolution mode on CPU 0
Clockevents: could not switch to one-shot mode: lapic is not functional.
Could not switch to high resolution mode on CPU 1
APIC error on CPU1: 40(40)
[sched domains messages
WQ on CPU0, prefer CPU1
APIC error on CPU1: 40(40)
[repeat last message 9 times]

Then follows this pattern indefinitely:

WQ on CPU0, prefer CPU1
APIC error on CPU1: 40(40)
[repeat last message 9 times]

That's basically the same thing as I saw with suspend. So it can be
reproduced easily with CPU hotplug.

Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-08-18 21:12:33

by Rafael J. Wysocki

[permalink] [raw]

Subject: Re: oprofile + hibernation = badness

On Monday, 18 of August 2008, Vegard Nossum wrote:
> On Mon, Aug 18, 2008 at 10:51 PM, Rafael J. Wysocki <[email protected]> wrote:
> > Apparently nmi_suspend() conflicts with oprofile somehow. Also, the offlining
> > of non-boot CPUs may confuse it. It would be helpful to check if the CPU
> > hotplug works with oprofile.
>
> That is a good suggestion :-)
>
> Here is offlining:
>
> CPU 1 is now offline
> lockdep: fixing up alternatives.
> SMP alternatives: switching to UP code
> CPU0 attaching NULL sched-domain.
> WQ on CPU0, prefer CPU1
> CPU1 attaching NULL sched-domain.
> CPU0 attaching sched-domain:
> domain 0: span 0 level CPU
> groups: 0
> WQ on CPU0, prefer CPU1
> WQ on CPU0, prefer CPU1
> WQ on CPU0, prefer CPU1
> [repeat last message indefinitely]
>
> Here is onlining:
>
> Booting processor 1/1 ip 6000
> Initializing CPU#1
> WQ on CPU0, prefer CPU1
> WQ on CPU0, prefer CPU1
> Calibrating delay using timer specific routine.. 5986.15 BogoMIPS (lpj=29930790)
> CPU: Trace cache: 12K uops, L1 D cache: 16K
> CPU: L2 cache: 2048K
> CPU: Physical Processor ID: 0
> Intel machine check architecture supported.
> Intel machine check reporting enabled on CPU#1.
> CPU1: Intel P4/Xeon Extended MCE MSRs (24) available
> CPU1: Thermal monitoring enabled
> x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
> CPU1: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 05
> checking TSC synchronization [CPU#0 -> CPU#1]:
> Measured 120 cycles TSC warp between CPUs, turning off TSC clock.
> Marking TSC unstable due to check_tsc_sync_source failed
> APIC error on CPU1: 00(40)
> Clockevents: could not switch to one-shot mode:<7>APIC error on CPU1: 40(40)
> lapic is not functional.
> Could not switch to high resolution mode on CPU 0
> Clockevents: could not switch to one-shot mode: lapic is not functional.
> Could not switch to high resolution mode on CPU 1
> APIC error on CPU1: 40(40)
> [sched domains messages
> WQ on CPU0, prefer CPU1
> APIC error on CPU1: 40(40)
> [repeat last message 9 times]
>
> Then follows this pattern indefinitely:
>
> WQ on CPU0, prefer CPU1
> APIC error on CPU1: 40(40)
> [repeat last message 9 times]
>
> That's basically the same thing as I saw with suspend. So it can be
> reproduced easily with CPU hotplug.

Well, I don't know who's the right person to ask about the CPU hotplug.
Andrew, can you help please?

Rafael

2008-08-18 21:30:33

by Andrew Morton

[permalink] [raw]

Subject: Re: oprofile + hibernation = badness

On Mon, 18 Aug 2008 23:15:59 +0200
"Rafael J. Wysocki" <[email protected]> wrote:

> On Monday, 18 of August 2008, Vegard Nossum wrote:
> > On Mon, Aug 18, 2008 at 10:51 PM, Rafael J. Wysocki <[email protected]> wrote:
> > > Apparently nmi_suspend() conflicts with oprofile somehow. Also, the offlining
> > > of non-boot CPUs may confuse it. It would be helpful to check if the CPU
> > > hotplug works with oprofile.
> >
> > That is a good suggestion :-)
> >
> > Here is offlining:
> >
> > CPU 1 is now offline
> > lockdep: fixing up alternatives.
> > SMP alternatives: switching to UP code
> > CPU0 attaching NULL sched-domain.
> > WQ on CPU0, prefer CPU1
> > CPU1 attaching NULL sched-domain.
> > CPU0 attaching sched-domain:
> > domain 0: span 0 level CPU
> > groups: 0
> > WQ on CPU0, prefer CPU1
> > WQ on CPU0, prefer CPU1
> > WQ on CPU0, prefer CPU1
> > [repeat last message indefinitely]
> >
> > Here is onlining:
> >
> > Booting processor 1/1 ip 6000
> > Initializing CPU#1
> > WQ on CPU0, prefer CPU1
> > WQ on CPU0, prefer CPU1
> > Calibrating delay using timer specific routine.. 5986.15 BogoMIPS (lpj=29930790)
> > CPU: Trace cache: 12K uops, L1 D cache: 16K
> > CPU: L2 cache: 2048K
> > CPU: Physical Processor ID: 0
> > Intel machine check architecture supported.
> > Intel machine check reporting enabled on CPU#1.
> > CPU1: Intel P4/Xeon Extended MCE MSRs (24) available
> > CPU1: Thermal monitoring enabled
> > x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
> > CPU1: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 05
> > checking TSC synchronization [CPU#0 -> CPU#1]:
> > Measured 120 cycles TSC warp between CPUs, turning off TSC clock.
> > Marking TSC unstable due to check_tsc_sync_source failed
> > APIC error on CPU1: 00(40)
> > Clockevents: could not switch to one-shot mode:<7>APIC error on CPU1: 40(40)
> > lapic is not functional.
> > Could not switch to high resolution mode on CPU 0
> > Clockevents: could not switch to one-shot mode: lapic is not functional.
> > Could not switch to high resolution mode on CPU 1
> > APIC error on CPU1: 40(40)
> > [sched domains messages
> > WQ on CPU0, prefer CPU1
> > APIC error on CPU1: 40(40)
> > [repeat last message 9 times]
> >
> > Then follows this pattern indefinitely:
> >
> > WQ on CPU0, prefer CPU1
> > APIC error on CPU1: 40(40)
> > [repeat last message 9 times]
> >
> > That's basically the same thing as I saw with suspend. So it can be
> > reproduced easily with CPU hotplug.
>
> Well, I don't know who's the right person to ask about the CPU hotplug.
> Andrew, can you help please?
>

The CPU hotplug maintainer is basically "everyone", because many
subsystems need to interact correctly with hotplugging, and many
subsystems can break it.

This one looks like a clocksource/apic/resume problem? Probably the
next port of call would be Thomas, with Robert looking on.

2008-08-19 01:13:46

by Andi Kleen

[permalink] [raw]

Subject: Re: oprofile + hibernation = badness

From 3967a8b03a30f26a6a5f41b4cc6a6772b5d0ce4f Mon Sep 17 00:00:00 2001
From: Andi Kleen <[email protected]>
Date: Tue, 19 Aug 2008 03:10:03 +0200
Subject: [PATCH] oprofile: fix cpu hotcplug

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/oprofile/nmi_int.c | 40 +++++++++++++++++++++++++++++++++++++---
1 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index 529d625..6438c32 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -15,6 +15,7 @@
#include <linux/slab.h>
#include <linux/moduleparam.h>
#include <linux/kdebug.h>
+#include <linux/cpu.h>
#include <asm/nmi.h>
#include <asm/msr.h>
#include <asm/apic.h>
@@ -28,23 +29,48 @@ static DEFINE_PER_CPU(unsigned long, saved_lvtpc);

static int nmi_start(void);
static void nmi_stop(void);
+static void nmi_cpu_start(void *dummy);
+static void nmi_cpu_stop(void *dummy);

/* 0 == registered but off, 1 == registered and on */
static int nmi_enabled = 0;

+#ifdef CONFIG_SMP
+static int oprofile_cpu_notifier(struct notifier_block *b, unsigned long action,
+ void *data)
+{
+ int cpu = (unsigned long)data;
+ switch (action) {
+ case CPU_DOWN_FAILED:
+ case CPU_ONLINE:
+ smp_call_function_single(cpu, nmi_cpu_start, NULL, 0);
+ break;
+ case CPU_DOWN_PREPARE:
+ smp_call_function_single(cpu, nmi_cpu_stop, NULL, 1);
+ break;
+ }
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block oprofile_cpu_nb = {
+ .notifier_call = oprofile_cpu_notifier
+};
+#endif
+
#ifdef CONFIG_PM

static int nmi_suspend(struct sys_device *dev, pm_message_t state)
{
+ /* Only one CPU left, just stop that one */
if (nmi_enabled == 1)
- nmi_stop();
+ nmi_cpu_stop(NULL);
return 0;
}

static int nmi_resume(struct sys_device *dev)
{
if (nmi_enabled == 1)
- nmi_start();
+ nmi_cpu_start(NULL);
return 0;
}

@@ -66,6 +92,7 @@ static int __init init_sysfs(void)
error = sysdev_class_register(&oprofile_sysclass);
if (!error)
error = sysdev_register(&device_oprofile);
+
return error;
}

@@ -483,6 +510,9 @@ int __init op_nmi_init(struct oprofile_operations *ops)
}

init_sysfs();
+#ifdef CONFIG_SMP
+ register_cpu_notifier(&oprofile_cpu_nb);
+#endif
using_nmi = 1;
ops->create_files = nmi_create_files;
ops->setup = nmi_setup;
@@ -496,6 +526,10 @@ int __init op_nmi_init(struct oprofile_operations *ops)

void op_nmi_exit(void)
{
- if (using_nmi)
+ if (using_nmi) {
exit_sysfs();
+#ifdef CONFIG_SMP
+ unregister_cpu_notifier(&oprofile_cpu_nb);
+#endif
+ }
}
--
1.5.6

Attachments:

0001-oprofile-fix-cpu-hotcplug.patch (2.53 kB)

2008-08-19 07:12:43

by Vegard Nossum

[permalink] [raw]

Subject: Re: oprofile + hibernation = badness

On Tue, Aug 19, 2008 at 3:13 AM, Andi Kleen <[email protected]> wrote:
>> ------------[ cut here ]------------
>> WARNING: at
>> /uio/arkimedes/s29/vegardno/git-working/linux-2.6/kernel/smp.c:328 s
>> mp_call_function_mask+0x194/0x1a0()
>
> The usual problem: the suspend function when interrupts are
> already disabled calls smp_call_function which is not allowed with
> interrupt off. But at this point all the other CPUs should be already
> down anyways, so it should be enough to just drop that.
>
> This patch should fix that problem at least by fixing cpu hotplug&
> suspend support. Untested.
>
> -Andi
>
>
> From 3967a8b03a30f26a6a5f41b4cc6a6772b5d0ce4f Mon Sep 17 00:00:00 2001
> From: Andi Kleen <[email protected]>
> Date: Tue, 19 Aug 2008 03:10:03 +0200
> Subject: [PATCH] oprofile: fix cpu hotcplug
>
> Signed-off-by: Andi Kleen <[email protected]>

This gets rid of the suspend warnings, so it looks to be a step in the
right direction! FWIW, you may add my Tested-by to the patch.

Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036

2008-08-19 09:49:35

by Ingo Molnar

[permalink] [raw]

Subject: Re: oprofile + hibernation = badness

* Vegard Nossum <[email protected]> wrote:

> This gets rid of the suspend warnings, so it looks to be a step in the
> right direction! FWIW, you may add my Tested-by to the patch.

i've queued it up in tip/x86/oprofile - thanks guys.

Andi, another issue: your patch had 5 spurious whitespace errors. This
is a reoccuring, many years pattern of behavior from you and you need to
fix your workflow to send less sloppy patches. You should check your
editor's settings (and generally you should take a good look at patches
you send out).

I fixed up the problems of this patch, no need to resend.

Ingo

2008-08-19 12:12:50

by Andi Kleen

[permalink] [raw]

Subject: Re: oprofile + hibernation = badness

Ingo Molnar wrote:
> * Vegard Nossum <[email protected]> wrote:
>
>> This gets rid of the suspend warnings, so it looks to be a step in the
>> right direction! FWIW, you may add my Tested-by to the patch.
>
> i've queued it up in tip/x86/oprofile - thanks guys.

I would have thought Robert would take them? He's oprofile
maintainer after all.

>
> Andi, another issue: your patch had 5 spurious whitespace errors. This
> is a reoccuring, many years pattern of behavior from you and you need to
> fix your workflow to send less sloppy patches.

Yes I forgot you define patch cleanness based on white space.

Seriously, It's actually new -- i did this one in git directly instead of
quilt and git-commit doesn't seem to know how to drop them. In my older quilt
workflow they were always automatically dropped since many years. Both emacs
and vim add them and I don't plan to switch editors, sorry.

> I fixed up the problems of this patch, no need to resend.

I hope you fixed the typos in the commit log too (it really was just
a RFC test patch)

-Andi

2008-08-19 12:38:18

by tip-bot for Robert Richter

[permalink] [raw]

Subject: Re: oprofile + hibernation = badness

On 19.08.08 14:12:38, Andi Kleen wrote:
> Ingo Molnar wrote:
>> * Vegard Nossum <[email protected]> wrote:
>>> This gets rid of the suspend warnings, so it looks to be a step in the
>>> right direction! FWIW, you may add my Tested-by to the patch.
>> i've queued it up in tip/x86/oprofile - thanks guys.
>
> I would have thought Robert would take them? He's oprofile
> maintainer after all.

Yes, as soon as I have my own repository available this will change.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]

2008-08-19 12:57:20

by Johannes Weiner

[permalink] [raw]

Subject: Re: oprofile + hibernation = badness

Andi Kleen <[email protected]> writes:

> Ingo Molnar wrote:
>> * Vegard Nossum <[email protected]> wrote:
>>
>>> This gets rid of the suspend warnings, so it looks to be a step in
>>> the right direction! FWIW, you may add my Tested-by to the patch.
>>
>> i've queued it up in tip/x86/oprofile - thanks guys.
>
> I would have thought Robert would take them? He's oprofile
> maintainer after all.
>
>>
>> Andi, another issue: your patch had 5 spurious whitespace
>> errors. This is a reoccuring, many years pattern of behavior from
>> you and you need to fix your workflow to send less sloppy patches.
>
> Yes I forgot you define patch cleanness based on white space.
>
> Seriously, It's actually new -- i did this one in git directly instead of
> quilt and git-commit doesn't seem to know how to drop them. In my older quilt
> workflow they were always automatically dropped since many years. Both emacs
> and vim add them and I don't plan to switch editors, sorry.

You probably misconfigured emacs then. I ran checkpatch over your patch
and don't see how and why emacs would add trailing whitespace after a
opening brace.

You can use this to be on the safe side:

(add-hook 'before-save-hook 'delete-trailing-whitespace)

Hannes

2008-08-19 13:18:41

by Andi Kleen

[permalink] [raw]

Subject: Re: oprofile + hibernation = badness

Johannes Weiner <[email protected]> writes:
>
> You probably misconfigured emacs then. I ran checkpatch over your patch
> and don't see how and why emacs would add trailing whitespace after a
> opening brace.

I don't know how, but it does. vim does too sometimes (I use both
editors)

>
> You can use this to be on the safe side:
>
> (add-hook 'before-save-hook 'delete-trailing-whitespace)

That drops it on all. This means when I edit something which already
has trailing spaces then the diff is full of noise. That would
be a cure far worse than the disease[1]

Ideal would be if git-commit dropped them on all new lines, similar to
git-apply --whitespace=fix and quilt refresh, but it doesn't seem to know how.

But normally maintainers just set these options (at least I do and Andrew
does too) so it doesn't really matter if the submitted patch has trailing
spaces or not because they disappear on the way up.

And that said I refuse to write any more on this silly bike shed topic.

-Andi

[1] to be honest I'm not sure i are actually a disease since I don't
know of any single problem they cause.

2008-08-19 13:19:31

by Ingo Molnar

[permalink] [raw]

Subject: Re: oprofile + hibernation = badness

* Andi Kleen <[email protected]> wrote:

>> Andi, another issue: your patch had 5 spurious whitespace errors.
>> This is a reoccuring, many years pattern of behavior from you and you
>> need to fix your workflow to send less sloppy patches.
>
> Yes I forgot you define patch cleanness based on white space.
>
> Seriously, It's actually new -- i did this one in git directly instead
> of quilt and git-commit doesn't seem to know how to drop them. In my
> older quilt workflow they were always automatically dropped since many
> years. Both emacs and vim add them and I don't plan to switch editors,
> sorry.

Thousands of other kernel developers get it right and manage to submit
clean patches, so i think you are not trying very hard to actually solve
this problem - and your flippant and obtrusive answers do not make it
very easy to work with you as a contributor.

>> I fixed up the problems of this patch, no need to resend.
>
> I hope you fixed the typos in the commit log too (it really was just a
> RFC test patch)

You've put your signoff in there and Vegard has tested it. The general
principle on lkml is that patches can be marked as do-not-apply-yet by
adding something like this:

Not-Signed-off-by: Andi Kleen <[email protected]>

or by not adding a SOB line at all. In that case i either ask people for
the SOB or apply them at my own risk.

Thanks,

Ingo