2014-10-24 12:54:14

by Prarit Bhargava

Subject: [PATCH V4] kernel, add bug_on_warn

There have been several times where I have had to rebuild a kernel to
cause a panic when hitting a WARN() in the code in order to get a crash
dump from a system. Sometimes this is easy to do, other times (such as
in the case of a remote admin) it is not trivial to send new images to the
user.

A much easier method would be a switch to change the WARN() over to a
BUG(). This makes debugging easier in that I can now test the actual
image the WARN() was seen on and I do not have to engage in remote
debugging.

This patch adds a bug_on_warn kernel parameter and a
/proc/sys/kernel/bug_on_warn sysctl. When either is set, BUG() is called in
the warn_slowpath_common() path. The function still prints out the location
of the warning.
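
To make the intended control flow concrete, here is a small user-space sketch
of the logic (not the kernel code itself). The names warn_model, warn_state,
and warn_action are made up for illustration; the only behaviour taken from the
patch is that the location is always printed and that the switch is cleared
before escalating, so a flood of warnings escalates only once.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins; none of these names exist in the kernel. */
enum warn_action { ACTION_WARN_ONLY, ACTION_BUG };

struct warn_state {
	int bug_on_warn;	/* models the proposed switch: 0 or 1 */
};

/*
 * Model of the proposed warn path: always report the location, then
 * tell the caller whether to escalate.  The switch is cleared before
 * escalating so that later warnings do not escalate again.
 */
static enum warn_action warn_model(struct warn_state *st,
				   const char *file, int line)
{
	assert(st != NULL && file != NULL);

	fprintf(stderr, "WARNING: at %s:%d\n", file, line);

	if (st->bug_on_warn) {
		fprintf(stderr, "bug_on_warn set, escalating\n");
		st->bug_on_warn = 0;
		return ACTION_BUG;
	}
	return ACTION_WARN_ONLY;
}
```

Calling warn_model() twice with the switch initially set returns ACTION_BUG the
first time and ACTION_WARN_ONLY the second.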

An example of the bug_on_warn output:

The first line below is from the WARN_ON() to output the WARN_ON()'s location.
After that the new BUG() call is displayed.

WARNING: CPU: 27 PID: 3204 at
/home/rhel7/redhat/debug/dummy-module/dummy-module.c:25 init_dummy+0x28/0x30
[dummy_module]()
bug_on_warn set, calling BUG()...
------------[ cut here ]------------
kernel BUG at kernel/panic.c:434!
invalid opcode: 0000 [#1] SMP
Modules linked in: dummy_module(OE+) sg nfsv3 rpcsec_gss_krb5 nfsv4
dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal intel_powerclamp
coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel igb iTCO_wdt aesni_intel iTCO_vendor_support lrw gf128mul
sb_edac ptp edac_core glue_helper lpc_ich ioatdma pcspkr ablk_helper pps_core
i2c_i801 mfd_core cryptd dca shpchp ipmi_si wmi ipmi_msghandler acpi_cpufreq
nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sr_mod cdrom sd_mod
mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper isci ttm
drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror
dm_region_hash dm_log dm_mod
CPU: 27 PID: 3204 Comm: insmod Tainted: G OE 3.17.0+ #19
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS
RMLSDP.86I.00.29.D696.1311111329 11/11/2013
task: ffff880034e75160 ti: ffff8807fc5ac000 task.ti: ffff8807fc5ac000
RIP: 0010:[<ffffffff81076b81>] [<ffffffff81076b81>] warn_slowpath_common+0xc1/0xd0
RSP: 0018:ffff8807fc5afc68 EFLAGS: 00010246
RAX: 0000000000000021 RBX: ffff8807fc5afcb0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88081efee5f8 RDI: ffff88081efee5f8
RBP: ffff8807fc5afc98 R08: 0000000000000096 R09: 0000000000000000
R10: 0000000000000711 R11: ffff8807fc5af93e R12: ffffffffa0424070
R13: 0000000000000019 R14: ffffffffa0423068 R15: 0000000000000009
FS: 00007f2d4b034740(0000) GS:ffff88081efe0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2d4a99f3c0 CR3: 00000007fd88b000 CR4: 00000000001407e0
Stack:
ffff8807fc5afcb8 ffffffff8199f020 ffff88080e396160 0000000000000000
ffffffffa0423040 ffffffffa0425000 ffff8807fc5afd08 ffffffff81076be5
0000000000000008 ffffffffa0424053 ffff880700000018 ffff8807fc5afd18
Call Trace:
[<ffffffffa0423040>] ? dummy_greetings+0x40/0x40 [dummy_module]
[<ffffffff81076be5>] warn_slowpath_fmt+0x55/0x70
[<ffffffffa0423068>] init_dummy+0x28/0x30 [dummy_module]
[<ffffffff81002144>] do_one_initcall+0xd4/0x210
[<ffffffff811b52c2>] ? __vunmap+0xc2/0x110
[<ffffffff810f8889>] load_module+0x16a9/0x1b30
[<ffffffff810f3d30>] ? store_uevent+0x70/0x70
[<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180
[<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0
[<ffffffff8166ce29>] system_call_fastpath+0x12/0x17
Code: c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 c7 c7 20 42 8a 81 31 c0 e8 fc
80 5e 00 eb 80 48 c7 c7 78 42 8a 81 31 c0 e8 ec 80 5e 00 <0f> 0b 66 66 66 66 2e
0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
RIP [<ffffffff81076b81>] warn_slowpath_common+0xc1/0xd0
RSP <ffff8807fc5afc68>
---[ end trace 428218934a12088b ]---

Successfully tested by me.

Cc: Jonathan Corbet <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Fabian Frederick <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Prarit Bhargava <[email protected]>

[v2]: add /proc/sys/kernel/bug_on_warn, additional documentation, modify
!slowpath cases
[v3]: use proc_dointvec_minmax() in sysctl handler
[v4]: remove !slowpath cases, and add __read_mostly
---
Documentation/kdump/kdump.txt | 7 +++++++
Documentation/kernel-parameters.txt | 3 +++
Documentation/sysctl/kernel.txt | 12 ++++++++++++
include/linux/kernel.h | 1 +
include/uapi/linux/sysctl.h | 1 +
kernel/panic.c | 21 ++++++++++++++++++++-
kernel/sysctl.c | 9 +++++++++
kernel/sysctl_binary.c | 1 +
8 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 6c0b9f2..a04ed72 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -471,6 +471,13 @@ format. Crash is available on Dave Anderson's site at the following URL:

http://people.redhat.com/~anderson/

+Trigger Kdump on WARN()
+=======================
+
+The kernel parameter, bug_on_warn, calls BUG() in all WARN() paths. This
+will cause a kdump to occur at the BUG() call. In cases where a user
+wants to specify this during runtime, /proc/sys/kernel/bug_on_warn can be
+set to 1 to achieve the same behaviour.

Contact
=======
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 74339c5..aa1d319 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -553,6 +553,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
bttv.pll= See Documentation/video4linux/bttv/Insmod-options
bttv.tuner=

+ bug_on_warn BUG() instead of WARN(). Useful to cause kdump
+ on a WARN().
+
bulk_remove=off [PPC] This parameter disables the use of the pSeries
firmware feature for flushing multiple hpte entries
at a time.
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 57baff5..dcadcdc 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -23,6 +23,7 @@ show up in /proc/sys/kernel:
- auto_msgmni
- bootloader_type [ X86 only ]
- bootloader_version [ X86 only ]
+- bug_on_warn
- callhome [ S390 only ]
- cap_last_cap
- core_pattern
@@ -152,6 +153,17 @@ Documentation/x86/boot.txt for additional information.

==============================================================

+bug_on_warn:
+
+Calls BUG() in the WARN() path when set to 1. This is useful to avoid
+a kernel rebuild when attempting to kdump at the location of a WARN().
+
+0: only WARN(), default behaviour.
+
+1: call BUG() after printing out WARN() location.
+
+==============================================================
+
callhome:

Controls the kernel's callhome behavior in case of a kernel panic.
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3d770f55..fc28bff 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -423,6 +423,7 @@ extern int panic_on_oops;
extern int panic_on_unrecovered_nmi;
extern int panic_on_io_nmi;
extern int sysctl_panic_on_stackoverflow;
+extern int bug_on_warn;
/*
* Only to be used by arch init code. If the user over-wrote the default
* CONFIG_PANIC_TIMEOUT, honor it.
diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
index 43aaba1..2ba0a58 100644
--- a/include/uapi/linux/sysctl.h
+++ b/include/uapi/linux/sysctl.h
@@ -153,6 +153,7 @@ enum
KERN_MAX_LOCK_DEPTH=74, /* int: rtmutex's maximum lock depth */
KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */
KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */
+ KERN_BUG_ON_WARN=77, /* int: call BUG() in WARN() functions */
};


diff --git a/kernel/panic.c b/kernel/panic.c
index d09dc5c..740d9ff 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -33,6 +33,7 @@ static int pause_on_oops;
static int pause_on_oops_flag;
static DEFINE_SPINLOCK(pause_on_oops_lock);
static bool crash_kexec_post_notifiers;
+int bug_on_warn __read_mostly;

int panic_timeout = CONFIG_PANIC_TIMEOUT;
EXPORT_SYMBOL_GPL(panic_timeout);
@@ -420,13 +421,24 @@ static void warn_slowpath_common(const char *file, int line, void *caller,
 {
 	disable_trace_on_warning();

-	pr_warn("------------[ cut here ]------------\n");
+	if (!bug_on_warn)
+		pr_warn("------------[ cut here ]------------\n");
 	pr_warn("WARNING: CPU: %d PID: %d at %s:%d %pS()\n",
 		raw_smp_processor_id(), current->pid, file, line, caller);

 	if (args)
 		vprintk(args->fmt, args->args);

+	if (bug_on_warn) {
+		pr_warn("bug_on_warn set, calling BUG()...\n");
+		/*
+		 * A flood of WARN()s may occur. Prevent further WARN()s
+		 * from panicking the system.
+		 */
+		bug_on_warn = 0;
+		BUG();
+	}
+
 	print_modules();
 	dump_stack();
 	print_oops_end_marker();
@@ -501,3 +513,10 @@ static int __init oops_setup(char *s)
 	return 0;
 }
 early_param("oops", oops_setup);
+
+static int __init bug_on_warn_setup(char *s)
+{
+	bug_on_warn = 1;
+	return 0;
+}
+early_param("bug_on_warn", bug_on_warn_setup);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4aada6d..818cd31 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1103,6 +1103,15 @@ static struct ctl_table kern_table[] = {
 		.proc_handler	= proc_dointvec,
 	},
 #endif
+	{
+		.procname	= "bug_on_warn",
+		.data		= &bug_on_warn,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
 	{ }
 };

diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
index 9a4f750..28376bf 100644
--- a/kernel/sysctl_binary.c
+++ b/kernel/sysctl_binary.c
@@ -137,6 +137,7 @@ static const struct bin_table bin_kern_table[] = {
{ CTL_INT, KERN_COMPAT_LOG, "compat-log" },
{ CTL_INT, KERN_MAX_LOCK_DEPTH, "max_lock_depth" },
{ CTL_INT, KERN_PANIC_ON_NMI, "panic_on_unrecovered_nmi" },
+ { CTL_INT, KERN_BUG_ON_WARN, "bug_on_warn" },
{}
};

--
1.7.9.3
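
For readers unfamiliar with proc_dointvec_minmax(): with extra1 and extra2
pointing at zero and one, a write outside 0..1 is rejected rather than stored.
A rough user-space model of that check follows. store_bounded_int() is a
hypothetical name and the parsing is simplified compared with the kernel's
handler, so treat it as a sketch only.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a decimal integer from buf and store it in *dst only if it lies
 * in [min, max].  Returns 0 on success and -EINVAL otherwise; on failure
 * *dst keeps its previous value.
 */
static int store_bounded_int(const char *buf, int min, int max, int *dst)
{
	char *end;
	long v;

	assert(buf != NULL && dst != NULL);

	errno = 0;
	v = strtol(buf, &end, 10);
	if (end == buf || errno == ERANGE)
		return -EINVAL;		/* no digits, or overflow */
	if (v < min || v > max)
		return -EINVAL;		/* out of range: *dst untouched */

	*dst = (int)v;
	return 0;
}
```

With min 0 and max 1, writing "1" sets the flag, while writing "2" fails and
leaves the previous value in place.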


2014-10-27 18:06:03

by Jason Baron

Subject: Re: [PATCH V4] kernel, add bug_on_warn

Hi Prarit,

On 10/24/2014 08:53 AM, Prarit Bhargava wrote:
> There have been several times where I have had to rebuild a kernel to
> cause a panic when hitting a WARN() in the code in order to get a crash
> dump from a system. Sometimes this is easy to do, other times (such as
> in the case of a remote admin) it is not trivial to send new images to the
> user.
>
> A much easier method would be a switch to change the WARN() over to a
> BUG(). This makes debugging easier in that I can now test the actual
> image the WARN() was seen on and I do not have to engage in remote
> debugging.
>
> This patch adds a bug_on_warn kernel parameter and a
> /proc/sys/kernel/bug_on_warn sysctl. When either is set, BUG() is called in
> the warn_slowpath_common() path. The function still prints out the location
> of the warning.
>
> An example of the bug_on_warn output:
>
> The first line below is from the WARN_ON() to output the WARN_ON()'s location.
> After that the new BUG() call is displayed.
>
> WARNING: CPU: 27 PID: 3204 at
> /home/rhel7/redhat/debug/dummy-module/dummy-module.c:25 init_dummy+0x28/0x30
> [dummy_module]()
> bug_on_warn set, calling BUG()...
> ------------[ cut here ]------------
> kernel BUG at kernel/panic.c:434!

Seems reasonable. I'm wondering why you don't just call panic() in this
case. The BUG() call at line '434' doesn't add anything since it's just being
called from panic.c.

So something like 'panic_on_warn' would seem to be more appropriate
in keeping with things like 'panic_on_oops' or 'panic_on_stackoverflow'.
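
A small user-space sketch of the point about line 434, under the assumption
that the escalation always happens from one fixed statement: the reported BUG
location is the same for every warning, while the WARNING line carries the
caller's location. struct site, WARN_SITE(), and the helper names are
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct site {
	const char *file;
	int line;
};

struct report {
	struct site warn;	/* where the warning was raised */
	struct site bug;	/* where the escalation statement lives */
};

/* Captures the location of the caller, like the WARNING: line does. */
#define WARN_SITE() ((struct site){ __FILE__, __LINE__ })

/* Every escalation passes through this one place, so its location is fixed. */
static struct report escalate(struct site warn)
{
	struct report r;

	assert(warn.file != NULL);

	r.warn = warn;
	r.bug.file = __FILE__;
	r.bug.line = __LINE__;
	return r;
}

/* Two unrelated call sites that both hit a warning. */
static struct report warn_in_probe(void)
{
	return escalate(WARN_SITE());
}

static struct report warn_in_remove(void)
{
	return escalate(WARN_SITE());
}
```

The two reports differ in warn.line but share the same bug.line, so only the
warning location helps identify the failing caller.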

Thanks,

-Jason

2014-10-27 18:16:03

by Prarit Bhargava

Subject: Re: [PATCH V4] kernel, add bug_on_warn



On 10/27/2014 02:05 PM, Jason Baron wrote:
> Hi Prarit,
>
> On 10/24/2014 08:53 AM, Prarit Bhargava wrote:
>> There have been several times where I have had to rebuild a kernel to
>> cause a panic when hitting a WARN() in the code in order to get a crash
>> dump from a system. Sometimes this is easy to do, other times (such as
>> in the case of a remote admin) it is not trivial to send new images to the
>> user.
>>
>> A much easier method would be a switch to change the WARN() over to a
>> BUG(). This makes debugging easier in that I can now test the actual
>> image the WARN() was seen on and I do not have to engage in remote
>> debugging.
>>
>> This patch adds a bug_on_warn kernel parameter and a
>> /proc/sys/kernel/bug_on_warn sysctl. When either is set, BUG() is called in
>> the warn_slowpath_common() path. The function still prints out the location
>> of the warning.
>>
>> An example of the bug_on_warn output:
>>
>> The first line below is from the WARN_ON() to output the WARN_ON()'s location.
>> After that the new BUG() call is displayed.
>>
>> WARNING: CPU: 27 PID: 3204 at
>> /home/rhel7/redhat/debug/dummy-module/dummy-module.c:25 init_dummy+0x28/0x30
>> [dummy_module]()
>> bug_on_warn set, calling BUG()...
>> ------------[ cut here ]------------
>> kernel BUG at kernel/panic.c:434!
>
> Seems reasonable. I'm wondering why you don't just call panic() in this
> case. The BUG() call at line '434' doesn't add anything since it's just being
> called from panic.c.

Hmm ... I didn't even think about that.

>
> So something like 'panic_on_warn' would seem to be more appropriate
> in keeping with things like 'panic_on_oops' or 'panic_on_stackoverflow'.

I like it a lot better that way too :) I'm changing it to panic_on_warn unless
anyone has any strenuous objections.

P.

>
> Thanks,
>
> -Jason
>

2014-10-28 00:01:26

by Yasuaki Ishimatsu

Subject: Re: [PATCH V4] kernel, add bug_on_warn

(2014/10/24 21:53), Prarit Bhargava wrote:
> There have been several times where I have had to rebuild a kernel to
> cause a panic when hitting a WARN() in the code in order to get a crash
> dump from a system. Sometimes this is easy to do, other times (such as
> in the case of a remote admin) it is not trivial to send new images to the
> user.
>
> A much easier method would be a switch to change the WARN() over to a
> BUG(). This makes debugging easier in that I can now test the actual
> image the WARN() was seen on and I do not have to engage in remote
> debugging.
>
> This patch adds a bug_on_warn kernel parameter and a
> /proc/sys/kernel/bug_on_warn sysctl. When either is set, BUG() is called in
> the warn_slowpath_common() path. The function still prints out the location
> of the warning.
>
> An example of the bug_on_warn output:
>
> The first line below is from the WARN_ON() to output the WARN_ON()'s location.
> After that the new BUG() call is displayed.
>
> WARNING: CPU: 27 PID: 3204 at
> /home/rhel7/redhat/debug/dummy-module/dummy-module.c:25 init_dummy+0x28/0x30
> [dummy_module]()
> bug_on_warn set, calling BUG()...
> ------------[ cut here ]------------
> kernel BUG at kernel/panic.c:434!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: dummy_module(OE+) sg nfsv3 rpcsec_gss_krb5 nfsv4
> dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal intel_powerclamp
> coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel
> ghash_clmulni_intel igb iTCO_wdt aesni_intel iTCO_vendor_support lrw gf128mul
> sb_edac ptp edac_core glue_helper lpc_ich ioatdma pcspkr ablk_helper pps_core
> i2c_i801 mfd_core cryptd dca shpchp ipmi_si wmi ipmi_msghandler acpi_cpufreq
> nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sr_mod cdrom sd_mod
> mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper isci ttm
> drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror
> dm_region_hash dm_log dm_mod
> CPU: 27 PID: 3204 Comm: insmod Tainted: G OE 3.17.0+ #19
> Hardware name: Intel Corporation S2600CP/S2600CP, BIOS
> RMLSDP.86I.00.29.D696.1311111329 11/11/2013
> task: ffff880034e75160 ti: ffff8807fc5ac000 task.ti: ffff8807fc5ac000
> RIP: 0010:[<ffffffff81076b81>] [<ffffffff81076b81>] warn_slowpath_common+0xc1/0xd0
> RSP: 0018:ffff8807fc5afc68 EFLAGS: 00010246
> RAX: 0000000000000021 RBX: ffff8807fc5afcb0 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffff88081efee5f8 RDI: ffff88081efee5f8
> RBP: ffff8807fc5afc98 R08: 0000000000000096 R09: 0000000000000000
> R10: 0000000000000711 R11: ffff8807fc5af93e R12: ffffffffa0424070
> R13: 0000000000000019 R14: ffffffffa0423068 R15: 0000000000000009
> FS: 00007f2d4b034740(0000) GS:ffff88081efe0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2d4a99f3c0 CR3: 00000007fd88b000 CR4: 00000000001407e0
> Stack:
> ffff8807fc5afcb8 ffffffff8199f020 ffff88080e396160 0000000000000000
> ffffffffa0423040 ffffffffa0425000 ffff8807fc5afd08 ffffffff81076be5
> 0000000000000008 ffffffffa0424053 ffff880700000018 ffff8807fc5afd18
> Call Trace:
> [<ffffffffa0423040>] ? dummy_greetings+0x40/0x40 [dummy_module]
> [<ffffffff81076be5>] warn_slowpath_fmt+0x55/0x70
> [<ffffffffa0423068>] init_dummy+0x28/0x30 [dummy_module]
> [<ffffffff81002144>] do_one_initcall+0xd4/0x210
> [<ffffffff811b52c2>] ? __vunmap+0xc2/0x110
> [<ffffffff810f8889>] load_module+0x16a9/0x1b30
> [<ffffffff810f3d30>] ? store_uevent+0x70/0x70
> [<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180
> [<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0
> [<ffffffff8166ce29>] system_call_fastpath+0x12/0x17
> Code: c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 c7 c7 20 42 8a 81 31 c0 e8 fc
> 80 5e 00 eb 80 48 c7 c7 78 42 8a 81 31 c0 e8 ec 80 5e 00 <0f> 0b 66 66 66 66 2e
> 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> RIP [<ffffffff81076b81>] warn_slowpath_common+0xc1/0xd0
> RSP <ffff8807fc5afc68>
> ---[ end trace 428218934a12088b ]---
>
> Successfully tested by me.
>
> Cc: Jonathan Corbet <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Rusty Russell <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Masami Hiramatsu <[email protected]>
> Cc: Fabian Frederick <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Prarit Bhargava <[email protected]>
>
> [v2]: add /proc/sys/kernel/bug_on_warn, additional documentation, modify
> !slowpath cases
> [v3]: use proc_dointvec_minmax() in sysctl handler
> [v4]: remove !slowpath cases, and add __read_mostly
> ---

Looks good to me.
Reviewed-by: Yasuaki Ishimatsu <[email protected]>

Thanks,
Yasuaki Ishimatsu


> Documentation/kdump/kdump.txt | 7 +++++++
> Documentation/kernel-parameters.txt | 3 +++
> Documentation/sysctl/kernel.txt | 12 ++++++++++++
> include/linux/kernel.h | 1 +
> include/uapi/linux/sysctl.h | 1 +
> kernel/panic.c | 21 ++++++++++++++++++++-
> kernel/sysctl.c | 9 +++++++++
> kernel/sysctl_binary.c | 1 +
> 8 files changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
> index 6c0b9f2..a04ed72 100644
> --- a/Documentation/kdump/kdump.txt
> +++ b/Documentation/kdump/kdump.txt
> @@ -471,6 +471,13 @@ format. Crash is available on Dave Anderson's site at the following URL:
>
> http://people.redhat.com/~anderson/
>
> +Trigger Kdump on WARN()
> +=======================
> +
> +The kernel parameter, bug_on_warn, calls BUG() in all WARN() paths. This
> +will cause a kdump to occur at the BUG() call. In cases where a user
> +wants to specify this during runtime, /proc/sys/kernel/bug_on_warn can be
> +set to 1 to achieve the same behaviour.
>
> Contact
> =======
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 74339c5..aa1d319 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -553,6 +553,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> bttv.pll= See Documentation/video4linux/bttv/Insmod-options
> bttv.tuner=
>
> + bug_on_warn BUG() instead of WARN(). Useful to cause kdump
> + on a WARN().
> +
> bulk_remove=off [PPC] This parameter disables the use of the pSeries
> firmware feature for flushing multiple hpte entries
> at a time.
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index 57baff5..dcadcdc 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -23,6 +23,7 @@ show up in /proc/sys/kernel:
> - auto_msgmni
> - bootloader_type [ X86 only ]
> - bootloader_version [ X86 only ]
> +- bug_on_warn
> - callhome [ S390 only ]
> - cap_last_cap
> - core_pattern
> @@ -152,6 +153,17 @@ Documentation/x86/boot.txt for additional information.
>
> ==============================================================
>
> +bug_on_warn:
> +
> +Calls BUG() in the WARN() path when set to 1. This is useful to avoid
> +a kernel rebuild when attempting to kdump at the location of a WARN().
> +
> +0: only WARN(), default behaviour.
> +
> +1: call BUG() after printing out WARN() location.
> +
> +==============================================================
> +
> callhome:
>
> Controls the kernel's callhome behavior in case of a kernel panic.
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 3d770f55..fc28bff 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -423,6 +423,7 @@ extern int panic_on_oops;
> extern int panic_on_unrecovered_nmi;
> extern int panic_on_io_nmi;
> extern int sysctl_panic_on_stackoverflow;
> +extern int bug_on_warn;
> /*
> * Only to be used by arch init code. If the user over-wrote the default
> * CONFIG_PANIC_TIMEOUT, honor it.
> diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
> index 43aaba1..2ba0a58 100644
> --- a/include/uapi/linux/sysctl.h
> +++ b/include/uapi/linux/sysctl.h
> @@ -153,6 +153,7 @@ enum
> KERN_MAX_LOCK_DEPTH=74, /* int: rtmutex's maximum lock depth */
> KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */
> KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */
> + KERN_BUG_ON_WARN=77, /* int: call BUG() in WARN() functions */
> };
>
>
> diff --git a/kernel/panic.c b/kernel/panic.c
> index d09dc5c..740d9ff 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -33,6 +33,7 @@ static int pause_on_oops;
> static int pause_on_oops_flag;
> static DEFINE_SPINLOCK(pause_on_oops_lock);
> static bool crash_kexec_post_notifiers;
> +int bug_on_warn __read_mostly;
>
> int panic_timeout = CONFIG_PANIC_TIMEOUT;
> EXPORT_SYMBOL_GPL(panic_timeout);
> @@ -420,13 +421,24 @@ static void warn_slowpath_common(const char *file, int line, void *caller,
> {
> disable_trace_on_warning();
>
> - pr_warn("------------[ cut here ]------------\n");
> + if (!bug_on_warn)
> + pr_warn("------------[ cut here ]------------\n");
> pr_warn("WARNING: CPU: %d PID: %d at %s:%d %pS()\n",
> raw_smp_processor_id(), current->pid, file, line, caller);
>
> if (args)
> vprintk(args->fmt, args->args);
>
> + if (bug_on_warn) {
> + pr_warn("bug_on_warn set, calling BUG()...\n");
> + /*
> + * A flood of WARN()s may occur. Prevent further WARN()s
> + * from panicking the system.
> + */
> + bug_on_warn = 0;
> + BUG();
> + }
> +
> print_modules();
> dump_stack();
> print_oops_end_marker();
> @@ -501,3 +513,10 @@ static int __init oops_setup(char *s)
> return 0;
> }
> early_param("oops", oops_setup);
> +
> +static int __init bug_on_warn_setup(char *s)
> +{
> + bug_on_warn = 1;
> + return 0;
> +}
> +early_param("bug_on_warn", bug_on_warn_setup);
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 4aada6d..818cd31 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1103,6 +1103,15 @@ static struct ctl_table kern_table[] = {
> .proc_handler = proc_dointvec,
> },
> #endif
> + {
> + .procname = "bug_on_warn",
> + .data = &bug_on_warn,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec_minmax,
> + .extra1 = &zero,
> + .extra2 = &one,
> + },
> { }
> };
>
> diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
> index 9a4f750..28376bf 100644
> --- a/kernel/sysctl_binary.c
> +++ b/kernel/sysctl_binary.c
> @@ -137,6 +137,7 @@ static const struct bin_table bin_kern_table[] = {
> { CTL_INT, KERN_COMPAT_LOG, "compat-log" },
> { CTL_INT, KERN_MAX_LOCK_DEPTH, "max_lock_depth" },
> { CTL_INT, KERN_PANIC_ON_NMI, "panic_on_unrecovered_nmi" },
> + { CTL_INT, KERN_BUG_ON_WARN, "bug_on_warn" },
> {}
> };
>
>

2014-10-28 02:33:38

by Dave Young

Subject: Re: [PATCH V4] kernel, add bug_on_warn

> > Seems reasonable. I'm wondering why you don't just call panic() in this
> > case. The BUG() call at line '434' doesn't add anything since it's just being
> > called from panic.c.
>
> Hmm ... I didn't even think about that.
>
> >
> > So something like 'panic_on_warn' would seem to be more appropriate
> > in keeping with things like 'panic_on_oops' or 'panic_on_stackoverflow'.
>
> I like it a lot better that way too :) I'm changing it to panic_on_warn unless
> anyone has any strenuous objections.

I would vote for panic_on_warn; it makes more sense than bug_on_warn.

Thanks
Dave

Subject: Re: Re: [PATCH V4] kernel, add bug_on_warn

(2014/10/28 3:15), Prarit Bhargava wrote:
>
>
> On 10/27/2014 02:05 PM, Jason Baron wrote:
>> Hi Prarit,
>>
>> On 10/24/2014 08:53 AM, Prarit Bhargava wrote:
>>> There have been several times where I have had to rebuild a kernel to
>>> cause a panic when hitting a WARN() in the code in order to get a crash
>>> dump from a system. Sometimes this is easy to do, other times (such as
>>> in the case of a remote admin) it is not trivial to send new images to the
>>> user.
>>>
>>> A much easier method would be a switch to change the WARN() over to a
>>> BUG(). This makes debugging easier in that I can now test the actual
>>> image the WARN() was seen on and I do not have to engage in remote
>>> debugging.
>>>
>>> This patch adds a bug_on_warn kernel parameter and a
>>> /proc/sys/kernel/bug_on_warn sysctl. When either is set, BUG() is called in
>>> the warn_slowpath_common() path. The function still prints out the location
>>> of the warning.
>>>
>>> An example of the bug_on_warn output:
>>>
>>> The first line below is from the WARN_ON() to output the WARN_ON()'s location.
>>> After that the new BUG() call is displayed.
>>>
>>> WARNING: CPU: 27 PID: 3204 at
>>> /home/rhel7/redhat/debug/dummy-module/dummy-module.c:25 init_dummy+0x28/0x30
>>> [dummy_module]()
>>> bug_on_warn set, calling BUG()...
>>> ------------[ cut here ]------------
>>> kernel BUG at kernel/panic.c:434!
>>
>> Seems reasonable. I'm wondering why you don't just call panic() in this
>> case. The BUG() call at line '434' doesn't add anything since it's just being
>> called from panic.c.

+1, I like calling panic() instead of BUG() :)

Thank you,

>
> Hmm ... I didn't even think about that.
>
>>
>> So something like 'panic_on_warn' would seem to be more appropriate
>> in keeping with things like 'panic_on_oops' or 'panic_on_stackoverflow'.
>
> I like it a lot better that way too :) I'm changing it to panic_on_warn unless
> anyone has any strenuous objections.
>
> P.
>
>>
>> Thanks,
>>
>> -Jason
>>


--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2014-10-28 12:16:40

by Andi Kleen

Subject: Re: [PATCH V4] kernel, add bug_on_warn

On Fri, Oct 24, 2014 at 08:53:27AM -0400, Prarit Bhargava wrote:
> There have been several times where I have had to rebuild a kernel to
> cause a panic when hitting a WARN() in the code in order to get a crash
> dump from a system. Sometimes this is easy to do, other times (such as
> in the case of a remote admin) it is not trivial to send new images to the
> user.
>
> A much easier method would be a switch to change the WARN() over to a
> BUG(). This makes debugging easier in that I can now test the actual
> image the WARN() was seen on and I do not have to engage in remote
> debugging.

IMHO this would be better and far more generically done with kdb.
You would need two things:

- Extend the break point command to run another command on a break point.
- Add a command line (or possibly /proc) option to execute some kdb commands at
kernel boot.

Then just set a break point on the warn function and execute magic sysrq c
from kdb.

-Andi

2014-10-28 12:22:56

by Prarit Bhargava

Subject: Re: [PATCH V4] kernel, add bug_on_warn



On 10/28/2014 08:16 AM, Andi Kleen wrote:
> On Fri, Oct 24, 2014 at 08:53:27AM -0400, Prarit Bhargava wrote:
>> There have been several times where I have had to rebuild a kernel to
>> cause a panic when hitting a WARN() in the code in order to get a crash
>> dump from a system. Sometimes this is easy to do, other times (such as
>> in the case of a remote admin) it is not trivial to send new images to the
>> user.
>>
>> A much easier method would be a switch to change the WARN() over to a
>> BUG(). This makes debugging easier in that I can now test the actual
>> image the WARN() was seen on and I do not have to engage in remote
>> debugging.
>
> IMHO this would be better and far more generically done with kdb.
> You would need two things:
>
> - Extend the break point command to run another command on a break point.
> - Add a command line (or possibly /proc) option to execute some kdb commands at
> kernel boot.

I suppose ... but that would mean I would have to explain to an end user the
elaborate process of enabling kdb, inserting a break point, etc. The whole
purpose of this is to let an end user panic on WARN() easily.

Asking an end user to enable kdb is orders of magnitude worse than asking
them to recompile a kernel.

P.

>
> Then just set a break point on the warn function and execute magic sysrq c
> from kdb.
>
> -Andi
>

2014-10-28 12:29:50

by Vivek Goyal

Subject: Re: [PATCH V4] kernel, add bug_on_warn

On Tue, Oct 28, 2014 at 08:22:16AM -0400, Prarit Bhargava wrote:
>
>
> On 10/28/2014 08:16 AM, Andi Kleen wrote:
> > On Fri, Oct 24, 2014 at 08:53:27AM -0400, Prarit Bhargava wrote:
> >> There have been several times where I have had to rebuild a kernel to
> >> cause a panic when hitting a WARN() in the code in order to get a crash
> >> dump from a system. Sometimes this is easy to do, other times (such as
> >> in the case of a remote admin) it is not trivial to send new images to the
> >> user.
> >>
> >> A much easier method would be a switch to change the WARN() over to a
> >> BUG(). This makes debugging easier in that I can now test the actual
> >> image the WARN() was seen on and I do not have to engage in remote
> >> debugging.
> >
> > IMHO this would be better and far more generically done with kdb.
> > You would need two things:
> >
> > - Extend the break point command to run another command on a break point.
> > - Add a command line (or possibly /proc) option to execute some kdb commands at
> > kernel boot.
>
> I suppose ... but that would mean I would have to explain to an end user the
> elaborate process of enabling kdb, inserting a break point, etc. The whole
> purpose of this is to let an end user panic on WARN() easily.
>
> Asking an end user to enable kdb is orders of magnitude worse than asking
> them to recompile a kernel.

Agreed. Asking a customer to set up and run kdb and place breakpoints is much
more painful than simply asking them to reboot the kernel with a command line
option.

Thanks
Vivek

2014-10-28 12:44:29

by Andi Kleen

Subject: Re: [PATCH V4] kernel, add bug_on_warn

> > I suppose ... but that would mean I would have to explain to an end user the
> > elaborate process of enabling kdb, inserting a break point, etc. The whole
> > purpose of this is to let an end user panic on WARN() easily.
> >
> > Asking an end user to enable kdb is orders of magnitude worse than asking
> > them to recompile a kernel.
>
> Agreed. Asking a customer to set up and run kdb and put breakpoints is much
> more pain than simply asking to reboot the kernel with a command line option.

If you have a command line option to execute kdb commands you still
would only have a command line option, just a slightly longer one.

kdb="on, bp warn_slowpath_common sr c, go"

But it would be a generic facility instead of a special purpose hack.

-Andi

2014-10-28 12:49:40

by Prarit Bhargava

Subject: Re: [PATCH V4] kernel, add bug_on_warn



On 10/28/2014 08:44 AM, Andi Kleen wrote:
>>> I suppose ... but that would mean I would have to explain to an end user the
>>> elaborate process of enabling kdb, inserting a break point, etc. The whole
>>> purpose of this is to let an end user panic on WARN() easily.
>>>
>>> Asking an end user to enable kdb is magnitudes worse than asking them to
>>> recompile a kernel.
>>
>> Agreed. Asking a customer to set up and run kdb and put breakpoints is much
>> more pain than simply asking to reboot the kernel with a command line option.
>
> If you have a command line option to execute kdb commands you still
> would only have a command line option, just a slightly longer one.
>
> kdb="on, bp warn_slowpath_common sr c, go"

KDB is not on all kernels. This would require me to go to great lengths to
explain to a user how to set it up, etc., rather than saying "Hi, please add
panic_on_warn as a kernel parameter, or echo 1 into
/proc/sys/kernel/panic_on_warn, and the next time you see that WARN() the
system will panic and kdump. Send me that kdump."
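
For illustration, the run-time half of that instruction (the "echo 1" step) can be written as a small C helper. This is only a sketch: the path is the one named in the message above (the patch subject calls the switch bug_on_warn, so the actual file name depends on the patch version), and `write_knob()` and `enable_panic_on_warn()` are hypothetical names that do not appear in the patch. Writing the file requires root and a kernel that provides it.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Write a short string to a sysctl-style file, the C equivalent of
 * "echo 1 > /proc/sys/kernel/panic_on_warn".
 * Returns 0 on success or a negative errno-style value on failure.
 */
int write_knob(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int err = 0;

	if (!f)
		return errno ? -errno : -EIO;
	if (fputs(value, f) == EOF)
		err = errno ? -errno : -EIO;
	/* The write may only be flushed (and rejected) at close time. */
	if (fclose(f) == EOF && !err)
		err = errno ? -errno : -EIO;
	return err;
}

/* Assumed path, taken from the message above; may differ per patch version. */
int enable_panic_on_warn(void)
{
	return write_knob("/proc/sys/kernel/panic_on_warn", "1\n");
}
```

A caller would check the return value of `enable_panic_on_warn()` and report the error if the file is missing or not writable.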

>
> But it would be a generic facility instead of a special purpose hack.

Generic only if KDB is configured on, and it is in no way trivial to use.

P.

2014-10-28 12:56:11

by Vivek Goyal

Subject: Re: [PATCH V4] kernel, add bug_on_warn

On Tue, Oct 28, 2014 at 05:44:25AM -0700, Andi Kleen wrote:
> > > I suppose ... but that would mean I would have to explain to an end user the
> > > elaborate process of enabling kdb, inserting a break point, etc. The whole
> > > purpose of this is to let an end user panic on WARN() easily.
> > >
> > > Asking an end user to enable kdb is magnitudes worse than asking them to
> > > recompile a kernel.
> >
> > Agreed. Asking a customer to set up and run kdb and put breakpoints is much
> > more pain than simply asking to reboot the kernel with a command line option.
>
> If you have a command line option to execute kdb commands you still
> would only have a command line option, just a slightly longer one.
>
> kdb="on, bp warn_slowpath_common sr c, go"

So does it already work, or is the proposal to make something like this work
with kdb?

What about the case of enabling it post boot and using a /sys file for
that?

Thanks
Vivek

2014-10-28 12:56:35

by Andi Kleen

Subject: Re: [PATCH V4] kernel, add bug_on_warn

> > If you have a command line option to execute kdb commands you still
> > would only have a command line option, just a slightly longer one.
> >
> > kdb="on, bp warn_slowpath_common sr c, go"
>
> KDB is not on all kernels. This would require me to go to great lengths to

Repeating incorrect statements doesn't suddenly make them correct.

Assuming KDB is compiled in, the only setup needed would be the line above.

-Andi

2014-10-28 12:59:32

by Andi Kleen

Subject: Re: [PATCH V4] kernel, add bug_on_warn

> > kdb="on, bp warn_slowpath_common sr c, go"
>
> So does it already work, or is the proposal to make something like this work
> with kdb?

It does not work today, the missing pieces are:

- extending kdb= to execute a list of kdb commands
- extending bp to define a list of commands to execute
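
To make the first missing piece concrete, here is a minimal userspace sketch of splitting a comma-separated command list such as the kdb= string quoted above. It is not kdb code. The multi-command kdb= syntax is only a proposal in this thread, and `trim_ws()` and `split_commands()` are hypothetical helper names chosen for this example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Strip leading and trailing whitespace in place; returns the trimmed start. */
char *trim_ws(char *s)
{
	char *end;

	while (isspace((unsigned char)*s))
		s++;
	end = s + strlen(s);
	while (end > s && isspace((unsigned char)end[-1]))
		end--;
	*end = '\0';
	return s;
}

/*
 * Split a comma-separated command list in place.
 * Stores up to max pointers in cmds and returns how many were stored.
 * Empty entries (for example from a trailing comma) are skipped.
 */
size_t split_commands(char *list, char **cmds, size_t max)
{
	size_t n = 0;
	char *next = list;

	while (next && n < max) {
		char *cur = next;
		char *comma = strchr(cur, ',');

		if (comma) {
			*comma = '\0';		/* terminate this entry */
			next = comma + 1;
		} else {
			next = NULL;		/* last entry */
		}
		cur = trim_ws(cur);
		if (*cur)
			cmds[n++] = cur;
	}
	return n;
}
```

Given a writable copy of the string "on, bp warn_slowpath_common sr c, go", the splitter yields three entries: "on", "bp warn_slowpath_common sr c", and "go". Each one would then have to be handed to the kdb command parser, which is the part that does not exist yet.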

>
> What about the case of enabling it post boot and using a /sys file for
> that?

This would be another patch, add a sysfs file to run a list of kdb commands.

-Andi

--
[email protected] -- Speaking for myself only

2014-10-28 13:19:41

by Prarit Bhargava

Subject: Re: [PATCH V4] kernel, add bug_on_warn



On 10/28/2014 08:56 AM, Andi Kleen wrote:
>>> If you have a command line option to execute kdb commands you still
>>> would only have a command line option, just a slightly longer one.
>>>
>>> kdb="on, bp warn_slowpath_common sr c, go"
>>
>> KDB is not on all kernels. This would require me to go to great lengths to
>
> Repeating incorrect statements doesn't suddenly make them correct.

It does when you're introducing unnecessary complexity here, Andi. You're not
listening to the original statement: I do not want to provide a modified kernel
to an end user for this simple situation anymore.

>
> Assuming KDB is compiled in, the only setup needed would be the line above.

That's a HUGE assumption. It isn't, and I'm still stuck in the same mess of
explaining how to recompile a kernel and/or providing a modified kernel to an
end user. That isn't a good solution.

P.

>
> -Andi
>