2011-04-27 18:27:28

by Jeff Mahoney

Subject: [PATCH] hung_task_timeout: configurable default

This patch allows the default value for sysctl_hung_task_timeout_secs
to be set at build time. The feature carries virtually no overhead,
so it makes sense to keep it enabled. On heavily loaded systems, though,
it can end up triggering stack traces when there is no bug other than
the system being underprovisioned. We use this patch to keep the hung task
facility available but disabled at boot-time.

The default of 120 seconds is preserved. As a note, commit e162b39a may
have accidentally reverted commit fb822db4, which raised the default from
120 seconds to 480 seconds.

Signed-off-by: Jeff Mahoney <[email protected]>
---
kernel/hung_task.c | 3 ++-
lib/Kconfig.debug | 14 ++++++++++++++
2 files changed, 16 insertions(+), 1 deletion(-)

--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -33,7 +33,8 @@ unsigned long __read_mostly sysctl_hung_
 /*
  * Zero means infinite timeout - no checking done:
  */
-unsigned long __read_mostly sysctl_hung_task_timeout_secs = 120;
+unsigned long __read_mostly sysctl_hung_task_timeout_secs =
+	CONFIG_DEFAULT_HUNG_TASK_TIMEOUT;
 
 unsigned long __read_mostly sysctl_hung_task_warnings = 10;

--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -214,6 +214,20 @@ config DETECT_HUNG_TASK
 	  enabled then all held locks will also be reported. This
 	  feature has negligible overhead.
 
+config DEFAULT_HUNG_TASK_TIMEOUT
+	int "Default timeout for hung task detection (in seconds)"
+	depends on DETECT_HUNG_TASK
+	default 120
+	help
+	  This option controls the default timeout (in seconds) used
+	  to determine when a task has become non-responsive and should
+	  be considered hung.
+
+	  It can be adjusted at runtime via the kernel.hung_task_timeout
+	  sysctl or by writing a value to /proc/sys/kernel/hung_task_timeout.
+
+	  A timeout of 0 disables the check. The default is 120 seconds.
+
 config BOOTPARAM_HUNG_TASK_PANIC
 	bool "Panic (Reboot) On Hung Tasks"
 	depends on DETECT_HUNG_TASK
--
Jeff Mahoney
SUSE Labs


2011-04-27 18:36:39

by Mandeep Baines

Subject: Re: [PATCH] hung_task_timeout: configurable default

Jeff Mahoney ([email protected]) wrote:
> This patch allows the default value for sysctl_hung_task_timeout_secs
> to be set at build time. The feature carries virtually no overhead,
> so it makes sense to keep it enabled. On heavily loaded systems, though,
> it can end up triggering stack traces when there is no bug other than
> the system being underprovisioned. We use this patch to keep the hung task
> facility available but disabled at boot-time.
>

Clever.

> The default of 120 seconds is preserved. As a note, commit e162b39a may
> have accidentally reverted commit fb822db4, which raised the default from
> 120 seconds to 480 seconds.
>
> Signed-off-by: Jeff Mahoney <[email protected]>

Acked-by: Mandeep Singh Baines <[email protected]>

> ---
> kernel/hung_task.c | 3 ++-
> lib/Kconfig.debug | 14 ++++++++++++++
> 2 files changed, 16 insertions(+), 1 deletion(-)
>
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -33,7 +33,8 @@ unsigned long __read_mostly sysctl_hung_
> /*
> * Zero means infinite timeout - no checking done:
> */
> -unsigned long __read_mostly sysctl_hung_task_timeout_secs = 120;
> +unsigned long __read_mostly sysctl_hung_task_timeout_secs =
> + CONFIG_DEFAULT_HUNG_TASK_TIMEOUT;
>
> unsigned long __read_mostly sysctl_hung_task_warnings = 10;
>
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -214,6 +214,20 @@ config DETECT_HUNG_TASK
> enabled then all held locks will also be reported. This
> feature has negligible overhead.
>
> +config DEFAULT_HUNG_TASK_TIMEOUT
> + int "Default timeout for hung task detection (in seconds)"
> + depends on DETECT_HUNG_TASK
> + default 120
> + help
> + This option controls the default timeout (in seconds) used
> + to determine when a task has become non-responsive and should
> + be considered hung.
> +
> + It can be adjusted at runtime via the kernel.hung_task_timeout
> + sysctl or by writing a value to /proc/sys/kernel/hung_task_timeout.
> +
> + A timeout of 0 disables the check. The default is 120 seconds.
> +
> config BOOTPARAM_HUNG_TASK_PANIC
> bool "Panic (Reboot) On Hung Tasks"
> depends on DETECT_HUNG_TASK
> --
> Jeff Mahoney
> SUSE Labs

2011-04-28 10:01:14

by Jeff Mahoney

Subject: [tip:core/locking] watchdog, hung_task_timeout: Add Kconfig configurable default

Commit-ID: e11feaa1192a079ba8e88a12121e9b12d55d4239
Gitweb: http://git.kernel.org/tip/e11feaa1192a079ba8e88a12121e9b12d55d4239
Author: Jeff Mahoney <[email protected]>
AuthorDate: Wed, 27 Apr 2011 14:27:24 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 28 Apr 2011 09:13:17 +0200

watchdog, hung_task_timeout: Add Kconfig configurable default

This patch allows the default value for sysctl_hung_task_timeout_secs
to be set at build time. The feature carries virtually no overhead,
so it makes sense to keep it enabled. On heavily loaded systems, though,
it can end up triggering stack traces when there is no bug other than
the system being underprovisioned. We use this patch to keep the hung task
facility available but disabled at boot-time.

The default of 120 seconds is preserved. As a note, commit e162b39a may
have accidentally reverted commit fb822db4, which raised the default from
120 seconds to 480 seconds.

Signed-off-by: Jeff Mahoney <[email protected]>
Acked-by: Mandeep Singh Baines <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/hung_task.c | 2 +-
lib/Kconfig.debug | 15 +++++++++++++++
2 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 53ead17..ea64012 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -33,7 +33,7 @@ unsigned long __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
 /*
  * Zero means infinite timeout - no checking done:
  */
-unsigned long __read_mostly sysctl_hung_task_timeout_secs = 120;
+unsigned long __read_mostly sysctl_hung_task_timeout_secs = CONFIG_DEFAULT_HUNG_TASK_TIMEOUT;
 
 unsigned long __read_mostly sysctl_hung_task_warnings = 10;

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c768bcd..debbb05 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -238,6 +238,21 @@ config DETECT_HUNG_TASK
 	  enabled then all held locks will also be reported. This
 	  feature has negligible overhead.
 
+config DEFAULT_HUNG_TASK_TIMEOUT
+	int "Default timeout for hung task detection (in seconds)"
+	depends on DETECT_HUNG_TASK
+	default 120
+	help
+	  This option controls the default timeout (in seconds) used
+	  to determine when a task has become non-responsive and should
+	  be considered hung.
+
+	  It can be adjusted at runtime via the kernel.hung_task_timeout
+	  sysctl or by writing a value to /proc/sys/kernel/hung_task_timeout.
+
+	  A timeout of 0 disables the check. The default is two minutes.
+	  Keeping the default should be fine in most cases.
+
 config BOOTPARAM_HUNG_TASK_PANIC
 	bool "Panic (Reboot) On Hung Tasks"
 	depends on DETECT_HUNG_TASK
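
For readers following the diff, here is a minimal userspace sketch of what the change amounts to: the Kconfig value becomes a compile-time constant that seeds the runtime-adjustable variable, and zero keeps the facility built in but silent. The fallback #define, the variable name without the sysctl_ prefix, and the helper would_report_hung() are assumptions made for illustration; the kernel's real detector works from per-task scheduling state rather than a simple elapsed-seconds comparison.

```c
#include <stdbool.h>

/* Stand-in for the Kconfig-generated macro. In a kernel build the value
 * would come from the build configuration; 120 mirrors the patch default. */
#ifndef CONFIG_DEFAULT_HUNG_TASK_TIMEOUT
#define CONFIG_DEFAULT_HUNG_TASK_TIMEOUT 120
#endif

/* Boot-time default; in the kernel this stays adjustable through sysctl. */
unsigned long hung_task_timeout_secs = CONFIG_DEFAULT_HUNG_TASK_TIMEOUT;

/* Hypothetical helper (not a kernel function): a timeout of zero means
 * no checking is done; otherwise a task blocked for at least
 * timeout_secs would be reported. */
bool would_report_hung(unsigned long blocked_secs, unsigned long timeout_secs)
{
    if (timeout_secs == 0)
        return false;
    return blocked_secs >= timeout_secs;
}
```

Building with the macro defined as 0 models the "available but disabled at boot-time" setup described in the changelog.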

2011-05-04 22:06:12

by Andrew Morton

Subject: Re: [PATCH] hung_task_timeout: configurable default

On Wed, 27 Apr 2011 14:27:24 -0400
Jeff Mahoney <[email protected]> wrote:

> This patch allows the default value for sysctl_hung_task_timeout_secs
> to be set at build time. The feature carries virtually no overhead,
> so it makes sense to keep it enabled. On heavily loaded systems, though,
> it can end up triggering stack traces when there is no bug other than
> the system being underprovisioned. We use this patch to keep the hung task
> facility available but disabled at boot-time.
>
> The default of 120 seconds is preserved. As a note, commit e162b39a may
> have accidentally reverted commit fb822db4, which raised the default from
> 120 seconds to 480 seconds.

The changelog forgot to tell us why the patch's author considers the
patch to be needed. This happens quite a lot.

> @@ -33,7 +33,8 @@ unsigned long __read_mostly sysctl_hung_
> /*
> * Zero means infinite timeout - no checking done:
> */
> -unsigned long __read_mostly sysctl_hung_task_timeout_secs = 120;
> +unsigned long __read_mostly sysctl_hung_task_timeout_secs =
> + CONFIG_DEFAULT_HUNG_TASK_TIMEOUT;
>

For the life of me I can't understand why you distro guys need to keep
patching the kernel when you could just add a line to your initscripts.

I'm suspecting that lameness is involved.
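
The initscript alternative boils down to writing a number into the sysctl file early in boot. A rough C sketch of that write follows, assuming the usual procfs location; note that the file name carries the _secs suffix, matching the variable sysctl_hung_task_timeout_secs. The helper names write_timeout_secs() and read_timeout_secs() are made up for this example.

```c
#include <stdio.h>

/* Runtime knob for the hung task timeout, in seconds. */
#define HUNG_TASK_TIMEOUT_PATH "/proc/sys/kernel/hung_task_timeout_secs"

/* Hypothetical helper: write a timeout in seconds to the given file.
 * A value of zero disables the check. Returns 0 on success, -1 on error. */
int write_timeout_secs(const char *path, unsigned long secs)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%lu\n", secs) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read the current timeout back from the file.
 * Returns 0 on success, -1 on error. */
int read_timeout_secs(const char *path, unsigned long *secs)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%lu", secs) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

Calling write_timeout_secs(HUNG_TASK_TIMEOUT_PATH, 0) as root would disable the check without patching the kernel; in an actual initscript the same effect is a one-liner such as `sysctl -w kernel.hung_task_timeout_secs=0`.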

2011-05-04 22:38:38

by Jeff Mahoney

Subject: Re: [PATCH] hung_task_timeout: configurable default



On May 4, 2011, at 6:05 PM, Andrew Morton <[email protected]> wrote:

> On Wed, 27 Apr 2011 14:27:24 -0400
> Jeff Mahoney <[email protected]> wrote:
>
>> This patch allows the default value for sysctl_hung_task_timeout_secs
>> to be set at build time. The feature carries virtually no overhead,
>> so it makes sense to keep it enabled. On heavily loaded systems, though,
>> it can end up triggering stack traces when there is no bug other than
>> the system being underprovisioned. We use this patch to keep the hung task
>> facility available but disabled at boot-time.
>>
>> The default of 120 seconds is preserved. As a note, commit e162b39a may
>> have accidentally reverted commit fb822db4, which raised the default from
>> 120 seconds to 480 seconds.
>
> The changelog forgot to tell us why the patch's author considers the
> patch to be needed. This happens quite a lot.
>
>> @@ -33,7 +33,8 @@ unsigned long __read_mostly sysctl_hung_
>> /*
>> * Zero means infinite timeout - no checking done:
>> */
>> -unsigned long __read_mostly sysctl_hung_task_timeout_secs = 120;
>> +unsigned long __read_mostly sysctl_hung_task_timeout_secs =
>> + CONFIG_DEFAULT_HUNG_TASK_TIMEOUT;
>>
>
> For the life of me I can't understand why you distro guys need to keep
> patching the kernel when you could just add a line to your initscripts.
>
> I'm suspecting that lameness is involved.

Good point, and one we actually made ourselves internally after starting to submit these in the last round when we were tweaking more sensitive knobs.

Skip this one and I'll submit a patch fixing the 480s -> 120s regression.

-Jeff

--
Jeff Mahoney
(mobile)