On Thu 05-08-21 21:47:47, chenguanyou wrote:
> Some kernel threads are always in D state. When hung_task is enabled,
> it misreports them as hung, so we should skip them to narrow the scope.
>
> Example (MediaTek):
> root 435 435 2 0 0 mtk_lpm_monitor_thread 0 D LPM-0
> root 436 436 2 0 0 mtk_lpm_monitor_thread 0 D LPM-1
> root 437 437 2 0 0 mtk_lpm_monitor_thread 0 D LPM-2
> root 438 438 2 0 0 mtk_lpm_monitor_thread 0 D LPM-3
> root 439 439 2 0 0 mtk_lpm_monitor_thread 0 D LPM-4
> root 440 440 2 0 0 mtk_lpm_monitor_thread 0 D LPM-5
> root 441 441 2 0 0 mtk_lpm_monitor_thread 0 D LPM-6
> root 442 442 2 0 0 mtk_lpm_monitor_thread 0 D LPM-7
A similar approach has been proposed in the past (sorry I do not have
links handy) and always deemed a wrong way to approach the problem.
Either those kernel threads should be fixed to use less sleep or
annotate the sleep properly (TASK_IDLE).
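
For illustration only, a minimal userspace sketch of why the TASK_IDLE
annotation helps. It assumes the detector only examines tasks whose state
is exactly TASK_UNINTERRUPTIBLE (as kernel/hung_task.c did around v5.14)
and assumes the state values from include/linux/sched.h of that era.
hung_task_would_check() and hung_task_model_selftest() are hypothetical
names used for this model, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Task state bits; values assumed from include/linux/sched.h (v5.x). */
#define TASK_INTERRUPTIBLE   0x0001
#define TASK_UNINTERRUPTIBLE 0x0002
#define TASK_NOLOAD          0x0400
/* TASK_IDLE: uninterruptible sleep that does not count towards loadavg. */
#define TASK_IDLE            (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/*
 * Hypothetical model of the detector's state test: only a task whose
 * state equals TASK_UNINTERRUPTIBLE exactly is handed to the hung check.
 */
bool hung_task_would_check(unsigned int state)
{
	return state == TASK_UNINTERRUPTIBLE;
}

/* Small self-check of the model. */
void hung_task_model_selftest(void)
{
	/* A plain D-state sleeper is examined by the detector. */
	assert(hung_task_would_check(TASK_UNINTERRUPTIBLE));
	/* An idle-annotated sleeper carries TASK_NOLOAD and is skipped. */
	assert(!hung_task_would_check(TASK_IDLE));
}
```

Under these assumptions, a monitor thread that waits with a TASK_IDLE
sleep (for example via wait_event_idle() or schedule_timeout_idle()
instead of their uninterruptible counterparts) would no longer be
reported, without adding any filter to the detector.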
> Signed-off-by: chenguanyou <[email protected]>
> ---
> Documentation/admin-guide/sysctl/kernel.rst | 10 ++++++++++
> include/linux/sched/sysctl.h | 1 +
> kernel/hung_task.c | 8 ++++++++
> kernel/sysctl.c | 9 +++++++++
> lib/Kconfig.debug | 15 +++++++++++++++
> 5 files changed, 43 insertions(+)
>
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 68b21395a743..3c7c74b26d95 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -405,6 +405,16 @@ This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
>
> -1: report an infinite number of warnings.
>
> +hung_task_filter_kthread
> +========================
> +
> +Controls whether the hung task detector skips kernel threads.
> +This file shows up if ``CONFIG_DEFAULT_HUNG_TASK_FILTER_KTHREAD`` is enabled.
> +
> += =========================================================
> +0 Do not skip kernel threads.
> +1 Skip kernel threads.
> += =========================================================
>
> hyperv_record_panic_msg
> =======================
> diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
> index db2c0f34aaaf..2b8b01b57559 100644
> --- a/include/linux/sched/sysctl.h
> +++ b/include/linux/sched/sysctl.h
> @@ -19,6 +19,7 @@ extern unsigned int sysctl_hung_task_panic;
> extern unsigned long sysctl_hung_task_timeout_secs;
> extern unsigned long sysctl_hung_task_check_interval_secs;
> extern int sysctl_hung_task_warnings;
> +extern unsigned int sysctl_hung_task_filter_kthread;
> int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
> void *buffer, size_t *lenp, loff_t *ppos);
> #else
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 396ebaebea3f..74ad75c2dde8 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -48,6 +48,11 @@ unsigned long __read_mostly sysctl_hung_task_timeout_secs = CONFIG_DEFAULT_HUNG_
> */
> unsigned long __read_mostly sysctl_hung_task_check_interval_secs;
>
> +/*
> + * Non-zero means kernel threads are not checked
> + */
> +unsigned int __read_mostly sysctl_hung_task_filter_kthread = CONFIG_DEFAULT_HUNG_TASK_FILTER_KTHREAD;
> +
> int __read_mostly sysctl_hung_task_warnings = 10;
>
> static int __read_mostly did_panic;
> @@ -88,6 +93,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
> {
> unsigned long switch_count = t->nvcsw + t->nivcsw;
>
> + if (unlikely(sysctl_hung_task_filter_kthread && t->flags & PF_KTHREAD))
> + return;
> +
> /*
> * Ensure the task is not frozen.
> * Also, skip vfork and any other user process that freezer should skip.
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index d4a78e08f6d8..62067b9db486 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -2513,6 +2513,15 @@ static struct ctl_table kern_table[] = {
> .proc_handler = proc_dointvec_minmax,
> .extra1 = &neg_one,
> },
> + {
> + .procname = "hung_task_filter_kthread",
> + .data = &sysctl_hung_task_filter_kthread,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec_minmax,
> + .extra1 = SYSCTL_ZERO,
> + .extra2 = SYSCTL_ONE,
> + },
> #endif
> #ifdef CONFIG_RT_MUTEXES
> {
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 678c13967580..d7063f955987 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1110,6 +1110,21 @@ config DEFAULT_HUNG_TASK_TIMEOUT
> A timeout of 0 disables the check. The default is two minutes.
> Keeping the default should be fine in most cases.
>
> +config DEFAULT_HUNG_TASK_FILTER_KTHREAD
> + int "Default filter kthread for hung task"
> + depends on DETECT_HUNG_TASK
> + range 0 1
> + default 0
> + help
> + This option sets the default for whether the hung task detector
> + skips kernel threads that are in the TASK_UNINTERRUPTIBLE state.
> +
> + It can be adjusted at runtime via the kernel.hung_task_filter_kthread
> + sysctl or by writing a value to
> + /proc/sys/kernel/hung_task_filter_kthread.
> +
> + A value of 1 disables the check for kernel threads.
> +
> config BOOTPARAM_HUNG_TASK_PANIC
> bool "Panic (Reboot) On Hung Tasks"
> depends on DETECT_HUNG_TASK
> --
> 2.17.1
--
Michal Hocko
SUSE Labs
> Either those kernel threads should be fixed to use less sleep or
> annotate the sleep properly (TASK_IDLE).
The API is for debugging, for cases where we do not need to care about
the state of kernel threads.
Guanyou.Chen
On Thu, 5 Aug 2021 22:47:20 +0800 chenguanyou <[email protected]> wrote:
> > Either those kernel threads should be fixed to use less sleep or
> > annotate the sleep properly (TASK_IDLE).
>
> The API is for debugging, for cases where we do not need to care about
> the state of kernel threads.
>
Please explain this point in more detail?
> Please explain this point in more detail?
In my work, the root cause of a deadlock is more often in user threads.
Example:
PID: 711 TASK: ffffffc13eb71d00 CPU: 0 COMMAND: "[email protected]"
#0 [ffffff80251cbcb0] __switch_to at ffffff80080866c4
#1 [ffffff80251cbd20] __schedule at ffffff80090c0940
#2 [ffffff80251cbd80] schedule_preempt_disabled at ffffff80090c0e4c
#3 [ffffff80251cbde0] __mutex_lock at ffffff80090c2e58
#4 [ffffff80251cbe40] __mutex_lock_slowpath at ffffff80090c1f78
#5 [ffffff80251cbe50] mutex_lock at ffffff80090c1f60
#6 [ffffff80251cbe60] __fdget_pos at ffffff800829ac84
#7 [ffffff80251cbe90] sys_write at ffffff8008270550
#8 [ffffff80251cbff0] el0_svc_naked at ffffff8008083fbc
PID: 843 TASK: ffffffc135832b80 CPU: 2 COMMAND: "[email protected]"
#0 [ffffff802554bb30] __switch_to at ffffff80080866c4
#1 [ffffff802554bba0] __schedule at ffffff80090c0940
#2 [ffffff802554bc00] schedule at ffffff80090c0d54
#3 [ffffff802554bc50] xxx_sensor_show at ffffff8008bc043c
#4 [ffffff802554bc80] dev_attr_show at ffffff8008668ce0
#5 [ffffff802554bca0] sysfs_kf_seq_show at ffffff8008314e04
#6 [ffffff802554bce0] kernfs_seq_show at ffffff8008314314
#7 [ffffff802554bd10] seq_read at ffffff80082a250c
#8 [ffffff802554bd70] kernfs_fop_read at ffffff80083135b8
#9 [ffffff802554be20] __vfs_read at ffffff800826fcc0
#10 [ffffff802554be40] vfs_read at ffffff800826ff08
#11 [ffffff802554be90] sys_read at ffffff80082704c4
#12 [ffffff802554bff0] el0_svc_naked at ffffff8008083fbc
The root cause is a deadlock caused by both threads using the same fd, and
the file operation of PID 843 is a blocking one.
If we want hung_task to trigger a panic at the first occurrence,
we must avoid checking kernel threads on some platforms ("mediatek"),
because those kernel threads cause misjudgments.
Guanyou.Chen
On Sat 07-08-21 21:16:00, chenguanyou wrote:
> > Please explain this point in more detail?
>
> In my work, the root cause of a deadlock is more often in user threads.
>
> Example:
> PID: 711 TASK: ffffffc13eb71d00 CPU: 0 COMMAND: "[email protected]"
> #0 [ffffff80251cbcb0] __switch_to at ffffff80080866c4
> #1 [ffffff80251cbd20] __schedule at ffffff80090c0940
> #2 [ffffff80251cbd80] schedule_preempt_disabled at ffffff80090c0e4c
> #3 [ffffff80251cbde0] __mutex_lock at ffffff80090c2e58
> #4 [ffffff80251cbe40] __mutex_lock_slowpath at ffffff80090c1f78
> #5 [ffffff80251cbe50] mutex_lock at ffffff80090c1f60
> #6 [ffffff80251cbe60] __fdget_pos at ffffff800829ac84
> #7 [ffffff80251cbe90] sys_write at ffffff8008270550
> #8 [ffffff80251cbff0] el0_svc_naked at ffffff8008083fbc
>
> PID: 843 TASK: ffffffc135832b80 CPU: 2 COMMAND: "[email protected]"
> #0 [ffffff802554bb30] __switch_to at ffffff80080866c4
> #1 [ffffff802554bba0] __schedule at ffffff80090c0940
> #2 [ffffff802554bc00] schedule at ffffff80090c0d54
> #3 [ffffff802554bc50] xxx_sensor_show at ffffff8008bc043c
> #4 [ffffff802554bc80] dev_attr_show at ffffff8008668ce0
> #5 [ffffff802554bca0] sysfs_kf_seq_show at ffffff8008314e04
> #6 [ffffff802554bce0] kernfs_seq_show at ffffff8008314314
> #7 [ffffff802554bd10] seq_read at ffffff80082a250c
> #8 [ffffff802554bd70] kernfs_fop_read at ffffff80083135b8
> #9 [ffffff802554be20] __vfs_read at ffffff800826fcc0
> #10 [ffffff802554be40] vfs_read at ffffff800826ff08
> #11 [ffffff802554be90] sys_read at ffffff80082704c4
> #12 [ffffff802554bff0] el0_svc_naked at ffffff8008083fbc
>
> The root cause is a deadlock caused by both threads using the same fd, and
> the file operation of PID 843 is a blocking one.
> If we want hung_task to trigger a panic at the first occurrence,
> we must avoid checking kernel threads on some platforms ("mediatek"),
> because those kernel threads cause misjudgments.
This still suggests that the primary purpose of the interface is to
paper over real problems that should be fixed instead.
--
Michal Hocko
SUSE Labs
> This still suggests that the primary purpose of the interface is to
> paper over real problems that should be fixed instead.
I know, but I don't care about the state of kernel threads because it
doesn't need to be fixed.
The API is only for debugging.
Guanyou.Chen
On Mon 09-08-21 19:52:38, chenguanyou wrote:
> > This still suggests that the primary purpose of the interface is to
> > paper over real problems that should be fixed instead.
>
> I know, but I don't care about the state of kernel threads because it
> doesn't need to be fixed.
> The API is only for debugging.
Then I am afraid you have to keep this a local non-upstream feature in
your kernel. This doesn't look like upstream material to me.
--
Michal Hocko
SUSE Labs
> Then I am afraid you have to keep this a local non-upstream feature in
> your kernel. This doesn't look like upstream material to me.
Thank you for your reply.
Guanyou.Chen