Date: Mon, 28 Nov 2022 16:48:04 +0100
From: Petr Mladek
To: akpm@linux-foundation.org, peterz@infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] kernel/hung_task: print real_parent->comm, pid in check_hung_task
In-Reply-To: <20221124112526.GA21832@didi-ThinkCentre-M930t-N000>

On Thu 2022-11-24 19:25:26, Tio Zhang wrote:
> We can avoid a hung task by fixing the process who causes it.
> But sometimes it is difficult to find out which service
> the bad process belongs to by only knowing its pid and comm.
> Since userspace tools to learn who launches the bad process
> do not always work when we get a hung task,
> it is helpful printing the parent by kernel.

Could you please be more specific how the information about the parent
helped to debug the problem? Was it really important who started the
process? Was it related to some cgroup limits or permissions?

> Signed-off-by: Tio Zhang
> ---
>  kernel/hung_task.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index c71889f3f3fc..33543d27bd5c 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -89,6 +89,7 @@ static struct notifier_block panic_block = {
>
>  static void check_hung_task(struct task_struct *t, unsigned long timeout)
>  {
> +	struct task_struct *p = t->real_parent;

IMHO, this should be read using rcu_dereference(t->real_parent).
Note that check_hung_task() is already called under rcu_read_lock()
from check_hung_uninterruptible_tasks().

>  	unsigned long switch_count = t->nvcsw + t->nivcsw;
>
>  	/*
> @@ -129,8 +130,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
>  	if (sysctl_hung_task_warnings) {
>  		if (sysctl_hung_task_warnings > 0)
>  			sysctl_hung_task_warnings--;
> -		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> -		       t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
> +		pr_err("INFO: task %s:%d, parent %s:%d blocked for more than %ld seconds.\n",
> +		       t->comm, t->pid, p->comm, p->pid, (jiffies - t->last_switch_time) / HZ);

IMHO, this is the wrong place. The wording does more harm than good.
It might make people think that both processes are blocked. Or it gives
the impression that the parent somehow created the deadlock.

But if I get it correctly, the information about the parent is needed
only in special situations where only a particular parent triggers
the lockup.

>  		pr_err("      %s %s %.*s\n",
>  			print_tainted(), init_utsname()->release,
>  			(int)strcspn(init_utsname()->version, " "),

An alternative solution would be to print the parent in sched_show_task(),
which is called here as well. sched_show_task() prints a lot of
information that might be useful for debugging, and the parent is just
another piece of it. Also, sched_show_task() is called in more situations
where this information might be useful as well.

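To illustrate the alternative, here is a minimal userspace sketch, not
kernel code. It assumes a made-up struct mock_task standing in for the few
task_struct fields discussed here, and two hypothetical helpers,
format_hung_warning() and format_task_parent(). The output format of the
second helper is invented for illustration and is not what
sched_show_task() prints. The point is only that the warning keeps naming
the blocked task alone, while the parent shows up on a separate per-task
line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the task_struct fields used in this thread. */
struct mock_task {
	char comm[16];				/* TASK_COMM_LEN is 16 in the kernel */
	int pid;
	const struct mock_task *real_parent;	/* NULL when there is no parent */
};

/* Unchanged warning line: it names only the task that is blocked. */
static int format_hung_warning(char *buf, size_t len,
			       const struct mock_task *t, long blocked_secs)
{
	assert(buf && t);
	return snprintf(buf, len,
			"INFO: task %s:%d blocked for more than %ld seconds.\n",
			t->comm, t->pid, blocked_secs);
}

/*
 * Hypothetical per-task line that reports the parent separately.
 * In the kernel, real_parent would be read with rcu_dereference()
 * while holding rcu_read_lock(); a plain load is enough in this mock.
 */
static int format_task_parent(char *buf, size_t len,
			      const struct mock_task *t)
{
	const struct mock_task *p;

	assert(buf && t);
	p = t->real_parent;
	if (!p)
		return snprintf(buf, len, "task:%s pid:%d parent:(none)\n",
				t->comm, t->pid);
	return snprintf(buf, len, "task:%s pid:%d parent:%s ppid:%d\n",
			t->comm, t->pid, p->comm, p->pid);
}
```

With a task "worker" (pid 4242) whose parent is "systemd" (pid 1), the
first helper produces the familiar warning and the second one produces
"task:worker pid:4242 parent:systemd ppid:1", so a reader cannot mistake
the parent for a second blocked task.
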
Best Regards,
Petr