Date: Mon, 15 Aug 2022 15:41:43 +0100
From: Mel Gorman
To: Ingo Molnar
Cc: Linus Torvalds, David Hildenbrand, linux-kernel@vger.kernel.org, linux-mm@kvack.org, stable@vger.kernel.org, Andrew Morton, Greg Kroah-Hartman, Axel Rasmussen, Peter Xu, Hugh Dickins, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka, John Hubbard, Jason Gunthorpe, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt
Subject: Re: [PATCH v2] sched/all: Change all BUG_ON() instances in the scheduler to WARN_ON_ONCE()
Message-ID: <20220815144143.zjsiamw5y22bvgki@suse.de>
References: <20220808073232.8808-1-david@redhat.com> <1a48d71d-41ee-bf39-80d2-0102f4fe9ccb@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Aug 12, 2022 at 11:29:18AM +0200, Ingo Molnar wrote:
> From: Ingo Molnar
> Date: Thu, 11 Aug 2022 08:54:52 +0200
> Subject: [PATCH] sched/all: Change all BUG_ON() instances in the scheduler to WARN_ON_ONCE()
>
> There's no good reason to crash a
> user's system with a BUG_ON(),
> chances are high that they'll never even see the crash message on
> Xorg, and it won't make it into the syslog either.
>
> By using a WARN_ON_ONCE() we at least give the user a chance to report
> any bugs triggered here - instead of getting silent hangs.
>
> None of these WARN_ON_ONCE()s are supposed to trigger, ever - so we ignore
> cases where a NULL check is done via a BUG_ON() and we let a NULL
> pointer through after a WARN_ON_ONCE().
>
> There's one exception: WARN_ON_ONCE() arguments with side-effects,
> such as locking - in this case we use the return value of the
> WARN_ON_ONCE(), such as in:
>
> -	BUG_ON(!lock_task_sighand(p, &flags));
> +	if (WARN_ON_ONCE(!lock_task_sighand(p, &flags)))
> +		return;
>
> Suggested-by: Linus Torvalds
> Signed-off-by: Ingo Molnar
> Link: https://lore.kernel.org/r/YvSsKcAXISmshtHo@gmail.com
> ---
>  kernel/sched/autogroup.c |  3 ++-
>  kernel/sched/core.c      |  2 +-
>  kernel/sched/cpupri.c    |  2 +-
>  kernel/sched/deadline.c  | 26 +++++++++++++-------------
>  kernel/sched/fair.c      | 10 +++++-----
>  kernel/sched/rt.c        |  2 +-
>  kernel/sched/sched.h     |  6 +++---
>  7 files changed, 26 insertions(+), 25 deletions(-)
>
> diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
> index fa9ce9d83683..a286e726eb4b 100644
> --- a/kernel/sched/cpupri.c
> +++ b/kernel/sched/cpupri.c
> @@ -147,7 +147,7 @@ int cpupri_find_fitness(struct cpupri *cp, struct task_struct *p,
>  	int task_pri = convert_prio(p->prio);
>  	int idx, cpu;
>
> -	BUG_ON(task_pri >= CPUPRI_NR_PRIORITIES);
> +	WARN_ON_ONCE(task_pri >= CPUPRI_NR_PRIORITIES);
>
>  	for (idx = 0; idx < task_pri; idx++) {
>

Should the return value be used here to clamp task_pri to
CPUPRI_NR_PRIORITIES? task_pri bounds the index that __cpupri_find then
uses, so an out-of-range value accesses beyond the end of an array and the
future behaviour will be very unpredictable.
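To make the clamping question concrete, here is a rough userspace sketch, not a proposed patch. It relies only on the fact that WARN_ON_ONCE() evaluates to the truth value of its condition, as the lock_task_sighand() example in the changelog already does. The names warn_on_once_sketch(), scan_priorities_sketch() and NR_PRIORITIES_SKETCH are hypothetical stand-ins invented for illustration, and the array size is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for CPUPRI_NR_PRIORITIES; the value is arbitrary. */
#define NR_PRIORITIES_SKETCH 8

/*
 * Userspace stand-in for WARN_ON_ONCE(): print at most one warning per
 * 'warned' flag and return whether the condition held.
 */
static bool warn_on_once_sketch(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Walk the priority slots below task_pri; return how many were visited. */
static int scan_priorities_sketch(int task_pri)
{
	static bool warned;
	int idx, scanned = 0;

	/* Warn, then clamp so the loop below cannot index past the array. */
	if (warn_on_once_sketch(task_pri >= NR_PRIORITIES_SKETCH, &warned,
				"task_pri >= NR_PRIORITIES_SKETCH"))
		task_pri = NR_PRIORITIES_SKETCH;

	for (idx = 0; idx < task_pri; idx++) {
		assert(idx < NR_PRIORITIES_SKETCH); /* in bounds after the clamp */
		scanned++;
	}
	return scanned;
}
```

With the clamp, an out-of-range task_pri still produces a warning but the loop bound stays within the array, which is the predictable behaviour being asked about.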
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 0ab79d819a0d..962b169b05cf 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2017,7 +2017,7 @@ static struct task_struct *pick_task_dl(struct rq *rq)
>  		return NULL;
>
>  	dl_se = pick_next_dl_entity(dl_rq);
> -	BUG_ON(!dl_se);
> +	WARN_ON_ONCE(!dl_se);
>  	p = dl_task_of(dl_se);
>
>  	return p;

It's a somewhat redundant check; it'll hit a NULL pointer dereference
shortly afterwards, but the warning makes the cause a bit more obvious.

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 914096c5b1ae..28f10dccd194 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2600,7 +2600,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
>  	if (!join)
>  		return;
>
> -	BUG_ON(irqs_disabled());
> +	WARN_ON_ONCE(irqs_disabled());
>  	double_lock_irq(&my_grp->lock, &grp->lock);
>
>  	for (i = 0; i < NR_NUMA_HINT_FAULT_STATS * nr_node_ids; i++) {

Recoverable with a goto no_join. It'll be a terrible recovery because
there is no way IRQs should be disabled here. Something else incredibly
bad happened before this would fire.

> @@ -7279,7 +7279,7 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
>  		return;
>
>  	find_matching_se(&se, &pse);
> -	BUG_ON(!pse);
> +	WARN_ON_ONCE(!pse);
>
>  	cse_is_idle = se_is_idle(se);
>  	pse_is_idle = se_is_idle(pse);

Similar to pick_task_dl.

> @@ -10134,7 +10134,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  		goto out_balanced;
>  	}
>
> -	BUG_ON(busiest == env.dst_rq);
> +	WARN_ON_ONCE(busiest == env.dst_rq);
>
>  	schedstat_add(sd->lb_imbalance[idle], env.imbalance);
>

goto out if it triggers? It'll just continue to be unbalanced.

> @@ -10430,7 +10430,7 @@ static int active_load_balance_cpu_stop(void *data)
>  	 * we need to fix it. Originally reported by
>  	 * Bjorn Helgaas on a 128-CPU setup.
>  	 */
> -	BUG_ON(busiest_rq == target_rq);
> +	WARN_ON_ONCE(busiest_rq == target_rq);
>
>  	/* Search for an sd spanning us and the target CPU. */
>  	rcu_read_lock();

goto out_unlock if it fires?

For the rest, I didn't see obvious recovery paths that would allow the
system to run predictably. Any of them firing will have unpredictable
consequences (e.g. move_queued_task firing would be fun if it was a
per-cpu kthread). Depending on which warning triggers, the remaining life
of the system may be very short, but maybe long enough for the warning to
be logged even if the system locks up shortly afterwards.

-- 
Mel Gorman
SUSE Labs