Date: Thu, 28 Sep 2023 21:11:10 -0000
From: "tip-bot2 for Joel Fernandes (Google)"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: sched/urgent] sched/rt: Fix live lock between select_fallback_rq() and RT push
Cc: "Joel Fernandes (Google)", Ingo Molnar, "Paul E. McKenney",
 stable@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20230923011409.3522762-1-joel@joelfernandes.org>
References: <20230923011409.3522762-1-joel@joelfernandes.org>
Message-ID: <169593547011.27769.15927547566549866294.tip-bot2@tip-bot2>

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     fc09027786c900368de98d03d40af058bcb01ad9
Gitweb:        https://git.kernel.org/tip/fc09027786c900368de98d03d40af058bcb01ad9
Author:        Joel Fernandes (Google)
AuthorDate:    Sat, 23 Sep 2023 01:14:08
Committer:     Ingo Molnar
CommitterDate: Thu, 28 Sep 2023 22:58:13 +02:00

sched/rt: Fix live lock between select_fallback_rq() and RT push

During RCU-boost testing with the TREE03 rcutorture config, I found that
after a few hours, the machine locks up.

On tracing, I found that there is a live lock happening between 2 CPUs.
One CPU has an RT task running, while another CPU is being offlined
which also has an RT task running. During this offlining, all threads
are migrated. The migration thread is repeatedly scheduled to migrate
actively running tasks on the CPU being offlined. This results in a live
lock because select_fallback_rq() keeps picking the CPU that an RT task
is already running on only to get pushed back to the CPU being offlined.

It is anyway pointless to pick CPUs for pushing tasks to if they are
being offlined only to get migrated away to somewhere else. This could
also add unwanted latency to this task.

Fix these issues by not selecting CPUs in RT if they are not 'active'
for scheduling, using the cpu_active_mask. Other parts in core.c already
use cpu_active_mask to prevent tasks from being put on CPUs going
offline.

With this fix I ran the tests for days and could not reproduce the
hang. Without the patch, I hit it in a few hours.

Signed-off-by: Joel Fernandes (Google)
Signed-off-by: Ingo Molnar
Tested-by: Paul E. McKenney
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230923011409.3522762-1-joel@joelfernandes.org
---
 kernel/sched/cpupri.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/cpupri.c b/kernel/sched/cpupri.c
index a286e72..42c40cf 100644
--- a/kernel/sched/cpupri.c
+++ b/kernel/sched/cpupri.c
@@ -101,6 +101,7 @@ static inline int __cpupri_find(struct cpupri *cp, struct task_struct *p,
 
 	if (lowest_mask) {
 		cpumask_and(lowest_mask, &p->cpus_mask, vec->mask);
+		cpumask_and(lowest_mask, lowest_mask, cpu_active_mask);
 
 		/*
 		 * We have to ensure that we have at least one bit
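
For readers who want to see the effect of the added cpumask_and() in
isolation, here is a minimal userspace sketch. It is not kernel code:
toy_cpumask_t, toy_cpu_bit() and toy_find_lowest() are hypothetical
names invented for this illustration, and a 64-bit integer stands in
for struct cpumask. It only models the mask arithmetic, on the
assumption that a CPU being offlined has already been cleared from the
active mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for struct cpumask: bit N set means CPU N is in the set. */
typedef uint64_t toy_cpumask_t;

/* Return a mask with only the bit for @cpu set (CPUs 0..63 only). */
static inline toy_cpumask_t toy_cpu_bit(unsigned int cpu)
{
	assert(cpu < 64);
	return (toy_cpumask_t)1 << cpu;
}

/*
 * Hypothetical analogue of the lowest_mask computation in the patch:
 * intersect the task's allowed CPUs with the CPUs at a priority level,
 * then, if @apply_fix is set, also drop CPUs that are not active.
 * Returns true if at least one candidate CPU remains.
 */
static bool toy_find_lowest(toy_cpumask_t task_allowed,
			    toy_cpumask_t vec_mask,
			    toy_cpumask_t active_mask,
			    bool apply_fix,
			    toy_cpumask_t *lowest_mask)
{
	*lowest_mask = task_allowed & vec_mask;
	if (apply_fix)
		*lowest_mask &= active_mask;
	return *lowest_mask != 0;
}
```

With a task allowed on CPUs 0-1, a priority vector covering CPUs 1-2,
and CPU 1 missing from the active mask, the unfixed variant still
reports CPU 1 as a push target, while the fixed variant reports no
candidate at all, so the task is never steered toward the CPU that is
going away.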