From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Joel Savitz, Waiman Long, Phil Auld, Peter Zijlstra, Tejun Heo,
    Sasha Levin, cgroups@vger.kernel.org
Subject: [PATCH AUTOSEL 5.1 46/51] cpuset: restore sanity to cpuset_cpus_allowed_fallback()
Date: Tue, 25 Jun 2019 23:41:02 -0400
Message-Id:
<20190626034117.23247-46-sashal@kernel.org>
In-Reply-To: <20190626034117.23247-1-sashal@kernel.org>
References: <20190626034117.23247-1-sashal@kernel.org>

From: Joel Savitz

[ Upstream commit d477f8c202d1f0d4791ab1263ca7657bbe5cf79e ]

In the case that a process is constrained by taskset(1) (i.e.
sched_setaffinity(2)) to a subset of available cpus, and all of those
are subsequently offlined, the scheduler will set tsk->cpus_allowed to
the current value of task_cs(tsk)->effective_cpus.

This is done via a call to do_set_cpus_allowed() in the context of
cpuset_cpus_allowed_fallback() made by the scheduler when this case is
detected. This is the only call made to cpuset_cpus_allowed_fallback()
in the latest mainline kernel.

However, this is not sane behavior.

I will demonstrate this on a system running the latest upstream kernel
with the following initial configuration:

	# grep -i cpu /proc/$$/status
	Cpus_allowed:	ffffffff,ffffffff
	Cpus_allowed_list:	0-63

(Where cpus 32-63 are provided via smt.)
If we limit our current shell process to cpu2 only and then offline it
and reonline it:

	# taskset -p 4 $$
	pid 2272's current affinity mask: ffffffffffffffff
	pid 2272's new affinity mask: 4

	# echo off > /sys/devices/system/cpu/cpu2/online
	# dmesg | tail -3
	[ 2195.866089] process 2272 (bash) no longer affine to cpu2
	[ 2195.872700] IRQ 114: no longer affine to CPU2
	[ 2195.879128] smpboot: CPU 2 is now offline

	# echo on > /sys/devices/system/cpu/cpu2/online
	# dmesg | tail -1
	[ 2617.043572] smpboot: Booting Node 0 Processor 2 APIC 0x4

We see that our current process now has an affinity mask containing
every cpu available on the system _except_ the one we originally
constrained it to:

	# grep -i cpu /proc/$$/status
	Cpus_allowed:	ffffffff,fffffffb
	Cpus_allowed_list:	0-1,3-63

This is not sane behavior, as the scheduler can now not only place the
process on previously forbidden cpus, it can't even schedule it on the
cpu it was originally constrained to!

Other cases result in even more exotic affinity masks. Take for
instance a process with an affinity mask containing only cpus provided
by smt at the moment that smt is toggled, in a configuration such as
the following:

	# taskset -p f000000000 $$
	# grep -i cpu /proc/$$/status
	Cpus_allowed:	000000f0,00000000
	Cpus_allowed_list:	36-39

A double toggle of smt results in the following behavior:

	# echo off > /sys/devices/system/cpu/smt/control
	# echo on > /sys/devices/system/cpu/smt/control
	# grep -i cpus /proc/$$/status
	Cpus_allowed:	ffffff00,ffffffff
	Cpus_allowed_list:	0-31,40-63

This is even less sane than the previous case, as the new affinity mask
excludes all smt-provided cpus with ids less than those that were
previously in the affinity mask, as well as those that were actually in
the mask.

With this patch applied, both of these cases end in the following state:

	# grep -i cpu /proc/$$/status
	Cpus_allowed:	ffffffff,ffffffff
	Cpus_allowed_list:	0-63

The original policy is discarded.
Though not ideal, it is the simplest way to restore sanity to this
fallback case without reinventing the cpuset wheel that rolls down the
kernel just fine in cgroup v2. A user who wishes for the previous
affinity mask to be restored in this fallback case can use that
mechanism instead.

This patch modifies scheduler behavior by instead resetting the mask to
task_cs(tsk)->cpus_allowed by default, and cpu_possible mask in legacy
mode. I tested the cases above on both modes.

Note that the scheduler uses this fallback mechanism if and only if
_every_ other valid avenue has been traveled, and it is the last resort
before calling BUG().

Suggested-by: Waiman Long
Suggested-by: Phil Auld
Signed-off-by: Joel Savitz
Acked-by: Phil Auld
Acked-by: Waiman Long
Acked-by: Peter Zijlstra (Intel)
Signed-off-by: Tejun Heo
Signed-off-by: Sasha Levin
---
 kernel/cgroup/cpuset.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 4834c4214e9c..6c9deb2cc687 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3255,10 +3255,23 @@ void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 	spin_unlock_irqrestore(&callback_lock, flags);
 }
 
+/**
+ * cpuset_cpus_allowed_fallback - final fallback before complete catastrophe.
+ * @tsk: pointer to task_struct with which the scheduler is struggling
+ *
+ * Description: In the case that the scheduler cannot find an allowed cpu in
+ * tsk->cpus_allowed, we fall back to task_cs(tsk)->cpus_allowed. In legacy
+ * mode however, this value is the same as task_cs(tsk)->effective_cpus,
+ * which will not contain a sane cpumask during cases such as cpu hotplugging.
+ * This is the absolute last resort for the scheduler and it is only used if
+ * _every_ other avenue has been traveled.
+ **/
+
 void cpuset_cpus_allowed_fallback(struct task_struct *tsk)
 {
 	rcu_read_lock();
-	do_set_cpus_allowed(tsk, task_cs(tsk)->effective_cpus);
+	do_set_cpus_allowed(tsk, is_in_v2_mode() ?
+		task_cs(tsk)->cpus_allowed : cpu_possible_mask);
 	rcu_read_unlock();
 
 	/*
-- 
2.20.1