From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Paul Burton,
    "Peter Zijlstra (Intel)", Linus Torvalds, Thomas Gleixner,
    Ingo Molnar, Sasha Levin
Subject: [PATCH 4.14 58/61] sched/core: Require cpu_active() in select_task_rq(), for user tasks
Date: Fri, 6 Jul 2018 07:47:22 +0200
Message-Id: <20180706054714.558088084@linuxfoundation.org>
In-Reply-To: <20180706054712.332416244@linuxfoundation.org>
References: <20180706054712.332416244@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Paul Burton

[ Upstream commit 7af443ee1697607541c6346c87385adab2214743 ]

select_task_rq() is used in a few paths to select the CPU upon which a
thread should be run - for example it is used by try_to_wake_up() & by
fork or exec balancing. As-is it allows use of any online CPU that is
present in the task's cpus_allowed mask.

This presents a problem because there is a period whilst CPUs are
brought online where a CPU is marked online, but is not yet fully
initialized - i.e. the period where CPUHP_AP_ONLINE_IDLE <= state <
CPUHP_ONLINE. Usually we don't run any user tasks during this window,
but there are corner cases where this can happen. An example observed
is:

  - Some user task A, running on CPU X, forks to create task B.

  - sched_fork() calls __set_task_cpu() with cpu=X, setting task B's
    task_struct::cpu field to X.

  - CPU X is offlined.

  - Task A, currently somewhere between the __set_task_cpu() in
    copy_process() and the call to wake_up_new_task(), is migrated to
    CPU Y by migrate_tasks() when CPU X is offlined.

  - CPU X is onlined, but still in the CPUHP_AP_ONLINE_IDLE state. The
    scheduler is now active on CPU X, but there are no user tasks on
    the runqueue.

  - Task A runs on CPU Y & reaches wake_up_new_task(). This calls
    select_task_rq() with cpu=X, taken from task B's task_struct,
    and select_task_rq() allows CPU X to be returned.

  - Task A enqueues task B on CPU X's runqueue, via activate_task() &
    enqueue_task().

  - CPU X now has a user task on its runqueue before it has reached the
    CPUHP_ONLINE state.

In most cases, the user tasks that schedule on the newly onlined CPU
have no idea that anything went wrong, but one case observed to be
problematic is if the task goes on to invoke the sched_setaffinity
syscall. The newly onlined CPU reaches the CPUHP_AP_ONLINE_IDLE state
before the CPU that brought it online calls stop_machine_unpark().
This means that for a portion of the window of time between
CPUHP_AP_ONLINE_IDLE & CPUHP_ONLINE the newly onlined CPU's struct
cpu_stopper has its enabled field set to false. If a user thread is
executed on the CPU during this window and it invokes sched_setaffinity
with a CPU mask that does not include the CPU it's running on, then
when __set_cpus_allowed_ptr() calls stop_one_cpu() intending to invoke
migration_cpu_stop() and perform the actual migration away from the
CPU, stop_one_cpu() will simply return -ENOENT rather than calling
migration_cpu_stop(). We then return from the sched_setaffinity syscall
back to the user task that is now running on a CPU which it just asked
not to run on, and which is not present in its cpus_allowed mask.

This patch resolves the problem by having select_task_rq() enforce that
user tasks run on CPUs that are active - the same requirement that
select_fallback_rq() already enforces. This should ensure that newly
onlined CPUs reach the CPUHP_AP_ACTIVE state before being able to
schedule user tasks, and also implies that bringup_wait_for_ap() will
have called stop_machine_unpark() which resolves the sched_setaffinity
issue above.

I haven't yet investigated this, but it may be of interest to review
whether any of the actions performed by hotplug states between
CPUHP_AP_ONLINE_IDLE & CPUHP_AP_ACTIVE could have similar unintended
effects on user tasks that might schedule before those states are
reached, which might widen the scope of the problem from just affecting
the behaviour of sched_setaffinity.

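As a rough userspace illustration of the sched_setaffinity failure
described above (this is not kernel code; struct model_stopper,
model_stop_one_cpu() and model_migration_stop() are hypothetical
stand-ins that model only the enabled flag and the -ENOENT return
mentioned in the description):

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the kernel's struct cpu_stopper. Only the
 * enabled flag discussed in the description is modelled.
 */
struct model_stopper {
	bool enabled;
};

typedef int (*model_stop_fn)(void *arg);

/*
 * Models the described behaviour: while the stopper is not enabled, the
 * callback is never invoked and -ENOENT is returned instead.
 */
static int model_stop_one_cpu(const struct model_stopper *stopper,
			      model_stop_fn fn, void *arg)
{
	if (!stopper->enabled)
		return -ENOENT;

	return fn(arg);
}

/* Stand-in for migration_cpu_stop(): just records that it ran. */
static int model_migration_stop(void *arg)
{
	bool *migrated = arg;

	*migrated = true;
	return 0;
}
```

Calling model_stop_one_cpu() with enabled == false leaves the
"migrated" flag untouched, which corresponds to the task staying on a
CPU that it has just excluded from its affinity mask.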
Signed-off-by: Paul Burton
Signed-off-by: Peter Zijlstra (Intel)
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Link: http://lkml.kernel.org/r/20180526154648.11635-2-paul.burton@mips.com
Signed-off-by: Ingo Molnar
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
---
 kernel/sched/core.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1573,8 +1573,7 @@ int select_task_rq(struct task_struct *p
 	 * [ this allows ->select_task() to simply return task_cpu(p) and
 	 *   not worry about this generic constraint ]
 	 */
-	if (unlikely(!cpumask_test_cpu(cpu, &p->cpus_allowed) ||
-		     !cpu_online(cpu)))
+	if (unlikely(!is_cpu_allowed(p, cpu)))
 		cpu = select_fallback_rq(task_cpu(p), p);
 
 	return cpu;
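
For readers who want to see the effect of the changed condition in
isolation, here is a minimal userspace sketch. It is not the kernel
implementation: the body of is_cpu_allowed() is not part of this diff,
so model_new_check_user() only encodes the rule stated in the
description (user tasks need an active CPU). The state enum, the task
struct and all model_* names are hypothetical, and the model treats a
CPU as online from CPUHP_AP_ONLINE_IDLE onwards, which is a
simplification.

```c
#include <stdbool.h>

/*
 * Hypothetical, heavily simplified hotplug states. The kernel's
 * enum cpuhp_state has many more entries between these.
 */
enum model_cpu_state {
	MODEL_OFFLINE,
	MODEL_AP_ONLINE_IDLE,	/* marked online, not yet fully initialized */
	MODEL_AP_ACTIVE,	/* active: user tasks may be scheduled */
	MODEL_ONLINE,
};

/* Hypothetical task: bit N of cpus_allowed set means CPU N is permitted. */
struct model_task {
	unsigned long cpus_allowed;
};

static bool model_cpu_online(enum model_cpu_state s)
{
	return s >= MODEL_AP_ONLINE_IDLE;
}

static bool model_cpu_active(enum model_cpu_state s)
{
	return s >= MODEL_AP_ACTIVE;
}

static bool model_cpu_in_mask(const struct model_task *p, int cpu)
{
	return (p->cpus_allowed >> cpu) & 1UL;
}

/* Old condition: CPU is in the affinity mask and merely online. */
static bool model_old_check(const struct model_task *p, int cpu,
			    const enum model_cpu_state *states)
{
	return model_cpu_in_mask(p, cpu) && model_cpu_online(states[cpu]);
}

/* New rule for user tasks: CPU is in the affinity mask and active. */
static bool model_new_check_user(const struct model_task *p, int cpu,
				 const enum model_cpu_state *states)
{
	return model_cpu_in_mask(p, cpu) && model_cpu_active(states[cpu]);
}
```

With CPU 0 in MODEL_AP_ONLINE_IDLE and present in the task's mask, the
old check accepts CPU 0 while the new one rejects it. In
select_task_rq() a rejected CPU leads to the select_fallback_rq() call
shown in the hunk above.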