From: Rokudo Yan
Subject: [PATCH] sched/core: fix the order of update sched_reset_on_fork and policy
Date: Thu, 12 May 2022 13:38:16 +0800
Message-ID: <20220512053816.27687-1-wu-yan@tcl.com>
X-Mailer: git-send-email 2.25.1
X-Mailing-List: linux-kernel@vger.kernel.org

When a child process is forked while the scheduler policy of its parent
is being updated, there is a small time window in which the child can
copy inconsistent policy parameters from the parent, giving the
unprivileged child an unexpectedly high priority. This may cause
unexpected behavior in the child process (it may even hog the CPU and
hang the system in some scenarios).

Resource manager (privileged)                 App task (unprivileged)
sched_setscheduler(p, policy=SCHED_FIFO|SCHED_RESET_ON_FORK)
                                              p->policy is SCHED_FIFO
...                                           p->sched_reset_on_fork is 1
...                                           clone
sched_setscheduler(p,policy=SCHED_NORMAL)     -kernel_clone
-_sched_setscheduler                           -copy_process
 -__sched_setscheduler                          -dup_task_struct
   set p->sched_reset_on_fork to 0                copy p's task struct
                                                  we get
...                                               policy = SCHED_FIFO
                                                  sched_reset_on_fork = 0
   set p->policy = SCHED_NORMAL

This leaks FIFO priority to the child task.

Signed-off-by: Rokudo Yan
Cc: Tang Ding
---
 kernel/fork.c       | 28 +++++++++++++++++++++++++++-
 kernel/sched/core.c | 20 +++++++++++++++++++-
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index f1e89007f228..90f3c3f59316 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -871,12 +871,19 @@ void set_task_stack_end_magic(struct task_struct *tsk)
 	*stackend = STACK_END_MAGIC;	/* for overflow detection */
 }
 
+static inline bool has_rt_dl_policy(struct task_struct *tsk)
+{
+	int policy = tsk->policy;
+	return policy == SCHED_FIFO || policy == SCHED_RR ||
+		policy == SCHED_DEADLINE;
+}
+
 static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 {
 	struct task_struct *tsk;
 	unsigned long *stack;
 	struct vm_struct *stack_vm_area __maybe_unused;
-	int err;
+	int err, reset_on_fork;
 
 	if (node == NUMA_NO_NODE)
 		node = tsk_fork_get_node(orig);
@@ -893,7 +900,26 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 
 	stack_vm_area = task_stack_vm_area(tsk);
 
+	reset_on_fork = orig->sched_reset_on_fork;
+	/*
+	 * Match the barrier before 'sched_reset_on_fork = 0' in __sched_setscheduler,
+	 * this guarantees that if we see 'sched_reset_on_fork = 0' we must also see
+	 * 'policy = SCHED_NORMAL' when the child is forked during updating parent to
+	 * normal class by sched_setscheduler(SCHED_NORMAL)
+	 */
+	smp_rmb();
 	err = arch_dup_task_struct(tsk, orig);
+	tsk->sched_reset_on_fork = reset_on_fork;
+	if (!reset_on_fork && has_rt_dl_policy(tsk)) {
+		/*
+		 * Match the barrier after 'sched_reset_on_fork = 1' in __sched_setscheduler,
+		 * this guarantees that if we see 'policy=SCHED_FIFO' we must also see
+		 * 'sched_reset_on_fork = 1' when the child is forked during updating parent
+		 * to rt class by sched_setscheduler(SCHED_FIFO|SCHED_RESET_ON_FORK)
+		 */
+		smp_rmb();
+		tsk->sched_reset_on_fork = orig->sched_reset_on_fork;
+	}
 
 	/*
 	 * arch_dup_task_struct() clobbers the stack-related fields.  Make
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1eec4925b8c6..8d12f29fd888 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7447,7 +7447,6 @@ static int __sched_setscheduler(struct task_struct *p,
 		goto unlock;
 	}
 
-	p->sched_reset_on_fork = reset_on_fork;
 	oldprio = p->prio;
 
 	newprio = __normal_prio(policy, attr->sched_priority, attr->sched_nice);
@@ -7472,6 +7471,15 @@ static int __sched_setscheduler(struct task_struct *p,
 		put_prev_task(rq, p);
 
 	prev_class = p->sched_class;
+	if (reset_on_fork && !p->sched_reset_on_fork) {
+		p->sched_reset_on_fork = 1;
+		/*
+		 * Make sure sched_reset_on_fork(1) visible before updating sched policy
+		 * to avoid rt/deadline priority leak to unprivileged child process if
+		 * it is forked during sched policy change of parent process.
+		 */
+		smp_wmb();
+	}
 
 	if (!(attr->sched_flags & SCHED_FLAG_KEEP_PARAMS)) {
 		__setscheduler_params(p, attr);
@@ -7479,6 +7487,16 @@ static int __sched_setscheduler(struct task_struct *p,
 	}
 	__setscheduler_uclamp(p, attr);
 
+	if (!reset_on_fork && p->sched_reset_on_fork) {
+		/*
+		 * Make sure p's sched policy visible before reset sched_reset_on_fork(0)
+		 * to avoid rt/deadline priority leak to unprivileged child process if
+		 * it is forked during sched policy change of parent process.
+		 */
+		smp_wmb();
+		p->sched_reset_on_fork = 0;
+	}
+
 	if (queued) {
 		/*
 		 * We enqueue to tail when the priority of a task is
-- 
2.25.1
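
For illustration only, the interleaving described in the commit message can be replayed in user space as a sequential toy model. The sketch below is not kernel code: struct model_task, model_fork(), leaks_rt() and the two replay functions are hypothetical names, and the model only captures the order of the two stores relative to the copy made at fork time. It does not model memory ordering, so it says nothing about the smp_wmb()/smp_rmb() pairing itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the two task_struct fields involved (not kernel types). */
enum model_policy { MODEL_NORMAL, MODEL_FIFO };

struct model_task {
	enum model_policy policy;
	bool reset_on_fork;
};

/* Stand-in for the plain structure copy done at fork time. */
static struct model_task model_fork(const struct model_task *parent)
{
	return *parent;
}

/* True when the child holds an RT policy that fork would not reset. */
static bool leaks_rt(const struct model_task *child)
{
	return child->policy == MODEL_FIFO && !child->reset_on_fork;
}

/* Pre-patch store order: clear the flag, fork lands here, then set policy. */
static struct model_task replay_old_order(void)
{
	struct model_task p = { MODEL_FIFO, true };
	struct model_task child;

	p.reset_on_fork = false;
	child = model_fork(&p);
	p.policy = MODEL_NORMAL;
	return child;
}

/* Patched store order: set policy, fork lands here, then clear the flag. */
static struct model_task replay_new_order(void)
{
	struct model_task p = { MODEL_FIFO, true };
	struct model_task child;

	p.policy = MODEL_NORMAL;
	child = model_fork(&p);
	p.reset_on_fork = false;
	return child;
}
```

Calling replay_old_order() from a small main() yields a child for which leaks_rt() is true, matching the "policy = SCHED_FIFO, sched_reset_on_fork = 0" outcome in the diagram. With replay_new_order() the child copies the normal policy together with a still-set reset flag, which in this model is the harmless combination.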