Date: Mon, 14 Feb 2022 10:16:57 +0100
From: Peter Zijlstra
To: Linus Torvalds
Cc: Borislav Petkov, Tadeusz Struk, x86-ml, lkml,
    zhangqiao22@huawei.com, tj@kernel.org, dietmar.eggemann@arm.com
Subject: Re: [GIT PULL] sched/urgent for 5.17-rc4

On Mon, Feb 14, 2022 at 09:45:22AM +0100, Peter Zijlstra wrote:
> On Sun, Feb 13, 2022 at 10:02:22AM -0800, Linus Torvalds wrote:
> > On Sun, Feb 13, 2022 at 4:37 AM Borislav Petkov wrote:
> > >
> > > Tadeusz Struk (1):
> > >       sched/fair: Fix fault in reweight_entity
> >
> > I've pulled this, but this really smells bad to me.
> >
> > If set_load_weight() can see a process that hasn't even had the
> > runqueue pointer set yet, then what keeps *others* from the same
> > thing?
>
> Urgh, I think you're right, the moment we enter the pidhash and become
> visible we should be complete. That means the previous commit
> (4ef0c5c6b5ba) is buggered... Let me try and make sense of all that
> cgroup stuff again :-(

Zhang, Tadeusz, TJ, how does this look?

---
 include/linux/sched/task.h |  4 ++--
 kernel/fork.c              |  9 ++++++++-
 kernel/sched/core.c        | 34 +++++++++++++++++++++-------------
 3 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index b9198a1b3a84..e84e54d1b490 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -54,8 +54,8 @@ extern asmlinkage void schedule_tail(struct task_struct *prev);
 extern void init_idle(struct task_struct *idle, int cpu);
 
 extern int sched_fork(unsigned long clone_flags, struct task_struct *p);
-extern void sched_post_fork(struct task_struct *p,
-			    struct kernel_clone_args *kargs);
+extern void sched_cgroup_fork(struct task_struct *p, struct kernel_clone_args *kargs);
+extern void sched_post_fork(struct task_struct *p);
 extern void sched_dead(struct task_struct *p);
 
 void __noreturn do_task_dead(void);
diff --git a/kernel/fork.c b/kernel/fork.c
index d75a528f7b21..05faebafe2b5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2266,6 +2266,13 @@ static __latent_entropy struct task_struct *copy_process(
 	if (retval)
 		goto bad_fork_put_pidfd;
 
+	/*
+	 * Now that the cgroups are pinned, re-clone the parent cgroup and put
+	 * the new task on the correct runqueue. All this *before* the task
+	 * becomes visible.
+	 */
+	sched_cgroup_fork(p, args);
+
 	/*
 	 * From this point on we must avoid any synchronous user-space
 	 * communication until we take the tasklist-lock. In particular, we do
@@ -2376,7 +2383,7 @@ static __latent_entropy struct task_struct *copy_process(
 	write_unlock_irq(&tasklist_lock);
 
 	proc_fork_connector(p);
-	sched_post_fork(p, args);
+	sched_post_fork(p);
 	cgroup_post_fork(p, args);
 	perf_event_fork(p);
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fcf0c180617c..dd97a42b1eee 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1214,9 +1214,8 @@ int tg_nop(struct task_group *tg, void *data)
 }
 #endif
 
-static void set_load_weight(struct task_struct *p)
+static void set_load_weight(struct task_struct *p, bool update_load)
 {
-	bool update_load = !(READ_ONCE(p->__state) & TASK_NEW);
 	int prio = p->static_prio - MAX_RT_PRIO;
 	struct load_weight *load = &p->se.load;
 
@@ -4407,7 +4406,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 		p->static_prio = NICE_TO_PRIO(0);
 		p->prio = p->normal_prio = p->static_prio;
 
-		set_load_weight(p);
+		set_load_weight(p, false);
 
 		/*
 		 * We don't need the reset flag anymore after the fork. It has
@@ -4425,6 +4424,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 
 	init_entity_runnable_average(&p->se);
 
+
 #ifdef CONFIG_SCHED_INFO
 	if (likely(sched_info_on()))
 		memset(&p->sched_info, 0, sizeof(p->sched_info));
@@ -4440,18 +4440,23 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 	return 0;
 }
 
-void sched_post_fork(struct task_struct *p, struct kernel_clone_args *kargs)
+void sched_cgroup_fork(struct task_struct *p, struct kernel_clone_args *kargs)
 {
 	unsigned long flags;
-#ifdef CONFIG_CGROUP_SCHED
-	struct task_group *tg;
-#endif
 
+	/*
+	 * Because we're not yet on the pid-hash, p->pi_lock isn't strictly
+	 * required yet, but lockdep gets upset if rules are violated.
+	 */
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 #ifdef CONFIG_CGROUP_SCHED
-	tg = container_of(kargs->cset->subsys[cpu_cgrp_id],
-			  struct task_group, css);
-	p->sched_task_group = autogroup_task_group(p, tg);
+	if (1) {
+		struct task_group *tg;
+		tg = container_of(kargs->cset->subsys[cpu_cgrp_id],
+				  struct task_group, css);
+		tg = autogroup_task_group(p, tg);
+		p->sched_task_group = autogroup_task_group(p, tg);
+	}
 #endif
 	rseq_migrate(p);
 	/*
@@ -4462,7 +4467,10 @@ void sched_post_fork(struct task_struct *p, struct kernel_clone_args *kargs)
 	if (p->sched_class->task_fork)
 		p->sched_class->task_fork(p);
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+}
 
+void sched_post_fork(struct task_struct *p)
+{
 	uclamp_post_fork(p);
 }
 
@@ -6922,7 +6930,7 @@ void set_user_nice(struct task_struct *p, long nice)
 		put_prev_task(rq, p);
 
 	p->static_prio = NICE_TO_PRIO(nice);
-	set_load_weight(p);
+	set_load_weight(p, true);
 	old_prio = p->prio;
 	p->prio = effective_prio(p);
 
@@ -7213,7 +7221,7 @@ static void __setscheduler_params(struct task_struct *p,
 	 */
 	p->rt_priority = attr->sched_priority;
 	p->normal_prio = normal_prio(p);
-	set_load_weight(p);
+	set_load_weight(p, true);
 }
 
 /*
@@ -9446,7 +9454,7 @@ void __init sched_init(void)
 #endif
 	}
 
-	set_load_weight(&init_task);
+	set_load_weight(&init_task, false);
 
 	/*
 	 * The boot idle thread does lazy MMU switching as well:
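
To illustrate the invariant the patch is restoring, here is a minimal userspace
sketch (plain C11 with pthreads, not kernel code; struct task, pidhash_slot,
forker() and observer() are invented stand-ins): the forking side finishes all
initialization and only then publishes the task with a release store, so a
concurrent lookup can never observe a half-built task. The kernel analogue is
that sched_cgroup_fork() now runs in copy_process() before the pid is attached,
while sched_post_fork() keeps only the work (uclamp_post_fork()) that is safe
once the task is visible.

/*
 * Userspace model of "fully initialize, then publish":
 * pidhash_slot stands in for the pidhash, forker() for copy_process(),
 * observer() for any code doing a pid lookup.  Build with: cc -pthread
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct task {
	int load_weight;	/* stands in for p->se.load / static_prio state */
	void *rq;		/* stands in for the runqueue pointer */
};

static _Atomic(struct task *) pidhash_slot;	/* the "visibility" point */

static void *forker(void *arg)
{
	struct task *p = calloc(1, sizeof(*p));

	(void)arg;
	/* Finish *all* initialization first (sched_cgroup_fork() analogue). */
	p->load_weight = 1024;
	p->rq = (void *)0x1;	/* pretend the task was placed on a runqueue */

	/* Only then make the task visible (pidhash insertion analogue). */
	atomic_store_explicit(&pidhash_slot, p, memory_order_release);
	return NULL;
}

static void *observer(void *arg)
{
	struct task *p;

	(void)arg;
	/* A lookup racing with fork must never see a half-built task. */
	while (!(p = atomic_load_explicit(&pidhash_slot, memory_order_acquire)))
		;
	printf("observed load_weight=%d rq=%p\n", p->load_weight, p->rq);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, observer, NULL);
	pthread_create(&b, NULL, forker, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}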