Date: Tue, 7 Jul 2020 13:36:40 +0100
From: Qais Yousef
To: Valentin Schneider
Cc: Ingo Molnar, Peter Zijlstra, Doug Anderson, Jonathan Corbet,
    Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
    Ben Segall, Mel Gorman, Luis Chamberlain, Kees Cook, Iurii Zaikin,
    Quentin Perret, Patrick Bellasi, Pavan Kondeti,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v6 1/2] sched/uclamp: Add a new sysctl to control RT default boost value
Message-ID: <20200707123640.lahojmq2s4byhkhl@e107158-lin.cambridge.arm.com>
References: <20200706142839.26629-1-qais.yousef@arm.com>
    <20200706142839.26629-2-qais.yousef@arm.com>
    <20200707093447.4t6eqjy4fkt747fo@e107158-lin.cambridge.arm.com>

On 07/07/20 12:30, Valentin Schneider wrote:
>
> On 07/07/20 10:34, Qais Yousef wrote:
> > On 07/06/20 16:49, Valentin Schneider wrote:
> >>
> >> On 06/07/20 15:28, Qais Yousef wrote:
> >> > CC: linux-fsdevel@vger.kernel.org
> >> > ---
> >> >
> >> > Peter
> >> >
> >> > I didn't do the
> >> >
> >> >     read_lock(&tasklist_lock);
> >> >     smp_mb__after_spinlock();
> >> >     read_unlock(&tasklist_lock);
> >> >
> >> > dance you suggested on IRC as it didn't seem necessary. But maybe I missed
> >> > something.
> >> >
> >>
> >> So the annoying bit with just uclamp_fork() is that it happens *before* the
> >> task is appended to the tasklist. This means without too much care we
> >> would have (if we'd do a sync at uclamp_fork()):
> >>
> >>   CPU0 (sysctl write)                CPU1 (concurrent forker)
> >>
> >>                                      copy_process()
> >>                                        uclamp_fork()
> >>                                          p.uclamp_min = state
> >>     state = foo
> >>
> >>     for_each_process_thread(p, t)
> >>       update_state(t);
> >>                                      list_add(p)
> >>
> >> i.e. that newly forked process would entirely sidestep the update. Now,
> >> with Peter's suggested approach we can be in a much better situation. If we
> >> have this in the sysctl update:
> >>
> >>   state = foo;
> >>
> >>   read_lock(&tasklist_lock);
> >>   smp_mb__after_spinlock();
> >>   read_unlock(&tasklist_lock);
> >>
> >>   for_each_process_thread(p, t)
> >>     update_state(t);
> >>
> >> While having this in the fork:
> >>
> >>   write_lock(&tasklist_lock);
> >>   list_add(p);
> >>   write_unlock(&tasklist_lock);
> >>
> >>   sched_post_fork(p); // state re-read here; probably wants an mb first
> >>
> >> Then we can no longer miss an update. If the forked p doesn't see the new
> >> value, it *must* have been added to the tasklist before the updater loops
> >> over it, so the loop will catch it. If it sees the new value, we're done.
> >
> > uclamp_fork() has nothing to do with the race. If copy_process() duplicates the
> > task_struct of an RT task, it'll copy the old value.
> >
>
> Quite so; my point was if we were to use uclamp_fork() as to re-read the value.
>
> > I'd expect the newly introduced sched_post_fork() (also in copy_process() after
> > the list update) to prevent this race altogether.
> >
> > Now we could end up with a problem if for_each_process_thread() doesn't see the
> > newly forked task _after_ sched_post_fork(). Hence my question to Peter.
> >
>
>
> >>
> >> AIUI, the above strategy doesn't require any use of RCU. The update_state()
> >> and sched_post_fork() can race, but as per the above they should both be
> >> writing the same value.
> >
> > for_each_process_thread() must be protected by either tasklist_lock or
> > rcu_read_lock().
> >
>
> Right
>
> > The other RCU logic I added is not to protect against the race above. I
> > describe the other race condition in a comment.
>
> I take it that's the one in uclamp_sync_util_min_rt_default()?

Correct.

>
> __setscheduler_uclamp() can't be preempted as we hold task_rq_lock(). It
> can indeed race with the sync though, but again with the above suggested
> setup it would either:
> - see the old value, but be guaranteed to be iterated over later by the
>   updater
> - see the new value

AFAIU rcu_read_lock() is lightweight. So having the protection applied is more
robust against future changes.

>
> sched_post_fork() being preempted out is a bit more annoying, but what
> prevents us from making that bit preempt-disabled?

preempt_disable() is not friendly to RT and is a heavy-handed approach IMO.

>
> I have to point out I'm assuming here updaters are serialized, which does
> seem to be the case (cf. uclamp_mutex).

Correct.

Thanks

--
Qais Yousef
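
For reference, the interleavings discussed above can be replayed in a small
single-threaded userspace sketch. This is not kernel code: struct task,
tasklist[], update_all() and the replay_*() helpers are hypothetical stand-ins
invented for illustration, and the sketch deliberately leaves out the locking,
barriers and RCU that the thread is actually debating. It only shows which
orderings leave a forked task with a stale value.

```c
#include <assert.h>

#define MAX_TASKS 4

/* Hypothetical userspace stand-ins for the kernel objects in the thread. */
struct task {
    int uclamp_min;
};

static int state;                        /* value controlled by the sysctl */
static struct task *tasklist[MAX_TASKS]; /* stand-in for the kernel tasklist */
static int nr_tasks;

static void reset(void)
{
    state = 0;
    nr_tasks = 0;
}

static void list_add_task(struct task *p)
{
    assert(nr_tasks < MAX_TASKS);
    tasklist[nr_tasks++] = p;
}

/* Updater walk: stands in for for_each_process_thread() + update_state(). */
static void update_all(void)
{
    for (int i = 0; i < nr_tasks; i++)
        tasklist[i]->uclamp_min = state;
}

/* Diagram order: value sampled before list_add(), never re-read. */
static int replay_sample_before_list_add(int new_value)
{
    static struct task p;

    reset();
    p.uclamp_min = state;   /* CPU1: uclamp_fork() samples the old value */
    state = new_value;      /* CPU0: sysctl write */
    update_all();           /* CPU0: walk runs, p is not on the list yet */
    list_add_task(&p);      /* CPU1: p becomes visible too late */
    return p.uclamp_min;    /* still the old value */
}

/* Same schedule, plus a re-read after list_add() as sched_post_fork() would do. */
static int replay_reread_after_list_add(int new_value)
{
    static struct task p;

    reset();
    p.uclamp_min = state;
    state = new_value;
    update_all();
    list_add_task(&p);
    p.uclamp_min = state;   /* CPU1: re-read after the list update */
    return p.uclamp_min;
}

/* The re-read still sees the old value, so the later walk must find p. */
static int replay_walk_after_list_add(int new_value)
{
    static struct task p;

    reset();
    p.uclamp_min = state;
    list_add_task(&p);
    p.uclamp_min = state;   /* CPU1: re-read happens before the sysctl write */
    state = new_value;      /* CPU0: sysctl write */
    update_all();           /* CPU0: p is already listed, so it gets updated */
    return p.uclamp_min;
}
```

Called with an arbitrary new value such as 1024, the first replay returns the
stale 0 and the other two return 1024. In the real code the two sides run
concurrently, so the ordering between the list update and the re-read has to
be enforced by tasklist_lock and the barrier, which this replay simply assumes.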