From: Valentin Schneider
To: Qais Yousef
Cc: Ingo Molnar, Peter Zijlstra, Doug Anderson, Jonathan Corbet, Juri Lelli,
    Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Luis Chamberlain, Kees Cook, Iurii Zaikin, Quentin Perret, Patrick Bellasi,
    Pavan Kondeti, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v6 1/2] sched/uclamp: Add a new sysctl to control RT default boost value
Date: Tue, 07 Jul 2020 12:30:48 +0100
In-reply-to: <20200707093447.4t6eqjy4fkt747fo@e107158-lin.cambridge.arm.com>
References: <20200706142839.26629-1-qais.yousef@arm.com>
 <20200706142839.26629-2-qais.yousef@arm.com>
 <20200707093447.4t6eqjy4fkt747fo@e107158-lin.cambridge.arm.com>
User-agent: mu4e 0.9.17; emacs 26.3

On 07/07/20 10:34, Qais Yousef wrote:
> On 07/06/20 16:49, Valentin Schneider wrote:
>>
>> On 06/07/20 15:28, Qais Yousef wrote:
>> > CC: linux-fsdevel@vger.kernel.org
>> > ---
>> >
>> > Peter
>> >
>> > I didn't do the
>> >
>> >   read_lock(&tasklist_lock);
>> >   smp_mb__after_spinlock();
>> >   read_unlock(&tasklist_lock);
>> >
>> > dance you suggested on IRC as it didn't seem necessary. But maybe I missed
>> > something.
>> >
>>
>> So the annoying bit with just uclamp_fork() is that it happens *before* the
>> task is appended to the tasklist. This means that, without too much care, we
>> would have (if we did a sync at uclamp_fork()):
>>
>>   CPU0 (sysctl write)              CPU1 (concurrent forker)
>>
>>                                    copy_process()
>>                                      uclamp_fork()
>>                                        p.uclamp_min = state
>>   state = foo
>>
>>   for_each_process_thread(p, t)
>>     update_state(t);
>>                                    list_add(p)
>>
>> i.e. that newly forked process would entirely sidestep the update. Now,
>> with Peter's suggested approach we can be in a much better situation. If we
>> have this in the sysctl update:
>>
>>   state = foo;
>>
>>   read_lock(&tasklist_lock);
>>   smp_mb__after_spinlock();
>>   read_unlock(&tasklist_lock);
>>
>>   for_each_process_thread(p, t)
>>     update_state(t);
>>
>> While having this in the fork:
>>
>>   write_lock(&tasklist_lock);
>>   list_add(p);
>>   write_unlock(&tasklist_lock);
>>
>>   sched_post_fork(p); // state re-read here; probably wants an mb first
>>
>> Then we can no longer miss an update. If the forked p doesn't see the new
>> value, it *must* have been added to the tasklist before the updater loops
>> over it, so the loop will catch it. If it sees the new value, we're done.
>
> uclamp_fork() has nothing to do with the race. If copy_process() duplicates
> the task_struct of an RT task, it'll copy the old value.
>

Quite so; my point was about what would happen if we were to use uclamp_fork()
to re-read the value.

> I'd expect the newly introduced sched_post_fork() (also in copy_process(),
> after the list update) to prevent this race altogether.
>
> Now we could end up with a problem if for_each_process_thread() doesn't see
> the newly forked task _after_ sched_post_fork(). Hence my question to Peter.
>
>>
>> AIUI, the above strategy doesn't require any use of RCU. The update_state()
>> and sched_post_fork() can race, but as per the above they should both be
>> writing the same value.
>
> for_each_process_thread() must be protected by either tasklist_lock or
> rcu_read_lock().
>

Right

> The other RCU logic I added is not to protect against the race above. I
> describe the other race condition in a comment.

I take it that's the one in uclamp_sync_util_min_rt_default()?

__setscheduler_uclamp() can't be preempted, as we hold task_rq_lock(). It can
indeed race with the sync, but again with the above suggested setup it would
either:

- see the old value, but be guaranteed to be iterated over later by the
  updater, or
- see the new value.

sched_post_fork() being preempted out is a bit more annoying, but what
prevents us from making that bit preempt-disabled?

I have to point out I'm assuming here that updaters are serialized, which does
seem to be the case (cf. uclamp_mutex).
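
To make that concrete, the shape I have in mind is roughly the below. Bear in
mind this is only a sketch: the helper names (uclamp_sync_util_min_rt_default(),
uclamp_update_util_min_rt_default()) are my approximation of what the patch
introduces, not the actual diff.

  /* Sysctl-update side; updaters serialized by uclamp_mutex. */
  static void uclamp_sync_util_min_rt_default(void)
  {
          struct task_struct *g, *p;

          /*
           * Peter's dance: pairs with the tasklist_lock write section in
           * copy_process(). Any forker that doesn't observe the new default
           * must have been added to the tasklist before this point, so the
           * loop below will catch it.
           */
          read_lock(&tasklist_lock);
          smp_mb__after_spinlock();
          read_unlock(&tasklist_lock);

          rcu_read_lock();
          for_each_process_thread(g, p)
                  uclamp_update_util_min_rt_default(p);
          rcu_read_unlock();
  }

  /* Fork side: copy_process() calls this *after* releasing tasklist_lock. */
  void sched_post_fork(struct task_struct *p)
  {
          /* Re-reads the default; probably wants a matching barrier. */
          uclamp_update_util_min_rt_default(p);
  }

i.e. for *this* race the ordering comes purely from tasklist_lock plus the
barrier, with no RCU synchronization required.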
> Basically another updater on a different cpu via fork() and sched_setattr()
> might read an old value and get preempted. The rcu synchronization will
> ensure concurrent updaters have finished before iterating the list.
>
> Thanks
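
If I read you right, the window you're describing is something like this
(hand-wavy sketch, reusing the notation from above; the exact names are my
guess at the patch internals, not a copy of it):

  CPU0 (sysctl write)                CPU1 (fork() / sched_setattr())

                                     default = sysctl value  // reads old value
                                     <preempted>
  sysctl value = new value
  for_each_process_thread(p, t)
    update_state(t);                 // t gets the new value
                                     <resumes>
                                     p.uclamp_min = default   // stale write

and IIUC the rcu synchronization you mention sits between writing the new
sysctl value and the for_each_process_thread() loop, so that such in-flight
readers are flushed out before the loop starts.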