Date: Wed, 1 Mar 2023 08:31:20 +0100
From: Juri Lelli
To: Qais Yousef
Cc: Dietmar Eggemann, Peter Zijlstra, Ingo Molnar, Waiman Long,
 Steven Rostedt, tj@kernel.org, linux-kernel@vger.kernel.org,
 luca.abeni@santannapisa.it,
 claudio@evidence.eu.com, tommaso.cucinotta@santannapisa.it,
 bristot@redhat.com, mathieu.poirier@linaro.org, cgroups@vger.kernel.org,
 Vincent Guittot, Wei Wang, Rick Yiu, Quentin Perret, Heiko Carstens,
 Vasily Gorbik, Alexander Gordeev, Sudeep Holla, Zefan Li,
 linux-s390@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH v3] sched: cpuset: Don't rebuild root domains on suspend-resume
References: <20230206221428.2125324-1-qyousef@layalina.io>
 <20230223153859.37tqoqk33oc6tv7o@airbuntu>
 <5f087dd8-3e39-ce83-fe24-afa5179c05d9@arm.com>
 <20230227205725.dipvh3i7dvyrv4tv@airbuntu>
 <5a1e58bf-7eb2-bd7a-7e19-7864428a2b83@arm.com>
 <20230228174627.vja5aejq27dsta2u@airbuntu>
In-Reply-To: <20230228174627.vja5aejq27dsta2u@airbuntu>

Hi,

On 28/02/23 17:46, Qais Yousef wrote:
> On 02/28/23 15:09, Dietmar Eggemann wrote:
> > > IIUC you're suggesting to introduce some new mechanism to detect whether
> > > hotplug has led to a CPU disappearing, and to use that instead? Are you
> > > saying I can use arch_update_cpu_topology() for that? Something like this?
> > >
> > > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> > > index e5ddc8e11e5d..60c3dcf06f0d 100644
> > > --- a/kernel/cgroup/cpuset.c
> > > +++ b/kernel/cgroup/cpuset.c
> > > @@ -1122,7 +1122,7 @@ partition_and_rebuild_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
> > >  {
> > >  	mutex_lock(&sched_domains_mutex);
> > >  	partition_sched_domains_locked(ndoms_new, doms_new, dattr_new);
> > > -	if (update_dl_accounting)
> > > +	if (arch_update_cpu_topology())
> > >  		update_dl_rd_accounting();
> > >  	mutex_unlock(&sched_domains_mutex);
> > >  }
> >
> > No, this is not what I meant. I'm just saying that the
> >
> >     partition_sched_domains_locked()
> >         new_topology = arch_update_cpu_topology();
> >
> > has to be considered here as well, since we do a
> > `dl_clear_root_domain(rd)` (1) in partition_sched_domains_locked() for
> > !new_topology.
>
> Ah, you're referring to the dl_clear_root_domain() call there. I thought
> this didn't trigger.
>
> > And (1) requires the `update_tasks_root_domain()` to happen later.
> >
> > So there are cases now, e.g. `rebuild_sched_domains_energy()`, in which
> > `new_topology=0` and `update_dl_accounting=false`, which now clear the rd
> > but no longer redo the DL accounting. rebuild_root_domains() itself only
> > cleans the `default root domain`, not the other root domains which may
> > exist as well.
> >
> > Example: switching the CPUfreq policy of CPUs [0,3-5] from performance to
> > schedutil (slow switching, i.e. we have sugov:X DL task(s)):
> >
> > [ 862.479906] CPU4 partition_sched_domains_locked() new_topology=0
> > [ 862.499073] Workqueue: events rebuild_sd_workfn
> > [ 862.503646] Call trace:
> > ...
> > [ 862.520789] partition_sched_domains_locked+0x6c/0x670
> > [ 862.525962] rebuild_sched_domains_locked+0x204/0x8a0
> > [ 862.531050] rebuild_sched_domains+0x2c/0x50
> > [ 862.535351] rebuild_sd_workfn+0x38/0x54                           <-- !
> > ...
> > [ 862.554047] CPU4 dl_clear_root_domain() rd->span=0-5 total_bw=0 def_root_domain=0  <-- !
> > [ 862.561597] CPU4 dl_clear_root_domain() rd->span= total_bw=0 def_root_domain=1
> > [ 862.568960] CPU4 dl_add_task_root_domain() [sugov:0 1801] total_bw=104857 def_root_domain=0 rd=0xffff0008015f0000  <-- !
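For context, the !new_topology path under discussion sits in the
sched-domain matching loop of partition_sched_domains_locked(). An abridged
sketch, based on kernel/sched/topology.c around v6.2 (elisions marked with
...; details may differ between kernel versions):

    void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
                                        struct sched_domain_attr *dattr_new)
    {
            int i, j, n;
            int new_topology;
            ...
            /* Let the architecture update CPU core mappings: */
            new_topology = arch_update_cpu_topology();
            ...
            /* Destroy deleted domains: */
            for (i = 0; i < ndoms_cur; i++) {
                    for (j = 0; j < n && !new_topology; j++) {
                            if (cpumask_equal(doms_cur[i], doms_new[j]) &&
                                dattrs_equal(dattr_cur, i, dattr_new, j)) {
                                    struct root_domain *rd;

                                    /*
                                     * This domain won't be destroyed and, as
                                     * such, its dl_bw->total_bw needs to be
                                     * cleared. It will be recomputed in
                                     * update_tasks_root_domain().
                                     */
                                    rd = cpu_rq(cpumask_any(doms_cur[i]))->rd;
                                    dl_clear_root_domain(rd);
                                    goto match1;
                            }
                    }
                    /* No match - a current sched domain not in new doms_new[] */
                    detach_destroy_domains(doms_cur[i]);
    match1:
                    ;
            }
            ...
    }

When a domain survives the rebuild and new_topology is zero, the surviving
root domain's DL bandwidth is cleared on the assumption that
update_tasks_root_domain() will re-add the tasks later; that is exactly the
assumption the proposed `update_dl_accounting` gating would break.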
> >
> > The dl_clear_root_domain() of the def_root_domain and the
> > dl_add_task_root_domain() to the rd in use won't happen.
> >
> > [sugov:0 1801] is only a simple example here. I could have spawned a
> > couple of DL tasks beforehand to illustrate the issue more obviously.
> >
> > ---
> >
> > The same seems to happen during suspend/resume (system with 2 frequency
> > domains, both with the slow-switching schedutil CPUfreq governor):
> >
> > [ 27.735821] CPU5 partition_sched_domains_locked() new_topology=0
> > ...
> > [ 27.735864] Workqueue: events cpuset_hotplug_workfn
> > [ 27.735894] Call trace:
> > ...
> > [ 27.735984] partition_sched_domains_locked+0x6c/0x670
> > [ 27.736004] rebuild_sched_domains_locked+0x204/0x8a0
> > [ 27.736026] cpuset_hotplug_workfn+0x254/0x52c                     <-- !
> > ...
> > [ 27.736155] CPU5 dl_clear_root_domain() rd->span=0-5 total_bw=0 def_root_domain=0  <-- !
> > [ 27.736178] CPU5 dl_clear_root_domain() rd->span= total_bw=0 def_root_domain=1
> > [ 27.736296] CPU5 dl_add_task_root_domain() [sugov:0 80] total_bw=104857 def_root_domain=0 rd=0xffff000801728000  <-- !
> > [ 27.736318] CPU5 dl_add_task_root_domain() [sugov:1 81] total_bw=209714 def_root_domain=0 rd=0xffff000801728000  <-- !
> > ...
> >
> > > I am not keen on this. arm64 seems to just read a value without a side
> > > effect.
> >
> > Arm64 (among others) sets `update_topology=1` before
> > `rebuild_sched_domains()` and `update_topology=0` after it in
> > update_topology_flags_workfn(). This then makes `new_topology=1` in
> > partition_sched_domains_locked().
> >
> > > But x86 does reset this value, so we can't read it twice in the same
> > > call tree and I'd have to extract it.
> > >
> > > The better solution that was discussed before is to not iterate through
> > > every task in the system, and instead let cpuset track when DL tasks are
> > > added to it and do smarter iteration. ATM, even if there are no DL tasks
> > > in the system, we'll blindly go through every task in the hierarchy to
> > > update nothing.
> >
> > Yes, I can see the problem. And IMHO this solution approach seems better
> > than passing `update_dl_accounting` through the stack of involved
> > functions.
>
> The best I can do is protect this dl_clear_root_domain() too. I really
> don't have my heart in this, but I'm trying my best to help. It has taken
> a lot of my time already, and I'd prefer to hand over to Juri to address
> this regression if what I'm proposing isn't good enough.
>
> FWIW, there are 0 DL tasks in the system where this was noticed. And this
> delay is unbounded, because it depends on how many tasks there are in the
> hierarchy.

Not ignoring you guys here, but it turns out I'm quite bogged down with
other stuff at the moment. :/

So, apologies, and I'll try to get to this ASAP.

Thanks a lot for all your efforts and time reviewing so far!

Best,
Juri
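As an aside, the "let cpuset track DL tasks" idea mentioned above could take
roughly the following shape. This is a hypothetical sketch only; the field
and helper names (nr_deadline_tasks, inc_dl_tasks_cs(), dec_dl_tasks_cs())
are invented here for illustration, not taken from a posted patch:

    /*
     * Sketch: have each cpuset count its attached SCHED_DEADLINE tasks so
     * the rebuild path can skip the per-task walk when none exist.
     */
    struct cpuset {
            ...
            int nr_deadline_tasks;  /* DL tasks attached to this cpuset */
    };

    /* Called when a DL task is attached to / setscheduled into a cpuset: */
    static void inc_dl_tasks_cs(struct task_struct *p)
    {
            struct cpuset *cs = task_cs(p);

            cs->nr_deadline_tasks++;
    }

    /* Called when a DL task leaves the cpuset or drops SCHED_DEADLINE: */
    static void dec_dl_tasks_cs(struct task_struct *p)
    {
            struct cpuset *cs = task_cs(p);

            cs->nr_deadline_tasks--;
    }

    static void update_tasks_root_domain(struct cpuset *cs)
    {
            struct css_task_iter it;
            struct task_struct *task;

            /* Nothing to re-account: skip the per-task iteration. */
            if (cs->nr_deadline_tasks == 0)
                    return;

            css_task_iter_start(&cs->css, 0, &it);
            while ((task = css_task_iter_next(&it)))
                    dl_add_task_root_domain(task);
            css_task_iter_end(&it);
    }

With such a counter maintained at attach/detach and sched_setattr() time,
a system with zero DL tasks would no longer pay the unbounded cost of
walking every task in the hierarchy that Qais describes above.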