From: Hao Luo
Date: Tue, 7 Mar 2023 14:17:25 -0800
Subject: Re: [PATCH v3] sched: cpuset: Don't rebuild root domains on suspend-resume
To: Waiman Long
Cc: Qais Yousef, Peter Zijlstra, Ingo Molnar, Juri Lelli, Steven Rostedt,
    tj@kernel.org, linux-kernel@vger.kernel.org, luca.abeni@santannapisa.it,
    claudio@evidence.eu.com, tommaso.cucinotta@santannapisa.it,
    bristot@redhat.com, mathieu.poirier@linaro.org, Dietmar Eggemann,
    cgroups@vger.kernel.org, Vincent Guittot, Wei Wang, Rick Yiu,
    Quentin Perret, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
    Sudeep Holla, Zefan Li, linux-s390@vger.kernel.org, x86@kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 7, 2023 at 1:13 PM Waiman Long wrote:
>
> On 3/7/23 16:06, Hao Luo wrote:
> > On Tue, Mar 7, 2023 at 12:09 PM Waiman Long wrote:
> >> On 3/7/23 14:56, Hao Luo wrote:
> >>> On Mon, Feb 6, 2023 at 2:15 PM Qais Yousef wrote:
> >>>> Commit f9a25f776d78 ("cpusets: Rebuild root domain deadline accounting information")
> >>>> enabled rebuilding root domain on cpuset and hotplug operations to
> >>>> correct deadline accounting.
> >>>>
> >>>> Rebuilding root domain is a slow operation and we see 10+ ms delays
> >>>> on suspend-resume because of that (worst case captures 20ms which
> >>>> happens often).
> >>>>
> >>>> Since nothing is expected to change on suspend-resume operation, skip
> >>>> rebuilding the root domains to regain some of the time lost.
> >>>>
> >>>> Achieve this by refactoring the code to pass whether dl accounting needs
> >>>> an update to rebuild_sched_domains(). And while at it, rename
> >>>> rebuild_root_domains() to update_dl_rd_accounting() which I believe is
> >>>> a more representative name since we are not really rebuilding the root
> >>>> domains, but rather updating dl accounting at the root domain.
> >>>>
> >>>> Some users of rebuild_sched_domains() will skip dl accounting update
> >>>> now:
> >>>>
> >>>>    * Update sched domains when relaxing the domain level in cpuset
> >>>>      which only impacts searching level in load balance
> >>>>    * Update sched domains when cpufreq governor changes and we need
> >>>>      to create the perf domains
> >>>>
> >>>> Users in arch/x86 and arch/s390 are left with the old behavior.
> >>>>
> >>>> Debugged-by: Rick Yiu
> >>>> Signed-off-by: Qais Yousef (Google)
> >>>> ---
> >>> Hi Qais,
> >>>
> >>> Thank you for reporting this. We observed the same issue in our
> >>> production environment. rebuild_root_domains() is also called under
> >>> cpuset_write_resmask(), which handles writing to cpuset.cpus. Under
> >>> production workloads, on a 4.15 kernel, we observed the median latency
> >>> of writing cpuset.cpus at 3ms, p99 at 7ms. Now the median becomes
> >>> 60ms, p99 at >100ms. Writing cpuset.cpus is a fairly frequent and
> >>> critical path in production, but blindly traversing every task in the
> >>> system is not scalable. And its cost is really unnecessary for users
> >>> who don't use deadline tasks at all.
> >> The rebuild_root_domains() function shouldn't be called when updating
> >> cpuset.cpus unless it is a partition root. Is it?
> >>
> > I think it's because we were using the legacy hierarchy. I'm not
> > familiar with cpuset partition though.
>
> In legacy hierarchy, changing cpuset.cpus shouldn't lead to the calling
> of rebuild_root_domains() unless you play with cpuset.sched_load_balance
> file by changing it to 0 in the right cpusets. If you are touching
> cpuset.sched_load_balance, you shouldn't change cpuset.cpus that often.
>

Actually, I think it's the opposite. If I understand the code
correctly[1], it looks like rebuild_root_domains() is called when
LOAD_BALANCE _is_ set, and cpuset.sched_load_balance is 1 by default.
Is that condition a bug? I don't think we ever updated
cpuset.sched_load_balance.

[1] https://github.com/torvalds/linux/blob/master/kernel/cgroup/cpuset.c#L1677