From: Hao Luo
Date: Tue, 7 Mar 2023 11:56:49 -0800
Subject: Re: [PATCH v3] sched: cpuset: Don't rebuild root domains on suspend-resume
To: Qais Yousef
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Waiman Long, Steven Rostedt,
    tj@kernel.org, linux-kernel@vger.kernel.org, luca.abeni@santannapisa.it,
    claudio@evidence.eu.com, tommaso.cucinotta@santannapisa.it,
    bristot@redhat.com, mathieu.poirier@linaro.org, Dietmar Eggemann,
    cgroups@vger.kernel.org, Vincent Guittot, Wei Wang, Rick Yiu,
    Quentin Perret, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
    Sudeep Holla, Zefan Li, linux-s390@vger.kernel.org, x86@kernel.org
In-Reply-To: <20230206221428.2125324-1-qyousef@layalina.io>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Feb 6, 2023 at 2:15 PM Qais Yousef wrote:
>
> Commit f9a25f776d78 ("cpusets: Rebuild root domain deadline accounting information")
> enabled rebuilding root domain on cpuset and hotplug operations to
> correct deadline accounting.
>
> Rebuilding root domain is a slow operation and we see 10+ ms delays
> on suspend-resume because of that (worst case captures 20ms which
> happens often).
>
> Since nothing is expected to change on a suspend-resume operation, skip
> rebuilding the root domains to regain some of the time lost.
>
> Achieve this by refactoring the code to pass whether dl accounting needs
> an update to rebuild_sched_domains(). And while at it, rename
> rebuild_root_domains() to update_dl_rd_accounting() which I believe is
> a more representative name since we are not really rebuilding the root
> domains, but rather updating dl accounting at the root domain.
>
> Some users of rebuild_sched_domains() will skip dl accounting update
> now:
>
>         * Update sched domains when relaxing the domain level in cpuset
>           which only impacts searching level in load balance
>         * Update sched domains when cpufreq governor changes and we need
>           to create the perf domains
>
> Users in arch/x86 and arch/s390 are left with the old behavior.
>
> Debugged-by: Rick Yiu
> Signed-off-by: Qais Yousef (Google)
> ---

Hi Qais,

Thank you for reporting this. We observed the same issue in our
production environment. rebuild_root_domains() is also called under
cpuset_write_resmask, which handles writing to cpuset.cpus. Under
production workloads, on a 4.15 kernel, we observed the median latency
of writing cpuset.cpus at 3ms, p99 at 7ms. Now the median is 60ms and
p99 exceeds 100ms. Writing cpuset.cpus is a fairly frequent and critical
path in production, and blindly traversing every task in the system
does not scale. Its cost is also unnecessary for users who don't use
deadline tasks at all.

>
> The better solution that was discussed before is to not iterate through every
> task in the system and let cpuset track when dl tasks are added to it and do
> smarter iteration. ATM even if there are no dl tasks in the system we'll
> blindly go through every task in the hierarchy to update nothing.
>

Great idea.
This works. I tried this solution. With that, 98% of cpuset.cpus writes
are now within 7ms. There are still long tails, but they are caused by
another issue unrelated to rebuild_root_domains().

Your suggested rename to update_dl_rd_accounting() also makes sense to me.

Hao
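
For illustration only, here is a minimal userspace sketch of the "let
cpuset track its dl tasks" idea quoted above. It is not the patch and
not kernel code: struct task, struct cpuset_model, the nr_dl_tasks
counter and both helper functions are hypothetical names made up for
this example. The point is just that a per-cpuset counter, maintained at
attach time, lets the accounting update return early when a cpuset holds
no deadline tasks, instead of walking every task to update nothing.

```c
#include <stddef.h>

/* Hypothetical userspace model; these are not the kernel's structures. */
struct task {
	int is_deadline;      /* 1 if the task uses the deadline policy */
	unsigned long dl_bw;  /* bandwidth this task contributes */
};

struct cpuset_model {
	struct task *tasks;
	size_t nr_tasks;
	size_t nr_dl_tasks;   /* assumed to be maintained on attach */
};

/* Bookkeeping on attach: bump the counter only for deadline tasks. */
static void model_account_attach(struct cpuset_model *cs, const struct task *t)
{
	if (t->is_deadline)
		cs->nr_dl_tasks++;
}

/*
 * Recompute the deadline bandwidth for one cpuset.
 * Returns the number of tasks visited so the saving is observable.
 */
static size_t model_update_dl_accounting(const struct cpuset_model *cs,
					 unsigned long *total_bw)
{
	size_t visited = 0;

	*total_bw = 0;
	if (cs->nr_dl_tasks == 0)
		return 0;	/* no dl tasks: skip the walk entirely */

	for (size_t i = 0; i < cs->nr_tasks; i++) {
		visited++;
		if (cs->tasks[i].is_deadline)
			*total_bw += cs->tasks[i].dl_bw;
	}
	return visited;
}
```

In this model a cpuset with no deadline tasks costs a single comparison
per update, which is the case that matters for users who never create
deadline tasks. A real implementation would also have to decrement the
counter on detach and on policy changes, which the sketch leaves out.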