From: Benjamin Segall
To: Peter Zijlstra
Cc: Josh Don, Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
    Steven Rostedt, Mel Gorman, Daniel Bristot de Oliveira,
    Valentin Schneider, linux-kernel@vger.kernel.org, Tejun Heo
Subject: Re: [PATCH v2] sched: async unthrottling for cfs bandwidth
References: <20221026224449.214839-1-joshdon@google.com>
Date: Mon, 31 Oct 2022 14:56:13 -0700
In-Reply-To: (Peter Zijlstra's message of "Mon, 31 Oct 2022 14:04:15 +0100")
X-Mailing-List: linux-kernel@vger.kernel.org

Peter Zijlstra writes:

> On Wed, Oct 26, 2022 at 03:44:49PM -0700, Josh Don wrote:
>> CFS bandwidth currently distributes new runtime and unthrottles cfs_rq's
>> inline in an hrtimer callback. Runtime distribution is a per-cpu
>> operation, and unthrottling is a per-cgroup operation, since a tg walk
>> is required. On machines with a large number of cpus and large cgroup
>> hierarchies, this cpus*cgroups work can be too much to do in a single
>> hrtimer callback: since IRQ are disabled, hard lockups may easily occur.
>> Specifically, we've found this scalability issue on configurations with
>> 256 cpus, O(1000) cgroups in the hierarchy being throttled, and high
>> memory bandwidth usage.
>>
>> To fix this, we can instead unthrottle cfs_rq's asynchronously via a
>> CSD. Each cpu is responsible for unthrottling itself, thus sharding the
>> total work more fairly across the system, and avoiding hard lockups.
>
> So, TJ has been complaining about us throttling in kernel-space, causing
> grief when we also happen to hold a mutex or some other resource and has
> been prodding us to only throttle at the return-to-user boundary.
>
> Would this be an opportune moment to do this? That is, what if we
> replace this CSD with a task_work that's ran on the return-to-user path
> instead?

This is unthrottle, not throttle, but it would probably be
straightforward enough to do what you said for throttle. I'd expect this
to not help all that much though, because throttle hits the entire
cfs_rq, not individual threads.

I'm currently trying something more invasive, which doesn't throttle a
cfs_rq while it has any kernel tasks, and prioritizes kernel tasks / ses
containing kernel tasks when a cfs_rq "should" be throttled.
"Invasive" is a key word though, as it needs to do the sort of h_nr_kernel_tasks tracking on put_prev/set_next in ways we currently only need to do on enqueue/dequeue.