From: Josh Don
Date: Mon, 31 Oct 2022 14:22:42 -0700
Subject: Re: [PATCH v2] sched: async unthrottling for cfs bandwidth
To: Peter Zijlstra
Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel@vger.kernel.org, Tejun Heo, Joel Fernandes
References: <20221026224449.214839-1-joshdon@google.com>

Hey Peter,

On Mon, Oct 31, 2022 at 6:04 AM Peter Zijlstra wrote:
>
> On Wed, Oct 26, 2022 at 03:44:49PM -0700, Josh Don wrote:
> > CFS bandwidth currently distributes new runtime and unthrottles
> > cfs_rq's inline in an hrtimer callback. Runtime distribution is a
> > per-cpu operation, and unthrottling is a per-cgroup operation, since
> > a tg walk is required. On machines with a large number of cpus and
> > large cgroup hierarchies, this cpus*cgroups work can be too much to
> > do in a single hrtimer callback: since IRQs are disabled, hard
> > lockups may easily occur. Specifically, we've found this scalability
> > issue on configurations with 256 cpus, O(1000) cgroups in the
> > hierarchy being throttled, and high memory bandwidth usage.
> >
> > To fix this, we can instead unthrottle cfs_rq's asynchronously via a
> > CSD. Each cpu is responsible for unthrottling itself, thus sharding
> > the total work more fairly across the system, and avoiding hard
> > lockups.
>
> So, TJ has been complaining about us throttling in kernel-space,
> causing grief when we also happen to hold a mutex or some other
> resource, and has been prodding us to only throttle at the
> return-to-user boundary.

Yeah, we've been having similar priority inversion issues. It isn't
limited to CFS bandwidth though; such problems are also pretty easy to
hit with configurations of shares, cpumasks, and SCHED_IDLE. I've
chatted with the folks working on the proxy execution patch series, and
it seems like that could be a better generic solution to these types of
issues.

Throttling at return-to-user seems only mildly beneficial, and then only
really with preemptive kernels. It is still pretty easy to get inversion
issues, e.g. a thread holding a kernel mutex wakes back up into a
hierarchy that is currently throttled, or a thread holding a kernel
mutex exists in the hierarchy being throttled but is currently waiting
to run.

> Would this be an opportune moment to do this? That is, what if we
> replace this CSD with a task_work that's run on the return-to-user
> path instead?

The above comment is about when we throttle, whereas this patch is about
the unthrottle case. I think you're asking why we don't unthrottle using
e.g. a task_work assigned to whatever the current task is? That would
work around the issue of keeping IRQs disabled for long periods, but it
still forces one cpu to process everything, which can take quite a
while.

Thanks,
Josh
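
For context, below is a minimal sketch of the two shapes being discussed:
the bandwidth timer handing per-cpu unthrottle work off via a CSD, so that
each cpu walks only its own throttled cfs_rq's, versus deferring the work
to a task_work that runs on the return-to-user path. This is an
illustrative simplification, not the actual patch: cfsb_csd,
cfsb_unthrottle_work and unthrottle_local_cfs_rqs() are hypothetical
placeholder names, while INIT_CSD(), smp_call_function_single_async(),
init_task_work() and task_work_add() are the real kernel primitives each
approach would build on.

/*
 * Illustrative sketch only -- a simplification of the two approaches
 * discussed in the thread, not the actual patch. cfsb_csd,
 * cfsb_unthrottle_work and unthrottle_local_cfs_rqs() are hypothetical
 * names; the smp/task_work primitives are real kernel APIs.
 */
#include <linux/cpumask.h>
#include <linux/percpu.h>
#include <linux/sched.h>
#include <linux/smp.h>
#include <linux/task_work.h>

/* Hypothetical helper: unthrottle only this cpu's throttled cfs_rq's,
 * under the local rq lock. Stubbed out for the sketch. */
static void unthrottle_local_cfs_rqs(void)
{
}

/*
 * Shape 1: the bandwidth hrtimer no longer walks every throttled cfs_rq
 * itself; it sends a CSD to each cpu that owns throttled cfs_rq's, and
 * each cpu does its own share of the unthrottle work in the IPI handler.
 */
static DEFINE_PER_CPU(call_single_data_t, cfsb_csd);

static void cfsb_csd_unthrottle(void *info)
{
	/* Runs on the target cpu. */
	unthrottle_local_cfs_rqs();
}

static void distribute_cfs_runtime_async(const struct cpumask *throttled_cpus)
{
	int cpu;

	for_each_cpu(cpu, throttled_cpus) {
		call_single_data_t *csd = &per_cpu(cfsb_csd, cpu);

		INIT_CSD(csd, cfsb_csd_unthrottle, NULL);
		smp_call_function_single_async(cpu, csd);
	}
}

/*
 * Shape 2: defer the unthrottle to return-to-user via task_work, so it
 * runs in process context when the chosen task leaves the kernel. A real
 * implementation would need one callback_head per pending request; a
 * single static one is used here only to keep the sketch short.
 */
static struct callback_head cfsb_unthrottle_work;

static void cfsb_unthrottle_task_work(struct callback_head *head)
{
	unthrottle_local_cfs_rqs();
}

static void queue_unthrottle_on_current(void)
{
	init_task_work(&cfsb_unthrottle_work, cfsb_unthrottle_task_work);
	task_work_add(current, &cfsb_unthrottle_work, TWA_RESUME);
}

The CSD shape shards the cpus*cgroups walk so that no single cpu spends a
long stretch with interrupts disabled, while the task_work shape trades
the IRQs-off section for process context at the kernel-exit boundary --
but, as Josh notes, it still funnels the work through whichever single
task (and cpu) the work happens to be queued on.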