Date: Mon, 31 Oct 2022 11:50:12 -1000
From: Tejun Heo
To: Josh Don
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Daniel Bristot de Oliveira, Valentin Schneider,
 linux-kernel@vger.kernel.org, Joel Fernandes
Subject: Re: [PATCH v2] sched: async unthrottling for cfs bandwidth
References: <20221026224449.214839-1-joshdon@google.com>

Hello,

On Mon, Oct 31, 2022 at 02:22:42PM -0700, Josh Don wrote:
> > So, TJ has been complaining about us throttling in kernel-space, causing
> > grief when we also happen to hold a mutex or some other resource and has
> > been prodding us to only throttle at the return-to-user boundary.
>
> Yea, we've been having similar priority inversion issues. It isn't
> limited to CFS bandwidth though, such problems are also pretty easy to
> hit with configurations of shares, cpumasks, and SCHED_IDLE. I've

We need to distinguish between work-conserving and non-work-conserving
control schemes. Work-conserving ones - such as shares and idle - shouldn't
affect the aggregate amount of work the system can perform. There may be
local and temporary priority inversions but they shouldn't affect the
throughput of the system and the scheduler should be able to make the
eventual resource distribution conform to the configured targets.

CPU affinity and BW control are not work-conserving and thus cause a
different class of problems. While it is possible to slow down a system with
overly restrictive CPU affinities, it's a lot harder to do so severely
compared to BW control because no matter what you do, there's still at least
one CPU which can make full forward progress. With BW control, it's really
easy to stall the entire system almost completely because we're giving
userspace the ability to stall tasks for an arbitrary amount of time at
random places in the kernel. This is what the cgroup1 freezer did, and it had
exactly the same problems.

> chatted with the folks working on the proxy execution patch series,
> and it seems like that could be a better generic solution to these
> types of issues.

Care to elaborate?

> Throttle at return-to-user seems only mildly beneficial, and then only
> really with preemptive kernels. Still pretty easy to get inversion
> issues, e.g.
> a thread holding a kernel mutex wake back up into a
> hierarchy that is currently throttled, or a thread holding a kernel
> mutex exists in the hierarchy being throttled but is currently waiting
> to run.

I don't follow. If you only throttle at predefined safe spots, the easiest
place being the kernel-user boundary, you cannot get system-wide stalls from
BW restrictions, which is something the kernel shouldn't allow userspace to
cause. In your example, a thread holding a kernel mutex waking back up into
a hierarchy that is currently throttled should keep running in the kernel
until it encounters such a safe throttling point, by which time it would have
released the kernel mutex, and then throttle.

Thanks.

--
tejun
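
The difference between throttling anywhere and throttling only at the
kernel-user boundary can be illustrated with a toy model. The sketch below
is plain Python, not kernel code, and every name and number in it
(simulate, quota, period, kernel_section, b_work) is a hypothetical
assumption chosen for illustration. Task A belongs to a bandwidth-limited
group and starts inside the kernel holding a mutex; task B is not limited
but needs the same mutex. The function returns the tick at which B
finishes.

```python
def simulate(defer_to_user_boundary, quota=2, period=10,
             kernel_section=4, b_work=3, max_ticks=1000):
    """Toy model: return the tick at which unthrottled task B finishes.

    defer_to_user_boundary=False: A stops the moment its group runs out
    of quota, even while it holds the mutex inside the kernel.
    defer_to_user_boundary=True: A keeps running until it leaves the
    kernel (and has dropped the mutex); only then would it throttle.
    """
    a_kernel_left = kernel_section  # ticks A still needs inside the kernel
    mutex_held_by_a = True          # A starts out holding the mutex
    runtime_left = 0                # group runtime left in this period
    b_left = b_work                 # ticks of work B needs under the mutex

    for t in range(max_ticks):
        if t % period == 0:
            runtime_left = quota    # periodic bandwidth refill

        # B runs on its own CPU and sees the mutex state at tick start.
        if not mutex_held_by_a:
            b_left -= 1
            if b_left == 0:
                return t

        throttled = runtime_left <= 0
        if a_kernel_left > 0 and (not throttled or defer_to_user_boundary):
            a_kernel_left -= 1
            runtime_left -= 1
            if a_kernel_left == 0:
                mutex_held_by_a = False  # A reaches the safe point

    return -1  # B never finished within max_ticks


if __name__ == "__main__":
    for period in (10, 100):
        print(period, simulate(False, period=period),
              simulate(True, period=period))
```

In this model, B's completion time with immediate throttling grows with
whatever period userspace configures for A's group, while with deferred
throttling it depends only on how long A stays in the kernel.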