Date: Mon, 31 Oct 2022 14:04:15 +0100
From: Peter Zijlstra
To: Josh Don
Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, linux-kernel@vger.kernel.org, Tejun Heo
Subject: Re: [PATCH v2] sched: async unthrottling for cfs bandwidth
References: <20221026224449.214839-1-joshdon@google.com>
In-Reply-To: <20221026224449.214839-1-joshdon@google.com>

On Wed, Oct 26, 2022 at 03:44:49PM -0700, Josh Don wrote:
> CFS bandwidth currently distributes new runtime and unthrottles cfs_rq's
> inline in an hrtimer callback. Runtime distribution is a per-cpu
> operation, and unthrottling is a per-cgroup operation, since a tg walk
> is required. On machines with a large number of cpus and large cgroup
> hierarchies, this cpus*cgroups work can be too much to do in a single
> hrtimer callback: since IRQ are disabled, hard lockups may easily occur.
> Specifically, we've found this scalability issue on configurations with
> 256 cpus, O(1000) cgroups in the hierarchy being throttled, and high
> memory bandwidth usage.
>
> To fix this, we can instead unthrottle cfs_rq's asynchronously via a
> CSD. Each cpu is responsible for unthrottling itself, thus sharding the
> total work more fairly across the system, and avoiding hard lockups.

So, TJ has been complaining about us throttling in kernel-space, causing
grief when we also happen to hold a mutex or some other resource, and
has been prodding us to only throttle at the return-to-user boundary.

Would this be an opportune moment to do that? That is, what if we
replace this CSD with a task_work that's run on the return-to-user path
instead?
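[Editor's note: for readers following the thread, here is a rough kernel-context sketch of the two mechanisms being contrasted, assuming the mainline CSD (`smp_call_function_single_async`) and `task_work` APIs. The handler and helper names below are illustrative, not taken from the patch; this fragment is not compilable on its own.]

```c
/*
 * The v2 patch's approach: each cpu unthrottles itself from a CSD
 * (IPI context, IRQs disabled), queued roughly like:
 *
 *	INIT_CSD(&csd, csd_unthrottle_func, rq);	// names illustrative
 *	smp_call_function_single_async(cpu, &csd);
 *
 * The alternative floated above: queue a task_work on a task running on
 * the target cpu, so the unthrottle runs at the return-to-user boundary
 * with IRQs enabled and no kernel locks held.
 */
#include <linux/task_work.h>

/* hypothetical handler name */
static void cfs_unthrottle_twork(struct callback_head *head)
{
	/* runs on the return-to-user path of the chosen task */
}

static void queue_unthrottle(struct task_struct *curr_on_cpu)
{
	/*
	 * Hypothetical work item; a real implementation would need to
	 * embed this somewhere race-free (e.g. in the rq) and handle
	 * the task exiting before the work runs.
	 */
	static struct callback_head work;

	init_task_work(&work, cfs_unthrottle_twork);
	task_work_add(curr_on_cpu, &work, TWA_RESUME);
}
```

The trade-off the sketch is meant to surface: the CSD runs promptly but in IRQ context, while the task_work variant defers until the task next returns to userspace, which may be arbitrarily late for a cpu stuck in the kernel.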