Date: Tue, 1 Nov 2022 11:49:55 -1000
From: Tejun Heo
To: Josh Don
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider,
    linux-kernel@vger.kernel.org, Joel Fernandes
Subject: Re: [PATCH v2] sched: async unthrottling for cfs bandwidth

Hello,

On Tue, Nov 01, 2022 at 01:56:29PM -0700, Josh Don wrote:
> Maybe walking through an example would be helpful? I don't know if
> there's anything super specific. For cgroup_mutex for example, the
> same global mutex is being taken for things like cgroup mkdir and
> cgroup proc attach, regardless of which part of the hierarchy is being
> modified. So, we end up sharing that mutex between random job threads
> (ie. that may be manipulating their own cgroup sub-hierarchy), and
> control plane threads, which are attempting to manage root-level
> cgroups. Bad things happen when the cgroup_mutex (or similar) is held
> by a random thread which blocks and is of low scheduling priority,
> since when it wakes back up it may take quite a while for it to run
> again (whether that low priority be due to CFS bandwidth, sched_idle,
> or even just O(hundreds) of threads on a cpu). Starving out the
> control plane causes us significant issues, since that affects machine
> health. cgroup manipulation is not a hot path operation, but the
> control plane tends to hit it fairly often, and so those things
> combine at our scale to produce this rare problem.

I keep asking because I'm curious about the specific details of the
contentions. Control plane locking up is obviously bad but they can usually
tolerate some latencies - stalling out multiple seconds (or longer) can be
catastrophic but tens or hundreds of millisecs occasionally usually isn't.

The only times we've seen latency spikes from the CPU side which are enough
to cause system-level failures were when there were severe restrictions
through bw control. Other cases sure are possible but unless you grab these
mutexes while IDLE inside a heavily contended cgroup (which is a bit silly)
you gotta push *really* hard.

If most of the problems were with cpu bw control, fixing that should do for
the time being.
Otherwise, we'll have to think about finishing kernfs locking granularity
improvements and doing something similar to cgroup locking too.

Thanks.

--
tejun
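
The stall described in the quoted example (a global mutex held by a thread
that is not getting CPU time) can be modeled with a toy userspace program.
The sketch below is not kernel code and is only an illustration under stated
assumptions: `global_lock` stands in for something like cgroup_mutex,
`time.sleep()` stands in for the holder being throttled or descheduled, and
the function name `measure_control_plane_stall` is made up for this example.

```python
import threading
import time


def measure_control_plane_stall(stall_seconds):
    """Return how long a 'control plane' thread waits for a global lock
    held by a 'job' thread that stops running while holding it.

    Hypothetical helper for illustration; nothing here is a kernel API.
    """
    global_lock = threading.Lock()   # stand-in for a global mutex (e.g. cgroup_mutex)
    lock_held = threading.Event()

    def job_thread():
        with global_lock:
            lock_held.set()
            # Stand-in for the holder losing the CPU (bandwidth throttling,
            # sched_idle, or simply many runnable threads on the cpu).
            time.sleep(stall_seconds)

    worker = threading.Thread(target=job_thread)
    worker.start()
    lock_held.wait()                 # make sure the job thread owns the lock first

    start = time.monotonic()
    with global_lock:                # control plane op on an unrelated part of the tree
        waited = time.monotonic() - start
    worker.join()
    return waited


if __name__ == "__main__":
    for stall in (0.01, 0.2):
        waited = measure_control_plane_stall(stall)
        print(f"holder stalled {stall:.2f}s, control plane waited {waited:.3f}s")
```

In this model the waiter's latency tracks how long the holder stays off the
CPU, regardless of which part of the hierarchy either thread wanted to touch.
That is why both fixing the throttling side and making the locking
finer-grained come up as candidate remedies.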