From: Josh Don
Date: Tue, 1 Nov 2022 13:56:29 -0700
Subject: Re: [PATCH v2] sched: async unthrottling for cfs bandwidth
To: Tejun Heo
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider,
    linux-kernel@vger.kernel.org, Joel Fernandes
References: <20221026224449.214839-1-joshdon@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 1, 2022 at 12:15 PM Tejun Heo wrote:
>
> Hello,
>
> On Tue, Nov 01, 2022 at 12:11:30PM -0700, Josh Don wrote:
> > > Just to better understand the situation, can you give some more
details on
> > > the scenarios where cgroup_mutex was in the middle of a shitshow?
> >
> > There have been a couple, I think one of the main ones has been writes
> > to cgroup.procs. cpuset modifications also show up since there's a
> > mutex there.
>
> If you can, I'd really like to learn more about the details. We've had some
> issues with the threadgroup_rwsem because it's such a big hammer but not
> necessarily with cgroup_mutex because they are only used in maintenance
> operations and never from any hot paths.
>
> Regarding threadgroup_rwsem, w/ CLONE_INTO_CGROUP (userspace support is
> still missing unfortunately), the usual workflow of creating a cgroup,
> seeding it with a process and then later shutting it down doesn't involve
> threadgroup_rwsem at all, so most of the problems should go away in the
> hopefully near future.

Maybe walking through an example would be helpful? I don't know if
there's anything super specific. For cgroup_mutex, for example, the same
global mutex is taken for things like cgroup mkdir and cgroup proc
attach, regardless of which part of the hierarchy is being modified. So
we end up sharing that mutex between random job threads (i.e. threads
that may be manipulating their own cgroup sub-hierarchy) and control
plane threads, which are attempting to manage root-level cgroups. Bad
things happen when the cgroup_mutex (or similar) is held by a random
thread which blocks and is of low scheduling priority, since when it
wakes back up it may take quite a while for it to run again (whether
that low priority is due to CFS bandwidth, sched_idle, or even just
O(hundreds) of threads on a cpu). Starving out the control plane causes
us significant issues, since that affects machine health. cgroup
manipulation is not a hot path operation, but the control plane tends to
hit it fairly often, and so those things combine at our scale to produce
this rare problem.

>
> Thanks.
>
> --
> tejun
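
To make the contention pattern described in the reply concrete, here is a minimal userspace sketch in Python. It is only an illustration under stated assumptions: `global_mutex` is a hypothetical stand-in for a single global lock such as cgroup_mutex, and the `time.sleep` call merely models a low-priority holder that is slow to get back on a CPU. It does not reproduce kernel scheduling, CFS bandwidth throttling, or any cgroup API.

```python
import threading
import time

# Hypothetical stand-in for a single global lock such as cgroup_mutex.
global_mutex = threading.Lock()


def job_thread(holding, stall_seconds):
    """Take the global lock, then stall while still holding it.

    The sleep models a low-priority thread that blocked and is slow to
    run again; it is an assumption, not how the kernel scheduler works.
    """
    with global_mutex:
        holding.set()              # tell the control plane the lock is held
        time.sleep(stall_seconds)  # "descheduled" while holding the lock


def control_plane_wait(holding):
    """Return how long a control-plane operation waits for the lock."""
    holding.wait()                 # start timing only once the job holds it
    start = time.monotonic()
    with global_mutex:             # unrelated subtree, same global lock
        return time.monotonic() - start


def simulate(stall_seconds=0.05):
    """Run one job thread and one control-plane acquisition."""
    holding = threading.Event()
    job = threading.Thread(target=job_thread, args=(holding, stall_seconds))
    job.start()
    waited = control_plane_wait(holding)
    job.join()
    return waited


if __name__ == "__main__":
    print(f"control plane waited {simulate():.3f}s for the global lock")
```

In this sketch the control-plane wait tracks how long the job thread stalls, even though the two touch unrelated state. That mirrors the point made above: the delay comes from how long the holder takes to run again, not from the cost of the cgroup operation itself.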