Date: Fri, 1 Jul 2022 15:34:41 +0800
Subject: Re: [Question] The system
 may be stuck if there is a cpu cgroup cpu.cfs_quota_us is very low
To: Tejun Heo
CC: Juri Lelli, Vincent Guittot, lkml, Steven Rostedt
References: <5987be34-b527-4ff5-a17d-5f6f0dc94d6d@huawei.com>
From: Zhang Qiao
X-Mailing-List: linux-kernel@vger.kernel.org

Hi, Tejun

Thanks for your reply.

On 2022/6/27 16:32, Tejun Heo wrote:
> Hello,
>
> On Mon, Jun 27, 2022 at 02:50:25PM +0800, Zhang Qiao wrote:
>> Because the task cgroup's cpu.cfs_quota_us is very small and
>> test_fork's load is very heavy, test_fork may be throttled for a
>> long time. The cgroup_threadgroup_rw_sem read lock is therefore
>> held for a long time, and other processes get stuck waiting for
>> the lock:
>
> Yeah, this is a known problem and can happen with other locks too. The
> solution prolly is only throttling while in or when about to return to
> userspace. There is one really important and wide-spread assumption in
> the kernel:
>
>   If things get blocked on some shared resource, whatever is holding
>   the resource ends up using more of the system to exit the critical
>   section faster and thus unblocks others ASAP. IOW, things running in
>   kernel are work-conserving.
>
> The cpu bw controller gives the userspace a rather easy way to break
> this assumption and thus is rather fundamentally broken.
> This is basically the same problem we had with the old cgroup freezer
> implementation which trapped threads in random locations in the
> kernel.
>

So, if we want to solve this problem completely, is the best way to
change the throttle mechanism of the cfs bw controller, for example by
throttling tasks only at a safe location?

Thanks.
Qiao

> So, right now, it's rather broken and can easily be used as a DoS
> attack vector.
>
> Thanks.
>
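
To make the difference between the two throttling strategies discussed above concrete, here is a toy discrete-time model. It is only a sketch under stated assumptions and not kernel code: the function `simulate`, the `Result` type, the policy names "anywhere" and "user_only", and all tick counts are hypothetical and chosen for illustration. It models one task in a bandwidth-limited group that does some user work, then a lock-protected section in kernel mode, then more user work. Under "user_only" the task may overrun its budget while in the kernel and pays the debt back from later periods.

```python
from dataclasses import dataclass


@dataclass
class Result:
    lock_hold_ticks: int  # wall-clock ticks between lock acquire and release
    finish_tick: int      # tick at which the task completed all of its work


def simulate(policy, quota, period, phases):
    """Run one task under a toy CPU bandwidth limit (illustrative only).

    policy -- "anywhere": throttle as soon as the budget is used up;
              "user_only": let kernel-mode work overrun the budget and
              throttle only once the task is back in user mode.
    phases -- list of (mode, ticks), mode being "user" or "kernel"; the
              single "kernel" phase stands for a lock-protected section.
    """
    if policy not in ("anywhere", "user_only"):
        raise ValueError(f"unknown policy: {policy}")
    if quota <= 0 or period <= 0:
        raise ValueError("quota and period must be positive")
    work = [[mode, ticks] for mode, ticks in phases if ticks > 0]
    runtime = 0                  # remaining budget; negative means debt
    lock_start = lock_end = None
    tick = 0
    while work:
        if tick % period == 0:
            # New period: top up the budget, paying back any debt first.
            runtime = min(runtime + quota, quota)
        mode = work[0][0]
        throttled = runtime <= 0 and (policy == "anywhere" or mode == "user")
        if not throttled:
            runtime -= 1
            work[0][1] -= 1
            if mode == "kernel" and lock_start is None:
                lock_start = tick          # first tick run inside the section
            if work[0][1] == 0:
                work.pop(0)
                if mode == "kernel":
                    lock_end = tick + 1    # section left at end of this tick
        tick += 1
    hold = 0 if lock_start is None else lock_end - lock_start
    return Result(hold, tick)


if __name__ == "__main__":
    phases = [("user", 1), ("kernel", 6), ("user", 1)]
    for policy in ("anywhere", "user_only"):
        r = simulate(policy, quota=2, period=10, phases=phases)
        print(policy, r.lock_hold_ticks, r.finish_tick)
```

With a quota of 2 ticks per 10-tick period, this model holds the lock for 30 ticks under "anywhere" and for 6 ticks under "user_only", while the task finishes at tick 32 and tick 31 respectively. In this toy setting the long-run bandwidth limit is still enforced, but anyone waiting on the lock is unblocked much sooner.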