From: Joel Fernandes
Subject: Re: [RFC PATCH 00/11] Reviving the Proxy Execution Series
Date: Mon, 17 Oct 2022 08:27:48 -0400
Message-Id: <4D33BE61-16B9-4E70-9781-FB8F3C791FCA@joelfernandes.org>
References: <5ea4949e-3e8b-2ec0-bdcf-93e5744caee1@bytedance.com>
In-Reply-To: <5ea4949e-3e8b-2ec0-bdcf-93e5744caee1@bytedance.com>
To: Chengming Zhou
Cc: Connor O'Brien, linux-kernel@vger.kernel.org, kernel-team@android.com,
    John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
    Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
    Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider, Will Deacon,
    Waiman Long, Boqun Feng, "Paul E. McKenney"
X-Mailing-List: linux-kernel@vger.kernel.org

> On Oct 17, 2022, at 12:26 AM, Chengming Zhou wrote:
>
> On 2022/10/17 11:56, Joel Fernandes wrote:
>>
>>
>>>> On Oct 16, 2022, at 11:25 PM, Chengming Zhou wrote:
>>>
>>> Hello,
>>>
>>>> On 2022/10/4 05:44, Connor O'Brien wrote:
>>>> Proxy execution is an approach to implementing priority inheritance
>>>> based on distinguishing between a task's scheduler context (information
>>>> required in order to make scheduling decisions about when the task gets
>>>> to run, such as its scheduler class and priority) and its execution
>>>> context (information required to actually run the task, such as CPU
>>>> affinity). With proxy execution enabled, a task p1 that blocks on a
>>>> mutex remains on the runqueue, but its "blocked" status and the mutex on
>>>> which it blocks are recorded. If p1 is selected to run while still
>>>> blocked, the lock owner p2 can run "on its behalf", inheriting p1's
>>>> scheduler context. Execution context is not inherited, meaning that
>>>> e.g. the CPUs where p2 can run are still determined by its own affinity
>>>> and not p1's.
>>>
>>> This is cool. We have a problem (others have probably encountered it too):
>>> priority inversion happens when an rwsem writer is waiting for many readers
>>> that hold the lock but are throttled by CFS bandwidth control. (In our use
>>> case, the rwsem is mm_struct->mmap_sem.)
>>>
>>> So I'm curious if this work can also solve this problem?
>>> If we don't dequeue the rwsem writer when it blocks on the rwsem, and the
>>> CFS scheduler then picks it to run, can we use the blocked chain to find
>>> the readers to run?
>>
>> That seems a lot harder and is unsupported by the current patch set AFAICS
>> (my exposure to this work is about a week, so take it with a grain of salt).
>> You could have multiple readers, so how would you choose which reader to
>> proxy for (round robin?). Also, you no longer have a chain but a tree of
>> chains, with the leaves being each reader, so you have to track that
>> somehow, then keep migrating the blocked tasks in the chain to each
>> reader's CPU, possibly migrating a lot more than in the case of a single
>> chain. Also, it's not clear whether it will be beneficial, as proxying for
>> one reader does not mean you're improving the situation if it is another
>> reader that is in need of the boost.
>>
>
> Thanks for your reply. It's indeed more complex than I thought, and proxying
> for just one reader is also less efficient.
>
> But this rwsem priority inversion problem hurts us so much that we are afraid
> to use CFS bandwidth control now. Imagine 10 readers holding mmap_sem and
> then being throttled for 20ms: the writer will have to wait for at least
> 200ms, which becomes worse if the writer holds another lock.

I hear you. But on the other hand, with so many readers the writer is bound to starve anyway. Rwsem is unfair to the writer by definition. But yes, I agree PE (if made to) can help here. I suggest also looking into the per-VMA locks and maple tree work that Suren et al. are doing, to improve the situation.

Thanks.

>
> Thanks.
>
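
For illustration only: the "chain" versus "tree of chains" distinction from the reply above can be shown with a toy model. This is a minimal sketch in Python, not code from the patch set and not the kernel's data structures. The names `Task`, `Lock`, `blocked_on`, `owners` and `unblocked_owners` are all hypothetical, throttling and CPU migration are not modelled, and the blocked-on graph is assumed to be acyclic (no deadlock).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Lock:
    name: str
    # Hypothetical model: one owner for a mutex, several for a read-held rwsem.
    owners: List["Task"] = field(default_factory=list)


@dataclass
class Task:
    name: str
    # The lock this task is waiting for, or None if it is not blocked.
    blocked_on: Optional[Lock] = None


def unblocked_owners(task: Task) -> List[Task]:
    """Return every task at the end of the blocked-on graph rooted at `task`.

    With single-owner locks the graph is a chain and the result has one entry.
    With multi-owner locks the graph is a tree and each leaf is a candidate
    to run on the waiter's behalf. Assumes the graph has no cycles.
    """
    if task.blocked_on is None:
        return [task]
    leaves: List[Task] = []
    for owner in task.blocked_on.owners:
        leaves.extend(unblocked_owners(owner))
    return leaves


# Mutex case: p1 waits on m1 (held by p2), p2 waits on m2 (held by p3).
p3 = Task("p3")
m2 = Lock("m2", [p3])
p2 = Task("p2", blocked_on=m2)
m1 = Lock("m1", [p2])
p1 = Task("p1", blocked_on=m1)
chain = [t.name for t in unblocked_owners(p1)]

# rwsem case: a writer waits on mmap_sem, read-held by r1, r2 and r3;
# r2 is itself blocked on a mutex held by another task.
holder = Task("holder")
m = Lock("m", [holder])
r1, r2, r3 = Task("r1"), Task("r2", blocked_on=m), Task("r3")
mmap_sem = Lock("mmap_sem", [r1, r2, r3])
writer = Task("writer", blocked_on=mmap_sem)
tree = [t.name for t in unblocked_owners(writer)]

print(chain)
print(tree)
```

In the mutex case the walk ends at exactly one task, so there is nothing to choose. In the rwsem case it returns several candidates, and the model says nothing about which of them should be picked; that is the open policy question (round robin or otherwise) raised in the reply.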