Date: Fri, 9 Feb 2024 16:23:09 -0800
X-Mailing-List: linux-kernel@vger.kernel.org
Message-ID: <20240210002328.4126422-1-jstultz@google.com>
Subject: [PATCH v8 0/7] Preparatory changes for Proxy Execution v8
From: John Stultz
To: LKML
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar,
 Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue,
 Youssef Esmat, Mel Gorman, Daniel Bristot de Oliveira, Will Deacon,
 Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan,
 K Prateek Nayak, Thomas Gleixner, kernel-team@android.com

After sending out v7 of Proxy Execution, I got feedback that the
patch series was getting a bit unwieldy to review. Qais suggested I
break out just the cleanup/preparatory components of the patch
series and submit them on their own, in the hope that we can start
to merge the less complex bits and focus discussion on the more
complicated portions afterwards.

So for v8 of this series, I'm only submitting those earlier
cleanup/preparatory changes.

However, work on the full series has continued, with some nice
progress on the performance front. If you are interested, the full
v8 series can be found here:
  https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v8-6.8-rc3
  https://github.com/johnstultz-work/linux-dev.git proxy-exec-v8-6.8-rc3

One bit of feedback I got earlier from K Prateek Nayak[1] was that
while the Proxy Execution feature didn't impact performance in most
benchmarks he tried, it did cause major regressions with the
`perf bench` test. I isolated this down to the patch series
disabling optimistic spinning, which has been a long-term
performance concern with the change.

I spent some time looking for a way to walk the chain (across cpus)
to see if the runnable owner was actually running, which would allow
us to spin and avoid migrations. However, I realized that is
actually a separate and more difficult problem (i.e., migration
avoidance) than simple optimistic spinning (a task trying to take a
mutex where the owner is running).

With Proxy Execution, the main focus is how the blocked tasks can
effectively boost *non-running* mutex owners.
But when the contended mutex owner is running, we don't need to do
anything different from the existing optimistic spinning logic.

So I've dropped the patches disabling optimistic spinning, and fixed
up some further issues, as the change opened some new races the
later Proxy Execution patches weren't expecting.

The results are looking really good: the perf bench test, which
previously regressed so badly, now shows very close results (within
1.5%) with and without Proxy Execution. This alleviates a major
concern I had with the series.

Re-enabling optimistic spinning does open a question about whether
we should allow optimistic spinning's lock-stealing (instead of
lock-handoff) when the runnable mutex owner is running as a proxy
for a more important task. Ideally we probably want to hand the lock
back up the chain which has donated the time (to avoid migrating the
chain to the stealer's cpu), so I have a patch I'm testing that
special-cases that situation. So far in my testing I'm not seeing
any large increase in latencies from the potential unfairness of
lock-stealing, but I'll be doing more investigation on this.

New in v8:
---------
* Re-ordered patch set to only submit cleanup & preparatory changes,
  leaving more complicated changes for a future submission
  (suggested by Qais Yousef)

* Lots of spelling fixups and minor changes suggested by Randy
  Dunlap and Metin Kaya

* Cover letter was getting long-winded by including boilerplate
  sections, so please see earlier versions of the patch series or
  this LWN article [2] for an overview of the functionality.

* Re-enabled mutex optimistic spinning to resolve a major
  performance regression in earlier versions of the series

* Fixups for races discovered with optimistic spinning

* Included trace events patch from Metin to better help us analyze
  the costs of proxying.
* Extended sched_football to use rtmutexes when
  !CONFIG_SCHED_PROXY_EXEC

* Tweaked sched_football to bail if spawned players don't check in
  within 30s

* Chased down a bug uncovered by sched_football that was preventing
  unconstrained RT tasks from being put on the pushable list,
  causing improper task starvation and sched_football failures.

Performance:
---------
With optimistic spinning re-enabled, the biggest concern I had wrt
performance is much relieved. However, there is still the potential
extra overhead of doing rq migrations/return migrations for the
proxy case. The big hypothesis of this series is that the overall
benefit from unblocking important tasks will well outweigh any extra
cost, but if folks see anything of concern, I'd love to hear about
it.

Issues still to address:
---------
* Enabling optimistic spinning uncovered a few interesting races
  that were much harder to hit in v7. I've resolved most of these,
  but there is a case I've occasionally seen where, in
  try_to_wake_up(), the blocked_on_state transitions are too lax,
  causing a transition to BO_RUNNABLE when we have not had a chance
  to evaluate for return-migration (allowing tasks to potentially
  run on cpus they shouldn't). Unfortunately, in tightening the
  state handling to be more careful, I've run into cases where we
  don't transition to BO_RUNNABLE, and effectively lose the wakeup.
  I've started to suspect that the conceptual overlap between
  task->blocked_on_state and task->__state means I should consider
  condensing the BO_BLOCKED/BO_WAKING states into new task states
  (i.e., TASK_MUTEX_BLOCKED, TASK_PROXY_MIGRATED), so we can handle
  the state transitions more correctly.

* Closer evaluation of pros/cons of doing lock handoff vs allowing
  lock-stealing when proxy tasks release locks.

* The chain migration functionality needs further iterations and
  better validation to ensure it truly maintains the RT/DL load
  balancing invariants.
* Xuewen Yan earlier pointed out that we may see task mis-placement
  on EAS systems if we do return migration based strictly on cpu
  allowability. I tried an optimization to always try to return
  migrate to the wake_cpu (which was saved on proxy-migration), but
  this seemed to undo a chunk of the benefit I saw in moving return
  migration back to ttwu, at least with my prio-inversion-demo
  microbenchmark. Need to do some broader performance analysis with
  the variations there.

* It would be nice to find an optimization to avoid migrating
  blocked tasks if the runnable lock-owner at the end of the
  blocked_on chain is already running (though this is difficult due
  to the limitations from the locking rules needed to traverse the
  blocked_on chain across run queues).

* CFS load balancing. There was concern that blocked tasks may carry
  forward load (PELT) to the lock owner's CPU, so that CPU may look
  overloaded. Needs investigation.

* The sleeping owner handling (where we deactivate waiting tasks and
  enqueue them onto a list, then reactivate them when the owner
  wakes up) doesn't feel great. This is in part because when we want
  to activate tasks, we're already holding a task.pi_lock and a
  rq_lock, just not the locks for the task we're activating, nor the
  rq we're enqueuing it onto. So there has to be a bit of lock
  juggling to drop and acquire the right locks (in the right order).
  It feels like there's got to be a better way.

* As discussed at OSPM[3], I'd like to split pick_next_task() up
  into two phases, selecting and setting the next task, as currently
  pick_next_task() assumes the returned task will be run, which
  results in various side effects in sched class logic when it's
  run. I tried to take a pass at this earlier, but it's hairy and
  lower on the priority list for now.
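
To make the chain-walking item above more concrete, here is a rough
user-space toy model in Python. It is not kernel code and not how
the series implements anything: Task, Mutex, on_cpu, chain_owner()
and needs_proxy_migration() are hypothetical names made up for this
sketch, and it deliberately ignores the locking rules across run
queues, which are exactly what makes the real problem hard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mutex:
    name: str
    owner: Optional["Task"] = None  # task currently holding the lock


@dataclass
class Task:
    name: str
    cpu: int                            # run queue the task sits on
    on_cpu: bool = False                # True if it is executing right now
    blocked_on: Optional[Mutex] = None  # mutex this task is waiting for


def chain_owner(task: Task) -> Task:
    """Follow blocked_on -> owner links to the task at the end of the chain."""
    seen = set()
    while task.blocked_on is not None and task.blocked_on.owner is not None:
        if id(task) in seen:
            raise RuntimeError("cycle in blocked_on chain (deadlock)")
        seen.add(id(task))
        task = task.blocked_on.owner
    return task


def needs_proxy_migration(waiter: Task) -> bool:
    """Toy version of the wished-for check (an assumption, not the series)."""
    owner = chain_owner(waiter)
    if owner is waiter:
        return False  # not blocked on an owned mutex, nothing to proxy for
    if owner.on_cpu:
        return False  # owner already running: no need to move the waiter
    return owner.cpu != waiter.cpu  # owner is runnable on another run queue


if __name__ == "__main__":
    m1, m2 = Mutex("m1"), Mutex("m2")
    c = Task("C", cpu=2)                 # end of the chain, not running
    b = Task("B", cpu=1, blocked_on=m2)  # B waits for C
    a = Task("A", cpu=0, blocked_on=m1)  # A waits for B
    m1.owner, m2.owner = b, c

    print(chain_owner(a).name)       # C
    print(needs_proxy_migration(a))  # True
    c.on_cpu = True
    print(needs_proxy_migration(a))  # False
```

In this model the walk is trivial because nothing changes
underneath it; in the scheduler, each hop may cross into another
cpu's run queue, which is where the difficulty noted above comes
from.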

Credit/Disclaimer:
---------
As mentioned previously, this Proxy Execution series has a long
history: it was first described in a paper[4] by Watkins, Straub,
and Niehaus, then came patches from Peter Zijlstra, extended with
lots of work by Juri Lelli, Valentin Schneider, and Connor O'Brien.
(And thank you to Steven Rostedt for providing additional details
here!)

So again, many thanks to those above, as all the credit for this
series really is due to them, while the mistakes are likely mine.

As always, feedback is appreciated.

Thanks so much!
-john

[1] https://lore.kernel.org/lkml/c6787831-f659-12cb-4954-fd13a05ed590@amd.com/
[2] https://lwn.net/Articles/934114/
[3] https://youtu.be/QEWqRhVS3lI (video of my OSPM talk)
[4] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf

Cc: Joel Fernandes
Cc: Qais Yousef
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Juri Lelli
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Zimuzo Ezeozue
Cc: Youssef Esmat
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Will Deacon
Cc: Waiman Long
Cc: Boqun Feng
Cc: "Paul E. McKenney"
Cc: Metin Kaya
Cc: Xuewen Yan
Cc: K Prateek Nayak
Cc: Thomas Gleixner
Cc: kernel-team@android.com

Connor O'Brien (2):
  sched: Add do_push_task helper
  sched: Consolidate pick_*_task to task_is_pushable helper

John Stultz (2):
  locking/mutex: Remove wakeups from under mutex::wait_lock
  sched: Split out __schedule() deactivate task logic into a helper

Juri Lelli (2):
  locking/mutex: Make mutex::wait_lock irq safe
  locking/mutex: Expose __mutex_owner()

Peter Zijlstra (1):
  sched: Split scheduler and execution contexts

 kernel/locking/mutex.c       |  60 +++++++----------
 kernel/locking/mutex.h       |  25 +++++++
 kernel/locking/rtmutex.c     |  26 +++++---
 kernel/locking/rwbase_rt.c   |   4 +-
 kernel/locking/rwsem.c       |   4 +-
 kernel/locking/spinlock_rt.c |   3 +-
 kernel/locking/ww_mutex.h    |  49 ++++++++------
 kernel/sched/core.c          | 122 +++++++++++++++++++++--------------
 kernel/sched/deadline.c      |  53 ++++++---------
 kernel/sched/fair.c          |  18 +++---
 kernel/sched/rt.c            |  59 +++++++----------
 kernel/sched/sched.h         |  44 ++++++++++++-
 12 files changed, 268 insertions(+), 199 deletions(-)

-- 
2.43.0.687.g38aa6559b0-goog