Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp345953rwb; Tue, 13 Dec 2022 18:45:07 -0800 (PST) X-Google-Smtp-Source: AA0mqf5wH8PoHiLhsG7NtcFjcrJnKs+m6goVAksXyl7f3RL4Tlb+bmUT2rnHLlg/kQEq+cmRrnLH X-Received: by 2002:a05:6402:1154:b0:462:16a2:a2e7 with SMTP id g20-20020a056402115400b0046216a2a2e7mr18806732edw.6.1670985906991; Tue, 13 Dec 2022 18:45:06 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670985906; cv=none; d=google.com; s=arc-20160816; b=vxFK0CciBiwosozkPvwT0Pt6aONrmsvH5vA8y4R2aY891n3QeQ13/92Rvv1YPblfW5 Hf4TGK4n7s9znYAEHh7+dvhCkpU8NjFmV3zQXdShbU4eYOpXDj+HOkIGXFuixL7xDYBU 6su6WliOmtE/rOcmauruWXf1d5S62luOaHyrUa8Yix1u4Xkw7rm4Ft8hbdPoF0o1BEwr wKyNs20RzioTYcMwijadlEq9GuGU6tlVxYfVhPTMzV+UDwaFmLFvap2Y8994pqbywrzK YloehLPtc6cqibiLjRVolDj/M+tpQMl5aGhb18f2sX+ZqGP5ET3+jFnxJhaoSoqOqflQ jVcw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=tJ4CSVZZ+NoydkcJKVQO6YNOhus6JhiYOMFjQ8jjtjU=; b=vcEL+D3xDJyCNOnIwQmF+zlg4wXzYdvE/8umA7tHAvEvcJXjC7zhPPmY6gFBxoEyjB ZZtgVfjfrGsxVUOS69Li2of0HGuGtMJQt3urjTZjduvOAjl87+LMWEEudbFEDdMG/Ld0 3Gu14n1cykkEWlABs+ilPgBvFeC3nPC26Oc7xZKuVKXkCg5yXp8jSJk2xcPVE7frrFz+ yDXbYG7m70fe5D4XFVPGBIkN+NtRb4Lxx8oO3j0Szn2RqY3gBqWRINr0vEYlDDxWByd/ 3Mo1b2BwSqoMEpXrxs+m68GH87BGUncBb9LsTVijnNuU8LD7N5jC9JODY5GGb//3WVsV zZsQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b="m/SKUa27"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id dk21-20020a0564021d9500b00461c3758704si10610490edb.36.2022.12.13.18.44.50; Tue, 13 Dec 2022 18:45:06 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b="m/SKUa27"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236646AbiLNCAk (ORCPT + 71 others); Tue, 13 Dec 2022 21:00:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58692 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236739AbiLNCAd (ORCPT ); Tue, 13 Dec 2022 21:00:33 -0500 Received: from mail-pf1-x42f.google.com (mail-pf1-x42f.google.com [IPv6:2607:f8b0:4864:20::42f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BBED413F57 for ; Tue, 13 Dec 2022 18:00:32 -0800 (PST) Received: by mail-pf1-x42f.google.com with SMTP id 65so3474078pfx.9 for ; Tue, 13 Dec 2022 18:00:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=tJ4CSVZZ+NoydkcJKVQO6YNOhus6JhiYOMFjQ8jjtjU=; b=m/SKUa27vVBjK1uJ2X221o05KhZ9jtihYba01Ju4Ef9NiQruOgsHzM/Kf1zncuPTTX 5WeMUgP2qqB8oX/4aP0MqYeAjg3qs/eM20mosQEYuuuFZnP4JWFMeKxBkG3TuR5NK4vu qFImpcXkZ2530uslsCBedMtoQkMTCMSn577sVPwZQ4PtS7Ixp2wQrQhf3JPXBZeWg99f xhPEhRvSnr9JK20Gn1Xy65nImGw11v7zxUokfH2gN5b0GSEXW//gCO0MyExz+vTrRPRC Muj85AAhNQdYKoc9aj4gGiO9y1whE7GOJaJob8+6UkimVCJbodt1gz6mB2zEZSbiY6zc Q58g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=tJ4CSVZZ+NoydkcJKVQO6YNOhus6JhiYOMFjQ8jjtjU=; b=mrVA5o/Diq8WhNJnTkXg8tva+4acx/iMQUZObDzFvk3Sg6qlU6MMXsGSwVpxYQB463 a9L4gZThdhtJmh7TRCFJpWgFxGqdTA6iq9oAwIYE7gWKp9LEkQTjMC1NnAmQUAOoVLnp JEO24DoTpOI/bzHJqzjEnUwYngN4nybLcCSwEE8TD+ndehG72+DJFJWb0+i4iBiB7aCf QbACCZsWA4cIZ5B+uWO3lLX674bZ9oVdh56MoCd2Nja0El7IvI652GTHrlW1X77p3KAD 6MUWR4ANjVF2AeA8Do7J3M4dwCG6bV0/mThlY7AoN5dWbQJU91qJIMtFDP1uLkBTtkN4 wjGg== X-Gm-Message-State: ANoB5pmKcUcSe2vBQHYVujyzr3ycsCpxND+cqJ38xDhMVDV91w5qfBH1 TM9l2My6u8x43wQgoyO31BeZ6OhbQ6u655SgMId9rg== X-Received: by 2002:a63:1247:0:b0:476:ed2a:6216 with SMTP id 7-20020a631247000000b00476ed2a6216mr77727663pgs.556.1670983231384; Tue, 13 Dec 2022 18:00:31 -0800 (PST) MIME-Version: 1.0 References: <20221130082313.3241517-1-tj@kernel.org> <20221130082313.3241517-32-tj@kernel.org> In-Reply-To: From: Josh Don Date: Tue, 13 Dec 2022 18:00:19 -0800 Message-ID: Subject: Re: [PATCH 31/31] sched_ext: Add a rust userspace hybrid example scheduler To: Peter Zijlstra Cc: Tejun Heo , torvalds@linux-foundation.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, brho@google.com, pjt@google.com, derkling@google.com, haoluo@google.com, dvernet@meta.com, dschatzberg@meta.com, dskarlat@cs.cmu.edu, riel@surriel.com, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, kernel-team@meta.com, Peter Oskolkov Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > > and ignoring > > the specifics of this example, the UMCG and sched_ext work are > > complementary, but not mutually exclusive. UMCG is about driving > > cooperative scheduling within a particular application. UMCG does not > > have control over or react to external preemption, > > It can control preemption inside the process, and if you have the degree > of control you need to make the whole BPF thing work, you also have the > degree of control to ensure you only run the one server task on a CPU > and all that no longer matters because there's only the process and you > control preemption inside that. To an extent yes, but this doesn't extend to the case where cpu is overcommitted. Even if not by other applications, then responding to preemption by, for example, kthreads (necessary for microsecond scale workloads). But in general the common case is interference from other applications, something which is handled by a system level scheduler like sched_ext. The application vs system level control is an important distinction here. > > nor does it make thread placement decisions. > > It can do that just fine -- inside the process. UMCG has full control > over which server task a worker task is associated with, then run a > single server task per CPU and have them pinned and you get full > placement control. Again, this doesn't really scale past single server per cpu. It is not feasible to partition systems in this way due to the loss of efficiency. > > sched_ext is considering things more at > > the system level: arbitrating fairness and preemption between > > processes, deciding when and where threads run, etc., and also being > > able to take application-specific hints if desired. > > sched_ext does fundamentally not compose, you cannot run two different > schedulers for two different application stacks that happen to co-reside > on the same machine. We're actually already developing a framework (and plan to share) to support composing an arbitrary combination of schedulers. Essentially, a "scheduler of schedulers". This supports the case, for example, of a system that runs most tasks under some default SCX scheduler, but allows a particular application or group of applications to utilize a bespoke SCX scheduler of their own. > sched_ext also sits at the very bottom of the class stack (it more or > less has to) the result is that in order to use it at all, you have to > have control over all runnable tasks in the system (a stray CFS task > would interfere quite disastrously) but that is exactly the same > constraint you need to make UMCG work. UMCG still works when mixed with other tasks. You're specifying which threads of your application you want running, but no guarantees are made that they'll run right now if the system has other work to do. SCX vs CFS is a more interesting story. Yes it is true that a single CFS task could hog a cpu, but since SCX is managing things at a system level, we feel that this is something that should be handled by system administration. You shouldn't expect to mix cpu bound CFS tasks in the same partition as threads running under SCX with good results. > Conversely, it is very hard to use the BPF thing to do what UMCG can do. > Using UMCG I can have a SCHED_DEADLINE server implement a task based > pipeline schedule (something that's fairly common and really hard to > pull off with just SCHED_DEADLINE itself). UMCG and SCX are solving different problems though. An application can decide execution order or control internal preemption via UMCG, while SXC arbitrates allocation of system resources over time. And, conversely, SCX can do things very difficult or impossible with UMCG. For example, implementing core scheduling. Guaranteeing microsecond scale tail latency. Applying a new scheduling algorithm across multiple independent applications. > Additionally, UMCG naturally works with things like Proxy Execution, > seeing how the server task *is* a proxy for the current active worker > task. Proxy execution should also work with SCX; the enqueue/dequeue abstraction can still be used to allow the SCX scheduler to select the proxy.