Date: Thu, 26 Oct 2023 09:16:58 -0400
From: Steven Rostedt
To: Peter Zijlstra
Cc: LKML, Thomas Gleixner, Ankur Arora, Linus Torvalds, linux-mm@kvack.org, x86@kernel.org, akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, willy@infradead.org, mgorman@suse.de, jon.grimm@amd.com, bharata@amd.com, raghavendra.kt@amd.com, boris.ostrovsky@oracle.com, konrad.wilk@oracle.com, jgross@suse.com,
  andrew.cooper3@citrix.com, Joel Fernandes, Youssef Esmat, Vineeth Pillai, Suleiman Souhlal, Ingo Molnar, Daniel Bristot de Oliveira, Mathieu Desnoyers
Subject: Re: [POC][RFC][PATCH] sched: Extended Scheduler Time Slice
Message-ID: <20231026091658.1dcf2106@gandalf.local.home>
In-Reply-To: <20231026084402.GK31411@noisy.programming.kicks-ass.net>
References: <20231025054219.1acaa3dd@gandalf.local.home> <20231025102952.GG37471@noisy.programming.kicks-ass.net> <20231025085434.35d5f9e0@gandalf.local.home> <20231025135545.GG31201@noisy.programming.kicks-ass.net> <20231025103105.5ec64b89@gandalf.local.home> <20231026084402.GK31411@noisy.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 26 Oct 2023 10:44:02 +0200
Peter Zijlstra wrote:

> > Actually, it works with *any* system call. Not just sched_yield(). I just
> > used that as it was the best one to annotate "the kernel asked me to
> > schedule, I'm going to schedule". If you noticed, I did not modify
> > sched_yield() in the patch. The NEED_RESCHED_LAZY is still set, and without
> > the extend bit set, on return to user space it will schedule.
>
> So I fundamentally *HATE* that you tie this whole thing to the
> NEED_RESCHED_LAZY thing, that's 100% the wrong layer to be doing this
> at.
>
> It very much means you're creating an interface that won't work for a
> significant number of setups -- those that use the FULL preempt setting.

And why can't the FULL preempt setting still use NEED_RESCHED_LAZY?
PREEMPT_RT does.

The beauty of NEED_RESCHED_LAZY is that it tells you whether you
*should* schedule, or whether you *must* schedule (NEED_RESCHED).

>
> > > > set this bit and leave it there for as long as you want, and it should not
> > > > affect anything.
> > >
> > > It would affect the worst case interference terms of the system at the
> > > very least.
> >
> > If you are worried about that, it can easily be made configurable so it
> > can be turned off. Seriously, I highly doubt that this would even be
> > measurable as interference. I could be wrong; I haven't tested that. It's
> > something we can look at, but until it's considered a problem it should
> > not be a blocker.
>
> If everybody sets the thing and leaves it on, you basically double the
> worst case latency, no? And weren't you involved in a thread only last
> week where the complaint was that Chrome was a pig^W^W^W latency was too
> high?

In my first email about this:

  https://lore.kernel.org/all/20231024103426.4074d319@gandalf.local.home/

I said:

  If we are worried about abuse, we could even punish tasks that don't call
  sched_yield() by the time its extended time slice is taken.

To elaborate further on this punishment: if it does become an issue,
because a bunch of tasks always have this bit set and do not give up the
CPU in a timely manner, such a task could be flagged so that its bit is
ignored and/or some of its eligibility is removed.

That is, it wouldn't take long before the abuser gets whacked and is no
longer able to abuse. But I figured we would look into that only if EEVDF
doesn't naturally take care of it.
>
> > > > If you look at what Thomas's PREEMPT_AUTO.patch
> > >
> > > I know what it does, it also means your thing doesn't work the moment
> > > you set things up to have the old full-preempt semantics back. It
> > > doesn't work in the presence of RT/DL tasks, etc.
> >
> > Note, I am looking at ways to make this work with full preempt semantics.
>
> By not relying on the PREEMPT_AUTO stuff. If you noodle with the code
> that actually sets preempt it should also work with preempt, but you're
> working at the wrong layer.

My guess is that NEED_RESCHED_LAZY will work with PREEMPT as well. That
code is still a work in progress, and this code depends on it. Right now
it depends on PREEMPT_AUTO because that's the only option that currently
gives us NEED_RESCHED_LAZY. From reading Thomas's discussions, it looks
like NEED_RESCHED_LAZY will eventually be available in CONFIG_PREEMPT.

>
> Also see that old Oracle thread that got dug up.

I'll go back and read that.

>
> > > More importantly, it doesn't work for RT/DL tasks, so having the bit set
> > > and not having OTHER policy is an error.
> >
> > It would basically be a nop.
>
> Well yes, but that is not a nice interface, is it? Run your task as RT/DL
> and suddenly it behaves differently.

User space spin locks would most definitely behave differently in RT/DL
today! That could easily cause them to deadlock. User space spin locks
only make sense with SCHED_OTHER; otherwise great care needs to be taken
to avoid unbounded priority inversion, especially with FIFO.

>
> > This is because these critical sections take much less time than 8
> > atomic ops. And when you are executing these critical sections millions
> > of times a second, that adds up quickly.
>
> But you wouldn't be doing syscalls on every section either. If syscalls
> were free (0 cycles) and you could hand-wave any syscall you pleased,
> how would you do this?
>
> The typical futex-like setup is you only syscall on contention, when
> userspace is going to be spinning and wasting cycles anyhow. The current
> problem is that futex_wait will instantly schedule-out / block, even if
> the lock owner is currently one instruction away from releasing the lock.

And that is what user space adaptive spin locks are meant to solve, which
I'm 100% all for! (I'm the one who talked André Almeida into working on
this.)

But as my tests show, the speedup comes from keeping the lock holder from
being preempted. That is also why Thomas created NEED_RESCHED_LAZY for
PREEMPT_RT even though it already had adaptive spin locks.

-- Steve
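The "syscall only on contention" setup Peter describes, plus a bounded spin before sleeping, can be sketched in user-space C. This is a hypothetical illustration, not André Almeida's work or any libc lock: toy_lock(), toy_unlock() and SPIN_LIMIT are made-up names. The lock word follows the well-known three-state futex mutex (0 unlocked, 1 locked, 2 contended). The fixed spin budget is only a stand-in for real adaptive spinning, which would keep spinning only while the owner is actually running on a CPU, information plain user space does not readily have.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word states of the classic three-state futex mutex. */
enum { UNLOCKED = 0, LOCKED = 1, CONTENDED = 2 };

/* Hypothetical spin budget before falling back to the kernel. */
#define SPIN_LIMIT 100

/* Thin wrapper: glibc provides no futex() function, only syscall(). */
static long futex(atomic_int *uaddr, int op, int val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void toy_lock(atomic_int *l)
{
	int c = UNLOCKED;

	/* Fast path: uncontended acquire, no syscall at all. */
	if (atomic_compare_exchange_strong(l, &c, LOCKED))
		return;

	/* Spin briefly, hoping the owner is running and about to release. */
	for (int i = 0; i < SPIN_LIMIT; i++) {
		c = UNLOCKED;
		if (atomic_compare_exchange_weak(l, &c, LOCKED))
			return;
	}

	/* Slow path: mark the lock contended and sleep until woken. */
	while (atomic_exchange(l, CONTENDED) != UNLOCKED)
		futex(l, FUTEX_WAIT_PRIVATE, CONTENDED);
}

static void toy_unlock(atomic_int *l)
{
	int prev = atomic_exchange(l, UNLOCKED);

	assert(prev != UNLOCKED);	/* unlocking a free lock is a bug */

	/* Only pay for a syscall if someone may be sleeping. */
	if (prev == CONTENDED)
		futex(l, FUTEX_WAKE_PRIVATE, 1);
}
```

If the owner is preempted inside its critical section, every waiter burns its whole spin budget and then sleeps anyway. That is the case the measurements in this thread point at: keeping the lock holder on the CPU is where the speedup comes from.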