Date: Thu, 26 Oct 2023 16:50:16 +0900
From: Sergey Senozhatsky
To: Steven Rostedt
Cc: Thomas Gleixner, Peter Zijlstra, Ankur Arora, Linus Torvalds,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
 akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
 dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
 juri.lelli@redhat.com, vincent.guittot@linaro.org, willy@infradead.org,
 mgorman@suse.de, jon.grimm@amd.com, bharata@amd.com,
 raghavendra.kt@amd.com, boris.ostrovsky@oracle.com,
 konrad.wilk@oracle.com, jgross@suse.com, andrew.cooper3@citrix.com,
 Joel Fernandes,
 Youssef Esmat, Vineeth Pillai, Suleiman Souhlal
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED
Message-ID: <20231026075016.GC15694@google.com>
References: <87edj64rj1.fsf@oracle.com> <87zg1u1h5t.fsf@oracle.com>
 <20230911150410.GC9098@noisy.programming.kicks-ass.net>
 <87h6o01w1a.fsf@oracle.com>
 <20230912082606.GB35261@noisy.programming.kicks-ass.net>
 <87cyyfxd4k.ffs@tglx> <20231024103426.4074d319@gandalf.local.home>
In-Reply-To: <20231024103426.4074d319@gandalf.local.home>

On (23/10/24 10:34), Steven Rostedt wrote:
> On Tue, 19 Sep 2023 01:42:03 +0200
> Thomas Gleixner wrote:
>
> >  2) When the scheduler wants to set NEED_RESCHED due it sets
> >     NEED_RESCHED_LAZY instead which is only evaluated in the return to
> >     user space preemption points.
> >
> >     As NEED_RESCHED_LAZY is not folded into the preemption count the
> >     preemption count won't become zero, so the task can continue until
> >     it hits return to user space.
> >
> >     That preserves the existing behaviour.
>
> I'm looking into extending this concept to user space and to VMs.
>
> I'm calling this the "extended scheduler time slice" (ESTS pronounced "estis")
>
> The ideas is this. Have VMs/user space share a memory region with the
> kernel that is per thread/vCPU. This would be registered via a syscall or
> ioctl on some defined file or whatever. Then, when entering user space /
> VM, if NEED_RESCHED_LAZY (or whatever it's eventually called) is set, it
> checks if the thread has this memory region and a special bit in it is
> set, and if it does, it does not schedule. It will treat it like a long
> kernel system call.
>
> The kernel will then set another bit in the shared memory region that will
> tell user space / VM that the kernel wanted to schedule, but is allowing it
> to finish its critical section. When user space / VM is done with the
> critical section, it will check the bit that may be set by the kernel and
> if it is set, it should do a sched_yield() or VMEXIT so that the kernel can
> now schedule it.
>
> What about DOS you say? It's no different than running a long system call.
> No task can run forever. It's not a "preempt disable", it's just "give me
> some more time". A "NEED_RESCHED" will always schedule, just like a kernel
> system call that takes a long time. The goal is to allow user space to get
> out of critical sections that we know can cause problems if they get
> preempted. Usually it's a user space / VM lock is held or maybe a VM
> interrupt handler that needs to wake up a task on another vCPU.
>
> If we are worried about abuse, we could even punish tasks that don't call
> sched_yield() by the time its extended time slice is taken. Even without
> that punishment, if we have EEVDF, this extension will make it less
> eligible the next time around.
>
> The goal is to prevent a thread / vCPU being preempted while holding a lock
> or resource that other threads / vCPUs will want. That is, prevent
> contention, as that's usually the biggest issue with performance in user
> space and VMs.

I think some time ago we tried to check the guest's preempt count on
each vm-exit, and we'd vm-enter again if the guest had exited from
within a critical section (one of those that bump the preempt count),
so that it could hopefully finish whatever it was going to do and
vm-exit again. We didn't look into covering the guest's RCU read-side
critical sections.

Can you educate me: is your PoC significantly different from the guest
preempt count check?
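
For reference, the handshake described in the quoted proposal can be modelled roughly as below. This is only a toy sketch of my reading of it: the bit names (`ESTS_IN_CRITICAL`, `ESTS_RESCHED_PENDING`), the `Task` class and all function names are made up for illustration, nothing here is taken from an actual patch, and Python is used purely to keep it short.

```python
from dataclasses import dataclass

# Hypothetical bit layout; the proposal does not define one.
ESTS_IN_CRITICAL = 0x1      # set by user space / guest: "let me finish"
ESTS_RESCHED_PENDING = 0x2  # set by the kernel: "I wanted to schedule"


@dataclass
class Task:
    shared: int = 0                 # stand-in for the per-thread shared word
    need_resched_lazy: bool = False
    need_resched: bool = False
    yields: int = 0                 # how many times the task yielded


def kernel_exit_to_user(task: Task) -> bool:
    """Model the return-to-user check; True means the task is scheduled out."""
    if task.need_resched:
        # A hard NEED_RESCHED always schedules, extension or not.
        task.need_resched = False
        task.need_resched_lazy = False
        return True
    if task.need_resched_lazy:
        if task.shared & ESTS_IN_CRITICAL:
            # Grant the extension and tell the task we wanted to schedule.
            task.shared |= ESTS_RESCHED_PENDING
            return False
        task.need_resched_lazy = False
        return True
    return False


def user_enter_critical(task: Task) -> None:
    task.shared |= ESTS_IN_CRITICAL


def user_exit_critical(task: Task) -> bool:
    """Leave the critical section; True if the task had to yield."""
    task.shared &= ~ESTS_IN_CRITICAL
    if task.shared & ESTS_RESCHED_PENDING:
        task.shared &= ~ESTS_RESCHED_PENDING
        # Real code would call sched_yield() (or VMEXIT in a guest) here.
        task.need_resched_lazy = False
        task.yields += 1
        return True
    return False
```

In this model a lazy reschedule request that arrives inside a critical section is deferred until `user_exit_critical()`, while a hard `need_resched` preempts regardless, which is the "give me some more time, not preempt disable" property from the quoted text.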