From: Thomas Gleixner
To: Linus Torvalds
Cc: Peter Zijlstra, Ankur Arora, linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org, akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, willy@infradead.org, mgorman@suse.de, rostedt@goodmis.org, jon.grimm@amd.com, bharata@amd.com, raghavendra.kt@amd.com, boris.ostrovsky@oracle.com, konrad.wilk@oracle.com, jgross@suse.com, andrew.cooper3@citrix.com
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED
In-Reply-To: <87led2wdj0.ffs@tglx>
References: <20230830184958.2333078-8-ankur.a.arora@oracle.com> <20230908070258.GA19320@noisy.programming.kicks-ass.net> <87zg1v3xxh.fsf@oracle.com> <87edj64rj1.fsf@oracle.com> <87zg1u1h5t.fsf@oracle.com> <20230911150410.GC9098@noisy.programming.kicks-ass.net> <87h6o01w1a.fsf@oracle.com> <20230912082606.GB35261@noisy.programming.kicks-ass.net> <87cyyfxd4k.ffs@tglx> <87led2wdj0.ffs@tglx>
Date: Sun, 24 Sep 2023 00:50:43 +0200
Message-ID: <87h6nkh5bw.ffs@tglx>

On Tue, Sep 19 2023 at 14:30, Thomas Gleixner wrote:
> On Mon, Sep 18 2023 at 18:57, Linus Torvalds wrote:
>> Then the question becomes whether we'd want to introduce a *new*
>> concept, which is a "if you are going to schedule, do it now rather
>> than later, because I'm taking a lock, and while it's a preemptible
>> lock, I'd rather not sleep while holding this resource".
>>
>> I suspect we want to avoid that for now, on the assumption that it's
>> hopefully not a problem in practice (the recently addressed problem
>> with might_sleep() was that it actively *moved* the scheduling point
>> to a bad place, not that scheduling could happen there, so instead of
>> optimizing scheduling, it actively pessimized it). But I thought I'd
>> mention it.
>
> I think we want to avoid that completely and if this becomes an issue,
> we'd rather be smart about it at the core level.
>
> It's trivial enough to have a per-task counter which tells whether a
> preemptible lock is held (or about to be acquired) or not. Then the
> scheduler can take that hint into account and decide to grant a
> timeslice extension once in the expectation that the task leaves the
> lock-held section soonish and either returns to user space or schedules
> out. It still can enforce it later on.
>
> We really want to let the scheduler decide and rather give it proper
> hints at the conceptual level instead of letting developers make random
> decisions which might work well for a particular use case and completely
> suck for the rest. I think we wasted enough time already on those.
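The per-task counter idea in the quoted paragraph can be sketched as a small userspace model. Everything below is hypothetical: struct task, its fields, lock_hint_enter()/lock_hint_exit(), slice_expired_should_preempt() and task_slice_reset() are illustrative stand-ins, not kernel APIs. The model assumes the scheduler consults the hint when a time slice expires, grants one extension while the counter is non-zero, and enforces preemption after that.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state; not the kernel's task_struct. */
struct task {
    int  lock_hint;       /* > 0 while a preemptible lock is held or about to be taken */
    bool slice_extended;  /* the one-time extension has already been granted */
};

/* Hypothetical hook: called when a preemptible lock is about to be acquired. */
static void lock_hint_enter(struct task *t)
{
    t->lock_hint++;
}

/* Hypothetical hook: called after the lock has been released. */
static void lock_hint_exit(struct task *t)
{
    assert(t->lock_hint > 0);
    t->lock_hint--;
}

/*
 * Hypothetical: called when the task's time slice has expired.
 * Returns true if the task should be preempted now.
 */
static bool slice_expired_should_preempt(struct task *t)
{
    if (t->lock_hint > 0 && !t->slice_extended) {
        t->slice_extended = true;  /* grant the extension exactly once */
        return false;
    }
    return true;  /* no hint, or the extension was already used: enforce */
}

/* Hypothetical: reset when the task schedules out or returns to user space. */
static void task_slice_reset(struct task *t)
{
    t->slice_extended = false;
}
```

The counter (rather than a flag) keeps the hint correct when lock sections nest, and the one-shot extension is what lets the scheduler "still enforce it later on".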
Finally I realized why cond_resched() et al. are so disgusting. They are scope-less: each one is just a random spot which someone decided to be a good place to reschedule.

But in fact the really relevant measure is scope. Full preemption is scope based:

        preempt_disable();
        do_stuff();
        preempt_enable();

which also nests properly:

        preempt_disable();
        do_stuff()
            preempt_disable();
            do_other_stuff();
            preempt_enable();
        preempt_enable();

cond_resched() cannot nest and is obviously scope-less.

The TIF_ALLOW_RESCHED mechanism, which sparked this discussion, only pretends to be scoped.

As Peter pointed out, it does not properly nest with other mechanisms, and it cannot even nest in itself because it is boolean.

The worst thing about it is that it is semantically the reverse of the established model of preempt_disable()/enable(), i.e. allow_resched()/disallow_resched().

So instead of giving the scheduler a hint about 'this might be a good place to preempt', providing proper scope would make way more sense:

        preempt_lazy_disable();
        do_stuff();
        preempt_lazy_enable();

That would be the obvious and semantically consistent counterpart to the existing preemption control primitives, with proper nesting support.

might_sleep(), which is in all the lock acquire functions, and your variant of a hint ("better resched now, before I take the lock") are in the wrong place:

        hint();
        lock();
        do_stuff();
        unlock();

hint() might schedule, and when the task comes back it might schedule again immediately because the lock is contended. Again, hint() has no scope and might be meaningless or even counterproductive if called in a deeper call chain.

Proper scope-based hints avoid that:

        preempt_lazy_disable();
        lock();
        do_stuff();
        unlock();
        preempt_lazy_enable();

That's way better because it describes the scope, and the task will either schedule out in lock() on contention or provide a sensible lazy preemption point in preempt_lazy_enable().
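The contrast between a boolean flag and a nestable scope, and the idea that the outermost preempt_lazy_enable() is the lazy preemption point, can be illustrated with a small userspace model. All names are hypothetical (prefixed model_) and this is not how TIF_ALLOW_RESCHED or lazy preemption is implemented in the kernel; it only assumes that a lazy request is deferred inside a scope, while an urgent request is not.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: every name below is a hypothetical stand-in. */

/* Boolean model, in the style of a single thread-info flag. */
static bool resched_allowed;

static void model_allow_resched(void)    { resched_allowed = true; }
static void model_disallow_resched(void) { resched_allowed = false; }

/* A callee using the boolean pair internally clobbers its caller's state. */
static void callee_bool(void)
{
    model_allow_resched();
    /* ... */
    model_disallow_resched();
}

/* Counter model of a lazy scope. */
struct lazy_task {
    int  lazy_depth;    /* nesting depth of lazy-disable sections */
    bool lazy_pending;  /* a lazy (throughput) reschedule request is pending */
    int  schedules;     /* number of times the model "scheduled" */
};

static void model_schedule(struct lazy_task *t)
{
    t->lazy_pending = false;
    t->schedules++;
}

/* Lazy request, e.g. from the tick: deferred while inside a lazy scope. */
static void model_request_lazy_resched(struct lazy_task *t)
{
    t->lazy_pending = true;
    if (t->lazy_depth == 0)
        model_schedule(t);  /* outside any scope: modeled as immediate */
}

/* Urgent request: a lazy scope prevents neither scheduling nor preemption. */
static void model_request_urgent_resched(struct lazy_task *t)
{
    model_schedule(t);
}

static void model_preempt_lazy_disable(struct lazy_task *t)
{
    t->lazy_depth++;
}

static void model_preempt_lazy_enable(struct lazy_task *t)
{
    assert(t->lazy_depth > 0);
    /* Only the outermost enable acts as the lazy preemption point. */
    if (--t->lazy_depth == 0 && t->lazy_pending)
        model_schedule(t);
}

/* A callee with its own nested lazy scope, like do_stuff() taking lock B. */
static void callee_lazy(struct lazy_task *t)
{
    model_preempt_lazy_disable(t);
    /* lock(B); do_other_stuff(); unlock(B); */
    model_preempt_lazy_enable(t);
}
```

In this model the callee's nested scope leaves the caller's deferral intact, which is exactly the property the boolean pair lacks: after callee_bool() returns, the caller's "allowed" state is gone.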
The scope-based variant also nests properly:

        preempt_lazy_disable();
        lock(A);
        do_stuff()
            preempt_lazy_disable();
            lock(B);
            do_other_stuff();
            unlock(B);
            preempt_lazy_enable();
        unlock(A);
        preempt_lazy_enable();

So in this case it does not matter whether do_stuff() is invoked from a lock-held section or not. The scope which defines the throughput-relevant hint to the scheduler is correct in any case.

Unlike preempt_disable(), the lazy variant prevents neither scheduling nor preemption, but it's an understandable, properly nestable mechanism.

I seriously hope to avoid it altogether :)

Thanks,

        tglx