Date: Mon, 2 Oct 2023 12:51:09 -0400
From: Steven Rostedt
To: David Laight
Cc: Peter Zijlstra, Mathieu Desnoyers, "linux-kernel@vger.kernel.org",
    "Thomas Gleixner", "Paul E . McKenney", Boqun Feng, "H . Peter Anvin",
    "Paul Turner", "linux-api@vger.kernel.org", Christian Brauner,
    "Florian Weimer", "carlos@redhat.com", "Peter Oskolkov",
    Alexander Mikhalitsyn, Chris Kennelly, Ingo Molnar, "Darren Hart",
    Davidlohr Bueso, André Almeida, "libc-alpha@sourceware.org",
    Jonathan Corbet, Noah Goldstein, Daniel Colascione,
    "longman@redhat.com", "Florian Weimer"
Subject: Re: [RFC PATCH v2 1/4] rseq: Add sched_state field to struct rseq
Message-ID: <20231002125109.55c35030@gandalf.local.home>
In-Reply-To: <40b76cbd00d640e49f727abbd0c39693@AcuMS.aculab.com>
References: <20230529191416.53955-1-mathieu.desnoyers@efficios.com>
    <20230529191416.53955-2-mathieu.desnoyers@efficios.com>
    <20230928103926.GI9829@noisy.programming.kicks-ass.net>
    <20230928104321.490782a7@rorschach.local.home>
    <40b76cbd00d640e49f727abbd0c39693@AcuMS.aculab.com>

On Thu, 28 Sep 2023 15:51:47 +0000
David Laight wrote:

> > This is when I thought that having an adaptive spinner that could get
> > hints from the kernel via memory mapping would be extremely useful.
>
> Did you consider writing a timestamp into the mutex when it was
> acquired - or even as the 'acquired' value?
> A 'moderately synched TSC' should do.
> Then the waiter should be able to tell how long the mutex
> has been held for - and then not spin if it had been held ages.

And what heuristic would you use? In my experience, a "time to spin"
picked for one workload may work there but cause major regressions in
another. I came to "hate" heuristics and NACKed them whenever someone
suggested adding them to the rt_mutex in the kernel (back before
adaptive mutexes were introduced).

>
> > The obvious problem with their implementation is that if the owner is
> > sleeping, there's no point in spinning. Worse, the owner may even be
> > waiting for the spinner to get off the CPU before it can run again. But
> > according to Robert, the gain in the general performance greatly
> > outweighed the few times this happened in practice.
>
> Unless you can use atomics (ok for bits and linked lists) you
> always have the problem that userspace can't disable interrupts.
> So, unlike the kernel, you can't implement a proper spinlock.

Why do you need to disable interrupts? If you know the owner is running
on the CPU, you know it's not trying to run on the CPU that is acquiring
the lock. Heck, there are normal spin locks outside of PREEMPT_RT that
do not disable interrupts. The only time you need to disable interrupts
is if the interrupt itself takes the spin lock, and that's just to
prevent deadlocks.

>
> I've NFI how CONFIG_RT manages to get anything done with all
> the spinlocks replaced by sleep locks.
> Clearly there are spinlocks that are held for far too long.
> But you really do want to spin most of the time.

It spins as long as the owner of the lock is running on the CPU. This is
what we are looking to get from this patch series for user space.

Back in 2007, we had an issue with scaling on SMP machines. The RT
kernel with the sleeping spin locks would slow down exponentially as
the number of CPUs grew. Once we hit more than 16 CPUs, booting the RT
kernel took tens of minutes, while the normal CONFIG_PREEMPT kernel
took only a couple of minutes. The more CPUs you added, the worse it
became.

Then SUSE submitted a patch to have the rt_mutex spin only if the owner
of the mutex was still running on another CPU. This actually mimics a
real spin lock (because that's exactly what real spin locks do: they
spin while the owner is running on a CPU). The difference between a
true spin lock and the adaptive rt_mutex was that the rt_mutex spinner
would stop spinning if the owner was preempted (a true spin lock owner
could not be preempted). After applying the adaptive spinning, we were
able to scale PREEMPT_RT to any number of CPUs that the normal kernel
could handle, with just a linear performance hit.

This is why I'm very much interested in getting the same ability into
user space spin locks.

-- Steve
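
The adaptive-spin rule described above (spin only while the lock owner
is running on a CPU, otherwise stop spinning) can be sketched in
user-space C roughly as follows. This is an illustration only, not the
rt_mutex code and not the rseq patch: struct thread_hint, its on_cpu
flag, and every function name here are hypothetical. In the proposal
under discussion the kernel would publish the owner's scheduling state;
in this sketch on_cpu is an ordinary atomic that the program has to
maintain itself, and sched_yield() stands in for a real blocking wait
such as a futex.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread hint. Assumed to outlive any lock it owns. */
struct thread_hint {
    atomic_bool on_cpu;   /* true while the thread is running on a CPU */
};

/* The lock word stores a pointer to the owner's hint, NULL if unlocked. */
struct adaptive_lock {
    _Atomic(struct thread_hint *) owner;
};

enum wait_action { WAIT_ACQUIRED, WAIT_SPIN, WAIT_BLOCK };

/* One attempt: take the lock, or decide whether spinning is worthwhile. */
static inline enum wait_action
adaptive_step(struct adaptive_lock *l, struct thread_hint *self)
{
    struct thread_hint *expected = NULL;

    if (atomic_compare_exchange_strong_explicit(&l->owner, &expected, self,
                                                memory_order_acquire,
                                                memory_order_relaxed))
        return WAIT_ACQUIRED;

    /* On failure, 'expected' holds the owner observed by the CAS. */
    if (atomic_load_explicit(&expected->on_cpu, memory_order_relaxed))
        return WAIT_SPIN;    /* owner is running: release is likely soon */

    return WAIT_BLOCK;       /* owner is off CPU: spinning is pointless */
}

static inline void
adaptive_lock_acquire(struct adaptive_lock *l, struct thread_hint *self)
{
    for (;;) {
        switch (adaptive_step(l, self)) {
        case WAIT_ACQUIRED:
            return;
        case WAIT_SPIN:
            break;           /* busy-wait and retry */
        case WAIT_BLOCK:
            sched_yield();   /* stand-in for a real sleep (e.g. futex) */
            break;
        }
    }
}

static inline void
adaptive_unlock(struct adaptive_lock *l, struct thread_hint *self)
{
    /* Only the current owner may release the lock. */
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == self);
    atomic_store_explicit(&l->owner, (struct thread_hint *)NULL,
                          memory_order_release);
}
```

Note that there is no "time to spin" constant anywhere in the sketch:
the only input to the spin-or-block decision is whether the owner is
currently running, which is the property the heuristic-free kernel
approach relies on.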