From: David Laight
To: 'Peter Zijlstra', Mathieu Desnoyers
CC: linux-kernel@vger.kernel.org, Thomas Gleixner, Paul E. McKenney,
 Boqun Feng, H. Peter Anvin, Paul Turner, linux-api@vger.kernel.org,
 Christian Brauner, Florian Weimer, carlos@redhat.com, Peter Oskolkov,
 Alexander Mikhalitsyn, Chris Kennelly, Ingo Molnar, Darren Hart,
 Davidlohr Bueso, André Almeida, libc-alpha@sourceware.org,
 Steven Rostedt, Jonathan Corbet, Noah Goldstein, Daniel Colascione,
 longman@redhat.com, Florian Weimer
Subject: RE: [RFC PATCH v2 1/4] rseq: Add sched_state field to struct rseq
Date: Thu, 28 Sep 2023 11:22:55 +0000
References: <20230529191416.53955-1-mathieu.desnoyers@efficios.com>
 <20230529191416.53955-2-mathieu.desnoyers@efficios.com>
 <20230928103926.GI9829@noisy.programming.kicks-ass.net>
In-Reply-To: <20230928103926.GI9829@noisy.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Peter Zijlstra
> Sent: 28 September 2023 11:39
>
> On Mon, May 29, 2023 at 03:14:13PM -0400, Mathieu Desnoyers wrote:
> > Expose the "on-cpu" state for each thread through struct rseq to allow
> > adaptative mutexes to decide more accurately between busy-waiting and
> > calling sys_futex() to release the CPU, based on the on-cpu state of the
> > mutex owner.

Are you trying to avoid spinning when the owning process is sleeping?
Or trying to avoid the system call when it will find that the futex
is no longer held?

The latter is really horribly detrimental.

> >
> > It is only provided as an optimization hint, because there is no
> > guarantee that the page containing this field is in the page cache, and
> > therefore the scheduler may very well fail to clear the on-cpu state on
> > preemption. This is expected to be rare though, and is resolved as soon
> > as the task returns to user-space.
> >
> > The goal is to improve use-cases where the duration of the critical
> > sections for a given lock follows a multi-modal distribution, preventing
> > statistical guesses from doing a good job at choosing between busy-wait
> > and futex wait behavior.
>
> As always, are syscalls really *that* expensive? Why can't we busy wait
> in the kernel instead?
>
> I mean, sure, meltdown sucked, but most people should now be running
> chips that are not affected by that particular horror show, no?

IIRC 'page table separation', which is what makes system calls
expensive, is only a compile-time option.
So it is likely to be enabled on any 'distro' kernel.
But a lot of other mitigations (e.g. RSB stuffing) are also pretty
detrimental.

OTOH if you have a 'hot' userspace mutex you are going to lose whatever.
All that needs to happen is for an ethernet interrupt to decide to
discard completed transmits and refill the rx ring, and then for the
softint code to free a load of stuff deferred by RCU, all while you've
grabbed the mutex. No matter how short the user-space code path, the
mutex won't be released for absolutely ages.

I had to change a load of code to use arrays and atomic increments
to avoid delays acquiring the mutex.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
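
Editorial illustration (not from the original mail): the message above
does not show the code that was changed, so the following is only a
minimal sketch of the general idea of replacing a mutex-protected update
with an array of counters bumped by atomic increments. The names
`event_counters`, `NUM_SLOTS`, and the `counters_*` helpers are all
hypothetical, and C11 `<stdatomic.h>` is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of independent counters (e.g. one per channel). */
#define NUM_SLOTS 64u

/* Array of counters updated with atomic increments instead of a mutex. */
struct event_counters {
    atomic_uint_fast64_t count[NUM_SLOTS];
};

/* Set every counter to zero; call once before any other use. */
static inline void counters_init(struct event_counters *c)
{
    for (size_t i = 0; i < NUM_SLOTS; i++)
        atomic_init(&c->count[i], 0);
}

/*
 * Record one event in the given slot. No lock is taken, so a writer
 * that is preempted or interrupted here cannot stall other writers.
 */
static inline void counters_record(struct event_counters *c, size_t slot)
{
    assert(slot < NUM_SLOTS);
    atomic_fetch_add_explicit(&c->count[slot], 1, memory_order_relaxed);
}

/* Read the current value of a single slot. */
static inline uint_fast64_t counters_read(struct event_counters *c, size_t slot)
{
    assert(slot < NUM_SLOTS);
    return atomic_load_explicit(&c->count[slot], memory_order_relaxed);
}

/*
 * Sum all slots. Each load is atomic, but the sum is not a consistent
 * snapshot if writers are running concurrently.
 */
static inline uint_fast64_t counters_total(struct event_counters *c)
{
    uint_fast64_t total = 0;

    for (size_t i = 0; i < NUM_SLOTS; i++)
        total += atomic_load_explicit(&c->count[i], memory_order_relaxed);
    return total;
}
```

A caller would run counters_init() once and then call counters_record()
from any thread. Relaxed ordering is enough for plain statistics; if the
counter values are used to publish other data, stronger ordering would be
needed, and this sketch does not cover that case.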