From: David Laight
To: 'Steven Rostedt', Peter Zijlstra
CC: Mathieu Desnoyers, "linux-kernel@vger.kernel.org", Thomas Gleixner,
    "Paul E . McKenney", Boqun Feng, "H . Peter Anvin", Paul Turner,
    "linux-api@vger.kernel.org", Christian Brauner, Florian Weimer,
    "carlos@redhat.com", Peter Oskolkov, Alexander Mikhalitsyn,
    Chris Kennelly, Ingo Molnar, Darren Hart, Davidlohr Bueso,
    André Almeida, "libc-alpha@sourceware.org", Jonathan Corbet,
    Noah Goldstein, Daniel Colascione, "longman@redhat.com"
Subject: RE: [RFC PATCH v2 1/4] rseq: Add sched_state field to struct rseq
Date: Thu, 28 Sep 2023 15:51:47 +0000
Message-ID: <40b76cbd00d640e49f727abbd0c39693@AcuMS.aculab.com>
References: <20230529191416.53955-1-mathieu.desnoyers@efficios.com>
    <20230529191416.53955-2-mathieu.desnoyers@efficios.com>
    <20230928103926.GI9829@noisy.programming.kicks-ass.net>
    <20230928104321.490782a7@rorschach.local.home>
In-Reply-To: <20230928104321.490782a7@rorschach.local.home>
List-ID: linux-kernel@vger.kernel.org

From: Steven Rostedt
> Sent: 28 September 2023 15:43
>
> On Thu, 28 Sep 2023 12:39:26 +0200
> Peter Zijlstra wrote:
>
> > As always, are syscalls really *that* expensive? Why can't we busy wait
> > in the kernel instead?
>
> Yes syscalls are that expensive. Several years ago I had a good talk
> with Robert Haas (one of the PostgreSQL maintainers) at Linux Plumbers,
> and I asked him if they used futexes. His answer was "no". He told me
> how they did several benchmarks and it was a huge performance hit (and
> this was before Spectre/Meltdown made things much worse). He explained
> to me that most locks are taken just to flip a few bits. Going into the
> kernel and coming back was orders of magnitude longer than the critical
> sections. By going into the kernel, it caused a ripple effect and lead
> to even more contention. There answer was to implement their locking
> completely in user space without any help from the kernel.

That matches what I found with code that was using a mutex to take
work items off a global list.
Although the mutex was only held for a few instructions (probably
several hundred, because the list wasn't that well written), as soon
as there was any contention (which might start with a hardware
interrupt) performance went through the floor.

The fix was to replace the linked list with an array and use
atomic add to 'grab' blocks of entries.
(Even the atomic operations slowed things down.)

> This is when I thought that having an adaptive spinner that could get
> hints from the kernel via memory mapping would be extremely useful.

Did you consider writing a timestamp into the mutex when it was
acquired - or even as the 'acquired' value?
A 'moderately synched TSC' should do.
Then the waiter should be able to tell how long the mutex
has been held for - and then not spin if it had been held for ages.

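For illustration only, the array-plus-atomic-add fix mentioned above might look something like the sketch below. This is not the original code: the names (work_array, grab_block, GRAB_BLOCK) and the block size are hypothetical, and it assumes C11 atomics and an array that is completely filled in before any worker thread starts claiming entries.

```c
#include <stdatomic.h>
#include <stddef.h>

#define GRAB_BLOCK 16           /* hypothetical number of entries claimed per call */

struct work_array {
    size_t count;               /* number of valid entries in items[] */
    atomic_size_t next;         /* index of the first unclaimed entry */
    void **items;               /* work items, assumed filled before workers start */
};

/*
 * Claim up to GRAB_BLOCK consecutive entries with a single atomic add.
 * Returns the number of entries claimed (0 once the array is exhausted)
 * and stores the index of the first claimed entry in *first.
 *
 * Relaxed ordering is enough only under the assumption above: the array
 * contents were published to the workers before they started (for example
 * by thread creation). 'next' keeps growing after exhaustion, which is
 * harmless as long as it cannot wrap.
 */
static size_t grab_block(struct work_array *wa, size_t *first)
{
    size_t start = atomic_fetch_add_explicit(&wa->next, GRAB_BLOCK,
                                             memory_order_relaxed);
    if (start >= wa->count)
        return 0;

    *first = start;
    size_t left = wa->count - start;
    return left < GRAB_BLOCK ? left : GRAB_BLOCK;
}
```

Each worker then processes items[first] .. items[first + n - 1] with no lock held, so the only shared write is the one atomic add per block; a larger block size trades load balance for fewer atomic operations.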
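A rough sketch of the timestamp idea, to make the suggestion concrete. Everything here is an assumption rather than an existing API: the lock word holds 0 when free and the acquisition time otherwise, the names (ts_lock, ts_trylock, ts_worth_spinning, HOLD_LIMIT_NS) and the threshold are made up, and now_ns() uses C11 timespec_get() purely as a stand-in for a 'moderately synched TSC'. It only shows the waiter-side decision; blocking (e.g. via futex) when spinning is not worthwhile is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define HOLD_LIMIT_NS 2000u     /* hypothetical: give up spinning past this hold time */

struct ts_lock {
    _Atomic uint64_t word;      /* 0 = free, otherwise the time it was acquired */
};

/* Stand-in clock (assumption): a real implementation would read the TSC. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    uint64_t t = (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
    return t ? t : 1;           /* 0 is reserved for "free" */
}

/* Try to take the lock, storing the timestamp as the 'acquired' value. */
static bool ts_trylock(struct ts_lock *l, uint64_t now)
{
    uint64_t expected = 0;
    return atomic_compare_exchange_strong_explicit(&l->word, &expected,
                                                   now ? now : 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static void ts_unlock(struct ts_lock *l)
{
    atomic_store_explicit(&l->word, 0, memory_order_release);
}

/* Waiter side: keep spinning only if the lock has not been held for long. */
static bool ts_worth_spinning(struct ts_lock *l, uint64_t now)
{
    uint64_t held_since = atomic_load_explicit(&l->word, memory_order_relaxed);

    if (held_since == 0)
        return true;            /* free right now: retry the trylock */
    if (now < held_since)
        return true;            /* clocks only roughly in sync: treat as just taken */
    return now - held_since < HOLD_LIMIT_NS;
}
```

A waiter would loop on ts_trylock(l, now_ns()) while ts_worth_spinning(l, now_ns()) holds, and fall back to sleeping otherwise. Note that a long hold time is only a hint that the owner has been preempted; it cannot distinguish that from a genuinely long critical section.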
> The obvious problem with their implementation is that if the owner is
> sleeping, there's no point in spinning. Worse, the owner may even be
> waiting for the spinner to get off the CPU before it can run again. But
> according to Robert, the gain in the general performance greatly
> outweighed the few times this happened in practice.

Unless you can use atomics (ok for bits and linked lists) you
always have the problem that userspace can't disable interrupts.
So, unlike the kernel, you can't implement a proper spinlock.

I've NFI how CONFIG_RT manages to get anything done with all
the spinlocks replaced by sleep locks.
Clearly there are spinlocks that are held for far too long.
But you really do want to spin most of the time.
...

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)